Last week, I talked about an AutoML method for regression and classification implemented in the Python package nnetsauce. This week, my post is about the same AutoML method, applied this time to multivariate time series (MTS) forecasting.

In the examples below, keep in mind that the VAR (Vector Autoregression) and VECM (Vector Error Correction Model) forecasting models aren't thoroughly tuned, and neither is nnetsauce.MTS; this is just a demo. Finally, note that a probabilistic error metric (rather than the Root Mean Squared Error, RMSE) would be better suited to models that quantify forecasting uncertainty.

Contents

  • 1 - Install
  • 2 - MTS
  • 2 - 1 - nnetsauce.MTS
  • 2 - 2 - statsmodels VAR
  • 2 - 3 - statsmodels VECM

1 - Install

!pip install git+https://github.com/Techtonique/nnetsauce.git@lazy-predict
import nnetsauce as ns
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.api import VAR
from statsmodels.tsa.base.datetools import dates_from_str
from statsmodels.tsa.vector_ar.vecm import VECM, select_order

2 - MTS

Macro data

# some example macroeconomic data
mdata = sm.datasets.macrodata.load_pandas().data

# prepare a quarterly dates index
dates = mdata[['year', 'quarter']].astype(int).astype(str)
quarterly = dates["year"] + "Q" + dates["quarter"]
quarterly = dates_from_str(quarterly)

# keep two series and work with log-differences
mdata = mdata[['realgovt', 'tbilrate']]
mdata.index = pd.DatetimeIndex(quarterly)
data = np.log(mdata).diff().dropna()

display(data)
df = data
df.index = df.index.rename('date')  # rename() returns a copy, so reassign the index

idx_train = int(df.shape[0]*0.8)
idx_end = df.shape[0]
df_train = df.iloc[0:idx_train,]
df_test = df.iloc[idx_train:idx_end,]
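
Note that data holds log-differences, so forecasts produced below live on that scale. If forecasts are needed back on the original levels, the transform can be inverted; a minimal sketch, where invert_log_diff is a hypothetical helper (not part of nnetsauce or statsmodels):

# Hypothetical helper: map log-difference forecasts back to the original scale.
# last_levels is the last observed row of the raw (undifferenced) series.
def invert_log_diff(diff_forecasts, last_levels):
    # d_t = log(y_t) - log(y_{t-1})  =>  y_{t+h} = y_t * exp(d_{t+1} + ... + d_{t+h})
    return np.exp(diff_forecasts.cumsum()) * last_levels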

regr_mts = ns.LazyMTS(verbose=1, ignore_warnings=True, custom_metric=None,
                      lags = 1, n_hidden_features=3, n_clusters=0, random_state=1)
models, predictions = regr_mts.fit(df_train, df_test)
model_dictionary = regr_mts.provide_models(df_train, df_test)
display(models)
Model                          RMSE   MAE    MPL    Time Taken (s)
LassoCV                        0.22   0.12   0.06   0.20
ElasticNetCV                   0.22   0.12   0.06   0.19
LassoLarsCV                    0.22   0.12   0.06   0.08
LarsCV                         0.22   0.12   0.06   0.08
DummyRegressor                 0.22   0.12   0.06   0.06
ElasticNet                     0.22   0.12   0.06   0.07
LassoLars                      0.22   0.12   0.06   0.06
Lasso                          0.22   0.12   0.06   0.07
ExtraTreeRegressor             0.22   0.14   0.07   0.12
KNeighborsRegressor            0.22   0.12   0.06   0.09
SVR                            0.22   0.12   0.06   0.13
HistGradientBoostingRegressor  0.23   0.13   0.06   0.79
NuSVR                          0.23   0.13   0.06   0.20
ExtraTreesRegressor            0.24   0.13   0.07   0.87
GradientBoostingRegressor      0.24   0.13   0.07   0.25
RandomForestRegressor          0.26   0.16   0.08   2.06
AdaBoostRegressor              0.28   0.19   0.10   0.45
DecisionTreeRegressor          0.28   0.18   0.09   0.06
BaggingRegressor               0.28   0.19   0.10   0.20
GaussianProcessRegressor       8.26   5.90   2.95   0.17
BayesianRidge                  11774168792.68   3129885640.50   1564942820.25   0.08
TweedieRegressor               1066305878860.67   263521546472.00   131760773236.00   0.12
LassoLarsIC                    10841414830181.57   2665022282527.50   1332511141263.75   0.08
PassiveAggressiveRegressor     200205325611502239744.00   40689888595970097152.00   20344944297985048576.00   0.17
SGDRegressor                   1383750703550277812748288.00   269310062772019343130624.00   134655031386009671565312.00   0.13
LinearSVR                      6205416599219790202011648.00   1189414936788171753521152.00   594707468394085876760576.00   0.06
OrthogonalMatchingPursuitCV    18588484112627753604349952.00   3542235944300533382119424.00   1771117972150266691059712.00   0.23
OrthogonalMatchingPursuit      18588484112627753604349952.00   3542235944300533382119424.00   1771117972150266691059712.00   0.20
HuberRegressor                 50554040814422644093913571262464.00   9061839427591544042390898606080.00   4530919713795772021195449303040.00   0.09
RidgeCV                        1788858960353426286932811384356864.00   317940467527547291488891451736064.00   158970233763773645744445725868032.00   0.23
RANSACRegressor                352805899757804849079011831705501696.00   61914238966205227684888230708117504.00   30957119483102613842444115354058752.00   1.44
LinearRegression               13408548756595947978849418193194188800.00   2316276205868561893698967459810246656.00   1158138102934280946849483729905123328.00   0.06
TransformedTargetRegressor     13408548756595947978849418193194188800.00   2316276205868561893698967459810246656.00   1158138102934280946849483729905123328.00   0.11
Lars                           13408548756596845228481163425784791040.00   2316276205868715960905471081985343488.00   1158138102934357980452735540992671744.00   0.08
Ridge                          27935786184657480745080678989281886208.00   4824713257018197525713060327109689344.00   2412356628509098762856530163554844672.00   0.12
KernelRidge                    27935786184685139645570846501298503680.00   4824713257022931107816326787730767872.00   2412356628511465553908163393865383936.00   0.09
MLPRegressor                   64247413650209509837810706524366567768365621314...   10088348458681313437051396009759695398571807517...   50441742293406567185256980048798476992859037587...   0.42
model_dictionary['LassoCV']
MTS(n_clusters=0, n_hidden_features=3, obj=LassoCV(random_state=1), seed='mean')
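
Each entry in model_dictionary is a fitted ns.MTS pipeline, so the best model can be reused directly; a small sketch, assuming predict works as in the next section:

best_model = model_dictionary['LassoCV']
best_preds = best_model.predict(h=df_test.shape[0])  # point forecasts over the test horizon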

2 - 1 - nnetsauce.MTS

regr = ns.MTS(obj = LassoCV(random_state=1),
              lags = 1, n_hidden_features=3,
              n_clusters=0, replications = 250,
              kernel = "gaussian", verbose = 1)
regr.fit(df_train)
Adjusting LassoCV to multivariate time series...

100%|██████████| 2/2 [00:00<00:00,  6.22it/s]

Simulate residuals using gaussian kernel...

Best parameters for gaussian kernel: {'bandwidth': 0.04037017258596558}
MTS(kernel='gaussian', n_clusters=0, n_hidden_features=3,
    obj=LassoCV(random_state=1), replications=250, verbose=1)
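
The bandwidth message above comes from fitting a kernel density estimator to the in-sample residuals. As a rough standalone illustration of that mechanism (an assumption about what nnetsauce does internally, not its actual code), using scikit-learn:

# Illustration only: selecting a Gaussian KDE bandwidth by cross-validation,
# presumably what drives the "Best parameters" message above.
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

residuals = np.random.randn(100, 2)  # stand-in for the model's in-sample residuals
grid = GridSearchCV(KernelDensity(kernel="gaussian"),
                    {"bandwidth": np.logspace(-2, 0, 20)}, cv=5)
grid.fit(residuals)
print(grid.best_params_)  # e.g. {'bandwidth': 0.04...}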
res = regr.predict(h=df_test.shape[0], level=95)
100%|██████████| 250/250 [00:00<00:00, 3686.16it/s]
100%|██████████| 250/250 [00:00<00:00, 6971.82it/s]
regr.plot("realgovt")
regr.plot("tbilrate")

[Figure: out-of-sample forecasts with 95% prediction intervals for realgovt]

[Figure: out-of-sample forecasts with 95% prediction intervals for tbilrate]

2 - 2 - statsmodels VAR

model = VAR(df_train)
results = model.fit(maxlags=5, ic='aic')
lag_order = results.k_ar
VAR_preds = results.forecast(df_train.values[-lag_order:], df_test.shape[0])
results.plot_forecast(steps = df_test.shape[0]);

[Figure: VAR forecasts for realgovt and tbilrate]
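
For easier side-by-side inspection against df_test, the raw forecast array can be wrapped in a DataFrame (a small convenience, not required for the error computation below):

VAR_preds_df = pd.DataFrame(VAR_preds, index=df_test.index, columns=df_test.columns)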

2 - 3 - statsmodels VECM
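
select_order, imported earlier but not used so far, can help choose k_ar_diff via information criteria; a quick sketch:

# information-criteria suggestions for the number of lagged differences
vecm_lags = select_order(df_train, maxlags=5, deterministic="ci")
print(vecm_lags.summary())  # AIC/BIC/HQIC recommendations for k_ar_diff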

model = VECM(df_train, k_ar_diff=2, coint_rank=2)
vecm_res = model.fit()
vecm_res.gamma.round(4)
vecm_res.summary()
# point forecasts plus 95% prediction intervals
forecast, lower, upper = vecm_res.predict(df_test.shape[0], alpha=0.05)
vecm_res.plot_forecast(steps=df_test.shape[0])

[Figure: VECM forecasts for realgovt and tbilrate]

Out-of-sample errors

# out-of-sample RMSE (squared=False returns the root of the MSE)
display([("nnetsauce.MTS+"+models.index[i], models["RMSE"].iloc[i]) for i in range(3)])
display(('VAR', mean_squared_error(df_test.values, VAR_preds, squared=False)))
display(('VECM', mean_squared_error(df_test.values, forecast, squared=False)))
[('nnetsauce.MTS+LassoCV', 0.22102547609924011),
 ('nnetsauce.MTS+ElasticNetCV', 0.22103106562991648),
 ('nnetsauce.MTS+LassoLarsCV', 0.22103468506703655)]
('VAR', 0.22128770514262763)
('VECM', 0.22170093788693065)
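
As mentioned in the introduction, RMSE ignores the uncertainty quantification, and a probabilistic metric would better reflect the value of nnetsauce.MTS's prediction intervals. A sketch of empirical coverage and the interval (Winkler) score, assuming res exposes lower and upper bounds aligned with df_test (attribute names may differ across nnetsauce versions):

# Assumption: res.lower and res.upper are (h, n_series) arrays of the 95% bounds.
y = df_test.values
lo, up = np.asarray(res.lower), np.asarray(res.upper)

coverage = ((y >= lo) & (y <= up)).mean()  # ideally close to 0.95
alpha = 0.05
winkler = ((up - lo)
           + (2 / alpha) * (lo - y) * (y < lo)
           + (2 / alpha) * (y - up) * (y > up)).mean()  # lower is better
print("empirical coverage:", coverage)
print("mean interval score:", winkler)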
