Today, try the Techtonique web app, a tool designed to help you make informed, data-driven decisions using Mathematics, Statistics, Machine Learning, and Data Visualization. Here is a tutorial with audio, video, code, and slides: https://moudiki2.gumroad.com/l/nrhgb. 100 API requests are now (and permanently) offered to every user each month, whatever the pricing tier.

Last week, I talked about an AutoML method for regression and classification implemented in the Python package nnetsauce. This week's post applies the same AutoML method to multivariate time series (MTS) forecasting.

In the examples below, keep in mind that the VAR (Vector Autoregression) and VECM (Vector Error Correction Model) forecasting models aren’t thoroughly trained, and nnetsauce.MTS isn’t really tuned either; this is just a demo. Also, a probabilistic error metric would be better suited than the Root Mean Squared Error (RMSE) for models that capture forecasting uncertainty.

Contents

1 - Install
2 - MTS
2 - 1 - nnetsauce.MTS
2 - 2 - statsmodels VAR
2 - 3 - statsmodels VECM

1 - Install

!pip install git+https://github.com/Techtonique/nnetsauce.git@lazy-predict

import nnetsauce as ns
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.api import VAR
from statsmodels.tsa.base.datetools import dates_from_str
from statsmodels.tsa.vector_ar.vecm import VECM, select_order

2 - MTS

Macro data

# some example data
mdata = sm.datasets.macrodata.load_pandas().data

# prepare the dates index
dates = mdata[['year', 'quarter']].astype(int).astype(str)

quarterly = dates["year"] + "Q" + dates["quarter"]

quarterly = dates_from_str(quarterly)

mdata = mdata[['realgovt', 'tbilrate']]

mdata.index = pd.DatetimeIndex(quarterly)

data = np.log(mdata).diff().dropna()

display(data)
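To see what the `np.log(...).diff().dropna()` transformation above does, here is a small sketch on a toy table. The column names `x` and `y` and the values are made up for illustration; the point is only that a log difference is the log of the ratio between consecutive observations, so a constant 10% growth becomes a constant value of log(1.1).

```python
import numpy as np
import pandas as pd

# Toy quarterly levels (hypothetical values, not the macro data used above)
levels = pd.DataFrame(
    {"x": [100.0, 110.0, 121.0], "y": [2.0, 2.0, 4.0]},
    index=pd.date_range("2000-01-01", periods=3, freq="QS"),
)

# Same transformation as in the post: log, first difference, drop the first (NaN) row
log_returns = np.log(levels).diff().dropna()

print(log_returns)
```

The first row is lost because a difference needs a previous observation, which is why the transformed series has one row fewer than the original.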

df = data

df.index = df.index.rename('date')  # rename returns a new index, so assign it back

idx_train = int(df.shape[0]*0.8)
idx_end = df.shape[0]
df_train = df.iloc[0:idx_train,]
df_test = df.iloc[idx_train:idx_end,]

regr_mts = ns.LazyMTS(verbose=1, ignore_warnings=True, custom_metric=None,
                      lags = 1, n_hidden_features=3, n_clusters=0, random_state=1)
models, predictions = regr_mts.fit(df_train, df_test)
model_dictionary = regr_mts.provide_models(df_train, df_test)

display(models)

Model	RMSE	MAE	MPL	Time Taken
LassoCV	0.22	0.12	0.06	0.20
ElasticNetCV	0.22	0.12	0.06	0.19
LassoLarsCV	0.22	0.12	0.06	0.08
LarsCV	0.22	0.12	0.06	0.08
DummyRegressor	0.22	0.12	0.06	0.06
ElasticNet	0.22	0.12	0.06	0.07
LassoLars	0.22	0.12	0.06	0.06
Lasso	0.22	0.12	0.06	0.07
ExtraTreeRegressor	0.22	0.14	0.07	0.12
KNeighborsRegressor	0.22	0.12	0.06	0.09
SVR	0.22	0.12	0.06	0.13
HistGradientBoostingRegressor	0.23	0.13	0.06	0.79
NuSVR	0.23	0.13	0.06	0.20
ExtraTreesRegressor	0.24	0.13	0.07	0.87
GradientBoostingRegressor	0.24	0.13	0.07	0.25
RandomForestRegressor	0.26	0.16	0.08	2.06
AdaBoostRegressor	0.28	0.19	0.10	0.45
DecisionTreeRegressor	0.28	0.18	0.09	0.06
BaggingRegressor	0.28	0.19	0.10	0.20
GaussianProcessRegressor	8.26	5.90	2.95	0.17
BayesianRidge	11774168792.68	3129885640.50	1564942820.25	0.08
TweedieRegressor	1066305878860.67	263521546472.00	131760773236.00	0.12
LassoLarsIC	10841414830181.57	2665022282527.50	1332511141263.75	0.08
PassiveAggressiveRegressor	200205325611502239744.00	40689888595970097152.00	20344944297985048576.00	0.17
SGDRegressor	1383750703550277812748288.00	269310062772019343130624.00	134655031386009671565312.00	0.13
LinearSVR	6205416599219790202011648.00	1189414936788171753521152.00	594707468394085876760576.00	0.06
OrthogonalMatchingPursuitCV	18588484112627753604349952.00	3542235944300533382119424.00	1771117972150266691059712.00	0.23
OrthogonalMatchingPursuit	18588484112627753604349952.00	3542235944300533382119424.00	1771117972150266691059712.00	0.20
HuberRegressor	50554040814422644093913571262464.00	9061839427591544042390898606080.00	4530919713795772021195449303040.00	0.09
RidgeCV	1788858960353426286932811384356864.00	317940467527547291488891451736064.00	158970233763773645744445725868032.00	0.23
RANSACRegressor	352805899757804849079011831705501696.00	61914238966205227684888230708117504.00	30957119483102613842444115354058752.00	1.44
LinearRegression	13408548756595947978849418193194188800.00	2316276205868561893698967459810246656.00	1158138102934280946849483729905123328.00	0.06
TransformedTargetRegressor	13408548756595947978849418193194188800.00	2316276205868561893698967459810246656.00	1158138102934280946849483729905123328.00	0.11
Lars	13408548756596845228481163425784791040.00	2316276205868715960905471081985343488.00	1158138102934357980452735540992671744.00	0.08
Ridge	27935786184657480745080678989281886208.00	4824713257018197525713060327109689344.00	2412356628509098762856530163554844672.00	0.12
KernelRidge	27935786184685139645570846501298503680.00	4824713257022931107816326787730767872.00	2412356628511465553908163393865383936.00	0.09
MLPRegressor	64247413650209509837810706524366567768365621314...	10088348458681313437051396009759695398571807517...	50441742293406567185256980048798476992859037587...	0.42

model_dictionary['LassoCV']

MTS(n_clusters=0, n_hidden_features=3, obj=LassoCV(random_state=1), seed='mean')


2 - 1 - `nnetsauce.MTS`

regr = ns.MTS(obj = LassoCV(random_state=1),
              lags = 1, n_hidden_features=3,
              n_clusters=0, replications = 250,
              kernel = "gaussian", verbose = 1)

regr.fit(df_train)

 Adjusting LassoCV to multivariate time series... 
 
100%|██████████| 2/2 [00:00<00:00,  6.22it/s]

 Simulate residuals using gaussian kernel... 

 Best parameters for gaussian kernel: {'bandwidth': 0.04037017258596558}

MTS(kernel='gaussian', n_clusters=0, n_hidden_features=3,
    obj=LassoCV(random_state=1), replications=250, verbose=1)
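To make the `lags` argument more concrete, here is a minimal sketch of the general idea of turning a multivariate series into a supervised learning problem: each row of the design matrix holds the previous `lags` observations of all series, and the target is the current observation. This is a simplified illustration under my own assumptions, not nnetsauce's actual implementation (which, per the parameters above, also uses hidden features). `make_lagged` and `recursive_forecast` are hypothetical helper names, the data are synthetic, and plain least squares stands in for LassoCV.

```python
import numpy as np

def make_lagged(Y, lags=1):
    """Return (X, target): row t of X stacks Y[t-1], ..., Y[t-lags]; target is Y[t]."""
    n = Y.shape[0]
    X = np.hstack([Y[lags - k : n - k] for k in range(1, lags + 1)])
    return X, Y[lags:]

def recursive_forecast(Y, coef, lags, h):
    """Iterate one-step-ahead predictions h times, feeding forecasts back in."""
    history = [row for row in Y[-lags:]]
    out = []
    for _ in range(h):
        # intercept first, then most recent observation, then older ones
        x = np.concatenate([[1.0]] + [history[-k] for k in range(1, lags + 1)])
        y_next = x @ coef
        out.append(y_next)
        history.append(y_next)
    return np.array(out)

# Synthetic bivariate series from a stable VAR(1) (illustrative only)
rng = np.random.default_rng(0)
A1 = np.array([[0.5, 0.1], [0.0, 0.3]])
Y = np.zeros((200, 2))
for t in range(1, 200):
    Y[t] = Y[t - 1] @ A1.T + rng.normal(scale=0.1, size=2)

X, target = make_lagged(Y, lags=1)
design = np.column_stack([np.ones(len(X)), X])          # add an intercept column
coef, *_ = np.linalg.lstsq(design, target, rcond=None)  # one column of coefficients per series
forecast = recursive_forecast(Y, coef, lags=1, h=5)

print(forecast.shape)  # (5, 2)
```

Any regressor with a `fit`/`predict` interface could replace the least squares step, which is what makes the approach suitable for an AutoML-style comparison across many models.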

res = regr.predict(h=df_test.shape[0], level=95)

100%|██████████| 250/250 [00:00<00:00, 3686.16it/s]
100%|██████████| 250/250 [00:00<00:00, 6971.82it/s]

regr.plot("realgovt")
regr.plot("tbilrate")
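The log output above mentions simulating residuals with a Gaussian kernel, and the model was created with `replications = 250` and queried with `level=95`. One way to read this, sketched below under my own assumptions, is: draw residuals from a Gaussian kernel density estimate, add them to the point forecasts to obtain simulated paths, and take empirical quantiles across paths. This is not a description of nnetsauce's internals; `sample_gaussian_kde` is a hypothetical helper, the residuals and point forecasts are synthetic, and the bandwidth is fixed by hand rather than tuned.

```python
import numpy as np

def sample_gaussian_kde(residuals, bandwidth, size, rng):
    """Draw `size` rows from a Gaussian KDE fitted to `residuals` of shape (n, d).

    Sampling from a Gaussian KDE amounts to picking an observed residual at
    random and adding Gaussian noise with standard deviation `bandwidth`.
    """
    idx = rng.integers(0, residuals.shape[0], size=size)
    noise = rng.normal(scale=bandwidth, size=(size, residuals.shape[1]))
    return residuals[idx] + noise

rng = np.random.default_rng(1)
residuals = rng.normal(scale=0.05, size=(150, 2))  # synthetic in-sample residuals
point_forecast = np.zeros((8, 2))                  # synthetic point forecasts, horizon h = 8
replications, level = 250, 95

# One simulated path per replication: point forecast plus resampled residuals
sims = np.stack([
    point_forecast + sample_gaussian_kde(residuals, 0.04, point_forecast.shape[0], rng)
    for _ in range(replications)
])  # shape (replications, h, d)

alpha = 1 - level / 100
lower = np.quantile(sims, alpha / 2, axis=0)      # lower bound of the 95% interval
upper = np.quantile(sims, 1 - alpha / 2, axis=0)  # upper bound of the 95% interval

print(sims.shape, lower.shape, upper.shape)
```

More replications give smoother interval bounds at the cost of extra computation.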


2 - 2 - VAR

model = VAR(df_train)
results = model.fit(maxlags=5, ic='aic')
lag_order = results.k_ar
VAR_preds = results.forecast(df_train.values[-lag_order:], df_test.shape[0])

results.plot_forecast(steps = df_test.shape[0]);


2 - 3 - VECM

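# k_ar_diff: number of lagged differences; coint_rank: cointegration rank
model = VECM(df_train, k_ar_diff=2, coint_rank=2)
vecm_res = model.fit()
vecm_res.gamma.round(4)  # short-run coefficient estimates
vecm_res.summary()
# point forecasts with 95% prediction intervals (alpha=0.05)
forecast, lower, upper = vecm_res.predict(steps=df_test.shape[0], alpha=0.05)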

vecm_res.plot_forecast(steps = df_test.shape[0])


Out-of-sample errors

display([("nnetsauce.MTS+"+models.index[i], models["RMSE"].iloc[i]) for i in range(3)])
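# Note: `squared=False` was deprecated in scikit-learn 1.4 and removed in 1.6;
# on recent versions, use sklearn.metrics.root_mean_squared_error instead.
display(('VAR', mean_squared_error(df_test.values, VAR_preds, squared=False)))
display(('VECM', mean_squared_error(df_test.values, forecast, squared=False)))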

[('nnetsauce.MTS+LassoCV', 0.22102547609924011),
 ('nnetsauce.MTS+ElasticNetCV', 0.22103106562991648),
 ('nnetsauce.MTS+LassoLarsCV', 0.22103468506703655)]
('VAR', 0.22128770514262763)
('VECM', 0.22170093788693065)
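The RMSE values above only compare point forecasts. As noted in the introduction, a probabilistic metric is better suited to models that quantify uncertainty. As a sketch of what that could look like, here are two common interval-based measures, the empirical coverage rate and the Winkler interval score (lower is better). The function names `winkler_score` and `coverage` are my own, and the toy numbers are made up; this is not a metric computed by nnetsauce in this post.

```python
import numpy as np

def winkler_score(y, lower, upper, alpha=0.05):
    """Mean Winkler score of (1 - alpha) prediction intervals.

    Each interval is charged its width, plus (2 / alpha) times the distance
    by which the observation falls outside the interval.
    """
    y, lower, upper = map(np.asarray, (y, lower, upper))
    width = upper - lower
    below = (2.0 / alpha) * np.clip(lower - y, 0, None)
    above = (2.0 / alpha) * np.clip(y - upper, 0, None)
    return float(np.mean(width + below + above))

def coverage(y, lower, upper):
    """Fraction of observations falling inside their prediction interval."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    return float(np.mean((y >= lower) & (y <= upper)))

# Toy example: the third observation falls 1.0 above its upper bound
y = np.array([0.0, 1.0, 3.0])
lower = np.array([-1.0, 0.0, 0.0])
upper = np.array([1.0, 2.0, 2.0])

print(winkler_score(y, lower, upper, alpha=0.05))  # (2 + 2 + 42) / 3
print(coverage(y, lower, upper))                   # 2 / 3
```

With the VECM intervals obtained earlier, the same function could be called as `winkler_score(df_test.values, lower, upper, alpha=0.05)`.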

AutoML in nnetsauce (randomized and quasi-randomized nnetworks) Pt.2: multivariate time series forecasting
