Last week, I talked about an AutoML method for regression and classification implemented in the Python package nnetsauce. This week, my post is about the same AutoML method, applied this time to multivariate time series (MTS) forecasting.
In the examples below, keep in mind that the VAR (Vector Autoregression) and VECM (Vector Error Correction Model) forecasting models aren't thoroughly trained, and nnetsauce.MTS isn't really tuned either: this is just a demo. Finally, a probabilistic error metric (rather than the Root Mean Squared Error, RMSE) would be better suited to models that capture forecasting uncertainty.
Contents
- 1 - Install
- 2 - MTS
- 2 - 1 - nnetsauce.MTS
- 2 - 2 - statsmodels VAR
- 2 - 3 - statsmodels VECM
1 - Install
!pip install git+https://github.com/Techtonique/nnetsauce.git@lazy-predict
import nnetsauce as ns
import numpy as np
import pandas as pd
import statsmodels.api as sm
from IPython.display import display  # built into Jupyter; imported for scripts
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.api import VAR
from statsmodels.tsa.base.datetools import dates_from_str
from statsmodels.tsa.vector_ar.vecm import VECM
2 - MTS
Macro data
# some example data
mdata = sm.datasets.macrodata.load_pandas().data
# prepare the dates index
dates = mdata[['year', 'quarter']].astype(int).astype(str)
quarterly = dates["year"] + "Q" + dates["quarter"]
quarterly = dates_from_str(quarterly)
mdata = mdata[['realgovt', 'tbilrate']]
mdata.index = pd.DatetimeIndex(quarterly)
data = np.log(mdata).diff().dropna()
display(data)
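df = data
# Index.rename() returns a new index, so set the name in place instead
df.index.name = 'date'
# first 80% of the observations for training, the remaining 20% for testing
idx_train = int(df.shape[0]*0.8)
df_train = df.iloc[:idx_train]
df_test = df.iloc[idx_train:]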
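# fit nnetsauce.MTS with many scikit-learn regressors as base models,
# then rank them by their errors on the test set
regr_mts = ns.LazyMTS(verbose=1, ignore_warnings=True, custom_metric=None,
                      lags=1, n_hidden_features=3, n_clusters=0, random_state=1)
models, predictions = regr_mts.fit(df_train, df_test)
# fitted models, keyed by the name of the base regressor
model_dictionary = regr_mts.provide_models(df_train, df_test)
display(models)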
| Model | RMSE | MAE | MPL | Time Taken (s) |
|---|---|---|---|---|
| LassoCV | 0.22 | 0.12 | 0.06 | 0.20 |
| ElasticNetCV | 0.22 | 0.12 | 0.06 | 0.19 |
| LassoLarsCV | 0.22 | 0.12 | 0.06 | 0.08 |
| LarsCV | 0.22 | 0.12 | 0.06 | 0.08 |
| DummyRegressor | 0.22 | 0.12 | 0.06 | 0.06 |
| ElasticNet | 0.22 | 0.12 | 0.06 | 0.07 |
| LassoLars | 0.22 | 0.12 | 0.06 | 0.06 |
| Lasso | 0.22 | 0.12 | 0.06 | 0.07 |
| ExtraTreeRegressor | 0.22 | 0.14 | 0.07 | 0.12 |
| KNeighborsRegressor | 0.22 | 0.12 | 0.06 | 0.09 |
| SVR | 0.22 | 0.12 | 0.06 | 0.13 |
| HistGradientBoostingRegressor | 0.23 | 0.13 | 0.06 | 0.79 |
| NuSVR | 0.23 | 0.13 | 0.06 | 0.20 |
| ExtraTreesRegressor | 0.24 | 0.13 | 0.07 | 0.87 |
| GradientBoostingRegressor | 0.24 | 0.13 | 0.07 | 0.25 |
| RandomForestRegressor | 0.26 | 0.16 | 0.08 | 2.06 |
| AdaBoostRegressor | 0.28 | 0.19 | 0.10 | 0.45 |
| DecisionTreeRegressor | 0.28 | 0.18 | 0.09 | 0.06 |
| BaggingRegressor | 0.28 | 0.19 | 0.10 | 0.20 |
| GaussianProcessRegressor | 8.26 | 5.90 | 2.95 | 0.17 |
| BayesianRidge | 11774168792.68 | 3129885640.50 | 1564942820.25 | 0.08 |
| TweedieRegressor | 1066305878860.67 | 263521546472.00 | 131760773236.00 | 0.12 |
| LassoLarsIC | 10841414830181.57 | 2665022282527.50 | 1332511141263.75 | 0.08 |
| PassiveAggressiveRegressor | 200205325611502239744.00 | 40689888595970097152.00 | 20344944297985048576.00 | 0.17 |
| SGDRegressor | 1383750703550277812748288.00 | 269310062772019343130624.00 | 134655031386009671565312.00 | 0.13 |
| LinearSVR | 6205416599219790202011648.00 | 1189414936788171753521152.00 | 594707468394085876760576.00 | 0.06 |
| OrthogonalMatchingPursuitCV | 18588484112627753604349952.00 | 3542235944300533382119424.00 | 1771117972150266691059712.00 | 0.23 |
| OrthogonalMatchingPursuit | 18588484112627753604349952.00 | 3542235944300533382119424.00 | 1771117972150266691059712.00 | 0.20 |
| HuberRegressor | 50554040814422644093913571262464.00 | 9061839427591544042390898606080.00 | 4530919713795772021195449303040.00 | 0.09 |
| RidgeCV | 1788858960353426286932811384356864.00 | 317940467527547291488891451736064.00 | 158970233763773645744445725868032.00 | 0.23 |
| RANSACRegressor | 352805899757804849079011831705501696.00 | 61914238966205227684888230708117504.00 | 30957119483102613842444115354058752.00 | 1.44 |
| LinearRegression | 13408548756595947978849418193194188800.00 | 2316276205868561893698967459810246656.00 | 1158138102934280946849483729905123328.00 | 0.06 |
| TransformedTargetRegressor | 13408548756595947978849418193194188800.00 | 2316276205868561893698967459810246656.00 | 1158138102934280946849483729905123328.00 | 0.11 |
| Lars | 13408548756596845228481163425784791040.00 | 2316276205868715960905471081985343488.00 | 1158138102934357980452735540992671744.00 | 0.08 |
| Ridge | 27935786184657480745080678989281886208.00 | 4824713257018197525713060327109689344.00 | 2412356628509098762856530163554844672.00 | 0.12 |
| KernelRidge | 27935786184685139645570846501298503680.00 | 4824713257022931107816326787730767872.00 | 2412356628511465553908163393865383936.00 | 0.09 |
| MLPRegressor | 64247413650209509837810706524366567768365621314... | 10088348458681313437051396009759695398571807517... | 50441742293406567185256980048798476992859037587... | 0.42 |
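In every row of the table, MPL is exactly half of MAE (for example 0.12 and 0.06, or 3129885640.50 and 1564942820.25). That is what you would get if MPL is the mean pinball (quantile) loss evaluated at the median, alpha = 0.5, which is scikit-learn's default for `mean_pinball_loss`. This is an inference from the numbers, not something stated in the post. The pinball loss is max(alpha * (y - q), (alpha - 1) * (y - q)), which reduces to |y - q| / 2 when alpha = 0.5. A small numpy sketch of both metrics, on made-up data:

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mean_pinball_loss(y_true, y_pred, alpha=0.5):
    diff = y_true - y_pred
    # under-prediction (diff > 0) is weighted by alpha, over-prediction by 1 - alpha
    return np.mean(np.maximum(alpha * diff, (alpha - 1) * diff))

rng = np.random.default_rng(42)
y_true = rng.normal(size=100)                        # made-up observations
y_pred = y_true + rng.normal(scale=0.3, size=100)    # made-up forecasts

mae = mean_absolute_error(y_true, y_pred)
mpl = mean_pinball_loss(y_true, y_pred)
print(round(mpl / mae, 6))  # 0.5
```

With alpha different from 0.5, the loss becomes asymmetric and scores a quantile forecast rather than a median. That is why pinball loss is a natural probabilistic metric for prediction intervals.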
model_dictionary['LassoCV']
MTS(n_clusters=0, n_hidden_features=3, obj=LassoCV(random_state=1), seed='mean')
2 - 1 - nnetsauce.MTS
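# nnetsauce.MTS with LassoCV as the base regressor; residuals are simulated
# with a Gaussian kernel (250 replications) to obtain prediction intervals
regr = ns.MTS(obj=LassoCV(random_state=1),
              lags=1, n_hidden_features=3,
              n_clusters=0, replications=250,
              kernel="gaussian", verbose=1)
regr.fit(df_train)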
Adjusting LassoCV to multivariate time series...
100%|██████████| 2/2 [00:00<00:00, 6.22it/s]
Simulate residuals using gaussian kernel...
Best parameters for gaussian kernel: {'bandwidth': 0.04037017258596558}
MTS(kernel='gaussian', n_clusters=0, n_hidden_features=3, obj=LassoCV(random_state=1), replications=250, verbose=1)
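The `lags` and `n_hidden_features` arguments describe the regressors that MTS feeds to its base model. The numpy sketch below is a rough illustration under stated assumptions, not nnetsauce's implementation. The lagged values are augmented with ReLU features computed from a random projection of the standardized lags. Uniform random weights are used here, where nnetsauce is assumed to use quasi-random nodes. Plain least squares replaces LassoCV, with one coefficient column per series; the fitting log above, which iterates over the 2 series, suggests one base model per series. `lagged_design` and `make_hidden_layer` are hypothetical helpers, and the data are simulated.

```python
import numpy as np

def lagged_design(y, lags):
    """Regressors [y_{t-1}, ..., y_{t-lags}] and targets y_t, for t = lags..T-1."""
    T = y.shape[0]
    X = np.hstack([y[lags - i - 1:T - i - 1] for i in range(lags)])
    return X, y[lags:]

def make_hidden_layer(X, n_hidden_features, seed=123):
    """Return a function mapping inputs to [inputs, relu(standardized inputs @ W)]."""
    rng = np.random.default_rng(seed)
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant columns
    # assumption: uniform random weights stand in for quasi-random nodes
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden_features))

    def transform(X_new):
        X_new = np.atleast_2d(X_new)
        H = np.maximum(((X_new - mu) / sd) @ W, 0.0)  # ReLU hidden features
        return np.hstack([X_new, H])  # keep the original lags alongside H

    return transform

rng = np.random.default_rng(0)
y = rng.normal(scale=0.01, size=(120, 2)).cumsum(axis=0)  # hypothetical 2-series data
X, target = lagged_design(y, lags=1)
transform = make_hidden_layer(X, n_hidden_features=3)
Z = transform(X)
Z1 = np.hstack([np.ones((Z.shape[0], 1)), Z])  # add an intercept column
# least squares stands in for LassoCV; each column of coefs is one series' model
coefs, *_ = np.linalg.lstsq(Z1, target, rcond=None)

# one-step-ahead forecast from the last observation (lags=1)
z_last = transform(y[-1])
next_step = np.hstack([np.ones((1, 1)), z_last]) @ coefs
print(Z.shape, coefs.shape, next_step.shape)  # (119, 5) (6, 2) (1, 2)
```

With `lags=1` and two series, each row has 2 lagged values plus 3 hidden features, giving 5 regressors (6 with the intercept). Multi-step forecasts would feed each prediction back in as the next lag.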
res = regr.predict(h=df_test.shape[0], level=95)
100%|██████████| 250/250 [00:00<00:00, 3686.16it/s]
100%|██████████| 250/250 [00:00<00:00, 6971.82it/s]
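The fitting log reports a bandwidth for a Gaussian kernel, and `predict(..., level=95)` then runs through the 250 replications (the two progress bars). The following is a simplified numpy sketch of that idea, not nnetsauce's code. It draws residual vectors from a Gaussian kernel density estimate, adds them to point forecasts, and reads the interval bounds off empirical quantiles. The residuals and point forecasts are made up, `kde_sample` is a hypothetical helper, and Scott's rule of thumb replaces the tuned bandwidth.

```python
import numpy as np

def kde_sample(residuals, n_samples, bandwidth, rng):
    """Sample from a Gaussian KDE of the residual vectors:
    pick observed residual rows at random, then add N(0, bandwidth^2) noise."""
    idx = rng.integers(0, residuals.shape[0], size=n_samples)
    noise = rng.normal(scale=bandwidth, size=(n_samples, residuals.shape[1]))
    return residuals[idx] + noise

rng = np.random.default_rng(1)
residuals = rng.normal(scale=0.05, size=(160, 2))  # hypothetical in-sample residuals
h, replications, level = 10, 250, 95
point_forecast = np.zeros((h, 2))  # hypothetical point forecasts (h steps x 2 series)

# Scott's rule of thumb, a stand-in for a cross-validated bandwidth
n, d = residuals.shape
bandwidth = residuals.std() * n ** (-1.0 / (d + 4))

sims = kde_sample(residuals, replications * h, bandwidth, rng).reshape(replications, h, 2)
paths = point_forecast + sims  # (replications, h, 2): simulated future values
alpha = 1 - level / 100
lower = np.quantile(paths, alpha / 2, axis=0)
upper = np.quantile(paths, 1 - alpha / 2, axis=0)
mean = paths.mean(axis=0)
print(lower.shape, upper.shape)  # (10, 2) (10, 2)
```

Drawing each step's residual independently ignores how errors propagate through recursive multi-step forecasts. The sketch only illustrates how quantiles of simulated paths become interval bounds.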
regr.plot("realgovt")
regr.plot("tbilrate")
2 - 2 - VAR
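model = VAR(df_train)
# select the lag order (at most 5) with the AIC
results = model.fit(maxlags=5, ic='aic')
lag_order = results.k_ar
# forecasting needs the last lag_order observations as initial values
VAR_preds = results.forecast(df_train.values[-lag_order:], df_test.shape[0])
results.plot_forecast(steps=df_test.shape[0]);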
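A VAR(p) models each series as a linear function of p lags of all series: y_t = c + A_1 y_{t-1} + ... + A_p y_{t-p} + u_t. The call `results.forecast(...)` iterates the fitted equations one step at a time, feeding each prediction back in as a lag. The numpy sketch below illustrates that idea. It assumes an OLS fit with an intercept (the default trend in statsmodels' `VAR.fit`) and uses simulated data rather than the macro series. `fit_var` and `forecast_var` are hypothetical helpers, not statsmodels functions.

```python
import numpy as np

rng = np.random.default_rng(0)
# simulated VAR(1) for 2 series: y_t = c + A y_{t-1} + e_t
A_true = np.array([[0.5, 0.1], [0.0, 0.3]])
c_true = np.array([0.01, -0.02])
T = 300
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = c_true + A_true @ y[t - 1] + rng.normal(scale=0.05, size=2)

def fit_var(y, p):
    """OLS estimate of a VAR(p): intercept c (k,) and lag matrices A (p, k, k)."""
    T, k = y.shape
    # regressors [1, y_{t-1}, ..., y_{t-p}] for t = p..T-1
    X = np.hstack([np.ones((T - p, 1))] + [y[p - i - 1:T - i - 1] for i in range(p)])
    B, *_ = np.linalg.lstsq(X, y[p:], rcond=None)  # shape (1 + k * p, k)
    c = B[0]
    A = np.stack([B[1 + i * k:1 + (i + 1) * k].T for i in range(p)])
    return c, A

def forecast_var(c, A, history, steps):
    """Recursive multi-step forecast from the last p rows of history."""
    p = A.shape[0]
    window = list(history[-p:])
    out = []
    for _ in range(steps):
        # window[-1 - i] is the lag-(i + 1) value
        y_next = c + sum(A[i] @ window[-1 - i] for i in range(p))
        out.append(y_next)
        window.append(y_next)
    return np.array(out)

c_hat, A_hat = fit_var(y, p=1)
preds = forecast_var(c_hat, A_hat, y, steps=8)
print(np.round(A_hat[0], 2))
```

For a stable VAR, these recursive forecasts decay toward the unconditional mean (I - A_1 - ... - A_p)^{-1} c. This explains why long-horizon VAR forecasts on growth rates tend to look flat.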
2 - 3 - VECM
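# VECM with 2 lagged differences and cointegration rank 2 (not tuned)
model = VECM(df_train, k_ar_diff=2, coint_rank=2)
vecm_res = model.fit()
vecm_res.gamma.round(4)
vecm_res.summary()
vecm_res.predict(steps=df_test.shape[0])
# with alpha=0.05, predict also returns the bounds of 95% prediction intervals
forecast, lower, upper = vecm_res.predict(df_test.shape[0], 0.05)
vecm_res.plot_forecast(steps=df_test.shape[0])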
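The `k_ar_diff` and `coint_rank` arguments map onto the usual VECM form, shown here with deterministic terms omitted. K = 2 is the number of series, k is `k_ar_diff`, and r is `coint_rank`:

$$
\Delta y_t = \alpha \beta^\top y_{t-1} + \sum_{i=1}^{k} \Gamma_i \, \Delta y_{t-i} + u_t,
\qquad \alpha, \beta \in \mathbb{R}^{K \times r}, \quad k = 2, \; r = 2
$$

`vecm_res.gamma` holds the estimated short-run matrices $\Gamma_i$, while `vecm_res.alpha` and `vecm_res.beta` hold the loading and cointegration matrices. When r = K, as here, $\Pi = \alpha \beta^\top$ has full rank. In standard VECM theory, this means $y_t$ is stationary in levels and the model is equivalent to an unrestricted VAR with k + 1 = 3 lags. That is consistent with these series already being log-differences.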
Out-of-sample errors
display([("nnetsauce.MTS+"+models.index[i], models["RMSE"].iloc[i]) for i in range(3)])
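# RMSE per series, averaged over the two series: this is what
# mean_squared_error(..., squared=False) returned, but the squared argument
# was deprecated in scikit-learn 1.4 and later removed
display(('VAR', np.mean(np.sqrt(mean_squared_error(df_test.values, VAR_preds, multioutput='raw_values')))))
display(('VECM', np.mean(np.sqrt(mean_squared_error(df_test.values, forecast, multioutput='raw_values')))))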
[('nnetsauce.MTS+LassoCV', 0.22102547609924011),
('nnetsauce.MTS+ElasticNetCV', 0.22103106562991648),
('nnetsauce.MTS+LassoLarsCV', 0.22103468506703655)]
('VAR', 0.22128770514262763)
('VECM', 0.22170093788693065)
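RMSE only scores point forecasts, and all five errors above are within about 0.001 of each other. One standard probabilistic alternative for interval forecasts is the interval (Winkler) score. For a central (1 - alpha) interval [l, u] and observation y, it is (u - l) + (2 / alpha) * (l - y) * 1{y < l} + (2 / alpha) * (y - u) * 1{y > u}, so it rewards narrow intervals and penalizes misses. Below is a minimal numpy sketch on made-up numbers. It could be applied, for instance, to the `lower` and `upper` arrays returned by `vecm_res.predict(..., 0.05)`, but no such result is reported here. `interval_score` is a hypothetical helper.

```python
import numpy as np

def interval_score(y, lower, upper, alpha=0.05):
    """Mean interval (Winkler) score of central (1 - alpha) intervals; lower is better."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    width = upper - lower
    below = (2 / alpha) * (lower - y) * (y < lower)  # penalty when y falls below l
    above = (2 / alpha) * (y - upper) * (y > upper)  # penalty when y falls above u
    return np.mean(width + below + above)

# made-up observations and 95% intervals
y = np.array([0.0, 1.0, 2.0])
lower = np.array([-1.0, -1.0, -1.0])
upper = np.array([1.0, 1.0, 1.0])

score = interval_score(y, lower, upper, alpha=0.05)
coverage = np.mean((y >= lower) & (y <= upper))  # empirical coverage
print(score, coverage)  # score ≈ 15.33, coverage ≈ 0.67
```

The third observation misses the interval by 1, adding a penalty of 2 / 0.05 = 40 to its width of 2. A single large miss can dominate the score, which is the intended behaviour for judging whether intervals are calibrated.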