Today, give a try to Techtonique web app, a tool designed to help you make informed, data-driven decisions using Mathematics, Statistics, Machine Learning, and Data Visualization
ahead
is an R, Python and Julia package for univariate and multivariate time series forecasting with uncertainty quantification (including predictive simulation). Matlab is next. The aim is always to make these implementations of ahead
as similar as possible, but that’s a looot of work as you might guess. Do not hesitate to submit a pull request to these repositories if you want to help and contribute. You can also signal bugs (constructively).
The main changes in ahead
’s v0.10.0
are:
- Fast calibration of
ridge2f
using a Leave-One-Out Cross-Validation (LOOCV) trick (R for now) - Univariate forecasting with
ridge2f
with external regressors - Plotting functionality in Python version
Read on for more details. You can also use this notebook if you want to reproduce the experiments.
1. Fast calibration of ridge2f
:
The fast calibration of ridge2f
uses a remarkable result available for linear models’ LOOCV:
Where \(\hat{f}_{-i}\left(\mathbf{z}_i\right)\) is a model statistical learning model \(f\)’s prediction without the \(i\)-th observation in the training set. \(\mathbf{z}_i\) is the \(i\)-th observation, and \(\mathbf{S}_{i i}\) is the \(i\)-th diagonal element of the hat matrix (a.k.a smoother, a matrix so that \(\hat{y} = \mathbf{S} y\)). Keep in mind that time series cross-vaidation will give different results.
The LOOCV result can be extended further and approximated by Generalized Cross-Validation (GCV), still with a closed-form formula available. GCV is used in ahead::ridge2f
. Indeed, even though ridge2
is not – strictly speaking – linear, it possesses the structure of a ridge regression model with 2 regularization parameters; a regularization parameter for the original explanative variables, and a regularization parameter for new, engineered features. These engineered features transform the linear model into a non-linear one.
In the following R example, it’s worth mentioning that only the 2 regularization parameters are calibrated. Other hyperparameters such as the number of time series lags or the number of nodes in the hidden layer are set to their default values. Feel free try and share other experiments including them.
%load_ext rpy2.ipython
%%R
options(repos = c(
techtonique = 'https://techtonique.r-universe.dev',
CRAN = 'https://cloud.r-project.org'))
install.packages("ahead")
install.packages("ggplot2")
install.packages("forecast")
install.packages("dfoptim")
%%R
library(ahead)
library(dfoptim)
# Convert to R DataFrame
%R -i df
%R df_ts <- as.ts(df)
%%R
objective_function <- function(xx)
{
ahead::loocvridge2f(df_ts,
h = 20,
type_pi="blockbootstrap",
lambda_1=10^xx[1],
lambda_2=10^xx[2],
show_progress = FALSE,
)$loocv
}
start <- proc.time()[3]
(opt <- dfoptim::nmkb(fn=objective_function, lower=c(-10,-10), upper=c(10,10), par=c(0.1, 0.1)))
print(proc.time()[3]-start)
elapsed
6.657
%%R
start <- proc.time()[3]
res <- ahead::ridge2f(df_ts, h = 20,
type_pi="blockbootstrap",
lambda_1=10^opt$par[1], # 'optimal' parameters
lambda_2=10^opt$par[2]) # 'optimal' parameters
print(proc.time()[3]-start)
|======================================================================| 100%
elapsed
0.805
%%R
par(mfrow=c(2, 2))
plot(res, "realgovt", type = "sims", main="50 predictive simulations \n of realgovt")
plot(res, "realgovt", type = "dist", main="predictive distribution \n of realgovt")
plot(res, "realgdp", type = "sims", main="50 predictive simulations \n of realgdp")
plot(res, "realgdp", type = "dist", main="predictive distribution \n of realgovt")
2. Univariate forecasting with ridge2f
and external regressors:
ridge2f
was initially designed for multivariate time series forecasting. Nothing prevents it from being used for univariate forecasting. This functionality is now available in ahead
(R for now). In addition, as in the multivariate case, it is possible to use external regressors in the forecasting process.
%%R
library(ggplot2)
library(forecast)
x <- fdeaths
xreg <- ahead::createtrendseason(x)
(z <- ahead::ridge2f(x, xreg = xreg, h=20))
autoplot(z)
3. Plotting functionality in Python version:
Install and import packages
!pip install ahead
import numpy as np
import pandas as pd
import statsmodels.api as sm
from time import time
from ahead import Ridge2Regressor # may take some time to run, ONLY the 1st time it's run
from statsmodels.tsa.base.datetools import dates_from_str
from time import time
Import data
# some example data
mdata = sm.datasets.macrodata.load_pandas().data
# prepare the dates index
dates = mdata[['year', 'quarter']].astype(int).astype(str)
quarterly = dates["year"] + "Q" + dates["quarter"]
quarterly = dates_from_str(quarterly)
mdata = mdata[['realgovt', 'tbilrate', 'cpi', 'realgdp']]
mdata.index = pd.DatetimeIndex(quarterly)
df = np.log(mdata).diff().dropna()
display(df)
realgovt | tbilrate | cpi | realgdp | |
---|---|---|---|---|
1959-06-30 | 0.023664 | 0.088193 | 0.005849 | 0.024942 |
1959-09-30 | 0.020481 | 0.215321 | 0.006838 | -0.001193 |
1959-12-31 | -0.014781 | 0.125317 | 0.000681 | 0.003495 |
1960-03-31 | -0.046197 | -0.212805 | 0.005772 | 0.022190 |
1960-06-30 | -0.003900 | -0.266946 | 0.000338 | -0.004685 |
... | ... | ... | ... | ... |
2008-09-30 | 0.031005 | -0.396881 | -0.007904 | -0.006781 |
2008-12-31 | 0.015732 | -2.277267 | -0.021979 | -0.013805 |
2009-03-31 | -0.010967 | 0.606136 | 0.002340 | -0.016612 |
2009-06-30 | 0.026975 | -0.200671 | 0.008419 | -0.001851 |
2009-09-30 | 0.019888 | -0.405465 | 0.008894 | 0.006862 |
202 rows × 4 columns
regr = Ridge2Regressor(h = 20, date_formatting = "original",
type_pi="blockbootstrap", # type of simulations for predictive inference
B=50, # number of simulations
seed=1)
start = time()
regr.forecast(df)
print(f"Elapsed: {time() - start} s")
|======================================================================| 100%
Elapsed: 0.49063634872436523 s
regr2 = Ridge2Regressor(h = 20, date_formatting = "original",
type_pi="gaussian", # Gaussian prediction intervals
seed=1)
start = time()
regr2.forecast(df)
print(f"Elapsed: {time() - start} s")
Elapsed: 0.038446664810180664 s
regr.plot("realgovt")
regr.plot("realgdp")
regr2.plot("realgovt")
regr2.plot("realgdp")
Comments powered by Talkyard.