Next week, I’ll present `nnetsauce`’s (univariate and multivariate probabilistic) time series forecasting capabilities at the 44th International Symposium on Forecasting (ISF) 2024. ISF is the **premier forecasting conference, attracting the world’s leading forecasting researchers, practitioners, and students** (I don’t only do forecasting, though). I hope to see you there!

In this post, I illustrate how to obtain predictive simulations with `nnetsauce`’s `MTS` class, using XGBoost as the base learner, and I give some intuition behind the `"kde"` method employed for uncertainty quantification in this case. The time series used here is the `USAccDeaths` dataset, a univariate time series of monthly accidental deaths in the United States from 1973 to 1978.

**(Command line)**

```
!pip install nnetsauce --upgrade --no-cache-dir
```

**Import Python packages**

```
import nnetsauce as ns
import numpy as np
import pandas as pd
import xgboost as xgb
import matplotlib.pyplot as plt
import seaborn as sns
# import data
url = "https://raw.githubusercontent.com/Techtonique/datasets/main/time_series/univariate/USAccDeaths.csv"
df = pd.read_csv(url)
df.index = pd.DatetimeIndex(df.date)
df.drop(columns=['date'], inplace=True)
```

**Adjusting XGBoost regressors with different numbers of estimators**

```
# number of estimators for the base learner
n_estimators_list = [5, 20, 50, 100]
estimators = []
residuals = []

for n_estimators in n_estimators_list:
    # XGBoost regressor as base learner
    regr_xgb = ns.MTS(obj=xgb.XGBRegressor(n_estimators=n_estimators,
                                           learning_rate=0.1),
                      n_hidden_features=50,
                      replications=1000,
                      kernel='gaussian',
                      lags=25)
    regr_xgb.fit(df)
    # in-sample residuals
    residuals.append(regr_xgb.residuals_.ravel())
    # out-of-sample predictions
    regr_xgb.predict(h=30)
    estimators.append(regr_xgb)

residuals_df = pd.DataFrame(np.asarray(residuals).T,
                            columns=["n5", "n20", "n50", "n100"])
```

```
100%|██████████| 1/1 [00:00<00:00, 4.55it/s]
100%|██████████| 1000/1000 [00:00<00:00, 1797.56it/s]
100%|██████████| 1000/1000 [00:00<00:00, 1583.30it/s]
100%|██████████| 1/1 [00:00<00:00, 2.30it/s]
100%|██████████| 1000/1000 [00:01<00:00, 582.86it/s]
100%|██████████| 1000/1000 [00:01<00:00, 882.61it/s]
100%|██████████| 1/1 [00:01<00:00, 1.83s/it]
100%|██████████| 1000/1000 [00:01<00:00, 808.35it/s]
100%|██████████| 1000/1000 [00:00<00:00, 1555.30it/s]
100%|██████████| 1/1 [00:02<00:00, 2.37s/it]
100%|██████████| 1000/1000 [00:01<00:00, 578.83it/s]
100%|██████████| 1000/1000 [00:00<00:00, 1436.02it/s]
```

```
sns.set_theme(style="darkgrid")
for est in estimators:
    est.plot(type_plot="spaghetti")
```

```
for i in range(4):
    sns.kdeplot(residuals_df.iloc[:, i], fill=True, color="red")
    plt.show()
```

In order to obtain predictive simulations with the `"kde"` method (and as seen last week in #143), a Kernel Density Estimator (KDE) is fitted to the in-sample residuals. The most intuitive piece I have found on KDEs is the following presentation: https://scholar.harvard.edu/files/montamat/files/nonparametric_estimation.pdf.

When using a high number of `n_estimators` (with the other parameters kept constant), the `XGBRegressor` base learner will overfit the training set: the in-sample residuals will be very small, the uncertainty can’t be captured/estimated adequately, and the predictions collapse to point forecasts. A compromise needs to be found by cross-validating the base learner’s hyperparameters with an uncertainty quantification metric. Other types of prediction intervals/predictive simulation methods will be available in future versions of `nnetsauce`.
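One simple uncertainty quantification metric for such a cross-validation is the empirical coverage rate of the prediction intervals on a holdout set. The helper below is a hypothetical sketch (not part of `nnetsauce`’s API): a nominal 95% interval should achieve coverage close to 0.95, while an overfitted base learner produces too-narrow intervals and low coverage.

```python
import numpy as np

def coverage_rate(y_true, lower, upper):
    """Fraction of holdout observations falling inside the prediction interval."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    return float(np.mean((y_true >= lower) & (y_true <= upper)))

# toy holdout values and interval bounds
y_true = [10.0, 12.0, 9.5, 11.0]
lower  = [ 9.0, 11.5, 9.0, 10.0]
upper  = [11.0, 12.5, 10.0, 12.0]
print(coverage_rate(y_true, lower, upper))  # → 1.0
```

In a grid search over `n_estimators`, one would pick the value whose intervals come closest to the nominal coverage level rather than the one with the smallest point-forecast error.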