There is always some uncertainty attached to a Statistical/Machine Learning (ML) model's predictions. In this post, we examine a few ways to obtain prediction intervals for nnetsauce's ML models. As a reminder, every model in nnetsauce is based on a component g(XW+b), where:
- X is a matrix containing some explanatory variables and optional clustering information (taking into account input data’s heterogeneity).
- W creates additional explanatory variables from X and is drawn from various random and quasirandom sequences.
- b is an optional bias parameter.
- g is an activation function such as the hyperbolic tangent or the sigmoid function.
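To make the g(XW+b) component concrete, here is a minimal NumPy sketch. This is an illustration of the idea only, not nnetsauce's actual internals; the shapes and the choice of tanh are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(123)

# toy illustration (not nnetsauce's internals): 5 observations, 3 features
X = rng.standard_normal((5, 3))
W = rng.uniform(size=(3, 10))  # random weights creating 10 hidden features
b = 0.5                        # optional bias
g = np.tanh                    # activation function

hidden = g(X @ W + b)          # the g(XW+b) component
print(hidden.shape)            # one row per observation, one column per hidden feature
```

The columns of `hidden` are then used as additional explanatory variables alongside X.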
If we consider again our example with tomatoes and apples, the y below (figure) contains our model's decisions: either "observation is a tomato" or "observation is an apple". The X's are each fruit's characteristics (color, shape, etc.).
This figure presents the structure of a linear nnetsauce model, but the base model could be anything, including a nonlinear model, as long as it has fit and predict methods. There are many ways to obtain prediction intervals for nnetsauce models. Here, we present three of them:
- Using a Bayesian linear model as a basis for our nnetsauce model
- Using a uniform distribution for W
- Using the randomness of the dropout regularization technique applied to g(XW+b)
Here are the Python packages needed for our demo:
```python
import nnetsauce as ns
import numpy as np
from sklearn import datasets
from sklearn import linear_model
from sklearn.model_selection import train_test_split
```
Loading the dataset for model training:
```python
# Boston Housing data
# (note: load_boston was removed in scikit-learn 1.2;
# an older scikit-learn version is required to run this as-is)
boston = datasets.load_boston()
X = boston.data
y = np.log(boston.target)

# split data into training/test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=123)
```
Using a Bayesian model as a basis for our nnetsauce model
Fitting a Bayesian Ridge regression model:
```python
# base model for nnetsauce
regr = linear_model.BayesianRidge()

# create an nnetsauce model based on the Bayesian linear model
fit_obj = ns.CustomRegressor(obj=regr, n_hidden_features=10,
                             activation_name='relu', n_clusters=2)

# model fitting
fit_obj.fit(X_train, y_train)

# mean prediction and standard deviation of the predictions
mean_preds1, std_preds1 = fit_obj.predict(X_test, return_std=True)
```
The model’s prediction interval (using the mean prediction and the standard deviation of our predictions) is depicted below:
The red line represents the values of y_test: actual observed house prices on unseen data (data not used for training the model). The black line represents model predictions on unseen data, and the shaded region is our prediction interval.
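Given the mean prediction and its standard deviation, one common way to sketch an approximate 95% prediction interval is mean ± 1.96 × std, under a normality assumption. The arrays below are illustrative stand-ins, not the actual `mean_preds1`/`std_preds1` values:

```python
import numpy as np

# illustrative stand-ins for the mean and standard deviation
# returned by predict(X_test, return_std=True)
mean_preds = np.array([3.0, 3.2, 2.9])
std_preds = np.array([0.1, 0.15, 0.2])

# approximate 95% prediction interval, assuming approximate normality
lower = mean_preds - 1.96 * std_preds
upper = mean_preds + 1.96 * std_preds
print(lower, upper)
```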
Using a uniform distribution for W
To do that, we simulate W in an nnetsauce CustomRegressor 100 times, using nodes_sim="uniform" and a different seed for each simulation:
```python
# number of model simulations
n = 100

# create nnetsauce models, each with a different seed for W
fit_obj2 = [ns.CustomRegressor(obj=regr, n_hidden_features=10,
                               activation_name='relu', n_clusters=2,
                               nodes_sim="uniform", seed=(i*1000 + 2))
            for i in range(n)]

# model fitting
fit_obj2_fits = [fit_obj2[i].fit(X_train, y_train) for i in range(n)]

# models' predictions
preds2 = np.asarray([fit_obj2_fits[i].predict(X_test)
                     for i in range(n)]).T

# mean prediction
mean_preds2 = preds2.mean(axis=1)

# standard deviation of the predictions
std_preds2 = preds2.std(axis=1)
```
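To make the axis conventions concrete: after the transpose, each row of the predictions array corresponds to a test observation and each column to one simulated model, so the mean and standard deviation are taken across columns. A toy version, with made-up numbers:

```python
import numpy as np

# toy stand-in: predictions from 4 simulated models (columns)
# on 3 test observations (rows)
preds = np.array([[3.0, 3.1, 2.9, 3.2],
                  [2.5, 2.4, 2.6, 2.5],
                  [3.8, 3.9, 3.7, 3.8]])

mean_preds = preds.mean(axis=1)  # one mean per test observation
std_preds = preds.std(axis=1)    # spread across simulations = uncertainty
print(mean_preds)
```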
We obtain the following prediction interval:
Using the dropout regularization technique on g(XW+b)
The dropout technique randomly drops some elements of the matrix g(XW+b). We can use this randomness to incorporate some uncertainty into our model:
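As an illustration of what dropout does (a sketch of the general technique, not nnetsauce's exact implementation), randomly zeroing elements of the hidden-layer matrix might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = np.ones((4, 5))  # stand-in for the matrix g(XW+b)
dropout = 0.1             # fraction of elements to drop

# keep each element with probability 1 - dropout
mask = rng.uniform(size=hidden.shape) >= dropout
dropped = hidden * mask
print(dropped)
```

Each simulation draws a different mask, so repeating the fit/predict cycle with different seeds yields a distribution of predictions.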
```python
# create nnetsauce models, each with dropout and a different seed
fit_obj3 = [ns.CustomRegressor(obj=regr, n_hidden_features=10,
                               activation_name='relu', n_clusters=2,
                               nodes_sim="sobol", dropout=0.1,
                               seed=(i*1000 + 2))
            for i in range(n)]

# models fitting
fit_obj3_fits = [fit_obj3[i].fit(X_train, y_train) for i in range(n)]

# model predictions
preds3 = np.asarray([fit_obj3_fits[i].predict(X_test)
                     for i in range(n)]).T

# mean prediction
mean_preds3 = preds3.mean(axis=1)

# standard deviation of the predictions
std_preds3 = preds3.std(axis=1)
```
Using the dropout technique, we obtain:
Using a uniform distribution for W seems to be the best way to obtain prediction intervals on this dataset and for this model. However, it's worth mentioning that the base model (a Bayesian linear model) is not calibrated; calibrating it could lead to different results from those displayed here.
Note: I am currently looking for a gig. You can hire me on Malt or send me an email: thierry dot moudiki at pm dot me. I can do descriptive statistics, data preparation, feature engineering, model calibration, training and validation, and model outputs’ interpretation. I am fluent in Python, R, SQL, Microsoft Excel, Visual Basic (among others) and French. My résumé? Here!
Under License Creative Commons Attribution 4.0 International.