Last week in #134, I talked about mlsauce’s v0.12.0, and LSBoost in particular. As shown in the post, it’s now possible to obtain prediction intervals for the regression model, notably by employing Split Conformal Prediction.

For now, the best way to install the package is to use the development version (I'm still looking for ways to fix this):

pip install git+ --verbose

Now, in v0.13.0, it’s possible to add explanatory variables’ heterogeneity to the mix through clustering (K-means and Gaussian mixture models). This means that, a priori, when assessing the conditional expectation of the variable of interest as a function of the covariates, we explicitly tell the model to take similarities between individual observations into account. Some examples of use of this new feature can be found here, here and here. Keep in mind, however, that these examples only show that it’s possible to overfit the training set (hence reducing the loss function’s magnitude) by adding some clusters. The whole model’s hyperparameters need to be ‘fine-tuned’, for example by using GPopt (although tuning the number of clusters in conjunction with col_sample, the number of columns used at each iteration, is not a good idea).
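To make the clustering idea concrete, here is a minimal sketch. It is not mlsauce's actual implementation, and none of the names below come from the package: kmeans_labels, add_cluster_features and train_mse are hypothetical helpers written for illustration, the data are simulated, and a plain least-squares fit stands in for LSBoost. The sketch groups the observations with K-means, appends the one-hot encoded cluster memberships to the covariates, and compares the training error with and without these extra columns.

```python
import numpy as np


def kmeans_labels(X, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with distinct rows drawn at random.
    centroids = X[rng.choice(X.shape[0], size=n_clusters, replace=False)]
    labels = np.zeros(X.shape[0], dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[k] = members.mean(axis=0)
    return labels


def add_cluster_features(X, n_clusters, seed=0):
    """Append one-hot encoded cluster memberships to the covariates."""
    labels = kmeans_labels(X, n_clusters, seed=seed)
    one_hot = np.eye(n_clusters)[labels]
    return np.hstack([X, one_hot])


def train_mse(X, y):
    """Training-set MSE of a least-squares fit with an intercept.

    Assumption: least squares is only a stand-in for the boosted model.
    """
    A = np.hstack([np.ones((X.shape[0], 1)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean((y - A @ coef) ** 2))


# Simulated data (an assumption made purely for illustration).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

X_clustered = add_cluster_features(X, n_clusters=4)
mse_base = train_mse(X, y)
mse_clustered = train_mse(X_clustered, y)
print(f"training MSE without clusters: {mse_base:.4f}")
print(f"training MSE with clusters:    {mse_clustered:.4f}")
```

With a least-squares fit, the extra columns can never increase the training error, which is exactly why a lower training loss on its own says nothing about out-of-sample performance: the number of clusters still has to be chosen on held-out data. In practice, scikit-learn's KMeans and GaussianMixture would replace the hand-written clustering step.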