Disclaimer: Updated on 2025-06-28

Bayesian optimization (BO) is a popular (and clever, elegant, and efficient) optimization method for hyperparameter tuning in Machine Learning and Deep Learning. BO relies on a surrogate model that approximates the objective function (the function to be minimized) in a probabilistic way, and it optimizes a cheaper acquisition function to select the next point to evaluate.
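
To fix ideas, here is a minimal and deliberately naive sketch of that loop. The surrogate, acquisition function and candidate set are placeholders for illustration; this is not GPopt’s implementation.

import numpy as np

def bo_loop_sketch(objective, surrogate, acquisition, candidates,
                   n_init=10, n_iter=50, seed=123):
    # Naive Bayesian optimization loop (illustration only).
    # `surrogate` must implement fit(X, y); `acquisition` scores candidate
    # points given the fitted surrogate and the current best value.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(candidates), size=n_init, replace=False)
    X = candidates[idx]                         # initial design
    y = np.array([objective(x) for x in X])     # expensive evaluations
    for _ in range(n_iter):
        surrogate.fit(X, y)                     # cheap probabilistic approximation
        scores = acquisition(surrogate, candidates, best=y.min())
        x_next = candidates[np.argmax(scores)]  # optimize the acquisition, not the objective
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))
    return X[np.argmin(y)], y.min()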

The most common surrogate model in BO is the Gaussian process regressor, a Bayesian model with a Gaussian prior, and the most common acquisition function is the Expected Improvement (EI). The idea of EI is to select the next point to evaluate based on its expected improvement over the current best point.
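
When the surrogate’s prediction at a candidate point is Gaussian with mean mu and standard deviation sigma, EI has a closed form. Here is a small sketch for a minimization problem (illustration only, not GPopt’s code):

import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best, xi=0.01):
    # Closed-form EI for minimization: the improvement over the current best
    # value is max(best - f(x) - xi, 0), averaged under the Gaussian
    # predictive distribution; xi >= 0 encourages exploration.
    mu = np.asarray(mu, dtype=float)
    sigma = np.maximum(np.asarray(sigma, dtype=float), 1e-12)  # avoid division by zero
    improvement = best - mu - xi
    z = improvement / sigma
    return improvement * norm.cdf(z) + sigma * norm.pdf(z)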

Conformal Prediction is a framework that makes it possible, among other things, to equip supervised learning predictions with prediction intervals. For more details on Bayesian optimization and Conformal Prediction, see the following references:

In this post, I’ll show how to use conformalized surrogates for optimization, thanks to GPopt and nnetsauce. With this approach, any surrogate model can be used for optimization, and there is no longer any constraint on the choice of a prior (Gaussian, Laplace, etc.). The acquisition function is the lower confidence bound (LCB) of the conformalized surrogate model.
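
The principle can be sketched with any scikit-learn regressor: split the evaluated points into a training set and a calibration set, use the calibration residuals to build a prediction interval around the surrogate’s point predictions, and take the lower bound of that interval as the acquisition to minimize. The sketch below is only an illustration of the idea (with a 90% interval and a simple residual quantile); in the experiments that follow, this wrapping is handled by nnetsauce’s PredictionInterval and GPopt’s method="splitconformal".

import numpy as np
from sklearn.model_selection import train_test_split

def conformal_lcb(regressor, X, y, X_candidates, level=0.9, seed=123):
    # Split conformal prediction interval around a point forecast, used to
    # define a lower-confidence-bound acquisition (to be minimized).
    X_train, X_calib, y_train, y_calib = train_test_split(
        X, y, test_size=0.5, random_state=seed)
    regressor.fit(X_train, y_train)
    # calibration: a quantile of absolute residuals gives the half-width of
    # the interval (up to a finite-sample correction)
    abs_resid = np.abs(y_calib - regressor.predict(X_calib))
    q = np.quantile(abs_resid, level)
    return regressor.predict(X_candidates) - q  # lower bound of the interval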

A future post will show how to use conformalized surrogates for Machine Learning and Deep Learning hyperparameter tuning.

pip install GPopt nnetsauce
import GPopt as gp
import nnetsauce as ns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor
from sklearn.linear_model import RidgeCV, LassoCV
from sklearn.kernel_ridge import KernelRidge
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score
from scipy.optimize import minimize
from statsmodels.nonparametric.smoothers_lowess import lowess


# Six-Hump Camel Function (Objective function, to be minimized)
def six_hump_camel(x):
    """
    Six-Hump Camel Function:
    - Global minima located at:
      (0.0898, -0.7126),
      (-0.0898, 0.7126)
    - Function value at the minima: f(x) = -1.0316
    """
    x1 = x[0]
    x2 = x[1]
    term1 = (4 - 2.1 * x1**2 + (x1**4) / 3) * x1**2
    term2 = x1 * x2
    term3 = (-4 + 4 * x2**2) * x2**2
    return term1 + term2 + term3
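
A quick sanity check at one of the known global minimizers:

# should print a value close to -1.0316
print(six_hump_camel([0.0898, -0.7126]))
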
import matplotlib.pyplot as plt
import numpy as np
# Generate a grid of points in the input space
x = np.linspace(-3, 3, 100)
y = np.linspace(-2, 2, 100)
X, Y = np.meshgrid(x, y)

# Evaluate the objective function at each point in the grid
Z = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        Z[i, j] = six_hump_camel([X[i, j], Y[i, j]])

# Plot the contour map
plt.figure(figsize=(8, 6))
contour = plt.contourf(X, Y, Z, levels=50, cmap='viridis')
plt.colorbar(contour, label='Objective function value')
plt.title('Contour plot of the Six-Hump Camel function')
plt.xlabel('x1')
plt.ylabel('x2')
plt.show()


from sklearn.utils import all_estimators
from tqdm import tqdm

# Get all available scikit-learn estimators
estimators = all_estimators(type_filter='regressor')

results = []

# Loop through all regressors
for name, RegressorClass in tqdm(estimators):
    try:
        # Instantiate the regressor (you might need to handle potential exceptions or required parameters)
        regressor = RegressorClass()
        print(f"\n Successfully instantiated regressor: {name} ----------")
        # GPopt for Bayesian optimization
        gp_opt = gp.GPOpt(objective_func=six_hump_camel,
                          lower_bound = np.array([-3, -2]),
                          upper_bound = np.array([3, 2]),
                          acquisition="ucb",
                          method="splitconformal",
                          surrogate_obj=ns.PredictionInterval(regressor), # Any surrogate model can be used, thanks to nnetsauce
                          n_init=10,
                          n_iter=190,
                          seed=432)
        print(f"gp_opt.method: {gp_opt.method}")
        res = gp_opt.optimize(verbose=1, ucb_tol=1e-6)
        print(f"\n\n result: {res}")
        display(res.best_params)
        display(res.best_score)
        results.append((name, res))

    except Exception as e:
        print(f"Could not instantiate regressor {name}: {e}")

import pandas as pd

results_df = pd.DataFrame(columns=['Regressor', 'Best Params', 'Best Score'])

for name, res in results:
    best_params = res.best_params
    best_score = res.best_score
    results_df = pd.concat([results_df, pd.DataFrame({'Regressor': [name], 'Best Params': [best_params], 'Best Score': [best_score]})], ignore_index=True)

results_df.sort_values(by='Best Score', ascending=True, inplace=True)
results_df.reset_index(drop=True, inplace=True)

results_df.style.format({'Best Score': "{:.5f}"})

|    | Regressor | Best Params | Best Score |
|----|-----------|-------------|------------|
| 0  | BaggingRegressor | [ 0.09649658 -0.71691895] | -1.03133 |
| 1  | GaussianProcessRegressor | [ 0.09649658 -0.71691895] | -1.03133 |
| 2  | NuSVR | [ 0.09649658 -0.71691895] | -1.03133 |
| 3  | SVR | [ 0.09649658 -0.71691895] | -1.03133 |
| 4  | MLPRegressor | [-0.09155273 0.69482422] | -1.02905 |
| 5  | GradientBoostingRegressor | [ 0.04907227 -0.71142578] | -1.02514 |
| 6  | KNeighborsRegressor | [ 0.08203125 -0.6640625 ] | -1.01372 |
| 7  | ExtraTreeRegressor | [ 0.08203125 -0.6640625 ] | -1.01372 |
| 8  | RandomForestRegressor | [ 0.08203125 -0.6640625 ] | -1.01372 |
| 9  | DecisionTreeRegressor | [ 0.08203125 -0.6640625 ] | -1.01372 |
| 10 | HistGradientBoostingRegressor | [-0.00732422 -0.72167969] | -0.99277 |
| 11 | AdaBoostRegressor | [ 0.09375 -0.8125 ] | -0.93858 |
| 12 | ExtraTreesRegressor | [-0.05877686 -0.66418457] | -0.93331 |
| 13 | ElasticNet | [-0.06650758 -0.66453519] | -0.92451 |
| 14 | ARDRegression | [-0.06650758 -0.66453519] | -0.92451 |
| 15 | ElasticNetCV | [-0.06650758 -0.66453519] | -0.92451 |
| 16 | KernelRidge | [-0.06650758 -0.66453519] | -0.92451 |
| 17 | HuberRegressor | [-0.06650758 -0.66453519] | -0.92451 |
| 18 | Lars | [-0.06650758 -0.66453519] | -0.92451 |
| 19 | LarsCV | [-0.06650758 -0.66453519] | -0.92451 |
| 20 | LassoLars | [-0.06650758 -0.66453519] | -0.92451 |
| 21 | LassoLarsCV | [-0.06650758 -0.66453519] | -0.92451 |
| 22 | Lasso | [-0.06650758 -0.66453519] | -0.92451 |
| 23 | LassoCV | [-0.06650758 -0.66453519] | -0.92451 |
| 24 | LinearRegression | [-0.06650758 -0.66453519] | -0.92451 |
| 25 | LassoLarsIC | [-0.06650758 -0.66453519] | -0.92451 |
| 26 | LinearSVR | [-0.06650758 -0.66453519] | -0.92451 |
| 27 | OrthogonalMatchingPursuit | [-0.06650758 -0.66453519] | -0.92451 |
| 28 | OrthogonalMatchingPursuitCV | [-0.06650758 -0.66453519] | -0.92451 |
| 29 | PLSRegression | [-0.06650758 -0.66453519] | -0.92451 |
| 30 | DummyRegressor | [-0.06650758 -0.66453519] | -0.92451 |
| 31 | BayesianRidge | [-0.06650758 -0.66453519] | -0.92451 |
| 32 | QuantileRegressor | [-0.06650758 -0.66453519] | -0.92451 |
| 33 | PassiveAggressiveRegressor | [-0.06650758 -0.66453519] | -0.92451 |
| 34 | RadiusNeighborsRegressor | [-0.06650758 -0.66453519] | -0.92451 |
| 35 | RANSACRegressor | [-0.06650758 -0.66453519] | -0.92451 |
| 36 | Ridge | [-0.06650758 -0.66453519] | -0.92451 |
| 37 | RidgeCV | [-0.06650758 -0.66453519] | -0.92451 |
| 38 | SGDRegressor | [-0.06650758 -0.66453519] | -0.92451 |
| 39 | TheilSenRegressor | [-0.06650758 -0.66453519] | -0.92451 |
| 40 | TransformedTargetRegressor | [-0.06650758 -0.66453519] | -0.92451 |
| 41 | TweedieRegressor | [-0.06650758 -0.66453519] | -0.92451 |
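
The best surrogates recover the global minimum almost exactly (the true minimum is -1.0316); evaluating the objective at the best parameters found confirms the reported score:

# best point found by BaggingRegressor, GaussianProcessRegressor, NuSVR and SVR
print(six_hump_camel([0.09649658, -0.71691895]))  # approx. -1.0313
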
# Michalewicz Function
def michalewicz(x, m=10):
    """
    Michalewicz Function (steepness parameter m; evaluated here in n = 2 dimensions)
    """
    return -sum(np.sin(xi) * (np.sin((i + 1) * xi**2 / np.pi))**(2 * m) for i, xi in enumerate(x))
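
For m = 10 and n = 2, the global minimum of the Michalewicz function over [0, pi]^2 is about -1.8013, reached near (2.20, 1.57); a quick check:

# should print a value close to -1.80
print(michalewicz([2.20, 1.57]))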


import matplotlib.pyplot as plt
import numpy as np
# Generate a grid of points in the input space
x = np.linspace(0, 2, 100)        # x1 grid
y = np.linspace(np.pi, 2, 100)    # x2 grid, between 2 and pi (given in decreasing order)
X, Y = np.meshgrid(x, y)

# Evaluate the objective function at each point in the grid
Z = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        Z[i, j] = michalewicz([X[i, j], Y[i, j]])

# Plot the contour map
plt.figure(figsize=(8, 6))
contour = plt.contourf(X, Y, Z, levels=50, cmap='viridis')
plt.colorbar(contour, label='Objective function value')
plt.title('Contour plot of the Michalewicz function')
plt.xlabel('x1')
plt.ylabel('x2')
plt.show()


from sklearn.utils import all_estimators
from tqdm import tqdm

# Get all available scikit-learn estimators
estimators = all_estimators(type_filter='regressor')

results = []

# Loop through all regressors
for name, RegressorClass in tqdm(estimators):
    try:
        # Instantiate the regressor (you might need to handle potential exceptions or required parameters)
        regressor = RegressorClass()
        print(f"\n Successfully instantiated regressor: {name} ----------")
        # GPopt for Bayesian optimization
        gp_opt = gp.GPOpt(objective_func=michalewicz,
                          lower_bound = np.array([0, np.pi]),  # note: bounds of the 2nd coordinate are given in reverse order
                          upper_bound = np.array([2, 2]),      # (the results below fall in x1 in [0, 2], x2 in [2, pi])
                          acquisition="ucb",
                          method="splitconformal",
                          surrogate_obj=ns.PredictionInterval(regressor), # Any surrogate model can be used, thanks to nnetsauce
                          n_init=10,
                          n_iter=190,
                          seed=432)
        print(f"gp_opt.method: {gp_opt.method}")
        res = gp_opt.optimize(verbose=1, ucb_tol=1e-6)
        print(f"\n\n result: {res}")
        display(res.best_params)
        display(res.best_score)
        results.append((name, res))

    except Exception as e:
        print(f"Could not instantiate regressor {name}: {e}")

import pandas as pd

results_df = pd.DataFrame(columns=['Regressor', 'Best Params', 'Best Score'])

for name, res in results:
    best_params = res.best_params
    best_score = res.best_score
    results_df = pd.concat([results_df, pd.DataFrame({'Regressor': [name], 'Best Params': [best_params], 'Best Score': [best_score]})], ignore_index=True)

results_df.sort_values(by='Best Score', ascending=True, inplace=True)
results_df.reset_index(drop=True, inplace=True)

results_df.style.format({'Best Score': "{:.5f}"})

|    | Regressor | Best Params | Best Score |
|----|-----------|-------------|------------|
| 0  | BaggingRegressor | [1.9989624 2.71631734] | -0.77895 |
| 1  | GradientBoostingRegressor | [1.9989624 2.71631734] | -0.77895 |
| 2  | GaussianProcessRegressor | [1.9989624 2.71631734] | -0.77895 |
| 3  | AdaBoostRegressor | [1.99511719 2.70736381] | -0.76882 |
| 4  | MLPRegressor | [1.99978638 2.68494514] | -0.74841 |
| 5  | RandomForestRegressor | [1.99978638 2.68494514] | -0.74841 |
| 6  | ExtraTreesRegressor | [1.97668457 2.67872644] | -0.67143 |
| 7  | ExtraTreeRegressor | [1.9453125 2.68227998] | -0.60804 |
| 8  | HuberRegressor | [1.93724655 2.67858092] | -0.58092 |
| 9  | KNeighborsRegressor | [1.93724655 2.67858092] | -0.58092 |
| 10 | KernelRidge | [1.93724655 2.67858092] | -0.58092 |
| 11 | ElasticNetCV | [1.93724655 2.67858092] | -0.58092 |
| 12 | LarsCV | [1.93724655 2.67858092] | -0.58092 |
| 13 | LassoCV | [1.93724655 2.67858092] | -0.58092 |
| 14 | Lars | [1.93724655 2.67858092] | -0.58092 |
| 15 | ARDRegression | [1.93724655 2.67858092] | -0.58092 |
| 16 | OrthogonalMatchingPursuitCV | [1.93724655 2.67858092] | -0.58092 |
| 17 | PLSRegression | [1.93724655 2.67858092] | -0.58092 |
| 18 | NuSVR | [1.93724655 2.67858092] | -0.58092 |
| 19 | OrthogonalMatchingPursuit | [1.93724655 2.67858092] | -0.58092 |
| 20 | LinearRegression | [1.93724655 2.67858092] | -0.58092 |
| 21 | LassoLarsIC | [1.93724655 2.67858092] | -0.58092 |
| 22 | LinearSVR | [1.93724655 2.67858092] | -0.58092 |
| 23 | LassoLarsCV | [1.93724655 2.67858092] | -0.58092 |
| 24 | PassiveAggressiveRegressor | [1.93724655 2.67858092] | -0.58092 |
| 25 | QuantileRegressor | [1.93724655 2.67858092] | -0.58092 |
| 26 | SGDRegressor | [1.93724655 2.67858092] | -0.58092 |
| 27 | RidgeCV | [1.93724655 2.67858092] | -0.58092 |
| 28 | Ridge | [1.93724655 2.67858092] | -0.58092 |
| 29 | RadiusNeighborsRegressor | [1.93724655 2.67858092] | -0.58092 |
| 30 | RANSACRegressor | [1.93724655 2.67858092] | -0.58092 |
| 31 | BayesianRidge | [1.93724655 2.67858092] | -0.58092 |
| 32 | TweedieRegressor | [1.93724655 2.67858092] | -0.58092 |
| 33 | TransformedTargetRegressor | [1.93724655 2.67858092] | -0.58092 |
| 34 | TheilSenRegressor | [1.93724655 2.67858092] | -0.58092 |
| 35 | DecisionTreeRegressor | [1.8515625 2.73579214] | -0.47178 |
| 36 | SVR | [0.76176453 2.71127445] | -0.41275 |
| 37 | DummyRegressor | [0.75 2.71349541] | -0.41257 |
| 38 | ElasticNet | [0.75 2.71349541] | -0.41257 |
| 39 | HistGradientBoostingRegressor | [0.75 2.71349541] | -0.41257 |
| 40 | Lasso | [0.75 2.71349541] | -0.41257 |
| 41 | LassoLars | [0.75 2.71349541] | -0.41257 |
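
Note that the scores here are limited by the search box itself: with x1 restricted to [0, 2] and x2 to values between 2 and pi, the minimizer near (2.20, 1.57) is not reachable, so the best attainable value is around -0.78 rather than -1.8013. This can be checked directly:

# best point found within the search box vs. the unconstrained minimizer
print(michalewicz([1.9989624, 2.71631734]))  # approx. -0.779 (inside the box)
print(michalewicz([2.20, 1.57]))             # approx. -1.80 (outside the box)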

