Disclaimer: Updated on 2025-06-28

Bayesian optimization (BO) is a popular (and clever, elegant, and efficient) optimization method for hyperparameter tuning in Machine Learning and Deep Learning. BO relies on a surrogate model that approximates the objective function (the function to be minimized) in a probabilistic way, and it optimizes a cheaper acquisition function to select the next point to evaluate.
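
To fix ideas, here is a minimal and deliberately naive sketch of that loop. The surrogate, acquisition function and candidate set are placeholders for illustration; this is not GPopt’s implementation.

import numpy as np

def bo_loop_sketch(objective, surrogate, acquisition, candidates,
                   n_init=10, n_iter=50, seed=123):
    # Naive Bayesian optimization loop (illustration only).
    # `surrogate` must implement fit(X, y); `acquisition` scores candidate
    # points given the fitted surrogate and the current best value.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(candidates), size=n_init, replace=False)
    X = candidates[idx]                         # initial design
    y = np.array([objective(x) for x in X])     # expensive evaluations
    for _ in range(n_iter):
        surrogate.fit(X, y)                     # cheap probabilistic approximation
        scores = acquisition(surrogate, candidates, best=y.min())
        x_next = candidates[np.argmax(scores)]  # optimize the acquisition, not the objective
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))
    return X[np.argmin(y)], y.min()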

The most common surrogate model in BO is the Gaussian process regressor, a Bayesian model with a Gaussian prior, and the most common acquisition function is the Expected Improvement (EI). The idea of EI is to select the next point to evaluate based on its expected improvement over the current best point.
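
When the surrogate’s prediction at a candidate point is Gaussian with mean mu and standard deviation sigma, EI has a closed form. Here is a small sketch for a minimization problem (illustration only, not GPopt’s code):

import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best, xi=0.01):
    # Closed-form EI for minimization: the improvement over the current best
    # value is max(best - f(x) - xi, 0), averaged under the Gaussian
    # predictive distribution; xi >= 0 encourages exploration.
    mu = np.asarray(mu, dtype=float)
    sigma = np.maximum(np.asarray(sigma, dtype=float), 1e-12)  # avoid division by zero
    improvement = best - mu - xi
    z = improvement / sigma
    return improvement * norm.cdf(z) + sigma * norm.pdf(z)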

Conformal Prediction is a framework that makes it possible, among other things, to equip supervised learning predictions with prediction intervals. For more details on Bayesian optimization and Conformal Prediction, see the following references:

In this post, I’ll show how to use conformalized surrogates for optimization, thanks to GPopt and nnetsauce. With this approach, any surrogate model can be used for optimization, and there is no longer any constraint on the choice of a prior (Gaussian, Laplace, etc.). The acquisition function is the lower confidence bound (LCB) of the conformalized surrogate model.
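
The principle can be sketched with any scikit-learn regressor: split the evaluated points into a training set and a calibration set, use the calibration residuals to build a prediction interval around the surrogate’s point predictions, and take the lower bound of that interval as the acquisition to minimize. The sketch below is only an illustration of the idea (with a 90% interval and a simple residual quantile); in the experiments that follow, this wrapping is handled by nnetsauce’s PredictionInterval and GPopt’s method="splitconformal".

import numpy as np
from sklearn.model_selection import train_test_split

def conformal_lcb(regressor, X, y, X_candidates, level=0.9, seed=123):
    # Split conformal prediction interval around a point forecast, used to
    # define a lower-confidence-bound acquisition (to be minimized).
    X_train, X_calib, y_train, y_calib = train_test_split(
        X, y, test_size=0.5, random_state=seed)
    regressor.fit(X_train, y_train)
    # calibration: a quantile of absolute residuals gives the half-width of
    # the interval (up to a finite-sample correction)
    abs_resid = np.abs(y_calib - regressor.predict(X_calib))
    q = np.quantile(abs_resid, level)
    return regressor.predict(X_candidates) - q  # lower bound of the interval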

A future post will show how to use conformalized surrogates for Machine Learning and Deep Learning hyperparameter tuning.

pip install GPopt nnetsauce
import GPopt as gp
import nnetsauce as ns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor
from sklearn.linear_model import RidgeCV, LassoCV
from sklearn.kernel_ridge import KernelRidge
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score
from scipy.optimize import minimize
from statsmodels.nonparametric.smoothers_lowess import lowess


# Six-Hump Camel Function (Objective function, to be minimized)
def six_hump_camel(x):
    """
    Six-Hump Camel Function:
    - Global minima located at:
      (0.0898, -0.7126),
      (-0.0898, 0.7126)
    - Function value at the minima: f(x) = -1.0316
    """
    x1 = x[0]
    x2 = x[1]
    term1 = (4 - 2.1 * x1**2 + (x1**4) / 3) * x1**2
    term2 = x1 * x2
    term3 = (-4 + 4 * x2**2) * x2**2
    return term1 + term2 + term3
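
A quick sanity check at one of the known global minimizers:

# should print a value close to -1.0316
print(six_hump_camel([0.0898, -0.7126]))
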
import matplotlib.pyplot as plt
import numpy as np
# Generate a grid of points in the input space
x = np.linspace(-3, 3, 100)
y = np.linspace(-2, 2, 100)
X, Y = np.meshgrid(x, y)

# Evaluate the objective function at each point in the grid
Z = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        Z[i, j] = six_hump_camel([X[i, j], Y[i, j]])

# Plot the contour map
plt.figure(figsize=(8, 6))
contour = plt.contourf(X, Y, Z, levels=50, cmap='viridis')
plt.colorbar(contour, label='Objective function value')
plt.title('Contour plot of the Six-Hump Camel function')
plt.xlabel('x1')
plt.ylabel('x2')
plt.show()


from sklearn.utils import all_estimators
from tqdm import tqdm

# Get all available scikit-learn estimators
estimators = all_estimators(type_filter='regressor')

results = []

# Loop through all regressors
for name, RegressorClass in tqdm(estimators):
    try:
        # Instantiate the regressor (you might need to handle potential exceptions or required parameters)
        regressor = RegressorClass()
        print(f"\n Successfully instantiated regressor: {name} ----------")
        # GPopt for Bayesian optimization
        gp_opt = gp.GPOpt(objective_func=six_hump_camel,
                          lower_bound = np.array([-3, -2]),
                          upper_bound = np.array([3, 2]),
                          acquisition="ucb",
                          method="splitconformal",
                          surrogate_obj=ns.PredictionInterval(regressor), # Any surrogate model can be used, thanks to nnetsauce
                          n_init=10,
                          n_iter=190,
                          seed=432)
        print(f"gp_opt.method: {gp_opt.method}")
        res = gp_opt.optimize(verbose=1, ucb_tol=1e-6)
        print(f"\n\n result: {res}")
        display(res.best_params)
        display(res.best_score)
        results.append((name, res))

    except Exception as e:
        print(f"Could not instantiate regressor {name}: {e}")

import pandas as pd

results_df = pd.DataFrame(columns=['Regressor', 'Best Params', 'Best Score'])

for name, res in results:
    best_params = res.best_params
    best_score = res.best_score
    results_df = pd.concat([results_df, pd.DataFrame({'Regressor': [name], 'Best Params': [best_params], 'Best Score': [best_score]})], ignore_index=True)

results_df.sort_values(by='Best Score', ascending=True, inplace=True)
results_df.reset_index(drop=True, inplace=True)

results_df.style.format({'Best Score': "{:.5f}"})

|    | Regressor | Best Params | Best Score |
|----|-----------|-------------|------------|
| 0  | BaggingRegressor | [ 0.09649658 -0.71691895] | -1.03133 |
| 1  | GaussianProcessRegressor | [ 0.09649658 -0.71691895] | -1.03133 |
| 2  | NuSVR | [ 0.09649658 -0.71691895] | -1.03133 |
| 3  | SVR | [ 0.09649658 -0.71691895] | -1.03133 |
| 4  | MLPRegressor | [-0.09155273 0.69482422] | -1.02905 |
| 5  | GradientBoostingRegressor | [ 0.04907227 -0.71142578] | -1.02514 |
| 6  | KNeighborsRegressor | [ 0.08203125 -0.6640625 ] | -1.01372 |
| 7  | ExtraTreeRegressor | [ 0.08203125 -0.6640625 ] | -1.01372 |
| 8  | RandomForestRegressor | [ 0.08203125 -0.6640625 ] | -1.01372 |
| 9  | DecisionTreeRegressor | [ 0.08203125 -0.6640625 ] | -1.01372 |
| 10 | HistGradientBoostingRegressor | [-0.00732422 -0.72167969] | -0.99277 |
| 11 | AdaBoostRegressor | [ 0.09375 -0.8125 ] | -0.93858 |
| 12 | ExtraTreesRegressor | [-0.05877686 -0.66418457] | -0.93331 |
| 13 | ElasticNet | [-0.06650758 -0.66453519] | -0.92451 |
| 14 | ARDRegression | [-0.06650758 -0.66453519] | -0.92451 |
| 15 | ElasticNetCV | [-0.06650758 -0.66453519] | -0.92451 |
| 16 | KernelRidge | [-0.06650758 -0.66453519] | -0.92451 |
| 17 | HuberRegressor | [-0.06650758 -0.66453519] | -0.92451 |
| 18 | Lars | [-0.06650758 -0.66453519] | -0.92451 |
| 19 | LarsCV | [-0.06650758 -0.66453519] | -0.92451 |
| 20 | LassoLars | [-0.06650758 -0.66453519] | -0.92451 |
| 21 | LassoLarsCV | [-0.06650758 -0.66453519] | -0.92451 |
| 22 | Lasso | [-0.06650758 -0.66453519] | -0.92451 |
| 23 | LassoCV | [-0.06650758 -0.66453519] | -0.92451 |
| 24 | LinearRegression | [-0.06650758 -0.66453519] | -0.92451 |
| 25 | LassoLarsIC | [-0.06650758 -0.66453519] | -0.92451 |
| 26 | LinearSVR | [-0.06650758 -0.66453519] | -0.92451 |
| 27 | OrthogonalMatchingPursuit | [-0.06650758 -0.66453519] | -0.92451 |
| 28 | OrthogonalMatchingPursuitCV | [-0.06650758 -0.66453519] | -0.92451 |
| 29 | PLSRegression | [-0.06650758 -0.66453519] | -0.92451 |
| 30 | DummyRegressor | [-0.06650758 -0.66453519] | -0.92451 |
| 31 | BayesianRidge | [-0.06650758 -0.66453519] | -0.92451 |
| 32 | QuantileRegressor | [-0.06650758 -0.66453519] | -0.92451 |
| 33 | PassiveAggressiveRegressor | [-0.06650758 -0.66453519] | -0.92451 |
| 34 | RadiusNeighborsRegressor | [-0.06650758 -0.66453519] | -0.92451 |
| 35 | RANSACRegressor | [-0.06650758 -0.66453519] | -0.92451 |
| 36 | Ridge | [-0.06650758 -0.66453519] | -0.92451 |
| 37 | RidgeCV | [-0.06650758 -0.66453519] | -0.92451 |
| 38 | SGDRegressor | [-0.06650758 -0.66453519] | -0.92451 |
| 39 | TheilSenRegressor | [-0.06650758 -0.66453519] | -0.92451 |
| 40 | TransformedTargetRegressor | [-0.06650758 -0.66453519] | -0.92451 |
| 41 | TweedieRegressor | [-0.06650758 -0.66453519] | -0.92451 |
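
The best surrogates recover the global minimum almost exactly (the true minimum is -1.0316); evaluating the objective at the best parameters found confirms the reported score:

# best point found by BaggingRegressor, GaussianProcessRegressor, NuSVR and SVR
print(six_hump_camel([0.09649658, -0.71691895]))  # approx. -1.0313
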
# Michalewicz Function
def michalewicz(x, m=10):
    """
    Michalewicz Function (steepness parameter m; evaluated here in n = 2 dimensions)
    """
    return -sum(np.sin(xi) * (np.sin((i + 1) * xi**2 / np.pi))**(2 * m) for i, xi in enumerate(x))
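
For m = 10 and n = 2, the global minimum of the Michalewicz function over [0, pi]^2 is about -1.8013, reached near (2.20, 1.57); a quick check:

# should print a value close to -1.80
print(michalewicz([2.20, 1.57]))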


import matplotlib.pyplot as plt
import numpy as np
# Generate a grid of points in the input space
x = np.linspace(0, 2, 100)        # x1 grid
y = np.linspace(np.pi, 2, 100)    # x2 grid, between 2 and pi (given in decreasing order)
X, Y = np.meshgrid(x, y)

# Evaluate the objective function at each point in the grid
Z = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        Z[i, j] = michalewicz([X[i, j], Y[i, j]])

# Plot the contour map
plt.figure(figsize=(8, 6))
contour = plt.contourf(X, Y, Z, levels=50, cmap='viridis')
plt.colorbar(contour, label='Objective function value')
plt.title('Contour plot of the Michalewicz function')
plt.xlabel('x1')
plt.ylabel('x2')
plt.show()


from sklearn.utils import all_estimators
from tqdm import tqdm

# Get all available scikit-learn estimators
estimators = all_estimators(type_filter='regressor')

results = []

# Loop through all regressors
for name, RegressorClass in tqdm(estimators):
    try:
        # Instantiate the regressor (you might need to handle potential exceptions or required parameters)
        regressor = RegressorClass()
        print(f"\n Successfully instantiated regressor: {name} ----------")
        # GPopt for Bayesian optimization
        gp_opt = gp.GPOpt(objective_func=michalewicz,
                          lower_bound = np.array([0, np.pi]),  # note: bounds of the 2nd coordinate are given in reverse order
                          upper_bound = np.array([2, 2]),      # (the results below fall in x1 in [0, 2], x2 in [2, pi])
                          acquisition="ucb",
                          method="splitconformal",
                          surrogate_obj=ns.PredictionInterval(regressor), # Any surrogate model can be used, thanks to nnetsauce
                          n_init=10,
                          n_iter=190,
                          seed=432)
        print(f"gp_opt.method: {gp_opt.method}")
        res = gp_opt.optimize(verbose=1, ucb_tol=1e-6)
        print(f"\n\n result: {res}")
        display(res.best_params)
        display(res.best_score)
        results.append((name, res))

    except Exception as e:
        print(f"Could not instantiate regressor {name}: {e}")

import pandas as pd

results_df = pd.DataFrame(columns=['Regressor', 'Best Params', 'Best Score'])

for name, res in results:
    best_params = res.best_params
    best_score = res.best_score
    results_df = pd.concat([results_df, pd.DataFrame({'Regressor': [name], 'Best Params': [best_params], 'Best Score': [best_score]})], ignore_index=True)

results_df.sort_values(by='Best Score', ascending=True, inplace=True)
results_df.reset_index(drop=True, inplace=True)

results_df.style.format({'Best Score': "{:.5f}"})

|    | Regressor | Best Params | Best Score |
|----|-----------|-------------|------------|
| 0  | BaggingRegressor | [1.9989624 2.71631734] | -0.77895 |
| 1  | GradientBoostingRegressor | [1.9989624 2.71631734] | -0.77895 |
| 2  | GaussianProcessRegressor | [1.9989624 2.71631734] | -0.77895 |
| 3  | AdaBoostRegressor | [1.99511719 2.70736381] | -0.76882 |
| 4  | MLPRegressor | [1.99978638 2.68494514] | -0.74841 |
| 5  | RandomForestRegressor | [1.99978638 2.68494514] | -0.74841 |
| 6  | ExtraTreesRegressor | [1.97668457 2.67872644] | -0.67143 |
| 7  | ExtraTreeRegressor | [1.9453125 2.68227998] | -0.60804 |
| 8  | HuberRegressor | [1.93724655 2.67858092] | -0.58092 |
| 9  | KNeighborsRegressor | [1.93724655 2.67858092] | -0.58092 |
| 10 | KernelRidge | [1.93724655 2.67858092] | -0.58092 |
| 11 | ElasticNetCV | [1.93724655 2.67858092] | -0.58092 |
| 12 | LarsCV | [1.93724655 2.67858092] | -0.58092 |
| 13 | LassoCV | [1.93724655 2.67858092] | -0.58092 |
| 14 | Lars | [1.93724655 2.67858092] | -0.58092 |
| 15 | ARDRegression | [1.93724655 2.67858092] | -0.58092 |
| 16 | OrthogonalMatchingPursuitCV | [1.93724655 2.67858092] | -0.58092 |
| 17 | PLSRegression | [1.93724655 2.67858092] | -0.58092 |
| 18 | NuSVR | [1.93724655 2.67858092] | -0.58092 |
| 19 | OrthogonalMatchingPursuit | [1.93724655 2.67858092] | -0.58092 |
| 20 | LinearRegression | [1.93724655 2.67858092] | -0.58092 |
| 21 | LassoLarsIC | [1.93724655 2.67858092] | -0.58092 |
| 22 | LinearSVR | [1.93724655 2.67858092] | -0.58092 |
| 23 | LassoLarsCV | [1.93724655 2.67858092] | -0.58092 |
| 24 | PassiveAggressiveRegressor | [1.93724655 2.67858092] | -0.58092 |
| 25 | QuantileRegressor | [1.93724655 2.67858092] | -0.58092 |
| 26 | SGDRegressor | [1.93724655 2.67858092] | -0.58092 |
| 27 | RidgeCV | [1.93724655 2.67858092] | -0.58092 |
| 28 | Ridge | [1.93724655 2.67858092] | -0.58092 |
| 29 | RadiusNeighborsRegressor | [1.93724655 2.67858092] | -0.58092 |
| 30 | RANSACRegressor | [1.93724655 2.67858092] | -0.58092 |
| 31 | BayesianRidge | [1.93724655 2.67858092] | -0.58092 |
| 32 | TweedieRegressor | [1.93724655 2.67858092] | -0.58092 |
| 33 | TransformedTargetRegressor | [1.93724655 2.67858092] | -0.58092 |
| 34 | TheilSenRegressor | [1.93724655 2.67858092] | -0.58092 |
| 35 | DecisionTreeRegressor | [1.8515625 2.73579214] | -0.47178 |
| 36 | SVR | [0.76176453 2.71127445] | -0.41275 |
| 37 | DummyRegressor | [0.75 2.71349541] | -0.41257 |
| 38 | ElasticNet | [0.75 2.71349541] | -0.41257 |
| 39 | HistGradientBoostingRegressor | [0.75 2.71349541] | -0.41257 |
| 40 | Lasso | [0.75 2.71349541] | -0.41257 |
| 41 | LassoLars | [0.75 2.71349541] | -0.41257 |
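
Note that the scores here are limited by the search box itself: with x1 restricted to [0, 2] and x2 to values between 2 and pi, the minimizer near (2.20, 1.57) is not reachable, so the best attainable value is around -0.78 rather than -1.8013. This can be checked directly:

# best point found within the search box vs. the unconstrained minimizer
print(michalewicz([1.9989624, 2.71631734]))  # approx. -0.779 (inside the box)
print(michalewicz([2.20, 1.57]))             # approx. -1.80 (outside the box)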

