Due to the way it mixes several – relatively – simple concepts, Bayesian optimization (BO) is one of the **most elegant** mathematical tool I’ve encountered so far. In this post, I introduce `GPopt`

, a tool for BO that I implemented in Python (no technical docs yet, but coming soon). The examples of `GPopt`

’s use showcased here are based on Gaussian Processes (GP) and Expected Improvement (EI): **what does that mean?**

GPs are Bayesian statistical/machine learning (ML) models which create a distribution on functions, and especially on black-box functions. If we let \(f\) be the black-box – and expensive-to-evaluate – function whose minimum is searched, a GP is firstly adjusted (in a supervised learning way) to a small set of points at which \(f\) is evaluated. This small set of points is the *initial design*. Then the GP, thanks to its probabilistic nature, will **exploit** its knowledge of previous points at which \(f\) has already been evaluated, to **explore** new points: potential minimas maximizing an EI criterion. It’s a reinforcement learning question!

For more details on Bayesian optimization applied to hyperparameters calibration in ML, you can read Chapter 6 of this document. In this post, a Branin (2D) and a Hartmann (3D) functions will be used as examples of objective functions \(f\), and Matérn 5/2 is the GP’s covariance.

**Installing and importing the packages**:

```
!pip install GPopt
```

```
!pip install matplotlib==3.1.3
```

```
import GPopt as gp
import time
import sys
import numpy as np
import matplotlib.pyplot as plt
from os import chdir
from scipy.optimize import minimize
```

The objective function to be minimized:

```
# branin function
def branin(x):
x1 = x[0]
x2 = x[1]
term1 = (x2 - (5.1*x1**2)/(4*np.pi**2) + (5*x1)/np.pi - 6)**2
term2 = 10*(1-1/(8*np.pi))*np.cos(x1)
return (term1 + term2 + 10)
```

```
# A local minimum for verification
print("\n")
res = minimize(branin, x0=[0, 0], method='Nelder-Mead', tol=1e-6)
print(res.x)
print(branin(res.x))
```

```
[3.14159271 2.27500017]
0.397887357729795
```

Firstly, I **demonstrate the convergence of the optimizer** towards a minimum (and show that the optimizer can pick up the procedure where it left):

```
# n_init is the number of points in the initial design
# n_iter is the number of iterations of the optimizer
gp_opt = gp.GPOpt(objective_func=branin,
lower_bound = np.array([-5, 0]),
upper_bound = np.array([10, 15]),
n_init=10, n_iter=10)
gp_opt.optimize(verbose=1)
```

Plotting the **changes in expected improvement as a function of the number of iterations**.

```
print(gp_opt.x_min) # current minimum
print(gp_opt.y_min) # current minimum
plt.plot(np.diff(gp_opt.max_ei))
```

```
[9.31152344 1.68457031]
0.9445903336427559
```

Adding **more iterations** to the optimizer:

```
gp_opt.optimize(verbose=1, n_more_iter=10)
```

```
...Done.
Optimization loop...
10/10 [██████████████████████████████] - 2s 186ms/step
(array([3.22692871, 2.63122559]), 0.6107733232129569)
```

Plotting the **changes in expected improvement as a function of the number of iterations** (again).

```
print(gp_opt.x_min) # current minimum
print(gp_opt.y_min) # current minimum
plt.plot(np.diff(gp_opt.max_ei))
```

```
[3.22692871 2.63122559]
0.6107733232129569
```

Adding more iterations to the optimizer (again):

```
gp_opt.optimize(verbose=1, n_more_iter=80)
```

Plotting the **changes in expected improvement as a function of the number of iterations** (again).

```
print(gp_opt.x_min) # current minimum
print(gp_opt.y_min) # current minimum
plt.plot(np.diff(gp_opt.max_ei))
```

```
[9.44061279 2.48199463]
0.3991320518189241
```

The 3 previous graphs suggest the possibility of stopping the optimizer *earlier*, when the **algorithm is not improving on previous points’ results anymore**:

```
# # early stopping
# abs_tol is the parameter that controls early stopping
gp_opt2 = gp.GPOpt(objective_func=branin,
lower_bound = np.array([-5, 0]),
upper_bound = np.array([10, 15]),
n_init=10, n_iter=190)
gp_opt2.optimize(verbose=2, abs_tol=1e-4)
print("\n")
```

We can observe that only 58 iterations are necessary when `abs_tol = 1e-4`

```
print(gp_opt2.n_iter)
print(gp_opt2.x_min)
print(gp_opt2.y_min)
```

```
58
[9.44061279 2.48199463]
0.3991320518189241
```

Illustrating convergence:

```
plt.plot(gp_opt2.max_ei)
```

We notice that in this example, GPopt falls into a local minimum but is very close to the previous minimum found with gradient-free optimizer (Nelder-Mead). The **opposite situation can occur too**:

```
# [0, 1]^3
def hartmann3(xx):
alpha = np.array([1.0, 1.2, 3.0, 3.2])
A = np.array([3.0, 10, 30,
0.1, 10, 35,
3.0, 10, 30,
0.1, 10, 35]).reshape(4, 3)
P = 1e-4 * np.array([3689, 1170, 2673,
4699, 4387, 7470,
1091, 8732, 5547,
381, 5743, 8828]).reshape(4, 3)
xxmat = np.tile(xx,4).reshape(4, 3)
inner = np.sum(A*(xxmat-P)**2, axis = 1)
outer = np.sum(alpha * np.exp(-inner))
return(-outer)
```

```
# Fails, but may work with multiple restarts
print("\n")
res = minimize(hartmann3, x0=[0, 0, 0], method='Nelder-Mead', tol=1e-6)
print(res.x)
print(hartmann3(res.x))
```

```
[0.36872308 0.11756145 0.26757363]
-1.00081686355956
```

```
# hartmann 3D
gp_opt4 = gp.GPOpt(objective_func=hartmann3,
lower_bound = np.repeat(0, 3),
upper_bound = np.repeat(1, 3),
n_init=20, n_iter=280)
gp_opt4.optimize(verbose=2, abs_tol=1e-4)
```

```
print(gp_opt4.n_iter)
print(gp_opt4.x_min)
print(gp_opt4.y_min)
print("\n")
```

```
51
[0.07220459 0.55792236 0.85662842]
-3.8600590626769904
```

The question is, **in the case of BO applied to ML cross-validation, do we really want to find the global minimum of the objective** function?