 ## AdaOpt (a probabilistic classifier based on a mix of multivariable optimization and nearest neighbors) for R

22 May 2020

Last week on this blog, I presented `AdaOpt` for Python on a handwritten digits classification task. `AdaOpt` is a novel probabilistic classifier, based on a mix of multivariable optimization and a nearest neighbors algorithm. It’s still very new, and only time will allow us to fully appreciate all of its features.
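To give a feel for the nearest neighbors ingredient only, here is a minimal base R sketch that turns the labels of the `k` closest training points into class probabilities. It is not `AdaOpt` itself and leaves out the optimization part entirely; the function name `knn_proba` and the leave-one-out usage are assumptions made for illustration.

```r
# Toy k-nearest-neighbors class probabilities in base R.
# Illustrative only: this is NOT the AdaOpt algorithm.
knn_proba <- function(X_train, y_train, x_new, k = 3L) {
  # Euclidean distance from x_new to every training row
  diffs <- sweep(X_train, 2, x_new)  # subtract x_new from each row
  dists <- sqrt(rowSums(diffs^2))
  # labels of the k closest training points
  nearest <- y_train[order(dists)[seq_len(k)]]
  # share of each class among those neighbors
  classes <- sort(unique(y_train))
  probs <- vapply(classes, function(cl) mean(nearest == cl), numeric(1))
  names(probs) <- classes
  probs
}

X <- as.matrix(iris[, 1:4])
y <- as.integer(iris[, 5]) - 1L

# classify the first observation using all the other ones
probs <- knn_proba(X[-1, ], y[-1], X[1, ], k = 3L)
print(probs)
```

The returned vector has one entry per class and sums to 1, which is the sense in which such a classifier is "probabilistic".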

The tool is fast thanks to Cython and to the ubiquitous (and mighty) numpy, which both help bring C/C++-like performance to Python. `AdaOpt` also has a few tricks that make training faster on bigger datasets. More details about the algorithm can be found in this (short) paper. `AdaOpt` is now available to R users; I used reticulate to port it (as I did for nnetsauce). R documentation for the package can be found in this repo.

Here is an example of use:

## 1 - Install packages

`AdaOpt`’s development code is available on GitHub, and the package can be installed with `devtools` from the R console:

```r
library(devtools)
devtools::install_github("thierrymoudiki/mlsauce/R-package/")
library(mlsauce)
```

The `datasets` package is also loaded; it provides the data used to train the model:

```r
library(datasets)
```

## 2 - Dataset for classification

The `iris` dataset (from package `datasets`) is used for this simple demo:

```r
# import iris dataset
X <- as.matrix(iris[, 1:4])
y <- as.integer(iris[, 5]) - 1L # the classifier will accept numbers starting at 0
n <- dim(X)[1] # number of observations
p <- dim(X)[2] # number of features

# create training set and test set
set.seed(21341)
# indices are drawn with replacement: training rows can repeat,
# and every row that is never drawn goes to the test set
train_index <- sample(x = 1:n, size = floor(0.8*n),
                      replace = TRUE)
test_index <- -train_index

X_train <- as.matrix(iris[train_index, 1:4])
y_train <- as.integer(iris[train_index, 5]) - 1L # the classifier will accept numbers starting at 0
X_test <- as.matrix(iris[test_index, 1:4])
y_test <- as.integer(iris[test_index, 5]) - 1L # the classifier will accept numbers starting at 0
```
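Because the training indices are drawn with `replace = TRUE`, the training set contains repeated rows and the test set ends up larger than 20% of the data. The base R sketch below rebuilds the same indices and counts the rows on each side. The variable names `n_train_rows`, `n_unique_train`, `n_test_rows` and `train_index_nr` are introduced here for illustration, and the split without replacement is a hypothetical alternative, not what this post uses.

```r
# Rebuild the split used in this post and inspect its sizes (base R only).
X <- as.matrix(iris[, 1:4])
n <- nrow(X)

set.seed(21341)
train_index <- sample(x = 1:n, size = floor(0.8 * n), replace = TRUE)

n_train_rows   <- length(train_index)          # training rows, duplicates included
n_unique_train <- length(unique(train_index))  # distinct observations used for training
n_test_rows    <- nrow(X[-train_index, , drop = FALSE])  # rows that were never drawn

cat("training rows:", n_train_rows, "\n")
cat("distinct training observations:", n_unique_train, "\n")
cat("test rows:", n_test_rows, "\n")

# Hypothetical alternative: an 80/20 split without replacement
set.seed(21341)
train_index_nr <- sample(x = 1:n, size = floor(0.8 * n), replace = FALSE)
cat("test rows without replacement:", n - length(train_index_nr), "\n")
```

In both cases the negative index keeps training and test rows disjoint; only the sizes of the two sets differ.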

## 3 - Model fitting and score on test set

Now, we create an `AdaOpt` object and print its attributes:

```r
# create AdaOpt object with default parameters
obj <- mlsauce::AdaOpt()

# print object attributes
print(obj$get_params())
```
```
## $batch_size
## [1] 100
##
## $cache
## [1] TRUE
##
## $eta
## [1] 0.01
##
## $gamma
## [1] 0.01
##
## $k
## [1] 3
##
## $learning_rate
## [1] 0.3
##
## $n_clusters
## [1] 0
##
## $n_iterations
## [1] 50
##
## $reg_alpha
## [1] 0.5
##
## $reg_lambda
## [1] 0.1
##
## $row_sample
## [1] 1
##
## $seed
## [1] 123
##
## $tolerance
## [1] 0
##
## $type_dist
## [1] "euclidean-f"
```

Model fitting:

```r
# fit AdaOpt to iris dataset
obj$fit(X_train, y_train)
```
```
## AdaOpt(batch_size=100, cache=True, eta=0.01, gamma=0.01, k=3, learning_rate=0.3,
##        n_clusters=0.0, n_iterations=50, reg_alpha=0.5, reg_lambda=0.1,
##        row_sample=1.0, seed=123, tolerance=0.0, type_dist='euclidean-f')
```

Obtain test set accuracy:

```r
# accuracy on test set
print(obj$score(X_test, y_test))
```
```
## [1] 0.9701493
```
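For readers who want to see what an accuracy score measures, here is a small base R sketch that computes it by hand as the share of correct predictions, together with a confusion matrix. The helper `accuracy` and the toy label vectors are assumptions made for illustration; they are not output from the fitted model. The last line shows that the score printed above would be consistent with, for example, 65 correct predictions out of 67 test rows (the post does not state the test set size).

```r
# Accuracy as the share of correct predictions (toy labels, not model output).
accuracy <- function(y_true, y_pred) {
  stopifnot(length(y_true) == length(y_pred))
  mean(y_true == y_pred)
}

y_true <- c(0L, 0L, 1L, 1L, 2L, 2L)
y_pred <- c(0L, 0L, 1L, 2L, 2L, 2L)  # one mistake out of six

acc <- accuracy(y_true, y_pred)
print(acc)

# confusion matrix: rows are true classes, columns are predicted classes
conf_mat <- table(truth = y_true, predicted = y_pred)
print(conf_mat)

# one ratio that rounds to the score reported in this post
print(round(65 / 67, 7))
```

The confusion matrix is often more informative than a single number, since it shows which classes get mixed up.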

Lastly: no, this package is not going to end up on CRAN. None of my packages will, from now on. If you’re planning to submit your package there, well, there’s more to it than being proud of having it accepted. If I may: think about it longer. In particular, read this document about license choices, and know your rights regarding your intellectual property…

