
AdaOpt (a probabilistic classifier based on a mix of multivariable optimization and nearest neighbors) for R


22 May 2020


Last week on this blog, I presented AdaOpt for Python on a handwritten digits classification task. AdaOpt is a novel probabilistic classifier based on a mix of multivariable optimization and a nearest neighbors algorithm. It’s still very new, and it will take time to fully appreciate all of its features.

The tool is fast thanks to Cython and to the ubiquitous (and mighty) numpy, which both help bring C/C++-like performance to Python. AdaOpt also includes a few tricks that make training faster on bigger datasets. More details about the algorithm can be found in this (short) paper.
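To give a feel for the nearest neighbors ingredient only, here is a minimal base R sketch that turns the votes of the k closest training points into class probabilities. It is not the AdaOpt algorithm (the optimization part described in the paper is left out entirely), and `knn_proba` is a hypothetical helper written for this illustration, not a function of the package.

```r
# Hypothetical helper: class probabilities for one query point, estimated as
# the share of each class among its k nearest training points (Euclidean).
knn_proba <- function(X_train, y_train, x_new, k = 3L) {
  # Euclidean distance from x_new to every training row
  d <- sqrt(rowSums(sweep(X_train, 2, x_new)^2))
  nearest <- order(d)[seq_len(k)]
  # Vote shares over all classes seen in training
  classes <- sort(unique(y_train))
  votes <- table(factor(y_train[nearest], levels = classes))
  as.numeric(votes) / k
}

X <- as.matrix(iris[, 1:4])
y <- as.integer(iris[, 5]) - 1L  # classes coded 0, 1, 2

# Probabilities for the first flower, using all other rows as training data
p <- knn_proba(X[-1, ], y[-1], X[1, ], k = 3L)
print(p)
```

The returned vector has one entry per class and sums to 1, which is the general shape of output a probabilistic classifier produces.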


AdaOpt is now available to R users; I used reticulate to port it (as I did for nnetsauce). R documentation for the package can be found in this repo.

Here is an example of use:

1 - Install packages

AdaOpt’s development code is available on GitHub, and the package can be installed using devtools from the R console:

library(devtools)
devtools::install_github("thierrymoudiki/mlsauce/R-package/")
library(mlsauce)

The package datasets is also used for training the model:

library(datasets)

2 - Dataset for classification

The iris dataset (from package datasets) is used for this simple demo:

# import iris dataset

X <- as.matrix(iris[, 1:4])
y <- as.integer(iris[, 5]) - 1L # the classifier will accept numbers starting at 0
n <- dim(X)[1]
p <- dim(X)[2]

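# create training set and test set
# note: sampling with replace = TRUE draws some rows more than once, so the
# training set contains duplicates and the test set is every row never drawn
set.seed(21341)
train_index <- sample(x = 1:n, size = floor(0.8*n),
                      replace = TRUE)
test_index <- -train_index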

X_train <- as.matrix(iris[train_index, 1:4])
y_train <- as.integer(iris[train_index, 5]) - 1L # the classifier will accept numbers starting at 0
X_test <- as.matrix(iris[test_index, 1:4])
y_test <- as.integer(iris[test_index, 5]) - 1L # the classifier will accept numbers starting at 0
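Because the indices are drawn with `replace = TRUE`, it can be worth checking what the split actually looks like. The sketch below rebuilds the same indices in base R and counts the rows on each side; `test_rows`, `train_index_nr` and `test_rows_nr` are names introduced here for illustration, and the split without replacement at the end is an alternative shown for comparison, not the setup used in this post.

```r
# Rebuild the indices exactly as in the post (base R only).
n <- nrow(iris)
set.seed(21341)
train_index <- sample(x = 1:n, size = floor(0.8 * n), replace = TRUE)

# Rows that were never drawn form the test set; this is what
# iris[-train_index, ] selects.
test_rows <- setdiff(1:n, train_index)

length(train_index)          # number of training rows, duplicates included
length(unique(train_index))  # number of distinct training rows
length(test_rows)            # number of test rows
nrow(iris[-train_index, ])   # same count, obtained via negative indexing

# Alternative (an assumption, not the post's setup): a split without
# duplicated training rows.
train_index_nr <- sample(x = 1:n, size = floor(0.8 * n), replace = FALSE)
test_rows_nr <- setdiff(1:n, train_index_nr)
```

With replacement, the test set is usually larger than 20% of the data, since every row that happens never to be drawn ends up there.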

3 - Model fitting and score on test set

Now, we create an AdaOpt object and print its attributes:

# create AdaOpt object with default parameters
obj <- mlsauce::AdaOpt()

# print object attributes
print(obj$get_params())
## $batch_size
## [1] 100
## 
## $cache
## [1] TRUE
## 
## $eta
## [1] 0.01
## 
## $gamma
## [1] 0.01
## 
## $k
## [1] 3
## 
## $learning_rate
## [1] 0.3
## 
## $n_clusters
## [1] 0
## 
## $n_iterations
## [1] 50
## 
## $reg_alpha
## [1] 0.5
## 
## $reg_lambda
## [1] 0.1
## 
## $row_sample
## [1] 1
## 
## $seed
## [1] 123
## 
## $tolerance
## [1] 0
## 
## $type_dist
## [1] "euclidean-f"

Model fitting:

# fit AdaOpt to iris dataset
obj$fit(X_train, y_train)
## AdaOpt(batch_size=100, cache=True, eta=0.01, gamma=0.01, k=3, learning_rate=0.3,
##        n_clusters=0.0, n_iterations=50, reg_alpha=0.5, reg_lambda=0.1,
##        row_sample=1.0, seed=123, tolerance=0.0, type_dist='euclidean-f')

Obtain test set accuracy:

# accuracy on test set 
print(obj$score(X_test, y_test))
## [1] 0.9701493
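Accuracy is simply the fraction of test observations whose predicted class matches the true class. The sketch below computes it by hand on made-up labels; `accuracy` is a hypothetical helper, and the toy vectors are not model output. In practice the predictions would presumably come from the fitted object (for instance a `predict` method, which is an assumption here and not shown in this post).

```r
# Hypothetical helper: share of predictions that match the true labels.
accuracy <- function(y_true, y_pred) {
  stopifnot(length(y_true) == length(y_pred))
  mean(y_true == y_pred)
}

# Toy labels (made up for illustration), coded from 0 as in the post.
y_true <- c(0L, 0L, 1L, 1L, 2L, 2L, 2L, 1L)
y_pred <- c(0L, 0L, 1L, 2L, 2L, 2L, 2L, 1L)

print(accuracy(y_true, y_pred))  # 7 of 8 labels match

# Confusion table: rows are true classes, columns are predicted classes.
print(table(truth = y_true, predicted = y_pred))
```

The confusion table is often more informative than the single score, since it shows which classes get mistaken for which.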

Lastly: no, this package is not going to end up on CRAN. None of my packages will, from now on. If you’re planning to submit your package there, well, there’s more to it than being proud of having it accepted. If I may: think about it longer. In particular, read this document about license choices, and know your rights regarding your intellectual property…

Note: I am currently looking for a gig. You can hire me on Malt or send me an email: thierry dot moudiki at pm dot me. I can do descriptive statistics, data preparation, feature engineering, model calibration, training and validation, and model outputs’ interpretation. I am fluent in Python, R, SQL, Microsoft Excel, Visual Basic (among others) and French. My résumé? Here!


