First, give the Techtonique web app a try: it is a tool designed to help you make informed, data-driven decisions using Mathematics, Statistics, Machine Learning, and Data Visualization.
In this post, I illustrate classification using linear regression, as implemented in the Python/R package nnetsauce, and more precisely in nnetsauce's MultitaskClassifier. If you're not interested in reading about the model description, you can jump directly to the second section, "Two examples in Python". In addition, the source code is relatively self-explanatory.
Model description
Chapter 4 of Elements of Statistical Learning (ESL), in Section 4.2, "Linear Regression of an Indicator Matrix", describes classification using linear regression pretty well. Let \(K \in \mathbb{N}\) be the number of classes and \(y \in \mathbb{N}^n\), with values in \(\lbrace 1, \ldots, K \rbrace\), be the variable to be explained. An indicator response matrix \(\textbf{Y} \in \mathbb{N}^{n \times K}\), containing only 0's and 1's, can be obtained from \(y\): each row of \(\textbf{Y}\) contains a single 1, in the column corresponding to the class the example belongs to, and 0's elsewhere.
Now, let \(\textbf{X} \in \mathbb{R}^{n \times p}\) be the set of explanatory variables for \(y\) and \(\textbf{Y}\), with examples in rows and characteristics in columns. ESL fits \(K\) least squares regressions on \(\textbf{X}\), one for each column of \(\textbf{Y}\). The regressions' predicted values can be interpreted as raw estimates of probabilities, because the least squares solution is a conditional expectation: for \(G\), a random variable describing the class, we have

\[\mathbb{E} \left[ \mathbb{1}_{ G = k } \vert X = x \right] = \mathbb{P} \left[ G = k \vert X = x \right]\]
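As an aside, here is a minimal sketch of this ESL construction (illustrative only, not nnetsauce code), using scikit-learn's LinearRegression on the iris data; variable names are mine:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression

X, y = load_iris(return_X_y=True)
K = len(np.unique(y))

# Indicator response matrix Y: one column per class, a single 1 per row
Y = np.zeros((len(y), K))
Y[np.arange(len(y)), y] = 1

# K least squares regressions, one for each column of Y
reg = LinearRegression().fit(X, Y)
raw_scores = reg.predict(X)          # raw estimates of P(G = k | X = x)
y_pred = raw_scores.argmax(axis=1)   # predicted class = column with highest score
print("training accuracy:", (y_pred == y).mean())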
The differences between nnetsauce's MultitaskClassifier and the model described in ESL are:
- Any model possessing `fit` and `predict` methods can be used in lieu of a linear regression of \(\textbf{Y}\) on \(\textbf{X}\).
- The set of covariates includes the original covariates, \(\textbf{X}\), plus nonlinear transformations of \(\textbf{X}\), \(h(\textbf{X})\), as done in Quasi-Randomized Networks. Having \(h(\textbf{X})\) as additional explanatory variables enhances the model's flexibility; the model is no longer linear.
- If, for each \(k \in \lbrace 1, \ldots, K \rbrace\), \(\hat{f}_k(x)\) is the regression's predicted value for class \(k\) and an observation characterized by \(x\), nnetsauce's MultitaskClassifier obtains the probability that this observation belongs to class \(k\) as:

\[\hat{p}_k(x) = \frac{expit \left( \hat{f}_k(x) \right)}{\sum_{l=1}^K expit \left( \hat{f}_l(x) \right)}\]
Where \(expit(x) := \frac{1}{1 + exp(-x)}\). \(x \mapsto expit(x)\) is strictly increasing, hence it preserves the ordering of the linear regression's predictions. \(x \mapsto expit(x)\) is also bounded in \([0, 1]\), which helps in avoiding overflows. I divide \(expit \left( \hat{f}_k(x) \right)\) by \(\sum_{l=1}^K expit \left( \hat{f}_l(x) \right)\) so that the probabilities add up to 1.
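As a minimal numerical sketch of this normalization (illustrative only, not nnetsauce's internal code), with hypothetical raw predictions for \(K = 3\) classes:

import numpy as np
from scipy.special import expit  # expit(x) = 1 / (1 + exp(-x))

# hypothetical raw least squares predictions f_k(x) for one observation, K = 3
raw_preds = np.array([1.2, -0.3, 0.4])

probs = expit(raw_preds) / expit(raw_preds).sum()  # probabilities summing to 1
print(probs, probs.sum(), probs.argmax())          # argmax gives the predicted class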
And to finish, the class predicted for an example characterized by \(x\) is:

\[argmax_{k \in \lbrace 1, \ldots, K \rbrace} \hat{p}_k(x)\]

Two examples in Python
Currently, installing nnetsauce from PyPI doesn't work (I'm working on fixing it). However, you can install nnetsauce from GitHub as follows:
pip install git+https://github.com/Techtonique/nnetsauce.git
Import the packages required for the 2 examples.
import nnetsauce as ns
import numpy as np
from sklearn.datasets import load_wine, load_iris
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
from time import time
1. Classification of the iris dataset:
dataset = load_iris()
Z = dataset.data
t = dataset.target
# training set (80%) and test set (20%)
X_train, X_test, y_train, y_test = train_test_split(Z, t, test_size=0.2,
random_state=143)
# Linear Regression is used here
regr3 = LinearRegression()
# `n_hidden_features` makes the model nonlinear
# `n_clusters` takes into account heterogeneity
fit_obj3 = ns.MultitaskClassifier(regr3, n_hidden_features=5,
n_clusters=2, type_clust="gmm")
# Fit the model
start = time()
fit_obj3.fit(X_train, y_train)
print(f"Elapsed {time() - start}")
# Classification report
start = time()
preds = fit_obj3.predict(X_test)
print(f"Elapsed {time() - start}")
print(metrics.classification_report(y_test, preds))
Elapsed 0.021012067794799805
Elapsed 0.0010943412780761719
precision recall f1-score support
0 1.00 1.00 1.00 12
1 1.00 1.00 1.00 5
2 1.00 1.00 1.00 13
accuracy 1.00 30
macro avg 1.00 1.00 1.00 30
weighted avg 1.00 1.00 1.00 30
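If you also want the class probabilities discussed in the first section, a scikit-learn-style predict_proba method should be available on the fitted object (an assumption on my part; check your nnetsauce version):

# Assumes a scikit-learn-style `predict_proba` method is available
probs = fit_obj3.predict_proba(X_test)
print(probs.shape)         # (n_test, 3): one column per class
print(probs[:5].round(3))  # each row sums to 1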
2. Classification of the wine dataset:
dataset = load_wine()
Z = dataset.data
t = dataset.target
# training set (80%) and test set (20%)
X_train, X_test, y_train, y_test = train_test_split(Z, t, test_size=0.2,
random_state=143)
# Linear Regression is used here
regr4 = LinearRegression()
# `n_hidden_features` makes the model nonlinear
# `n_clusters` takes into account heterogeneity
fit_obj4 = ns.MultitaskClassifier(regr4, n_hidden_features=5,
n_clusters=2, type_clust="gmm")
# Fit the model
start = time()
fit_obj4.fit(X_train, y_train)
print(f"Elapsed {time() - start}")
# Classification report
start = time()
preds = fit_obj4.predict(X_test)
print(f"Elapsed {time() - start}")
print(metrics.classification_report(y_test, preds))
Elapsed 0.019229650497436523
Elapsed 0.001451253890991211
precision recall f1-score support
0 1.00 1.00 1.00 16
1 1.00 1.00 1.00 11
2 1.00 1.00 1.00 9
accuracy 1.00 36
macro avg 1.00 1.00 1.00 36
weighted avg 1.00 1.00 1.00 36
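Since MultitaskClassifier follows the scikit-learn estimator interface, it should also plug into scikit-learn's model selection utilities. Here is a hedged sketch with 5-fold cross-validation on the wine dataset (it assumes the estimator can be cloned like any scikit-learn estimator):

from sklearn.model_selection import cross_val_score

# 5-fold cross-validation on the wine data (Z, t defined above); assumes
# scikit-learn can clone the nnetsauce estimator like any other estimator
clf = ns.MultitaskClassifier(LinearRegression(), n_hidden_features=5,
                             n_clusters=2, type_clust="gmm")
scores = cross_val_score(clf, Z, t, cv=5, scoring="accuracy")
print(scores.mean(), scores.std())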