In this post, I illustrate *classification using linear regression*, as implemented in the Python/R package `nnetsauce`, and more precisely in `nnetsauce`’s `MultitaskClassifier`. If you’re not interested in reading the model description, you can jump directly to the second section, “Two examples in Python”. In addition, the source code is relatively self-explanatory.

# Model description

Chapter 4 of *The Elements of Statistical Learning* (ESL), in section *4.2 Linear Regression of an Indicator Matrix*, describes *classification using linear regression* pretty well. Let \(K \in \mathbb{N}\) be the number of classes and \(y \in \mathbb{N}^n\), with values in \(\lbrace 1, \ldots, K \rbrace\), be the variable to be explained. An **indicator response** matrix \(\textbf{Y} \in \mathbb{N}^{n \times K }\), containing only 0’s and 1’s, can be obtained from \(y\). Each row of \(\textbf{Y}\) contains a single 1, in the column corresponding to the class the example belongs to, and 0’s elsewhere.
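As a small illustration of the indicator response matrix, here is a minimal NumPy sketch. The helper name `indicator_matrix` and the toy labels are mine, chosen for illustration only; they are not part of `nnetsauce`.

```python
import numpy as np


def indicator_matrix(y, n_classes):
    """Build the n x K indicator response matrix from labels in {1, ..., K}."""
    y = np.asarray(y)
    Y = np.zeros((y.shape[0], n_classes), dtype=int)
    # labels start at 1, column indices start at 0
    Y[np.arange(y.shape[0]), y - 1] = 1
    return Y


y = np.array([1, 3, 2, 3])  # toy labels (hypothetical)
Y = indicator_matrix(y, n_classes=3)
print(Y)
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]
#  [0 0 1]]
```

Each row sums to 1, since every example belongs to exactly one class.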

Now, let \(\textbf{X} \in \mathbb{R}^{n \times p }\) be the set of explanatory variables for \(y\) and \(\textbf{Y}\), with examples in rows and characteristics in columns. ESL fits \(K\) least squares models on \(\textbf{X}\), one for each column of \(\textbf{Y}\). The regression’s predicted values can be interpreted as *raw* estimates of probabilities, because the least squares solution estimates a conditional expectation. For \(G\), a random variable describing the class, and \(Y_k\), the indicator of class \(k\) (the \(k\)-th column of \(\textbf{Y}\)), we have:

\[
\mathbb{E}\left[ Y_k \mid X = x \right] = \Pr\left( G = k \mid X = x \right)
\]
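To make the least squares step concrete, here is a hedged NumPy sketch on simulated data (the data-generating process and variable names are assumptions of mine, not taken from ESL or `nnetsauce`). It fits the \(K\) regressions at once and checks a known property: when the model includes an intercept, the raw estimates of each row sum to 1, although individual values can fall outside \([0, 1]\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, K = 60, 2, 3

y = rng.integers(1, K + 1, size=n)           # labels in {1, ..., K}
X = rng.normal(size=(n, p)) + y[:, None]      # features shifted by class (toy data)
Y = np.eye(K)[y - 1]                          # indicator response matrix

X1 = np.column_stack([np.ones(n), X])         # add an intercept column
B, *_ = np.linalg.lstsq(X1, Y, rcond=None)    # K least squares fits, one per column of Y
raw = X1 @ B                                  # n x K matrix of raw probability estimates

row_sums = raw.sum(axis=1)
print(np.allclose(row_sums, 1.0))             # rows sum to 1 thanks to the intercept
print(raw.min(), raw.max())                   # individual values are not guaranteed to lie in [0, 1]
```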

The differences between `nnetsauce`’s `MultitaskClassifier` and the model described in ESL are:

- Any model possessing methods `fit` and `predict` can be used in lieu of a linear regression of \(\textbf{Y}\) on \(\textbf{X}\).
- The set of covariates includes the original covariates, \(\textbf{X}\), **plus nonlinear transformations** of \(\textbf{X}\), \(h(\textbf{X})\), as done in Quasi-Randomized Networks. Having \(h(\textbf{X})\) as additional explanatory variables enhances the model’s flexibility; the **model is no longer linear**.
- If for each \(k \in \lbrace 1, \ldots, K \rbrace\), \(\hat{f}_k(x)\) is the regression’s predicted value for class \(k\) and an observation characterized by \(x\), `nnetsauce`’s `MultitaskClassifier` obtains *probabilities* that an observation characterized by \(x\) belongs to class \(k\) as:

\[
\hat{p}_k(x) = \frac{expit \left( \hat{f}_k(x) \right)}{\sum_{i=1}^K expit \left( \hat{f}_i(x) \right)}
\]

Here, \(expit(x) := \frac{1}{1 + \exp(-x)}\). The map \(x \mapsto expit(x)\) is strictly increasing, hence it preserves the ordering of the *linear* regression’s predictions. It also takes its values in \((0, 1)\), which helps in avoiding overflows. I divide \(expit \left( \hat{f}_k(x) \right)\) by \(\sum_{i=1}^K expit \left( \hat{f}_i(x) \right)\), so that the *probabilities* add up to 1. And to finish, the class predicted for an example characterized by \(x\) is the one with the highest probability:

\[
\hat{G}(x) = \arg\max_{k \in \lbrace 1, \ldots, K \rbrace} \frac{expit \left( \hat{f}_k(x) \right)}{\sum_{i=1}^K expit \left( \hat{f}_i(x) \right)}
\]
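The whole procedure can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not `nnetsauce`’s actual implementation: `LeastSquares`, `augment`, `fit_multitask` and `predict_proba` are hypothetical names, \(h\) is taken to be `tanh` applied to a fixed random projection (the package’s own construction of \(h\) may differ), and labels are coded \(0, \ldots, K-1\) as in scikit-learn targets.

```python
import numpy as np


def expit(z):
    return 1.0 / (1.0 + np.exp(-z))


class LeastSquares:
    """Hypothetical minimal regressor: anything with fit/predict would do."""

    def fit(self, X, y):
        X1 = np.column_stack([np.ones(len(X)), X])
        self.coef_, *_ = np.linalg.lstsq(X1, y, rcond=None)
        return self

    def predict(self, X):
        X1 = np.column_stack([np.ones(len(X)), X])
        return X1 @ self.coef_


def augment(X, W):
    """Return [X, h(X)], with h(X) = tanh(X W) and W fixed (an assumption)."""
    return np.column_stack([X, np.tanh(X @ W)])


def fit_multitask(make_regressor, X, y, W, n_classes):
    """Fit one regressor per column of the indicator matrix."""
    Z = augment(X, W)
    Y = np.eye(n_classes)[y]  # labels coded 0, ..., K-1 here
    return [make_regressor().fit(Z, Y[:, k]) for k in range(n_classes)]


def predict_proba(models, X, W):
    """expit of the raw predictions, normalized so that each row sums to 1."""
    Z = augment(X, W)
    raw = np.column_stack([m.predict(Z) for m in models])
    e = expit(raw)
    return e / e.sum(axis=1, keepdims=True)


rng = np.random.default_rng(1)
n, p, K = 90, 2, 3
y = np.repeat(np.arange(K), n // K)             # toy labels
X = rng.normal(size=(n, p)) + 3.0 * y[:, None]  # toy features shifted by class
W = rng.normal(size=(p, 5))                     # 5 hidden features

models = fit_multitask(LeastSquares, X, y, W, K)
probs = predict_proba(models, X, W)
preds = probs.argmax(axis=1)                    # predicted class = highest probability
print(probs[:3].round(3), preds[:3])
```

Because `fit_multitask` only calls `fit` and `predict`, swapping `LeastSquares` for another regressor with the same two methods requires no other change.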

# Two examples in Python

Currently, installing `nnetsauce` from PyPI doesn’t work, and I’m working on fixing it. However, you can install `nnetsauce` from GitHub as follows:

```
pip install git+https://github.com/Techtonique/nnetsauce.git
```

Import the packages required for the 2 examples.

```
import nnetsauce as ns
import numpy as np
from sklearn.datasets import load_wine, load_iris
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
from time import time
```

**1. Classification of iris dataset:**

```
dataset = load_iris()
Z = dataset.data
t = dataset.target

# training set (80%) and test set (20%)
X_train, X_test, y_train, y_test = train_test_split(Z, t, test_size=0.2,
                                                    random_state=143)

# Linear Regression is used here
regr3 = LinearRegression()

# `n_hidden_features` makes the model nonlinear
# `n_clusters` takes into account heterogeneity
fit_obj3 = ns.MultitaskClassifier(regr3, n_hidden_features=5,
                                  n_clusters=2, type_clust="gmm")

# Fit the model
start = time()
fit_obj3.fit(X_train, y_train)
print(f"Elapsed {time() - start}")

# Classification report (true labels first, then predictions)
start = time()
preds = fit_obj3.predict(X_test)
print(f"Elapsed {time() - start}")
print(metrics.classification_report(y_test, preds))
```

```
Elapsed 0.021012067794799805
Elapsed 0.0010943412780761719
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        12
           1       1.00      1.00      1.00         5
           2       1.00      1.00      1.00        13

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
```

**2. Classification of wine dataset:**

```
dataset = load_wine()
Z = dataset.data
t = dataset.target

# training set (80%) and test set (20%)
X_train, X_test, y_train, y_test = train_test_split(Z, t, test_size=0.2,
                                                    random_state=143)

# Linear Regression is used here
regr4 = LinearRegression()

# `n_hidden_features` makes the model nonlinear
# `n_clusters` takes into account heterogeneity
fit_obj4 = ns.MultitaskClassifier(regr4, n_hidden_features=5,
                                  n_clusters=2, type_clust="gmm")

# Fit the model
start = time()
fit_obj4.fit(X_train, y_train)
print(f"Elapsed {time() - start}")

# Classification report (true labels first, then predictions)
start = time()
preds = fit_obj4.predict(X_test)
print(f"Elapsed {time() - start}")
print(metrics.classification_report(y_test, preds))
```

```
Elapsed 0.019229650497436523
Elapsed 0.001451253890991211
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       1.00      1.00      1.00        11
           2       1.00      1.00      1.00         9

    accuracy                           1.00        36
   macro avg       1.00      1.00      1.00        36
weighted avg       1.00      1.00      1.00        36
```