Today, give a try to Techtonique web app, a tool designed to help you make informed, data-driven decisions using Mathematics, Statistics, Machine Learning, and Data Visualization. Here is a tutorial with audio, video, code, and slides: https://moudiki2.gumroad.com/l/nrhgb. 100 API requests are now (and forever) offered to every user every month, no matter the pricing tier.
Feature importance is a crucial aspect of machine learning model interpretation, helping us understand which features contribute most to a model’s predictions. In this blog post, we’ll explore two popular methods for computing feature importance using techtonique.net’s API:
-
Permutation Importance: This method measures how much a model’s performance decreases when a feature is randomly shuffled. It’s model-agnostic and provides a straightforward way to understand feature impact.
-
SHAP (SHapley Additive exPlanations) Values: Based on game theory, SHAP values provide a more detailed view of how each feature contributes to individual predictions, offering both global and local interpretability.
We’ll demonstrate how to use these methods with both R and Python, showing how to:
- Send requests to techtonique.net’s API
- Process and visualize the results
0 - Download data
!wget https://raw.githubusercontent.com/Techtonique/datasets/refs/heads/main/tabular/classification/breast_cancer_dataset2.csv
!wget https://raw.githubusercontent.com/Techtonique/datasets/refs/heads/main/tabular/regression/boston_dataset2.csv
1 - Load rpy2 extension
%load_ext rpy2.ipython
2 - Send requests
Note that you can use https://curlconverter.com/ to translate the following request in your favorite programming language.
!curl -X POST \
-H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiI1NGY3ZDE3Ny05OWQ0LTQzNDktOTc1OC0zZTBkOGVkYWZkYWUiLCJlbWFpbCI6InRoaWVycnkubW91ZGlraS50ZWNodG9uaXF1ZUBnbWFpbC5jb20iLCJleHAiOjE3NDg4MjU5MTh9.ARqoz9PWZGILxgx8tcUJEc1h19TY-1odEqDkpCUxYmE" \
-F "file=@breast_cancer_dataset2.csv;type=text/csv" \
"https://www.techtonique.net/mlclassification?base_model=RandomForestClassifier&n_hidden_features=5&interpretability=permutation" > res1.json
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 124k 100 4938 100 119k 927 23049 0:00:05 0:00:05 --:--:-- 26549
Note that you can use https://curlconverter.com/ to translate the following request in your favorite programming language.
!curl -X POST \
-H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiI1NGY3ZDE3Ny05OWQ0LTQzNDktOTc1OC0zZTBkOGVkYWZkYWUiLCJlbWFpbCI6InRoaWVycnkubW91ZGlraS50ZWNodG9uaXF1ZUBnbWFpbC5jb20iLCJleHAiOjE3NDg4MjU5MTh9.ARqoz9PWZGILxgx8tcUJEc1h19TY-1odEqDkpCUxYmE" \
-F "file=@breast_cancer_dataset2.csv;type=text/csv" \
"https://www.techtonique.net/gbdtclassification?model_type=xgboost&predict_proba=False&interpretability=permutation" > res2.json
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 122k 100 2751 100 119k 1130 50421 0:00:02 0:00:02 --:--:-- 51557
Note that you can use https://curlconverter.com/ to translate the following request in your favorite programming language. This is a slow request, for generally-accepted APIs standards.
!curl -X POST \
-H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiI1NGY3ZDE3Ny05OWQ0LTQzNDktOTc1OC0zZTBkOGVkYWZkYWUiLCJlbWFpbCI6InRoaWVycnkubW91ZGlraS50ZWNodG9uaXF1ZUBnbWFpbC5jb20iLCJleHAiOjE3NDg4MjU5MTh9.ARqoz9PWZGILxgx8tcUJEc1h19TY-1odEqDkpCUxYmE" \
-F "file=@breast_cancer_dataset2.csv;type=text/csv" \
"https://www.techtonique.net/mlclassification?base_model=RandomForestClassifier&n_hidden_features=5&interpretability=shap" > res3.json
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 236k 100 117k 100 119k 2780 2845 0:00:43 0:00:43 --:--:-- 26830
Note that you can use https://curlconverter.com/ to translate the following request in your favorite programming language. This is a slow request, for generally-accepted APIs standards.
!curl -X POST \
-H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiI1NGY3ZDE3Ny05OWQ0LTQzNDktOTc1OC0zZTBkOGVkYWZkYWUiLCJlbWFpbCI6InRoaWVycnkubW91ZGlraS50ZWNodG9uaXF1ZUBnbWFpbC5jb20iLCJleHAiOjE3NDg4MjU5MTh9.ARqoz9PWZGILxgx8tcUJEc1h19TY-1odEqDkpCUxYmE" \
-F "file=@breast_cancer_dataset2.csv;type=text/csv" \
"https://www.techtonique.net/gbdtclassification?model_type=xgboost&predict_proba=False&interpretability=shap" > res4.json
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 173k 100 55122 100 119k 5159 11487 0:00:10 0:00:10 --:--:-- 13849
Note that you can use https://curlconverter.com/ to translate the following request in your favorite programming language.
!curl -X POST \
-H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiI1NGY3ZDE3Ny05OWQ0LTQzNDktOTc1OC0zZTBkOGVkYWZkYWUiLCJlbWFpbCI6InRoaWVycnkubW91ZGlraS50ZWNodG9uaXF1ZUBnbWFpbC5jb20iLCJleHAiOjE3NDg4MjU5MTh9.ARqoz9PWZGILxgx8tcUJEc1h19TY-1odEqDkpCUxYmE" \
-F "file=@boston_dataset2.csv;type=text/csv" \
"https://www.techtonique.net/mlregression?base_model=RidgeCV&n_hidden_features=5&return_pi=True&interpretability=permutation" > res5.json
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 52785 100 11903 100 40882 8541 29337 0:00:01 0:00:01 --:--:-- 37893
Note that you can use https://curlconverter.com/ to translate the following request in your favorite programming language. This is a slow request, for generally-accepted APIs standards.
!curl -X POST \
-H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiI1NGY3ZDE3Ny05OWQ0LTQzNDktOTc1OC0zZTBkOGVkYWZkYWUiLCJlbWFpbCI6InRoaWVycnkubW91ZGlraS50ZWNodG9uaXF1ZUBnbWFpbC5jb20iLCJleHAiOjE3NDg4MjU5MTh9.ARqoz9PWZGILxgx8tcUJEc1h19TY-1odEqDkpCUxYmE" \
-F "file=@boston_dataset2.csv;type=text/csv" \
"https://www.techtonique.net/mlregression?base_model=RidgeCV&n_hidden_features=5&return_pi=True&interpretability=shap" > res6.json
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 87409 100 46527 100 40882 4139 3637 0:00:11 0:00:11 --:--:-- 12810
Note that you can use https://curlconverter.com/ to translate the following request in your favorite programming language. This is a slow request, for generally-accepted APIs standards.
!curl -X POST \
-H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiI1NGY3ZDE3Ny05OWQ0LTQzNDktOTc1OC0zZTBkOGVkYWZkYWUiLCJlbWFpbCI6InRoaWVycnkubW91ZGlraS50ZWNodG9uaXF1ZUBnbWFpbC5jb20iLCJleHAiOjE3NDg4MjIxMDF9.2tAjOBPLTC-YuGeu8Iep-QRrOP8DSo7l1B89lU1eI3Q" \
-F "file=@boston_dataset2.csv;type=text/csv" \
"https://www.techtonique.net/gbdtregression?model_type=xgboost&interpretability=shap" > res7.json
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 40912 100 30 100 40882 34 46655 --:--:-- --:--:-- --:--:-- 46649
Note that you can use https://curlconverter.com/ to translate the following request in your favorite programming language.
!curl -X POST \
-H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiI1NGY3ZDE3Ny05OWQ0LTQzNDktOTc1OC0zZTBkOGVkYWZkYWUiLCJlbWFpbCI6InRoaWVycnkubW91ZGlraS50ZWNodG9uaXF1ZUBnbWFpbC5jb20iLCJleHAiOjE3NDg4MjU5MTh9.ARqoz9PWZGILxgx8tcUJEc1h19TY-1odEqDkpCUxYmE" \
-F "file=@boston_dataset2.csv;type=text/csv" \
"https://www.techtonique.net/gbdtregression?model_type=xgboost&interpretability=permutation" > res8.json
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 47135 100 6253 100 40882 4012 26236 0:00:01 0:00:01 --:--:-- 30253
3 - Plot results
import json
import matplotlib.pyplot as plt
with open('res1.json', 'r') as f:
data = json.load(f)
for key, value in data.items():
print(f"{key}: {value}")
y_true: [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1]
y_pred: [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1]
0: {'precision': 0.9508196721311475, 'recall': 0.90625, 'f1-score': 0.928, 'support': 64.0}
1: {'precision': 0.9454545454545454, 'recall': 0.9719626168224299, 'f1-score': 0.9585253456221198, 'support': 107.0}
accuracy: 0.9473684210526315
macro avg: {'precision': 0.9481371087928465, 'recall': 0.939106308411215, 'f1-score': 0.9432626728110599, 'support': 171.0}
weighted avg: {'precision': 0.9474625460820457, 'recall': 0.9473684210526315, 'f1-score': 0.9471006548629639, 'support': 171.0}
proba: [[0.97, 0.03], [0.92, 0.08], [0.99, 0.01], [0.7, 0.3], [0.98, 0.02], [1.0, 0.0], [0.02, 0.98], [1.0, 0.0], [1.0, 0.0], [0.03, 0.97], [0.38, 0.62], [0.28, 0.72], [0.0, 1.0], [0.0, 1.0], [0.97, 0.03], [0.94, 0.06], [0.0, 1.0], [1.0, 0.0], [0.27, 0.73], [0.0, 1.0], [1.0, 0.0], [0.97, 0.03], [0.0, 1.0], [1.0, 0.0], [0.47, 0.53], [1.0, 0.0], [0.04, 0.96], [0.0, 1.0], [0.01, 0.99], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [0.97, 0.03], [0.03, 0.97], [0.97, 0.03], [0.5, 0.5], [0.92, 0.08], [0.0, 1.0], [0.0, 1.0], [0.98, 0.02], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.7, 0.3], [1.0, 0.0], [1.0, 0.0], [0.1, 0.9], [0.77, 0.23], [1.0, 0.0], [0.95, 0.05], [0.02, 0.98], [0.99, 0.01], [0.03, 0.97], [0.59, 0.41], [0.0, 1.0], [0.96, 0.04], [0.0, 1.0], [1.0, 0.0], [0.09, 0.91], [0.04, 0.96], [0.02, 0.98], [0.99, 0.01], [0.31, 0.69], [0.02, 0.98], [0.24, 0.76], [1.0, 0.0], [0.98, 0.02], [1.0, 0.0], [0.78, 0.22], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.22, 0.78], [0.01, 0.99], [0.08, 0.92], [0.19, 0.81], [0.01, 0.99], [0.01, 0.99], [0.0, 1.0], [0.12, 0.88], [0.01, 0.99], [0.0, 1.0], [0.02, 0.98], [0.01, 0.99], [0.0, 1.0], [0.0, 1.0], [0.94, 0.06], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.29, 0.71], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.98, 0.02], [0.0, 1.0], [0.01, 0.99], [0.03, 0.97], [0.01, 0.99], [0.0, 1.0], [0.69, 0.31], [1.0, 0.0], [0.98, 0.02], [0.44, 0.56], [0.2, 0.8], [0.0, 1.0], [0.02, 0.98], [0.01, 0.99], [0.0, 1.0], [0.08, 0.92], [0.42, 0.58], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [0.06, 0.94], [0.0, 1.0], [0.02, 0.98], [1.0, 0.0], [0.02, 0.98], [0.01, 0.99], [0.19, 0.81], [0.9, 0.1], [0.01, 0.99], [0.1, 0.9], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.03, 0.97], [0.08, 0.92], [0.0, 1.0], [0.98, 0.02], [0.04, 0.96], [0.51, 0.49], [0.0, 1.0], [1.0, 0.0], [0.01, 0.99], [0.1, 0.9], [0.11, 0.89], [0.99, 0.01], [0.86, 0.14], [0.18, 0.82], [0.01, 0.99], [0.03, 0.97], [0.35, 0.65], [0.98, 0.02], [0.09, 0.91], [1.0, 0.0], [0.12, 0.88], [0.01, 0.99], [0.02, 0.98], [0.15, 0.85], [0.02, 0.98], [0.14, 0.86], [0.01, 0.99], [1.0, 0.0], [0.42, 0.58], [0.14, 0.86], [0.0, 1.0], [0.44, 0.56], [0.17, 0.83], [0.01, 0.99], [0.01, 0.99], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.01, 0.99]]
interpretability: {'method': 'permutation', 'importances_mean': [-0.008771929824561441, 0.007602339181286511, -0.002923976608187162, -0.004678362573099448, -0.0052631578947368585, -0.009356725146198841, -0.004093567251462038, -0.007017543859649145, -0.0005847953216374324, -0.0035087719298245836, -0.0005847953216374435, 0.002339181286549685, -0.0005847953216374324, -0.0035087719298245944, -0.006432748538011734, -0.0005847953216374324, -0.0005847953216374324, -0.0046783625730994595, -0.0011695906432748649, 0.0, -0.0017543859649122972, 0.006432748538011668, -0.009356725146198864, -0.0040935672514620155, -0.006432748538011734, -0.0035087719298245944, -0.005263157894736891, -0.006432748538011734, 0.0, 0.0], 'importances_std': [0.005391546466253153, 0.0026798688274595806, 0.002923976608187162, 0.0035087719298245723, 0.004857674773636302, 0.003879093322053089, 0.0037445170979139527, 0.004376207469911037, 0.0017543859649122972, 0.0038790933220531126, 0.003149219185458781, 0.0028649002839569063, 0.0017543859649122972, 0.0028649002839569605, 0.00485767477363631, 0.0017543859649122972, 0.0017543859649122972, 0.0023391812865497298, 0.0023391812865497298, 0.0, 0.006944059700022174, 0.004093567251461992, 0.006512005102725163, 0.007420220783888606, 0.004857674773636309, 0.0028649002839569605, 0.0017543859649122972, 0.006105442402871663, 0.0, 0.0], 'feature_names': ['mean radius', 'mean texture', 'mean perimeter', 'mean area', 'mean smoothness', 'mean compactness', 'mean concavity', 'mean concave points', 'mean symmetry', 'mean fractal dimension', 'radius error', 'texture error', 'perimeter error', 'area error', 'smoothness error', 'compactness error', 'concavity error', 'concave points error', 'symmetry error', 'fractal dimension error', 'worst radius', 'worst texture', 'worst perimeter', 'worst area', 'worst smoothness', 'worst compactness', 'worst concavity', 'worst concave points', 'worst symmetry', 'worst fractal dimension']}
3 - 1 Permutation importance
%%R
library(jsonlite)
library(ggplot2)
# Load JSON data
res1 <- fromJSON("res1.json")
# Extract feature names, importances_mean, and importances_std
feature_names <- res1$interpretability$feature_names
importances_mean <- res1$interpretability$importances_mean
importances_std <- res1$interpretability$importances_std
# Create a data frame
df <- data.frame(
feature_names = factor(feature_names, levels = feature_names), # Maintain order
importances_mean = importances_mean,
importances_std = importances_std
)
# Create the bar plot
ggplot(df, aes(x = feature_names, y = importances_mean)) +
geom_bar(stat = "identity") +
geom_errorbar(aes(ymin = importances_mean - importances_std, ymax = importances_mean + importances_std), width = 0.2) +
labs(title = "Feature importances via permutation on test set", y = "Mean accuracy decrease") +
theme_minimal()
import matplotlib.pyplot as plt
feature_names = data['interpretability']['feature_names']
importances_mean = data['interpretability']['importances_mean']
importances_std = data['interpretability']['importances_std']
fig, ax = plt.subplots()
ax.bar(feature_names, importances_mean, yerr=importances_std)
ax.set_title("Feature importances via permutation on test set")
ax.set_ylabel("Mean accuracy decrease")
fig.tight_layout()
plt.show()
3 - 2 SHAPley values
import matplotlib.pyplot as plt
import numpy as np
with open('res3.json', 'r') as f:
data = json.load(f)
# Assuming 'shap_values' and 'feature_names' are keys in the JSON response for SHAP
shap_values = data['interpretability']['values']
feature_names = data['interpretability']['feature_names']
print(shap_values[0])
abs_shap_values = [abs(s[1]) for s in shap_values[0]]
# Plotting using matplotlib
plt.figure(figsize=(10, 6))
plt.barh(feature_names, abs_shap_values)
plt.xlabel("Mean Absolute SHAP Value")
plt.title("Mean Absolute SHAP Values per Feature")
plt.gca().invert_yaxis() # To display the most important feature at the top
plt.tight_layout()
plt.show()
[[0.0467927752729715, -0.04679277527297147], [0.0, 0.0], [0.03065755727391224, -0.030657557273912207], [0.05967603575097714, -0.0596760357509771], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.039243510520076554, -0.039243510520076526], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.040102414661446345, -0.04010241466144643], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.12096136416860846, -0.12096136416860838], [0.0, 0.0], [0.10703338349800118, -0.10703338349800114], [0.11830883652988157, -0.11830883652988157], [0.0, 0.0], [0.0, 0.0], [0.004369056993017484, -0.004369056993017595], [0.04197566834618272, -0.04197566834618292], [0.0, 0.0], [0.0, 0.0]]
Comments powered by Talkyard.