27 Jan 2025
Today, give a try to the Techtonique web app, a tool designed to help you make informed, data-driven decisions using Mathematics, Statistics, Machine Learning, and Data Visualization.
Here is a tutorial with audio, video, code, and slides: https://moudiki2.gumroad.com/l/nrhgb
I recently released the genbooster Python package (usable from R): Gradient Boosting and Bootstrap aggregating (bagging) implementations that use a Rust backend. Each base learner in these ensembles is trained on randomized features, a form of feature engineering. The package was downloaded 3000 times in 5 days, so I guess it's somewhat useful.
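To make the "randomized features" idea concrete, here is a minimal, conceptual sketch of residual boosting where each base learner sees the original inputs augmented with random-projection features. This is my own simplification for illustration (the function names are made up), not the package's actual Rust implementation:

import numpy as np
from sklearn.tree import ExtraTreeRegressor

def fit_randomized_boosting(X, y, n_estimators=100, learning_rate=0.1, n_hidden=25, seed=42):
    # Gradient boosting on residuals; each learner is trained on [X, tanh(X W)],
    # where W is a random projection matrix (the "randomized features").
    rng = np.random.default_rng(seed)
    offset = y.mean()
    residuals = y - offset
    ensemble = []
    for _ in range(n_estimators):
        W = rng.standard_normal((X.shape[1], n_hidden))
        Z = np.hstack([X, np.tanh(X @ W)])
        learner = ExtraTreeRegressor(random_state=0).fit(Z, residuals)
        residuals = residuals - learning_rate * learner.predict(Z)
        ensemble.append((W, learner))
    return offset, ensemble

def predict_randomized_boosting(model, X, learning_rate=0.1):
    offset, ensemble = model
    preds = np.full(X.shape[0], offset)
    for W, learner in ensemble:
        preds += learning_rate * learner.predict(np.hstack([X, np.tanh(X @ W)]))
    return preds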
The previous version of my generic Gradient Boosting algorithm was implemented in the Python package mlsauce (see #172, #169, #166, #165), but that package can be difficult to install on some systems. On Windows, for example, you may want to use the Windows Subsystem for Linux (WSL). Installing from PyPI is also nearly impossible at the moment, so it has to be installed directly from GitHub.
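(For reference, installing from GitHub typically looks like `pip install git+https://github.com/Techtonique/mlsauce.git`, though I haven't verified this on every platform.)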
This post is a quick overview of genbooster, in Python and R. It was also an occasion to "learn"/try the Rust programming language, and I'm happy with the result: a stable package that's easy to install. However, I wasn't blown away by the speed (hey Rust guys ;)), which is roughly equivalent to Cython's (that is, C under the hood).

Python version
2025_10_22_genbooster_randombag_rust_python
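The output below comes from looping over scikit-learn base learners and several toy datasets. For context, here is a hedged sketch of the kind of call being benchmarked; the class names, module paths, and `base_estimator` argument are the same ones used in the R section at the end of this post, while the dataset, split, and metric here are illustrative:

import numpy as np
from time import time
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import ExtraTreeRegressor
from genbooster.genboosterclassifier import BoosterClassifier
from genbooster.randombagclassifier import RandomBagClassifier

# Toy classification data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=13)

# Fit and time both ensemble flavors with the same base learner
for Classifier in (BoosterClassifier, RandomBagClassifier):
    clf = Classifier(base_estimator=ExtraTreeRegressor())
    start = time()
    clf.fit(X_train, y_train)
    print(f"{Classifier.__name__} training time: {time() - start:.2f} s")
    start = time()
    preds = clf.predict(X_test)
    print(f"{Classifier.__name__} inference time: {time() - start:.2f} s")
    print(f"{Classifier.__name__} accuracy: {np.mean(preds == y_test):.3f}")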
Estimator: BaggingRegressor
Time (training): 7.691629648208618
Time (inference): 0.5218164920806885
Accuracy 1.0
Estimator: DecisionTreeRegressor
Time (training): 0.23518085479736328
Time (inference): 0.05045819282531738
Accuracy 1.0
Estimator: ElasticNetCV
Time (training): 7.839452505111694
Time (inference): 0.0176393985748291
Accuracy 0.9333333333333333
Estimator: ExtraTreeRegressor
Time (training): 0.28067755699157715
Time (inference): 0.04417252540588379
Accuracy 1.0
Estimator: SVR
Time (training): 0.8663649559020996
Time (inference): 0.1500096321105957
Accuracy 1.0
| | dataset | estimator | accuracy | training_time (s) | inference_time (s) |
|---|---|---|---|---|---|
| 0 | iris | BaggingRegressor | 1.00 | 7.69 | 0.52 |
| 1 | iris | DecisionTreeRegressor | 1.00 | 0.24 | 0.05 |
| 3 | iris | ExtraTreeRegressor | 1.00 | 0.28 | 0.04 |
| 4 | iris | SVR | 1.00 | 0.87 | 0.15 |
| 2 | iris | ElasticNetCV | 0.93 | 7.84 | 0.02 |
Estimator: BaggingRegressor
Time (training): 49.687711000442505
Time (inference): 0.46684932708740234
Accuracy 0.9473684210526315
Estimator: DecisionTreeRegressor
Time (training): 1.357734203338623
Time (inference): 0.028873205184936523
Accuracy 0.9473684210526315
Estimator: ElasticNetCV
Time (training): 23.06330966949463
Time (inference): 0.016884803771972656
Accuracy 0.9824561403508771
Estimator: ExtraTreeRegressor
Time (training): 0.28661274909973145
Time (inference): 0.03186511993408203
Accuracy 0.9736842105263158
Estimator: SVR
Time (training): 2.7746951580047607
Time (inference): 0.359896183013916
Accuracy 0.9824561403508771
| | dataset | estimator | accuracy | training_time (s) | inference_time (s) |
|---|---|---|---|---|---|
| 2 | breast_cancer | ElasticNetCV | 0.98 | 23.06 | 0.02 |
| 4 | breast_cancer | SVR | 0.98 | 2.77 | 0.36 |
| 3 | breast_cancer | ExtraTreeRegressor | 0.97 | 0.29 | 0.03 |
| 0 | breast_cancer | BaggingRegressor | 0.95 | 49.69 | 0.47 |
| 1 | breast_cancer | DecisionTreeRegressor | 0.95 | 1.36 | 0.03 |
Estimator: BaggingRegressor
Time (training): 10.581303596496582
Time (inference): 0.49717187881469727
Accuracy 1.0
Estimator: DecisionTreeRegressor
Time (training): 0.3143930435180664
Time (inference): 0.032999515533447266
Accuracy 0.9722222222222222
Estimator: ElasticNetCV
Time (training): 10.467571020126343
Time (inference): 0.02497076988220215
Accuracy 1.0
Estimator: ExtraTreeRegressor
Time (training): 0.36039304733276367
Time (inference): 0.06993508338928223
Accuracy 0.9722222222222222
Estimator: SVR
Time (training): 1.1567068099975586
Time (inference): 0.15232062339782715
Accuracy 0.7777777777777778
| | dataset | estimator | accuracy | training_time (s) | inference_time (s) |
|---|---|---|---|---|---|
| 0 | wine | BaggingRegressor | 1.00 | 10.58 | 0.50 |
| 2 | wine | ElasticNetCV | 1.00 | 10.47 | 0.02 |
| 1 | wine | DecisionTreeRegressor | 0.97 | 0.31 | 0.03 |
| 3 | wine | ExtraTreeRegressor | 0.97 | 0.36 | 0.07 |
| 4 | wine | SVR | 0.78 | 1.16 | 0.15 |
Estimator: BaggingRegressor
Time (training): 528.3990564346313
Time (inference): 3.7134416103363037
Accuracy 0.9666666666666667
Estimator: DecisionTreeRegressor
Time (training): 12.664016008377075
Time (inference): 0.2317643165588379
Accuracy 0.9222222222222223
Estimator: ElasticNetCV
Time (training): 217.41770839691162
Time (inference): 0.12707233428955078
Accuracy 0.9472222222222222
Estimator: ExtraTreeRegressor
Time (training): 4.515539646148682
Time (inference): 0.37054967880249023
Accuracy 0.975
Estimator: SVR
Time (training): 140.5957386493683
Time (inference): 17.804856538772583
Accuracy 0.9583333333333334
| | dataset | estimator | accuracy | training_time (s) | inference_time (s) |
|---|---|---|---|---|---|
| 3 | digits | ExtraTreeRegressor | 0.97 | 4.52 | 0.37 |
| 0 | digits | BaggingRegressor | 0.97 | 528.40 | 3.71 |
| 4 | digits | SVR | 0.96 | 140.60 | 17.80 |
| 2 | digits | ElasticNetCV | 0.95 | 217.42 | 0.13 |
| 1 | digits | DecisionTreeRegressor | 0.92 | 12.66 | 0.23 |
Estimator: BaggingRegressor
Time (training): 5.794196128845215
Accuracy 1.0
Time (inference): 1.0278966426849365
Estimator: DecisionTreeRegressor
Time (training): 0.41347432136535645
Accuracy 1.0
Time (inference): 0.0855255126953125
Estimator: ElasticNetCV
Time (training): 22.3568377494812
Accuracy 0.9
Time (inference): 0.04216742515563965
Estimator: ExtraTreeRegressor
Time (training): 0.24758148193359375
Accuracy 1.0
Time (inference): 0.07839632034301758
Estimator: SVR
Time (training): 0.4286081790924072
Accuracy 0.9666666666666667
Time (inference): 0.09523606300354004
| | dataset | estimator | accuracy | training_time (s) | inference_time (s) |
|---|---|---|---|---|---|
| 0 | iris | BaggingRegressor | 1.00 | 5.79 | 1.03 |
| 1 | iris | DecisionTreeRegressor | 1.00 | 0.41 | 0.09 |
| 3 | iris | ExtraTreeRegressor | 1.00 | 0.25 | 0.08 |
| 4 | iris | SVR | 0.97 | 0.43 | 0.10 |
| 2 | iris | ElasticNetCV | 0.90 | 22.36 | 0.04 |
Estimator: BaggingRegressor
Time (training): 16.8834490776062
Accuracy 0.956140350877193
Time (inference): 0.8271636962890625
Estimator: DecisionTreeRegressor
Time (training): 2.403634548187256
Accuracy 0.9473684210526315
Time (inference): 0.04659414291381836
Estimator: ElasticNetCV
Time (training): 49.08372521400452
Accuracy 0.9473684210526315
Time (inference): 0.05168628692626953
Estimator: ExtraTreeRegressor
Time (training): 0.3687095642089844
Accuracy 0.9649122807017544
Time (inference): 0.04958987236022949
Estimator: SVR
Time (training): 1.4680569171905518
Accuracy 0.9473684210526315
Time (inference): 0.522120475769043
| | dataset | estimator | accuracy | training_time (s) | inference_time (s) |
|---|---|---|---|---|---|
| 3 | breast_cancer | ExtraTreeRegressor | 0.96 | 0.37 | 0.05 |
| 0 | breast_cancer | BaggingRegressor | 0.96 | 16.88 | 0.83 |
| 1 | breast_cancer | DecisionTreeRegressor | 0.95 | 2.40 | 0.05 |
| 2 | breast_cancer | ElasticNetCV | 0.95 | 49.08 | 0.05 |
| 4 | breast_cancer | SVR | 0.95 | 1.47 | 0.52 |
Estimator: BaggingRegressor
Time (training): 8.266825675964355
Accuracy 0.9722222222222222
Time (inference): 0.85321044921875
Estimator: DecisionTreeRegressor
Time (training): 0.46535372734069824
Accuracy 0.8888888888888888
Time (inference): 0.07617998123168945
Estimator: ElasticNetCV
Time (training): 27.90892767906189
Accuracy 0.9722222222222222
Time (inference): 0.06512594223022461
Estimator: ExtraTreeRegressor
Time (training): 0.27864670753479004
Accuracy 1.0
Time (inference): 0.0849001407623291
Estimator: SVR
Time (training): 0.5329914093017578
Accuracy 0.7777777777777778
Time (inference): 0.12239527702331543
| | dataset | estimator | accuracy | training_time (s) | inference_time (s) |
|---|---|---|---|---|---|
| 3 | wine | ExtraTreeRegressor | 1.00 | 0.28 | 0.08 |
| 0 | wine | BaggingRegressor | 0.97 | 8.27 | 0.85 |
| 2 | wine | ElasticNetCV | 0.97 | 27.91 | 0.07 |
| 1 | wine | DecisionTreeRegressor | 0.89 | 0.47 | 0.08 |
| 4 | wine | SVR | 0.78 | 0.53 | 0.12 |
Estimator: BaggingRegressor
Time (training): 138.1354341506958
Accuracy 0.95
Time (inference): 4.882359981536865
Estimator: DecisionTreeRegressor
Time (training): 19.426659107208252
Accuracy 0.875
Time (inference): 0.39545631408691406
Estimator: ElasticNetCV
Time (training): 601.1737794876099
Accuracy 0.9444444444444444
Time (inference): 0.49893951416015625
Estimator: ExtraTreeRegressor
Time (training): 6.681625127792358
Accuracy 0.9333333333333333
Time (inference): 0.41587328910827637
Estimator: SVR
Time (training): 62.861069440841675
Accuracy 0.9277777777777778
Time (inference): 17.88122320175171
| | dataset | estimator | accuracy | training_time (s) | inference_time (s) |
|---|---|---|---|---|---|
| 0 | digits | BaggingRegressor | 0.95 | 138.14 | 4.88 |
| 2 | digits | ElasticNetCV | 0.94 | 601.17 | 0.50 |
| 3 | digits | ExtraTreeRegressor | 0.93 | 6.68 | 0.42 |
| 4 | digits | SVR | 0.93 | 62.86 | 17.88 |
| 1 | digits | DecisionTreeRegressor | 0.88 | 19.43 | 0.40 |
| | V0 | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | target | training_index |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.01 | 18.00 | 2.31 | 0.00 | 0.54 | 6.58 | 65.20 | 4.09 | 1.00 | 296.00 | 15.30 | 396.90 | 4.98 | 24.00 | 0 |
| 1 | 0.03 | 0.00 | 7.07 | 0.00 | 0.47 | 6.42 | 78.90 | 4.97 | 2.00 | 242.00 | 17.80 | 396.90 | 9.14 | 21.60 | 1 |
| 2 | 0.03 | 0.00 | 7.07 | 0.00 | 0.47 | 7.18 | 61.10 | 4.97 | 2.00 | 242.00 | 17.80 | 392.83 | 4.03 | 34.70 | 0 |
| 3 | 0.03 | 0.00 | 2.18 | 0.00 | 0.46 | 7.00 | 45.80 | 6.06 | 3.00 | 222.00 | 18.70 | 394.63 | 2.94 | 33.40 | 1 |
| 4 | 0.07 | 0.00 | 2.18 | 0.00 | 0.46 | 7.15 | 54.20 | 6.06 | 3.00 | 222.00 | 18.70 | 396.90 | 5.33 | 36.20 | 1 |
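The regression benchmark below uses the Boston housing dataset whose first rows are shown above (the `training_index` column is dropped along with `target` when building the features). Here is a hedged sketch of the corresponding call, mirroring the R code further down; apart from the genbooster and scikit-learn API shown there, the split and seed are illustrative:

import numpy as np
import pandas as pd
from time import time
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import ExtraTreeRegressor
from genbooster.genboosterregressor import BoosterRegressor

# Load the Boston dataset from the Techtonique datasets repository
url = "https://raw.githubusercontent.com/Techtonique/datasets/refs/heads/main/tabular/regression/boston_dataset2.csv"
df = pd.read_csv(url)
X = df.drop(columns=["target", "training_index"]).values
y = df["target"].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=13)

# Fit, time, and score a boosted ensemble of ExtraTreeRegressor base learners
regr = BoosterRegressor(base_estimator=ExtraTreeRegressor())
start = time()
regr.fit(X_train, y_train)
print(f"Training time: {time() - start:.2f} s")
rmse = np.sqrt(mean_squared_error(y_test, regr.predict(X_test)))
print(f"BoosterRegressor RMSE: {rmse:.2f}")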
Estimator: BaggingRegressor
Time (training): 8.92781949043274
Time (inference): 0.2669663429260254
RMSE 2.858643903624637
Estimator: DecisionTreeRegressor
Time (training): 0.6041898727416992
Time (inference): 0.022806882858276367
RMSE 4.344879007810268
Estimator: ElasticNetCV
Time (training): 2.9936535358428955
Time (inference): 0.008358955383300781
RMSE 4.224416765751234
Estimator: ExtraTreeRegressor
Time (training): 0.26199960708618164
Time (inference): 0.02275371551513672
RMSE 2.5375423322614834
Estimator: SVR
Time (training): 2.0531556606292725
Time (inference): 0.4060652256011963
RMSE 3.6491765040296373
| | estimator | RMSE | training_time (s) | inference_time (s) |
|---|---|---|---|---|
| 3 | ExtraTreeRegressor | 2.54 | 0.26 | 0.02 |
| 0 | BaggingRegressor | 2.86 | 8.93 | 0.27 |
| 4 | SVR | 3.65 | 2.05 | 0.41 |
| 2 | ElasticNetCV | 4.22 | 2.99 | 0.01 |
| 1 | DecisionTreeRegressor | 4.34 | 0.60 | 0.02 |
Estimator: BaggingRegressor
Time (training): 9.518575429916382
Time (inference): 0.2690126895904541
RMSE 2.771679576927226
Estimator: DecisionTreeRegressor
Time (training): 0.6097230911254883
Time (inference): 0.021549463272094727
RMSE 4.398489123615693
Estimator: ElasticNetCV
Time (training): 2.9978723526000977
Time (inference): 0.00686335563659668
RMSE 4.224416765751234
Estimator: ExtraTreeRegressor
Time (training): 0.25699806213378906
Time (inference): 0.02178335189819336
RMSE 2.655690158719138
Estimator: SVR
Time (training): 2.04705548286438
Time (inference): 0.27837133407592773
RMSE 3.6491765040296373
| | estimator | RMSE | training_time (s) | inference_time (s) |
|---|---|---|---|---|
| 3 | ExtraTreeRegressor | 2.66 | 0.26 | 0.02 |
| 0 | BaggingRegressor | 2.77 | 9.52 | 0.27 |
| 4 | SVR | 3.65 | 2.05 | 0.28 |
| 2 | ElasticNetCV | 4.22 | 3.00 | 0.01 |
| 1 | DecisionTreeRegressor | 4.40 | 0.61 | 0.02 |
R version
!mkdir -p ~/.virtualenvs
!python3 -m venv ~/.virtualenvs/r-reticulate
!source ~/.virtualenvs/r-reticulate/bin/activate && pip install numpy pandas matplotlib scikit-learn tqdm genbooster
utils::install.packages("reticulate")
library(reticulate)
# Use a virtual environment or conda environment to manage Python dependencies
use_virtualenv("r-reticulate", required = TRUE)
# Import required Python libraries
np <- import("numpy")
pd <- import("pandas")
plt <- import("matplotlib.pyplot")
sklearn <- import("sklearn")
tqdm <- import("tqdm")
time <- import("time")
# Import genbooster estimators and scikit-learn helpers
BoosterRegressor <- import("genbooster.genboosterregressor", convert = FALSE)$BoosterRegressor
BoosterClassifier <- import("genbooster.genboosterclassifier", convert = FALSE)$BoosterClassifier
RandomBagRegressor <- import("genbooster.randombagregressor", convert = FALSE)$RandomBagRegressor
RandomBagClassifier <- import("genbooster.randombagclassifier", convert = FALSE)$RandomBagClassifier
ExtraTreeRegressor <- import("sklearn.tree")$ExtraTreeRegressor
mean_squared_error <- import("sklearn.metrics")$mean_squared_error
train_test_split <- import("sklearn.model_selection")$train_test_split
# Load Boston dataset
url <- "https://raw.githubusercontent.com/Techtonique/datasets/refs/heads/main/tabular/regression/boston_dataset2.csv"
df <- read.csv(url)
# Split dataset into features and target
y <- df[["target"]]
X <- df[colnames(df)[c(-which(colnames(df) == "target"), -which(colnames(df) == "training_index"))]]
# Split into training and testing sets
set.seed(123)
index_train <- sample.int(nrow(df), nrow(df) * 0.8, replace = FALSE)
X_train <- X[index_train, ]
X_test <- X[-index_train,]
y_train <- y[index_train]
y_test <- y[-index_train]
# BoosterRegressor on Boston dataset
regr <- BoosterRegressor(base_estimator = ExtraTreeRegressor())
start <- time$time()
regr$fit(X_train, y_train)
end <- time$time()
cat(sprintf("Time taken: %.2f seconds\n", end - start))
rmse <- np$sqrt(mean_squared_error(y_test, regr$predict(X_test)))
cat(sprintf("BoosterRegressor RMSE: %.2f\n", rmse))
# RandomBagRegressor on Boston dataset
regr <- RandomBagRegressor(base_estimator = ExtraTreeRegressor())
start <- time$time()
regr$fit(X_train, y_train)
end <- time$time()
cat(sprintf("Time taken: %.2f seconds\n", end - start))
rmse <- np$sqrt(mean_squared_error(y_test, regr$predict(X_test)))
cat(sprintf("RandomBagRegressor RMSE: %.2f\n", rmse))
# Classification example on the iris dataset
X <- as.matrix(iris[, 1:4])
y <- as.numeric(iris[, 5]) - 1  # 0-based class labels
# Split into training and testing sets
set.seed(123)
index_train <- sample.int(nrow(iris), nrow(iris) * 0.8, replace = FALSE)
X_train <- X[index_train, ]
X_test <- X[-index_train,]
y_train <- y[index_train]
y_test <- y[-index_train]
regr <- BoosterClassifier(base_estimator = ExtraTreeRegressor())
start <- time$time()
regr$fit(X_train, y_train)
end <- time$time()
cat(sprintf("Time taken: %.2f seconds\n", end - start))
accuracy <- mean(y_test == as.numeric(regr$predict(X_test)))
cat(sprintf("BoosterClassifier accuracy: %.2f\n", accuracy))
regr <- RandomBagClassifier(base_estimator = ExtraTreeRegressor())
start <- time$time()
regr$fit(X_train, y_train)
end <- time$time()
cat(sprintf("Time taken: %.2f seconds\n", end - start))
accuracy <- mean(y_test == as.numeric(regr$predict(X_test)))
cat(sprintf("RandomBagClassifier accuracy: %.2f\n", accuracy))
Time taken: 0.39 seconds
BoosterRegressor RMSE: 3.49
Time taken: 0.44 seconds
RandomBagRegressor RMSE: 4.06
Time taken: 0.28 seconds
BoosterClassifier accuracy: 0.97
Time taken: 0.37 seconds
RandomBagClassifier accuracy: 0.97