mlsauce (home to a model-agnostic gradient boosting algorithm) can now be installed from PyPI.

Today, give a try to Techtonique web app, a tool designed to help you make informed, data-driven decisions using Mathematics, Statistics, Machine Learning, and Data Visualization. Here is a tutorial with audio, video, code, and slides: https://moudiki2.gumroad.com/l/nrhgb. 100 API requests are now (and forever) offered to every user every month, no matter the pricing tier.

Python package mlsauce (docs are here), which is home to a model-agnostic gradient boosting algorithm (described in https://www.researchgate.net/publication/386212136_Scalable_Gradient_Boosting_using_Randomized_Neural_Networks and https://www.researchgate.net/publication/346059361_LSBoost_gradient_boosted_penalized_nonlinear_least_squares), can now be installed from PyPI. This makes it easier for users to install and use the package without needing to clone the repository or manage dependencies manually.

The algorithm is designed to be model-agnostic, meaning it can work with various base learners, including decision trees, linear models, and neural networks. This flexibility allows users to choose the best base learner for their specific problem while still benefiting from the gradient boosting framework.

To install mlsauce, you can use the following command:

!pip install mlsauce --verbose

This will download and install the latest version of mlsauce along with its dependencies. Once installed, you can use it in your Python projects to implement model-agnostic gradient boosting with various base learners. Here is an example of how to use mlsauce with different base learners:

import mlsauce as ms
from sklearn.utils import all_estimators
from tqdm import tqdm 
from sklearn.metrics import balanced_accuracy_score, make_scorer
from sklearn.model_selection import train_test_split


estimators = all_estimators(type_filter='regressor')



from sklearn.datasets import load_digits, load_breast_cancer, load_wine, load_iris
from sklearn.tree import ExtraTreeRegressor
from sklearn.linear_model import Ridge
from time import time


results = []

datasets = [load_digits(return_X_y=True),
            load_breast_cancer(return_X_y=True),
            load_wine(return_X_y=True),
            load_iris(return_X_y=True)]

names = ["digits", "breast_cancer", "wine", "iris"]


balanced_accuracy_scorer = make_scorer(balanced_accuracy_score)


for i, dataset in tqdm(enumerate(datasets)):

  X, y = dataset
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, 
                                                      random_state=42, stratify=y)
  
  dataset_name = names[i]
  print(f"\n Dataset: {dataset_name} --------------------")

  for name, estimator in tqdm(estimators):

    try: 

      # Split dataset into training and testing sets
      model = ms.GenericBoostingClassifier(estimator(), 
                                           n_estimators=10, 
                                           learning_rate=0.1, 
                                            verbose=0)
      start = time()
      model.fit(X_train, y_train)
      y_pred_train = model.predict(X_train)
      y_pred = model.predict(X_test)
      elapsed = time() - start
      # Calculate balanced accuracy
      balanced_acc_train = balanced_accuracy_score(y_train, y_pred_train)
      balanced_acc_test = balanced_accuracy_score(y_test, y_pred)      
      res = [name, dataset_name, balanced_acc_train, balanced_acc_test, elapsed]
      print(f"\n Result for {name} base learner: {res}\n")
      results.append(res)

    except Exception as e:

      continue 



import numpy as np
import pandas as pd

results_df = pd.DataFrame(results, columns=["Base Learner", "Dataset", "Train Acc", "Test Acc", "Time"])

results_df['Pct Diff'] = np.log(1 + np.abs((results_df['Train Acc'] - results_df['Test Acc']) / results_df['Test Acc']))

results_df.sort_values(by=['Pct Diff'], ascending=True)

import matplotlib.pyplot as plt
# Order by increasing average testing accuracy
average_accuracies_ordered = average_accuracies.sort_values(by='Test Acc')

# Create the scatter plot for average training and testing accuracy, ordered by Test Acc
plt.figure(figsize=(12, 7))

# Plot average training accuracy
plt.scatter(average_accuracies_ordered['Base Learner'], average_accuracies_ordered['Train Acc'], label='Average Train Acc', marker='o', s=50)

# Plot average testing accuracy
plt.scatter(average_accuracies_ordered['Base Learner'], average_accuracies_ordered['Test Acc'], label='Average Test Acc', marker='o', s=50)

plt.title('Average Training and Testing Accuracy for Each Model (Ordered by Test Accuracy)')
plt.xlabel('Model (Ordered by Increasing Average Test Accuracy)')
plt.ylabel('Average Accuracy')
plt.xticks(rotation=90) # Rotate x-axis labels for better readability
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

Comparison of average training and testing accuracy for each model can be visualized using a scatter plot. The plot will show the average training and testing accuracy for each model, ordered by increasing average testing accuracy. This allows for easy comparison of how well each model performs on both the training and testing datasets, and spots for potential overfitting.

image-title-here