Today, give the Techtonique web app a try: a tool designed to help you make informed, data-driven decisions using Mathematics, Statistics, Machine Learning, and Data Visualization. Here is a tutorial with audio, video, code, and slides: https://moudiki2.gumroad.com/l/nrhgb. 100 API requests are offered to every user every month (now and forever), regardless of the pricing tier.
Disclaimer: This post was written with the help of LLMs, based on:
- https://thierrymoudiki.github.io/blog/2022/07/21/r/misc/boosted-configuration-networks
- https://www.researchgate.net/publication/332291211_Forecasting_multivariate_time_series_with_boosted_configuration_networks
- https://docs.techtonique.net/bcn/articles/bcn-intro.html
- https://github.com/Techtonique/bcn_python
Potential remaining errors are mine.
What if you could have a model that:
- ✅ Captures non-linear patterns like neural networks
- ✅ Builds iteratively like gradient boosting
- ✅ Provides built-in interpretability through its additive structure
- ✅ Works well on regression, classification, and time series
That’s Boosted Configuration Networks (BCN).
Where BCN fits: BCN sits between Neural Additive Models (NAMs) and gradient boosting—combining neural flexibility with boosting’s greedy refinement. It’s particularly effective for:
- Medium-sized tabular datasets (100s to 10,000s of rows)
- Multivariate prediction tasks (multiple outputs that share structure)
- Problems requiring both accuracy and interpretability
- Time series forecasting with multiple related series
In this post, I’ll explain BCN’s intuition by walking through its hyperparameters. Each parameter reveals something fundamental about how the algorithm works.
The Core Idea: Building Smart Weak Learners
BCN asks a simple question at each iteration:
“What’s the best artificial neural network feature I can add right now to explain what I haven’t captured yet?”
Let’s break down this sentence:
1. "Artificial neural network feature"
At each iteration L, a BCN creates a simple single-layer feedforward neural network:
h_L = activation(w_L^T · x)
This is just: multiply features by weights, then apply an activation function (tanh or sigmoid; bounded).
2. Best
BCN finds weights w_L that maximize how much this feature explains the residuals.
Specifically, it finds the artificial neural network whose output has the largest regression coefficient when predicting the residuals. This is captured in the ξ (xi) criterion:
ξ = ν(2-ν)·β²_L - penalty
where β_L is the least-squares coefficient from regressing residuals on h_L.
3. “What I haven’t captured yet”
Like all boosting methods, BCN works on residuals - the gap between current predictions and truth. Each iteration “carves away” at the error.
4. “Add”
Once we find the best h_L, we add it to our ensemble:
new prediction = old prediction + ν · β_L · h_L
Visual mental model: Imagine starting with the mean prediction (flat surface). Each iteration adds a “bump” (artificial neural network feature) where the residuals are largest, gradually sculpting a complex prediction surface.
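To make this loop concrete, here is a minimal R sketch of the process. It is illustrative only, not the package's implementation, and find_best_unit is a hypothetical placeholder for the constrained weight search described in the hyperparameter sections below:
# Schematic BCN boosting loop (illustrative; 'find_best_unit' is hypothetical)
bcn_sketch <- function(x, y, B = 100, nu = 0.3) {
  preds <- rep(mean(y), length(y))      # start from the mean prediction
  ensemble <- vector("list", B)
  for (L in seq_len(B)) {
    resid <- y - preds                  # what we haven't captured yet
    w <- find_best_unit(x, resid)       # hypothetical: weights maximizing the xi criterion
    h <- drop(tanh(x %*% w))            # one bounded neural network feature
    beta <- sum(h * resid) / sum(h^2)   # OLS coefficient of the residuals on h
    preds <- preds + nu * beta * h      # damped additive update
    ensemble[[L]] <- list(w = w, beta = beta)
  }
  list(ensemble = ensemble, fitted = preds)
}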
Now let’s see how the hyperparameters control this process.
Hyperparameter Priority: The Big Three
Before diving deep, here’s how parameters rank by impact:
Tier 1 - Critical (tune these first):
- B (iterations): Model complexity
- nu (learning rate): Step size and stability
- lam (weight bounds): Feature complexity
Tier 2 - Regularization (tune for robustness):
- r (convergence rate; most often 0.99, 0.999, 0.9999, etc.): Adaptive quality control
- col_sample (feature sampling): Regularization via randomness
Tier 3 - Technical (usually keep defaults):
- type_optim (optimizer): Computational trade-offs
- activation (nonlinearity): Usually tanh, because it is bounded
- hidden_layer_bias: Usually TRUE
- tol (tolerance): Early stopping
Hyperparameter 1: B (Number of Iterations)
Default: No universal default (typically 100-500)
What it controls: How many weak learners to train
Intuition: BCN builds your model piece by piece. Each iteration adds one artificial neural network feature that explains some of what you haven’t captured.
Trade-offs:
- Small B (10-50):
- ✅ Fast training
- ✅ Less risk of overfitting
- ❌ May underfit complex relationships
- Large B (100-1000):
- ✅ Can capture subtle patterns
- ✅ Better accuracy on complex tasks
- ❌ Slower training
- ❌ Risk of overfitting without other regularization
Rule of thumb: Start with B=100. If using early stopping (tol > 0), set B high (500-1000) and let the algorithm stop when improvement plateaus.
What’s happening internally:
Each iteration finds weights w_L that maximize:
ξ_L = ν(2-ν) · β²_L - penalty
where β²_L measures how much the neural network feature correlates with residuals.
Hyperparameter 2: nu (Learning Rate)
Default: 0.1 (conservative)
Typical range: 0.1-0.8
Sweet spot: 0.3-0.5
What it controls: How aggressively to use each weak learner
Intuition: Even if you find a great neural network feature, you might not want to use it at full strength. The learning rate controls the step size.
When BCN finds a good feature h_L with coefficient β_L, it updates predictions by:
prediction += nu · β_L · h_L
Trade-offs:
- Small ν (0.1-0.3):
- ✅ More stable training
- ✅ Better generalization (smooths out noise)
- ✅ Less sensitive to individual weak learners
- ❌ Need more iterations (larger B)
- ❌ Slower convergence
- Large ν (0.5-1.0):
- ✅ Faster convergence
- ✅ Fewer iterations needed
- ❌ Risk of overfitting
- ❌ Can be unstable
Why ν(2-ν) appears in the math:
This factor arises in the proof that the residuals' L2 norm converges to 0. It is maximized at ν=1 (the full gradient step):
f(ν) = 2ν - ν²
f'(ν) = 2 - 2ν = 0 ⟹ ν = 1
This ensures stability for ν ∈ (0,2) and explains why ν=1 is the “natural” full step.
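As a quick sanity check on this derivation, you can maximize f(ν) = ν(2-ν) numerically in base R:
# f(nu) = nu * (2 - nu) is maximized at nu = 1 on (0, 2), with f(1) = 1
optimize(function(nu) nu * (2 - nu), interval = c(0, 2), maximum = TRUE)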
Think of it like:
- ν=0.1: “I trust each feature a little, build slowly” (like learning rate 0.01 in SGD)
- ν=0.5: “I trust each feature moderately, build steadily”
- ν=1.0: “I trust each feature fully, build quickly” (can be unstable)
Hyperparameter 3: lam (λ - Weight Bounds)
Default: 0.1
Typical range: 0.1-100 (often searched on a log scale, from 10^-1 to 10^2)
Sweet spot: 10^(0.5 to 1.0) ≈ 3-10
What it controls: How large the neural network weights can be
Intuition: This constrains the weights w_L at each iteration to the range [-λ, λ]. It’s a form of regularization through box constraints.
# Tight constraints: simpler features
fit_simple <- bcn(x, y, lam = 0.5)
# Loose constraints: more complex features
fit_complex <- bcn(x, y, lam = 10.0)
Why this matters:
Small λ (0.1-1.0):
- Neural network features are “gentle” (bounded outputs)
- Less risk of overfitting
- May miss complex interactions
- ✅ Use for: Small datasets, high interpretability needs
Large λ (5-100):
- Neural network features can be more “extreme”
- Can capture stronger non-linearities
- Risk of overfitting if not balanced with other regularization
- ✅ Use for: Complex patterns, large datasets
What’s happening mathematically:
At each iteration, we solve:
maximize ξ(w_L)
subject to: -λ ≤ w_L,j ≤ λ for all features j
This is a constrained optimization - we’re finding the best weights within a box.
Think of it like:
- Small λ: “Keep the weak learners simple” (like L∞ regularization)
- Large λ: “Allow complex weak learners”
Note on consistency: In the code, this parameter is lam (avoiding the Greek letter for R compatibility).
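To see why the weight bound shapes the features, compare a tanh unit with small versus large weights on the same input (a standalone sketch with made-up data, not tied to the package):
x_grid <- seq(-3, 3, length.out = 200)
h_gentle <- tanh(0.3 * x_grid)   # small weights (small lam): near-linear, "gentle" feature
h_sharp  <- tanh(5 * x_grid)     # large weights (large lam): saturated, more "extreme" feature
range(h_gentle)                  # stays well inside (-1, 1)
range(h_sharp)                   # essentially spans -1 to 1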
Hyperparameter 4: r (Convergence Rate)
Default: 0.3
Typical range: 0.3-0.99
Sweet spot: 0.9-0.99999
What it controls: How the acceptance threshold changes over iterations
Intuition: This is the most subtle hyperparameter. It controls how picky BCN is about accepting new weak learners, and this pickiness decreases as training progresses. Think of r as the “patience” or “quality control” officer: high r means “Only the best features get through the door early on.”
The acceptance criterion:
BCN only accepts a weak learner if:
ξ_L = ν(2-ν)·β²_L - [1 - r - (1-r)/(L+1)]·||residuals||² ≥ 0
The penalty term [1 - r - (1-r)/(L+1)] decreases as L increases:
| Iteration L | r = 0.95 | r = 0.70 | r = 0.50 |
|---|---|---|---|
| L = 1 | 0.075 | 0.45 | 0.75 |
| L = 10 | 0.055 | 0.33 | 0.55 |
| L = 100 | 0.050 | 0.30 | 0.50 |
| L → ∞ | 0.050 | 0.30 | 0.50 |
Interpretation: The penalty starts higher and converges to (1-r) as training progresses.
Trade-offs:
Large r (0.9-0.99):
- Early iterations: very picky (high penalty)
- Later iterations: more permissive
- ✅ Prevents premature commitment to poor features
- ✅ Allows fine-tuning in later iterations
- ✅ Better generalization
- ✅ Use for: Production models, complex tasks
Small r (0.3-0.7):
- Less selective throughout training
- ✅ Accepts more weak learners
- ✅ Faster initial progress
- ❌ May accept noisy features early
- ✅ Use for: Quick prototyping, exploratory work
The dynamic threshold:
Rearranging the acceptance criterion:
Required R² > [1 - r - (1-r)/(L+1)] / [ν(2-ν)]
This creates an adaptive selection criterion that evolves during training.
Think of it like:
- High r: “Be very careful early on (we have lots of iterations left), but allow refinements later”
- Low r: “Accept good-enough features throughout training”
Hyperparameter 5: col_sample (Feature Sampling)
Default: 1.0 (no sampling)
Typical range: 0.3-1.0
Sweet spot: 0.5-0.7 for high-dimensional data
What it controls: What fraction of features to consider at each iteration
Intuition: Like Random Forests, BCN can use only a random subset of features at each iteration. This reduces overfitting, adds diversity, and speeds up computation.
# Use all features (no sampling)
fit_full <- bcn(x, y, col_sample = 1.0)
# Use 50% of features at each iteration
fit_sampled <- bcn(x, y, col_sample = 0.5)
How it works:
At iteration L, randomly sample col_sample × d features and optimize only over those:
w_L ∈ R^d_reduced (instead of R^d)
Different features are sampled at each iteration, creating diversity like Random Forests but for neural network features.
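The sampling step itself is simple; here is a minimal sketch of what happens at one iteration (illustrative, not the package's code; X_train is assumed from the recipes below):
# At iteration L, pick a random subset of columns and search the weights only there
col_sample <- 0.5
d <- ncol(X_train)
idx <- sample.int(d, size = max(1, floor(col_sample * d)))
x_subset <- X_train[, idx, drop = FALSE]   # w_L is optimized over these columns only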
Trade-offs:
col_sample = 1.0 (no sampling):
- ✅ Can use all information
- ✅ Potentially better accuracy
- ❌ Slower training (larger optimization)
- ❌ Higher overfitting risk
- ✅ Use for: Small datasets (N < 1000), few features (d < 50)
col_sample = 0.3-0.7:
- ✅ Faster training (smaller optimization)
- ✅ Regularization effect (like Random Forests)
- ✅ More diverse weak learners
- ❌ May miss important feature combinations
- ✅ Use for: Large datasets, many features (d > 100)
Interaction with B: Because column sampling acts as implicit regularization, you may need more iterations (see Interaction 3: col_sample ↔ B below).
Hyperparameter 6: activation (Activation Function)
Default: “tanh”
Options: “tanh”, “sigmoid”
What it controls: The non-linearity in each weak learner
Intuition: This determines the shape of transformations each neural network can create.
Characteristics:
tanh (hyperbolic tangent):
tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z))
- Range: [-1, 1]
- Symmetric around 0
- Gradient: 1 - tanh²(z)
- Good for: Most tasks, especially when features are centered
- ✅ Recommended default
sigmoid:
sigmoid(z) = 1 / (1 + e^(-z))
- Range: [0, 1]
- Asymmetric
- Gradient: sigmoid(z) · (1 - sigmoid(z))
- Good for: When outputs are probabilities or rates
Why bounded activations?
BCN requires bounded activations for theoretical guarantees and stability of the ξ criterion. Unbounded activations like ReLU are not recommended because:
- Theoretical issues: The ξ optimization assumes bounded activation outputs
- Stability: Unbounded outputs can destabilize the ensemble
- While ReLU could theoretically work with very tight weight constraints (small λ), tanh/sigmoid provide stronger guarantees
Rule of thumb: Use tanh as the default. It is bounded, zero-centered, and balanced.
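Both activations are available in base R if you want to eyeball the difference (plogis is the logistic sigmoid):
curve(tanh(x), from = -4, to = 4, lwd = 2, ylab = "activation")     # bounded in [-1, 1], zero-centered
curve(plogis(x), from = -4, to = 4, add = TRUE, lty = 2, lwd = 2)   # logistic sigmoid, bounded in [0, 1]
legend("topleft", legend = c("tanh", "sigmoid"), lty = c(1, 2), lwd = 2)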
Hyperparameter 7: tol (Early Stopping Tolerance)
Default: 0 (no early stopping)
Typical range: 1e-7 to 1e-3
Recommended: 1e-7 for most tasks
What it controls: When to stop training before reaching B iterations
Intuition: If the model stops improving (residual norm isn’t decreasing much), stop early to avoid overfitting and save computation.
How it works: BCN tracks the relative improvement in the residual norm and stops if progress is too slow:
if (relative_decrease_in_residuals < tol):
stop training
Important clarification: Early stopping is based on improvement rate, not absolute residual magnitude. This means BCN can stop even when residuals are still large (on a hard problem) if adding more weak learners doesn’t help.
Trade-offs:
tol = 0 (no early stopping):
- Always trains for exactly B iterations
- May overfit if B is too large
- ✅ Use for: Quick experiments with small B
tol = 1e-7 to 1e-5:
- Stops when improvement becomes negligible
- Prevents overfitting
- Can save significant computation
- ✅ Use for: Production models with large B
Practical tip: Set B large (e.g., 500-1000) and tol small (e.g., 1e-7) to let the algorithm decide when to stop. The actual number of iterations used will be stored in fit$maxL.
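For example (a sketch following the tip above, assuming the bcn package is loaded and X_train/y_train are defined as in the recipes below):
fit <- bcn(x = X_train, y = y_train, B = 1000, tol = 1e-7)  # large budget, let tol decide
fit$maxL                                                    # number of iterations actually used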
Hyperparameter 8: type_optim (Optimization Method)
Default: “nlminb” (gradient-based)
Options: “nlminb”, “adam”, “sgd”, “nmkb”, “hjkb”, “randomsearch”
What it controls: How to solve the optimization problem at each iteration
Intuition: Finding the best weights w_L is a non-convex optimization problem. Different solvers have different trade-offs.
Available optimizers:
nlminb (default):
- Uses gradient and Hessian approximations
- ✅ Robust
- ✅ Well-tested in R
- ✅ Works well in most cases
- ⚠️ Medium speed
- ✅ Use for: General purpose, production
adam / sgd:
- Gradient-based optimizers from deep learning
- ✅ Fast, especially for high-dimensional problems
- ✅ Good for large d (many features)
- ⚠️ May need tuning (learning rate, iterations)
- ✅ Use for: d > 100, speed-critical applications
nmkb / hjkb:
- Derivative-free Nelder-Mead / Hooke-Jeeves
- ✅ Very robust (no gradient needed)
- ❌ Slow
- ✅ Use when: Other optimizers fail or diverge
randomsearch:
- Random sampling + local search
- ✅ Can escape local minima
- ❌ Slower
- ✅ Use when: Problem is very non-convex
Rule of thumb:
- Start with "nlminb"
- If training is slow and d > 100, try "adam"
- Additional arguments (e.g., maximum iterations, tolerance) can be passed via ...
Important insight: Because BCN uses an ensemble, local optima are OK! Even if we don’t find the globally optimal w_L, the next iteration can compensate. This is why BCN is robust despite non-convex optimization at each step.
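Switching solvers is just a matter of changing type_optim (a sketch, reusing the data from the recipes below):
fit_default <- bcn(x = X_train, y = y_train, type_optim = "nlminb")  # robust default
fit_fast    <- bcn(x = X_train, y = y_train, type_optim = "adam")    # often faster when d > 100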
Hyperparameter 9: hidden_layer_bias (Include Bias Term)
Default: TRUE
Options: TRUE, FALSE
What it controls: Whether neural networks have a bias/intercept term
Intuition: Without bias, h_L = activation(w^T x). With bias, h_L = activation(w^T x + b).
Trade-offs:
hidden_layer_bias = FALSE:
- Simpler optimization (one less parameter per iteration)
- Faster training
- Assumes data is centered
- ✅ Use when: Features are already centered, want pure multiplicative effects
hidden_layer_bias = TRUE:
- More expressive (can handle shifts)
- Can handle non-centered data better
- One additional parameter to optimize per iteration
- ✅ Recommended default - safer choice
Typical choice: Use TRUE unless you have a specific reason not to (e.g., theoretical interest in purely multiplicative models).
Hyperparameter 10: n_clusters (Optional Clustering Features)
Default: NULL (no clustering)
Typical range: 2-10
What it controls: Whether to add cluster membership features
Intuition: BCN can automatically perform k-means clustering on your inputs and add cluster memberships as additional features. This can help capture local patterns.
When to use:
- ✅ Data has natural groupings or modes
- ✅ Local patterns differ across regions of feature space
- ❌ Not needed for most standard regression/classification
Note: This is an advanced feature - start without it and add only if needed.
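If you do want to try it, it is a single argument (sketch; X_train/y_train as in the recipes below):
# Add k-means cluster memberships (here 3 clusters) as extra input features
fit_clust <- bcn(x = X_train, y = y_train, B = 100, n_clusters = 3)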
Putting It All Together: Hyperparameter Recipes
Recipe 1: Fast Prototyping (Small Dataset, N < 1000)
fit <- bcn(
x = X_train,
y = y_train,
B = 50, # Few iterations for speed
nu = 0.5, # Moderate learning rate
col_sample = 1.0, # Use all features (dataset is small)
lam = 10^0.5, # ~3.16, moderate regularization
r = 0.9, # Adaptive threshold
tol = 1e-5, # Early stopping
activation = "tanh",
type_optim = "nlminb",
hidden_layer_bias = TRUE
)
Why these choices:
- Small B for speed
- High nu for faster convergence
- No column sampling (dataset is small)
- Standard other parameters
Expected performance: Quick baseline in minutes
Recipe 2: Production Model (Medium Dataset, N ~ 10,000)
fit <- bcn(
x = X_train,
y = y_train,
B = 200, # Enough iterations with early stopping
nu = 0.3, # Conservative for stability
col_sample = 0.6, # Some regularization
lam = 10^0.8, # ~6.31, allow some complexity
r = 0.95, # Very selective early on
tol = 1e-7, # Train until converged
activation = "tanh",
type_optim = "nlminb",
hidden_layer_bias = TRUE
)
Why these choices:
- Moderate B with early stopping safety
- Conservative nu for stability
- Column sampling for regularization
- High r for careful feature selection
Expected performance: Robust model, may train 100-150 iterations before stopping
Recipe 3: Complex Task (Large Dataset, High-Dimensional)
fit <- bcn(
x = X_train,
y = y_train,
B = 500, # Many iterations (will stop early if needed)
nu = 0.4, # Balanced
col_sample = 0.5, # Strong regularization for high d
lam = 10^1.0, # 10, higher complexity allowed
r = 0.95, # Adaptive
tol = 1e-7, # Early stopping safety
activation = "tanh",
type_optim = "adam", # Fast optimizer for high d
hidden_layer_bias = TRUE
)
Why these choices:
- Large B to capture complexity
- Column sampling crucial for high dimensions (d > 100)
- Adam optimizer for speed with many features
- High r to prevent early overfitting
Expected performance: May use 200-400 iterations, handles d > 500 well
Recipe 4: Multivariate Time Series / Multi-Output
fit <- bcn(
x = X_train,
y = Y_train, # Matrix with multiple outputs (e.g., N x m)
B = 300,
nu = 0.5, # Can be higher for shared structure
col_sample = 0.7,
lam = 10^0.7,
r = 0.95, # Critical: enforces shared structure
tol = 1e-7,
activation = "tanh",
type_optim = "nlminb"
)
Why these choices:
- High r is critical: In multivariate mode, BCN computes ξ_k for each output k and requires min_k(ξ_k) ≥ 0 for acceptance. This ensures each weak learner contributes meaningfully across all time series/outputs, creating shared representations.
- Higher nu because shared structure is more stable
- Standard B with early stopping
Note on multivariate: BCN handles multiple outputs naturally through one-hot encoding (classification) or matrix targets (regression). The min(ξ) criterion prevents sacrificing one output to improve another.
Expected performance: Strong on related time series or multi-task learning
Understanding Hyperparameter Interactions
Interaction 1: nu × B ≈ Constant
Trade-off: Small nu needs large B
# Approximately equivalent final predictions:
fit1 <- bcn(B = 100, nu = 0.5)
fit2 <- bcn(B = 200, nu = 0.25)
Why: Smaller steps need more iterations to reach similar places.
Rule: For similar model complexity, keep nu × B roughly constant.
In practice:
- Production (stability priority): nu = 0.3, B = 300
- Prototyping (speed priority): nu = 0.5, B = 100
Interaction 2: lam ↔ r (Complexity Control)
Both control complexity:
- lam: How complex each weak learner can be
- r: How selective we are about accepting weak learners
# More regularization
fit_reg <- bcn(lam = 1.0, r = 0.95) # Simple features, selective
# Less regularization
fit_complex <- bcn(lam = 10.0, r = 0.7) # Complex features, permissive
Balance principle: If you allow complex features (high lam), be selective (high r) to avoid noise.
Typical combinations:
- High quality: lam = 10, r = 0.95 → Complex but carefully selected features
- Moderate: lam = 5, r = 0.90 → Balanced
- Fast/loose: lam = 3, r = 0.80 → Simple features, permissive
Interaction 3: col_sample ↔ B (Coverage)
Column sampling as implicit regularization:
# Fewer features per iteration → need more iterations for coverage
fit1 <- bcn(col_sample = 1.0, B = 100)
fit2 <- bcn(col_sample = 0.5, B = 200)
Rough guideline:
B_needed ≈ B_baseline / col_sample
In practice:
- col_sample = 1.0 → B = 100-200
- col_sample = 0.5 → B = 200-400
- col_sample = 0.3 → B = 300-500
The Mathematical Connection: How Hyperparameters Appear in ξ
The core optimization criterion ties everything together:
ξ_L = ν(2-ν) · β²_L - [1 - r - (1-r)/(L+1)] · ||residuals||²
Here ν(2-ν) is controlled by nu, β_L comes from weights optimized over w ∈ [-lam, lam], and the bracketed penalty is controlled by r.
Reading the formula:
- Find w_L (constrained by lam) that maximizes β²_L
- β_L is the OLS coefficient: β_L = (h_L^T · residuals) / ||h_L||²
- Scale by ν(2-ν) (controlled by nu)
- Subtract the penalty (controlled by r)
- Accept only if ξ ≥ 0 (for all outputs, in the multivariate case)
- Repeat for B iterations (or until tol is reached)
- At each step, sample a col_sample fraction of features
This unified view shows how all hyperparameters work together to control the greedy feature selection process.
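As a standalone illustration (not the package's internals), the acceptance check for one candidate feature h can be written directly from the formula above:
# xi criterion for a candidate feature h against the current residuals (illustrative sketch)
xi_check <- function(h, resid, nu, r, L) {
  beta <- sum(h * resid) / sum(h^2)                      # OLS coefficient of the residuals on h
  penalty <- (1 - r - (1 - r) / (L + 1)) * sum(resid^2)  # adaptive threshold term
  xi <- nu * (2 - nu) * beta^2 - penalty
  list(beta = beta, xi = xi, accept = xi >= 0)           # accept the weak learner if xi >= 0
}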
Practical Tips for Hyperparameter Tuning
Start Simple, Add Complexity
- Begin with defaults: fit <- bcn(x, y, B = 100, nu = 0.3, lam = 10^0.7, r = 0.9)
- If underfitting (train error too high):
- ↑ Increase B (more capacity)
- ↑ Increase lam (allow more complex features)
- ↑ Increase nu (use features more aggressively)
- ↓ Decrease r (be less selective)
- If overfitting (train « test error):
- ↓ Decrease nu (smaller, more careful steps)
- ↓ Decrease lam (simpler features)
- Add column sampling (col_sample = 0.5-0.7)
- ↑ Increase r (be more selective)
- Use early stopping (tol = 1e-7)
Use Cross-Validation Wisely
Most important to tune: B, nu, lam
Moderately important: r, col_sample
Usually fixed: hidden_layer_bias = TRUE, type_optim = "nlminb", activation = "tanh"
Example CV strategy:
library(caret)
# Grid search on log-scale for lam
grid <- expand.grid(
B = c(100, 200, 500),
nu = c(0.1, 0.3, 0.5),
lam = 10^seq(0, 1.5, by = 0.5) # 1, 3.16, 10, 31.6
)
# Use caret, mlr3, or tidymodels for CV
Monitor Training
# Enable verbose output
fit <- bcn(x, y, verbose = 1, show_progress = TRUE)
Watch for:
- How fast ||residuals||_F decreases (convergence rate)
- Whether ξ stays positive (quality of the weak learners)
- Whether training stops early, and at what iteration (capacity needs)
Diagnostic patterns:
- Residuals plateau early → Increase B or lam
- ξ often negative → Decrease r or increase lam
- Training very slow → Try adam optimizer or increase col_sample
Quick Reference: Hyperparameter Cheat Sheet
| Hyperparameter | Low Value Effect | High Value Effect | Typical Range | Default |
|---|---|---|---|---|
| B | Simple, fast, may underfit | Complex, slow, may overfit | 50-1000 | 100 |
| nu | Stable, slow convergence | Fast, potentially unstable | 0.1-0.8 | 0.1 |
| lam | Linear-ish, simple features | Nonlinear, complex features | 1-100 | 0.1 |
| r | Permissive, accepts more | Selective, high quality | 0.3-0.99 | 0.3 |
| col_sample | No regularization | Strong regularization | 0.3-1.0 | 1.0 |
| tol | No early stop | Aggressive early stop | 0-1e-3 | 0 |
| activation | tanh (symmetric) | sigmoid (asymmetric) | - | tanh |
| type_optim | nlminb (robust) | adam (fast) | - | nlminb |
| hidden_layer_bias | Simpler, through origin | More flexible | - | TRUE |
When NOT to Use BCN
While BCN is versatile, it’s not always the best choice:
❌ Ultra-high-dimensional sparse data (d > 10,000)
- Tree-based boosting (XGBoost/LightGBM) may be faster
- Column sampling helps, but trees handle sparsity natively
❌ Very large datasets (N > 1,000,000)
- Training time scales roughly O(B × N × d)
- Consider subsampling or streaming methods
❌ Deep sequential/temporal structure
- BCN is static (no recurrence)
- Use RNNs/Transformers for complex time dependencies
❌ Image/text/audio from scratch
- Convolutional/attention architectures more suitable
- BCN works on extracted features (embeddings, tabular)
✅ BCN shines at:
- Tabular data (100s to 10,000s of rows)
- Multivariate prediction (shared structure across outputs)
- Needing both accuracy AND interpretability
- Time series with extracted features
- When XGBoost works but you want gradient-based explanations
Debugging BCN Training
| Symptom | Likely Cause | Fix |
|---|---|---|
| ξ frequently negative early | r too high or lam too low | Decrease r to 0.8 or increase lam to 5-10 |
| Residuals plateau quickly | nu too small or B too low | Increase nu to 0.4-0.5 or B to 300+ |
| Training very slow | col_sample=1 on wide data | Set col_sample=0.5 and try type_optim="adam" |
| High train accuracy, poor test | Overfitting | Decrease nu, increase r, add col_sample < 1 |
| Poor train accuracy | Underfitting | Increase B, increase lam, try different activation |
| Optimizer not converging | Bad initialization or scaling | Check feature scaling, try different type_optim |
Interpretability Example
One of BCN’s unique advantages is gradient-based interpretability.
What makes this special:
- ✅ Exact analytic gradients (no approximation)
- ✅ Same O(B × m × d) cost as prediction
- ✅ Shows direction of influence (positive/negative)
- ✅ Works for both regression and classification
- ✅ Much faster than SHAP on tree ensembles
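The package's analytic gradients are the right tool here. If you only want a quick, model-agnostic sensitivity check, a finite-difference approximation against any prediction function gives a rough picture (illustrative sketch; the newx argument name in the usage line is an assumption, check ?predict.bcn):
# Average finite-difference sensitivity of predictions to each feature
# (rough, model-agnostic check; analytic gradients are exact and faster)
numerical_sensitivities <- function(predict_fn, X, eps = 1e-4) {
  sapply(seq_len(ncol(X)), function(j) {
    X_up <- X; X_up[, j] <- X_up[, j] + eps
    X_dn <- X; X_dn[, j] <- X_dn[, j] - eps
    mean((predict_fn(X_up) - predict_fn(X_dn)) / (2 * eps))
  })
}
# Usage sketch: sens <- numerical_sensitivities(function(X) predict(fit, newx = X), X_train)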
Conclusion: The Philosophy of BCN
BCN’s hyperparameters reveal its design philosophy:
1. Iterative Refinement (via B)
Build the model piece by piece, adding one well-chosen feature at a time.
2. Conservative Steps (via nu)
Don’t trust any single feature too much - combine many weak learners.
3. Bounded Complexity (via lam)
Keep individual weak learners simple to ensure stability and interpretability.
4. Adaptive Selection (via r)
Start picky (prevent early mistakes), become permissive (allow refinement).
5. Randomization (via col_sample)
Like Random Forests, diversity through randomness helps generalization.
6. Early Stopping (via tol)
Know when to stop - more iterations aren’t always better.
7. Explicit Optimization for Interpretability
Unlike methods that require post-hoc explanations, BCN is designed with interpretability in mind through its additive structure and differentiable components.
Together, these create a model that’s:
- ✅ Expressive (neural network features capture non-linearity)
- ✅ Interpretable (additive structure + gradients)
- ✅ Robust (ensemble of bounded weak learners)
- ✅ Efficient (sparse structure, early stopping, column sampling)
Next Steps
To learn more: see the links at the top of this post (the original BCN blog post, the paper, the R package vignette, and the Python port).
To contribute: BCN is open source! Contributions welcome for:
- New activation functions
- Additional optimization methods
- Interpretability visualizations
- Benchmark studies and applications
For attribution, please cite this work as:
T. Moudiki (2026-02-16). Understanding Boosted Configuration Networks (combined neural networks and boosting): An Intuitive Guide Through Their Hyperparameters. Retrieved from https://thierrymoudiki.github.io/blog/2026/02/16/r/python/bcn-explained
BibTeX citation (remove empty spaces)
@misc{ tmoudiki20260216,
author = { T. Moudiki },
title = { Understanding Boosted Configuration Networks (combined neural networks and boosting): An Intuitive Guide Through Their Hyperparameters },
url = { https://thierrymoudiki.github.io/blog/2026/02/16/r/python/bcn-explained },
year = { 2026 } }
