Today, give the Techtonique web app a try: a tool designed to help you make informed, data-driven decisions using Mathematics, Statistics, Machine Learning, and Data Visualization.
I had to rename my R package crossval (generic functions for cross-validation) to crossvalidation, because its name clashed with an existing CRAN package named crossval.
Here is how to install crossvalidation:
options(repos = c(
techtonique = 'https://techtonique.r-universe.dev',
CRAN = 'https://cloud.r-project.org'))
install.packages("crossvalidation")
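As a quick, optional sanity check, you can load the package and print the installed version:
# check that the package loads and display its version
library(crossvalidation)
packageVersion("crossvalidation")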
What is the R-universe mentioned in the previous code snippet? It is, IMHO, a quite promising CRAN-like repository for storing, sharing and building R packages (for Linux, macOS and Windows). If you want to create your own repository on R-universe, read this.
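Because an R-universe repository exposes a standard CRAN-like package index, you can also browse its contents from base R; for instance (taking the techtonique repository URL used above):
# list the packages served by the techtonique universe (standard CRAN-like index)
pkgs <- available.packages(repos = "https://techtonique.r-universe.dev")
head(pkgs[, c("Package", "Version")])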
I’ve been looking for such an infrastructure for some time, and tried miniCRAN in particular. Unfortunately, with miniCRAN (which works pretty well for CRAN packages), I haven’t been able, so far, to upload/build local packages, local meaning non-CRAN packages. Maybe I missed something about miniCRAN’s use, so if you know how to do that, please reach out to me (even though I’ll continue to follow R-universe’s development)!
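For context, here is a minimal sketch of the part of the miniCRAN workflow that does work well, i.e. mirroring CRAN packages locally; the package names and local path are purely illustrative:
library(miniCRAN)

# CRAN packages to mirror, along with their dependencies
pkgs <- pkgDep("randomForest", repos = "https://cloud.r-project.org",
               suggests = FALSE)

# create a local CRAN-like repository holding their source versions
local_repo <- "path/to/local/repo"  # illustrative path
dir.create(local_repo, recursive = TRUE, showWarnings = FALSE)
makeRepo(pkgs, path = local_repo,
         repos = "https://cloud.r-project.org", type = "source")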
Examples of use of crossvalidation for regression and univariate time series can be found through the following links (since those posts were written before the rename, you must replace occurrences of crossval with crossvalidation); a short regression sketch also follows the list:
- Grid search cross-validation using crossval
- Linear model, xgboost and randomForest cross-validation using crossval::crossval_ml
- Custom errors for cross-validation using crossval::crossval_ml
- Time series cross-validation using crossval
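For a flavor of the regression case covered in those posts, here is a minimal sketch mirroring the classification example presented below; it assumes (an assumption on my part, not shown in this post) that crossvalidation::crossval_ml handles a numeric response analogously:
# regression sketch (illustrative): 5-fold CV repeated 3 times on mtcars
library(crossvalidation)
library(randomForest)

# numeric response and explanatory variables
y <- mtcars$mpg
X <- as.matrix(mtcars[, c("cyl", "disp", "hp", "wt")])

crossvalidation::crossval_ml(x = X, y = y, k = 5, repeats = 3,
                             fit_func = randomForest::randomForest,
                             predict_func = predict,
                             fit_params = list(mtry = 2),
                             packages = "randomForest")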
For classification, an example is presented below.
Example of use of crossvalidation for classification
# Import libraries
library(crossvalidation)
library(randomForest)
# Input data
# Transforming model response into a factor
y <- as.factor(as.numeric(iris$Species))
# Explanatory variables
X <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")])
# 5-fold cross-validation repeated 3 times
# default error metric, when y is a factor: accuracy
crossvalidation::crossval_ml(x = X, y = y, k = 5, repeats = 3,
fit_func = randomForest::randomForest,
predict_func = predict,
fit_params = list(mtry = 2),
packages = "randomForest")
## $folds
## repeat_1 repeat_2 repeat_3
## fold_1 0.9666667 0.9666667 1.0000000
## fold_2 0.9666667 0.9000000 0.9333333
## fold_3 1.0000000 0.9666667 0.9333333
## fold_4 0.9333333 1.0000000 0.9333333
## fold_5 0.9333333 0.9333333 0.9666667
##
## $mean
## [1] 0.9555556
##
## $sd
## [1] 0.02999118
##
## $median
## [1] 0.9666667
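Since the result is a list with $folds, $mean, $sd and $median elements (as shown above), you can also store it and summarize it yourself; for example, a rough mean ± 2 standard deviations band for the accuracy (a quick spread summary, not a formal confidence interval):
# store the result and derive a rough accuracy band from its elements
res <- crossvalidation::crossval_ml(x = X, y = y, k = 5, repeats = 3,
                                    fit_func = randomForest::randomForest,
                                    predict_func = predict,
                                    fit_params = list(mtry = 2),
                                    packages = "randomForest")

c(lower = res$mean - 2 * res$sd,
  mean = res$mean,
  upper = res$mean + 2 * res$sd)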
# We can specify custom error metrics for crossvalidation::crossval_ml
# here, the error rate
eval_metric <- function(preds, actual)
{
  stopifnot(length(preds) == length(actual))
  res <- 1 - mean(preds == actual)
  names(res) <- "error rate"
  return(res)
}
# specify `eval_metric` argument for measuring the error rate
# instead of the (default) accuracy
crossvalidation::crossval_ml(x = X, y = y, k = 5, repeats = 3,
fit_func = randomForest::randomForest,
predict_func = predict,
fit_params = list(mtry = 2),
packages = "randomForest",
eval_metric = eval_metric)
## $folds
## repeat_1 repeat_2 repeat_3
## fold_1 0.03333333 0.03333333 0.00000000
## fold_2 0.03333333 0.10000000 0.06666667
## fold_3 0.00000000 0.03333333 0.06666667
## fold_4 0.06666667 0.00000000 0.06666667
## fold_5 0.06666667 0.06666667 0.03333333
##
## $mean
## [1] 0.04444444
##
## $sd
## [1] 0.02999118
##
## $median
## [1] 0.03333333
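Along the same lines, any function taking preds and actual and returning a named numeric value can (presumably) be plugged in through eval_metric; here is an illustrative balanced-accuracy metric, not part of the package:
# another custom metric (illustrative): balanced accuracy,
# i.e. the average of per-class recalls; assumes preds and actual are
# factors sharing the same levels, so the confusion matrix is square
balanced_accuracy <- function(preds, actual)
{
  stopifnot(length(preds) == length(actual))
  cm <- table(actual, preds)
  res <- mean(diag(cm) / rowSums(cm))
  names(res) <- "balanced accuracy"
  return(res)
}

crossvalidation::crossval_ml(x = X, y = y, k = 5, repeats = 3,
                             fit_func = randomForest::randomForest,
                             predict_func = predict,
                             fit_params = list(mtry = 2),
                             packages = "randomForest",
                             eval_metric = balanced_accuracy)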