I had to rename my R package crossval – generic functions for cross-validation – to crossvalidation, because its name clashed with that of an existing CRAN package, also named crossval. Here is how to install crossvalidation:

options(repos = c(
  techtonique = 'https://techtonique.r-universe.dev',
  CRAN = 'https://cloud.r-project.org'))

install.packages("crossvalidation")

What is the R-universe mentioned in the previous code snippet? It is, IMHO, a quite promising CRAN-like repository for storing, sharing and building R packages (for Linux, macOS and Windows). If you want to create your own repository on R-universe, read this.

I’ve been looking for such an infrastructure for some time, and tried miniCRAN in particular. Unfortunately, with miniCRAN (which works pretty well for CRAN packages), I haven’t been able, so far, to upload/build local packages – local meaning non-CRAN packages. Maybe I missed something about how miniCRAN is meant to be used, so if you know how to do that, please reach out to me (even though I’ll continue to follow R-universe’s development)!

Examples of use of crossvalidation for regression and univariate time series can be found through the following links (they were written for the old package name, so you must replace occurrences of crossval with crossvalidation):

For classification, an example is presented below.

Example of use of crossvalidation for classification

# Import libraries

library(crossvalidation)
library(randomForest)
# Input data 

# Transforming model response into a factor
y <- as.factor(as.numeric(iris$Species))

# Explanatory variables 
X <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")])
# 5-fold cross-validation repeated 3 times

# default error metric, when y is a factor: accuracy
crossvalidation::crossval_ml(x = X, y = y, k = 5, repeats = 3,
                             fit_func = randomForest::randomForest, 
                             predict_func = predict,
                             fit_params = list(mtry = 2),
                             packages = "randomForest")

## $folds
##         repeat_1  repeat_2  repeat_3
## fold_1 0.9666667 0.9666667 1.0000000
## fold_2 0.9666667 0.9000000 0.9333333
## fold_3 1.0000000 0.9666667 0.9333333
## fold_4 0.9333333 1.0000000 0.9333333
## fold_5 0.9333333 0.9333333 0.9666667
## 
## $mean
## [1] 0.9555556
## 
## $sd
## [1] 0.02999118
## 
## $median
## [1] 0.9666667
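
To make the shape of this output more concrete, here is a minimal base-R sketch of 5-fold cross-validation repeated 3 times. It is not how crossvalidation::crossval_ml is implemented: the fold assignment is an assumption, and the nearest-centroid classifier (`fit_centroids`, `predict_centroids`) is a hypothetical stand-in for randomForest, chosen only so the snippet runs without extra packages. It only illustrates how a folds-by-repeats accuracy matrix can be built and then summarized over all 15 entries, which is consistent with the `$mean` printed above.

```r
# Sketch only: base R, no crossvalidation or randomForest required
set.seed(123)

X <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")])
y <- iris$Species
k <- 5
repeats <- 3

# Hypothetical stand-in model: one centroid (vector of column means) per class
fit_centroids <- function(x, y) {
  t(sapply(levels(y), function(cl) colMeans(x[y == cl, , drop = FALSE])))
}

# Predict the class whose centroid is closest in squared Euclidean distance
predict_centroids <- function(centroids, newx) {
  d <- sapply(seq_len(nrow(centroids)), function(j) {
    rowSums(sweep(newx, 2, centroids[j, ])^2)
  })
  factor(rownames(centroids)[max.col(-d, ties.method = "first")],
         levels = rownames(centroids))
}

# One accuracy per (fold, repeat) pair
acc <- matrix(NA_real_, nrow = k, ncol = repeats,
              dimnames = list(paste0("fold_", seq_len(k)),
                              paste0("repeat_", seq_len(repeats))))

for (r in seq_len(repeats)) {
  # Assumed scheme: a fresh random assignment of rows to k equal-sized folds
  fold_id <- sample(rep(seq_len(k), length.out = nrow(X)))
  for (f in seq_len(k)) {
    test <- fold_id == f
    centroids <- fit_centroids(X[!test, , drop = FALSE], y[!test])
    preds <- predict_centroids(centroids, X[test, , drop = FALSE])
    acc[f, r] <- mean(preds == y[test])
  }
}

# Summaries taken over all k * repeats entries
res <- list(folds = acc, mean = mean(acc), sd = sd(acc), median = median(acc))
print(res)
```

With 150 observations and k = 5, each test fold holds 30 rows, which is why every accuracy in the output above is a multiple of 1/30.
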
# We can specify custom error metrics for crossvalidation::crossval_ml
# here, the error rate 

eval_metric <- function(preds, actual)
{
  stopifnot(length(preds) == length(actual))
  res <- 1 - mean(preds == actual)
  names(res) <- "error rate"
  return(res)
}

# specify `eval_metric` argument for measuring the error rate
# instead of the (default) accuracy 
crossvalidation::crossval_ml(x = X, y = y, k = 5, repeats = 3,
                             fit_func = randomForest::randomForest, 
                             predict_func = predict,
                             fit_params = list(mtry = 2),
                             packages = "randomForest",
                             eval_metric = eval_metric)
## $folds
##          repeat_1   repeat_2   repeat_3
## fold_1 0.03333333 0.03333333 0.00000000
## fold_2 0.03333333 0.10000000 0.06666667
## fold_3 0.00000000 0.03333333 0.06666667
## fold_4 0.06666667 0.00000000 0.06666667
## fold_5 0.06666667 0.06666667 0.03333333
## 
## $mean
## [1] 0.04444444
## 
## $sd
## [1] 0.02999118
## 
## $median
## [1] 0.03333333
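
Before handing a custom metric to crossvalidation::crossval_ml, it can be checked by hand on a few values. The sketch below uses base R only and repeats the eval_metric definition from above; the toy vectors `actual` and `preds` are made up for illustration. It also shows that the error rate is one minus the accuracy, which matches the two outputs in this post: the means add up to 1 (0.9555556 + 0.04444444) and the standard deviations are identical.

```r
# Same metric as above: share of misclassified observations
eval_metric <- function(preds, actual) {
  stopifnot(length(preds) == length(actual))
  res <- 1 - mean(preds == actual)
  names(res) <- "error rate"
  return(res)
}

# Made-up labels for illustration: positions 3 and 5 are misclassified
actual <- factor(c("a", "b", "a", "c", "b"), levels = c("a", "b", "c"))
preds  <- factor(c("a", "b", "c", "c", "a"), levels = c("a", "b", "c"))

err <- eval_metric(preds, actual)   # 2 errors out of 5 observations
acc_toy <- mean(preds == actual)    # accuracy on the same toy data

print(err)
print(acc_toy)

# Error rate and accuracy are complementary
stopifnot(isTRUE(all.equal(unname(err) + acc_toy, 1)))
```
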
