Cross-validation
Simulation Techniques, Statistical Analysis Techniques
Cross-validation techniques can be used to test the predictive performance of models. The techniques can also help prevent a model being overfitted. Cross-validation involves repeatedly fitting a model to subsets of the data (known as training sets), and then using the rest of the data (known as validation sets) to test the performance of that model.
There are several ways of performing a cross-validation analysis, including the following:
Repeated random sub-sampling validation
This method involves the following steps:
- Randomly assign each observation into one of two groups: training and validation.
- Fit the model to the observations in the training set.
- Use the observations from the validation set to test the model’s performance. Store this information.
- Repeat steps 1 to 3 many times.
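As an illustration, the following Python sketch implements these steps with NumPy and scikit-learn. The toy data, the linear-regression model, the 30% validation split, and the 50 repeats are all assumptions made for the example rather than part of this entry.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy data (illustrative assumption): one predictor with a linear signal plus noise.
X = rng.uniform(0, 10, size=(100, 1))
y = 2.5 * X[:, 0] + rng.normal(scale=2.0, size=100)

scores = []
for _ in range(50):  # step 4: repeat steps 1 to 3 many times
    # Step 1: randomly assign each observation to a training or validation group.
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3)
    # Step 2: fit the model to the observations in the training set.
    model = LinearRegression().fit(X_train, y_train)
    # Step 3: test performance on the validation set and store the result.
    scores.append(mean_squared_error(y_val, model.predict(X_val)))

print(f"mean validation MSE over repeats: {np.mean(scores):.3f}")
```

Averaging the stored scores across the repeats gives a more stable estimate of predictive performance than any single random split.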
K-fold cross-validation
This method involves the following steps:
- Randomly partition the observations into K groups of equal size.
- For a group of observations, fit the model using all observations except that group.
- Use that group’s observations to test the model’s predictive performance. Store this information.
- Repeat steps 2 and 3 for the other groups.
When the number of folds (K) equals the number of observations in the data set, it is known as leave-one-out cross-validation.
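A corresponding Python sketch using scikit-learn’s KFold splitter is shown below; the toy data, the linear-regression model, and the choice of K = 5 are illustrative assumptions. Setting n_splits equal to the number of observations would give the leave-one-out case mentioned above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)

# Toy data (illustrative assumption): one predictor with a linear signal plus noise.
X = rng.uniform(0, 10, size=(100, 1))
y = 2.5 * X[:, 0] + rng.normal(scale=2.0, size=100)

# Step 1: randomly partition the observations into K groups ("folds").
kfold = KFold(n_splits=5, shuffle=True, random_state=1)

scores = []
for train_idx, test_idx in kfold.split(X):  # step 4: repeat for every fold
    # Step 2: fit the model using all observations except the held-out group.
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    # Step 3: test on the held-out group and store the result.
    scores.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))

print(f"mean MSE across {kfold.get_n_splits()} folds: {np.mean(scores):.3f}")
# n_splits=len(X) would make this leave-one-out cross-validation.
```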
K × 2 cross-validation
This method involves the following steps:
- Randomly partition the observations into two groups of equal size.
- Use one group to fit the model and the other group to test the model’s performance. Store this information.
- Repeat step 2 but with the two groups switched around.
- Repeat steps 1 to 3 several times.
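The sketch below implements this procedure directly in Python with NumPy and scikit-learn; the toy data, the linear-regression model, and the choice of K = 5 repetitions are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)

# Toy data (illustrative assumption): one predictor with a linear signal plus noise.
X = rng.uniform(0, 10, size=(100, 1))
y = 2.5 * X[:, 0] + rng.normal(scale=2.0, size=100)

K = 5  # number of repetitions (step 4), an assumed value
scores = []
for _ in range(K):
    # Step 1: randomly partition the observations into two equal halves.
    idx = rng.permutation(len(X))
    half_a, half_b = idx[: len(X) // 2], idx[len(X) // 2 :]
    # Steps 2 and 3: fit on one half, test on the other, then switch the halves.
    for train_idx, test_idx in ((half_a, half_b), (half_b, half_a)):
        model = LinearRegression().fit(X[train_idx], y[train_idx])
        scores.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))

print(f"mean MSE over {K} x 2 splits: {np.mean(scores):.3f}")
```

Each repetition yields two stored scores, one per half, so K repetitions produce 2K performance estimates in total.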
See also:
Monte Carlo Methods
Bootstrapping