Cross-validation
Simulation Techniques, Statistical Analysis Techniques
Cross-validation techniques can be used to test the predictive performance of models. The techniques can also help prevent a model being overfitted. Cross-validation involves repeatedly fitting a model to subsets of the data (known as training sets), and then using the rest of the data (known as validation sets) to test the performance of that model.
There are several ways of performing a cross-validation analysis, including the following:
Repeated random sub-sampling validation
This method involves the following steps:
- Randomly assign each observation into one of two groups: training and validation.
- Fit the model to the observations in the training set.
- Use the observations from the validation set to test the model’s performance. Store this information.
- Repeat steps 1 to 3 many times.
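As an illustration, the following Python sketch implements these steps with NumPy and scikit-learn. The toy data, the linear-regression model, the 30% validation split, and the 50 repeats are all assumptions made for the example rather than part of this entry.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy data (illustrative assumption): one predictor with a linear signal plus noise.
X = rng.uniform(0, 10, size=(100, 1))
y = 2.5 * X[:, 0] + rng.normal(scale=2.0, size=100)

scores = []
for _ in range(50):  # step 4: repeat steps 1 to 3 many times
    # Step 1: randomly assign each observation to a training or validation group.
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3)
    # Step 2: fit the model to the observations in the training set.
    model = LinearRegression().fit(X_train, y_train)
    # Step 3: test performance on the validation set and store the result.
    scores.append(mean_squared_error(y_val, model.predict(X_val)))

print(f"mean validation MSE over repeats: {np.mean(scores):.3f}")
```

Averaging the stored scores across the repeats gives a more stable estimate of predictive performance than any single random split.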
K-fold cross-validation
This method involves the following steps:
- Randomly partition the observations into K groups of equal size.
- For a group of observations, fit the model using all observations except that group.
- Use that group’s observations to test the model’s predictive performance. Store this information.
- Repeat steps 2 and 3 for the other groups.
When the number of folds (K) equals the number of observations in the data set, it is known as leave-one-out cross-validation.
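A corresponding Python sketch using scikit-learn’s KFold splitter is shown below; the toy data, the linear-regression model, and the choice of K = 5 are illustrative assumptions. Setting n_splits equal to the number of observations would give the leave-one-out case mentioned above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)

# Toy data (illustrative assumption): one predictor with a linear signal plus noise.
X = rng.uniform(0, 10, size=(100, 1))
y = 2.5 * X[:, 0] + rng.normal(scale=2.0, size=100)

# Step 1: randomly partition the observations into K groups ("folds").
kfold = KFold(n_splits=5, shuffle=True, random_state=1)

scores = []
for train_idx, test_idx in kfold.split(X):  # step 4: repeat for every fold
    # Step 2: fit the model using all observations except the held-out group.
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    # Step 3: test on the held-out group and store the result.
    scores.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))

print(f"mean MSE across {kfold.get_n_splits()} folds: {np.mean(scores):.3f}")
# n_splits=len(X) would make this leave-one-out cross-validation.
```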
K × 2 cross-validation
This method involves the following steps:
- Randomly partition the observations into two groups of equal size.
- Use one group to fit the model and the other group to test the model’s performance. Store this information.
- Repeat step 2 but with the two groups switched around.
- Repeat steps 1 to 3 several times.
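The sketch below implements this procedure directly in Python with NumPy and scikit-learn; the toy data, the linear-regression model, and the choice of K = 5 repetitions are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)

# Toy data (illustrative assumption): one predictor with a linear signal plus noise.
X = rng.uniform(0, 10, size=(100, 1))
y = 2.5 * X[:, 0] + rng.normal(scale=2.0, size=100)

K = 5  # number of repetitions (step 4), an assumed value
scores = []
for _ in range(K):
    # Step 1: randomly partition the observations into two equal halves.
    idx = rng.permutation(len(X))
    half_a, half_b = idx[: len(X) // 2], idx[len(X) // 2 :]
    # Steps 2 and 3: fit on one half, test on the other, then switch the halves.
    for train_idx, test_idx in ((half_a, half_b), (half_b, half_a)):
        model = LinearRegression().fit(X[train_idx], y[train_idx])
        scores.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))

print(f"mean MSE over {K} x 2 splits: {np.mean(scores):.3f}")
```

Each repetition yields two stored scores, one per half, so K repetitions produce 2K performance estimates in total.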
See also:
Monte Carlo Methods
Bootstrapping