On the Dangers of Cross-Validation: An Experimental Evaluation

R. Bharat Rao, IKM CKS, Siemens Medical Solutions USA
Glenn Fung, IKM CKS, Siemens Medical Solutions USA
Romer Rosales, IKM CKS, Siemens Medical Solutions USA

Abstract

Cross validation allows models to be tested using the full training set by means of repeated resampling, thus maximizing the total number of points used for testing and potentially helping to protect against overfitting. Improvements in computational power, recent reductions in the (computational) cost of classification algorithms, and the development of closed-form solutions (for performing cross validation in certain classes of learning algorithms) make it possible to test thousands or millions of variants of learning models on the data. Thus, it is now possible to calculate cross validation performance on a much larger number of tuned models than would have been possible otherwise. However, we empirically show how, with such a large number of models, the risk of overfitting increases and the performance estimated by cross validation is no longer an effective estimate of generalization; hence, this paper provides an empirical reminder of the dangers of cross validation. We use a closed-form solution that makes this evaluation possible for the cross validation problem of interest. In addition, through extensive experiments we expose and discuss the effects of the overuse/misuse of cross validation in various aspects, including model selection, feature selection, and data dimensionality. This is illustrated on synthetic, benchmark, and real-world data sets.

1 Introduction

In a general classification problem, the goal is to learn a classifier that performs well on unseen data drawn from the same distribution as the available data¹; in other words, to learn classifiers with good generalization.
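As an illustrative sketch of the abstract's central claim (not code from the paper): when many candidate models are compared on the same small data set, the best observed cross-validation-style score can look impressive even when no model has any real signal. The toy "models" below are pure random guessers on random binary labels, so every individual model has expected accuracy 0.5; the numbers of points and models are arbitrary choices for illustration.

```python
import random

random.seed(0)
n = 30  # a small data set with purely random binary labels (no signal)
labels = [random.randint(0, 1) for _ in range(n)]

def accuracy(predictions, labels):
    # Score a fixed prediction vector against the held-out labels.
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# "Test" 5000 candidate models. Each one is just a random guesser,
# standing in for the thousands of tuned model variants the paper discusses.
best = 0.0
for _ in range(5000):
    preds = [random.randint(0, 1) for _ in range(n)]
    best = max(best, accuracy(preds, labels))

# The best score over many candidates is far above the 0.50 that any
# single random model achieves in expectation: a selection-induced
# (optimistic) bias, not evidence of generalization.
print(f"best accuracy over 5000 random models: {best:.2f}")
```

The same selection effect applies when the score being maximized is a cross-validation estimate: picking the winner among very many models makes the winning estimate optimistically biased.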
One common way to estimate generalization is to measure the performance of the learned classifier on test data that has not been used to train it. When a large test data set cannot be held out or easily acquired, resampling methods, such as cross validation, are commonly used to estimate the generalization error. The resulting estimates of generalization can also be used for model selection: from the various candidate classification algorithms (models), choose the one with the lowest cross validation error (and hence the lowest expected generalization error). A strong argument in favor of cross validation is the potential to use the entire training set for testing (albeit not all at once), creating the largest possible test set for a fixed training data set. Essentially, the classifier is trained on a subset of the training data set and tested on the remainder. This process is repeated systematically so that all the points in the training set are tested.

¹ We concentrate on performance on data drawn from the same distribution; performance on a different distribution is also a (less explored) problem of interest.
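The resampling procedure just described, training on a subset and testing on the remainder until every point has been tested once, can be sketched as standard k-fold cross validation. This is a minimal hypothetical implementation (the function names and the toy majority-class "classifier" are illustrative, not the authors' code):

```python
import random

def k_fold_cv(data, labels, train_fn, predict_fn, k=5, seed=0):
    """Estimate generalization error; every point is tested exactly once."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint test folds
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in idx if i not in held_out]
        # Train on everything except the current fold...
        model = train_fn([data[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        # ...then test on the held-out fold only.
        for i in fold:
            if predict_fn(model, data[i]) != labels[i]:
                errors += 1
    return errors / len(data)  # fraction of points misclassified

# Toy usage: a "classifier" that always predicts the majority training label.
train = lambda X, y: max(set(y), key=y.count)
predict = lambda model, x: model
data = list(range(10))
labels = [0] * 7 + [1] * 3
err = k_fold_cv(data, labels, train, predict, k=5)
print(f"5-fold CV error: {err:.1f}")
```

Because the folds partition the data, each point contributes exactly one test prediction, which is what makes cross validation attractive when held-out data is scarce.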
This note was uploaded on 11/08/2009 for the course STATS 241, taught by Professor Lai, T. during the Spring '08 term at Stanford.