CSC 411 / CSC D11: Cross Validation

10 Cross Validation

Suppose we must choose between two possible ways to fit some data. How do we choose between them? Simply measuring how well they fit the data would mean that we always try to fit the data as closely as possible; the best method for fitting the data is then simply to memorize it in a big lookup table. However, fitting the data is no guarantee that we will be able to generalize to new measurements.

As another example, consider the use of polynomial regression to model a function given a set of data points. A higher-order polynomial will always fit the data as well as or better than a low-order polynomial; indeed, a polynomial of degree N-1 will fit N data points exactly (to within numerical error). So just fitting the data as well as we can usually produces models with many parameters, and in almost all cases of interest such models will not generalize to new inputs.

The general solution is to evaluate models by testing them on a new data set (the test set), distinct from the training set. This measures how predictive the model is: is it useful in new situations? More generally, we often wish to obtain empirical estimates of performance. These can be useful for finding errors in an implementation, comparing competing models and learning algorithms, and detecting over- or under-fitting in a learned model.

10.1 Cross-Validation

The idea of empirical performance evaluation can also be used to determine model parameters that might otherwise be hard to determine. Examples of such model parameters include the constant...
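The polynomial-regression argument in the introduction to Section 10 can be illustrated with a small sketch. It is not taken from the notes: the data values, the underlying linear function, and the helper names (fit_line, fit_interpolant, mse) are all assumptions chosen for illustration, and the test targets are left noise-free to keep the comparison simple. The sketch fits a line and a degree N-1 interpolating polynomial to the same N training points, then compares their errors on the training set and on a separate test set.

```python
# Sketch: training error vs. test error for a low-order and a high-order
# polynomial. All data values and function names are illustrative assumptions.

def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b; returns a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def fit_interpolant(xs, ys):
    """Degree N-1 polynomial through all N points, in Lagrange form."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a prediction function on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def true_f(x):
    # Assumed underlying function generating the data.
    return 2.0 * x + 1.0


# Training set: N = 6 noisy measurements (the noise values are made up).
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.3, -0.4, 0.5, -0.3, 0.4, -0.5]
train_y = [true_f(x) + e for x, e in zip(train_x, noise)]

# Test set: new inputs, distinct from the training inputs.
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [true_f(x) for x in test_x]

line = fit_line(train_x, train_y)
interp = fit_interpolant(train_x, train_y)

train_mse_line = mse(line, train_x, train_y)
train_mse_interp = mse(interp, train_x, train_y)
test_mse_line = mse(line, test_x, test_y)
test_mse_interp = mse(interp, test_x, test_y)

print("train MSE  line:", train_mse_line, " degree N-1:", train_mse_interp)
print("test MSE   line:", test_mse_line, " degree N-1:", test_mse_interp)
```

On this made-up data the degree N-1 polynomial reproduces the training points exactly but does worse than the line on the test points, which is the pattern the test set is meant to expose. With different data the size of the gap would differ.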
 Spring '10
 David Fleet
 Machine Learning
