Stat841f09 - Wiki Course Notes



[R console output garbled in extraction; it appears to compute the mean squared error of the fit on the training data and then on the test data.]

Fitting a function of the form sin(x)+cos(x) works well on the training set, but because it is not the real underlying function, it fails on test data that does not lie in the same domain.

## Cross-Validation (CV) - Introduction

Cross-Validation (http://en.wikipedia.org/wiki/Cross-validation_%28statistics%29) is used to estimate the error rate of a classifier on test data rather than on the data used to fit the model.

Here is a general introduction to CV:

- We have a set of collected data for which we know the proper labels.
- We divide it into 2 parts, Training data (T) and Validation data (V).
- For our calculation, we pretend that we do not know the labels of V, and we use the data in T to train the classifier.
- We estimate an empirical error rate on V: the model has not seen V yet, and since we know the proper label of every element in V, we can count how many were misclassified.

Figure 1: Illustration of Cross-Validation

CV has different implementations that can reduce the variance of the estimated error rate, sometimes at the cost of a longer computation time.

## Complexity Control - Nov 4, 2009

### Cross-validation

Cross-validation (http://en.wikipedia.org/wiki/Cross-validation_(statistics)) is the simplest and most widely used method to estimate the true error. It comes from the observation that although the training error keeps decreasing as the complexity of the model increases, the test error starts to increase from a certain point, which is known as overfitting (see figure 2 above). Since the test error is the best estimate of the MSE (mean squared error), people came up with the idea of dividing the data set into three parts: training set, validation set, and test set. The training set is used to build the model, va...
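
The hold-out procedure from the CV introduction (split the labelled data into T and V, train on T, then count misclassifications on V) can be sketched as follows. This is a minimal illustration, not the course's own code: the synthetic two-class data, the nearest-centroid classifier, the 30% validation fraction, and all function names are assumptions chosen only to keep the example self-contained.

```python
import random


def split_holdout(data, validation_fraction=0.3, seed=0):
    """Shuffle labelled data and split it into training (T) and validation (V)."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def train_centroids(training):
    """Fit a nearest-centroid classifier: one mean per class label."""
    sums, counts = {}, {}
    for x, label in training:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def error_rate(centroids, labelled):
    """Fraction of labelled points that the classifier gets wrong."""
    wrong = sum(1 for x, label in labelled if predict(centroids, x) != label)
    return wrong / len(labelled)


# Hypothetical data: two overlapping 1-D Gaussian classes with known labels.
rng = random.Random(1)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(100)]
data += [(rng.gauss(2.0, 1.0), 1) for _ in range(100)]

T, V = split_holdout(data)           # divide into training and validation parts
model = train_centroids(T)           # the classifier only ever sees T
train_error = error_rate(model, T)   # error on data used to fit the model
validation_error = error_rate(model, V)  # empirical error on unseen data

print(f"training error:   {train_error:.3f}")
print(f"validation error: {validation_error:.3f}")
```

The validation error is the quantity CV is after: it is measured on points the model never saw during fitting, so it is not biased downward the way the training error is.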

## This document was uploaded on 03/07/2014.
