biased (rather than mean unbiased, the standard unbiasedness property).
The property of median unbiasedness is preserved under monotone transformations, while mean unbiasedness may be lost under nonlinear transformations. Moreover, using an unbiased estimator with a large mean squared error to estimate a parameter puts us at high risk of a large error; in contrast, a biased estimator with a small mean squared error can substantially improve the precision of our prediction.
Hence, our goal is to minimize the MSE. From Figure 3, we can see that the three quantities are related by MSE = Bias^2 + Variance. Thus, for a given MSE, a low bias forces a high variance and vice versa.
Test error is a good estimate of the MSE. We want bias and variance to be somewhat balanced (neither one high), even though the resulting estimator will have some bias.
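This decomposition can be checked empirically. The sketch below is illustrative only: the population mean mu = 0.5, the sample size, and the hypothetical "shrunken" estimator (0.9 times the sample mean) are all assumptions, chosen so the biased estimator also demonstrates the smaller-MSE point made above.

```python
# Empirical check that MSE = Bias^2 + Variance for a point estimator.
# Setup (assumed for illustration): estimate the mean mu of a normal
# population from n samples, comparing the unbiased sample mean with a
# deliberately biased, shrunken version of it.
import numpy as np

rng = np.random.default_rng(0)
mu, n, reps = 0.5, 10, 100_000

# Draw many datasets and compute both estimators on each.
samples = rng.normal(mu, 1.0, size=(reps, n))
unbiased = samples.mean(axis=1)   # sample mean, unbiased
shrunken = 0.9 * unbiased         # shrunk toward zero, biased

for name, est in (("unbiased", unbiased), ("shrunken", shrunken)):
    bias = est.mean() - mu
    var = est.var()
    mse = ((est - mu) ** 2).mean()
    # The two columns agree up to floating-point error; for this choice
    # of mu the biased estimator attains the smaller MSE.
    print(f"{name}: mse={mse:.4f}  bias^2 + var={bias ** 2 + var:.4f}")
```

The gain from shrinking depends on the true parameter: here the squared bias introduced (about 0.0025) is smaller than the variance removed, which is exactly the trade-off described above.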
wikicoursenote.com/w/index.php?title=Stat841&printable=yes 43/74 10/09/2013 Stat841 Wiki Course Notes

Referring to Figure 2, overfitting happens after the point where the training error (training-sample line) keeps decreasing while the test error (test-sample line) starts to increase. There are two main approaches to avoiding overfitting:
1. Estimating the error rate
Empirical training error is not a good estimate
Empirical test error is a better estimate
Cross-validation is fast
Computing an error bound (analytically) using some probability inequality
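The cross-validation idea in the list above can be sketched as follows. The function name, data set, and polynomial model here are made up for illustration: the data are split into k folds, each fold in turn is held out as a test set, and the k held-out errors are averaged to estimate the true error.

```python
# Minimal k-fold cross-validation sketch (hypothetical helper and data).
import numpy as np

def kfold_mse(x, y, degree, k=5):
    """Average held-out MSE of a degree-`degree` polynomial fit."""
    idx = np.random.default_rng(1).permutation(len(x))  # shuffle once
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coef = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coef, x[test])
        errors.append(np.mean((pred - y[test]) ** 2))
    return np.mean(errors)

# Quadratic truth plus noise; CV picks out the right model class.
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 100)
y = x ** 2 + rng.normal(0, 0.1, 100)
for d in (1, 2, 9):
    print(f"degree {d}: CV error {kfold_mse(x, y, d):.4f}")
```

Because every point is used for testing exactly once, this gives a far better estimate of the true error rate than the empirical training error, at the cost of k refits.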
We will not discuss computing the error bound in class; however, a popular tool for this computation is the VC dimension (short for Vapnik–Chervonenkis dimension). More information can be found from Andrew Moore (http://www.autonlab.org/tutorials/vcdim.html) and Steve Gunn (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.10.7171&rep=rep1&type=pdf).
2. Regularization
Use of a shrinkage method
Decrease the chance of overfitting by controlling the weights

Example of under- and overfitting in R
To give further intuition about under- and overfitting, consider the following example. A simple quadratic data set with some random noise is generated, and polynomials of varying degrees are fitted; the errors on the training set and on a test set are then calculated.
> x <- ...
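The R listing above is cut off in this copy, so here is a minimal Python sketch of the experiment it describes. The sample sizes (30 training points, 200 test points), the noise level, and the degrees tried are assumptions.

```python
# Under- vs. overfitting: quadratic data plus noise, polynomial fits of
# several degrees, training error vs. test error.
import numpy as np

rng = np.random.default_rng(3)

def make_data(n):
    x = rng.uniform(-2, 2, n)
    return x, x ** 2 + rng.normal(0, 0.5, n)  # quadratic truth + noise

x_tr, y_tr = make_data(30)    # small training set
x_te, y_te = make_data(200)   # independent test set

train_err, test_err = {}, {}
for degree in (1, 2, 9):
    coef = np.polyfit(x_tr, y_tr, degree)
    train_err[degree] = np.mean((np.polyval(coef, x_tr) - y_tr) ** 2)
    test_err[degree] = np.mean((np.polyval(coef, x_te) - y_te) ** 2)
    # Training error can only fall as the degree grows; test error
    # typically rises again once the fit starts chasing the noise.
    print(f"degree {degree}: train {train_err[degree]:.3f}  "
          f"test {test_err[degree]:.3f}")
```

Degree 1 underfits (both errors high), degree 2 matches the truth, and a high degree drives the training error down while the test error tends back up, reproducing the pattern described for Figure 2.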