# Lecture 8: Model Reduction

**Goals**

Albert Einstein: "Things should be made as simple as possible, but not any simpler." Model reduction serves several goals:

- In cases where there is a cost associated with collecting more variables, we want good results with fewer variables.
- Increase the reliability (i.e., decrease the variance) of predictions.
- Increase the precision of parameter estimates.
- When interpretation is important, we want a parsimonious description.
**Developing a predictive model**

Simulate Y = 0.5 + X + ε, where ε ~ N(0, 0.01), and generate independent noise variables Z1–Z15. Dataset 1: fit the model. Dataset 2: make predictions using the coefficient estimates from dataset 1.

| Number of noise variables in the covariates | 0 | 2 | 3 | 10 | 15 |
|---|---|---|---|---|---|
| R² in dataset 1 | 0.7962 | 0.8670 | 0.8765 | 0.8877 | 0.9015 |
| Rₐ² (adjusted) in dataset 1 | 0.7919 | 0.8584 | 0.8593 | 0.8551 | 0.8538 |
| R² in dataset 2 | 0.6120 | 0.4791 | 0.4357 | 0.4051 | 0.3334 |
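This experiment can be reproduced in miniature with plain Python. The sample size (n = 50), sd(X) = 0.2, and the random seed are assumptions not stated in the slides, so the exact R² values will differ from the table, but the qualitative pattern is the same: in-sample R² can only rise as noise covariates are added, while the gap to out-of-sample fit widens.

```python
import random

def ols_fit(X, y):
    """Least-squares coefficients via the normal equations X'X b = X'y,
    solved with Gaussian elimination (no external libraries)."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for k in range(p):                      # forward elimination with pivoting
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        b[k], b[piv] = b[piv], b[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for c in range(k, p):
                A[r][c] -= f * A[k][c]
            b[r] -= f * b[k]
    coef = [0.0] * p
    for k in reversed(range(p)):            # back substitution
        coef[k] = (b[k] - sum(A[k][c] * coef[c] for c in range(k + 1, p))) / A[k][k]
    return coef

def r_squared(X, y, coef):
    yhat = [sum(c * v for c, v in zip(coef, row)) for row in X]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Y = 0.5 + X + eps with eps ~ N(0, 0.01), i.e. sd(eps) = 0.1, plus 15
# independent pure-noise covariates Z1..Z15.  sd(X) = 0.2 and n = 50 are
# illustrative assumptions; the slides do not state them.
rng = random.Random(0)

def make_data(n):
    rows, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 0.2)
        rows.append([1.0, x] + [rng.gauss(0, 1) for _ in range(15)])
        ys.append(0.5 + x + rng.gauss(0, 0.1))
    return rows, ys

X1, y1 = make_data(50)   # dataset 1: used for fitting
X2, y2 = make_data(50)   # dataset 2: scored with dataset-1 coefficients
results = {}
for k in (0, 5, 15):     # number of noise covariates included in the fit
    cols = 2 + k         # intercept + X + k noise columns
    coef = ols_fit([r[:cols] for r in X1], y1)
    results[k] = (r_squared([r[:cols] for r in X1], y1, coef),
                  r_squared([r[:cols] for r in X2], y2, coef))
    print(k, results[k])
```

Because both fits use the same dataset 1, in-sample R² is mathematically non-decreasing in the number of regressors; the out-of-sample score carries no such guarantee, which is the whole point of the demonstration.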

**The model does worse on a new dataset**

Fitting means fitting the current dataset as well as possible, so the model is optimized for that dataset and will generally do worse on others. This effect is exacerbated when there are noise variables:

- R² goes up as noise variables are added, even though these variables are completely unrelated to Y.
- The adjusted Rₐ² also goes up at first, but starts to decline after about 5 noise variables have been added.
- The model does increasingly poorly on new data: it is overfitted.
**Problems and biases in model selection**

Omission bias is the bias in the estimate of β₁ and in ŷ that results from omitting relevant covariates. Generate Y = β₁X₁ + β₂X₂ + ε and fit the reduced model E(Y|X₁) = α₁X₁. Then

$$
\hat\alpha_1 = (X_1'X_1)^{-1}X_1'Y
             = (X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2 + \varepsilon)
             = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 + (X_1'X_1)^{-1}X_1'\varepsilon ,
$$

and since $E(\varepsilon) = 0$,

$$
E(\hat\alpha_1) = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 .
$$
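The decomposition above can be checked numerically. The values below (β₁ = 2, β₂ = 3, corr(X₁, X₂) ≈ 0.6, n = 200) are illustrative assumptions, not from the slides; the point is that the reduced-model estimate equals β₁ plus the bias term (X₁'X₁)⁻¹X₁'X₂β₂ plus a small noise term, exactly as derived.

```python
import random

rng = random.Random(1)
b1, b2 = 2.0, 3.0        # true coefficients (illustrative, not from the slides)
n = 200

# X2 correlated with X1 (corr ~ 0.6) so that X1'X2 != 0
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.6 * a + 0.8 * rng.gauss(0, 1) for a in x1]
eps = [rng.gauss(0, 0.5) for _ in range(n)]
y = [b1 * a + b2 * b + e for a, b, e in zip(x1, x2, eps)]

S11 = sum(a * a for a in x1)                  # X1'X1
S12 = sum(a * b for a, b in zip(x1, x2))      # X1'X2

# Reduced-model estimate: alpha1_hat = (X1'X1)^{-1} X1'Y
alpha1_hat = sum(a * yi for a, yi in zip(x1, y)) / S11

# The three terms of the slide's decomposition:
bias = (S12 / S11) * b2                                   # (X1'X1)^{-1} X1'X2 b2
noise_term = sum(a * e for a, e in zip(x1, eps)) / S11    # (X1'X1)^{-1} X1'eps
print(alpha1_hat, b1 + bias + noise_term)   # identical up to rounding
```

Note that the identity `alpha1_hat == b1 + bias + noise_term` holds exactly (up to floating point), for any single sample; taking expectations only kills the noise term.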

**Problems and biases in model selection (continued)**

- The omission bias, $(X_1'X_1)^{-1}X_1'X_2\beta_2$, is 0 if β₂ = 0 or corr(X₁, X₂) = 0 (that is, X₁'X₂ = 0).
- Dropping variables relegates them to the error term, but they still have an effect: the error-variance estimate from the reduced model, $\hat\sigma^2_{\mathrm{reduced}}$, is biased upwards.
- The parameter estimates from the reduced model are generally less variable than those of the full model: $\mathrm{Var}(\hat y_{\mathrm{reduced}}) \le \mathrm{Var}(\hat y)$.
- Predicted y's obtained from the reduced model are biased unless $X_1'X_2\beta_2 = 0$.
- β₂ = 0, but we don't drop X₂: keeping extraneous variables will generally increase the variance of the parameter estimates and of future predictions; dropping them leads to tighter estimates.
- β₂ ≠ 0, but we drop X₂: dropping variables with nonzero coefficients introduces omission bias.
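The first case can be sketched numerically: when an irrelevant but correlated X₂ is kept in the model, classical theory gives a variance inflation of 1/(1 − corr(X₁, X₂)²) for β̂₁ relative to the reduced model. All settings below (n = 100, corr ≈ 0.6, 500 replications) are illustrative assumptions.

```python
import random
import statistics

rng = random.Random(2)
n, reps = 100, 500

# Fixed design: X2 correlated with X1 (corr ~ 0.6) but with true beta2 = 0
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.6 * a + 0.8 * rng.gauss(0, 1) for a in x1]
S11 = sum(a * a for a in x1)
S22 = sum(b * b for b in x2)
S12 = sum(a * b for a, b in zip(x1, x2))
det = S11 * S22 - S12 * S12

full_est, red_est = [], []
for _ in range(reps):
    y = [1.0 * a + rng.gauss(0, 1) for a in x1]   # true model: Y = X1 + eps
    S1y = sum(a * yi for a, yi in zip(x1, y))
    S2y = sum(b * yi for b, yi in zip(x2, y))
    # Full model (X1 and X2): beta1_hat from the 2x2 normal equations
    full_est.append((S22 * S1y - S12 * S2y) / det)
    # Reduced model (X1 only): beta1_hat = X1'Y / X1'X1
    red_est.append(S1y / S11)

v_full = statistics.variance(full_est)
v_red = statistics.variance(red_est)
ratio = v_full / v_red
print(ratio)   # theory: about 1/(1 - r^2) ~ 1.56 for r ~ 0.6
```

Both estimators are unbiased here (β₂ = 0, so there is no omission bias), which is why the comparison reduces to variance alone; the inflation disappears as corr(X₁, X₂) → 0.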
