Stat 104: Quantitative Methods for Economists
Class 35: Model Building and Variable Selection

There is a lot to think about:
- Nonlinearity
- Normality
- Multicollinearity
- Heteroskedasticity
- Summer vacation!

Reporting Your Regression Results
(This table was borrowed from James Stock's econometrics book.)
[Figure: example table of regression results.]

BTW: Sample Sizes for Regression
- Degree of freedom: in this context, a piece of information accounted for in a model.
  - Continuous, linear variables take up one df.
  - Categorical variables take up one df per category, minus one.
- Linear regression: you need approximately 15 (10-20) observations per degree of freedom.
  - Example: you are interested in predicting cognitive functioning in a group of ICU survivors, using a standard test score, while adjusting for age, education, hospital length of stay, and whether or not cognitive therapy was administered after the hospital stay.
  - You have a total of four degrees of freedom, so you need approximately 60 patients.

Don't Overfit
- We have been preaching that the lower the s_e, the better (i.e., smaller prediction intervals).
- But model building is a fine art, and this is not exactly the case all the time.
- That is, you don't want to overfit your data.
- You can always force s_e to be as small as you want by adding more and more variables.

Example
- Consider the following simple data set.
[Figure: scatter plot of y against x.]

Regression Fits Well
Looks Pretty Too
[Figure: scatter plot of y against x with the fitted regression line overlaid.]

Let's Be Insane Now
- Let's fit a 9th-order polynomial to our data (why? because we can).
- Our model is

  y = β₀ + β₁X + β₂X² + … + β₉X⁹ + ε

Regression Output
[Figure: regression output for the 9th-order polynomial fit.]

Completely Overfitted
[Figure: plot of the 9th-order polynomial fit passing through the data points.]

Moral
- A bigger model always has a smaller error sum of squares (SSE), simply because a minimum taken over a larger set is smaller.
- Thus least squares, taken as a criterion for model selection, says: always choose the biggest model.
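The first half of the moral can be checked numerically: because each higher-degree polynomial nests the lower-degree ones, SSE can only shrink as the model grows. The data below are hypothetical (the slide's actual data set is not reproduced here), generated from a noisy line to mimic the plotted example.

```python
import numpy as np

# Hypothetical data set: 10 points from a noisy line, standing in
# for the x/y data plotted in the slides.
rng = np.random.default_rng(0)
x = np.linspace(-4, 4, 10)
y = 10 * x + 40 + rng.normal(scale=8, size=x.size)

# SSE of the least-squares polynomial fit, by degree. Each larger
# model nests the smaller ones, so SSE can only decrease.
sse = {}
for degree in (1, 3, 5, 9):
    coeffs = np.polyfit(x, y, degree)
    sse[degree] = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
    print(f"degree {degree}: SSE = {sse[degree]:.4f}")
```

With 10 observations, the degree-9 model has as many parameters as data points, so it interpolates the data exactly and its SSE is essentially zero, regardless of how noisy the data are.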
- But this is silly.
- You want to balance the number of variables in the model (as few as possible) against predictive ability.
- Occam's Razor: "the simplest explanation is more likely the correct one."

There Are Many Good Models
- There is unlikely to be a single "best model" that we'll be able to discover so long as we work at it long enough.
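The trade-off described above, training fit versus predictive ability, can be illustrated by scoring a small model and a big model on fresh data that neither was fit to. The data-generating process here is hypothetical (a noisy line, as in the earlier sketch), not the slide's actual data.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_line(n):
    """Draw n points from a hypothetical 'true' model: a noisy line."""
    x = rng.uniform(-4, 4, n)
    return x, 10 * x + 40 + rng.normal(scale=8, size=n)

x_train, y_train = noisy_line(10)   # the data we fit to
x_test, y_test = noisy_line(200)    # fresh data from the same process

# (train MSE, test MSE) for a line vs. a 9th-order polynomial.
mse = {}
for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    mse[degree] = (
        float(np.mean((y_train - np.polyval(coeffs, x_train)) ** 2)),
        float(np.mean((y_test - np.polyval(coeffs, x_test)) ** 2)),
    )
    print(f"degree {degree}: train MSE {mse[degree][0]:.1f}, "
          f"test MSE {mse[degree][1]:.1f}")
```

On essentially any draw, the degree-9 model wins on the training data (near-zero error) but does far worse on the fresh data: the extra terms chase the noise rather than the line. That is the sense in which model size must be balanced against predictive ability.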
This note was uploaded on 03/27/2012 for the course STATS 104, taught by Professor Michael Parzen during the Fall '11 term at Harvard.