Stat 104: Quantitative Methods for Economists
Class 35: Model Building and Variable Selection

There is a lot to think about
- Nonlinearity
- Normality
- Multicollinearity
- Heteroskedasticity
- Summer vacation!!!

Reporting Your Regression Results
- (This table was borrowed from James Stock's Econometrics book.)

BTW: Sample Sizes for Regression
- Degree of freedom: in this context, a piece of information accounted for in a model.
  - Continuous, linear variables take up one df.
  - Categorical variables take up one df per category, minus one.
- Linear regression: you need approximately 15 (10-20) observations per degree of freedom.
  - Example: you are interested in predicting cognitive functioning in a group of ICU survivors, using a standard test score, while adjusting for age, education, hospital length of stay, and whether or not cognitive therapy was administered after the hospital stay.
  - You have a total of four degrees of freedom, so you need approximately 60 patients.

Don't Overfit
- We have been preaching that the lower the s_e, the better (i.e., smaller prediction intervals).
- But model building is a fine art, and this is not exactly the case all the time.
- That is, you don't want to overfit your data.
- You can always force s_e to be as small as you want by adding more and more variables.

Example
- Consider the following simple data set. [Scatterplot of y against x.]

Regression Fits Well

Looks Pretty Too
- [Plot of y and the fitted values against x.]

Let's Be Insane Now
- Let's fit a 9th-order polynomial to our data (why? Because we can).
- Our model is y = β0 + β1X + β2X^2 + ... + β9X^9 + ε.

Regression Output

Completely Overfitted

Moral
- A bigger model always has a smaller error sum of squares (SSE), just because a minimum taken over a larger set is smaller.
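The "bigger model always has a smaller SSE" point can be seen directly by fitting polynomials of increasing degree to a small sample. Below is a minimal sketch with simulated data; the data, seed, and degrees are assumptions for illustration, not values from the lecture.

```python
# Sketch (simulated data): nested polynomial fits never increase SSE,
# because each larger model contains the smaller one as a special case.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 12)
y = 3 + 10 * x + rng.normal(0, 5, size=x.size)  # truly linear plus noise

for degree in (1, 3, 9):
    coefs = np.polyfit(x, y, degree)            # least-squares polynomial fit
    resid = y - np.polyval(coefs, x)
    sse = np.sum(resid ** 2)
    print(f"degree {degree}: SSE = {sse:.1f}")
```

The degree-9 fit threads through nearly every point and has almost zero SSE, yet it would predict fresh data worse than the honest linear fit, which is exactly the overfitting trap the slides describe.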
- Thus least squares, taken as a criterion for model selection, says "always choose the biggest model." But this is silly.
- You want to balance the number of variables in the model (as few as possible) against predictive ability.
- Occam's Razor: "the simplest explanation is more likely the correct one."

There Are Many Good Models
- There is unlikely to be a single "best model" that we'll be able to discover as long as we work at it long enough....
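One standard way to balance model size against fit is adjusted R^2, which, unlike raw SSE, charges a penalty for each added term. A minimal sketch with simulated data (the data and degrees are assumptions, not from the lecture):

```python
# Sketch (simulated data): adjusted R^2 penalizes extra terms, so it can
# prefer a smaller model even though raw SSE always favors the biggest one.
import numpy as np

rng = np.random.default_rng(1)
n = 30
x = np.linspace(0, 5, n)
y = 3 + 10 * x + rng.normal(0, 5, size=n)       # truly linear plus noise

def adjusted_r2(x, y, degree):
    """Adjusted R^2 for a polynomial fit of the given degree."""
    coefs = np.polyfit(x, y, degree)
    sse = np.sum((y - np.polyval(coefs, x)) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    k = degree                                   # number of slope terms
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))

for degree in (1, 5, 9):
    print(f"degree {degree}: adjusted R^2 = {adjusted_r2(x, y, degree):.4f}")
```

Criteria in this spirit (adjusted R^2, AIC, BIC) formalize Occam's Razor: extra variables must buy enough fit to pay for the degrees of freedom they consume.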
Michael Parzen, Fall '11