EXST 7015 - Statistical Inference II, Fall 2011
Lab 7: Multiple Linear Regression - Variable Selection

OBJECTIVES

In multiple regression, a dependent variable Y is regressed on several independent variables (regressors) at once, using the model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$. The overall test of hypothesis for multiple linear regression is $H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$ versus $H_1$: at least one $\beta_j \neq 0$. Rejecting $H_0$ implies that at least one of the regressors, $X_1, X_2, \ldots, X_p$, contributes significantly to the model. In Labs 5 and 6, we used several statistics to measure the relative importance of the independent variables: the F-test, t-tests of the regression coefficients, standardized regression coefficients, and partial $R^2$. These statistics tell us which independent variables are more important than others in predicting the values of the dependent variable. The next question is how to choose the ‘best’ multiple regression model for the current data, i.e., which variables should remain in the model, to guide its application and future studies. Theoretically, the ideal model provides the best possible fit while using the fewest possible parameters. In practice, however, data collection is expensive and time-consuming, and, as we learned in previous labs, multicollinearity, poor combinations of independent variables, and influential observations make model fitting quite challenging. In this lab, we will introduce common variable selection methods based on F-statistics or t-tests of parameter estimates (the best criteria to measure the relative importance of independent

