5/12/2010

Multiple Regression in Practice - Intro
• A common question in practice is which X variables to include in a multiple regression.
• Issue 1: Coefficient estimates will almost always change at least a little (and sometimes a lot) when we add, drop, or change subsets of the regressors (i.e. the X’s).
• Issue 2: Because of Issue 1, an important decision is which variables to include as X’s in a regression. The main cost-benefit tradeoff here is:
  – 1) Leaving out a variable that affects $Y_i$ and is correlated with the included X’s can lead to omitted variable bias (OVB).
  – 2) Including too many variables can lead to Imperfect Multicollinearity, i.e. the X variables are highly (but not perfectly) correlated with each other. Imperfect Multicollinearity can result in large standard errors, i.e. imprecise estimates. With very large standard errors, for example, confidence intervals become very wide and uninformative.

General Strategies - I
• Suppose you are primarily interested in quantifying how one specific regressor ($X_{1i}$) affects $Y_i$.
• You have a set of possible additional regressors $X_{2i}, X_{3i}, \ldots, X_{Ki}$. Which should you include?
• First, use your intuition to come up with a “baseline” specification, i.e. an initial set of regressors.
• The following slides give guidelines for choosing the regressors in this baseline specification.
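The two issues from the Intro slide can be seen in a small simulation. This is a hypothetical sketch, not an example from the text: it assumes a made-up data-generating process $Y_i = 1 + 2X_{1i} + 3X_{2i} + u_i$, with $X_{1i}$, $X_{2i}$ and $u_i$ standard normal, correlation rho between $X_{1i}$ and $X_{2i}$, and n = 2000 observations. The helpers `invert`, `ols` and `simulate` are hand-written for illustration (plain Python, homoskedasticity-only standard errors) so the example needs no libraries. With rho = 0.7, omitting $X_{2i}$ pushes the estimated coefficient on $X_{1i}$ toward 2 + 3(0.7) = 4.1 (OVB), while including it gives roughly the true value 2. With rho = 0.99, including both regressors still centers the estimate near 2, but its standard error is several times larger than with rho = 0 (Imperfect Multicollinearity).

```python
import math
import random


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(m)
    # Augment m with the identity matrix: [m | I]
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(m)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))  # pivot row
        a[c], a[p] = a[p], a[c]
        piv = a[c][c]
        a[c] = [v / piv for v in a[c]]
        for r in range(k):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[k:] for row in a]  # right half is now the inverse


def ols(y, xs):
    """OLS of y on an intercept plus the columns in xs (hypothetical helper).

    Returns (coefficients, homoskedasticity-only standard errors), intercept first.
    """
    n = len(y)
    cols = [[1.0] * n] + [list(x) for x in xs]
    k = len(cols)
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(c, y)) for c in cols]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[t] - sum(beta[j] * cols[j][t] for j in range(k)) for t in range(n)]
    s2 = sum(e * e for e in resid) / (n - k)  # residual variance, d.o.f. corrected
    se = [math.sqrt(s2 * xtx_inv[i][i]) for i in range(k)]
    return beta, se


def simulate(rho, n=2000, seed=0):
    """Hypothetical DGP: Y = 1 + 2*X1 + 3*X2 + u, with corr(X1, X2) = rho."""
    rng = random.Random(seed)
    y, x1, x2 = [], [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        b = rho * a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        x1.append(a)
        x2.append(b)
        y.append(1.0 + 2.0 * a + 3.0 * b + rng.gauss(0.0, 1.0))
    return y, x1, x2


# Issue 1 / OVB: the coefficient on X1 moves when X2 is dropped or added.
y, x1, x2 = simulate(rho=0.7)
b_short, _ = ols(y, [x1])      # X2 omitted
b_long, _ = ols(y, [x1, x2])   # X2 included
print(f"beta1 without X2: {b_short[1]:.2f}")  # close to 2 + 3*0.7 = 4.1
print(f"beta1 with X2:    {b_long[1]:.2f}")   # close to the true value 2

# Issue 2 / Imperfect Multicollinearity: same true model, much larger SE.
se_beta1 = {}
for rho in (0.0, 0.99):
    y, x1, x2 = simulate(rho)
    beta, se = ols(y, [x1, x2])
    se_beta1[rho] = se[1]
    print(f"rho = {rho}: beta1 = {beta[1]:.2f}, se(beta1) = {se[1]:.3f}")
```

Changing `seed` or `n` changes the exact numbers but not the pattern. The standard-error inflation from rho = 0 to rho = 0.99 is roughly a factor of $1/\sqrt{1-\rho^2} \approx 7$. The OLS routine is written out only to keep the sketch dependency-free; in practice a statistics package would be used instead.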

General Strategies - II
• 1) Include (as X’s) variables that you think are likely to be important determinants of $Y_i$.
• 2) Do not include (as X’s) variables that measure the same thing as $Y_i$, or that are “caused” by $Y_i$. This can cause “simultaneity bias” (see pp. 324-325 in the text). Examples:
  – Suppose that test scores are the average of a math score and a verbal score, i.e. testscore = 0.5*mtestscore + 0.5*vtestscore. You should not include mtestscore and vtestscore as regressors. What would happen?
  – Suppose you also measure 9th grade test scores. Since these are in some sense “caused” by 5th grade test scores, you should not include them as a regressor.
So include regressors that you think cause $Y_i$; do not include regressors that you think are caused by $Y_i$.

General Strategies - III
• 3) In constructing the baseline specification, if you have two (or more) X’s that are similar measures of the same effect, you often want to start by including only one (or a few) of these X’s.
• Example: In our Testscore-STR study, suppose we also want to
