Multiple Linear Regression
• Response Variable: $Y$
• Explanatory Variables: $X_1, \ldots, X_k$
• Model (Extension of Simple Regression): $E(Y) = \alpha + \beta_1 X_1 + \cdots + \beta_k X_k$, $V(Y) = \sigma^2$
• Partial Regression Coefficients ($\beta_i$): Effect of increasing $X_i$ by 1 unit, holding all other predictors constant
• Computer packages fit models, hand calculations very tedious
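The "holding all other predictors constant" reading of a partial regression coefficient can be checked by differencing the model at two values of one predictor. A short sketch for $i = 1$ follows; the conditional notation $E(Y \mid \cdot)$ is an assumption added here for clarity and is not used on the slide itself.

```latex
\begin{align*}
& E(Y \mid X_1 = x_1 + 1, X_2, \ldots, X_k) - E(Y \mid X_1 = x_1, X_2, \ldots, X_k) \\
&\quad = \bigl[\alpha + \beta_1 (x_1 + 1) + \beta_2 X_2 + \cdots + \beta_k X_k\bigr]
       - \bigl[\alpha + \beta_1 x_1 + \beta_2 X_2 + \cdots + \beta_k X_k\bigr] \\
&\quad = \beta_1
\end{align*}
```

Every term not involving $X_1$ cancels, so the difference does not depend on the values at which the other predictors are held.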

Prediction Equation & Residuals
• Model Parameters: $\alpha, \beta_1, \ldots, \beta_k, \sigma$
• Estimators: $a, b_1, \ldots, b_k, \hat{\sigma}$
• Least squares prediction equation: $\hat{Y} = a + b_1 X_1 + \cdots + b_k X_k$
• Residuals: $e = Y - \hat{Y}$
• Error Sum of Squares: $SSE = \sum e^2 = \sum (Y - \hat{Y})^2$
• Estimated conditional standard deviation: $\hat{\sigma} = \sqrt{\dfrac{SSE}{n - k - 1}}$
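The quantities on this slide can be computed directly once a model is fit. The sketch below is one way to do it with NumPy, under stated assumptions: the data are simulated (not the airfare data), the "true" parameter values are arbitrary, and `fit_ols` is a hypothetical helper name introduced only for this illustration.

```python
import numpy as np

# Simulated data: the parameter values below are arbitrary assumptions.
rng = np.random.default_rng(0)
n, k = 200, 2
X = rng.normal(size=(n, k))
alpha_true, beta_true, sigma_true = 1.0, np.array([2.0, -0.5]), 0.3
y = alpha_true + X @ beta_true + rng.normal(scale=sigma_true, size=n)


def fit_ols(X, y):
    """Least squares fit returning (a, b_1..b_k), residuals, SSE, sigma-hat."""
    n_obs, n_pred = X.shape
    # Prepend a column of ones so the first coefficient is the intercept a.
    design = np.column_stack([np.ones(n_obs), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef                 # Y-hat = a + b_1 X_1 + ... + b_k X_k
    resid = y - fitted                     # e = Y - Y-hat
    sse = float(resid @ resid)             # SSE = sum of squared residuals
    sigma_hat = (sse / (n_obs - n_pred - 1)) ** 0.5   # sqrt(SSE / (n - k - 1))
    return coef, resid, sse, sigma_hat


coef, resid, sse, sigma_hat = fit_ols(X, y)
print("a, b_1, ..., b_k:", coef)
print("SSE:", sse)
print("sigma-hat:", sigma_hat)
```

The divisor `n - k - 1` is the sample size minus the number of estimated mean parameters (the intercept plus k slopes), matching the formula on the slide.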
Commonly Used Plots
• Scatterplot: Bivariate plot of pairs of variables. Does not adjust for other variables. Some software packages plot a matrix of scatterplots.
• Conditional Plot (Coplot): Plot of Y versus a predictor variable, separately for certain ranges of a second predictor variable. Can show whether the relationship between Y and $X_1$ is the same across levels of $X_2$.
• Partial Regression (Added-Variable) Plot: Plots residuals from two regression models to show the association between Y and $X_2$ after removing the effect of $X_1$: residuals from regressing Y on $X_1$ versus residuals from regressing $X_2$ on $X_1$.
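The two sets of residuals behind a partial regression plot can be computed as sketched below. The data are simulated with arbitrary coefficients (an assumption, not the airfare data), and `residuals` is a hypothetical helper. The sketch also checks a standard property: the least squares slope through the added-variable plot equals the coefficient on $X_2$ from the full regression of Y on $X_1$ and $X_2$.

```python
import numpy as np


def residuals(design, target):
    """Residuals from a least squares regression of target on design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef


# Simulated data: coefficients and noise levels are arbitrary assumptions.
rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)          # X_2 is correlated with X_1
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.5, size=n)

ones = np.ones(n)
base = np.column_stack([ones, x1])

e_y = residuals(base, y)      # part of Y not explained by X_1
e_x2 = residuals(base, x2)    # part of X_2 not explained by X_1

# Slope of the line through the added-variable plot (e_x2 on the
# horizontal axis, e_y on the vertical axis); both have mean zero.
slope = (e_x2 @ e_y) / (e_x2 @ e_x2)

# Coefficient on X_2 from the full regression of Y on X_1 and X_2.
full = np.column_stack([ones, x1, x2])
b_full, *_ = np.linalg.lstsq(full, y, rcond=None)

print("added-variable slope:", slope)
print("b_2 from full model: ", b_full[2])
```

Plotting `e_x2` against `e_y` with any plotting library then gives the added-variable plot for $X_2$.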

Example - Airfares 2002Q4
• Response Variable: Average Fare ($Y$, in \$)
• Explanatory Variables:
  – Distance ($X_1$, in miles)
  – Average weekly passengers ($X_2$)
• Data: 1000 city pairs for 4th Quarter 2002
• Source: U.S. DOT

Descriptive Statistics
Variable              N      Minimum   Maximum   Mean        Std. Deviation
AVEFARE               1000   50.52     401.23    163.3754    55.36547
DISTANCE              1000   108.00    2724.00   1056.9730   643.20325
AVEPASS               1000   181.41    8950.76   672.2791    766.51925
Valid N (listwise)    1000
Example - Airfares 2002Q4
Scatterplot Matrix of Average Fare, Distance, and Average Passengers (produced by STATA)
[Figure: scatterplot matrix of avefare, distance, and avepass]

