Multiple Regression

Multiple regression

Typically, we want to use more than a single predictor (independent variable) to make predictions. Regression with more than one predictor is called “multiple regression”:

$$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi} + \varepsilon_i$$

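To make the equation concrete, here is a minimal sketch of fitting α and the β's by least squares, using only the Python standard library. Least squares is assumed as the fitting method (the slides do not name one here), the function names `solve_linear_system` and `fit_multiple_regression` are hypothetical, and the data are made up and noise-free so that the fit should recover the generating coefficients up to rounding.

```python
def solve_linear_system(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z


def fit_multiple_regression(xs, y):
    """Return [alpha, beta_1, ..., beta_p] minimizing the sum of squared residuals.

    xs: one row per observation, each row holding the p predictor values.
    y:  one response per observation.
    """
    design = [[1.0] + list(row) for row in xs]  # leading 1 carries the intercept
    k = len(design[0])
    # Normal equations: (X'X) coef = X'y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up, noise-free data generated from y = 2 + 3*x1 - 1*x2 (not the bank data).
xs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3), (4, 1)]
y = [2 + 3 * x1 - 1 * x2 for x1, x2 in xs]
coef = fit_multiple_regression(xs, y)  # [alpha, beta_1, beta_2]
```

Solving the normal equations directly is fine for a small illustration like this; statistical software typically uses more numerically stable methods.
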
Motivating example: Sex discrimination in wages

In the 1970s, Harris Trust and Savings Bank was sued for discrimination on the basis of sex. An analysis of the salaries of employees of one type (skilled, entry-level clerical) was presented as evidence by the defense. Did female employees tend to receive lower starting salaries than similarly qualified and experienced male employees?

Variables collected

93 employees on data file (61 female, 32 male).

bsal: annual salary at time of hire
sal77: annual salary in 1977
educ: years of education
exper: months of work experience prior to hire at the bank
fsex: 1 if female, 0 if male
senior: months worked at the bank since hired
age: age in months

So we have six x’s and one y (bsal). However, in what follows we won’t use sal77.

Comparison of males and females

This comparison shows that men started at higher salaries than women (t = 6.3, p < .0001). But it doesn’t control for other characteristics.

[Figure: Oneway Analysis of bsal By fsex. bsal (4000 to 8000) plotted for the Female and Male groups.]
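As a rough sketch of how a two-group t statistic like the one quoted above is computed, the snippet below uses the pooled-variance form. The slides do not say whether the pooled or unequal-variance version was used, so the pooled form is an assumption, the function name `pooled_t_statistic` is hypothetical, and the numbers are made up rather than taken from the bank data.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Two-sample t statistic for mean(group_b) - mean(group_a), pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled estimate of the common variance (sample variances weighted by df).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_b) - mean(group_a)) / std_error


# Made-up values for two small groups (not the bank data).
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
t_stat = pooled_t_statistic(group_a, group_b)
```

The statistic compares only the two group means, which is why it cannot account for differences in education, seniority, or the other variables.
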

Relationships of bsal with other variables

Senior and education predict bsal well. We want to control for them when judging gender effect.

[Figure: Fit Y by X Group. Four panels, each with a linear fit: Bivariate Fit of bsal By senior, Bivariate Fit of bsal By age, Bivariate Fit of bsal By educ, Bivariate Fit of bsal By exper.]

Multiple regression model

For any combination of values of the predictor variables, the average value of the response (bsal) lies on a straight line:

$$\mathrm{E}(y \mid x_1, \ldots, x_p) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$

Just like in simple regression, assume that ε follows a normal distribution within any combination of predictors.
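The two assumptions of the model (a mean that is linear in the predictors, plus normally distributed errors) can be sketched as a small simulation. Everything numeric here is hypothetical: the coefficients `ALPHA` and `BETAS`, the error standard deviation `SIGMA`, and the predictor values are made up for illustration and are not estimates from the bank data. A mean-zero error is also assumed.

```python
import random

# Hypothetical parameter values, chosen only for illustration.
ALPHA = 1.0
BETAS = [2.0, -0.5]
SIGMA = 1.0  # standard deviation of the error term


def mean_response(x):
    """Average response at predictor values x: alpha + sum of beta_j * x_j."""
    return ALPHA + sum(b * xj for b, xj in zip(BETAS, x))


def simulate_response(x, rng):
    """One draw of y: the mean response plus a normal error with mean 0."""
    return mean_response(x) + rng.gauss(0.0, SIGMA)


rng = random.Random(0)  # seeded so the sketch is reproducible
x = (3.0, 4.0)          # one made-up combination of predictor values
draws = [simulate_response(x, rng) for _ in range(10_000)]
average_draw = sum(draws) / len(draws)
```

With many draws at the same predictor values, `average_draw` should land close to `mean_response(x)`, which is what "the average value of the response" refers to.
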

