Business Statistics (BUSA 3101)
Dr. Lari H. Arjomand
Slide 1

Chapter 14 Multiple Regression
• Multiple Regression Model
• Least Squares Method
• Multiple Coefficient of Determination
• Model Assumptions
• Testing for Significance
• Using the Estimated Regression Equation for Estimation and Prediction
• Qualitative Independent Variables
Slide 2

Multiple Regression Model
• Everything you have learned about the simple linear regression model carries over, because simple regression is a special case of multiple regression.
• The interpretation of regression results is similar.
• Since all calculations are done by computer, there is no extra computational burden.
• In fact, statisticians do not make any distinction between simple regression and multiple regression; they just call it regression.
Slide 3

Multiple Regression Model
Multiple regression is required when a single-predictor model (simple regression model) is inadequate to describe the true relationship between Y (the response variable or the dependent variable) and its potential predictors (the independent variables x1, x2, . . . , xp).
Slide 4

Multiple Regression Model
The equation that describes how the dependent variable y is related to the independent variables x1, x2, . . . , xp and an error term is called the multiple regression model.
y = β0 + β1x1 + β2x2 + . . . + βpxp + ε
where:
β0, β1, β2, . . . , βp are the parameters, and
ε is a random variable called the error term.
Slide 5

Multiple Regression Equation
The equation that describes how the mean value of y is related to x1, x2, . . . , xp is called the multiple regression equation.
E(y) = β0 + β1x1 + β2x2 + . . . + βpxp
Slide 6

Estimated Multiple Regression Equation
A simple random sample is used to compute sample statistics b0, b1, b2, . . . , bp that are used as the point estimators of the parameters β0, β1, β2, . . . , βp.
The estimated multiple regression equation is:
ŷ = b0 + b1x1 + b2x2 + . . . + bpxp
Slide 7

Estimated Multiple Regression Equation
bi is the net change in y for each unit change in xi, holding all other independent variables constant, where i = 1 to p.
Note that bi is called a regression coefficient.
Least squares estimation is used to develop this equation. Because determining b1, b2, etc. by hand is very tedious, a software package such as Excel is recommended.
ŷ = b0 + b1x1 + b2x2 + . . . + bpxp
Slide 8

Multiple Regression Model
• Example: Programmer Salary Survey
A software firm collected data for a sample of 20 computer programmers. A suggestion was made that regression analysis could be used to determine if salary was related to the years of experience and the score on the firm's programmer aptitude test.
The years of experience, score on the aptitude test, and corresponding annual salary ($1000s) for the sample of 20 programmers are shown on the next slide.
Slide 9

Multiple Regression Model (Example Continued)
Exper.  Score  Salary      Exper.  Score  Salary
   4      78    24            9      88    38
   7     100    43            2      73    26.6
   1      86    23.7         10      75    36.2
   5      82    34.3          5      81    31.6
   8      86    35.8          6      74    29
  10      84    38            8      87    34
   0      75    22.2          4      79    30.1
   1      80    23.1          6      94    33.9
   6      83    30            3      70    28.2
   6      91    33            3      89    30
Slide 10

Multiple Regression Model (Example Continued)
Suppose we believe that salary (y) is related to the years of experience (x1) and the score on the programmer aptitude test (x2) by the following regression model:
y = β0 + β1x1 + β2x2 + ε
where
y = annual salary ($1000)
x1 = years of experience
x2 = score on programmer aptitude test
Slide 11

Solving for the Estimates of β0, β1, β2
(Example Continued)
Input Data:
x1    x2    y
 4    78    24
 7   100    43
 .     .     .
 3    89    30
Excel is used for solving this multiple regression problem.
Least Squares Output: b0 =, b1 =, b2 =, R2 =, etc.
Slide 12

Using SWTStat+ to Solve the Problem
Creating Data Area
Slide 13

Using SWTStat+ to Solve the Problem
Slide 14

Using SWTStat+ to Solve the Problem
(Results) Callout: Multiple Coefficient of Determination R2
Note: results are rounded to two decimal places.
Slide 15

Solving for the Estimates of β0, β1, β2 (Example Continued)
• Excel's Regression Equation Output
SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE)
or
ŷ = 3.174 + 1.404x1 + 0.251x2
Note: Predicted salary will be in thousands of dollars.
Slide 16

Interpreting the Coefficients
ŷ = b0 + b1x1 + b2x2 + . . . + bpxp
In multiple regression analysis, we interpret each regression coefficient as follows:
bi represents an estimate of the change in y corresponding to a 1-unit change in xi when all other independent variables are held constant.
Slide 17

Interpreting the Coefficients (Example Continued)
b1 = 1.404
SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE)
ŷ = 3.174 + 1.404x1 + 0.251x2
Conclusion: Salary is expected to increase by $1,404 for each additional year of experience (when the score on the programmer aptitude test is held constant).
Slide 18

Interpreting the Coefficients (Example Continued)
b2 = 0.251
SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE)
ŷ = 3.174 + 1.404x1 + 0.251x2
Conclusion: Salary is expected to increase by $251 for each additional point scored on the programmer aptitude test (when the variable years of experience is held constant).
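The estimates quoted from Excel can be checked with a short script. The following is only a sketch, assuming the 20 observations are transcribed correctly from the data slide; the helper `least_squares` is a hypothetical name, and it solves the normal equations directly rather than reproducing whatever routine Excel or SWTStat+ uses internally.

```python
# Programmer salary survey data as listed in the slides (salary in $1000s).
EXPER = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
SCORE = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
         88, 73, 75, 81, 74, 87, 79, 94, 70, 89]
SALARY = [24, 43, 23.7, 34.3, 35.8, 38, 22.2, 23.1, 30, 33,
          38, 26.6, 36.2, 31.6, 29, 34, 30.1, 33.9, 28.2, 30]


def least_squares(predictor_rows, y):
    """Return [b0, b1, ..., bp] solving the normal equations (X'X) b = X'y."""
    # Design matrix: a leading 1 in each row produces the intercept b0.
    X = [[1.0] + [float(v) for v in row] for row in predictor_rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot_row = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


b0, b1, b2 = least_squares(list(zip(EXPER, SCORE)), SALARY)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")


def predict(x1, x2):
    """Estimated salary ($1000s) for x1 years of experience and test score x2."""
    return b0 + b1 * x1 + b2 * x2


# Holding the score fixed, one more year of experience changes the prediction
# by b1; holding experience fixed, one more test point changes it by b2.
effect_of_year = predict(5, 80) - predict(4, 80)
effect_of_point = predict(4, 81) - predict(4, 80)
print(f"one more year: {effect_of_year:.3f}, one more point: {effect_of_point:.3f}")
```

The two differences at the end are the "holding all other variables constant" interpretation in numeric form: the values 4, 5, 80 and 81 are arbitrary illustration inputs, and any other choice gives the same differences.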
Slide 19

Multiple Correlation or Coefficient of Multiple Determination (R2)
• Recall that for simple regression we can compute the simple coefficient of determination r2.
• In multiple regression, because there are at least two independent (explanatory) variables, we compute the coefficient of multiple determination R2 (the square of the multiple correlation coefficient R).
• R2 represents the proportion of the variation in Y that is explained by the set of independent (explanatory) variables.
Slide 20

Multiple Correlation Coefficient, R
• The strength of the association is measured by the Multiple Correlation Coefficient, R.
• R can be any value from 0 to +1.
  • The closer R is to one, the stronger the linear association is.
  • If R equals zero, then there is no linear association between the dependent variable (Y) and the independent variables (Xp).
• R is never a negative value.
Unlike the simple correlation coefficient, r, which tells both the strength and direction of the association, R tells only the strength of the association.
Slide 21

Assumptions In Multiple Regression and Correlation
• The independent variables and the dependent variable have a linear relationship.
• The dependent variable must be continuous and at least interval-scaled.
• The residuals should follow the normal distribution with mean 0.
• The variation in the residuals (Y - Y') must be the same for all values of Y. When this is the case, we say the residuals exhibit homoscedasticity.
• Successive values of the dependent variable must be uncorrelated.
Slide 22

Using SWTStat+ to Solve the Problem
Creating Data Area
Slide 23

Using SWTStat+ to Solve the Problem
Slide 24

Using SWTStat+ to Solve the Problem
(Results) Callout: Multiple Coefficient of Determination R2
Note: results are rounded to two decimal places.
Slide 25

Testing for Significance
• In simple linear regression, the F and t tests provide the same conclusion.
• In multiple regression, the F and t tests have different purposes.
Slide 26

Testing for Significance: F Test
• The F test is used to determine whether a significant relationship exists between the dependent variable and the set of all the independent variables.
• The F test is referred to as the test for overall significance.
Slide 27

Testing for Significance: t Test
(Individual Variables)
• If the F test shows an overall significance, the t test is used to determine whether each of the individual independent variables is significant.
• A separate t test is conducted for each of the independent variables in the model.
• We refer to each of these t tests as a test for individual significance.
Slide 28

Testing for Significance
• Note that all computer packages report the t statistic (actual t) and the p-value for each independent variable.
• Also note that to test for a zero coefficient (H0: βj = 0) we could alternatively construct a confidence interval for the true coefficient βj and see whether the interval includes zero.
• Excel provides all of this information; you only have to know how to interpret the results.
Slide 29

Testing for Overall Significance: F Test
Hypotheses
H0: β1 = β2 = . . . = βp = 0
Ha: One or more of the parameters is not equal to zero.
Test Statistic
F = MSR/MSE
Rejection Rule
Reject H0 if p-value < α or if actual F > critical Fα, where critical Fα is based on an F distribution with p d.f. in the numerator and n - p - 1 d.f. in the denominator.
Where p = number of independent variables in the regression equation.
Slide 30

NOTE
As we indicated, in the case of the multiple regression model, the value of Fα (critical value of F) is based on an F distribution with p degrees of freedom in the numerator and n - p - 1 degrees of freedom in the denominator, where p = number of independent variables in the regression equation and n = sample size.
Slide 31

F Test for Overall Significance (Example Continued)
• Excel's ANOVA Output
ANOVA
              df    SS          MS          F           Significance F
Regression     2    500.3285    250.1643    42.76013    2.32774E-07
Residual      17    99.45697    5.85041
Total         19    599.7855
Callouts: MSR = MS for Regression; MSE = MS for Residual; actual value of F = MSR / MSE.
Significance F is the p-value used to test for overall significance.
Slide 32

F Test for Overall Significance (Example Continued)
Test Statistic: F = MSR/MSE = 250.16/5.85 = 42.76 (this is from our Excel output)
Conclusion: Since p-value < .05, we can reject H0.
(Also, actual F = 42.76 > critical F = 3.59)
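The arithmetic behind the ANOVA table can be retraced in a few lines. This is a sketch that takes the SS and df values from the Excel output above as given. The p-value and critical value use a closed form of the F distribution that holds only when the numerator has exactly 2 degrees of freedom, as in this example; for other values of p a statistics library would be needed.

```python
# Sums of squares and degrees of freedom read from the Excel ANOVA output.
ss_regression, df_regression = 500.3285, 2     # p = 2 independent variables
ss_residual, df_residual = 99.45697, 17        # n - p - 1 = 20 - 2 - 1
ss_total = ss_regression + ss_residual         # SST = SSR + SSE

msr = ss_regression / df_regression            # mean square regression
mse = ss_residual / df_residual                # mean square error
f_stat = msr / mse                             # actual F

r_squared = ss_regression / ss_total           # coefficient of multiple determination
multiple_r = r_squared ** 0.5                  # multiple correlation coefficient

# Special case: with 2 numerator d.f. and d2 denominator d.f.,
# P(F > f) = (1 + 2f/d2) ** (-d2/2). This does NOT hold for other numerator d.f.
p_value = (1 + 2 * f_stat / df_residual) ** (-df_residual / 2)

# Inverting the same formula gives the critical value for a chosen alpha.
alpha = 0.05
f_critical = (alpha ** (-2 / df_residual) - 1) * df_residual / 2

print(f"MSR = {msr:.4f}, MSE = {mse:.5f}, F = {f_stat:.2f}")
print(f"R2 = {r_squared:.4f}, R = {multiple_r:.4f}")
print(f"p-value = {p_value:.3e}, critical F = {f_critical:.2f}")
print("Reject H0" if f_stat > f_critical else "Do not reject H0")
```

Both decision rules from the slide agree here: the p-value is far below .05 and the actual F exceeds the critical F.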
Slide 33

Testing for Significance: t Test
(Individual Variables)
Hypotheses
H0: βi = 0
Ha: βi ≠ 0
Test Statistic
t = bi / s_bi, where s_bi is the standard error of bi (this is from our Excel output)
Rejection Rule
Reject H0 if p-value < α or if actual t < -tα/2 or actual t > tα/2, where tα/2 is based on a t distribution with n - p - 1 degrees of freedom.
p = number of independent variables in the regression equation.
Slide 34

Testing for Significance: t Test
(Individual Variables Example)
Hypotheses
H0: βi = 0
Ha: βi ≠ 0
Rejection Rule
For α = .05 and d.f. = 17, t.025 = 2.11
Reject H0 if p-value < .05 or if |actual t| > critical t.025 = 2.11
Slide 35

Using SWTStat+ to Solve the Problem
(Results) Callouts: Actual t Values; Actual F Value
Note: results are rounded to two decimal places.
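The actual t values in the software output are not reproduced in this text, so the sketch below recomputes them from the data. It assumes the 20 observations are transcribed correctly and uses the standard least squares result that the standard error of bi is the square root of MSE times the i-th diagonal entry of the inverse of X'X; `invert` is a hypothetical helper, not part of Excel or SWTStat+.

```python
import math

# Programmer salary survey data as listed in the slides (salary in $1000s).
EXPER = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
SCORE = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
         88, 73, 75, 81, 74, 87, 79, 94, 70, 89]
SALARY = [24, 43, 23.7, 34.3, 35.8, 38, 22.2, 23.1, 30, 33,
          38, 26.6, 36.2, 31.6, 29, 34, 30.1, 33.9, 28.2, 30]

X = [[1.0, float(x1), float(x2)] for x1, x2 in zip(EXPER, SCORE)]
n, k = len(X), len(X[0])          # k = p + 1 columns (intercept included)


def invert(M):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    size = len(M)
    A = [list(map(float, row)) + [float(i == j) for j in range(size)]
         for i, row in enumerate(M)]
    for col in range(size):
        pivot_row = max(range(col, size), key=lambda r: abs(A[r][col]))
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(size):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [row[size:] for row in A]


xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * y for r, y in zip(X, SALARY)) for i in range(k)]
xtx_inv = invert(xtx)

# Least squares estimates b = (X'X)^-1 X'y.
b = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
residuals = [y - sum(bi * xi for bi, xi in zip(b, row))
             for row, y in zip(X, SALARY)]
mse = sum(e * e for e in residuals) / (n - k)       # SSE / (n - p - 1)

std_errors = [math.sqrt(mse * xtx_inv[i][i]) for i in range(k)]
t_stats = [bi / se for bi, se in zip(b, std_errors)]

T_CRITICAL = 2.11    # t.025 with 17 d.f., as given on the slide
for name, bi, se, t in zip(["intercept", "EXPER", "SCORE"], b, std_errors, t_stats):
    verdict = "significant" if abs(t) > T_CRITICAL else "not significant"
    print(f"{name:9s} b = {bi:7.3f}  s_b = {se:6.3f}  t = {t:6.2f}  ({verdict})")
```

The slide's hypotheses concern the slope coefficients β1 and β2; the intercept row is printed only for completeness.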
Slide 36

Using the Regression Equation for Estimation and Prediction
• The procedures for estimating the mean value of y and predicting an individual value of y in multiple regression are similar to those in simple regression.
• We substitute the given values of x1, x2, . . . , xp into the estimated regression equation and use the corresponding value of ŷ as the point estimate.
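As a small illustration of the substitution step, the sketch below plugs one observation into the estimated equation from the Excel output. The choice of the first programmer in the sample (4 years of experience, score 78, observed salary 24) is only for illustration, and `estimated_salary` is a hypothetical function name.

```python
def estimated_salary(exper, score):
    """Point estimate in $1000s from the rounded Excel coefficients."""
    return 3.174 + 1.404 * exper + 0.251 * score


# First programmer in the sample: 4 years of experience, aptitude score 78.
y_hat = estimated_salary(4, 78)
observed = 24.0
residual = observed - y_hat      # observed minus estimated salary

print(f"point estimate = {y_hat:.3f} ($1000s), residual = {residual:.3f}")
```

The same number serves as the point estimate of the mean salary for all programmers with these values and as the point prediction for one individual; the two cases differ only in the width of the interval built around it.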
Slide 37

Qualitative Independent Variables
• In many situations we must work with qualitative independent variables such as gender (male, female), method of payment (cash, check, credit card), etc.
• For example, x2 might represent gender, where x2 = 0 indicates male and x2 = 1 indicates female.
• In this case, x2 is called a dummy or indicator variable.
Slide 38

Qualitative Independent Variables
• Example (Continued): Programmer Salary Survey
As an extension of the problem involving the computer programmer salary survey, suppose that management also believes that the annual salary is related to whether the individual has a graduate degree in computer science or information systems.
The years of experience, the score on the programmer aptitude test, whether the individual has a relevant graduate degree, and the annual salary ($1000) for each of the sampled 20 programmers are shown on the next slide.
Slide 39

Qualitative Independent Variables
(Example Continued)
Exper.  Score  Degr.  Salary      Exper.  Score  Degr.  Salary
   4      78    No     24            9      88    Yes    38
   7     100    Yes    43            2      73    No     26.6
   1      86    No     23.7         10      75    Yes    36.2
   5      82    Yes    34.3          5      81    No     31.6
   8      86    Yes    35.8          6      74    No     29
  10      84    Yes    38            8      87    Yes    34
   0      75    No     22.2          4      79    No     30.1
   1      80    No     23.1          6      94    Yes    33.9
   6      83    No     30            3      70    No     28.2
   6      91    Yes    33            3      89    No     30
Slide 40

Qualitative Independent Variables
(Example Continued)
ŷ = b0 + b1x1 + b2x2 + b3x3
where:
ŷ = annual salary ($1000)
x1 = years of experience
x2 = score on programmer aptitude test
x3 = 0 if individual does not have a graduate degree
     1 if individual does have a graduate degree
x3 is a dummy variable
Slide 41

Using SWTStat+ to Solve the Problem
(Creating Data Area)
Slide 42

Using SWTStat+ to Solve the Problem
Slide 43

Using SWTStat+ to Solve the Problem
(Results) Callouts: Variance Inflation Factor; Multiple correlation coefficient R
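Because the software screens are not reproduced in this text, the following sketch shows how the model with the dummy variable could be fitted by hand. It assumes the data table is transcribed correctly, codes the graduate degree as x3 = 1 for Yes and 0 for No as defined above, and uses a hypothetical helper `least_squares` that solves the normal equations; it is not the routine used by SWTStat+ or Excel.

```python
# Programmer salary survey data with the graduate-degree column (salary in $1000s).
EXPER = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
SCORE = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
         88, 73, 75, 81, 74, 87, 79, 94, 70, 89]
DEGREE = ["No", "Yes", "No", "Yes", "Yes", "Yes", "No", "No", "No", "Yes",
          "Yes", "No", "Yes", "No", "No", "Yes", "No", "Yes", "No", "No"]
SALARY = [24, 43, 23.7, 34.3, 35.8, 38, 22.2, 23.1, 30, 33,
          38, 26.6, 36.2, 31.6, 29, 34, 30.1, 33.9, 28.2, 30]

# Dummy coding: 1 if the individual has a graduate degree, 0 otherwise.
x3 = [1 if d == "Yes" else 0 for d in DEGREE]


def least_squares(predictor_rows, y):
    """Return [b0, b1, ..., bp] solving the normal equations (X'X) b = X'y."""
    X = [[1.0] + [float(v) for v in row] for row in predictor_rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot_row = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


b0, b1, b2, b3 = least_squares(list(zip(EXPER, SCORE, x3)), SALARY)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}, b3 = {b3:.2f}")

# For the same experience and score, the dummy shifts the estimate by b3:
# the two groups share slopes b1 and b2 and differ only in the intercept.
without_degree = b0 + b1 * 5 + b2 * 80 + b3 * 0
with_degree = b0 + b1 * 5 + b2 * 80 + b3 * 1
degree_effect = with_degree - without_degree
print(f"estimated difference for a graduate degree = {degree_effect:.2f} ($1000s)")
```

The experience and score values (5 and 80) in the comparison are arbitrary; the difference between the two estimates equals b3 for any choice.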
Slide 44

Variance Inflation Factor (VIF)
• The variance inflation factor (VIF) measures the impact of multicollinearity (MC) among the X's (i.e., the independent variables) in a regression model on the precision of estimation.
• In other words, multicollinearity can result in numerically unstable estimates of the regression coefficients (small changes in X can result in large changes to the estimated regression coefficients).
• The higher the VIF, the higher the variance of the estimate bi and the greater the chance of finding βi insignificant.
• Typically a VIF value greater than 10 is of concern.
• VIFi = 1 / (1 - Ri2), where Ri is the multiple correlation coefficient obtained by regressing xi on the other independent variables. If Ri equals zero, then VIFi equals 1. This is the minimum value.
• There are a number of approaches to dealing with MC.
• One approach is to delete one or more of the independent variables from the regression equation.
Slide 45

Using SWTStat+ to Plot the Problem
Slide 46

Using SWTStat+ to Plot the Problem
Slide 47

Qualitative Independent Variables
(Example Continued)
• What salary would you estimate (predict) for a person with no graduate degree in IT who has 3 years of experience and a score of 76 on the programmer aptitude test?
ŷ = b0 + b1x1 + b2x2 + b3x3
From Excel output: ŷ = 7.94 + 1.15x1 + 0.2x2 + 2.28x3
ŷ = 7.94 + 1.15(3) + 0.2(76) + 2.28(0) = 26.59, and 26.59 × $1000 = $26,590
Slide 48

Using SWTStat+ to Predict
Slide 49

Using SWTStat+ to Predict
(Results) $26,355 from the software compared to $26,590 from the hand calculation with rounded coefficients
Slide 50

Thinking Challenge Example
Develop a model for estimating heating oil used for a single family home in the month of January based on average temperature and amount of insulation in inches. (Student: solve this problem.)
Oil (Gal)   Temp (°F)   Insulation (in.)
 275.30        40           3
 363.80        27           3
 164.30        40          10
  40.80        73           6
  94.30        64           6
 230.90        34           6
 366.70         9           6
 300.60         8          10
 237.80        23          10
 121.40        63           3
  31.40        65          10
 203.50        41           6
 441.10        21           3
 323.00        38           3
  52.50        58          10
Slide 51

Thinking Challenge Example (Continued)
• Questions
1. Explain your regression coefficients.
2. Find R and R2 and explain your answers.
3. Do an overall hypothesis test using α = 0.05.
4. Do a single test for each of the regression coefficients using α = 0.05.
5. Predict the amount of heating oil used if the average temperature is 24 °F and the amount of insulation used is 4 inches.
6. Predict the amount of heating oil used if the average temperature is 75 °F and the amount of insulation used is 12 inches.
7. Construct a confidence interval using α = 0.05.
Slide 52

Thinking Challenge Example (Continued)
• Questions (Continued):
8. Find SST, SSE, SSR and explain your answers.
9. Find the Standard Error of Estimate and explain your answer.
10. Why do we use a regression model? Explain some applications of regression analysis.
Slide 53

FROM: My Family
To: All of You
Slide 54

Business Statistics
By now, you should be ready to move to your new house!!!!
THE END
Lari
Student
Slide 55

End of Chapter 14
Slide 56
Fall '09