Business Statistics
Fifth Edition
Ken Black

Chapter 15: Multiple Regression Analysis

These notes are not to be used without the written permission of F. B. Alt.

15.1 The Multiple Regression Model [Black, page 594]

Example (with two predictors):
y = Sales revenue per region (tens of thousands of dollars)
x1 = advertising expenditures (thousands of dollars)
x2 = median household income (thousands of dollars)

The data are:

Region  Sales (y)  Adv Exp (x1)  Income (x2)
A       1          1             32
B       1          2             38
C       2          1             42
D       2          3             35
E       3          2             41
F       3          4             43
G       4          3             46
H       4          5             44
I       5          5             48
J       5          6             45

A graphical representation follows.

[Figure: 3D Scatterplot of Sales vs Income vs Adv Exp]

Objective: Fit a plane through the points.

• The True or Idealized Model:
  \[ E(y) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k \]
  or
  \[ y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon \]
  where \( \varepsilon \) is the error term in the true model.

• Interpretation of any \( \beta_j \)
  o Change in \( E(y) \) per unit change in \( x_j \), when all other independent variables are held constant.
  o \( \beta_j \) is called the partial slope, \( j = 1, 2, \ldots, k \).

Determining the Multiple Regression Equation (Fitted line)

• Criterion used to estimate the \( \beta \)'s:
  o Minimize the sum of squared residuals.
• Use software to do the calculations.

Example (y = Sales, x1 = Advertising Expenditures, x2 = Median Household Income):
The Minitab output follows.

Regression Analysis: Sales versus Adv Exp and Income

The regression equation is
Sales = -5.09 + 0.416 Adv Exp + 0.163 Income

Predictor  Coef     SE Coef  T      P      VIF
Constant   -5.091   1.720    -2.96  0.021
Adv Exp    0.4158   0.1367   3.04   0.019  1.8
Income     0.1633   0.0475   3.44   0.011  1.8

S = 0.541142   R-Sq = 89.8%   R-Sq(adj) = 86.8%

Analysis of Variance
Source          DF  SS       MS      F      P
Regression      2   17.9502  8.9751  30.65  0.000
Residual Error  7   2.0498   0.2928
Total           9   20.0000

The fitted model is:
  \[ \hat{y} = -5.09 + 0.416 x_1 + 0.163 x_2 \]
  vs.
  \[ \hat{y} = 0.681 + 0.725 x_1 \quad \text{and} \quad \hat{y} = -7.69 + 0.258 x_2 \]

• The coefficient of an independent variable \( x_j \) in a multiple regression equation does not, in general, equal the coefficient that would apply to that variable in a simple linear regression.
  o In multiple regression, the coefficient refers to the effect of changing that \( x_j \) variable while the other independent variables stay constant.
  o In simple linear regression, all other potential independent variables are ignored.

Example (y = Sales, x1 = Advertising Expenditures, x2 = Median Household Income):

Interpretation of 0.416: An additional unit of Advertising Expenditures (an increase of $1,000) leads to a 0.416-unit increase in Sales (about $4,160, since Sales is measured in tens of thousands of dollars) when Median Household Income is held fixed, i.e., regardless of whether x2 is 32 or 48.

Does this seem reasonable? If Advertising Expenditures are increased by 1 unit, do you expect Sales to increase by 0.416 units in a region that has income of $32,000 as well as in a region that has income of $48,000?

15.2 Inferences in Multiple Regression [Black, page 601]

• Objective: Build a parsimonious model: as few predictors as necessary.
• Must now assume that the errors in the true model are normally distributed.
• F test for Overall Model [Black, page 602]
  o True or Idealized Model: \( E(y) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k \)
    \( H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0 \) vs. \( H_a \): at least one \( \beta_j \ne 0 \)
  o Test Statistic:
    \[ F = \frac{MS(\text{Regression})}{MS(\text{Residual Error})} = \frac{SS(\text{Regression})/k}{SS(\text{Residual Error})/(n-k-1)} \]
  o Concept: If SS(Regression) is large relative to SS(Residual Error), this indicates there is real predictive value in [some of] the independent variables.
  o Decision Rule: Reject \( H_0 \) if \( F > F_{\alpha,\,k,\,n-k-1} \) or if p-value < \( \alpha \).

Example (Sales vs. Adv. Exp. and Income):
True or Idealized Population Model: \( E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \)

The Minitab output follows:

Analysis of Variance
Source          DF  SS       MS      F      P
Regression      2   17.9502  8.9751  30.65  0.000
Residual Error  7   2.0498   0.2928
Total           9   20.0000
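
The numbers in this ANOVA table can be recomputed from the ten data points listed in Section 15.1. The following is a minimal sketch in plain Python, not Minitab's own procedure: it solves the two-predictor normal equations directly on deviations from the means. The function and variable names (`cross_dev`, `regression_summary`, `summary`) are illustrative assumptions, not part of the notes.

```python
import math

# Data from Section 15.1 (Sales in $10,000s; Adv Exp and Income in $1,000s).
sales   = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
adv_exp = [1, 2, 1, 3, 2, 4, 3, 5, 5, 6]
income  = [32, 38, 42, 35, 41, 43, 46, 44, 48, 45]


def mean(values):
    return sum(values) / len(values)


def cross_dev(u, v):
    """Sum of cross-products of deviations of u and v from their means."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))


def regression_summary(y, x1, x2):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 (hypothetical helper)."""
    n, k = len(y), 2
    s11, s22, s12 = cross_dev(x1, x1), cross_dev(x2, x2), cross_dev(x1, x2)
    s1y, s2y = cross_dev(x1, y), cross_dev(x2, y)
    det = s11 * s22 - s12 ** 2

    # Solve the 2x2 normal equations for the partial slopes.
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)

    # Sums of squares for the ANOVA table.
    fitted = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = cross_dev(y, y)
    ssr = sst - sse
    mse = sse / (n - k - 1)
    f_stat = (ssr / k) / mse

    # Standard errors of the slopes and the ratios T = Coef / SE Coef.
    se_b1 = math.sqrt(mse * s22 / det)
    se_b2 = math.sqrt(mse * s11 / det)
    return {
        "b0": b0, "b1": b1, "b2": b2,
        "ssr": ssr, "sse": sse, "sst": sst, "f": f_stat,
        "t1": b1 / se_b1, "t2": b2 / se_b2,
    }


summary = regression_summary(sales, adv_exp, income)
print(f"Sales = {summary['b0']:.3f} + {summary['b1']:.4f} Adv Exp "
      f"+ {summary['b2']:.4f} Income")
print(f"SSR = {summary['ssr']:.4f}  SSE = {summary['sse']:.4f}  "
      f"SST = {summary['sst']:.4f}  F = {summary['f']:.2f}")
print(f"T(Adv Exp) = {summary['t1']:.2f}  T(Income) = {summary['t2']:.2f}")
```

The closed-form 2x2 solution is used only because this example has exactly two predictors; with more predictors a general linear-algebra routine would be the natural choice.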

Test \( H_0: \beta_1 = \beta_2 = 0 \) vs. \( H_a \): at least one \( \beta_j \ne 0 \), at the 5% level.

Since F = 30.65 > \( F_{.05,2,7} \) = 4.74, reject \( H_0: \beta_1 = \beta_2 = 0 \) at the 5% level. Or, since p-value = 0.000 < .05, reject \( H_0: \beta_1 = \beta_2 = 0 \) at the 5% level.

Implication: At least one of the x's has some predictive power.

• t-test for Significance of an Individual Predictor [Black, page 603]
  o \( H_0: \beta_j = 0 \) vs. \( H_a: \beta_j \ne 0 \), \( j = 1, 2, \ldots, k \)
  o \( H_0 \) implies that \( x_j \) has no additional predictive value as the last predictor added to a model that contains all the other predictors:
    \[ E(y) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k \]
  o In Minitab notation, T = (Coef - 0) / (SE Coef)
  o Decision Rule: Reject \( H_0 \) if \( |T| > t_{\alpha/2,\,n-k-1} \) (that is, \( t_{.025,\,n-k-1} \) when \( \alpha = .05 \)) or if p-value < \( \alpha \).
  o Warning: Limit the number of t-tests to avoid a high overall Type I error rate.

Example (Sales vs. Adv. Exp. and Income):

The Minitab output follows:

Predictor  Coef     SE Coef  T      P      VIF
Constant   -5.091   1.720    -2.96  0.021
Adv Exp    0.4158   0.1367   3.04   0.019  1.8
Income     0.1633   0.0475   3.44   0.011  1.8

Test \( H_0: \beta_1 = 0 \) vs. \( H_a: \beta_1 \ne 0 \), at the 5% level.

Since T = 3.04 > \( t_{.025,7} \) = 2.365, reject \( H_0: \beta_1 = 0 \) at the 5% level. Or, since p-value = 0.019 < .05, reject \( H_0: \beta_1 = 0 \) at the 5% level.

Implication: Advertising Expenditures provides predictive value to a model having Income as a predictor.

15.3 Residuals, Standard Error of the Estimate and Coefficient of Determination

• Residuals [Black, page 606]
  o Residuals are prediction errors in the sample.
  o The residual for an observation is: \( y_i - \hat{y}_i \)

Example (Sales vs. Adv. Exp. and Income):

The fitted values and residuals for all regions follow.

Sales  Adv Exp  Income  FITS      RESI
1      1        32      0.702513   0.297487
1      2        38      2.041841  -1.04184
2      1        42      2.211894  -0.21189
2      3        35      2.022726  -0.02273
3      2        41      2.494655   0.505345
3      4        43      3.663929  -0.66393
4      3        41      2.928354   1.071646
4      5        44      4.248566  -0.24857
5      5        48      4.852319   0.147681
5      6        45      4.833204   0.166796

• How much variation is there in the residuals? [Black, pages 607-608]
  o Terminology for \( s_e \):
    ▪ Standard error of the estimate
    ▪ Or, residual standard deviation
• Like any other standard deviation, the residual standard deviation may be interpreted by the Empirical Rule. About 95% of the prediction errors will fall within +/- 2 (standard deviations) of the mean error (always 0).

Example (Sales vs. Adv. Exp. and Income):

• A residual standard deviation of 0.541 means that about 95% of the prediction errors will fall within +/- 2(0.541) = +/- 1.082.
• Did this occur? (See the residuals listed above.)

• Coefficient of Determination
  o Notation: \( R^2 \)
  o As in simple regression,
    \[ R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \]

Example (Sales vs. Adv. Exp. and Income):

State the value of the coefficient of determination.
From the Minitab output in Section 15.1, "R-Sq = 89.8%".
Interpretation: 89.8% of the variation in Sales is explained by a multiple regression model with Adv. Exp. and Income as predictors.

• Adjusted Coefficient of Determination
    \[ R_a^2 = 1 - \frac{SSE/(n-(k+1))}{SST/(n-1)} = 1 - \left( \frac{n-1}{n-(k+1)} \right) \frac{SSE}{SST} \]
  o SSE and SST are each divided by their degrees of freedom.
  o It is always true that \( R_a^2 < R^2 \) (provided the model has at least one predictor and \( R^2 < 1 \)).

• Why use \( R_a^2 \)? [Black, page 609]
  "As additional independent variables are added to a regression model, the value of \( R^2 \) cannot decrease, and in most cases, it will increase."
  o SST is fixed, regardless of the number of predictors.
  o SSE cannot increase, and usually decreases, when more predictors are used.
  o ⇒ \( R^2 \) cannot decrease, and usually increases, when more predictors are used.
• However, \( R_a^2 \) can decrease when another predictor is added to the fitted model, even though \( R^2 \) increases.
• The following example illustrates this.

Example (Hypothetical):
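For a fitted model with 10 observations, suppose SST = 50.
Also suppose when k = 2, SSE = 5 and when k = 3, SSE = 4.5.

k    \( R^2 \)                          \( R_a^2 \)
2    \( 1 - \frac{5}{50} = .90 \)       \( 1 - \left(\frac{9}{7}\right)\left(\frac{5}{50}\right) = .871 \)
3    \( 1 - \frac{4.5}{50} = .91 \)     \( 1 - \left(\frac{9}{6}\right)\left(\frac{4.5}{50}\right) = .865 \)

Even though there has been a modest increase in \( R^2 \), \( R_a^2 \) has decreased.

Example (Sales vs. Adv. Exp. and Income):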
S = 0.541142   R-Sq = 89.8%   R-Sq(adj) = 86.8%

Summary of Chapter 15

• The Multiple Linear Regression Model
• Interpreting the slope coefficient of a single predictor in multiple regression
• Using the F statistic to test the overall utility of the predictors
• Using the t-test to test the additional value of a single predictor
• The interpretation of the t-test
• The adjusted coefficient of determination and \( R^2 \)
• The detection of multicollinearity and its impact
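
The comparison of \( R^2 \) and \( R_a^2 \) in Section 15.3 can be checked numerically. The sketch below codes the two formulas from that section and applies them to the hypothetical example (n = 10, SST = 50) and to the SSE and SST reported in the Sales ANOVA table. The function names `r_squared` and `adjusted_r_squared` are illustrative assumptions, not part of the notes.

```python
def r_squared(sse, sst):
    """Coefficient of determination: R^2 = 1 - SSE/SST."""
    return 1 - sse / sst


def adjusted_r_squared(sse, sst, n, k):
    """Adjusted R^2: SSE and SST are each divided by their degrees of freedom."""
    return 1 - ((n - 1) / (n - (k + 1))) * (sse / sst)


# Hypothetical example: 10 observations, SST = 50.
n, sst = 10, 50.0
for k, sse in [(2, 5.0), (3, 4.5)]:
    print(f"k = {k}: R^2 = {r_squared(sse, sst):.3f}, "
          f"adjusted R^2 = {adjusted_r_squared(sse, sst, n, k):.3f}")

# Sales example: SSE and SST taken from the Minitab ANOVA table (n = 10, k = 2).
sales_r2 = r_squared(2.0498, 20.0)
sales_r2_adj = adjusted_r_squared(2.0498, 20.0, 10, 2)
print(f"Sales: R^2 = {sales_r2:.3f}, adjusted R^2 = {sales_r2_adj:.3f}")
```

Adding the third predictor raises \( R^2 \) from .90 to .91 while the adjusted value falls from .871 to .865, because the drop in SSE is too small to offset the lost degree of freedom.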