Multiple Regression Part 2: Inference
Sections 14.4 and (kind of) 14.5

Book's Example
n = 34 stores in a chain
Y = Monthly sales of the OmniPower bar
X1 = Price of the bar in cents
X2 = In-store promotion expenditures (signs, displays, coupons, etc.)
Used three prices and three promo levels

Two-variable regression
PHStat Output

Regression Statistics
Multiple R          0.8705
R Square            0.7577
Adjusted R Square   0.7421
Standard Error      638.0653
Observations        34

ANOVA
            df   SS            MS            F        Significance F
Regression  2    39472730.77   19736365.39   48.4771  0.0000
Residual    31   12620946.67   407127.31
Total       33   52093677.44

            Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept   5837.5208     628.1502        9.2932   0.0000   4556.3999  7118.6416
Price       -53.2173      6.8522          -7.7664  0.0000   -67.1925   -39.2421
Promotion   3.6131        0.6852          5.2728   0.0000   2.2155     5.0106

Tests on the βj Values
H0: βj = 0 (Xj does not help explain Y over and above the other Xs)
H1: βj ≠ 0 (Xj contains predictive information beyond that of the other Xs)

OmniPower Example
Yhat = 5837.5 - 53.2173 Price + 3.6131 Promo
               (6.8522)        (0.6852)
With n = 34 and k = 2, df = n - k - 1 = 31. Use t = 2.0395.

Do the estimates make sense?
Yhat = 5837.5 - 53.2173 Price + 3.6131 Promo
The coefficient estimates seem to.
As price increases, ______________.
As promotion expenditure increases, _____________.
The correlations agree:

            Sales     Price     Promotion
Sales       1         -0.7351   0.5351
Price                 1         -0.0968
Promotion                       1
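
Going back to the t tests on the OmniPower coefficients, the following is a minimal Python sketch of the arithmetic, not PHStat's own procedure. The estimates, standard errors and the critical value 2.0395 are copied from the slides; the variable names are made up for illustration.

```python
# Coefficient estimates and standard errors copied from the PHStat output.
estimates = {
    "Price": (-53.2173, 6.8522),
    "Promotion": (3.6131, 0.6852),
}
t_critical = 2.0395  # two-tailed, alpha = .05, df = 34 - 2 - 1 = 31 (from the slide)

t_stats = {}
rejects = {}
for name, (b, se) in estimates.items():
    t = b / se                          # test statistic for H0: beta_j = 0
    t_stats[name] = t
    rejects[name] = abs(t) > t_critical  # two-tailed decision rule
    print(f"{name}: t = {t:.4f}, reject H0: {rejects[name]}")
```

Both |t| values are far beyond 2.0395, which matches the near-zero P-values in the output.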

This isn't always the case
Here, the correlation between X1 and X2 was under 0.10 in magnitude, so the two Xs were more or less independent of each other.
With correlated Xs, the coefficient estimates can get mixed up, even to the point of having the wrong sign.
Significance may be affected.

Our large data set
          Muscle  LoinEye  FatDepth  LiveWt  Yield  SpecGrav  BllyDpth  Leanness  BellyWt  PctFAT
Muscle    1.000
LoinEye   0.187   1.000
FatDepth  0.154   0.758    1.000
LiveWt    0.608   0.049    0.063     1.000
Yield     0.566   0.025    0.127     0.901   1.000
SpecGrav  0.094   0.861    0.833     0.076   0.122  1.000
BllyDpth  0.083   0.459    0.397     0.190   0.282  0.321     1.000
Leanness  0.004   0.719    0.655     0.186   0.158  0.704     0.355     1.000
BellyWt   0.523   0.085    0.031     0.563   0.571  0.000     0.004     0.043     1.000
PctFAT    0.081   0.758    0.772     0.272   0.279  0.722     0.318     0.699     0.252    1.000

We appear to have at least four strong predictors of PctFAT.

T-tests in PorkBelly data
            Coefficients  Standard Error  t Stat   P-value
Intercept   39.5222       16.9797         2.3276   0.0258
Muscle      0.6180        0.3100          1.9939   0.0540
LoinEye     1.5859        1.7535          0.9044   0.3720
FatDepth    3.0111        2.9164          1.0325   0.3089
LiveWt      0.0305        0.1189          0.2565   0.7990
Yield       0.1675        0.1381          1.2130   0.2333
SpecGrav    3.3317        2.7291          1.2208   0.2303
BllyDpth    2.9537        3.6726          0.8043   0.4267
Leanness    0.4098        0.2943          1.3924   0.1726
BellyWt     0.6201        0.4721          1.3134   0.1976

Even though R² is 78%, no individual slope is significant at the .05 level.

Significance of Individual Xs
Once the model F-test shows overall significance, we can then use t-tests on the individual Xs.
Do we need them all? Can we eliminate some and simplify?

Use Tests One at a Time
The tests should be used one at a time.
T1 can tell you to drop X1 and keep X2-X9.
T2 can tell you to drop X2 and keep X1 and X3-X9.
Together, they don't necessarily tell you to drop both and keep only X3-X9.

Interval estimation
Interval for one of the slope coefficients:
bj ± t(n-k-1) · S(bj)
Shows the estimated effect on Y of a 1-unit increase in Xj, assuming all other Xs stay the same.
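
As a rough illustration of the interval formula, here is a small Python sketch that plugs in the Promotion coefficient from the OmniPower output. The numbers are copied from the slides; the variable names are only illustrative.

```python
b_promo = 3.6131      # estimated Promotion coefficient (PHStat output)
se_promo = 0.6852     # its standard error, S(bj)
t_critical = 2.0395   # t value for 95% confidence with df = n - k - 1 = 31

half_width = t_critical * se_promo   # t * S(bj)
lower = b_promo - half_width
upper = b_promo + half_width
print(f"95% interval for the promotion effect: {lower:.4f} to {upper:.4f}")
```

Up to rounding, this should reproduce the Lower 95% and Upper 95% columns that PHStat reports for Promotion (2.2155 and 5.0106).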

Interval for OmniPower promotion effect
Yhat = 5837.5 - 53.2173 Price + 3.6131 Promo
               (6.8522)        (0.6852)
With n = 34 and k = 2, df = n - k - 1 = 31. Use t = 2.0395.

Prediction
PHStat creates a new sheet to do prediction on.
To make a single prediction, fill in the values of the Xs for the prediction.
Gives both a prediction interval and a confidence interval.
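
For readers without PHStat, the point prediction can be sketched in a few lines of Python. This is only an illustration: it uses the rounded coefficients from the output, the X values from the prediction example (price = 75, promo = 300), and the two interval half widths copied from the PHStat prediction sheet rather than computed from the data. The function name `predict_sales` is made up.

```python
def predict_sales(price_cents, promo):
    """Point prediction from the fitted OmniPower equation (rounded coefficients)."""
    return 5837.5208 - 53.2173 * price_cents + 3.6131 * promo

y_hat = predict_sales(75, 300)

# Half widths copied from the PHStat prediction output, not derived here.
ci_half_width = 259.8325    # for the average Y at these X values
pi_half_width = 1327.029    # for one individual Y at these X values

conf_interval = (y_hat - ci_half_width, y_hat + ci_half_width)
pred_interval = (y_hat - pi_half_width, y_hat + pi_half_width)

print(f"Predicted Y: {y_hat:.1f}")
print(f"Confidence interval: {conf_interval[0]:.1f} to {conf_interval[1]:.1f}")
print(f"Prediction interval: {pred_interval[0]:.1f} to {pred_interval[1]:.1f}")
```

Because the coefficients are rounded to four decimals, the prediction differs from PHStat's 2930.138 in the second decimal place.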

Prediction example
Predict for price = 75 and promo = 300.
Fill in the blue area.

Prediction output
t Statistic                        2.039513
Predicted Y (YHat)                 2930.138

For Average Predicted Y (YHat)
Interval Half Width                259.8325
Confidence Interval Lower Limit    2670.305
Confidence Interval Upper Limit    3189.97

For Individual Response Y
Interval Half Width                1327.029
Prediction Interval Lower Limit    1603.109
Prediction Interval Upper Limit    4257.167

14.5 Testing Portions of a model
Sometimes we have a subset of our X variables that we want to treat as a group.
We feel they should all be in the model together or deleted together.
There is a partial F test you can use to determine "in" versus "out".

The book's approach
In 14.5 they talk about the "group" consisting of only one variable. I don't like this section.
In 14.6 they expand the approach to a group of variables that measure the same type of effects.
We will look at a group that is more ad hoc.

Our "Group"
          Muscle  LoinEye  FatDepth  LiveWt  Yield  SpecGrav  BllyDpth  Leanness  BellyWt  PctFAT
Muscle    1.000
LoinEye   0.187   1.000
FatDepth  0.154   0.758    1.000
LiveWt    0.608   0.049    0.063     1.000
Yield     0.566   0.025    0.127     0.901   1.000
SpecGrav  0.094   0.861    0.833     0.076   0.122  1.000
BllyDpth  0.083   0.459    0.397     0.190   0.282  0.321     1.000
Leanness  0.004   0.719    0.655     0.186   0.158  0.704     0.355     1.000
BellyWt   0.523   0.085    0.031     0.563   0.571  0.000     0.004     0.043     1.000
PctFAT    0.081   0.758    0.772     0.272   0.279  0.722     0.318     0.699     0.252    1.000

Four Xs have a strong correlation with Y.
Do we need any of the other five variables? (Muscle, LiveWt, Yield, BllyDpth and BellyWt)

The strategy
If we wanted to drop all 5 Xs together, we would need to test for their contribution as a group.
1. Run a regression with all 9 Xs.
2. Run a regression with just the remaining four predictors.
How much predictive power is lost? Does R² go down a lot?
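
To put a number on "does R² go down a lot?", here is a hedged Python sketch that compares the two fits. It assumes R² = SSR / SST and takes the sums of squares from the ANOVA tables of the two regression runs reported in these slides; the variable names are illustrative.

```python
sst = 1079.8480          # total sum of squares (the same for both models)
ssr_full = 843.9881      # regression SS with all 9 Xs
ssr_reduced = 747.9859   # regression SS with LoinEye, FatDepth, SpecGrav, Leanness

r2_full = ssr_full / sst
r2_reduced = ssr_reduced / sst
r2_drop = r2_full - r2_reduced

print(f"R2 full = {r2_full:.4f}, R2 reduced = {r2_reduced:.4f}, drop = {r2_drop:.4f}")
```

Whether a drop of this size matters is exactly what the partial F test is meant to decide; the raw difference alone does not settle it.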

Notation
Let SSR(list of Xs) denote the regression sum of squares with the listed group of variables in the model.
For example, SSR(X1, X4, X7) comes from a regression on X1, X4 and X7 only.

The "full" and "reduced" models
Define the full model to be the one with all 9 predictors. We need SSR(all variables).
Define the reduced model to be the one with just the 4 remaining variables. Get SSR(LoinEye, FatDepth, SpecGrav, Leanness).

Partial F Statistic
Look at the decrease in explanatory power, SSR(full model) - SSR(reduced model), to see how much of the explained variation is lost.
Divide this by g, the number of variables in the group we are eliminating.
Put this in ratio to the MSE of the full model.
This is the partial F statistic.

The Hypothesis Test
If we are proposing to delete the g variables, it implies this test:
H0: β1 = β2 = ... = βg = 0
H1: At least one of the g variables matters

Test Statistic
Partial F = {[SSR(full) - SSR(reduced)] / g} / MSE(full)
The theory (poof!) says this has an F distribution with g numerator and (n - k - 1) denominator degrees of freedom.

We need two regression runs
We already have the full model.
ANOVA
            df   SS          MS
Regression  9    843.9881    93.7765
Residual    35   235.8599    6.7389
Total       44   1079.8480

SSR(full model) = ________
MSE(full model) = ________

Getting the second is trickier
For multiple regression in PHStat or the template, the Xs have to be in consecutive columns.
We should make a copy of the X columns and delete the ones we don't want.
Keep only LoinEye, FatDepth, SpecGrav, and Leanness.

New model
ANOVA
            df   SS          MS         F
Regression  4    747.9859    186.9965   22.5391
Residual    40   331.8621    8.2966
Total       44   1079.8480

            Coefficients  Standard Error  t Stat   P-value
Intercept   55.5915       9.8165          5.6630   0.0000
LoinEye     2.7652        1.4325          1.9304   0.0607
FatDepth    7.0551        2.4876          2.8360   0.0071
SpecGrav    1.4108        2.2644          0.6231   0.5368
Leanness    0.5458        0.3033          1.7995   0.0795

SSR(LoinEye, FatDepth, SpecGrav, Leanness) = _____

The Partial F for 5 variables
H0: None of the 5 other variables are significant
H1: At least one in the group is useful
F = [(843.9881 - 747.9859) / 5] / 6.7389 ≈ 2.85
The correct F distribution to test against has 5 numerator and 35 denominator degrees of freedom.
From Table E.5, the critical value at a significance level of .05 is about 2.50.
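
The partial F calculation can also be redone from the two ANOVA tables. The sketch below is a minimal Python version under these assumptions: the sums of squares and MSE are the values printed in the slides, the cutoff 2.50 is the approximate Table E.5 value quoted above, and the function name `partial_f` is made up for illustration.

```python
def partial_f(ssr_full, ssr_reduced, g, mse_full):
    """Partial F statistic for dropping g variables from the full model."""
    return ((ssr_full - ssr_reduced) / g) / mse_full

ssr_full = 843.9881      # SSR with all 9 predictors
ssr_reduced = 747.9859   # SSR(LoinEye, FatDepth, SpecGrav, Leanness)
mse_full = 6.7389        # MSE of the full model (35 residual df)
g = 5                    # Muscle, LiveWt, Yield, BllyDpth, BellyWt
f_critical = 2.50        # approximate .05 cutoff for F(5, 35), from the slide

f_stat = partial_f(ssr_full, ssr_reduced, g, mse_full)
reject_h0 = f_stat > f_critical
print(f"Partial F = {f_stat:.3f}, reject H0: {reject_h0}")
```

Note that the denominator uses the MSE of the full model, not the reduced one, so the 35 denominator degrees of freedom come from the 9-variable run.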

Result
This indicates that dumping all 5 variables was too drastic a step.
At least one of them has significance and should be retained.
We will need some other techniques to find the "right" Xs in this data.
...
 Spring '08
 Thompson
 Regression Analysis, Statistical hypothesis testing, Prediction interval
