Multiple Regression Part 1: Introduction
Sections 14.1-14.3

Multiple Regression

The data set consists of n observations (often called cases). Each case has a Y and two or more Xs: X1, X2, ..., Xk. Some Xs may be functions of other Xs: X3 = X1/X2, or X3 = X1*X1 (i.e., X1 squared).
Book's Example
n = 34 stores in a chain
Y = Monthly sales of the OmniPower bar
X1 = Price of the bar in cents
X2 = In-store promotion expenditures (signs, displays, coupons, etc.)
The study used three prices and three promotion levels.
Exploratory Tools

Multiple scatter plots: can use PhStat or Excel directly. Correlations: use the CORREL function several times. A correlation matrix requires the Data Analysis ToolPak.

OmniPower Sales
[Scatter plots: Sales versus Price (price roughly 50-100 cents) and Sales versus Promotion Budget (budget roughly 0-700), with Sales on the vertical axis.]

Correlation matrix:

            Sales     Price    Promotion
Sales       1        -0.7351   0.5351
Price                 1        0.0968
Promotion                      1
Quick impressions

The relationships appear sensible: sales increase as price drops or as promotion rises. Both variables look like useful predictors, with price (r = -.735) a stronger predictor than promotion (r = .535). There is very little correlation between price and promotion, which is good.

Estimation of Parameters
Use the least squares approach again:

Ŷ = b0 + b1X1 + b2X2 + ... + bkXk
Find (b0, b1, b2, ..., bk) to minimize the sum of squared residuals.

Two-variable regression: PhStat Output

Regression Statistics
  Multiple R          0.8705
  R Square            0.7577
  Adjusted R Square   0.7421
  Standard Error      638.0653
  Observations        34

ANOVA
              df    SS             MS             F        Significance F
  Regression   2    39472730.77    19736365.39    48.4771  0.0000
  Residual    31    12620946.67      407127.31
  Total       33    52093677.44

             Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
  Intercept    5837.5208        628.1502      9.2932    0.0000    4556.3999   7118.6416
  Price         -53.2173          6.8522     -7.7664    0.0000     -67.1925    -39.2421
  Promotion       3.6131          0.6852      5.2728    0.0000       2.2155     5.0106

Three Regressions

Price only: Sales = 7512 - 56.7 Price, R² = 54.0%. Because Price and Promotion are not very correlated, the R² when both are in the model is almost additive, and the slope coefficients are about the same across models.
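As a sketch of how a least-squares fit like this could be reproduced outside PhStat, the snippet below builds the design matrix and solves for (b0, b1, b2) with numpy. The data here are hypothetical stand-ins (a noiseless plane using coefficients close to the fitted equation), not the actual OmniPower observations.

```python
import numpy as np

# Hypothetical stand-in data: 34 stores, three price levels and three
# promotion levels, with sales generated from a plane with no noise,
# purely to illustrate the least-squares mechanics.
rng = np.random.default_rng(0)
n = 34
price = rng.choice([59.0, 79.0, 99.0], size=n)     # price in cents
promo = rng.choice([200.0, 400.0, 600.0], size=n)  # promotion budget
sales = 5837.5 - 53.2 * price + 3.6 * promo

# Design matrix: a column of 1s for the intercept, then X1 and X2.
X = np.column_stack([np.ones(n), price, promo])

# Least squares: choose b to minimize the sum of squared residuals.
b, residuals, rank, sv = np.linalg.lstsq(X, sales, rcond=None)
print(b)  # approximately [5837.5, -53.2, 3.6], since the data are noiseless
```

With real data the residuals would not be zero, but the same call returns the least-squares coefficients.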
Interpretation of bj values

Price effect, first equation:
Price effect, third equation:

Formulae are very complex
Few "first course" texts show the formulae, even for k = 2 (the simplest multiple regression). "Second course" texts show them in matrix notation. In practice, this is entirely a computer problem.
A big data set
Measuring the percentage of fat in pork bellies is apparently an expensive procedure. It is therefore important to determine whether this percentage can be predicted from other, more easily measured properties of the pork carcass. The data set PORKBELLY.XLS contains information on the following variables from a sample of 45 pork carcasses:

  BackFat    Three categories for the back fat layer (Thin, Medium, Thick)
  Muscle     A muscling score; the higher the score, the more muscle and hence less fat
  LoinEye    Loin eye area
  FatDepth   The average of three measures of fat depth at the 10th rib
  LiveWt     Live weight of the carcass
  Yield      Weight of the slaughtered carcass
  SpecGrav   A measure used to determine specific gravity
  BllyDpth   Average of three determinations of depth of the belly
  Leanness   Average measure of leanness of three cross sections of the belly
  BellyWt    Total weight of the belly
  PctFAT     Percentage of fat

First two rows of the data:

  BackFat  Muscle  LoinEye  FatDepth  LiveWt  Yield  SpecGrav  BllyDpth  Leanness  BellyWt  PctFAT
  Thin     10      5.50     1.00      215     164    4.21      1.47      12.00     11.10    46.0
  Thin     10      5.50     1.00      215     164    4.21      1.57      12.67     12.43    46.0
Comments

1. There are 9 continuous X variables. Even drawing graphs is a lot of work.
2. One variable (BackFat) is categorical with values Thin, Medium, Thick.
3. We will need some new tools to sort things out.

14.2: Multiple regression output
Things that are pretty much the same as in simple regression:

1. R²
2. SYX (the standard error of the estimate)
3. t-tests and intervals for the bj values

New or different
1. Adjusted R²
2. F-test from the ANOVA table
3. Special types of Xs will require different interpretations
4. Some new output statistics

Adjusted R²

In essence, it "penalizes" or adjusts the regular R² for the number of variables used in the model. For example, a 2-X model with R² = .80 would probably have a higher adjusted R² than a 6-X model with R² = .82. If so, you would think the extra complexity wasn't "worth it".
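To make that "worth it" comparison concrete, here is a quick check of the 2-X versus 6-X example; the sample size n = 34 is assumed purely for illustration (the original comparison does not specify one).

```python
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical comparison with an assumed n = 34:
adj_2x = adjusted_r2(0.80, n=34, k=2)  # 2-X model, R^2 = .80
adj_6x = adjusted_r2(0.82, n=34, k=6)  # 6-X model, R^2 = .82
print(round(adj_2x, 4), round(adj_6x, 4))  # the simpler model adjusts higher
```

At this sample size the 2-X model's adjusted R² (about .787) does beat the 6-X model's (about .780), so the extra four variables would not look "worth it".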
Adjusted R² formula (page 534)

r²adj = 1 - (1 - R²)(n - 1)/(n - k - 1)

OmniPower: n = 34, k = 2, R² = .7577
Adjusted R² = 1 - (1 - .7577)(33/31) = .7421
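The arithmetic for the OmniPower numbers can be checked directly:

```python
# OmniPower: n = 34 observations, k = 2 predictors, R^2 = .7577
n, k, r2 = 34, 2, 0.7577
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 4))  # 0.7421, matching the PhStat output
```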
Regression on 9 numeric predictors (PorkBelly.xls data)

Regression Statistics
  Multiple R          0.8841
  R Square            0.7816
  Adjusted R Square   0.7254
  Standard Error      2.5959
  Observations        45

ANOVA
              df    SS           MS        F         Significance F
  Regression   9     843.9881    93.7765   13.9158   0.0000
  Residual    35     235.8599     6.7389
  Total       44    1079.8480

             Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
  Intercept     39.5222        16.9797       2.3276    0.0258      5.0516     73.9928
  Muscle        -0.6180         0.3100      -1.9939    0.0540     -1.2473      0.0112
  LoinEye       -1.5859         1.7535      -0.9044    0.3720     -5.1457      1.9740
  FatDepth       3.0111         2.9164       1.0325    0.3089     -2.9094      8.9316
  LiveWt         0.0305         0.1189       0.2565    0.7990     -0.2109      0.2719
  Yield          0.1675         0.1381       1.2130    0.2333     -0.1128      0.4477
  SpecGrav      -3.3317         2.7291      -1.2208    0.2303     -8.8720      2.2086
  BllyDpth      -2.9537         3.6726      -0.8043    0.4267    -10.4094      4.5019
  Leanness      -0.4098         0.2943      -1.3924    0.1726     -1.0074      0.1877
  BellyWt        0.6201         0.4721       1.3134    0.1976     -0.3383      1.5785

Comments

R² = 78.2%, and the adjusted R² is about 5.5% lower. This may indicate that there are more variables in the equation than we really need. If you look at the t-ratios you see many that are not significant. We will need to "trim the fat".

Whole model test (page 535)

This is a test for the significance of the entire model (all of the Xs working together). Although there are other ways to formulate it, here we look at the SSR and SSE in the ANOVA table. It is an F test similar to the one we had in ANOVA problems.
F-test hypotheses

H0: β1 = β2 = ... = βk = 0   (none of the Xs help explain Y)
H1: not all βj are 0          (at least one X is useful)

Test statistic: F = MSR/MSE from the ANOVA table
Results for OmniPower

ANOVA
              df    SS             MS             F        Significance F
  Regression   2    39472730.77    19736365.39    48.4771  0.0000
  Residual    31    12620946.67      407127.31
  Total       33    52093677.44

We would use an F distribution with 2 numerator and 31 denominator degrees of freedom. From table E.5, the critical value is 3.32.

F = MSR/MSE = 19736365.39 / 407127.31 = 48.4771
P-value = 0.0000
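The same numbers fall out of the ANOVA sums of squares directly:

```python
# OmniPower ANOVA: F = MSR/MSE
ssr, sse = 39472730.77, 12620946.67
df_reg, df_res = 2, 31
msr = ssr / df_reg   # 19736365.39
mse = sse / df_res   # 407127.31
f = msr / mse
print(round(f, 4))  # about 48.4771, far beyond the critical value 3.32
```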
How about for the piggies?

Degrees of freedom: df = (9, 35)
F-stat = MSR/MSE = 93.7765 / 6.7389 = 13.9158
With Significance F = 0.0000 in the output, F is well beyond the critical value.

Alternative form of F test
The test can also be stated in terms of R². Explained variation = R² (has k d.f.); unexplained variation = 1 - R² (has n - k - 1 d.f.).

F = (R²/k) / ((1 - R²)/(n - k - 1))

Another example
Listed below are some of the variables used to predict the crime rate in n = 141 standard metropolitan statistical areas (SMSAs).

  CrimeRate   Crimes per 1000 SMSA residents
  CntCityPct  % of residents living in the SMSA's central city
  Doctors     Number of professionally active physicians
  HospBeds    Total number of hospital beds in the SMSA
  PctHS       Percentage of the SMSA adult population who finished high school
Partial output

  Variable     Estimate      Std. Error    t Value   p Value
  Constant     21.9668       8.23751       2.67      0.0086
  CntCityPct    0.134932     0.0615847     2.19      0.0302
  Doctors       0.00489209   0.0015708     3.11      0.0022
  PctHS         0.522265     0.142322      3.67      0.0003
  HospBeds      0.00145406   0.000549731   2.65      0.0091

R-squared = 24.50%; standard error of the estimate = 12.7242.

What is the F-ratio? What is the adjusted R-square?
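One way to answer the two questions is with the R²-based form of the F statistic and the adjusted-R² formula, using n = 141 and k = 4 predictors:

```python
# Crime-rate model: R^2 = 0.2450, n = 141 SMSAs, k = 4 predictors
n, k, r2 = 141, 4, 0.2450
f = (r2 / k) / ((1 - r2) / (n - k - 1))
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(f, 2), round(adj_r2, 4))  # about 11.03 and 0.2228
```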
Spring '08, Thompson. Topics: Linear Regression, Regression Analysis.