Multiple Regression Part 1: Introduction (Sections 14.1-14.3)

Multiple Regression
- The data set consists of n observations (often called cases).
- Each case has a Y and two or more Xs: X1, X2, ..., Xk.
- Some Xs may be functions of other Xs, e.g. X3 = X1/X2 or X3 = X1*X1.

Book's Example
- n = 34 stores in a chain
- Y = monthly sales of the OmniPower bar
- X1 = price of the bar, in cents
- X2 = in-store promotion expenditures (signs, displays, coupons, etc.)
- Three prices and three promotion levels were used.

Exploratory Tools
- Multiple scatter plots: use PHStat or Excel directly.
- Correlations: use the CORREL function several times.
- Correlation matrix (requires the Data Analysis ToolPak).

OmniPower Sales
[Scatter plots: Sales versus Price and Sales versus Promotion Budget]

Correlation matrix:
             Sales     Price     Promotion
  Sales      1         -0.7351    0.5351
  Price                 1        -0.0968
  Promotion                       1

Quick Impressions
- The relationships appear sensible: sales increase as price drops or as promotion rises.
- Both variables look like useful predictors, with price (r = -.735) stronger than promotion (r = .535).
- There is very little correlation between price and promotion, which is good.

Estimation of Parameters
- Use the least squares approach again:
    Y-hat = b0 + b1*X1 + b2*X2 + ... + bk*Xk
- Find (b0, b1, b2, ..., bk) to minimize the sum of squared residuals.

Two-Variable Regression: PHStat Output

Regression Statistics
  Multiple R          0.8705
  R Square            0.7577
  Adjusted R Square   0.7421
  Standard Error      638.0653
  Observations        34

ANOVA
              df   SS            MS            F         Significance F
  Regression   2   39472730.77   19736365.39   48.4771   0.0000
  Residual    31   12620946.67     407127.31
  Total       33   52093677.44

              Coefficients   Std Error   t Stat    P-value   Lower 95%   Upper 95%
  Intercept   5837.5208      628.1502     9.2932   0.0000    4556.3999   7118.6416
  Price        -53.2173        6.8522    -7.7664   0.0000     -67.1925    -39.2421
  Promotion      3.6131        0.6852     5.2728   0.0000       2.2155      5.0106

Three Regressions
- With Price alone: Sales-hat = 7512 - 56.7 Price, R² = 54.0%.
- With both Xs in the model, R² rises to 75.8% (the two-variable output above).
- Because Price and Promotion are not very correlated, the models are almost additive, and the slope coefficients are about the same across models.

Interpretation of bj Values
- Price effect, first equation:
- Price effect, third equation:

Formulae Are Very Complex
- Few "first course" texts show the formulae, even for k = 2 (the simplest of multiple regressions).
- "Second course" texts show them in matrix notation.
- This is entirely a computer problem.

A Big Data Set
Measuring the percentage of fat in pork bellies is apparently an expensive procedure. It is therefore important to determine whether this percentage can be predicted from other, more easily measured properties of the pork carcass. The data set PORKBELLY.XLS contains information on the following variables from a sample of 45 pork carcasses:

  BackFat    Three categories for the back fat layer
  Muscle     A muscling score; the higher the score, the more muscle and hence less fat
  LoinEye    Loin eye area
  FatDepth   The average of three measures of fat depth at the 10th rib
  LiveWt     Live weight of the carcass
  Yield      Weight of the slaughtered carcass
  SpecGrav   A measure used to determine specific gravity
  BllyDpth   Average of three determinations of depth of the belly
  Leanness   Average measure of leanness of three cross sections of the belly
  BellyWt    Total weight of the belly
  PctFAT     Percentage of fat

First two rows of the data:

  BackFat  Muscle  LoinEye  FatDepth  LiveWt  Yield  SpecGrav  BllyDpth  Leanness  BellyWt  PctFAT
  Thin     10      5.50     1.00      215     164    4.21      1.47      12.00     11.10    46.0
  Thin     10      5.50     1.00      215     164    4.21      1.57      12.67     12.43    46.0

Comments
1. There are 9 continuous X variables; even drawing graphs is a lot of work.
2. One variable (BackFat) is categorical, with values Thin, Medium, Thick.
3. We will need some new tools to sort things out.

14.2: Multiple Regression Output
Things that are pretty much the same as in simple regression:
1. R²
2. S_YX (the standard error of the estimate)
3. t-tests and intervals for the bj values

New or Different
1. Adjusted R²
2. The F-test from the ANOVA table
3. Special types of Xs will require different interpretations
4. Some new output statistics

Adjusted R²
- In essence, it "penalizes" or adjusts the regular R² for the number of variables used in the model.
- For example, a 2-X model with R² = .80 would probably have a higher adjusted R² than a 6-X model with R² = .82. If so, you would think the extra complexity wasn't "worth it".
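The least-squares estimation described above (find b0, b1, ..., bk minimizing the sum of squared residuals) can be sketched in a few lines of Python. The data below are hypothetical stand-ins, not the textbook's OmniPower file: we simulate 34 stores from known (made-up) price and promotion effects, then recover the coefficients by least squares.

```python
import numpy as np

# Hypothetical stand-in data (NOT the textbook's OmniPower file):
# simulate 34 stores with an assumed negative price effect and
# positive promotion effect, plus noise.
rng = np.random.default_rng(0)
n = 34
price = rng.uniform(50, 100, n)    # price of the bar, in cents
promo = rng.uniform(100, 700, n)   # promotion expenditures
sales = 5800 - 53.0 * price + 3.6 * promo + rng.normal(0, 600, n)

# Design matrix: a column of 1s for the intercept b0, then the Xs.
X = np.column_stack([np.ones(n), price, promo])

# Solve min ||X b - y||^2 for b = (b0, b1, b2).
b, _, _, _ = np.linalg.lstsq(X, sales, rcond=None)
b0, b1, b2 = b
print(b0, b1, b2)   # b1 should come out negative, b2 positive
```

PHStat and Excel's Regression tool compute the same coefficients; the point is just that multiple regression is still "least squares", only with more columns in the design matrix.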
Adjusted R² Formula (page 534)

  adjusted R² = 1 - (1 - R²) * (n - 1) / (n - k - 1)

OmniPower: n = 34, k = 2, R² = .7577
  Adjusted R² = 1 - (1 - .7577)(33/31) = .7421, matching the PHStat output.

Regression on 9 Numeric Predictors (PorkBelly.xls data)

Regression Statistics
  Multiple R          0.8841
  R Square            0.7816
  Adjusted R Square   0.7254
  Standard Error      2.5959
  Observations        45

ANOVA
              df   SS          MS        F         Significance F
  Regression   9    843.9881   93.7765   13.9158   0.0000
  Residual    35    235.8599    6.7389
  Total       44   1079.8480

              Coefficients   Std Error    t Stat    P-value   Lower 95%   Upper 95%
  Intercept   39.5222        16.9797       2.3276   0.0258      5.0516    73.9928
  Muscle      -0.6180         0.3100      -1.9939   0.0540     -1.2473     0.0112
  LoinEye     -1.5859         1.7535      -0.9044   0.3720     -5.1457     1.9740
  FatDepth     3.0111         2.9164       1.0325   0.3089     -2.9094     8.9316
  LiveWt       0.0305         0.1189       0.2565   0.7990     -0.2109     0.2719
  Yield        0.1675         0.1381       1.2130   0.2333     -0.1128     0.4477
  SpecGrav    -3.3317         2.7291      -1.2208   0.2303     -8.8720     2.2086
  BllyDpth    -2.9537         3.6726      -0.8043   0.4267    -10.4094     4.5019
  Leanness    -0.4098         0.2943      -1.3924   0.1726     -1.0074     0.1877
  BellyWt      0.6201         0.4721       1.3134   0.1976     -0.3383     1.5785

Comments
- R² = 78.2%, and the adjusted value is about 5.5 points lower (72.5%).
- This may indicate that there are more variables in the equation than we really need.
- If you look at the t-ratios, you see many that are not significant. We will need to "trim the fat".

Whole Model Test (page 535)
- This is a test for the significance of the entire model (all of the Xs working together).
- Although there are other ways to formulate it, here we look at the SSR and SSE in the ANOVA table.
- It is an F test similar to the one we had in ANOVA problems.
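The adjusted R² formula can be checked against both outputs shown above; a minimal sketch, with the n, k, and R² values taken straight from the PHStat tables:

```python
def adjusted_r2(r2, n, k):
    # Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# OmniPower: n = 34 stores, k = 2 predictors, R^2 = 0.7577
print(round(adjusted_r2(0.7577, 34, 2), 4))   # 0.7421, matching the output

# PorkBelly: n = 45 carcasses, k = 9 predictors, R^2 = 0.7816
print(round(adjusted_r2(0.7816, 45, 9), 4))   # 0.7254, matching the output
```

Note how much larger the penalty is with 9 predictors (.7816 drops to .7254) than with 2 (.7577 drops only to .7421).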
F-Test Hypotheses
  H0: β1 = β2 = ... = βk = 0   (none of the Xs helps explain Y)
  H1: not all βj are 0          (at least one X is useful)
  Test statistic: F = MSR/MSE from the ANOVA table

Results for OmniPower

ANOVA
              df   SS            MS            F         Significance F
  Regression   2   39472730.77   19736365.39   48.4771   0.0000
  Residual    31   12620946.67     407127.31
  Total       33   52093677.44

- We would use an F distribution with 2 numerator and 31 denominator degrees of freedom.
- From Table E.5, the critical value is 3.32.
- F = MSR/MSE = 19736365.39 / 407127.31 = 48.48; P-value ≈ 0.0000.

How About for the Piggies?
- Degrees of freedom: (9, 35)
- Critical F: from Table E.5 with (9, 35) d.f.
- F-stat = 13.9158 (from the output above)

Alternative Form of the F Test
Can also be stated in terms of R². Explained variation = R² (with k d.f.); unexplained variation = 1 - R² (with n - k - 1 d.f.):

  F = (R² / k) / ((1 - R²) / (n - k - 1))

Another Example
Listed below are some of the variables used to predict the crime rate in n = 141 standard metropolitan statistical areas.

  Name        Description
  CrimeRate   Crimes per 1000 SMSA residents
  CntCityPct  % of residents living in the SMSA's central city
  Doctors     Number of professionally active physicians
  HospBeds    Total number of hospital beds in the SMSA
  PctHS       Percentage of the SMSA adult population who finished high school

Partial Output

  Variable     Estimate      Std. Error    t Value   p Value
  Constant     21.9668       8.23751        2.67     0.0086
  CntCityPct    0.134932     0.0615847      2.19     0.0302
  Doctors       0.00489209   0.0015708      3.11     0.0022
  PctHS         0.522265     0.142322       3.67     0.0003
  HospBeds     -0.00145406   0.000549731   -2.65     0.0091

  R-squared = 24.50%   Std. error of estimate = 12.7242

- What is the F-ratio?
- What is the adjusted R-square?
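The two forms of the F test (MSR/MSE and the R² version) give the same number, which is easy to confirm with the OmniPower ANOVA figures; the same one-liners then answer the crime-rate questions above. A sketch:

```python
# OmniPower ANOVA table values
ssr, df_reg = 39472730.77, 2      # regression SS, k
sse, df_err = 12620946.67, 31     # residual SS, n - k - 1

f_anova = (ssr / df_reg) / (sse / df_err)    # MSR / MSE
print(round(f_anova, 2))                     # 48.48, as in the output

# Equivalent form in terms of R^2 = SSR / SST
r2 = ssr / (ssr + sse)
f_r2 = (r2 / df_reg) / ((1 - r2) / df_err)   # identical to MSR / MSE

# Crime-rate example: R^2 = 0.2450, n = 141, k = 4 predictors
f_crime = (0.2450 / 4) / ((1 - 0.2450) / (141 - 4 - 1))
adj_crime = 1 - (1 - 0.2450) * (141 - 1) / (141 - 4 - 1)
print(round(f_crime, 2), round(adj_crime, 3))   # 11.03 and 0.223
```

So for the crime-rate model F ≈ 11.03 with (4, 136) degrees of freedom, far beyond any usual critical value: the model as a whole is significant even though R² is only 24.5%, and the adjusted R² is about 22.3%.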

This note was uploaded on 02/14/2011 for the course QMB 3250 taught by Professor Thompson during the Spring '08 term at University of Florida.
