chap14 - Business Statistics (BUSA 3101) Dr. Lari H....


Business Statistics (BUSA 3101)
Dr. Lari H. Arjomand
lariarjomand@clayton.edu
Slide 1

Chapter 14: Multiple Regression
- Multiple Regression Model
- Least Squares Method
- Multiple Coefficient of Determination
- Model Assumptions
- Testing for Significance
- Using the Estimated Regression Equation for Estimation and Prediction
- Qualitative Independent Variables
Slide 2

Multiple Regression Model
Simple linear regression is a special case of multiple regression, so everything you have learned about the simple linear regression model carries over. The interpretation of regression results is similar. Since all calculations are done by computer, there is no extra computational burden. In fact, statisticians do not make any distinction between simple regression and multiple regression; they just call it regression.
Slide 3

Multiple Regression Model
Multiple regression is required when a single-predictor model (simple regression model) is inadequate to describe the true relationship between Y (the response variable or the dependent variable) and its potential predictors (the independent variables x1, x2, . . . , xp).
Slide 4

Multiple Regression Model
The equation that describes how the dependent variable y is related to the independent variables x1, x2, . . . , xp and an error term is called the multiple regression model.
y = β0 + β1x1 + β2x2 + . . . + βpxp + ε
where β0, β1, β2, . . . , βp are the parameters, and ε is a random variable called the error term.
Slide 5

Multiple Regression Equation
The equation that describes how the mean value of y is related to x1, x2, . . . , xp is called the multiple regression equation.
E(y) = β0 + β1x1 + β2x2 + . . . + βpxp
Slide 6

Estimated Multiple Regression Equation
A simple random sample is used to compute sample statistics b0, b1, b2, . . . , bp that are used as the point estimators of the parameters β0, β1, β2, . . . , βp.
The estimated multiple regression equation is:
ŷ = b0 + b1x1 + b2x2 + . . . + bpxp
Slide 7

Estimated Multiple Regression Equation
bi is the net change in y for each unit change in xi, holding all other independent variables constant, for i = 1, . . . , p. Each bi is called a regression coefficient. The least squares method is used to develop this equation. Because determining b1, b2, etc. by hand is very tedious, a software package such as Excel is recommended.
ŷ = b0 + b1x1 + b2x2 + . . . + bpxp
Slide 8

Multiple Regression Model
Example: Programmer Salary Survey
A software firm collected data for a sample of 20 computer programmers. A suggestion was made that regression analysis could be used to determine if salary was related to the years of experience and the score on the firm's programmer aptitude test. The years of experience, score on the aptitude test, and corresponding annual salary ($1000s) for the sample of 20 programmers are shown on the next slide.
Slide 9

Multiple Regression Model (Example Continued)
Exper.  Score  Salary      Exper.  Score  Salary
4       78     24          9       88     38
7       100    43          2       73     26.6
1       86     23.7        10      75     36.2
5       82     34.3        5       81     31.6
8       86     35.8        6       74     29
10      84     38          8       87     34
0       75     22.2        4       79     30.1
1       80     23.1        6       94     33.9
6       83     30          3       70     28.2
6       91     33          3       89     30
Slide 10

Multiple Regression Model (Example Continued)
Suppose we believe that salary (y) is related to the years of experience (x1) and the score on the programmer aptitude test (x2) by the following regression model:
y = β0 + β1x1 + β2x2 + ε
where
y = annual salary ($1000s)
x1 = years of experience
x2 = score on programmer aptitude test
Slide 11

Solving for the Estimates of β0, β1, β2 (Example Continued)
Input data:
x1  x2   y
4   78   24
7   100  43
.   .    .
3   89   30
Excel is used to solve this multiple regression problem. The least squares output reports b0, b1, b2, R2, etc.
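The slides rely on Excel for the least squares step. As a rough illustration only, the same estimates can be sketched in plain Python by solving the normal equations (X'X)b = X'y for the 20 programmers in the data table. The helper names `solve` and `least_squares` are hypothetical, introduced here for illustration; they are not part of Excel or SWTStat+.

```python
# Programmer salary data from the slides:
# (years of experience, aptitude test score, annual salary in $1000s)
DATA = [
    (4, 78, 24.0), (7, 100, 43.0), (1, 86, 23.7), (5, 82, 34.3), (8, 86, 35.8),
    (10, 84, 38.0), (0, 75, 22.2), (1, 80, 23.1), (6, 83, 30.0), (6, 91, 33.0),
    (9, 88, 38.0), (2, 73, 26.6), (10, 75, 36.2), (5, 81, 31.6), (6, 74, 29.0),
    (8, 87, 34.0), (4, 79, 30.1), (6, 94, 33.9), (3, 70, 28.2), (3, 89, 30.0),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def least_squares(rows):
    """Return [b0, b1, ..., bp] that minimize the sum of squared residuals."""
    X = [[1.0] + list(r[:-1]) for r in rows]  # leading 1 gives the intercept b0
    y = [r[-1] for r in rows]
    k = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)  # normal equations: (X'X) b = X'y


b0, b1, b2 = least_squares(DATA)
print(f"SALARY = {b0:.3f} + {b1:.3f}(EXPER) + {b2:.3f}(SCORE)")
```

The printed coefficients should agree, up to rounding, with the Excel regression equation reported in the slides. In practice a statistics package is preferable to hand-rolled normal equations, which can be numerically fragile on larger or highly correlated data sets.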
Slide 12

Using SWTStat+ to Solve the Problem: Creating Data Area
Slide 13

Using SWTStat+ to Solve the Problem
Slide 14

Using SWTStat+ to Solve the Problem (Results)
Multiple Coefficient of Determination R2
Note: results are rounded to two decimal places.
Slide 15

Solving for the Estimates of β0, β1, β2 (Example Continued)
Excel's Regression Equation Output
SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE)
or
ŷ = 3.174 + 1.404x1 + 0.251x2
Note: Predicted salary will be in thousands of dollars.
Slide 16

Interpreting the Coefficients
ŷ = b0 + b1x1 + b2x2 + . . . + bpxp
In multiple regression analysis, we interpret each regression coefficient as follows: bi represents an estimate of the change in y corresponding to a 1-unit change in xi when all other independent variables are held constant.
Slide 17

Interpreting the Coefficients (Example Continued)
b1 = 1.404
SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE)
ŷ = 3.174 + 1.404x1 + 0.251x2
Conclusion: Salary is expected to increase by $1,404 for each additional year of experience (when the score on the programmer aptitude test is held constant).
Slide 18

Interpreting the Coefficients (Example Continued)
b2 = 0.251
SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE)
ŷ = 3.174 + 1.404x1 + 0.251x2
Conclusion: Salary is expected to increase by $251 for each additional point scored on the programmer aptitude test (when years of experience is held constant).
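The "held constant" reading of the coefficients can be checked with a little arithmetic. The sketch below uses the rounded Excel equation from the slides; the function name `predicted_salary` and the sample programmer (4 years of experience, score of 80) are hypothetical choices made only for illustration.

```python
def predicted_salary(exper, score):
    """Predicted annual salary in $1000s, using the slides' rounded Excel equation."""
    return 3.174 + 1.404 * exper + 0.251 * score


# Hypothetical programmer: 4 years of experience, aptitude score of 80.
base = predicted_salary(4, 80)
one_more_year = predicted_salary(5, 80)    # experience +1, score held constant
one_more_point = predicted_salary(4, 81)   # score +1, experience held constant

print(f"Extra year of experience: +${(one_more_year - base) * 1000:,.0f}")   # +$1,404
print(f"Extra aptitude point:     +${(one_more_point - base) * 1000:,.0f}")  # +$251
```

Because the model is linear, the same differences appear whatever starting values are chosen for experience and score.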
Slide 19

Multiple Correlation or Coefficient of Multiple Determination (R2)
Recall that for simple regression we can compute the simple coefficient of determination r2. In multiple regression, because there are at least two independent (explanatory) variables, we compute the coefficient of multiple determination R2 (the square of the multiple correlation coefficient R). R2 represents the proportion of the variation in Y that is explained by the set of independent (explanatory) variables.
Slide 20

Multiple Correlation Coefficient, R
- The strength of the association is measured by the multiple correlation coefficient, R.
- R can be any value from 0 to +1.
  - The closer R is to one, the stronger the linear association is.
  - If R equals zero, then there is no linear association between the dependent variable (Y) and the independent variables (X1, . . . , Xp).
- R is never a negative value. Unlike the simple correlation coefficient, r, which tells both the strength and direction of the association, R tells only the strength of the association.
Slide 21

Assumptions in Multiple Regression and Correlation
- The independent variables and the dependent variable have a linear relationship.
- The dependent variable must be continuous and at least interval-scaled.
- The residuals should be normally distributed with mean 0.
- The variation in (Y - Y'), the residual, must be the same for all values of Y. When this is the case, we say the difference exhibits homoscedasticity.
- Successive values of the dependent variable must be uncorrelated.
Slide 22

Using SWTStat+ to Solve the Problem: Creating Data Area
Slide 23

Using SWTStat+ to Solve the Problem
Slide 24

Using SWTStat+ to Solve the Problem (Results)
Multiple Coefficient of Determination R2
Note: results are rounded to two decimal places.
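As a small worked check, R2 and R for the programmer salary example can be computed from the sums of squares that the Excel ANOVA output in these slides reports (SSR = 500.3285, SST = 599.7855). This is a sketch of the arithmetic only; the variable names are chosen here for illustration.

```python
import math

SSR = 500.3285   # regression sum of squares (Excel ANOVA output in the slides)
SST = 599.7855   # total sum of squares

r_squared = SSR / SST               # share of the variation in salary explained
multiple_r = math.sqrt(r_squared)   # multiple correlation coefficient, never negative

print(f"R^2 = {r_squared:.4f}, R = {multiple_r:.4f}")
```

So roughly 83% of the variation in salary is explained by experience and aptitude score together, and R is about 0.91.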
Slide 25

Testing for Significance
In simple linear regression, the F and t tests provide the same conclusion. In multiple regression, the F and t tests have different purposes.
Slide 26

Testing for Significance: F Test
The F test is used to determine whether a significant relationship exists between the dependent variable and the set of all the independent variables. The F test is referred to as the test for overall significance.
Slide 27

Testing for Significance: t Test (Individual Variables)
If the F test shows an overall significance, the t test is used to determine whether each of the individual independent variables is significant. A separate t test is conducted for each of the independent variables in the model. We refer to each of these t tests as a test for individual significance.
Slide 28

Testing for Significance
Note that all computer packages report the t-statistic (actual t) and the p-value for each independent variable.
Also note that to test for a zero coefficient (H0: βj = 0) we could alternatively construct a confidence interval for the true coefficient βj and see whether the interval includes zero. Excel provides all of this information; you only have to know how to interpret the results.
Slide 29

Testing for Overall Significance: F Test
Hypotheses:
H0: β1 = β2 = . . . = βp = 0
Ha: One or more of the parameters is not equal to zero.
Test statistic:
F = MSR/MSE
Rejection rule:
Reject H0 if p-value < α or if actual F > critical Fα, where critical Fα is based on an F distribution with p d.f. in the numerator and n - p - 1 d.f. in the denominator, and p = number of independent variables in the regression equation.
Slide 30

NOTE
As we indicated, in the case of the multiple regression model, the value of Fα (the critical value of F) is based on an F distribution with p degrees of freedom in the numerator and n - p - 1 degrees of freedom in the denominator, where p = number of independent variables in the regression equation and n = sample size.

F Test for Overall Significance (Example Continued)
Excel's ANOVA Output
ANOVA       df   SS         MS         F          Significance F
Regression  2    500.3285   250.1643   42.76013   2.32774E-07
Residual    17   99.45697   5.85041
Total       19   599.7855
In the MS column, 250.1643 is MSR and 5.85041 is MSE. The F column is the actual value of F = MSR/MSE. Significance F is the p-value used to test for overall significance.
Slide 32

F Test for Overall Significance (Example Continued)
Test statistic: F = MSR/MSE = 250.16/5.85 = 42.76 (from our Excel output)
Conclusion: Since p-value < .05, we can reject H0.
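The F statistic and its p-value can be reproduced from the sums of squares in the ANOVA table. The sketch below is illustrative arithmetic, not how Excel computes its output. The p-value line relies on a closed form that holds only when the numerator has exactly 2 degrees of freedom (as in this example, where p = 2); for other values of p a statistics library or an F table would be needed.

```python
SSR, SSE = 500.3285, 99.45697   # sums of squares from the Excel ANOVA table
n, p = 20, 2                     # sample size, number of independent variables

msr = SSR / p                    # mean square due to regression
mse = SSE / (n - p - 1)          # mean square error
f_stat = msr / mse               # actual F

# Special case only: with exactly 2 numerator d.f., the upper-tail area of the
# F distribution is (1 + 2F/d2) ** (-d2/2), where d2 is the denominator d.f.
d2 = n - p - 1
p_value = (1 + 2 * f_stat / d2) ** (-d2 / 2)

alpha = 0.05
print(f"F = {f_stat:.2f}, p-value = {p_value:.3e}, reject H0: {p_value < alpha}")
```

The result should agree with the F and Significance F columns of the Excel output, up to rounding.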
The critical-value approach gives the same conclusion: actual F = 42.76 > critical F = 3.59.
Slide 33

Testing for Significance: t Test (Individual Variables)
Hypotheses:
H0: βi = 0
Ha: βi ≠ 0
Test statistic (from our Excel output):
t = bi / sbi, where sbi is the estimated standard error of bi
Rejection rule:
Reject H0 if p-value < α, or if actual t < -tα/2 or actual t > tα/2, where tα/2 is based on a t distribution with n - p - 1 degrees of freedom and p = number of independent variables in the regression equation.
Slide 34

Testing for Significance: t Test (Individual Variables, Example)
Hypotheses:
H0: βi = 0
Ha: βi ≠ 0
Rejection rule:
For α = .05 and d.f. = 17, t.025 = 2.11.
Reject H0 if p-value < .05, or if actual t < -2.11 or actual t > 2.11.
Slide 35

Using SWTStat+ to Solve the Problem (Results)
Actual t values; actual F value
Note: results are rounded to two decimal places.
Slide 36

Using the Regression Equation for Estimation and Prediction
The procedures for estimating the mean value of y and predicting an individual value of y in multiple regression are similar to those in simple regression. We substitute the given values of x1, x2, . . . , xp into the estimated regression equation and use the corresponding value of ŷ as the point estimate.
Slide 37

Qualitative Independent Variables
In many situations we must work with qualitative independent variables such as gender (male, female), method of payment (cash, check, credit card), etc.
For example, x2 might represent gender, where x2 = 0 indicates male and x2 = 1 indicates female. In this case, x2 is called a dummy or indicator variable.
Slide 38

Qualitative Independent Variables
Example (Continued): Programmer Salary Survey
As an extension of the problem involving the computer programmer salary survey, suppose that management also believes that the annual salary is related to whether the individual has a graduate degree in computer science or information systems. The years of experience, the score on the programmer aptitude test, whether the individual has a relevant graduate degree, and the annual salary ($1000s) for each of the 20 sampled programmers are shown on the next slide.
Slide 39

Qualitative Independent Variables (Example Continued)
Exper.  Score  Degr.  Salary      Exper.  Score  Degr.  Salary
4       78     No     24          9       88     Yes    38
7       100    Yes    43          2       73     No     26.6
1       86     No     23.7        10      75     Yes    36.2
5       82     Yes    34.3        5       81     No     31.6
8       86     Yes    35.8        6       74     No     29
10      84     Yes    38          8       87     Yes    34
0       75     No     22.2        4       79     No     30.1
1       80     No     23.1        6       94     Yes    33.9
6       83     No     30          3       70     No     28.2
6       91     Yes    33          3       89     No     30
Slide 40

Qualitative Independent Variables (Example Continued)
ŷ = b0 + b1x1 + b2x2 + b3x3
where:
ŷ = annual salary ($1000s)
x1 = years of experience
x2 = score on programmer aptitude test
x3 = 0 if individual does not have a graduate degree, 1 if individual does have a graduate degree
x3 is a dummy variable.
Slide 41

Using SWTStat+ to Solve the Problem (Creating Data Area)
Slide 42

Using SWTStat+ to Solve the Problem
Slide 43

Using SWTStat+ to Solve the Problem (Results)
Variance Inflation Factor
Multiple correlation coefficient R
Slide 44

Variance Inflation Factor (VIF)
- Variance inflation factor (VIF) measures the impact of multicollinearity (MC) among the X's (i.e., the independent variables) in a regression model on the precision of estimation.
- In other words, multicollinearity can result in numerically unstable estimates of the regression coefficients (small changes in X can result in large changes to the estimated regression coefficients).
- The higher the VIF, the higher the variance of bi and the greater the chance of finding βi insignificant.
- Typically a VIF value greater than 10 is of concern.
- If the multiple correlation coefficient (Ri) equals zero, then VIFi equals 1, since VIFi = 1/(1 - Ri2). This is the minimum value.
- There are a number of approaches to dealing with MC.
- One approach is to delete one or more of the independent variables from the regression equation.
Slide 45

Using SWTStat+ to Plot the Problem
Slide 46

Using SWTStat+ to Plot the Problem
Slide 47

Qualitative Independent Variables (Example Continued)
What salary would you estimate (predict) for a person with no graduate degree in IT who has 3 years of experience and a score of 76 on the programmer aptitude test?
ŷ = b0 + b1x1 + b2x2 + b3x3
From the Excel output:
ŷ = 7.94 + 1.15x1 + 0.2x2 + 2.28x3
ŷ = 7.94 + 1.15(3) + 0.2(76) + 2.28(0) = 26.59
26.59 × $1000 = $26,590
Slide 48

Using SWTStat+ to Predict
Slide 49

Using SWTStat+ to Predict (Results)
$26,355, compared to $26,590
Slide 50

Thinking Challenge Example
Develop a model for estimating heating oil used for a single-family home in the month of January based on average temperature and amount of insulation in inches.
Students: solve this problem.
Oil (Gal)  Temp (°F)  Insulation
275.30     40         3
363.80     27         3
164.30     40         10
40.80      73         6
94.30      64         6
230.90     34         6
366.70     9          6
300.60     8          10
237.80     23         10
121.40     63         3
31.40      65         10
203.50     41         6
441.10     21         3
323.00     38         3
52.50      58         10
Continued >>
Slide 51

Thinking Challenge Example (Continued)
Questions:
1. Explain your regression coefficients.
2. Find r and r2 and explain your answers.
3. Do an overall hypothesis test using α = 0.05.
4. Do a single test for each of the regression coefficients using α = 0.05.
5. Predict the amount of heating oil used if the average temperature is 24 °F and the amount of insulation used is 4 inches.
6. Predict the amount of heating oil used if the average temperature is 75 °F and the amount of insulation used is 12 inches.
7. Construct a confidence interval using α = 0.05.
Slide 52

Thinking Challenge Example (Continued)
Questions (Continued):
8. Find SST, SSE, SSR and explain your answers.
9. Find the standard error of the estimate and explain your answer.
10. Why do we use a regression model? Explain some applications of regression analysis.
Slide 53

FROM: My Family
To: All of You
Slide 54

Business Statistics
By now, you should be ready to move to your new house!!!!
THE END
Lari Student
Slide 55

End of Chapter 14
Slide 56 ...