Chapter+12

Chapter+12 - Chapter 12 Simple Linear Regression Part A •...

Info iconThis preview shows page 1. Sign up to view the full content.

View Full Document Right Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: Chapter 12 Simple Linear Regression Part A • Simple Linear Regression Model • Least Squares Method • Coefficient of Determination • Model Assumptions • Testing for Significance Simple Linear Regression Simple Linear Regression Managerial decisions often are based on the relationship between two or more variables. Regression analysis can be used to develop an equation showing how the variables are related. The variable being predicted is called the dependent variable and is denoted by y. The variables being used to predict the value of the dependent variable are called the independent variables and are denoted by x. Simple Linear Regression Simple Linear Regression Simple linear regression involves one independent variable and one dependent variable. The relationship between the two variables is approximated by a straight line. Regression analysis involving two or more independent variables is called multiple regression. Simple Linear Regression Model The equation that describes how y is related to x and an error term is called the regression model. The simple linear regression model is: y = β 0 + β 1x +ε where: β 0 and β 1 are called parameters of the model, ε is a random variable called the error term. Simple Linear Regression Equation Simple Linear Regression Equation s The simple linear regression equation is: E(y) = β 0 + β 1x • Graph of the regression equation is a straight line. • β 0 is the y intercept of the regression line. • β 1 is the slope of the regression line. • E(y) is the expected value of y for a given x value. Simple Linear Regression Equation s Positive Linear Relationship E (y ) Regression line Intercept β0 Slope β 1 is positive x Simple Linear Regression Equation Simple Linear Regression Equation s Negative Linear Relationship E (y ) Intercept β0 Regression line Slope β 1 is negative x Simple Linear Regression Equation Simple Linear Regression Equation s No Relationship E (y ) Intercept Regression line β0 Slope β 1 is 0 x Estimated Simple Linear Regression Equation Estimated Simple Linear Regression Equation s The estimated simple linear regression equation ˆ y = b0 + b1 x • The graph is called the estimated regression line. • b0 is the y intercept of the line. • b1 is the slope of the line. ˆ •y is the estimated value of y for a given x value. Estimation Process Regression Model y = β 0 + β 1x +ε Regression Equation E(y) = β 0 + β 1x Unknown Parameters β 0, β 1 Sample Data: x y x1 y1 . . . . xn yn b0 and b1 provide estimates of β 0 and β 1 Estimated Regression Equation ˆ y = b0 + b1 x Sample Statistics b0, b1 Least Squares Method • Least Squares Criterion m in ∑ ( y i − y i ) 2 where: yi = observed value of the dependent variable for the ith observation ^ yi = estimated value of the dependent variable for the ith observation Least Squares Method • Slope for the Estimated Regression Equation b1 ∑ (x − x )( y − y ) = ∑(x − x ) i i i 2 Least Squares Method s y­Intercept for the Estimated Regression Equation b0 = y − b1 x where: xi = value of independent variable for ith observation yi = value of dependent variable for ith _ observation x = mean value for independent variable _ y = mean value for dependent variable n = total number of observations Simple LinearRegression s Example: Reed Auto Sales Reed Auto periodically has a special week­long sale. As part of the advertising campaign Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown on the next slide. Simple Linear Regression Simple Linear Regression s Example: Reed Auto Sales Number of TV Ads 1 3 2 1 3 Number of Cars Sold 14 24 18 17 27 Estimated Regression Equation • Slope for the Estimated Regression Equation b1 ∑( x − x )( y − y ) = 20 = 5 = 4 ∑(x − x ) i i i 2 • y­Intercept for the Estimated Regression Equation b0 = y − b1 x = 20 − 5(2) = 10 • Estimated Regression Equation ˆ y = 10 + 5x Estimated Regression Equation s Excel Worksheet (showing data) 1 2 3 4 5 6 7 A Week 1 2 3 4 5 B TV Ads 1 3 2 1 3 C Cars Sold 14 24 18 17 27 D Estimated Regression Equation s Producing a Scatter Diagram Step 1 Select cells B1:C6 Step 2 Click the Chart Wizard Step 3 When the Chart Type dialog box appears: Choose XY (Scatter) in the Chart type list Choose Scatter from the Chart sub­type display Select Next > Step 4 When the Chart Source Data dialog box appears Select Next > Estimated Regression Equation s Producing a Scatter Diagram Step 5 When the Chart Options dialog box appears: Select the Titles tab and then Delete Cars Sold in the Chart title box Enter TV Ads in the Value (X) axis box Enter Cars Sold in the Value (Y) axis box Select the Legend tab and then Remove the check in the Show Legend box Select Next > Estimated Regression Equation s Producing a Scatter Diagram Step 6 When the Chart Location dialog box appears: Specify the location for the new chart Select Finish to display the scatter diagram Estimated Regression Equation s Adding the Trendline Step 1 Position the mouse pointer over any data point and right click to display the Chart menu Step 2 Select the Add Trendline option Step 3 When the Add Trendline dialog box appears: On the Type tab select Linear On the Options tab select the Display equation on chart box Click OK Scatter Diagram and Trend Line 30 Cars Sold 25 20 y = 5x + 10 15 10 5 0 0 1 2 TV Ads 3 4 Coefficient of Determination • Relationship Among SST, SSR, SSE SST = SSR + SSE ˆ ˆ ( y i − y )2 = ∑ ( y i − y )2 + ∑ ( y i − y i )2 ∑ where: SST = total sum of squares SSR = sum of squares due to regression SSE = sum of squares due to error Coefficient of Determination s The coefficient of determination is: r2 = SSR/SST where: SSR = sum of squares due to regression SST = total sum of squares Coefficient of Determination Coefficient of Determination r2 = SSR/SST = 100/114 = .8772 The regression relationship is very strong; 88% of the variability in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold. Sample Correlation Coefficient rxy = (sign of b1 ) Coefficient of Determination rxy = (sign of b1 ) r 2 where: b1 = the slope of the estimated regression ˆ equation y = b0 + b1 x Sample Correlation Coefficient rxy = (sign of b1 ) r 2 ˆ 5x The sign of b1 in the equationy = 10 + is “+”. rxy = + .8772 rxy = +.9366 Assumptions About the Error Term ε 1. The error ε is a random variable with mean of zero. 2. The variance of ε , denoted by σ 2, is the same for all values of the independent variable. 3. The values of ε are independent. are independent. 4. The error ε is a normally distributed random variable. Testing for Significance To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of β 1 is zero. Two tests are commonly used: t Test and F Test Both the t test and F test require an estimate of σ 2, the variance of ε in the regression model. in the regression model. Testing for Significance • An Estimate of σ The mean square error (MSE) provides the estimate of σ 2, and the notation s2 is also used. s 2 = MSE = SSE/(n − 2) where: ˆ SSE = ∑ ( yi − yi ) 2 = ∑ ( yi − b0 − b1 xi ) 2 Testing for Significance • An Estimate of σ • To estimate σ we take the square root of σ 2. • The resulting s is called the standard error of the estimate. SSE s = MSE = n−2 Testing for Significance: t Test • Hypotheses • Test Statistic H 0 : β 1 = 0 H a : β 1 ≠ 0 b1 t= sb1 Testing for Significance: t Test s Rejection Rule Reject H0 if p­value < α or t < ­tα/2 or t > tα/2 or where: tα/2 is based on a t distribution with n ­ 2 degrees of freedom Testing for Significance: t Test 1. Determine the hypotheses. 1. Determine the hypotheses. H 0 : β 1 = 0 H a : β 1 ≠ 0 2. Specify the level of significance. 3. Select the test statistic. 4. State the rejection rule. α = .05 b1 t= sb1 Reject H0 if p­value < .05 or |t| > 3.182 (with 3 degrees of freedom) Testing for Significance: t Test Testing for Significance: 5. Compute the value of the test statistic. b1 5 t= = = 4.63 sb1 1.08 6. Determine whether to reject H0. t = 4.541 provides an area of .01 in the upper tail. Hence, the p­value is less than .02. (Also, t = 4.63 > 3.182.) We can reject H0. Confidence Interval for β 1 We can use a 95% confidence interval for β 1 to test the hypotheses just used in the t test. H0 is rejected if the hypothesized value of β 1 is not included in the confidence interval for β 1. Confidence Interval for β 1 • The form of a confidence interval for β 1 is: b1 is the point estimator b1 ± tα / 2 sb1 wheretα is the t value providing an area /2 of α/2 in the upper tail of a t distribution with n ­ 2 degrees of freedom tα / 2 sb1 is the margin of error Confidence Interval for β 1 • Rejection Rule Reject H0 if 0 is not included in the confidence interval for β 1. 95% Confidence Interval for β • b1 ± tα / 2 sb1 = 5 +/­ 3.182(1.08) = 5 +/­ 3.44 1 or 1.56 to 8.44 • Conclusion 0 is not included in the confidence interval. Reject H0 Testing for Significance: F Test s Hypotheses s H 0 : β 1 = 0 H a : β 1 ≠ 0 Test Statistic F = MSR/MSE Testing for Significance: F Test s Rejection Rule Reject H0 if p­value < α or F > Fα where: Fα is based on an F distribution with 1 degree of freedom in the numerator and n ­ 2 degrees of freedom in the denominator Testing for Significance: F Test 1. Determine the hypotheses. 1. Determine the hypotheses. H 0 : β 1 = 0 H a : β 1 ≠ 0 2. Specify the level of significance. 3. Select the test statistic. 4. State the rejection rule. α = .05 F = MSR/MSE Reject H0 if p­value < .05 or F > 10.13 (with 1 d.f. in numerator and 3 d.f. in denominator) Testing for Significance: F Test Testing for Significance: 5. Compute the value of the test statistic. F = MSR/MSE = 100/4.667 = 21.43 6. Determine whether to reject H0. F = 17.44 provides an area of .025 in the upper tail. Thus, the p­value corresponding to F = 21.43 is less than 2(.025) = .05. Hence, we reject H0. The statistical evidence is sufficient to conclude that we have a significant relationship between the number of TV ads aired and the number of cars sold. Some Cautions about the Interpretation of Significance T Rejecting H : β = 0 and concluding that the ests 0 1 relationship between x and y is significant does not enable us to conclude that a cause­and­effect relationship is present between x and y. Just because we are able to reject H0: β 1 = 0 and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y. Chapter 12 Simple Linear Regression Part B • Excel’s Regression Tool • Using the Estimated Regression Equation for Estimation and Prediction • Residual Analysis: Validating Model Assumptions • Outliers and Influential Observations Using Excel’s Regression Tool Using Excel’s Regression Tool Up to this point, you have seen how Excel can be used for various parts of a regression analysis. Excel also has a comprehensive tool in its Data Analysis package called Regression. The Regression tool can be used to perform a complete regression analysis. Using Excel’s Regression Tool Using Excel’s Regression Tool s Excel Worksheet (showing data) 1 2 3 4 5 6 7 A Week 1 2 3 4 5 B TV Ads 1 3 2 1 3 C Cars Sold 14 24 18 17 27 D Using Excel’s Regression Tool Using Excel’s Regression Tool s Performing the Regression Analysis Step 1 Select the Tools menu Step 2 Choose the Data Analysis option Step 3 Choose Regression from the list of Analysis Tools Using Excel’s Regression Tool Using Excel’s Regression Tool s Excel Regression Dialog Box Using Excel’s Regression Tool Using Excel’s Regression Tool s 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 Excel Value Worksheet A Week 1 2 3 4 5 B TV Ads 1 3 2 1 3 C Cars Sold 14 24 18 17 27 SUMMARY OUTPUT Intercept TV Ads SS 1 3 4 F G H I Data ANOVA ANOVA Output Output ANOVA Regression Residual Total E Regression Statistics Regression Output Output Regression Statistics Multiple R 0.936585812 R Square 0.877192982 Adjusted R Square 0.83625731 Standard Error 2.160246899 Observations 5 df D MS F Significance F 100 100 21.42857 0.018986231 14 4.666667 114 Coefficients Standard Error t Stat P-value 10 2.366431913 4.225771 0.024236 5 1.08012345 4.6291 0.018986 Estimated Estimated Regression Equation Output Equation Lower 95% Upper 95% Lower 95.0% Upper 95.0% 2.468950436 17.53104956 2.468950436 17.53104956 1.562561893 8.437438107 1.562561893 8.437438107 Using Excel’s Regression Tool Using Excel’s Regression Tool s Excel Value Worksheet (bottom­left portion) A B C D E 22 23 Coeffic. Std. Err. t Stat P-value 24 Intercept 10 2.36643 4.2258 0.02424 25 TV Ads 5 1.08012 4.6291 0.01899 26 Note: Columns F­I are not shown. Using Excel’s Regression Tool Using Excel’s Regression Tool s Excel Value Worksheet (bottom­right portion) A B F G H I 22 23 Coeffic. Low. 95% Up. 95% Low. 95.0% Up. 95.0% 24 Intercept 10 2.46895 17.53105 2.46895044 17.5310496 25 TV Ads 5 1.562562 8.437438 1.56256189 8.43743811 26 Note: Columns C­E are hidden. Using Excel’s Regression Tool Using Excel’s Regression Tool s Excel Value Worksheet (middle portion) A 16 17 18 19 20 21 22 B C D E F ANOVA df Regression Residual Total SS MS F Significance F 1 100 100 21.4286 0.018986231 3 14 4.66667 4 114 Using Excel’s Regression Tool Using Excel’s Regression Tool s Excel Value Worksheet (top portion) A 9 10 11 12 13 14 15 16 B Regression Statistics Multiple R 0.936585812 R Square 0.877192982 Adjusted R Square 0.83625731 Standard Error 2.160246899 Observations 5 C Using the Estimated Regression Equation Using the Estimated Regression Equation for Estimation and Prediction s Confidence Interval Estimate of E(y ) y p ± t α / 2 sy p s p Prediction Interval Estimate of y p y p ± tα / 2 sind where: confidence coefficient is 1 ­ α and and tα/2 is based on a t distribution with n ­ 2 degrees of freedom Point Estimation If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be: ^ y = 10 + 5(3) = 25 cars y = 10 + 5(3) = 25 cars Confidence Interval for E(y ) p s Formula Worksheet (confidence interval portion) D 1 2 3 4 E F CONFIDENCE INTERVAL xp 3 x bar =AVERAGE(B2:B6) x p -x bar =F2-F3 5 (x p -x bar)2 =F4^2 2 p (x p -x bar) Variance of y hat Std. Dev of y hat t Value Margin of Error Point Estimate Lower Limit Upper Limit =DEVSQ(B2:B6) =D20*(1/B15+F5/F6) =SQRT(F7) =TINV(0.05,3) =F9*F8 =B24+B25*F2 =F11-F10 =F11+F10 6 7 8 9 10 11 12 13 G Confidence Interval for E(y ) p s Excel’s Confidence Interval Output D 1 2 3 4 E CONFIDENCE INTERVAL xp 3 x bar 2.0 x p -x bar 1.0 5 (x p -x bar)2 6 7 8 9 10 11 12 13 1.0 2 Ξ (x p -x bar) Variance of y hat Std. Dev of y hat t Value Margin of Error Point Estimate Lower Limit Upper Limit 4.0 2.1000 1.4491 3.1824 4.6118 25.0 20.39 29.61 F G Confidence Interval for E(y ) p The 95% confidence interval estimate of the mean The 95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is: 25 + 4.61 = 20.39 to 29.61 cars Prediction Interval for y p s Excel Formula Worksheet 1 2 3 4 H I PREDICTION INTERVAL Variance of y ind =D20+F7 Std. Dev. of y ind =SQRT(I2) Margin of Error =F9*I3 5 Lower Limit =F11-I4 6 7 Upper Limit =F11+I4 Prediction Interval for y p s Excel Value Worksheet 1 2 3 4 H PREDICTION INTERVAL Variance of y ind 6.76667 Std. Dev. of y ind 2.60128 Margin of Error 8.27844 5 Lower Limit 16.72 6 7 Upper Limit 33.28 I Prediction Interval for y p The 95% prediction interval estimate of the number of The 95% prediction interval estimate of the number of cars sold in one particular week when 3 TV ads are run is: 25 + 8.28 = 16.72 to 33.28 cars Residual Analysis Residual Analysis If the assumptions about the error term ε appear questionable, the hypothesis tests about the significance of the regression relationship and the interval estimation results may not be valid. The residuals provide the best information about ε . Residual for Observation i ˆ yi − yi Much of the residual analysis is based on an examination of graphical plots. Residual Plot Against x Residual Plot Against s If the assumption that the variance of ε is the same for all values of x is valid, and the assumed regression model is an adequate representation of the relationship between the variables, then The residual plot should give an overall impression of a horizontal band of points Residual Plot Against x Residual ˆ y −y Good Pattern 0 x Residual Plot Against x Residual Plot Against Residual ˆ y −y Nonconstant Variance 0 x Residual Plot Against x Residual Plot Against Residual ˆ y −y Model Form Not Adequate 0 x Residual Plot Against x s Residuals Observation Predicted Cars Sold Residuals 1 15 -1 2 25 -1 3 20 -2 4 15 2 5 25 2 Residual Plot Against x s Using Excel to Produce a Residual Plot • The steps outlined earlier to obtain the regression output are performed with one change. • When the Regression dialog box appears, we must also select the Residual Plot option. • The output will include two new items: • A plot of the residuals against the independent variable, and • A list of predicted values of y and the corresponding residual values. Residual Plot Against x s Excel Value Worksheet (bottom portion) A B 28 29 RESIDUAL OUTPUT 30 31 Observation Predicted Cars Sold 32 1 15 33 2 25 34 3 20 35 4 15 36 5 25 37 C Residuals -1 -1 -2 2 2 Residual Plot Against x Residual Plot Against TV Ads Residual Plot 3 Residuals 2 1 0 -1 -2 -3 0 1 2 TV Ads 3 4 Standardized Residuals s Standardized Residual for Observation i ˆ yi − yi sy i −yˆ i where: sy i −yˆ i = s 1 − hi ( x i − x )2 1 hi = + n ∑ ( x i − x )2 Standardized Residual Plot Standardized Residual Plot s s The standardized residual plot can provide insight about the assumption that the error term ε has a normal distribution. If this assumption is satisfied, the distribution of the standardized residuals should appear to come from a standard normal probability distribution. Standardized Residual Plot s s s Excel’s Regression tool be used to obtain the standardized residuals. The steps described earlier to conduct a regression analysis are performed with one change: • When the Regression dialog box appears, we must select the Standardized Residuals option The Standardized Residuals option does not automatically produce a standardized residual plot. Standardized Residual Plot s Excel Value Worksheet A B 28 29 RESIDUAL OUTPUT 30 31 Observation Predicted Y 32 1 15 33 2 25 34 3 20 35 4 15 36 5 25 C D Residuals Standard Residuals -0.534522484 -0.534522484 -1.069044968 1.069044968 1.069044968 -1 -1 -2 2 2 Standardized Residual Plot s s Excel’s Chart Wizard can be used to construct the standardized residual plot. A scatter diagram is developed in which: • The values of the independent variable are placed on the horizontal axis • The values of the standardized residuals are placed on the vertical axis Standardized Residual Plot s Excel Standardized Residual Plot Standard Residuals 28 29 30 31 32 33 34 35 36 37 1.5 A B C D 1 RESIDUAL OUTPUT 0.5 Observation 0 -0.5 0 -1 -1.5 1 2 3 4 5 Predicted Y 15 10 25 20 15 25 Residuals Standard Residuals -1 -0.534522 20 30 -1 -0.534522 -2 -1.069045 2 1.069045 2 1.069045 Cars Sold Standardized Residual Plot Standardized Residual Plot s All of the standardized residuals are between –1.5 and +1.5 indicating that there is no reason to question the assumption that ε has a normal distribution. Outliers and Influential Observations Outliers and Influential Observations s Detecting Outliers • An outlier is an observation that is unusual in comparison with the other data. • Minitab classifies an observation as an outlier if its standardized residual value is < ­2 or > +2. • This standardized residual rule sometimes fails to identify an unusually large observation as being an outlier. • This rule’s shortcoming can be circumvented by using studentized deleted residuals. • The |i th studentized deleted residual| will be larger than the |i th standardized residual|. ...
View Full Document

This note was uploaded on 08/28/2011 for the course BUS 300 taught by Professor White during the Spring '09 term at Rutgers.

Ask a homework question - tutors are online