chap13 - Business Statistics (BUSA 3101) Dr. Lari H....


Business Statistics (BUSA 3101)
Dr. Lari H. Arjomand
lariarjomand@clayton.edu
Slide 1

NOTE
When raw data are given in a problem, you may use either the Data Analysis Add-In or the SWStat+ Add-In in Excel to solve the problem.
Slide 2

Chapter 13: Simple Linear Regression
• Simple Linear Regression Model
• Least Squares Method
• Coefficient of Determination
• Model Assumptions
• Testing for Significance
• Using the Estimated Regression Equation for Estimation and Prediction
• Computer Solution
Slide 3

Types of Regression Models
• 1 explanatory variable: Simple (Linear or NonLinear)
• 2+ explanatory variables: Multiple (Linear or NonLinear)
Slide 4

Simple Linear Regression Model
The equation that describes how y (the dependent variable) is related to x (the independent variable) and an error term is called the regression model. The simple linear regression model is:
y = β0 + β1x + ε
where:
β0 and β1 are called parameters of the model,
ε is a random variable called the error term.
Slide 5

Simple Linear Regression Equation
The simple linear regression equation is:
E(y) = β0 + β1x
• The graph of this regression equation is a straight line.
• β0 is the y intercept of the regression line.
• β1 is the slope of the regression line.
• E(y) is the expected value (mean) of y for a given x value.
Slide 6

Simple Linear Regression Equation: Positive Linear Relationship
[Figure: E(y) plotted against x; regression line with intercept β0 and positive slope β1]
Slide 7

Simple Linear Regression Equation: Negative Linear Relationship
[Figure: E(y) plotted against x; regression line with intercept β0 and negative slope β1]
Slide 8

Simple Linear Regression Equation: No Relationship
[Figure: E(y) plotted against x; horizontal regression line with intercept β0 and slope β1 = 0]
Slide 9

Estimated Simple Linear Regression Equation
The estimated simple linear regression equation is:
ŷ = b0 + b1x
• The graph is called the estimated regression line.
• b0 is the y intercept of the line.
• b1 is the slope of the line.
• ŷ is the estimated value of y for a given x value.
Slide 10

Estimation Process
• Regression model: y = β0 + β1x + ε
• Regression equation: E(y) = β0 + β1x
• Unknown parameters: β0, β1
• Sample data: pairs (x1, y1), ..., (xn, yn)
• Estimated regression equation: ŷ = b0 + b1x
• Sample statistics: b0, b1
• b0 and b1 provide estimates of β0 and β1.
Slide 11

Least Squares Method
The least squares method is a procedure for using sample data to find the estimated regression equation, i.e., b0 and b1. It uses the sample data to provide the values of b0 and b1 that minimize the sum of the squares of the deviations between the observed values of the dependent variable, yi, and the estimated values of the dependent variable, ŷi. See next slide.
Slide 12

Least Squares Method: Least Squares Criterion
min Σ(yi − ŷi)²
where:
yi = observed value of the dependent variable for the ith observation
ŷi = estimated value of the dependent variable for the ith observation
Slide 13

Least Squares Graphically
Least squares minimizes Σ ei² (i = 1, ..., n) = e1² + e2² + e3² + e4²
[Figure: four observations scattered around the estimated regression line Ŷi = b0 + b1Xi, with residuals e1 to e4; for example, Y2 = b0 + b1X2 + e2]
Slide 14

Least Squares Method: Slope for the Estimated Regression Equation
b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
where:
xi = value of the independent variable for the ith observation
yi = value of the dependent variable for the ith observation
Slide 15

Least Squares Method: y-Intercept for the Estimated Regression Equation
b0 = ȳ − b1x̄
where:
x̄ = mean value of the independent variable
ȳ = mean value of the dependent variable
Slide 16

The Simple Linear Regression Model Illustrated
Slide 17

Example 1
John Sherman, the student body president at Clayton State, is concerned about the cost to students of textbooks. He believes there is a relationship between the number of pages in the text and the selling price of the book. To provide insight into the problem, he selects a sample of eight textbooks currently on sale in the bookstore.
Use Excel to develop a regression equation. See next slide for the data.
Slide 18

Example 1 (Continued)
Book                          Pages   Price ($)
Introduction to History       500     84
Basic Algebra                 700     75
Introduction to Psychology    800     99
Introduction to Sociology     600     72
Business Management           400     69
Introduction to Biology       500     81
Fundamentals of Jazz          600     63
Principles of Nursing         800     93
Slide 19

Example 1 (Solution)
The regression equation is: ŷ = 48 + .05x
The slope of the line is .05: each additional page costs about a nickel. The line crosses the Y-axis at $48.00, so a book with no pages would cost $48.00.
Slide 20

Example 1 (Continued)
We can use the regression equation to estimate (predict) values of Y. The estimated selling price of an 800-page book is $88.00, found by:
Price = $48 + .05(Number of Pages) = $48 + .05(800) = $88.00
Slide 21

Thinking Challenge Example
Student: Solve this problem.
You're a marketing analyst for Hasbro Toys. You gather the following data:
Ad $   Sales (Units)
1      1
2      1
3      2
4      2
5      4
Use Excel to determine the relationship between sales and advertising.
Slide 22

Using Excel's Regression Tool
Up to this point, you have seen how Excel can be used for various statistical analyses. Excel also has a comprehensive tool in its Data Analysis package called Regression. The Regression tool can be used to perform a complete regression analysis.
Slide 23

Using Excel's Regression Tool
First enter the data into an Excel worksheet, and then:
Step 1 Select the Tools menu
Step 2 Choose the Data Analysis option
Step 3 Choose Regression from the list of Analysis Tools
Slide 24

Using Excel's Regression Tool: Excel Regression Dialog Box
[Figure: Excel Regression dialog box]
Slide 25

Excel Solution
[Figure: Excel output showing the data, the regression statistics, the ANOVA table, and the estimated regression equation]
Estimated regression equation: Ŷ = −0.10 + 0.70X

Thinking Challenge Example
Student: Solve the problem.
You're an economist for the county cooperative.
You gather the following data:
Fertilizer (lb.)   Yield (lb.)
4                  3.0
6                  5.5
10                 6.5
12                 9.0
Use Excel to determine the relationship between fertilizer and crop yield.
Slide 27

Excel Solution
Ŷ = 0.80 + 0.65X

SWStat+ Solution
[Figure: SWStat+ output]

Simple Linear Regression
Another Example: Reed Auto Sales
Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown on the next slide.
Slide 30

Simple Linear Regression
Example (Continued): Reed Auto Sales
Number of TV Ads   Number of Cars Sold
1                  14
3                  24
2                  18
1                  17
3                  27
What is the relationship between cars sold and TV ads? Use Excel to solve the problem and to graph the scatter diagram and trendline.
Slide 31

Excel Solution
ŷ = 10 + 5x

Scatter Diagram and Trendline
[Figure: scatter diagram of Cars Sold against TV Ads with trendline y = 5x + 10]
Slide 33

SST, SSR, & SSE
Relationship Among SST, SSR, SSE
SST = SSR + SSE
Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²
where:
SST = total variation
SSR = variation explained by the regression
SSE = unexplained (residual) variation
Note: Excel refers to SSE as the residual sum of squares.
Slide 34

Variation Measures
[Figure: one observation Yi at Xi, the estimated line Ŷi = b0 + b1Xi, and the mean Ȳ, showing the three deviations below]
SST = total variation, (Yi − Ȳ)²
SSR = explained variation, (Ŷi − Ȳ)²
SSE = unexplained variation, (Yi − Ŷi)²
Slide 35

Coefficient of Determination
The coefficient of determination (r²) is the proportion of the total variation in the dependent variable (Y) that is explained, or accounted for, by the variation in the independent variable (X). It is the square of the coefficient of correlation (r). It ranges from 0 to 1. It does not give any information on the direction of the relationship between the variables.
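
The Reed Auto figures can also be checked by hand. The short Python sketch below applies the slope and intercept formulas from the Least Squares Method slides to the five Reed Auto observations and then computes the three sums of squares. It is only an illustration, not the Excel procedure the slides use, and the function and variable names are made up for this example.

```python
# Reed Auto Sales sample from the slides: TV ads (x) and cars sold (y).
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]


def least_squares(x, y):
    """Return (b0, b1) using the slope and intercept formulas from the slides."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # y intercept
    return b0, b1


def sums_of_squares(x, y, b0, b1):
    """Return (SST, SSR, SSE) for the fitted line y_hat = b0 + b1 * x."""
    y_bar = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual variation
    return sst, ssr, sse


b0, b1 = least_squares(tv_ads, cars_sold)
sst, ssr, sse = sums_of_squares(tv_ads, cars_sold, b0, b1)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")
print(f"SST = {sst:.1f}, SSR = {ssr:.1f}, SSE = {sse:.1f}")
```

Running it should reproduce the slide's equation ŷ = 10 + 5x and show that SSR and SSE add up to SST for this sample.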
Slide 36

Coefficient of Determination (Continued)
The coefficient of determination equation is:
r² = SSR/SST = Explained variation / Total variation
0 ≤ r² ≤ 1
Slide 37

The Coefficient of Correlation (r)
• It is a measure of the strength of the relationship between two variables.
• It is also called Pearson's r and Pearson's product moment correlation coefficient.
• It requires interval or ratio-scaled data.
• It can range from -1.00 to 1.00.
• Values of -1.00 or +1.00 indicate perfect and strong correlation.
• Values close to 0.0 indicate weak correlation.
• Negative values indicate an inverse relationship and positive values indicate a direct relationship.
Slide 38

Different Values of the Correlation Coefficient
[Figure: scatter diagrams for different values of r]
Slide 39

Formula for r
We calculate the coefficient of correlation from the following formula:
r = +√(r²) if b1 is positive, and
r = −√(r²) if b1 is negative
where b1 is the slope of the estimated regression equation.
Slide 40

Coefficient of Determination (r²)
Example: Reed Auto Sales (Continued)
Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown on the next slide.
Slide 41

Coefficient of Determination (r²)
Reed Auto Sales Example (Continued)
Number of TV Ads   Number of Cars Sold
1                  14
3                  24
2                  18
1                  17
3                  27
Find the coefficient of determination, and explain the answer.
Slide 42

Coefficient of Determination (Solution)
r² = SSR/SST = 100/114 = .8772
The regression relationship is very strong; 88% of the variability in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
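
To see how r² and r fit together for the Reed Auto sample, the sketch below computes r² = SSR/SST and then obtains r in two ways: as the signed square root of r² (the formula on the slides), and directly from Pearson's product-moment definition, which is a standard formula but not one written out on the slides. The variable names are illustrative.

```python
import math

# Reed Auto Sales sample from the slides.
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]

n = len(tv_ads)
x_bar = sum(tv_ads) / n
y_bar = sum(cars_sold) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv_ads, cars_sold))
sxx = sum((x - x_bar) ** 2 for x in tv_ads)
sst = sum((y - y_bar) ** 2 for y in cars_sold)  # total variation

# Least squares estimates and the explained variation.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
ssr = sum((b0 + b1 * x - y_bar) ** 2 for x in tv_ads)

# Coefficient of determination.
r_squared = ssr / sst

# Route 1 (slides): r takes the sign of the slope b1.
r_from_r_squared = math.copysign(math.sqrt(r_squared), b1)

# Route 2 (assumption: standard Pearson definition, not shown on the slides).
r_pearson = sxy / math.sqrt(sxx * sst)

print(f"r^2 = {r_squared:.4f}")
print(f"r from r^2 = {r_from_r_squared:.4f}, Pearson r = {r_pearson:.4f}")
```

Both routes should agree, which is why the slides can recover r from the regression output alone.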
Slide 43

Excel Solution
[Figure: Excel output for ŷ = 10 + 5x with r², SSR, and SST highlighted]

Sample Correlation Coefficient
As we said:
rxy = (sign of b1) √(Coefficient of Determination)
rxy = (sign of b1) √(r²)
where:
b1 = the slope of the estimated regression equation ŷ = b0 + b1x
Slide 45

Example Continued
Sample Correlation Coefficient (Solution)
rxy = (sign of b1) √(r²)
The sign of b1 in the equation ŷ = 10 + 5x is "+".
rxy = +√.8772
rxy = +.9366
Slide 46

Excel Solution
[Figure: Excel output for ŷ = 10 + 5x with r highlighted]

Example 1 (Continued)
John Sherman, the student body president at Clayton State, is concerned about the cost to students of textbooks. He believes there is a relationship between the number of pages in the text and the selling price of the book. To provide insight into the problem, he selects a sample of eight textbooks currently on sale in the bookstore.
Draw a scatter diagram and compute the correlation coefficient (r) and r². See next slide for the data. Explain.
Slide 48

Example 1 (Continued)
Book                          Pages   Price ($)
Introduction to History       500     84
Basic Algebra                 700     75
Introduction to Psychology    800     99
Introduction to Sociology     600     72
Business Management           400     69
Introduction to Biology       500     81
Fundamentals of Jazz          600     63
Principles of Nursing         800     93
Slide 49

Example 1 (Continued)
Scatter Diagram of Number of Pages and Selling Price of Text
[Figure: Price ($), from 60 to 100, plotted against Pages, from 400 to 800]
Slide 50

Excel Solution
[Figure: Excel output with r and r² highlighted]

Example 1 (Continued)
r = 0.61
r² = 0.38 (the square of the unrounded r, 0.614)
The correlation between the number of pages and the selling price of the book is 0.61. This indicates a moderate association between the variables.
Slide 52

t-test of significance of r
Did a computed r (correlation coefficient) come from a population of paired observations with zero correlation?
H0: ρ = 0 (The correlation in the population is zero.)
H1: ρ ≠ 0 (The correlation in the population is different from zero.)
t test for the coefficient of correlation:
t = r√(n − 2) / √(1 − r²), with n − 2 d.f.
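
As a rough illustration of the test statistic above, the following Python sketch computes r for the Example 1 textbook data and plugs it into t = r√(n − 2) / √(1 − r²). The critical value 3.143 is the one quoted on the slides for α = .02 with 6 degrees of freedom. It is hard-coded here rather than looked up, and the helper names are made up for this example.

```python
import math

# Example 1 data from the slides: pages (x) and price in dollars (y).
pages = [500, 700, 800, 600, 400, 500, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]


def pearson_r(x, y):
    """Sample correlation coefficient (standard Pearson definition)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for H0: population correlation is zero (n - 2 d.f.)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = pearson_r(pages, price)
t = t_for_r(r, len(pages))

# Critical value taken from the slides (two-tailed, alpha = .02, 6 d.f.).
CRITICAL_T = 3.143
reject_h0 = abs(t) > CRITICAL_T

print(f"r = {r:.3f}, t = {t:.2f}, reject H0: {reject_h0}")
```

With these eight books the statistic falls well inside the critical values, so the sample does not provide enough evidence to reject a zero population correlation at the .02 level.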
Slide 53

Question: Test the hypothesis that there is no correlation in the population. Use an alpha of 0.02.
Step 1 H0: The correlation in the population is zero. H1: The correlation in the population is not zero.
Step 2 The significance level is .02.
Step 3 The statistic to use follows the t distribution.
Step 4 H0 is rejected if actual t > critical t value of 3.143 or if actual t < critical t value of -3.143 (or if p ≤ α = .02).
Slide 54

Example 1 (Continued)
Step 5 Find the value of the test statistic.
t = r√(n − 2) / √(1 − r²) = .61√(8 − 2) / √(1 − (.61)²) = 1.898 ≈ 1.90
Since actual t = 1.90 < critical t = 3.143 (and p = 0.11 > 0.02), H0 is not rejected. We cannot reject the null hypothesis that there is no correlation in the population. The amount of association could be due to chance. See next slide for the Excel solution.
Slide 55

Excel Solution
[Figure: Excel output with the actual t value highlighted]
Conclusion: Since actual t = 1.90 < critical t = 3.143 (and p = 0.11 > 0.02), do not reject H0.

The Regression Model Assumptions
Model: y = β0 + β1x + ε
Assumptions about the model error terms, ε's:
1. Mean zero: The mean of the error terms is equal to 0.
2. Constant variance: The variance of the error terms, σ², is the same for all values of x.
3. Normality: The error terms follow a normal distribution for all values of x.
4. Independence: The values of the error terms are statistically independent of each other.
Slide 57

Mean Square Error and Standard Error of Estimation
Mean square error, the point estimate of the residual variance σ²:
s² = MSE = SSE/(n − 2)
where the sum of squared errors (SSE) is:
SSE = Σ(yi − ŷi)² = Σ(yi − b0 − b1xi)²
Standard error of estimate, the point estimate of the residual standard deviation σ:
s = √MSE = √(SSE/(n − 2))
The standard error of the estimate measures the scatter, or dispersion, of the observed values around the line of regression.
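
A small worked sketch may make these two formulas concrete. It uses the Reed Auto sample and the estimated equation ŷ = 10 + 5x given earlier in the slides, and computes SSE, MSE, and the standard error of estimate s. The function name is hypothetical and chosen only for this illustration.

```python
import math

# Reed Auto Sales sample and the fitted line y_hat = 10 + 5x from the slides.
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
B0, B1 = 10.0, 5.0


def standard_error_of_estimate(x, y, b0, b1):
    """Return (SSE, MSE, s) for a simple linear regression fit."""
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    mse = sse / (len(x) - 2)  # n - 2 degrees of freedom
    return sse, mse, math.sqrt(mse)


sse, mse, s = standard_error_of_estimate(tv_ads, cars_sold, B0, B1)
print(f"SSE = {sse:.1f}, MSE = {mse:.3f}, s = {s:.4f}")
```

For this sample SSE is 14, so MSE = 14/3 ≈ 4.667 and s ≈ 2.16 cars. The same MSE reappears in the F test later in the chapter.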
Slide 58

Example 1 (Continued)
John Sherman, the student body president at Clayton State, is concerned about the cost to students of textbooks. He believes there is a relationship between the number of pages in the text and the selling price of the book. To provide insight into the problem, he selects a sample of eight textbooks currently on sale in the bookstore.
Find the standard error of estimate. For the data, see next slide.
Slide 59

Example 1 (Continued)
Book                          Pages   Price ($)
Introduction to History       500     84
Basic Algebra                 700     75
Introduction to Psychology    800     99
Introduction to Sociology     600     72
Business Management           400     69
Introduction to Biology       500     81
Fundamentals of Jazz          600     63
Principles of Nursing         800     93
See next slide for the Excel solution to this question.
Slide 60

Excel Solution
[Figure: Excel output with the standard error of estimate highlighted]

SWStat+ Solution
[Figure: SWStat+ output]

Testing for Significance (Slope: β1)
To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of β1 (slope) is zero.
Two tests are commonly used: the t test and the F test.
Both the t test and the F test require an estimate of σ², the variance of ε in the regression model. That estimate is s² (the MSE).
Slide 64

Testing for Significance (Slope: β1): t Test (Continued)
Hypotheses
H0: β1 = 0
Ha: β1 ≠ 0
Slide 65

Testing for Significance (Slope: β1): t Test (Continued)
Rejection Rule
Reject H0 if p-value < α, or if actual t < −tα/2, or if actual t > tα/2
where: tα/2 is based on a t distribution with n − 2 degrees of freedom
Slide 66

Testing for Significance (Slope: β1): t Test (Example)
Reed Auto Sales (Continued)
Reed Auto periodically has a special week-long sale.
As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown on the next slide.
Slide 67

Testing for Significance (Slope: β1): t Test (Example)
Reed Auto Sales Example (Continued)
Number of TV Ads   Number of Cars Sold
1                  14
3                  24
2                  18
1                  17
3                  27
Test H0: β1 = 0 by applying the t test, and also use a 95% confidence interval for β1 to test the same hypotheses.
Slide 68

Testing for Significance (Slope: β1): t Test (Example Continued)
1. Determine the hypotheses. H0: β1 = 0, Ha: β1 ≠ 0
2. Specify the level of significance. α = .05
3. Select the test statistic. t = b1/sb1, where sb1 is the standard error of the slope (b1)
4. State the rejection rule. Reject H0 if p-value < .05 or if actual |t| > critical t = 3.182 (from the t table with 3 degrees of freedom)
Slide 69

Testing for Significance (Slope: β1): t Test (Example Continued)
5. Compute the value of the test statistic. t = b1/sb1 = 5/1.08 = 4.63
6. Determine whether to reject H0. Since the p-value of .02 (from the Excel output) is less than .05, and since actual t = 4.63 > critical t = 3.182, we can reject H0. In other words, the statistical evidence is sufficient to conclude that we have a significant relationship between the number of TV ads aired and the number of cars sold.
Slide 70

Excel Solution
[Figure: Excel output with the standard error of b1 (sb1) highlighted]
Since the p-value of .02 < .05, and since actual t = 4.63 > critical t = 3.182, we reject H0.

Confidence Interval for Slope: β1
Optional Reading

Testing for Significance (Slope: β1): F Test
Only in the case of simple regression analysis will the F test provide the same conclusion as the t test; that is, if the t test indicates β1 ≠ 0 and hence a significant relationship, the F test will also indicate a significant relationship.
But with more than one independent variable (see next chapter), only the F test can be used to test for an overall significant relationship.
Test statistic: F = MSR/MSE
where MSR is called mean square regression, and:
MSR = SSR / (number of independent variables in the regression equation) = SSR/1 = SSR
Slide 73

Testing for Significance (Slope: β1): F Test
Hypotheses
H0: β1 = 0
Ha: β1 ≠ 0
Slide 74

Testing for Significance (Slope: β1): F Test
Rejection Rule
Reject H0 if p-value < α, or if actual F > critical Fα
where: in the case of simple regression, Fα is based on an F distribution with 1 degree of freedom in the numerator and n − 2 degrees of freedom in the denominator. The value of Fα is found from the F distribution table.
Slide 75

NOTE
To find the value of Fα (the critical value of F) from the F table, you need three pieces of information: the value of α, and the degrees of freedom for both the numerator and the denominator. As we said, in the simple linear regression model Fα is based on an F distribution with 1 degree of freedom in the numerator and n − 2 degrees of freedom in the denominator, where n = sample size.
Slide 76

Testing for Significance (Slope: β1): F Test (Example Continued)
1. Determine the hypotheses. H0: β1 = 0, Ha: β1 ≠ 0
2. Specify the level of significance. α = .05
3. Select the test statistic. F = MSR/MSE
4. State the rejection rule. Reject H0 if p-value < .05 or if actual F > critical Fα (where Fα is based on an F distribution with 1 d.f. in the numerator and n − 2 d.f. in the denominator).
Slide 77

Testing for Significance (Slope: β1): F Test (Example Continued)
5. Compute the value of the test statistic. F = MSR/MSE = 100/4.667 = 21.43 (actual F value)
6. Determine whether to reject H0. Actual F = 21.43 is > critical F = 10.13 (from the F table). Also, the p-value = 0.02 (from the Excel output) corresponding to F = 21.43 is less than α = .05. Hence, we reject H0.
At α = .05, the statistical evidence is sufficient to conclude that we have a significant relationship between the number of TV ads aired and the number of cars sold. See next slide for the Excel solution.
Slide 78

Excel Solution
[Figure: Excel output with the actual F value and the p-value highlighted]
Since actual F = 21.43 > critical F = 10.13, and since p = 0.02 < α = 0.05, H0 is rejected.

Using SWStat+
[Figure: SWStat+ output]
Since actual F = 21.43 > critical F = 10.13, and since p = 0.02 < α = 0.05, H0 is rejected.

Some Cautions about the Interpretation of Significance Tests
Rejecting H0: β1 = 0 and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.
Also, just because we are able to reject H0: β1 = 0 and demonstrate statistical significance does not enable us to conclude that the relationship between x and y is linear.
Slide 83
Slide 84

Simple Linear Regression Example 2 Using Excel
A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet). A random sample of 10 houses is selected.
• Dependent variable (y) = house price in $1000s
• Independent variable (x) = square feet
• For data, see next slide
Slide 85

Sample Data for House Price Model (Example 2)
House Price in $1000s (y)   Square Feet (x)
245                         1400
312                         1600
279                         1700
308                         1875
199                         1100
219                         1550
405                         2350
324                         2450
319                         1425
255                         1700
Slide 86

Regression Using Excel (Solution)
Tools / Data Analysis / Regression
Slide 87

Excel Solution (Example 2)
The regression equation is: house price = 98.24833 + 0.10977 (square feet)

Regression Statistics
Multiple R (r)       0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
             df   SS           MS           F         Significance F
Regression   1    18934.9348   18934.9348   11.0848   0.01039
Residual     8    13665.5652   1708.1957
Total        9    32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580

Callouts on the slide: Multiple R is r; in the SS column, the Regression, Residual, and Total entries are SSR, SSE, and SST.
Slide 88

Excel Solution (Example 2)
[Same Excel output as on the previous slide, with these callouts:]
• The standard error of b1 is 0.03297.
• The t Stat (3.32938) and P-value (0.01039) in the Square Feet row are the calculated t statistic and p-value for testing whether the regression slope is zero, that is, whether β1 = 0.
Slide 89

Interpretation of the Intercept, b0 (Example 2)
house price = 98.24833 + 0.10977 (square feet)
b0 is the estimated average value of Y when the value of X is zero (if x = 0 is in the range of observed x values).
• Here, no houses had 0 square feet, so b0 = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet.
Slide 90

Interpretation of the Slope Coefficient, b1 (Example 2)
house price = 98.24833 + 0.10977 (square feet)
b1 measures the estimated change in the average value of Y as a result of a one-unit change in X.
• Here, b1 = 0.10977 tells us that the average value of a house increases by 0.10977($1000) = $109.77 for each additional square foot of size.
Slide 91

Simple Linear Regression (Example 2 Continued)
More Questions
• What are the values of r and R squared? Explain your answers.
• Do a hypothesis test of the slope, β1. Apply both the t test and the F test. In other words, test whether there is any relationship between the house price (Y) and the size of the house (square feet; X).
Student: Solve this part of the problem.
Slide 92

Using SWStat+ to Solve Real Estate Problem
[Figure: SWStat+ output]
Slide 93

End of Chapter 13
Your house is finished!
Lari Arjomand
Slide 94
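
As a cross-check on the Example 2 Excel output, the sketch below recomputes the main quantities for the house price data in plain Python: the coefficients, r², the standard error of estimate, and the t and F statistics for the slope. It is an illustration only, not the Excel or SWStat+ procedure. The formula used for the standard error of the slope, sb1 = s / √Σ(xi − x̄)², is the standard one but is not written out on the slides, so treat it as an assumption here. The variable names are made up for the example.

```python
import math

# Example 2 data from the slides: size in square feet (x), price in $1000s (y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
house_price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(house_price) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, house_price))
sxx = sum((x - x_bar) ** 2 for x in square_feet)

# Least squares estimates.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Sums of squares.
y_hat = [b0 + b1 * x for x in square_feet]
sst = sum((y - y_bar) ** 2 for y in house_price)
sse = sum((y - yh) ** 2 for y, yh in zip(house_price, y_hat))
ssr = sst - sse

r_squared = ssr / sst
mse = sse / (n - 2)
s = math.sqrt(mse)  # standard error of estimate

# Assumption: standard formula for the standard error of the slope.
s_b1 = s / math.sqrt(sxx)
t_stat = b1 / s_b1

# One independent variable, so MSR = SSR / 1.
f_stat = ssr / mse

print(f"house price = {b0:.5f} + {b1:.5f} (square feet)")
print(f"R Square = {r_squared:.5f}, Standard Error = {s:.5f}")
print(f"t Stat = {t_stat:.5f}, F = {f_stat:.4f}")
```

The printed values should line up with the Excel table (coefficients 98.24833 and 0.10977, R Square 0.58082, t Stat 3.32938, F 11.0848). The F statistic equals the square of the t statistic, which is why the two tests agree in simple regression.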

This document was uploaded on 11/25/2011.
