STAT 2103, Chapter 10: Simple Regression (Nov 24 and 29, Fall 2010, v1)

Chapter 10: Simple Linear Regression

Introduction to Regression: Using X to Predict Y
Regression measures the extent to which one can improve the prediction of Y "beyond" merely guessing the Mean of Y (the worst case).
Example: How do colleges predict the future performance of applicants? How do firms screen job applicants?

Section 10.1: Probabilistic Models
First-Order (Straight-Line) Probabilistic Model (p. 563): Y = Beta Zero + (Beta One * X) + E
Y = Dependent or Response variable (the variable to be modeled)
X = Independent or Predictor variable (the variable used as a predictor of Y)
Beta Zero = Y intercept of the line
Beta One = Slope of the line
E = Epsilon = random error component

Section 10.2: Fitting the Model: The Least Squares Approach
The Least Squares Line, Y hat = Beta Zero + (Beta One * X), has these properties:
1) The sum of the prediction errors equals zero
2) The Sum of Squared Errors (SSE) is smaller than for any other straight-line model one could use

Interpreting the estimates of Beta Zero and Beta One in Simple Linear Regression:
Beta Zero = the Y intercept; the predicted value of Y when X = zero
Beta One = the slope; the change in Y (increase or decrease) for every unit increase in X

Section 10.3: Model Assumptions
Section 10.4: Assessing the Utility of the Model: Making Inferences About the Slope
Section 10.5: The Coefficients of Correlation, Determination, and Non-Determination
Section 10.6: Using the Model for Estimation and Prediction

Learning Objectives for Simple Regression and Correlation:
1) Learn the concepts of regression and correlation, e.g., correlation, regression, Coefficient of Determination and Non-Determination, Standard Error of Estimate, Predicted (fitted) values, and Residuals.
2) Learn when regression methods are applied. Simple Regression involves the 2-variable case, where both X and Y are NUMERIC random variables; we want to develop a "model" for predicting Y based on values of X, and to assess the Significance and Accuracy of the Regression Model.
3) Learn how to interpret the results of regression analyses (results provided):
   a) Are X and Y appropriate variables for a regression model?
   b) Is X a statistically significant predictor of Y? (using significance testing)
   c) If X meets the above, is it also sufficiently "accurate" in predicting Y?
   Statistical significance is "necessary" but not sufficient; the model must also be an "accurate" predictor.

In-Class Example with Data, Excel Regression Analysis, and Graph
Research Scenario: Can ad length (X, in seconds) predict consumer recall of the message (Y)?
X = Length of commercial (seconds)
Y = Recall test score (recalling the message of the commercial)
Does recall vary as a function of commercial length? Is there a statistically significant relationship between commercial length and message recall? n = 60 (X, Y) pairs.
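The least squares estimates can also be computed directly from the definitions above (b1 = Sxy / Sxx and b0 = Ybar - b1 * Xbar). Below is a minimal Python sketch of that arithmetic, applied only to the first five (X, Y) pairs of the data table that follows, purely for illustration; the notes themselves do everything in Excel, and the full 60-pair results appear in the Excel output later in this handout.

```python
# Minimal sketch of the Section 10.2 least-squares formulas,
# using only the first five (X, Y) pairs of the commercial-recall data.
x = [52, 40, 36, 28, 44]   # length of commercial (seconds)
y = [24, 20, 16, 11, 10]   # recall test score

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sxy = sum of cross-deviations, Sxx = sum of squared deviations of X
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
Sxx = sum((xi - x_bar) ** 2 for xi in x)

b1 = Sxy / Sxx            # estimated slope (Beta One)
b0 = y_bar - b1 * x_bar   # estimated Y intercept (Beta Zero)

print(f"y_hat = {b0:.3f} + {b1:.3f} * x")
```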
Reminder from Chapter 7: on the exam, all T test statistics, their p-values, and critical values are given, including the T test for 2 independent sample means and the T test for 2 paired sample means. You do need to calculate the Z test for 2 sample proportions (not available in Excel). For Chapter 10, nearly all statistics are provided.

Using Excel to obtain Regression Analysis output: go to 1) Data, 2) Data Analysis, 3) Regression, and follow the instructions in the dialog box.
For a scatterplot: you can use the Line Fit option within "Regression," or use the Insert menu graphs: first select the data (including headings), then go to Insert, Scatter, and choose the top-left graph type.
Examine the scatterplot: check for linearity (the pattern should not be curvilinear or nonlinear).

Data: n = 60 pairs, (X) Length of commercial in seconds, (Y) Recall test score.

Pair   X    Y      Pair   X    Y      Pair   X    Y
  1    52   24      21    60   26      41    32   19
  2    40   20      22    56   28      42    20    8
  3    36   16      23    20   15      43    56   11
  4    28   11      24    40    8      44    56   24
  5    44   10      25    48    2      45    60   15
  6    16    4      26    20    0      46    24    9
  7    48   24      27    24   11      47    48   18
  8    52   18      28    36    8      48    16   14
  9    60   16      29    60   24      49    32   14
 10    44   15      30    44   10      50    16   11
 11    36   14      31    52   15      51    40   15
 12    44   15      32    28    7      52    24   11
 13    60   24      33    56   26      53    36   27
 14    24   10      34    20   11      54    28    5
 15    32    1      35    52   18      55    56   17
 16    40    8      36    16   16      56    32    8
 17    24    9      37    20    8      57    28   15
 18    32    0      38    40   12      58    48    8
 19    52   17      39    16   10      59    28   24
 20    36    9      40    44   14      60    48   21

SUMMARY OUTPUT - Regression Statistics
Multiple R          0.54
R Square            0.29
Adjusted R Square   0.28
Standard Error      5.89
Observations        60

Multiple R = 0.54. With only one X variable, this is really "r," not "R": the simple correlation coefficient for X and Y, which ranges from -1.00 to +1.00. It indicates two things:
1) The strength of the linear (straight-line) relationship: as |r| approaches 1.00, the relationship gets stronger. Strength is based on the absolute value, disregarding sign: r = -1.00 is as strong as r = +1.00, and r = -0.87 is a stronger linear relationship than r = +0.75 (so, for example, rxy = -.98 is a stronger relationship than rxy = +.75).
2) The direction of the relationship: (+) indicates a direct, or positive, relationship; (-) indicates an indirect, or negative, relationship. The sign on the correlation coefficient "r" will always be the same as the sign on the slope.

R Square = 0.29. With only one X variable, this is "r square," the Coefficient of Determination. It ranges from zero to +1.00 and, because it is squared, it no longer indicates direction, only strength. Interpretation: the proportion of variability in Y that is explained (or "determined") by X. Here, 0.29, or 29%, of the variability in Recall Scores (Y) is explained by Commercial Length in Seconds (X). Best (ideal) case? When r squared approaches 1.00. Worst case? When r squared approaches zero.
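A short sketch of how the Multiple R and R Square values above could be reproduced outside Excel. The arrays simply repeat the 60 values from the data table; NumPy is used here only for illustration.

```python
# Sketch: computing r and r^2 for the full n = 60 data set with NumPy.
import numpy as np

length = np.array([52, 40, 36, 28, 44, 16, 48, 52, 60, 44,
                   36, 44, 60, 24, 32, 40, 24, 32, 52, 36,
                   60, 56, 20, 40, 48, 20, 24, 36, 60, 44,
                   52, 28, 56, 20, 52, 16, 20, 40, 16, 44,
                   32, 20, 56, 56, 60, 24, 48, 16, 32, 16,
                   40, 24, 36, 28, 56, 32, 28, 48, 28, 48])
recall = np.array([24, 20, 16, 11, 10,  4, 24, 18, 16, 15,
                   14, 15, 24, 10,  1,  8,  9,  0, 17,  9,
                   26, 28, 15,  8,  2,  0, 11,  8, 24, 10,
                   15,  7, 26, 11, 18, 16,  8, 12, 10, 14,
                   19,  8, 11, 24, 15,  9, 18, 14, 14, 11,
                   15, 11, 27,  5, 17,  8, 15,  8, 24, 21])

r = np.corrcoef(length, recall)[0, 1]   # Pearson correlation coefficient
print(f"r   = {r:.2f}")                 # about 0.54
print(f"r^2 = {r**2:.2f}")              # about 0.29, the Coefficient of Determination
```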
1 - r square = 0.71, the Coefficient of Non-Determination. Interpretation: the proportion of variability in Y that is NOT explained (not "determined") by X. What does "variability" mean in this context? It refers to the movement of Y up and down, i.e., when Y increases and decreases. If the regression model (Y = a + bX) were perfect, r square would be 1.00 and X would explain 100% of the variability in Y.
Example: the correlation between SAT (X) and GPA (Y) is about r = 0.50, so r square = 0.50^2 = 0.25. SAT explains only about 25% of the variability in college GPA, and 1.00 - 0.25 = 0.75, so 75% of the variability in GPA is NOT explained by SAT. In the present example, 29% of the variability in message recall is explained by message length, and 71% is not.

Adjusted R Square = 0.28. This is given to consider whether or not the sample size is adequate; "shrinkage" in r square indicates the degree to which the sample size is inadequate for the number of predictors in the model. Here the shrinkage is only about 1% (0.29 vs. 0.28), so it is not an issue; shrinkage of 15-20 percent or more would definitely be an issue. Adjusted R Square is more relevant with multiple predictors in the regression model.

Standard Error = 5.89. This is the average amount by which the model is in error in predicting Y.
SEest = Se = Syx = square root of [ sum of (Yobs - Ypred)^2 / (n - 2) ]
Ideally we would want the SEest to be zero: perfect prediction, if all points fell on the line.

ANOVA
              df    SS        MS       F        Significance F
Regression     1    818.5     818.5    23.61    0.0000094
Residual      58    2011.1    34.67
Total         59    2829.6

              Coefficients  Standard Error  t Stat   P-value     Lower 95%  Upper 95%
Intercept     3.6357        2.2259          1.6333   0.1078      -0.82      8.09
(X) Length    0.2675        0.0551          4.8585   0.0000094   0.16       0.38

Write out the regression equation: Ypred = b0 + (b1 * X), where b0 is the estimated Y intercept, b1 is the estimated slope, and X is a value from the data. Here:
Ypred = 3.635664 + (0.267483 * X)
Interpretation of the slope: for every unit increase (one second) in commercial length, there is a 0.27-point increase in the recall test score (about 1/4 of a test point).
The predicted value for each of the 60 pairs (for example, Pair 1: X = 52, Ypred = 3.6357 + 0.2675 * 52 = 17.5) is listed, together with its residual, in the Residual Output below.
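A sketch of how the Standard Error of Estimate and the slope t statistic follow from the numbers in this output. The inputs are copied from the ANOVA and coefficient tables above; small rounding differences are expected, and Python is used only for illustration.

```python
# Sketch: Standard Error of Estimate and the 5-step slope test statistic,
# built from the values reported in the Excel output above.
from math import sqrt
from scipy import stats

n     = 60
SSE   = 2011.1      # Residual SS from the ANOVA table
b1    = 0.2675      # slope estimate for (X) Length
se_b1 = 0.0551      # standard error of the slope

s_e = sqrt(SSE / (n - 2))            # Standard Error of Estimate, about 5.89
t   = b1 / se_b1                     # t statistic for the slope, about 4.86
p   = 2 * stats.t.sf(t, df=n - 2)    # two-tailed p-value (compare to alpha = 0.05)

print(f"Se = {s_e:.2f}, t = {t:.2f} with df = {n - 2}, p = {p:.7f}")
```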
RESIDUAL OUTPUT (Ypred = a + b * X for each pair; Residual = Yobserved - Ypredicted)

Obs  Predicted  Residual  Std Resid     Obs  Predicted  Residual  Std Resid
  1    17.54      6.46      1.11         31    17.54     -2.54     -0.44
  2    14.33      5.67      0.97         32    11.13     -4.13     -0.71
  3    13.27      2.73      0.47         33    18.61      7.39      1.26
  4    11.13     -0.13     -0.02         34     8.99      2.01      0.35
  5    15.40     -5.40     -0.93         35    17.54      0.46      0.08
  6     7.92     -3.92     -0.67         36     7.92      8.08      1.38
  7    16.47      7.53      1.29         37     8.99     -0.99     -0.17
  8    17.54      0.46      0.08         38    14.33     -2.33     -0.40
  9    19.68     -3.68     -0.63         39     7.92      2.08      0.36
 10    15.40     -0.40     -0.07         40    15.40     -1.40     -0.24
 11    13.27      0.73      0.13         41    12.20      6.80      1.17
 12    15.40     -0.40     -0.07         42     8.99     -0.99     -0.17
 13    19.68      4.32      0.74         43    18.61     -7.61     -1.30
 14    10.06     -0.06     -0.01         44    18.61      5.39      0.92
 15    12.20    -11.20     -1.92         45    19.68     -4.68     -0.80
 16    14.33     -6.33     -1.09         46    10.06     -1.06     -0.18
 17    10.06     -1.06     -0.18         47    16.47      1.53      0.26
 18    12.20    -12.20     -2.09         48     7.92      6.08      1.04
 19    17.54     -0.54     -0.09         49    12.20      1.80      0.31
 20    13.27     -4.27     -0.73         50     7.92      3.08      0.53
 21    19.68      6.32      1.08         51    14.33      0.67      0.11
 22    18.61      9.39      1.61         52    10.06      0.94      0.16
 23     8.99      6.01      1.03         53    13.27     13.73      2.35
 24    14.33     -6.33     -1.09         54    11.13     -6.13     -1.05
 25    16.47    -14.47     -2.48         55    18.61     -1.61     -0.28
 26     8.99     -8.99     -1.54         56    12.20     -4.20     -0.72
 27    10.06      0.94      0.16         57    11.13      3.87      0.66
 28    13.27     -5.27     -0.90         58    16.47     -8.47     -1.45
 29    19.68      4.32      0.74         59    11.13     12.87      2.21
 30    15.40     -5.40     -0.93         60    16.47      4.53      0.78

Note: the sum of the residuals, sum of (Yobs - Ypred), equals zero (property 1 of the least squares line), and the sum of the squared residuals equals 2011.10, which is the Residual SS (SSE) in the ANOVA table. Interpreting a residual: for a given X, a negative residual means the regression model over-estimates Y, and a positive residual means it under-estimates Y.
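A sketch of how the Predicted values, Residuals, and Standard Residuals above could be recomputed, continuing from the `length` and `recall` arrays defined in the earlier sketch. The exact scaling Excel's Analysis ToolPak uses for "Standard Residuals" is not documented in these notes; dividing each residual by the sample standard deviation of the residuals is an assumption that appears to reproduce the values shown.

```python
# Sketch: predicted values, residuals, and standardized residuals,
# plus a simple bivariate-outlier check (|SR| > 2 moderate, |SR| > 3 severe).
import numpy as np

b0, b1 = 3.635664, 0.267483               # intercept and slope from the output above
predicted = b0 + b1 * length
residuals = recall - predicted            # these sum to approximately zero
std_resid = residuals / residuals.std(ddof=1)   # assumed ToolPak scaling (n - 1 denominator)

for i, sr in enumerate(std_resid, start=1):
    if abs(sr) > 2:
        print(f"pair {i}: standard residual = {sr:.2f}  (possible bivariate outlier)")
```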
[Scatterplot / Line Fit Plot: RECALL TEST SCORE (Y) plotted against LENGTH OF COMMERCIAL (X), with the fitted line "Linear Regression for (Y)Test" overlaid.]
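A sketch of the same plot produced with matplotlib rather than Excel, assuming the `length`, `recall`, and `predicted` arrays from the earlier sketches are in scope.

```python
# Sketch: scatterplot of the data with the fitted regression line.
import matplotlib.pyplot as plt

order = length.argsort()                      # sort X so the fitted line draws cleanly
plt.scatter(length, recall, label="(Y) Recall Test Score")
plt.plot(length[order], predicted[order], color="red",
         label="Linear Regression for (Y) Test")
plt.xlabel("LENGTH OF COMMERCIAL (X)")
plt.ylabel("RECALL TEST SCORE (Y)")
plt.legend()
plt.show()
```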
Statistical Analysis with Dr. Pred: Checklist for Simple Regression Analysis
1  Inspect the Scatterplot: a) linear? b) positive or negative? c) weak or strong? d) bivariate outliers?
2  Interpret a) Correlation Coefficient = rxy, b) Coefficient of Determination = r^2xy, c) Coefficient of Non-Determination = (1.00 - r^2xy)
3  Conduct a 5-step hypothesis test of the significance of the slope coefficient "Beta" (alpha = 0.05)
4  Interpret: a) Y intercept = a, b) slope coefficient = b, c) regression equation: Ypred = a + (b * X)
5  Calculate the predicted Y values (aka fitted Y) for each X in the distribution
6  Calculate the residual = (Yobserved - Ypredicted) for each predicted value of Y
7  Define and interpret the residuals: for a given X, does the regression model over- or under-estimate Y?
8  Calculate Se = Syx = Standard Error of Estimate = square root of [ (sum of squared residuals) / (n - 2) ]
9  Define and interpret the Standard Error of Estimate: compare it to its maximum and minimum values
10 Inspect the Residual Plot: there should be no relationship between X and the residuals; is this assumption met?

Bivariate Outlier Detection using Standardized Residuals
1) A moderate bivariate outlier is more than 2 standardized residuals away
2) A severe bivariate outlier is more than 3 standardized residuals away

Factors that Affect Interpretation of Correlation & Regression Analyses (see Supplemental Readings)
1) Correlation and causation
2) Assumption of linearity
3) Bivariate outliers and their effect on correlation and regression
4) Restriction of range of X and Y (aka range of talent)
5) Assumption of independence of errors (residual plot with X)
6) Assumption of equal scatter along the regression line (homoscedasticity)
7) Sample size and correlation (n affects p-values, not the size of the correlation coefficient); see the other Excel worksheet, "Correl & Sample Size"

From Fisher & Yates, "Statistical Tables for Biological, Agricultural, and Medical Research": critical values for the correlation coefficient, two-tailed tests of significance.

                    Alpha = 0.05 (*)   Alpha = 0.01 (**)
n = 30, df = 28          0.36               0.46
df = 30                  0.35               0.45
df = 60                  0.25               0.33
df = 120                 0.18               0.23
df = 500                 0.09               0.12

LEARNING OBJECTIVES FOR SIMPLE CORRELATION AND REGRESSION: the same ten-step checklist as above, except that Step 1 inspects the scatterplot via the Line Fit Plot and Step 3 is a 5-step hypothesis test of the simple correlation coefficient rxy (alpha = 0.05) rather than of the slope.
Optional: for the first X, calculate the prediction interval for the Ypredicted value.
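A sketch of the optional prediction-interval step, applied to the first X of the commercial-recall data (X = 52 seconds). It uses the standard simple-regression prediction-interval formula, which is not written out in these notes, and assumes the `length` and `recall` arrays from the earlier sketch.

```python
# Sketch: 95% prediction interval for a new Y at x0, using
# y_hat0 +/- t * Se * sqrt(1 + 1/n + (x0 - x_bar)^2 / Sxx).
import numpy as np
from scipy import stats

x0 = 52
n = len(length)
b1, b0 = np.polyfit(length, recall, 1)            # slope, intercept
y_hat0 = b0 + b1 * x0

resid = recall - (b0 + b1 * length)
s_e = np.sqrt((resid ** 2).sum() / (n - 2))       # standard error of estimate
Sxx = ((length - length.mean()) ** 2).sum()

t_crit = stats.t.ppf(0.975, df=n - 2)             # two-sided, alpha = 0.05
margin = t_crit * s_e * np.sqrt(1 + 1/n + (x0 - length.mean()) ** 2 / Sxx)

print(f"95% prediction interval for Y at X = {x0}: "
      f"{y_hat0 - margin:.1f} to {y_hat0 + margin:.1f}")
```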
Second example: using (X) Age to predict (Y) Repairs; n = 20 pairs.

Pair  (X) Age  (Y) Repairs      Pair  (X) Age  (Y) Repairs
  1     110      327.67          11     96       362.42
  2     113      376.68          12    139       448.76
  3     114      392.52          13     89       335.27
  4     134      443.14          14     93       350.94
  5      93      342.62          15     91       291.81
  6     141      476.16          16    109       467.80
  7     115      324.74          17    138       474.48
  8     115      338.98          18     83       354.15
  9     115      433.45          19    100       420.11
 10     142      526.37          20    137       416.04

SUMMARY OUTPUT - Regression Statistics
Multiple R          0.75
R Square            0.57
Adjusted R Square   0.54
Standard Error      43.32
Observations        20

ANOVA
              df    SS          MS          F       Significance F
Regression     1    44024.24    44024.24    23.46   0.0001
Residual      18    33776.83     1876.49
Total         19    77801.07

            Coefficients  Standard Error  t Stat   P-value   Lower 95%  Upper 95%
Intercept   114.8525      58.6854         1.9571   0.0660    -8.44      238.15
Age           2.4733       0.5106         4.8436   0.0001     1.40        3.55

Regression equation: Ypred = 114.8525 + (2.4733 * Age). The 95% confidence interval for the slope runs from 1.40 to 3.55.

RESIDUAL OUTPUT (Ypred = a + b * X; Residual = Yobserved - Ypred). Detect bivariate outliers when |Standard Residual| is greater than 2: a moderate bivariate outlier is more than 2 standard residuals away, a severe bivariate outlier more than 3.

Obs  Predicted Repairs  Residual  Std Resid     Obs  Predicted Repairs  Residual  Std Resid
  1       386.92         -59.25     -1.41        11       352.29          10.13      0.24
  2       394.34         -17.66     -0.42        12       458.65          -9.89     -0.23
  3       396.81          -4.29     -0.10        13       334.98           0.29      0.01
  4       446.28          -3.14     -0.07        14       344.87           6.07      0.14
  5       344.87          -2.25     -0.05        15       339.93         -48.12     -1.14
  6       463.59          12.57      0.30        16       384.45          83.35      1.98
  7       399.29         -74.55     -1.77        17       456.17          18.31      0.43
  8       399.29         -60.31     -1.43        18       320.14          34.01      0.81
  9       399.29          34.16      0.81        19       362.19          57.92      1.37
 10       466.07          60.30      1.43        20       453.70         -37.66     -0.89

[Age Line Fit Plot: Repairs (Y) plotted against Age (X) with the fitted line. Age Residual Plot: residuals plotted against Age; there should be no pattern.]
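A sketch reproducing the headline numbers of this output with scipy.stats.linregress rather than Excel; the arrays simply repeat the 20 values from the data table above.

```python
# Sketch: simple regression of Repairs on Age with SciPy.
import numpy as np
from scipy import stats

age = np.array([110, 113, 114, 134, 93, 141, 115, 115, 115, 142,
                96, 139, 89, 93, 91, 109, 138, 83, 100, 137])
repairs = np.array([327.67, 376.68, 392.52, 443.14, 342.62, 476.16, 324.74,
                    338.98, 433.45, 526.37, 362.42, 448.76, 335.27, 350.94,
                    291.81, 467.80, 474.48, 354.15, 420.11, 416.04])

res = stats.linregress(age, repairs)
print(f"intercept = {res.intercept:.2f}")   # about 114.85
print(f"slope     = {res.slope:.4f}")       # about 2.4733
print(f"r         = {res.rvalue:.2f}")      # about 0.75
print(f"p (slope) = {res.pvalue:.4f}")      # about 0.0001
```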
Example: An HRD (Human Resource Development) team designs Assessment Centers to evaluate the executive decision-making skills of applicants for middle and upper level management positions. If they devise a test to "screen" potential applicants, how can they evaluate whether or not the test will accurately differentiate the applicants with "high" potential from those with "low" potential?

Solution: Offer all "reasonable" applicants a probationary employment period of 3 to 6 months, on the condition that continuation with the firm is based on satisfactory performance. Prepare a battery of questions and/or work-sample tests that all applicants will complete prior to their starting day. At the end of 3 months, ask the supervisors of all managers to evaluate their subordinates using official company performance appraisal forms. Next, compute the correlation between test scores (X) and actual on-the-job performance (Y). If the correlation is significant, the test may possibly be a valid predictor of performance. Other considerations are also important.

EC Problem: Analyze the following data according to the Regression Checklist shown in Course Documents.

Employee  (X) TestScore  (Y) Appraisal
   #1          20             32
   #2          55             97
   #3          23             39
   #4          26             49
   #5          52             79
   #6          32             47
   #7          43             72
   #8          39             70
   #9          35             64
  #10          47             81

SUMMARY OUTPUT - Regression Statistics
Multiple R          0.97
R Square            0.94
Adjusted R Square   0.94
Standard Error      5.20
Observations        10

ANOVA
              df    SS         MS         F        Significance F
Regression     1    3640.09    3640.09    134.88   0.0000028
Residual       8     215.91      26.99
Total          9    3856.00

               Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept      1.3092        5.5602           0.2355   0.8198    -11.5127   14.1310
(X) TestScore  1.6584        0.1428          11.6136   0.0000      1.3291    1.9876

Regression equation: Ypred = 1.3092 + (1.6584 * TestScore).

RESIDUAL OUTPUT (Residual = Yobserved - Ypred). Detect bivariate outliers when |Standard Residual| is greater than 2: moderate if more than 2 standard residuals away, severe if more than 3.

Obs  Predicted Appraisal  Residual  Std Resid
  1        34.48            -2.48     -0.51
  2        92.52             4.48      0.91
  3        39.45            -0.45     -0.09
  4        44.43             4.57      0.93
  5        87.54            -8.54     -1.74
  6        54.38            -7.38     -1.51
  7        72.62            -0.62     -0.13
  8        65.99             4.01      0.82
  9        59.35             4.65      0.95
 10        79.25             1.75      0.36

[(X) TestScore Line Fit Plot: (Y) Appraisal plotted against (X) TestScore with the fitted line. (X) TestScore Residual Plot: residuals plotted against TestScore; there should be no pattern.]
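One way to work through the EC problem without Excel is sketched below with statsmodels; the choice of library is an assumption for illustration, not part of the assignment. It reproduces a summary comparable to the output above, plus the fitted values and residuals called for by the checklist.

```python
# Sketch: full regression summary for the EC problem data with statsmodels.
import numpy as np
import statsmodels.api as sm

test_score = np.array([20, 55, 23, 26, 52, 32, 43, 39, 35, 47])
appraisal  = np.array([32, 97, 39, 49, 79, 47, 72, 70, 64, 81])

X = sm.add_constant(test_score)        # adds the intercept column
model = sm.OLS(appraisal, X).fit()

print(model.summary())                 # coefficients, t stats, p-values, R^2, F
print(model.fittedvalues)              # predicted (Y) Appraisal for each X
print(model.resid)                     # residuals (Yobserved - Ypredicted)
```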
What is a "regression model" recall Univariate ChiSquare Goodness of Fit Test, which tested "models" Here, what we test is how well a Straight Line Model fi The Single Predictor Case where we have 1 X, and Y Simple Regression (only 1 X variable, ie. 1 predic Regression Regression: The process of using equations to predict values of variables Value of Regression? It will be included in many required Upper Division Course Risk, FIN, ECON, MSOM, MKT RES Models used to predict Variables, such as Profits, ROI, Revenue Values of funds given an investment already in the p Application: In a stock portfolio, do we want all investments to be positively related? Take for example, a 2-stock portfolio Diversifying means that the investments will not all be po correlated some will be inversely relate some not related at all What does Regression actually Mean? Application: Predicting Student Success in College Regression? It is the degree to which the Model improves our ab to predict Y, better than using Ybar (mean of Y) for every Keep in Mind: The Worst Case in Predicting Y, is to Predict Mea AND REGRESSION uilding Regression Models plied to various disciplines stments that are NOT , is building portfolios ety of relationships some not at all related in the Upper Division Curric. of Fit Test, t Line Model fits the data ve 1 X, and Y ble, ie. 1 predictor) redict values of numeric Division Course already in the portfolio vestments to be rtfolio means that the will not all be positively inversely related, and ated at all improves our ability to of Y) for every case of X. s to Predict Mean of Y for all. ...