Ch. 4 Describing the Relation between Two Variables

4.1 Scatter Diagrams and Correlation
1 Draw and Interpret Scatter Diagrams

SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.

1) The data below are the final exam scores of 10 randomly selected statistics students and the number of hours they studied for the exam. Construct a scatter diagram for the data.
Hours, x: 3 5 2 8 2 4 4 5 6 3
Scores, y: 65 80 60 88 66 78 85 90 90 71

2) The data below are the temperatures on randomly chosen days during a summer class and the number of absences on those days. Construct a scatter diagram for the data.
Temperature, x: 72 85 91 90 88 98 75 100 80
Number of absences, y: 3 7 10 10 8 15 4 15 5

3) The data below are the ages and systolic blood pressures (measured in millimeters of mercury) of 9 randomly selected adults. Construct a scatter diagram for the data.
Age, x: 38 41 45 48 51 53 57 61 65
Pressure, y: 116 120 123 131 142 145 148 150 152

4) The data below are the number of absences and the final grades of 9 randomly selected students from a statistics class. Construct a scatter diagram for the data.
Number of absences, x: 0 3 6 4 9 2 15 8 5
Final grade, y: 98 86 80 82 71 92 55 76 82

5) A manager wishes to determine the relationship between the number of miles (in hundreds of miles) the manager's sales representatives travel per month and the amount of sales (in thousands of dollars) per month. Construct a scatter diagram for the data.
Miles traveled, x: 2 3 10 7 8 15 3 1 11
Sales, y: 31 33 78 62 65 61 48 55 120

6) In order for applicants to work for the foreign-service department, they must take a test in the language of the country where they plan to work. The data below show the relationship between the number of years that applicants have studied a particular language and the grades they received on the proficiency exam. Construct a scatter diagram for the data.
Number of years, x: 3 4 4 5 3 6 2 7 3
Grades on test, y: 61 68 75 82 73 90 58 93 72

7) In an area of the Midwest, records were kept on the relationship between the rainfall (in inches) and the yield of wheat (bushels per acre). Construct a scatter diagram for the data.
Rainfall (in inches), x: 10.5 8.8 13.4 12.5 18.8 10.3 7.0 15.6 16.0
Yield (bushels per acre), y: 50.5 46.2 58.8 59.0 82.4 49.2 31.9 76.0 78.8

8) Construct a scatter diagram for the data. All measurements are in milligrams per cigarette.
Cigarette: Brand A, Brand B, Brand C, Brand D, Brand E
Tar: 16 13 16 18 6
Nicotine: 1.2 1.1 1.2 1.4 0.6

9) The scores of nine members of a local community college women's golf team in two rounds of tournament play are listed below. Construct a scatter diagram for the data.
Player: 1 2 3 4 5 6 7 8 9
Round 1: 85 90 87 78 92 85 79 93 86
Round 2: 90 87 85 84 86 78 77 91 82

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Make a scatter diagram for the data. Use the scatter diagram to describe how, if at all, the variables are related.

10)
x: 4 9 7 6 10 5
y: 3 6 4 4 5 2
[Answer choices A) through D): four scatter diagrams, not reproduced; both axes run from 2 to 16. Their captions, in the order extracted:]
The variables appear to be positively, linearly related.
The variables do not appear to be linearly related.
The variables appear to be negatively, linearly related.
The variables do not appear to be linearly related.

11)
x: 4 2 -3 1 0 6 -1
y: 12 14 10 13 16 12 19
[Answer choices A) through D): four scatter diagrams, not reproduced; the x-axis runs from -12 to 12 and the y-axis from 4 to 24. Their captions, in the order extracted:]
The variables do not appear to be linearly related.
The variables appear to be negatively, linearly related.
The variables do not appear to be linearly related.
The variables appear to be positively, linearly related.

12) x = time watching TV, y = time on Internet
Subject: A B C D E F G
x: 10 6 4 9 9 7 8
y: 8 6 2 11 12 3 12
[Answer choices A) through D): four scatter diagrams, not reproduced; both axes run from 2 to 20. Their captions, in the order extracted:]
The variables appear to be positively, linearly related.
The variables do not appear to be linearly related.
The variables appear to be negatively, linearly related.
The variables do not appear to be linearly related.

SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.

13) An agricultural business wants to determine if the rainfall in inches can be used to predict the yield per acre on a wheat farm. Identify the predictor variable and the response variable.

14) A college counselor wants to determine if the number of hours spent studying for a test can be used to predict the grades on a test. Identify the predictor variable and the response variable.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

15) The ______________ variable is the variable whose value can be explained by the ________________ variable.
A) Response; predictor
B) Response; lurking
C) Lurking; response
D) Predictor; response

2 Understand the Properties of the Linear Correlation Coefficient

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Use the scatter diagrams shown, labelled a through f, to solve the problem.

1)
[Scatter diagrams a through f, not reproduced; each x-axis runs from 1 to 7 and each y-axis from 2 to 14.]
In which scatter diagram is r = 0.01?
A) e B) c C) f D) d

2)
[Scatter diagrams a through f, not reproduced; each x-axis runs from 1 to 7 and each y-axis from 2 to 14.]
In which scatter diagram is r = 1?
A) b B) a C) f D) d

3)
[Scatter diagrams a through f, not reproduced; each x-axis runs from 1 to 7 and each y-axis from 2 to 14.]
In which scatter diagram is r = -1?
A) a B) b C) f D) d

The scatter diagram shows the relationship between average number of years of education and births per woman of child-bearing age in selected countries. Use the scatter plot to determine whether the statement is true or false.

4)
[Scatter diagram, not reproduced: births per woman (vertical axis, 1 to 10) against average number of years of education of married women of child-bearing age (horizontal axis, 2 to 14).]
There is a strong positive correlation between years of education and births per woman.
A) False B) True

5) [Same scatter diagram.]
There is no correlation between years of education and births per woman.
A) False B) True

6) [Same scatter diagram.]
There is a strong negative correlation between years of education and births per woman.
A) True B) False

7) [Same scatter diagram.]
There is a causal relationship between years of education and births per woman.
A) False B) True

SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.

8) Construct a scatter diagram for the given data. Determine whether there is a positive linear correlation, negative linear correlation, or no linear correlation.
x: -5 -3 4 1 -1 -2 0 2 3 -4
y: -10 -8 9 1 -2 -6 -1 3 6 -8

9) Construct a scatter diagram for the given data. Determine whether there is a positive linear correlation, negative linear correlation, or no linear correlation.
x: -5 -3 4 1 -1 -2 0 2 3 -4
y: 11 6 -6 -1 3 4 1 -4 -5 8

10) Construct a scatter diagram for the given data. Determine whether there is a positive linear correlation, negative linear correlation, or no linear correlation.
x: -5 -3 4 1 -1 -2 0 2 3 -4
y: 11 -6 8 -3 -2 1 5 -5 6 7

11) The numbers of home runs that Mark McGwire hit in the first 13 years of his major league baseball career are listed below. (Source: Major League Handbook) Construct a scatter diagram for the data. Is there a relationship between the home runs and the batting averages?
Home runs: 33 39 22 42 9 9 39 52 58 70
Batting average: .231 .235 .201 .268 .33 .252 .274 .312 .274 .299

12) The data below represent the numbers of absences and the final grades of 15 randomly selected students from a statistics class. Construct a scatter diagram for the data. Do you detect a trend?
Student   Number of absences   Final grade as a percent
1         5                    79
2         6                    78
3         2                    86
4         12                   56
5         9                    75
6         5                    90
7         8                    78
8         15                   48
9         0                    92
10        1                    78
11        9                    81
12        3                    86
13        10                   75
14        3                    89
15        11                   65

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Use the scatter diagrams shown, labelled a through f, to solve the problem.

13)
[Scatter diagrams a through f, not reproduced; each x-axis runs from 1 to 7 and each y-axis from 2 to 14.]
Which scatter diagram indicates a perfect positive correlation?
A) b B) a C) c D) f

14) A researcher determines that the linear correlation coefficient is 0.85 for a paired data set. This indicates that there is
A) A strong positive linear correlation
B) A strong negative linear correlation
C) No linear correlation but that there may be some other relationship
D) Insufficient evidence to make any decision about the correlation of the data

15) An instructor wishes to determine if there is a relationship between the number of absences from his class and a student's final grade in the course. What is the predictor variable?
A) Absences
B) Final grade
C) The instructor's point scale for attendance
D) Student's performance on the final examination

16) A medical researcher wishes to determine if there is a relationship between the number of prescriptions written by medical professionals, per 100 children, and the child's age. She surveys all the pediatricians in a geographical region to collect her data. What is the response variable?
A) Age of the child
B) Number of prescriptions written
C) Pediatricians surveyed
D) 100 prescriptions

17) True or False: A doctor wishes to determine the relationship between a male's age and that male's total cholesterol level. He tests 200 males and records each male's age and that male's total cholesterol level. The male's cholesterol level is the predictor variable.
A) False B) True

18) A variable that is related to either the response variable or the predictor variable or both, but which is excluded from the analysis, is a
A) Lurking variable
B) Random variable
C) Discrete variable
D) Qualitative variable

19) A scatter diagram locates a point in a two-dimensional plane. The diagram locates the ______ variable on the horizontal axis and the ______ variable on the vertical axis.
A) Predictor; response
B) Response; predictor
C) Response; study
D) Study; predictor

20) A history instructor has given the same pretest and the same final examination each semester. He is interested in determining if there is a relationship between the scores of the two tests. He computes the linear correlation coefficient and notes that it is 1.15. What does this correlation coefficient value tell the instructor?
A) The history instructor has made a computational error.
B) There is a strong positive correlation between the tests.
C) There is a strong negative correlation between the tests.
D) The correlation is something other than linear.

21) A traffic officer is compiling information about the relationship between the hour of the day and the speed over the limit at which the motorist is ticketed. He computes a correlation coefficient of 0.12. What does this tell the officer?
A) There is a weak positive linear correlation.
B) There is a moderate positive linear correlation.
C) There is a moderate negative linear correlation.
D) There is insufficient evidence to make any conclusions about the relationship between the variables.

3 Compute and Interpret the Linear Correlation Coefficient

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

1) Calculate the correlation coefficient, r, for the data below.
x: 3 5 12 9 7 6 8 10 11 4
y: -18 -16 1 -7 -10 -14 -9 -5 -2 -16
A) 0.990 B) 0.881 C) 0.819 D) 0.792

2) Calculate the correlation coefficient, r, for the data below.
x: 3 5 12 9 7 6 8 10 11 4
y: 5 0 -12 -7 -3 -2 -5 -10 -11 2
A) -0.995 B) -0.671 C) -0.778 D) -0.885

3) Calculate the correlation coefficient, r, for the data below.
x: -6 -4 3 0 -2 -3 -1 1 2 -5
y: 1 -16 -2 -13 -12 -9 -5 -15 -4 -3
A) -0.104 B) -0.132 C) -0.549 D) -0.581

4) The data below are the final exam scores of 10 randomly selected statistics students and the number of hours they studied for the exam. Calculate the correlation coefficient, r.
Hours, x: 5 7 4 10 4 6 6 7 8 5
Scores, y: 63 78 58 86 64 76 83 88 88 69
A) 0.847 B) 0.991 C) 0.761 D) 0.654

5) The data below are the temperatures on randomly chosen days during a summer class and the number of absences on those days. Calculate the correlation coefficient, r.
Temperature, x: 74 87 93 92 90 100 77 102 82
Number of absences, y: 12 16 19 19 17 24 13 24 14
A) 0.980 B) 0.890 C) 0.881 D) 0.819

6) The data below are the ages and systolic blood pressures (measured in millimeters of mercury) of 9 randomly selected adults. Calculate the correlation coefficient, r.
Age, x: 42 45 49 52 55 57 61 65 69
Pressure, y: 118 122 125 133 144 147 150 152 154
A) 0.960 B) 0.998 C) 0.890 D) 0.908

7) The data below are the number of absences and the final grades of 9 randomly selected students from a statistics class. Calculate the correlation coefficient, r.
Number of absences, x: 1 4 7 5 10 3 16 9 6
Final grade, y: 100 88 82 84 73 94 57 78 84
A) -0.991 B) -0.888 C) -0.918 D) -0.899

8) A manager wishes to determine the relationship between the number of miles (in hundreds of miles) the manager's sales representatives travel per month and the amount of sales (in thousands of dollars) per month. Calculate the correlation coefficient, r.
Miles traveled, x: 3 4 11 8 9 16 4 2 12
Sales, y: 21 23 68 52 55 51 38 45 110
A) 0.632 B) 0.561 C) 0.717 D) 0.791

9) In order for applicants to work for the foreign-service department, they must take a test in the language of the country where they plan to work. The data below show the relationship between the number of years that applicants have studied a particular language and the grades they received on the proficiency exam. Calculate the correlation coefficient, r.
Number of years, x: 2 3 3 4 2 5 1 6 2
Grades on test, y: 64 71 78 85 76 93 61 96 75
A) 0.934 B) 0.911 C) 0.891 D) 0.902

10) In an area of the Midwest, records were kept on the relationship between the rainfall (in inches) and the yield of wheat (bushels per acre). Calculate the correlation coefficient, r.
Rainfall (in inches), x: 12.8 11.1 15.7 14.8 21.1 12.6 9.3 17.9 18.3
Yield (bushels per acre), y: 51.5 47.2 59.8 60 83.4 50.2 32.9 77 79.8
A) 0.981 B) 0.998 C) 0.900 D) 0.899

SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.

11) Calculate the coefficient of correlation, r, letting Row 1 represent the x-values and Row 2 represent the y-values. Now calculate the coefficient of correlation, r, letting Row 2 represent the x-values and Row 1 represent the y-values. What effect does switching the explanatory and response variables have on the correlation coefficient?
Row 1: -6 -4 3 0 -2 -3 -1 1 2 -5
Row 2: -13 5 6 -2 -5 -9 -4 0 3 5

4.2 Least-Squares Regression
1 Find the Least-Squares Regression Line and Use the Line to Make Predictions

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

1) Find the equation of the regression line for the given data.
x: -5 -3 4 1 -1 -2 0 2 3 -4
y: -10 -8 9 1 -2 -6 -1 3 6 -8
A) ŷ = 2.097x - 0.552
B) ŷ = 0.522x - 2.097
C) ŷ = 2.097x + 0.552
D) ŷ = -0.552x + 2.097

2) Find the equation of the regression line for the given data.
x: -5 -3 4 1 -1 -2 0 2 3 -4
y: 11 6 -6 -1 3 4 1 -4 -5 8
A) ŷ = -1.885x + 0.758
B) ŷ = 0.758x + 1.885
C) ŷ = -0.758x - 1.885
D) ŷ = 1.885x - 0.758

3) Find the equation of the regression line for the given data.
x: -5 -3 4 1 -1 -2 0 2 3 -4
y: 11 -6 8 -3 -2 1 5 -5 6 7
A) ŷ = -0.206x + 2.097
B) ŷ = 2.097x - 0.206
C) ŷ = 0.206x - 2.097
D) ŷ = -2.097x + 0.206

4) The data below are the final exam scores of 10 randomly selected statistics students and the number of hours they studied for the exam. Find the equation of the regression line for the given data.
Hours, x: 3 5 2 8 2 4 4 5 6 3
Scores, y: 65 80 60 88 66 78 85 90 90 71
A) ŷ = 5.044x + 56.11
B) ŷ = 56.11x - 5.044
C) ŷ = -56.11x - 5.044
D) ŷ = -5.044x + 56.11

5) The data below are the temperatures on randomly chosen days during a summer class and the number of absences on those days. Find the equation of the regression line for the given data.
Temperature, x: 72 85 91 90 88 98 75 100 80
Number of absences, y: 3 7 10 10 8 15 4 15 5
A) ŷ = 0.449x - 30.27
B) ŷ = 30.27x - 0.449
C) ŷ = 0.449x + 30.27
D) ŷ = 30.27x + 0.449

6) The data below are ages and systolic blood pressures (measured in millimeters of mercury) of 9 randomly selected adults. Find the equation of the regression line for the given data.
Age, x: 38 41 45 48 51 53 57 61 65
Pressure, y: 116 120 123 131 142 145 148 150 152
A) ŷ = 1.488x + 60.46
B) ŷ = 60.46x - 1.488
C) ŷ = 1.448x - 60.46
D) ŷ = 60.46x + 1.488

7) The data below are the number of absences and the final grades of 9 randomly selected students from a statistics class. Find the equation of the regression line for the given data.
Number of absences, x: 0 3 6 4 9 2 15 8 5
Final grade, y: 98 86 80 82 71 92 55 76 82
A) ŷ = -2.75x + 96.14
B) ŷ = 96.14x - 2.75
C) ŷ = -2.75x - 96.14
D) ŷ = -96.14x + 2.75

8) A manager wishes to determine the relationship between the number of miles (in hundreds of miles) the manager's sales representatives travel per month and the amount of sales (in thousands of dollars) per month. Find the equation of the regression line for the given data.
Miles traveled, x: 2 3 10 7 8 15 3 1 11
Sales, y: 31 33 78 62 65 61 48 55 120
A) ŷ = 3.53x + 37.92
B) ŷ = 37.92x - 3.53
C) ŷ = 3.53x - 37.92
D) ŷ = 37.92x + 3.53

9) In order for applicants to work for the foreign-service department, they must take a test in the language of the country where they plan to work. The data below show the relationship between the number of years that applicants have studied a particular language and the grades they received on the proficiency exam. Find the equation of the regression line for the given data.
Number of years, x: 3 4 4 5 3 6 2 7 3
Grades on test, y: 61 68 75 82 73 90 58 93 72
A) ŷ = 6.91x + 46.26
B) ŷ = 6.91x - 46.26
C) ŷ = 46.26x - 6.91
D) ŷ = 46.26x + 6.91

10) In an area of the Midwest, records were kept on the relationship between the rainfall (in inches) and the yield of wheat (bushels per acre). Find the equation of the regression line for the given data.
Rainfall (in inches), x: 10.5 8.8 13.4 12.5 18.8 10.3 7.0 15.6 16.0
Yield (bushels per acre), y: 50.5 46.2 58.8 59.0 82.4 49.2 31.9 76.0 78.8
A) ŷ = 4.379x + 4.267
B) ŷ = -4.379x + 4.267
C) ŷ = 4.267x + 4.379
D) ŷ = 4.267x - 4.379

SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.

11) Find the equation of the regression line by letting Row 1 represent the x-values and Row 2 represent the y-values. Now find the equation of the regression line letting Row 2 represent the x-values and Row 1 represent the y-values. What effect does switching the explanatory and response variables have on the regression line?
Row 1: -5 -3 4 1 -1 -2 0 2 3 -4
Row 2: -10 -8 9 1 -2 -6 -1 3 6 -8

12) Is the number of games won by a major league baseball team in a season related to the team's batting average? Data from 14 teams were collected and the summary statistics yield:
∑y = 1,134, ∑x = 3.642, ∑y² = 93,110, ∑x² = .948622, and ∑xy = 295.54
Find the least squares prediction equation for predicting the number of games won, y, using a straight-line relationship with the team's batting average, x.

13) The data in the table are typical prices for a gallon of regular leaded gasoline and a barrel of crude oil for the indicated years.
Year   Gasoline, y (¢ per gallon)   Crude oil, x ($ per barrel)
1975   57                           10.38
1976   59                           10.89
1977   62                           11.96
1978   63                           12.46
1979   86                           17.72
1980   119                          28.07
1981   131                          35.24
1982   122                          31.87
1983   116                          28.99
1984   113                          28.63
1985   112                          36.75
1986   86                           14.55
1987   90                           17.90
1988   90                           14.67
1989   100                          17.97
1990   115                          22.23
Summary statistics yield: SSxx = 1222.2771, SSxy = 3031.7125, SSyy = 9144.9375, x̄ = 21.2675, and ȳ = 95.0625. Find the least squares line that uses crude oil price to predict gasoline price.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

14) A residual is the difference between
A) The observed value of y and the predicted value of y.
B) The observed value of x and the predicted value of x.
C) The observed value of y and the predicted value of x.
D) The observed value of x and the predicted value of y.

15) The least squares regression line
A) Minimizes the sum of the residuals squared
B) Maximizes the sum of the residuals squared
C) Minimizes the mean difference between the residuals squared
D) Maximizes the mean difference between the residuals squared

16) For a given data set, the equation of the least squares regression line will always pass through
A) (x̄, ȳ)
B) Every point in the given data set
C) At least two points in the given data set
D) The y-intercept and the slope

2 Interpret the Slope and y-intercept of the Least-Squares Regression Line

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

1) A county real estate appraiser wants to develop a statistical model to predict the appraised value of houses in a section of the county called East Meadow. One of the many variables thought to be an important predictor of appraised value is the number of rooms in the house. Consequently, the appraiser decided to fit the simple linear regression model y = β₀ + β₁x, where y = appraised value of the house (in $thousands) and x = number of rooms. Using data collected for a sample of n = 74 houses in East Meadow, the following results were obtained:
ŷ = 74.80 + 19.70x
s(β̂₀) = 71.24, t = 1.05 (for testing β₀)
s(β̂₁) = 2.63, t = 7.49 (for testing β₁)
SSE = 60,775, MSE = 841, s = 29, r² = .44
Range of the x-values: 5-11
Range of the y-values: 160-300
Give a practical interpretation of the estimate of the slope of the least squares line.
A) For each additional room in the house, we estimate the appraised value to increase $19,700.
B) For each additional room in the house, we estimate the appraised value to increase $74,800.
C) For each additional dollar of appraised value, we estimate the number of rooms in the house to increase by 19.70 rooms.
D) For a house with 0 rooms, we estimate the appraised value to be $74,800.
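To make the slope interpretation in question 1 concrete, the following minimal sketch evaluates the fitted line ŷ = 74.80 + 19.70x at two room counts inside the observed range of 5 to 11 rooms and converts the difference to dollars. The helper name `predicted_value` and the choice of 7 and 8 rooms are illustrative assumptions, not part of the original question.

```python
# Fitted line quoted in question 1: y-hat = 74.80 + 19.70x,
# with y in thousands of dollars and x = number of rooms.
INTERCEPT = 74.80
SLOPE = 19.70


def predicted_value(rooms):
    """Predicted appraised value in thousands of dollars (hypothetical helper)."""
    return INTERCEPT + SLOPE * rooms


# Compare two houses that differ by exactly one room. The values 7 and 8
# are arbitrary picks inside the observed 5-11 range.
change_in_thousands = predicted_value(8) - predicted_value(7)
change_in_dollars = round(change_in_thousands * 1000)

print(f"One more room changes the prediction by about ${change_in_dollars:,}")
```

Because the model is a straight line, the intercept cancels in the subtraction, so any pair of room counts one unit apart gives the same change.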
2) A county real estate appraiser wants to develop a statistical model to predict the appraised value of houses in a section of the county called East Meadow. One of the many variables thought to be an important predictor of appraised value is the number of rooms in the house. Consequently, the appraiser decided to fit the simple linear regression model y = β₀ + β₁x, where y = appraised value of the house (in $thousands) and x = number of rooms. Using data collected for a sample of n = 74 houses in East Meadow, the following results were obtained:
ŷ = 74.80 + 19.72x
s(β̂₀) = 71.24, t = 1.05 (for testing β₀)
s(β̂₁) = 2.63, t = 7.49 (for testing β₁)
SSE = 60,775, MSE = 841, s = 29, r² = .44
Range of the x-values: 5-11
Range of the y-values: 160-300
Give a practical interpretation of the estimate of the y-intercept of the least squares line.
A) There is no practical interpretation, since a house with 0 rooms is nonsensical.
B) For each additional room in the house, we estimate the appraised value to increase $74,800.
C) For each additional room in the house, we estimate the appraised value to increase $19,720.
D) We estimate the base appraised value for any house to be $74,800.

3) Is there a relationship between the raises administrators at State University receive and their performance on the job? A faculty group wants to determine whether job rating (x) is a useful linear predictor of raise (y). Consequently, the group considered the straight-line regression model y = β₀ + β₁x. Using the method of least squares, the faculty group obtained the following prediction equation:
ŷ = 14,000 - 2,000x
Interpret the estimated slope of the line.
A) For a 1-point increase in an administrator's rating, we estimate the administrator's raise to decrease $2,000.
B) For a 1-point increase in an administrator's rating, we estimate the administrator's raise to increase $2,000.
C) For an administrator with a rating of 1.0, we estimate his/her raise to be $2,000.
D) For a $1 increase in an administrator's raise, we estimate the administrator's rating to decrease 2,000 points.
4) Is there a relationship between the raises administrators at State University receive and their performance on the job? A faculty group wants to determine whether job rating (x) is a useful linear predictor of raise (y). Consequently, the group considered the straight-line regression model y = β₀ + β₁x. Using the method of least squares, the faculty group obtained the following prediction equation:
ŷ = 14,000 - 2,000x
Interpret the estimated y-intercept of the line.
A) For an administrator who receives a rating of zero, we estimate his or her raise to be $14,000.
B) The base administrator raise at State University is $14,000.
C) For a 1-point increase in an administrator's rating, we estimate the administrator's raise to increase $14,000.
D) There is no practical interpretation, since a rating of 0 is nonsensical and outside the range of the sample data.

5) A large national bank charges local companies for using its services. A bank official reported the results of a regression analysis designed to predict the bank's charges (y), measured in dollars per month, for services rendered to local companies. One independent variable used to predict service charge to a company is the company's sales revenue (x), measured in millions of dollars. Data for 21 companies who use the bank's services were used to fit the model y = β₀ + β₁x. The results of the simple linear regression are provided below.
ŷ = 2,700 + 20x, s = 65, 2-tailed p-value = .064 (for testing β₁)
Interpret the estimate of β₀, the y-intercept of the line.
A) There is no practical interpretation since a sales revenue of $0 is a nonsensical value.
B) All companies will be charged at least $2,700 by the bank.
C) About 95% of the observed service charges fall within $2,700 of the least squares line.
D) For every $1 million increase in sales revenue, we expect a service charge to increase $2,700.
6) Civil engineers often use the straight line equation y = β 0 + β 1 x to model the relationship between the mean shear strength of masonry joints and precompression stress, x. To test this theory, a series of stress tests were performed on solid bricks arranged in triplets and joined with mortar. The precompression stress was varied for each triplet and the ultimate shear load just before failure (called the shear strength) was recorded. The stress results for n = 7 triplet tests is shown in the accompanying table followed by a SAS printout of the regression analysis. Page 99
^ ^ ^ ^ Triplet Test Shear Strength, y (tons) Precomp. Stress, x (tons) 1 2 3 4 5 6 7 1.00 2.18 2.24 2.41 2.59 2.82 3.06 0 .60 1.20 1.33 1.43 1.75 1.75 Analysis of Variance Source DF Model Error C Total Root MSE Dep Mean C.V. 1 5 6 Sum of Mean Squares Square 2.39555 0.25094 2.64649 0.22403 2.32857 9.62073 2.39555 0.05019 F Value 47.732 Prob > F 0.0010 R square Adj R sq 0.9052 0.8862 Parameter Estimates Parameter Estimate 1.191930 0.987157 Standard Error 0.18503093 0.14288331 T for HO: Parameter=0 6.442 6.909 Variable INTERCEP X DF 1 1 Prob > T 0.0013 0.0010 Give a practical interpretation of the estimate of the slope of the least squares line. A) For every 1 ton increase in precompression stress, we estimate the shear strength of the joint to increase by .987 ton. B) For a triplet test with a precompression stress of 1 ton, we estimate the shear strength of the joint to be .987 ton. C) For every .987 ton increase in precompression stress, we estimate the shear strength of the joint to increase by 1 ton. D) For a triplet test with a precompression stress of 0 tons, we estimate the shear strength of the joint to be 1.19 tons. 7) Civil engineers often use the straight line equation y = β 0 + β 1 x to model the relationship between the mean shear strength of masonry joints and precompression stress, x. To test this theory, a series of stress tests were performed on solid bricks arranged in triplets and joined with mortar. The precompression stress was varied for each triplet and the ultimate shear load just before failure (called the shear strength) was recorded. The stress results for n = 7 triplet tests is shown in the accompanying table followed by a SAS printout of the regression analysis. Triplet Test Shear Strength, y (tons) Precomp. Stress, x (tons) 1 2 3 4 5 6 7 1.00 2.18 2.24 2.41 2.59 2.82 3.06 0 .60 1.20 1.33 1.43 Page 100 1.75 1.75
^ Analysis of Variance Source DF Model Error C Total Root MSE Dep Mean C.V. 1 5 6 Sum of Mean Squares Square 2.39555 0.25094 2.64649 0.22403 2.32857 9.62073 2.39555 0.05019 F Value 47.732 Prob > F 0.0010 R square Adj R sq 0.9052 0.8862 Parameter Estimates Parameter Estimate 1.191930 0.987157 Standard Error 0.18503093 0.14288331 T for HO: Parameter=0 6.442 6.909 Variable INTERCEP X DF 1 1 Prob > T 0.0013 0.0010 Give a practical interpretation of the estimate of the y  intercept of the least squares line. A) For a triplet test with a precompression stress of 0 tons, we estimate the shear strength of the joint to be 1.19 tons. B) For every 1 ton increase in precompression stress, we estimate the shear strength of the joint to increase by .987 ton. C) There is no practical interpretation since a triplet test with a precompression stress of 0 tons is outside the range of the sample data. D) For a triplet test with a precompression stress of 0 tons, we estimate the shear strength of the joint to increase 1.19 tons. 8) Each year a nationally recognized publication conducts its ʺSurvey of Americaʹs Best Graduate and Professional Schools.ʺ An academic advisor wants to predict the typical starting salary of a graduate at a top business school using GMAT score of the school as a predictor variable. Total GMAT scores range from 200 to 800. A simple linear regression of SALARY versus GMAT using 25 data points shown below. _____________________________________________________________________ ^ ^ β 0 =  92040 β 1 = 228 s = 3213 R2 = .66 r = .81 df = 23 t = 6.67 Give a practical interpretation of β 0 =  92040. A) The value has no practical interpretation since a GMAT of 0 is nonsensical and outside the range of the sample data. B) We expect to predict SALARY to within 2(92040) = $184,080 of its true value using GMAT in a straight line model. C) We estimate SALARY to decrease $92,040 for every 1  point increase in GMAT. 
D) We estimate the base SALARY of graduates of a top business school to be -$92,040.
^ Page 101 9) Each year a nationally recognized publication conducts its ʺSurvey of Americaʹs Best Graduate and Professional Schools.ʺ An academic advisor wants to predict the typical starting salary of a graduate at a top business school using GMAT score of the school as a predictor variable. Total GMAT scores range from 200 to 800. A simple linear regression of SALARY versus GMAT using 25 data points shown below. _____________________________________________________________________ ^ ^ β 0 =  92040 β 1 = 228 s = 3213 R2 = .66 r = .81 df = 23 t = 6.67 Give a practical interpretation of β 1 = 228. A) We estimate SALARY to increase $228 for every 1  point increase in GMAT. B) We expect to predict SALARY to within 2(228) = $456 of its true value using GMAT in a straight  line model. C) We estimate GMAT to increase 228 points for every $1 increase in SALARY. D) The value has no practical interpretation since a GMAT of 0 is nonsensical and outside the range of the sample data. 10) A real estate magazine reported the results of a regression analysis designed to predict the price (y), measured in dollars, of residential properties recently sold in a northern Virginia subdivision. One independent variable used to predict sale price is GLA, gross living area (x), measured in square feet. Data for 157 properties were used to fit the model y = β 0 + β 1 x. The results of the simple linear regression are provided below. _____________________________________________________________________ y = 96,600 + 22.5x s = 6500 r2 = .77 t = 6.1 (for testing β 1 ) Interpret the estimate of β 0 , the y intercept of the line. A) There is no practical interpretation, since a gross living area of 0 is a nonsensical value. B) All residential properties in Virginia will sell for at least $96,600. C) About 95% of the observed sale prices fall within $96,600 of the least squares line. D) For every 1 sq ft. increase in GLA, we expect a propertyʹs sale price to increase $96,600.
^ ^ Page 102 SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question. 11) In a comprehensive road test on all new car models, one variable measured is the time it takes a car to accelerate from 0 to 60 miles per hour. To model acceleration time, a regression analysis is conducted on a random sample of 129 new cars. TIME60: y = Elapsed time (in seconds) from 0 mph to 60 mph MAX: x1 = Maximum speed attained (miles per hour) Initially, the simple linear model E(y) = β 0 + β 1 x1 was fit to the data. Computer printouts for the analysis are given below: UNWEIGHTED LEAST SQUARES LINEAR REGRESSION OF TIME60 PREDICTOR VARIABLES COEFFICIENT STD ERROR STUDENTʹS T P CONSTANT 18.7171 0.63708 29.38 0.0000 MAX 0.00491 0.0000  0.08365  17.05 R SQUARED ADJUSTED R  SQUARED SOURCE REGRESSION RESIDUAL TOTAL DF 1 127 128 0.6960 RESID. MEAN SQUARE (MSE) 0.6937 STANDARD DEVIATION MS 374.285 1.28695 F 290.83 P 0.0000 1.28695 1.13444 SS 374.285 163.443 537.728 CASES INCLUDED 129 MISSING CASES 0 Find and interpret the estimate b1 in the printout above. 12) In a study of feeding behavior, zoologists recorded the number of grunts of a warthog feeding by a lake in the 15 minute period following the addition of food. The data showing the weekly number of grunts and and the age of the warthog (in days) are listed below: Week 1 2 3 4 5 6 7 8 9 Number of Grunts 82 60 31 36 55 32 54 9 12 Age (days) 117 133 147 152 159 166 175 181 187 a. Write the equation of a straight  line model relating number of grunts (y) to age (x). b. Give the least squares prediction equation. ^ c. Give a practical interpretation of the value of β 0 , if possible. d. Give a practical interpretation of the value of β 1 , if possible.
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

13) Given the following least squares prediction equation, ŷ = -173 + 74x, we estimate y to _______ by _________ with each 1-unit increase in x.
A) increase, 74
B) decrease, 74
C) decrease, 173
D) increase, 173

14) Given the equation of a regression line is ŷ = 3x - 10, what is the best predicted value for y given x = 2?
A) -4
B) 16
^ C) 17 D)  5 15) Given the equation of a regression line is y =  4.5x 5.4, what is the best predicted value for y given x = 5.8? A)  31.50 B)  20.70 C) 20.70 D) 31.50 16) Use the regression equation to predict the value of y for x =  3.8. x y  5  3 4 1  1  2 0 2 3  4  10  8 9 1  2  6  1 3 6  8 A)  8.521 B)  7.417 C) 4.195 D)  0.001 17) Use the regression equation to predict the value of y for x = 0.5. x y  5  3 4 1  1  2 0 2 3  4 11 6  6  1 3 4 1  4  5 8 A)  0.184 B) 1.701 C)  1.506 D) 2.264 18) The data below are the final exam scores of 10 randomly selected statistics students and the number of hours they studied for the exam. What is the best predicted value for y given x = 6? Hours, x Scores, y A) 86 3 5 2 8 2 4 4 5 6 3 65 80 60 88 66 78 85 90 90 71 B) 85 C) 84 D) 87 19) The data below are the temperatures on randomly chosen days during a summer class and the number of absences on those days. What is the best predicted value for y given x = 97? Temperature, x Number of absences, y A) 13 72 85 91 90 88 98 75 100 80 3 7 10 10 8 15 4 15 5 B) 14 C) 15 D) 16 Page 104 20) The data below are the ages and systolic blood pressures (measured in millimeters of mercury) of 9 randomly selected adults. What is the best predicted value for y given x = 43? Age, x Pressure, y A) 124 38 41 45 48 51 53 57 61 65 116 120 123 131 142 145 148 150 152 B) 126 C) 122 D) 120 21) The data below are the number of absences and the final grades of 9 randomly selected students from a statistics class. What is the best predicted value for y given x = 17? . Number of absences, x Final grade, y A) 49 0 3 6 4 9 2 15 8 5 98 86 80 82 71 92 55 76 82 B) 50 C) 51 D) 48 22) In order for applicants to work for the foreign  service department, they must take a test in the language of the country where they plan to work. The data below show the relationship between the number of years that applicants have studied a particular language and the grades they received on the proficiency exam. 
What is the best predicted value for y given x = 3.5? Number of years, x Grades on test, y A) 70 3 4 4 5 3 6 2 7 3 61 68 75 82 73 90 58 93 72 B) 68 C) 66 D) 72 23) In an area of the Midwest, records were kept on the relationship between the rainfall (in inches) and the yield of wheat (bushels per acre). Which is the best predicted value for y given x = 9.1? Rain fall (in inches), x 10.5 8.8 13.4 12.5 18.8 10.3 7.0 15.6 16.0 Yield (bushels per acre), y 50.5 46.2 58.8 59.0 82.4 49.2 31.9 76.0 78.8 A) 44.1 B) 44.4 C) 43.9 D) 44.6 SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question. 24) A calculus instructor is interested in finding the strength of a relationship between the final exam grades of students enrolled in Calculus I and Calculus II at his college. The data (in percentages) are listed below. Calculus I 88 78 62 75 95 91 83 86 98 Calculus II 81 80 55 78 90 90 81 80 100 a) Graph a scatter diagram of the data. b) Find an equation of the regression line. c) Predict a Calculus II exam score for a student who receives an 80 in Calculus I. Page 105 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 25) In an area of the Midwest, records were kept on the relationship between the rainfall (in inches) and the yield of wheat (bushels per acre). The data for a 9 year period is as follows: Rain Fall x 13.1 11.4 16.0 15.1 21.4 12.9 9.6 18.2 18.6 Yield y 48.5 44.2 56.8 80.4 47.2 29.9 74.0 74.0 76.8 The equation of the line of least squares is given as y =  9.12 + 4.38x. How many bushels of wheat per acre can be predicted if it is expected that there will be 17 inches of rain? A) 65.34 B) 5.96 C) 61.18 D) 52.06 26) In an area of the Midwest, records were kept on the relationship between the rainfall (in inches) and the yield of wheat (bushels per acre). 
The data for a 9 year period is as follows: Rain Fall x 13.1 11.4 16.0 15.1 21.4 12.9 9.6 18.2 18.6 Yield y 48.5 44.2 56.8 80.4 47.2 29.9 74.0 74.0 76.8 The equation of the line of least squares is given as y =  9.12 + 4.38x. What would be the expected number of inches of rain if the yield is 60 bushels of wheat per acre? A) 15.78 B) 253.68 C) 11.62 D) 64.74 27) In an area of the Midwest, records were kept on the relationship between the rainfall (in inches) and the yield of wheat (bushels per acre). The data for a 9 year period is as follows: Rain Fall x 13.1 11.4 16.0 15.1 21.4 12.9 9.6 18.2 18.6 Yield y 48.5 44.2 56.8 80.4 47.2 29.9 74.0 74.0 76.8 The equation of the line of least squares is given as y =  9.12 + 4.38x. How many bushels of wheat per acre can be predicted if it is expected that there will be 30 inches of rain? A) Cannot be certain of the result because 30 inches of rain exceeds the observed data. B) 122.28 C) 140.52 D) 8.93 3 Compute the Sum of Squared Residuals MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) The regression line for the given data is y = 2.097x  0.552. x y 5 3 4 1 1 2 0 2 3 4  10  8 9 1  2  6  1 3 6  8
Determine the residual of a data point for which x = -4 and y = -8.
A) 0.94
B) -16.94
C) -8.94
D) 13.328

2) The regression line for the given data is ŷ = -1.885x + 0.758.
x: -5 -3 4 1 -1 -2 0 2 3 -4
y: 11 6 -6 -1 3 4 1 -4 -5 8
Determine the residual of a data point for which x = -3 and y = 6.
A) -0.413
B) 12.413
C) 6.413
D) 7.552

3) The regression line for the given data is ŷ = -0.206x + 2.097.
x: -5 -3 4 1 -1 -2 0 2 3 -4
y: 11 -6 8 -3 -2 1 5 -5 6 7
Determine the residual of a data point for which x = 1 and y = -3.
A) -4.891
B) -1.109
C) 1.891
D) -1.715

4) The regression line for the given data is ŷ = 5.044x + 56.11.
Hours, x: 3 5 2 8 2 4 4 5 6 3
Scores, y: 65 80 60 88 66 78 85 90 90 71
Determine the residual of a data point for which x = 5 and y = 90.
A) 8.67
B) 171.33
C) 81.33
D) -505.07

5) The regression line for the given data is ŷ = 0.449x - 30.27.
Temperature, x: 72 85 91 90 88 98 75 100 80
Number of absences, y: 3 7 10 10 8 15 4 15 5
Determine the residual of a data point for which x = 85 and y = 7.
A) -0.895
B) 14.895
C) 7.895
D) 112.127

6) The regression line for the given data is ŷ = 1.488x + 60.46.
Age, x: 38 41 45 48 51 53 57 61 65
Pressure, y: 116 120 123 131 142 145 148 150 152
Determine the residual of a data point for which x = 48 and y = 131.
A) -0.884
B) 262.884
C) 131.884
D) -207.388

7) The regression line for the given data is ŷ = -2.75x + 96.14.
Number of absences, x: 0 3 6 4 9 2 15 8 5
Final grade, y: 98 86 80 82 71 92 55 76 82
Determine the residual of a data point for which x = 0 and y = 98.
A) 1.86
B) 194.14
C) 96.14
D) 173.36

8) The regression line for the given data is ŷ = 3.53x + 37.92.
Miles traveled, x: 2 3 10 7 8 15 3 1 11
Sales, y: 31 33 78 62 65 61 48 55 120
Determine the residual of a data point for which x = 15 and y = 61.
A) -29.87
B) 151.87
C) 90.87
D) -238.25

9) The regression line for the given data is ŷ = 6.91x + 46.26.
Number of years, x: 3 4 4 5 3 6 2 7 3
Grades on test, y: 61 68 75 82 73 90 58 93 72
Determine the residual of a data point for which x = 5 and y = 82.
A) 1.19
B) 162.81
C) 80.81
D) -607.88

10) The regression line for the given data is ŷ = 4.379x + 4.267.
Rainfall (in inches), x: 10.5 8.8 13.4 12.5 18.8 10.3 7.0 15.6 16.0
Yield (bushels per acre), y: 50.5 46.2 58.8 59.0 82.4 49.2 31.9 76.0 78.8
Determine the residual of a data point for which x = 10.5 and y = 50.5.
A) 0.2535
B) 100.7465
C) 50.2465
D) -214.9065

11) Compute the sum of the squared residuals of the least-squares line for the given data.
x: -5 -3 4 1 -1 -2 0 2 3 -4
y: -10 -8 9 1 -2 -6 -1 3 6 -8
A) 7.624
B) 1.036
C) 2.097
D) 0

12) The data below are the final exam scores of 10 randomly selected statistics students and the number of hours they studied for the exam. Compute the sum of the squared residuals of the least-squares line for the given data.
Hours, x: 3 5 2 8 2 4 4 5 6 3
Scores, y: 65 80 60 88 66 78 85 90 90 71
A) 318.038
B) 804.062
C) 1122.1
D) 39.755

13) In an area of the Midwest, records were kept on the relationship between the rainfall (in inches) and the yield of wheat (bushels per acre). Compute the sum of the squared residuals of the least-squares line for the given data.
Rainfall (in inches), x: 10.5 8.8 13.4 12.5 18.8 10.3 7.0 15.6 16.0
Yield (bushels per acre), y: 50.5 46.2 58.8 59.0 82.4 49.2 31.9 76.0 78.8
A) 87.192
B) 2207.628
C) 4.379
D) 0

14) In a study of feeding behavior, zoologists recorded the number of grunts of a warthog feeding by a lake in a 15-minute time period following the addition of food. The data showing the weekly number of grunts and the age of the warthog (in days) are listed below.
Week: 1 2 3 4 5 6 7 8 9
Number of grunts: 90 68 39 44 63 40 62 17 20
Age (days): 125 141 155 160 167 174 183 189 195
Compute the sum of the squared residuals of the least-squares line for the given data.
A) 5533.53
B) 188.84
C) 74.39
D) 13.74

15) The data below are the ages and systolic blood pressures (measured in millimeters of mercury) of 9 randomly selected adults.
Age, x: 38 41 45 48 51 53 57 61 65
Pressure, y: 116 120 123 131 142 145 148 150 152
Compute the sum of the squared residuals of the least-squares line for the given data.
A) 123.63
B) 1.41
C) 1.99
D) 11.11

16) A calculus instructor is interested in the performance of his students from Calculus I who go on to Calculus II. Their final grades in each course (in percent) are given below.
Calculus I: 88 78 62 75 95 91 83 86 98
Calculus II: 81 80 55 78 90 90 81 80 100
Compute the sum of the squared residuals of the least-squares line for the given data.
A) 130.14
B) 30.85
C) 11.41
D) 1075.9

4.3 Diagnostics on the Least-Squares Regression Line
1 Compute and Interpret the Coefficient of Determination SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question. 1) Calculate the coefficient of determination, given that the linear correlation coefficient, r, is 0.837. What does this tell you about the explained variation and the unexplained variation of the data about the regression line? 2) Calculate the coefficient of determination, given that the linear correlation coefficient, r, is  0.625. What does this tell you about the explained variation and the unexplained variation of the data about the regression line? 3) Calculate the coefficient of determination, given that the linear correlation coefficient, r, is 1. What does this tell you about the explained variation and the unexplained variation of the data about the regression line? 4) In a study of feeding behavior, zoologists recorded the number of grunts of a warthog feeding by a lake in the 15 minute period following the addition of food. The data showing the weekly number of grunts and and the age of the warthog (in days) are listed below: Number of Grunts 92 70 41 46 65 42 64 19 22 Age (days) 127 143 157 162 169 176 185 191 197 Find and interpret the value of R2 . Page 110 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 5) In a comprehensive road test on all new car models, one variable measured is the time it takes a car to accelerate from 0 to 60 miles per hour. To model acceleration time, a regression analysis is conducted on a random sample of 129 new cars. TIME60: MAX: y = Elapsed time (in seconds) from 0 mph to 60 mph x1 = Maximum speed attained (miles per hour) Initially, the simple linear model E(y) = β 0 + β 1x1 was fit to the data. 
Computer printouts for the analysis are given below: UNWEIGHTED LEAST SQUARES LINEAR REGRESSION OF TIME60 PREDICTOR VARIABLES COEFFICIENT STD ERROR STUDENTʹS T P CONSTANT 18.7171 0.63708 29.38 0.0000 MAX 0.00491 0.0000  0.08365  17.05 R SQUARED ADJUSTED R  SQUARED SOURCE REGRESSION RESIDUAL TOTAL DF 1 127 128 0.6960 RESID. MEAN SQUARE (MSE) 0.6937 STANDARD DEVIATION MS 374.285 1.28695 F 290.83 P 0.0000 1.28695 1.13444 SS 374.285 163.443 537.728 CASES INCLUDED 129 MISSING CASES 0 Approximately what percentage of the sample variation in acceleration time can be explained by the simple linear model? A) 70% B) 0% C)  17% D) 8% Page 111 6) A manufacturer of boiler drums wants to use regression to predict the number of man  hours needed to erect drums in the future. The manufacturer collected a random sample of 35 boilers and measured the following two variables: MANHRS: y = Number of man  hours required to erect the drum PRESSURE: x1 = Boiler design pressure (pounds per square inch, i.e., psi) Initially, the simple linear model E(y) = β 1 + β 1x1 was fit to the data. A printout for the analysis appears below: UNWEIGHTED LEAST SQUARES LINEAR REGRESSION OF MANHRS PREDICTOR VARIABLES CONSTANT PRESSURE COEFFICIENT 1.88059 0.00321 0.4342 0.4176 STD ERROR 0.58380 0.00163 STUDENTʹS T 3.22 2.17 P 0.0028 0.0300 4.25460 2.06267 R SQUARED ADJUSTED R  SQUARED SOURCE REGRESSION RESIDUAL TOTAL DF 1 34 35 RESID. MEAN SQUARE (MSE) STANDARD DEVIATION MS 111.008 4.25160 F 5.19 P 0.0300 SS 111.008 144.656 255.665 Give a practical interpretation of the coefficient of determination, R2 . A) About 43% of the sample variation in number of man  hours can be explained by the simple linear model. B) y = 1.88 + 0.00321x will be correct 43% of the time. C) Man hours needed to erect drums will be assosicated with boiler design pressure 43% of the time. D) About 2.06% of the sample variation in number of man  hours can be explained by the simple linear model. 
7) Civil engineers often use the straight line equation E(y) = β 0 + β 1 x to model the relationship between the mean shear strength E(y) of masonry joints and precompression stress, x. To test this theory, a series of stress tests were performed on solid bricks arranged in triplets and joined with mortar. The precompression stress was varied for each triplet and the ultimate shear load just before failure (called the shear strength) was recorded. The stress results for n = 7 triplet tests is shown in the accompanying table followed by a SAS printout of the regression analysis. Triplet Test Shear Strength, y (tons) Precomp. Stress, x (tons) 1 2 3 4 5 6 7 1.00 2.18 2.24 2.41 2.59 2.82 3.06 0 .60 1.20 1.33 1.43 1.75 1.75
^ Analysis of Variance Source DF Sum of Mean Squares Square F Value Page 112 Prob > F Model Error C Total Root MSE Dep Mean C.V. 1 5 6 2.39555 0.25094 2.64649 0.22403 2.32857 9.62073 2.39555 0.05019 47.732 0.0010 R square Adj R sq 0.9052 0.8862 Parameter Estimates Parameter Estimate 1.191930 0.987157 Standard Error 0.18503093 0.14288331 T for HO: Parameter=0 6.442 6.909 Variable INTERCEP X DF 1 1 Prob > T 0.0013 0.0010 Give a practical interpretation of R2 , the coefficient of determination for the least squares model. A) About 91% of the total variation in the sample of y  values can be explained by (or attributed to) the linear relationship between shear strength and precompression stress. B) In repeated sampling, approximately 91% of all similarly constructed regression lines will accurately predict shear strength. C) We expect to predict the shear strength of a triplet test to within about .91 ton of its true value. D) We expect about 91% of the observed shear strength values to lie on the least squares line. 8) The dean of the Business School at a small Florida college wishes to determine whether the grade  point average (GPA) of a graduating student can be used to predict the graduateʹs starting salary. More specifically, the dean wants to know whether higher GPAʹs lead to higher starting salaries. Records for 23 of last yearʹs Business School graduates are selected at random, and data on GPA (x) and starting salary (y, in $thousands) for each graduate were used to fit the model E(y) = β 0 + β 1 x The results of the simple linear regression are provided below. ^ y = 4.25 + 2.75x, SSxy = 5.15, SSxx = 1.87 SSyy = 15.17, SSE = 1.0075 Range of the x values: 2.23  3.85 Range of the y values: 9.3  15.6 Calculate the value of R2 , the coefficient of determination. 
A) 0.934 B) 0.661 C) 0.872 D) 0.339 Page 113 9) Each year a nationally recognized publication conducts its ʺSurvey of Americaʹs Best Graduate and Professional Schools.ʺ An academic advisor wants to predict the typical starting salary of a graduate at a top business school using GMAT score of the school as a predictor variable. A simple linear regression of SALARY versus GMAT using 25 data points shown below. _____________________________________________________________________ b0 =  92040 b1 = 228 s = 3213 R2 = .66 r = .81 df = 23 t = 6.67 Give a practical interpretation of R2 = .66. A) 66% of the sample variation in SALARY can be explained by using GMAT in a straight  line model. B) 66% of the differences in SALARY are caused by differences in GMAT scores. C) We estimate SALARY to increase $.66 for every 1  point increase in GMAT. D) We can predict SALARY correctly 66% of the time using GMAT in a straight  line model. 10) A real estate magazine reported the results of a regression analysis designed to predict the price (y), measured in dollars, of residential properties recently sold in a northern Virginia subdivision. One independent variable used to predict sale price is GLA, gross living area (x), measured in square feet. Data for 157 properties were used to fit the model E(y) = β 0 + β 1 x. The results of the simple linear regression are provided below. _____________________________________________________________________ y = 96,600 + 22.5x s = 6500 R2 = .77 t = 6.1 (for testing β 1 ) Interpret the value of the coefficient of determination, R2 . A) 77% of the total variation in the sample sale prices can be attributed to the linear relationship between GLA (x) and (y). B) GLA (x) is linearlyrelated to sale price (y) 77% of the time. C) 77% of the observed sale prices (yʹs) will fall within 2 standard deviations of the least squares line. D) There is a moderately strong positive correlation between sale price (y) and GLA (x). Page 114 SHORT ANSWER. 
Write the word or phrase that best completes each statement or answers the question. 11) A company keeps extensive records on its new salespeople on the premise that sales should increase with experience. A random sample of seven new salespeople produced the data on experience and sales shown in the table. Months on Job 2 4 8 12 1 5 9 Monthly Sales y ($ thousands) 2.4 7.0 11.3 15.0 .8 3.7 12.0 Summary statistics yield SSxx = 94.8571, SSxy = 124.7571, SSyy = 176.5171, x = 5.8571, and y = 7.4571. Find and interpret the coefficient of determination. 12) To investigate the relationship between yield of potatoes, y, and level of fertilizer application, x, an experimenter divides a field into eight plots of equal size and applies differing amounts of fertilizer to each. The yield of potatoes (in pounds) and the fertilizer application (in pounds) are recorded for each plot. The data are as follows: x y 1 1.5 2 2.5 3 3.5 4 4.5 25 31 27 28 36 35 32 34 Summary statistics yield SSxx = 10.5, SSyy = 112, and SSxy = 25. Calculate the coefficient of determination. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 13) The coefficient of correlation between x and y is r = .59. Calculate the coefficient of determination R2 . A) .35 B) .59 C) .41 D) .65 SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question. 14) The coefficient of determination for a straight  line model relating selling price y to manufacturing cost x for a particular item is R2 = .83. Interpret this value. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 15) The measures the percentage of total variation in the response variable that is explained B) Coefficient of linear correlation D) Slope of the regression line by the least squares regression line. 
A) Coefficient of determination C) Sum of the residuals squared Page 115 16) If the coefficient of determination is close to 1, then A) The least squares regression line equation explains most of the variation in the response variable. B) The least squares regression line equation has no explanatory value. C) The sum of the square residuals is large compared to the total variation. D) The linear correlation coefficient is close to zero. 17) The coefficient of determination is the A) Square B) Square root of the linear correlation coefficient. C) Opposite D) Reciprocal Page 116 Ch. 4 Describing the Relation between Two Variables Answer Key 4.1 Scatter Diagrams and Correlation
1 Draw and Interpret Scatter Diagrams 1) 2) 3) Page 117 4) 5) 6) Page 118 7) 8) 9) Page 119 10) A 11) A 12) A 13) predictor variable: rainfall in inches; response variable: yield per acre 14) predictor variable: hours studying; response variable: grades on the test 15) A 2 Understand the Properties of the Linear Correlation Coefficient 1) A 2) A 3) A 4) A 5) A 6) A 7) A 8) There appears to be a positive linear correlation. 9) There appears to be a negative linear correlation. Page 120 10) There appears to be no linear correlation. 11) In general, there appears to be a relationship between the home runs and batting averages. As the number of home runs increased, the batting averages increased. 12) There appears to be a trend in the data. As the number of absences increases, the final grade decreases. 13) A Page 121 14) A 15) A 16) A 17) A 18) A 19) A 20) A 21) A 3 Compute and Interpret the Linear Correlation Coefficient 1) A 2) A 3) A 4) A 5) A 6) A 7) A 8) A 9) A 10) A 11) The correlation coefficient remains unchanged. 4.2 LeastSquares Regression
1 Find the Least-Squares Regression Line and Use the Line to Make Predictions
1) A 2) A 3) A 4) A 5) A 6) A 7) A 8) A 9) A 10) A
11) The regression lines are not necessarily the same.
12) $SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = .948622 - \frac{(3.642)^2}{14} = .00118171$
$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 295.54 - \frac{(3.642)(1{,}134)}{14} = .538$
$\bar{y} = \frac{\sum y}{n} = \frac{1{,}134}{14} = 81$
$\bar{x} = \frac{\sum x}{n} = \frac{3.642}{14} = .26014$
$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{.538}{.00118171} = 455.27$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 81 - 455.27(.26014) = -37.434$
The least squares equation is $\hat{y} = -37.434 + 455.27x$.
13) $\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{2{,}862.3375}{1{,}006.3773} = 2.4804$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 95.0625 - 2.4804(21.2675) = 42.3106$
The least squares prediction equation is $\hat{y} = 42.3106 + 2.4804x$.
14) A 15) A 16) A

2 Interpret the Slope and y-intercept of the Least-Squares Regression Line
1) A 2) A 3) A 4) A 5) A 6) A 7) A 8) A 9) A 10) A
11) $b_1 = -.08365$. For every 1 mile per hour increase in the maximum attained speed of a new car, we estimate the elapsed 0 to 60 acceleration time to decrease by .08365 seconds.
12) a. $E(y) = \beta_0 + \beta_1 x$
b. $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x = 170.24 - .8195x$
c. We would expect approximately 170 grunts after feeding a warthog that was just born. However, since the value 0 is outside the range of the original data set, this estimate is highly unreliable.
d. For each additional day, we estimate the number of grunts will decrease by .8195.
13) A 14) A 15) A 16) A 17) A 18) A 19) A 20) A 21) A 22) A 23) A
24) a) See graph below.
b) $\hat{y} = 1.044x - 5.990$
c) When $x = 80$, $\hat{y} = 78$.
25) A 26) A 27) A

3 Compute the Sum of Squared Residuals
1) A 2) A 3) A 4) A 5) A 6) A 7) A 8) A 9) A 10) A 11) A 12) A 13) A 14) A 15) A 16) A
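The least-squares, residual, and sum-of-squared-residuals answers above can be checked numerically. What follows is a minimal Python sketch, not part of the original answer key. The function names `fit_line`, `residual`, and `sum_squared_residuals` are hypothetical. The data are those of question 11 in part 3, under the assumption that the minus signs lost from the x-values belong where the regression line stated for the same data in question 1 (slope 2.097, intercept -0.552) implies.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # SSxx and SSxy in deviation form; algebraically equal to the
    # shortcut formulas sum(x^2) - (sum x)^2 / n and sum(xy) - (sum x)(sum y) / n.
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residual(b0, b1, x, y):
    """Residual = observed y minus predicted y."""
    return y - (b0 + b1 * x)


def sum_squared_residuals(xs, ys):
    """Sum of squared residuals about the least-squares line."""
    b0, b1 = fit_line(xs, ys)
    return sum(residual(b0, b1, x, y) ** 2 for x, y in zip(xs, ys))


# Assumed data (signs restored as described in the lead-in).
xs = [-5, -3, 4, 1, -1, -2, 0, 2, 3, -4]
ys = [-10, -8, 9, 1, -2, -6, -1, 3, 6, -8]

b0, b1 = fit_line(xs, ys)
ssr = sum_squared_residuals(xs, ys)
res = residual(b0, b1, -4, -8)

print(f"slope = {b1:.3f}, intercept = {b0:.3f}")
print(f"residual at (-4, -8) = {res:.2f}")
print(f"sum of squared residuals = {ssr:.3f}")
```

Under these assumptions the sketch gives a slope near 2.097, a residual near 0.94 at the point (-4, -8), and a sum of squared residuals near 7.624, which agrees with choice A in each of the corresponding questions.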
4.3 Diagnostics on the Least-Squares Regression Line
1 Compute and Interpret the Coefficient of Determination
1) The coefficient of determination, R², is 0.701. That is, 70.1% of the variation is explained and 29.9% of the variation is unexplained.
2) The coefficient of determination, R², is 0.391. That is, 39.1% of the variation is explained and 60.9% of the variation is unexplained.
3) The coefficient of determination, R², is 1. That is, 100% of the variation is explained and there is no variation that is unexplained.
4) R² = .627; approximately 62.7% of the variation in the number of grunts is explained by age.
5) A 6) A 7) A 8) A 9) A 10) A
11) R² = .9296; 92.96% of the variation in the sample monthly sales values about their mean can be explained by using months on the job in a linear model.
12) R² = 0.5315
13) A
14) The model explains 83% of the sample variation in selling price.
15) A 16) A 17) A
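The coefficient-of-determination answers above follow from three equivalent routes: squaring r, using SSxy² / (SSxx · SSyy), or using 1 - SSE / SSyy. The Python sketch below is an illustrative check, not part of the original key. The function names are hypothetical, and the inputs are the summary statistics quoted in questions 1, 2, 8, 11, and 12 of this part.

```python
def r2_from_r(r):
    """R^2 as the square of the linear correlation coefficient."""
    return r ** 2


def r2_from_sums(ss_xx, ss_yy, ss_xy):
    """R^2 from the summary sums of squares: SSxy^2 / (SSxx * SSyy)."""
    return ss_xy ** 2 / (ss_xx * ss_yy)


def r2_from_sse(sse, ss_yy):
    """R^2 as the explained share of total variation: 1 - SSE / SSyy."""
    return 1 - sse / ss_yy


# Questions 1 and 2: r = 0.837 and r = -0.625.
q1 = r2_from_r(0.837)
q2 = r2_from_r(-0.625)

# Question 8: SSyy = 15.17, SSE = 1.0075.
q8 = r2_from_sse(1.0075, 15.17)

# Question 11: SSxx = 94.8571, SSyy = 176.5171, SSxy = 124.7571.
q11 = r2_from_sums(94.8571, 176.5171, 124.7571)

# Question 12: SSxx = 10.5, SSyy = 112, SSxy = 25.
q12 = r2_from_sums(10.5, 112, 25)

for label, value in [("1", q1), ("2", q2), ("8", q8), ("11", q11), ("12", q12)]:
    print(f"question {label}: R^2 is about {value:.3f}")
```

The sign of r does not matter once it is squared, which is why question 2 yields a positive R² even though the correlation is negative.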
This note was uploaded on 06/06/2010 for the course EC 11 taught by Professor All during the Spring '10 term at UCLA.