ISyE 2028A, Summer 2010
Monday July 26
Sample Final Exam Solutions

1. A multiple regression model has
A. One independent variable.
B. Two dependent variables.
C. Two or more dependent variables.
D. Two or more independent variables.
E. One independent variable and one independent variable.
ANSWER: D

2. In multiple regression models, the error term is assumed to have:
A. a mean of 1.
B. a standard deviation of 1.
C. a variance of 0.
D. negative values.
E. a normal distribution.
ANSWER: E

3. The adjusted coefficient of multiple determination is adjusted for
A. The value of the error term
B. The number of dependent variables in the model
C. The number of parameters in the model
D. The number of outliers
E. The level of significance
ANSWER: C

4. In multiple regression analysis with n observations and k predictors (or equivalently k+1 parameters), inferences concerning a single parameter $\beta_i$ are based on the standardized variable $T = (\hat\beta_i - \beta_i)/S_{\hat\beta_i}$, which has a t-distribution with degrees of freedom equal to
A. $n-k+1$
B. $n-k$
C. $n-k-1$
D. $n+k-1$
E. $n+k+1$
ANSWER: C

5. A procedure used to estimate the regression parameters $\beta_1$ and $\beta_2$, and to find the least squares line which provides the best approximation for the relationship between the explanatory variable x and the response variable Y, is known as the
A. least squares method
B. best squares method
C. regression analysis method
D. coefficient of determination method
E. prediction analysis method
ANSWER: A

6. The principle of least squares results in values of $\hat\beta_0$ and $\hat\beta_1$ that minimize the sum of squared deviations between
A. the observed values of the explanatory variable x and the estimated values $\hat x$
B. the observed values of the response variable y and the estimated values $\hat y$
C. the observed values of the explanatory variable x and the response variable y
D. the observed values of the explanatory variable x and the response values $\hat y$
E. the estimated values of the explanatory variable x and the observed values of the response variable y
ANSWER: B

7. If $\sum x_i = 28$, $\sum y_i = 54$, $\sum x_i y_i = 156$, $\sum x_i^2 = 82$, and $n = 10$, then the least squares estimate of the slope coefficient $\beta_1$ of the true regression line $y = \beta_0 + \beta_1 x$ is
A. 3.60
B. 0.75
C. 1.33
D. 4.80
E. 1.68
ANSWER: C

8. The quantity $\epsilon$ in the simple linear regression model $Y = \beta_0 + \beta_1 x + \epsilon$ is a random variable, assumed to be normally distributed with $E(\epsilon) = 0$ and $V(\epsilon) = \sigma^2$. Based on 20 observations, if the residual sum of squares is 8, then the estimated standard deviation $\hat\sigma$ is
A. 2.500
B. 0.400
C. 0.667
D. 0.444
E. None of the above answers are correct.
ANSWER: C

9. In testing $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \ne 0$, using a sample of 20 observations, the rejection region for a .01 level of significance test is
A. $t \le -2.878$
B. $t \ge 2.878$
C. $-2.878 \le t \le 2.878$
D. either $t \ge 2.878$ or $t \le -2.878$
E. $t = 0$
ANSWER: D

10. Which of the following statements are not true?
A. The slope $\beta_1$ of the population regression line is the true average change in the independent variable x associated with a 1-unit increase in the dependent variable y.
B. The slope of the least squares line, $\hat\beta_1$, gives a point estimate of the slope $\beta_1$ of the population regression line.
C. Inferences about the slope $\beta_1$ of the population regression line are based on thinking of the slope $\hat\beta_1$ of the least squares line as a statistic and investigating its sampling distribution.
D. All of the above statements are true.
E. None of the above statements are true.
ANSWER: A

11. A data set consists of 20 pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_{20}, y_{20})$.
If each $x_i$ is replaced by $x_i$ 1 and if each $y_i$ is replaced by $y_i$ 2, then the sample correlation coefficient r
A. decreases by .05
B. decreases by .10
C. increases by .05
D. increases by .10
E. remains unchanged
ANSWER: E

12. A scatter plot, along with the least squares line, of x = rainfall volume ($m^3$) and y = runoff volume ($m^3$) for a particular location was given. The accompanying values were read from the plot.

x:   5   12   14   17   23   30   40   47   55   67   72   81   96  112  127
y:   4   10   13   15   15   25   27   46   38   46   53   70   82   99  100

[Figure: scatter plot with least squares line; horizontal axis: rainfall volume, vertical axis: runoff volume]

a. Does a scatter plot of the data support the use of the simple linear regression model?
b. Calculate point estimates of the slope and intercept of the population regression line.
c. Calculate a point estimate of the true average runoff volume when rainfall volume is 50.
d. Calculate a point estimate of the standard deviation $\sigma$.
e. What proportion of the observed variation in runoff volume can be attributed to the simple linear regression relationship between runoff and rainfall?

ANSWER:
a. Yes, the scatterplot shows a strong linear relationship between rainfall volume and runoff volume, thus it supports the use of the simple linear regression model.
b. $\bar x = 53.200$, $\bar y = 42.867$,
$S_{xx} = 63{,}040 - \frac{(798)^2}{15} = 20{,}586.4$, $S_{yy} = 41{,}999 - \frac{(643)^2}{15} = 14{,}435.7$, and $S_{xy} = 51{,}232 - \frac{(798)(643)}{15} = 17{,}024.4$.
$\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{17{,}024.4}{20{,}586.4} = .82697$ and $\hat\beta_0 = 42.867 - (.82697)(53.2) = -1.1278$.
c. $\hat\mu_{Y \cdot 50} = -1.1278 + .82697(50) = 40.2207$.
d. $SSE = S_{yy} - \hat\beta_1 S_{xy} = 14{,}435.7 - (.82697)(17{,}024.4) = 357.07$, so $s = \hat\sigma = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{357.07}{13}} = 5.24$.
e. $r^2 = 1 - \frac{SSE}{SST} = 1 - \frac{357.07}{14{,}435.7} = .9753$. So 97.53% of the observed variation in runoff volume can be attributed to the simple linear regression relationship between runoff and rainfall.

13. A study reports the results of a regression analysis based on n = 15 observations in which x = filter application temperature (°C) and y = % efficiency of BOD removal. Calculated quantities include $\sum x_i = 402$, $\sum x_i^2 = 11{,}098$, $s = 3.725$, and $\hat\beta_1 = 1.7035$.
a. Test at level .01 $H_0: \beta_1 = 1$, which states that the expected increase in % BOD removal is 1 when filter application temperature increases by 1°C, against the alternative $H_a: \beta_1 > 1$.
b. Compute a 99% CI for $\beta_1$, the expected increase in % BOD removal for a 1°C increase in filter application temperature.

ANSWER:
a. We reject $H_0$ if $t \ge t_{.01,13} = 2.650$.
With $S_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n} = 324.40$, $t = \frac{1.7035 - 1}{3.725/\sqrt{324.40}} = \frac{.7035}{.2068} = 3.40$. Since $3.40 \ge 2.650$, $H_0$ is rejected in favor of $H_a$.
b. $t_{.005,13} = 3.012$, so the C.I. is $1.7035 \pm \frac{(3.012)(3.725)}{\sqrt{324.40}} = 1.7035 \pm .6229 = (1.08, 2.33)$.

14. Toughness and fibrousness of asparagus are major determinants of quality. This was the focus of a study reported in "Post-Harvest Glyphosphate Application Reduces Toughening, Fiber Content, and Lignification of Stored Asparagus Spears" (J. of the Amer. Soc. of Horticultural Science, 1988: 569-572). The article reported the accompanying data (read from a graph) on x = shear force (kg) and y = percent fiber dry weight.

x:    46    48    55    57    60    72    81    85    94
y:  2.18  2.10  2.13  2.28  2.34  2.53  2.28  2.62  2.63

x:   109   121   132   137   148   149   184   185   187
y:  2.50  2.66  2.79  2.80  3.01  2.98  3.34  3.49  3.26

$n = 18$, $\sum x_i = 1950$, $\sum x_i^2 = 251{,}970$, $\sum y_i = 47.92$, $\sum y_i^2 = 130.6074$, $\sum x_i y_i = 5530.92$

a. Calculate the value of the sample correlation coefficient. Based on this value, how would you describe the nature of the relationship between the two variables?
b. If a first specimen has a larger value of shear force than does a second specimen, what tends to be true of percent dry fiber weight for the two specimens?
c. If shear force is expressed in pounds, what happens to the value of r? Why?
d. If the simple linear regression model were fit to this data, what proportion of observed variation in percent fiber dry weight could be explained by the model relationship?
e. Carry out a test at significance level .01 to decide whether there is a positive linear association between the two variables.

ANSWER:
a. $S_{xx} = 251{,}970 - \frac{(1950)^2}{18} = 40{,}720$, $S_{yy} = 130.6074 - \frac{(47.92)^2}{18} = 3.033711$, and $S_{xy} = 5530.92 - \frac{(1950)(47.92)}{18} = 339.586667$, so $r = \frac{339.586667}{\sqrt{40{,}720}\,\sqrt{3.033711}} = .9662$. There is a very strong positive correlation between the two variables.
b. Because the association between the variables is positive, the specimen with the larger shear force will tend to have a larger percent dry fiber weight.
c. Changing the units of measurement on either (or both) variables will have no effect on the calculated value of r, because any change in units will affect both the numerator and denominator of r by exactly the same multiplicative constant.
d. $r^2 = (.966)^2 = .933$
e. $H_0: \rho = 0$ vs $H_a: \rho > 0$. The test statistic is $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$; reject $H_0$ at level .01 if $t \ge t_{.01,16} = 2.583$. Here $t = \frac{.966\sqrt{16}}{\sqrt{1-.966^2}} = 14.94 \ge 2.583$, so $H_0$ should be rejected. The data indicate a positive linear relationship between the two variables.

15. A sample of 12 radon detectors of a certain type was selected, and each was exposed to 100 pCi/L of radon. The resulting readings were as follows:

104.3  89.6  89.9  95.6  95.2  90.0  98.8  103.7  98.3  106.4  102.0  91.1

a. Does this data suggest that the population mean reading under these conditions differs from 100? State and test the appropriate hypotheses using $\alpha = .05$.

ANSWER: $n = 12$, $\bar x = 97.075$, $s = 6.1095$
a. Parameter of interest: $\mu$ = true average reading of this type of radon detector when exposed to 100 pCi/L of radon.
Null hypothesis: $H_0: \mu = 100$, and alternative hypothesis: $H_a: \mu \ne 100$.
The test statistic value is $t = \frac{\bar x - \mu_0}{s/\sqrt{n}} = \frac{97.075 - 100}{6.1095/\sqrt{12}} = -1.6585$.
The critical region is either $t \ge 2.201$ or $t \le -2.201$.
Fail to reject $H_0$. The data do not indicate that these readings differ significantly from 100.

16. A study comparing different types of batteries showed that the average lifetimes of Duracell Alkaline AA batteries and Eveready Energizer Alkaline AA batteries were given as 4.5 hours and 4.2 hours, respectively. Suppose these are the population average lifetimes.
a.
Let $\bar X$ be the sample average lifetime of 150 Duracell batteries and $\bar Y$ be the sample average lifetime of 150 Eveready batteries. What is the mean value of $\bar X - \bar Y$ (i.e., where is the distribution of $\bar X - \bar Y$ centered)? How does your answer depend on the specified sample sizes?
b. Suppose the population standard deviations of lifetime are 1.8 hours for Duracell batteries and 2.0 hours for Eveready batteries. With the sample sizes given in part (a), what is the variance of the statistic $\bar X - \bar Y$, and what is its standard deviation?
c. For the sample sizes given in part (a), what is the approximate distribution curve of $\bar X - \bar Y$ (include a measurement scale on the horizontal axis)? Would the shape of the curve necessarily be the same for sample sizes of 10 batteries of each type? Explain.

ANSWER:
a. $E(\bar X - \bar Y) = E(\bar X) - E(\bar Y) = 4.5 - 4.2 = .3$, irrespective of sample sizes.
b. $V(\bar X - \bar Y) = V(\bar X) + V(\bar Y) = \frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n} = \frac{(1.8)^2}{150} + \frac{(2.0)^2}{150} = .0483$, and the standard deviation of $\bar X - \bar Y$ is $\sigma_{\bar X - \bar Y} = \sqrt{.0483} = .2197$.
c. A normal curve with mean and standard deviation as given in parts "a" and "b" (because $m = n = 150$, the CLT implies that both $\bar X$ and $\bar Y$ have approximately normal distributions, so $\bar X - \bar Y$ does also). The shape is not necessarily that of a normal curve when $m = n = 10$, because the CLT cannot be invoked. So if the two lifetime population distributions are not normal, the distribution of $\bar X - \bar Y$ will typically be quite complicated.

17. Suppose $\mu_1$ and $\mu_2$ are true mean stopping distances at 50 mph for cars of a certain type equipped with two different types of braking systems. Use the two-sample t test at significance level .01 to test $H_0: \mu_1 - \mu_2 = -10$ versus $H_a: \mu_1 - \mu_2 < -10$ for the following statistics: $m = 6$, $\bar x = 116$, $s_1 = 5.0$, $n = 6$, $\bar y = 129$, and $s_2 = 5.5$.

ANSWER: For the given hypotheses, with $\Delta_0 = -10$, the test statistic is
$t = \frac{(\bar x - \bar y) - \Delta_0}{\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}} = \frac{116 - 129 + 10}{\sqrt{\frac{5.0^2}{6} + \frac{5.5^2}{6}}} = \frac{-3.0}{3.035} = -.988$.
The number of d.f. is $\nu = \frac{(4.1667 + 5.0417)^2}{\frac{(4.1667)^2}{5} + \frac{(5.0417)^2}{5}} = 9.91$, so use d.f. = 9. We will reject $H_0$ if $t \le -t_{.01,9} = -2.821$; since $-.988 \ge -2.821$, we don't reject $H_0$.

Normal table:
Remark: The other half is not necessary. ...
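The two-sample t computation in Question 17 can be checked with a short script. This is only a sketch under stated assumptions: the function name `welch_t` and the constant `T_CRIT_01_DF9` are illustrative choices, not part of the exam, and the critical value 2.821 is hard-coded from a standard t table (upper-tail area .01, 9 d.f.) rather than computed.

```python
import math

def welch_t(xbar, ybar, s1, s2, m, n, delta0):
    """Two-sample t statistic and approximate d.f. for H0: mu1 - mu2 = delta0.

    Hypothetical helper for illustration; it does not assume equal variances.
    """
    v1 = s1 ** 2 / m          # estimated variance of the first sample mean
    v2 = s2 ** 2 / n          # estimated variance of the second sample mean
    t = (xbar - ybar - delta0) / math.sqrt(v1 + v2)
    nu = (v1 + v2) ** 2 / (v1 ** 2 / (m - 1) + v2 ** 2 / (n - 1))
    return t, nu

# Statistics as stated in Question 17.
t_stat, nu = welch_t(xbar=116, ybar=129, s1=5.0, s2=5.5, m=6, n=6, delta0=-10)
df = math.floor(nu)           # round the d.f. down, as in the worked answer

# Assumption: critical value read from a t table (alpha = .01, 9 d.f.).
T_CRIT_01_DF9 = 2.821
reject = t_stat <= -T_CRIT_01_DF9   # lower-tailed test

print(f"t = {t_stat:.3f}, nu = {nu:.2f}, df = {df}, reject H0: {reject}")
```

With the stated statistics this gives a t value of roughly -0.99 on 9 d.f., which is well inside the non-rejection region, so the conclusion does not hinge on the exact table value used.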
This note was uploaded on 09/02/2010 for the course ISYE 2028 taught by Professor Shim during the Summer '07 term at Georgia Institute of Technology.
