hw12 - STAT 350 – Spring 2009 Homework#12 SOLUTION covers...


STAT 350 – Spring 2009
Homework #12 SOLUTION
covers through Lecture S

1. An ecologist is studying the trade-offs individuals make between investment in survival and growth and investment in reproduction. In particular, she is studying this relationship in a highly variable species of prairie grass. She has sampled 25 randomly selected individual plants across the range of this species. For each plant, she has measured seed mass and root volume. Let \(x_i\) denote the root volume (in mL) of the \(i\)th plant and \(y_i\) denote the seed mass (in mg) of the \(i\)th plant. She finds

\[ \sum_{i=1}^{25} x_i = 926, \quad \sum_{i=1}^{25} y_i = 1{,}928, \quad \sum_{i=1}^{25} x_i^2 = 51{,}860, \quad \sum_{i=1}^{25} y_i^2 = 153{,}997, \quad \text{and} \quad \sum_{i=1}^{25} x_i y_i = 63{,}677. \]

The root volumes ranged from 8.2 to 102.3 mL and the seed masses ranged from 44.7 to 102.5 mg. Warning: do not round off too much in intermediate calculations!

a. Calculate SSxy.

\[ SS_{xy} = 63677 - \frac{(926)(1928)}{25} = -7{,}736.12 \]

b. Calculate SSxx.

\[ SS_{xx} = 51860 - \frac{926^2}{25} = 17{,}560.96 \]

c. Calculate SSyy.

\[ SS_{yy} = 153997 - \frac{1928^2}{25} = 5309.64 \]

d. Calculate the standard deviation of root volume.

\[ s_x = \sqrt{\frac{SS_{xx}}{n-1}} = \sqrt{\frac{17560.96}{25-1}} = 27.05008 \]

e. Calculate the standard deviation of seed mass.

\[ s_y = \sqrt{\frac{SS_{yy}}{n-1}} = \sqrt{\frac{5309.64}{25-1}} = 14.87397 \]

f. Give a 95% CI for the true mean seed mass of individuals from this species.

95% CI for \(\mu_y\):

\[ \bar{y} \pm t_{crit,\,df=25-2}\left(\frac{s_y}{\sqrt{n}}\right) \;\Rightarrow\; \frac{1928}{25} \pm 2.069\left(\frac{14.87397}{\sqrt{25}}\right), \quad \text{with } \frac{1928}{25} = 77.12 \;\to\; (70.965,\ 83.275) \]

g. The biologist finds another individual of this species, but during a time of year when the plant is not producing seeds. Give a 95% prediction interval for the seed mass that would be produced by this individual.

\[ \bar{y} \pm t_{crit,\,df=25-2}\,(s_y)\sqrt{1+\frac{1}{n}} \;\Rightarrow\; 77.12 \pm 2.069\,(14.87397)\sqrt{1+\frac{1}{25}} \;\to\; (45.736,\ 108.504) \]

h. Find the Pearson correlation between the root volume and seed mass of this species.

\[ r = \frac{SS_{xy}}{\sqrt{SS_{xx}}\sqrt{SS_{yy}}} = \frac{-7736.12}{\sqrt{17560.96}\,\sqrt{5309.64}} = -0.801155 \]

i. Find the least squares regression equation to predict seed mass from root volume for this species (give a and b to at least 5 decimal places).

\[ b = \frac{SS_{xy}}{SS_{xx}} = \frac{-7736.12}{17560.96} = -0.4405294471 \]

\[ a = \bar{y} - b\bar{x} = \left(\frac{1928}{25}\right) - (-0.4405294471)\left(\frac{926}{25}\right) = 93.43721072, \quad \text{where } \bar{y} = 77.12 \text{ and } \bar{x} = 37.04 \]

Answer: \(\hat{y} = 93.43721 - 0.44053x\)

j. Create the ANOVA table for the regression analysis (use Excel to obtain the p-value for your table).

Total Sum of Squares (SSTo) = SSyy = 5309.64
Residual Sum of Squares (SSResid) = SSTo – b(SSxy) = 5309.64 – (-0.44053)(-7736.12) = 1901.647056
Regression Sum of Squares (SSReg) = SSTo – SSResid = 5309.64 – 1901.647056 = 3407.992944

ANOVA TABLE
Source              df    SS             MS             F          p
Regression           1    3407.992944    3407.992944    41.2189    1.5 × 10^-6
Residual (Error)    23    1901.647056    82.6803068
Total               24    5309.64

Note: the degrees of freedom for the F statistic are df1 = 1, df2 = 23.

k. What percent of the total variation in seed mass is explained by the linear relationship between root volume and seed mass?

This is asking for the coefficient of determination (\(R^2\)), but as a percentage instead of a proportion.

\[ R^2 = \frac{SSReg}{SSTo} = \frac{3407.992944}{5309.64} = 0.64185 \]

Alternative calculation: square the Pearson correlation from (h): \((-0.801155)^2 = 0.64185\)

Answer: 64.185%

l. Find the standard deviation of the least squares line (\(s_e\)).

\[ s_e = \sqrt{MSE} = \sqrt{82.6803068} = 9.09287 \]

m. Obtain a 95% CI for β (the true average increase in seed mass with a 1-mL increase in root volume).

\[ b \pm t_{crit,\,df=25-2}\; s_b, \quad \text{where } s_b = \frac{s_e}{\sqrt{SS_{xx}}} \]

\[ -0.44053 \pm 2.069\left(\frac{9.09287}{\sqrt{17560.96}}\right), \quad \text{with } s_b = 0.068616 \;\to\; (-0.58250,\ -0.29856) \]

n. Based on your answers to parts (j) through (m), what can you conclude about the relationship between root volume and seed mass in this species? Be sure to use proper statistical language and support your answer appropriately!

There is a statistically significant relationship between root volume and seed mass in this species. We estimate that seed mass declines by an average of 0.44 mg for every 1-mL increase in root volume. Conceptually, this would mean that there is an ecological trade-off between resources devoted to growth and survival (roots) and reproduction (seeds).

o. The biologist finds another individual of this species, but during a time of year when the plant is not producing seeds. However, she can measure the root volume and finds that it is 40 mL. Give a 95% prediction interval for the seed mass that would be produced by this individual.

Predicted seed mass: \(\hat{y} = 93.43721 - 0.44053(40) = 75.81601\) mg

\[ \hat{y} \pm \left(t_{crit,\,df=n-2}\right) s_e \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{SS_{xx}}} \]

\[ 75.81601 \pm 2.069\,(9.09287)\sqrt{1 + \frac{1}{25} + \frac{(40 - 37.04)^2}{17560.96}} \;\to\; (56.626,\ 95.006) \]

p. The biologist finds another individual of this species, during a time of year when the plant is not producing seeds. She measures the root volume and finds that it is 100 mL. Give a 95% prediction interval for the seed mass that would be produced by this individual.

Predicted seed mass: \(\hat{y} = 93.43721 - 0.44053(100) = 49.38421\) mg

\[ 49.38421 \pm 2.069\,(9.09287)\sqrt{1 + \frac{1}{25} + \frac{(100 - 37.04)^2}{17560.96}} \;\to\; (28.219,\ 70.550) \]

q. Compare the 3 prediction intervals obtained from parts (g), (o), and (p). Which is the narrowest? Which is the widest? Explain why this is.

Width in (g): (108.504 – 45.736) = 62.768
Width in (o): (95.006 – 56.626) = 38.380
Width in (p): (70.550 – 28.219) = 42.331

The narrowest interval was in (o), when we knew that x = 40.
The widest interval was in (g), when we had no knowledge about x.

The interval in (o) is narrower than the one in (p) because the value of x in (o) is closer to the mean of x than the value of x in (p). As long as x and y are correlated, a prediction interval for y should be narrower when we have a value for x. Remember that correlation between x and y means that if we know something about x, we should know something about the possible or likely values of y.

r. The 8th individual in the sample had root volume = 23.2 mL and seed mass = 74.5 mg. Give the residual for this observation.

Predicted seed mass: \(\hat{y} = 93.43721 - 0.44053(23.2) = 83.216914\)

The residual is the observed value of y minus the predicted value: 74.5 – 83.216914 = -8.716914

Note: the sign (+/-) is important. "8.716914" is WRONG!

2. For each of the following, determine whether the correlation is "statistically significant" (that is, test the null hypothesis ρ = 0 against the alternative that ρ ≠ 0). Be sure to give the value of the test statistic as well as the p-value.

a. \(r^2 = 0.01\), \(n = 400\)

Because the sample size is large, we can use z for the test statistic. Also, because we are doing a 2-tailed test, it doesn't matter whether r = 0.1 or -0.1; we will get the same p-value.

\[ z = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.1\sqrt{398}}{\sqrt{1-0.01}} = 2.00504 \]

p-value = 2(0.0222) = 0.0444

Although the linear relationship between x and y explains only 1% of the variation in y, there is a statistically significant linear relationship between x and y.

b. \(r = 0.90\), \(n = 4\)

\[ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.9\sqrt{2}}{\sqrt{1-0.81}} = 2.92 \]

With only 2 degrees of freedom, the p-value is 2(0.051) = 0.102. Thus we cannot conclude there is a statistically significant relationship between x and y, despite the apparently high correlation. This is due to the very small sample size, which gives a very imprecise estimate of ρ.

3. Jason is a sociology major. For his senior thesis, Jason randomly selected a number of residents from his hometown to survey. He asked each subject a range of demographic questions. Among the questions he asked were: "How many years of schooling have you had?" and "What is your annual income?" Limiting his sample to just those 30 subjects who were no longer in school (that is, who had completed their schooling), the number of years of schooling ranged from 9 to 22 years (mean 15.4 years) and the annual incomes ranged from $28,984 to $61,267.
Using these 30 subjects, he conducted a regression analysis to explore whether the amount of schooling affects income. The SAS output from this analysis is given below. Use only the SAS output below and the appendix tables from your textbook to answer the following questions.

The REG Procedure
Model: MODEL1
Dependent Variable: income

Number of Observations Read    30
Number of Observations Used    30

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1         297006095      297006095       7.05    0.0129
Error              28        1179094138       42110505
Corrected Total    29        1476100233

Root MSE          6489.26074    R-Square    0.2012
Dependent Mean         44790    Adj R-Sq    0.1727
Coeff Var           14.48804

Parameter Estimates
Variable        DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept        1                 30207        5617.60177       5.38      <.0001
years_school     1             946.97283         356.57432       2.66      0.0129

a. What percent of the variation in incomes is explained by the linear relationship between income and schooling?

20.12%

b. What is the correlation between income and years of schooling?

This is the positive square root of R-Square: \((0.2012)^{0.5} = 0.44855\). It is positive because the slope b is positive (946.97283).

c. Based on the above analysis, what is the income you would expect for an individual from this town who has had 17 years of schooling?

\(\hat{y}\) = 30207 + 946.97283(17) = $46,305.54

d. The 5th subject in this analysis had 17 years of schooling and has an annual income of $41,019. What is the value of the 5th residual?

41019 – 46305.54 = -5286.54

e. How much extra money should an individual in this town expect to earn for every additional year of school he or she has completed?

(i) Give a point estimate. $946.97
(ii) Give a 95% confidence interval. 946.97283 ± 2.048(356.57432) → (216.7086, 1677.2370)

f. Jason wants to determine if the relationship between years of schooling and annual income is "statistically significant".
(i) Give the value of the appropriate test statistic. t = 2.66 (equally acceptable is F = 7.05)
(ii) Give the degrees of freedom for that test statistic.
If the answer to (i) was t = 2.66 → answer: 28
If the answer to (i) was F = 7.05 → answer: df1 = 1, df2 = 28
(iii) Give the p-value. 0.0129
(iv) Based on this, is the relationship "statistically significant"? Just answer "yes" or "no". yes

g. Now assume that Jason wants to test the null hypothesis that years of schooling does not affect annual income (that is, average annual income does not change with an increase in the number of years of schooling) versus the alternative hypothesis that average annual income increases as the number of years of schooling increases.

Here the alternative hypothesis is ONE-TAILED! Above it was 2-tailed.

(i) Give the value of the appropriate test statistic. t = 2.66 (here F = 7.05 is NOT acceptable)
(ii) Give the degrees of freedom for that test statistic. 28
(iii) Give the p-value. 0.0129/2 = 0.00645

h. According to the Bureau of Labor Statistics, nationwide, average income increased $1750 for each additional year of schooling. Jason wants to compare his town to the national average. He will test the null hypothesis that the trend in his town is the same as the national average against the alternative that the trend in his town is different from the national average.

H0: β = 1750, Ha: β ≠ 1750

(i) Give the value of the appropriate test statistic.

\[ t = \frac{946.97283 - 1750}{356.57432} = -2.25206 \]

(ii) Give the degrees of freedom for that test statistic. 28
(iii) Give the p-value. From appendix Table VI: (0.015)×2 = 0.030

i. Give a 95% confidence interval for the true mean income of all residents of this town (who have completed their schooling).

\[ s_y = \sqrt{\frac{SS_{yy}}{n-1}} = \sqrt{\frac{SS(\text{Total})}{n-1}} = \sqrt{\frac{1476100233}{30-1}} = 7134.42156 \]

\[ \bar{y} \pm t_{df=n-1=29}\,\frac{s_y}{\sqrt{n}} \;\Rightarrow\; 44790 \pm (2.045)\,\frac{7134.42156}{\sqrt{30}} \;\to\; (42126.26,\ 47453.74) \]

j. Based solely on the preceding analysis, would it be appropriate for Jason to conclude that additional schooling causes increased income? Justify your answer.

No! We cannot conclude anything about causation because the study was observational, not experimental!
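
The hand calculations in Problem 1 can be cross-checked with a short script. The following is a minimal Python sketch, not part of the original solution: the variable names and the `prediction_interval` helper are illustrative, and the critical value 2.069 is simply copied from the solution rather than computed. Because the script carries the unrounded slope through every step, SSResid and the quantities derived from it can differ from the hand-worked figures in the later decimal places.

```python
import math

# Summary statistics given in Problem 1 (n = 25 plants).
n = 25
sum_x, sum_y = 926.0, 1928.0
sum_x2, sum_y2, sum_xy = 51860.0, 153997.0, 63677.0
T_CRIT = 2.069  # t critical value for df = 23, copied from the solution

# Corrected sums of squares and cross-products (parts a-c).
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n
ss_yy = sum_y2 - sum_y ** 2 / n

# Means, correlation, and least squares coefficients (parts h-i).
x_bar, y_bar = sum_x / n, sum_y / n
r = ss_xy / math.sqrt(ss_xx * ss_yy)
b = ss_xy / ss_xx
a = y_bar - b * x_bar

# ANOVA quantities and standard errors (parts j-m).
ss_resid = ss_yy - b * ss_xy
ss_reg = ss_yy - ss_resid
mse = ss_resid / (n - 2)
f_stat = ss_reg / mse
s_e = math.sqrt(mse)
s_b = s_e / math.sqrt(ss_xx)
slope_ci = (b - T_CRIT * s_b, b + T_CRIT * s_b)


def prediction_interval(x_star):
    """95% prediction interval for seed mass at root volume x_star (parts o-p)."""
    y_hat = a + b * x_star
    half_width = T_CRIT * s_e * math.sqrt(
        1 + 1 / n + (x_star - x_bar) ** 2 / ss_xx
    )
    return y_hat - half_width, y_hat + half_width


if __name__ == "__main__":
    print("SSxy, SSxx, SSyy:", ss_xy, ss_xx, ss_yy)
    print("r, b, a:", r, b, a)
    print("F, s_e, s_b:", f_stat, s_e, s_b)
    print("95% CI for slope:", slope_ci)
    print("PI at x = 40:", prediction_interval(40))
    print("PI at x = 100:", prediction_interval(100))
```

Calling `prediction_interval` at values of x farther from the sample mean (37.04 mL) shows the interval widening, which is the behavior discussed in part (q).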

This note was uploaded on 02/16/2010 for the course STAT 350 taught by Professor Staff during the Spring '08 term at Purdue.
