1997_Final_Exam_Solutions - CEE 304 - UNCERTAINTY ANALYSIS...


CEE 304 - UNCERTAINTY ANALYSIS IN ENGINEERING
1997 Final Exam
Friday, December 19, 1997

Exam is open notes and open-book. It lasts 150 minutes and there are 150 points. SHOW WORK!

1. (5 points) Give an example of a realistic physical situation and a corresponding random variable (of interest to CEEs or ABENs, but not a chain, floods, wind or rain) that should have a WEIBULL distribution because of the way the variable's value is physically determined. WHY?

2. (10 points) Ken Hover's students love to cast and break concrete cylinders. One year the average breaking strength for the class was 5,140 psi with a standard deviation of 2,840 psi. Assume the strengths have a GUMBEL distribution with those moments; what then is the strength that an engineer can be 98% sure that another cylinder would exceed? [I promised a Gumbel problem] Conceptually, is a Gumbel distribution a reasonable model for breaking strengths?

3. (15 points) Kate and Sam were trying to use a stratified sampling design to estimate the level of student interest in an ASCE student chapter project. They took the interest scores from 6 students and weighted them based upon the size of the groups each student represented. The estimator of student interest was:

SI = [4 I1 + 4 I2 + 2 I3 + 2 I4 + I5 + I6]/15

Assume for this analysis that there is NO difference between groups of students, and that the interest score for a randomly selected student is a number between 1 and 5, with an unknown mean μ_SI = 3.2 and a standard deviation σ_SI of 0.70. [Note: 4+4+2+2+1+1 = 14 < 15.]
(a) What is the mean square error of the estimator SI for the mean μ_SI?
(b) What are advantages and disadvantages of maximum likelihood estimators relative to method-of-moments estimators?

4. (6 points) The College of Engineering conducts end-of-semester surveys in all of its courses each semester.
The survey forms are distributed in classes by the instructor on a day the instructor chooses, either at the beginning or the end of class, and then are collected by a student selected by the professor. For what reasons might this procedure produce a biased description of student satisfaction with courses and instructors? [Can you give 3 good ones?]

5. (16 points) This year Boy Scouts sold Holiday Season wreaths at Pyramid Mall. Sales were slow: the scouts sold about 1 wreath every 20 minutes, on average.
a) Why might wreath sales during the afternoon be well described by a Poisson process?
b) Using a Poisson process model, what are the mean and variance of the number of sales in a 2-hour period?
c) What is the probability of selling 5 or more wreaths in a 1-hour shift?
d) On the last day the scouts have only 4 wreaths left. What are the mean and variance of the time it will take to sell all four? What parametric family of distributions describes that variable?

6. (15 points) Andy Goodwin and Prof. Stedinger went over the TA evaluations Andy received this semester. On question #18 on the 1st evaluation Andy obtained an average score of 4.66 (an estimator of μ1) with a sample size of 26. On the 2nd evaluation Andy received an average score of 4.48 on the same question with a sample size of 30. In both cases the sample standard deviation was about 0.34.
(a) Construct a 90% confidence interval for the true mean difference between the two scores, μ1 - μ2.
(b) What is the probability that the actual value of μ1 - μ2 is contained in the interval whose end points you calculated above?
(c) Test the hypothesis that there was no change in Andy's score on this question with α = 5% using the data given. What p-value do you obtain?
(d) Is this case, wherein individual scores have values {1, 2, 3, 4, 5} and the sample sizes are both about 30, a situation where a two-sample t test is justified, or should one use a nonparametric test? Explain WHY.

7. (8 points) The Ithaca Journal reported on 12/16/97 that quake tremors in the Mammoth Lakes region of California increased the concentrations of arsenic in water supplies from that area. A hydrologist wishes to test that hypothesis for a small stream in the region. He has 5 pre-quake water quality samples which have concentrations 4.1, 14.3, 5.6, 0.7, and 2.4. He obtained 3 post-quake observations spaced over two weeks. The values were 33.2, 12.8, 19.8.
For an appropriate non-parametric test:
(a) What are the appropriate null and alternative hypotheses?
(b) What is the rejection region for a test with a type I error of 5%?
(c) What is the p-value for this data set? What do you conclude?

8. (16 points) After two structural engineering firms merged, an internal review obtained the two firms' bids on 8 recent projects. Is there any significant difference between the firms' estimated costs?

Project        Firm W     Firm V     W - V
1               177.0      131.0      46.0
2             2,230.0    1,400.0     830.0
3               129.0       78.0      51.0
4                11.2       11.9      -0.7
5               338.0      298.0      40.0
6               266.0      170.0      96.0
7               612.0      656.0     -44.0
8               408.0      282.0     126.0
Average         521.4      378.4     143.0
Stand. Dev.     714.4      457.6     282.5

(a) What are the appropriate null and alternative hypotheses?
(b) For the sign test, what p-value do you obtain with this data? What is the rejection region for a sign test with a 10% type I error?
NOTE: If K ~ Bin(p=0.5, n=8) then Pr[K = 0, 1, 2, 3, 4, 5, ...] = 0.004; 0.031; 0.109; 0.219; 0.273; 0.219; 0.109
(c) For the Wilcoxon Signed-Rank test, what p-value do you obtain with this data? What is the rejection region for the Wilcoxon Signed-Rank test with α = 10%?

9.
(12 points) Using the data from problem #8:
(a) What is the rejection region for a t test with α = 5%? What p-value do you obtain with this data?
(b) How big would the difference between the means need to be to have a type II error < 10% when σ_D for the differences Di = Wi - Vi equals 300?
(c) How large a sample is needed to make the type II error < 10% when σ_D = 300 and μ_W - μ_V = 120?

10. (15 points) Consider the 12 observations

274, 437, 404, 1683, 561, 305, 39, 155, 166, 71, 149, 72

which are the number of pages in the first twelve books on the bookshelf next to my computer. [Webster's New World Dictionary (2nd ed.) 1980 has 1683 pages; The Elements of Style has 71 pages. CRC Standard Mathematics Tables had 561 pages. The mean was 154 pages and the standard deviation was 96.] Might the number of pages in my books follow a normal distribution?
(a) Plot these data on attached probability paper.
(b) From the plot what do you think about the normality of book lengths? WHY?
(c) For the probability-plot correlation test of normality, specify the approximate rejection region for an α = 5% test.
(d) Based on the kinds of books by my computer, what might be a reasonable probability distribution to describe their length? WHY?
(e) The r for the probability plot correlation test of the data with their nscores was 0.797. The r for the probability plot correlation test of the logarithms of the data with their nscores was 0.986. What would you conclude from those two results?

11. (32 pts) Due to global climate change, parts of the world may receive less rainfall. An environmental engineer is interested in predicting the impact on production of decreased rainfall.
Consider the data,

production, P:           214   415   170   220   kJ
seasonal rainfall, R:     23    42    17    26   inches

n = 21    R̄ = 32.0 inches    P̄ = 276 kJ    s_R = 7.1 inches    s_P = 74.0 kJ
Σ(R_i - R̄)(P_i - P̄) = 5779 (kJ*inches)

a) What are the least-squares estimates of α and β for the model P = α + β R + ε, where ε is the model error?
b) Compute an unbiased estimate of the variance of the model's residual error. What fraction of the observed variability in P does the model explain?
c) What is the standard deviation of b, the estimator of β?
d) What is the rejection region for a 5% test of whether or not β = 0? Why have you selected either a one-sided or two-sided test? What is the p-value for this case?
e) Using the values of the parameters that you obtained, what is a 90% prediction interval for P at a site with an annual rainfall rate of 35 inches?
f) Prof. Stedinger talked with engineers at the US Bureau of Reclamation about seasonal snowmelt runoff prediction models they developed with multiple regression. They used seasonal precipitation and measured snowpack-water-equivalent depths at some 25-35 sites in their regression model, and searched for the set of sites that effectively maximized the R² value they obtained. What is good or bad about that procedure?

Your exams should be over. I hope you enjoy the holiday season. Drive safely.
Jery Stedinger

CEE 304 - UNCERTAINTY ANALYSIS IN ENGINEERING
Solutions Final Exam 1997

1. Answers vary. A tall column or a long cable depends upon the strength of MANY segments, and the overall strength is that of the WEAKEST (minimum value) of all such elements. 5 pts

2. Gumbel distribution has σ² = 1.645/α², which yields α = 1.2826/2840 = 0.00045. 2 pts
Then μ = u + 0.5772/α yields u = 3862. 2 pts
The strength exceeded with 98% probability is the 2nd percentile. Hence x_0.02 = u - ln[-ln(0.02)]/α = 841.5. 3 pts
Gumbel is not a good choice. Lower tail goes off to minus infinity, when strengths must be bounded below by zero.
Gumbel is the limiting distribution for the LARGEST value from a sample with a distribution UNBOUNDED above; strength is often determined by the WEAKEST (smallest) link in a chain. Unnatural choice for this situation. 3 pts

3. (a) E[SI] = (4+4+2+2+1+1) μ_SI/15 = (14/15) μ_SI. 3 pts
Bias = -μ_SI/15 1 pt
Var[SI] = (16+16+4+4+1+1) σ_SI²/225 = 42 σ_SI²/225 = 0.187 σ_SI² 4 pts
MSE = (1/15)² μ_SI² + Var = (μ_SI/15)² + 0.187 σ_SI² = 0.137 3 pts
(b) In large samples MLEs should be more efficient and have strong theoretical motivation. They work well with discrete data. Unfortunately MLEs are often more difficult to compute than method-of-moments estimators. 4 pts

4. (i) Faculty pick the day and whether the questionnaire is distributed at the beginning or end of class; they are likely to pick a favorable day and time, perhaps after a good lecture, or when attendance is lower.
(ii) Only students who attend class have the opportunity to respond: undercoverage! Students who are really unhappy with the class may not be attending.
(iii) Actually filling out the questionnaire is voluntary: students can just leave, particularly if little time is allotted for the evaluation by the professor. Therefore students with strong feelings are more likely to respond: voluntary response or nonresponse bias. 6 pts

5. (a) Sales happen INDEPENDENTLY over time as different people arrive, the rate is constant over the period, and two do not happen at the same instant: these are the assumptions of a Poisson process. 3 pts
(b) λ = 3 per hour. For a Poisson distribution with ν = λt = 6, E[K] = Var[K] = ν = λt = 6. 4 pts
(c) Pr[K ≥ 5 | ν = λt = 3] = 1 - Pr[K ≤ 4 | ν = 3] = 1 - 0.815 (using table) = 0.185 4 pts
(d) Time to the 4th arrival has a Gamma distribution (with α = 4; β = λ) 1 pt
E[T] = α/β = 4/3 hours; Var[T] = α/β² = 4/9 hour² [note units] 4 pts

6. (a) CI is given by x̄1 - x̄2 ± z_0.05 * 0.34 * sqrt(1/26 + 1/30), which is 0.030 to 0.33, where z_0.05 = 1.645.
5 pts
(b) The true difference in the means either is or is not in this interval; we do not know which: p = 0 or 1. 2 pts
(c) Compute test statistic: Z = [x̄1 - x̄2] / [0.34*sqrt(1/26 + 1/30)]. For a two-sided test, reject Ho of no difference if |Z| > 1.960 = z_0.025 for α = 5%. Observe z = 1.98, so observe a p-value of about 2*(0.025) = 5%! 5 pts
(d) The range of test scores is constrained to a relatively narrow region, and the observations are scattered across that range. Moreover the sample sizes are reasonable for the central limit theorem to apply. Z test is very reasonable here! On the exam almost everyone used a t test with 54 degrees of freedom. There is no justification I can see for use of t over Z: the original data are nothing like normal, which is what must be assumed to justify the use of the t distribution with degrees of freedom n1+n2-2. A nonparametric Wilcoxon test would NOT work well with all of the ties: one can only obtain the integers 1-2-3-4-5; how do you assign distinct ranks to a set that looks like 1 1 1 1 1 2 2 2 2 3 3 3 3 3 ... 4 4 4 5 5 5 5? A "large-sample" Z test seems to be the way to go here. 3 pts

7. (a) Ho: F_X = F_Y where X are the pre-quake data and Y are the post-quake data. Ha: F_X stochastically less than F_Y. Alternatively Ho: median_X = median_Y versus Ha: median_X < median_Y 2 pts
(b) Let W = sum of the ranks of the smaller sample (post-quake data), where large observations have large ranks. Reject Ho if W ≥ 20 (from Table A.10). 3 pts
(c) Observe w = 20 -> p-value = 3.6% 3 pts

8. (a) Ho: F_W = F_V versus Ha: F_W ≠ F_V. Alternatively Ho: med_W = med_V versus Ha: med_W ≠ med_V 2 pts
(b) For the sign test observe 2 negative signs. K ~ Bin(p=0.5, n=8) => Pr[K ≤ 2] = 0.144. Hence p-value for a 2-sided test = 2*(0.004 + 0.031 + 0.109) = 2*(0.144) = 29%. 4 pts
Rejection region for a 10% test would be S = 0, 1 or 7, 8. 3 pts
(c) For the signed-rank test, summing the ranks of the negative values, when the smallest absolute value has rank 1, yields S- = 1 + 3 = 4 -> S+ = 8(9/2) - 4 = 32.
P-value = 2*(0.027) = 5.4% 4 pts
Rejection region for α = 10% is essentially [need to interpolate in Table A.9] S+ or S- at least 30 [two-sided test]. (Actual α = 11%; conservative choice is 31) 3 pts

9. (a) Compute T = sqrt(n)*(d-bar)/s_d where Di = Wi - Vi are the differences. For a 2-sided test with α = 5%, reject Ho: no difference for |T| > t_{0.025,7} = 2.365. 3 pts
Obtain t = sqrt(8)*143/282 = 1.43 -> p-value 2*(0.10) = 20%. 3 pts
(b) From Table A.13 for a two-sided test with α = 5%, β = 10% requires d ≈ 1.5, hence need a mean difference of 1.5(300) ≈ ±450. It can be plus or minus for a two-sided test. 3 pts
(c) d = 120/300 = 0.4, hence need to have df = 74 -> n = 75. 3 pts

In this data set, bids for different jobs are orders of magnitude different. Hence the differences do not capture the real relative difference in the bids. The t test does terribly because it is confused by the one or two very large differences. The Wilcoxon test does better, but still fails to recognize that a difference of 51 in numbers of about 100 is more serious than a difference of 44 in numbers that average 630, even though 44 and 51 are about the same size.

10. A probability plot can be obtained by plotting x_(i) as a function of Φ⁻¹(p_i), or equivalently using normal probability paper and plotting x_(i) as a function of p_i. MINITAB and DataDesk can do it; I used EXCEL, even though the plot is not as nice.

Rank   p_i = (i-3/8)/(n+1/4)   Φ⁻¹(p_i)   x_(i)
 1          0.051               -1.64        39
 2          0.133               -1.11        71
 3          0.214               -0.79        72
 4          0.296               -0.54       149
 5          0.378               -0.31       155
 6          0.459               -0.10       166
 7          0.541                0.10       274
 8          0.622                0.31       305
 9          0.704                0.54       404
10          0.786                0.79       437
11          0.867                1.11       561
12          0.949                1.64      1683
3 pts

[Figure: "Prob Plot of Data"; data values (vertical axis, 0 to 2000) plotted against Nscores (horizontal axis, -2.00 to 2.00)] 3 pts

(b) Does not look very straight -> likely not normal.
3 pts
(c) For n = 10 use r ≤ 0.92 and for n = 15 use r ≤ 0.94, hence r ≤ 0.93 is about right. 3 pts
(d) Should be non-negative and positively skewed. Something like Weibull or Gamma. The tail should not be too thick. Books do not get much bigger than twice my big dictionary. Maybe should truncate the distribution above some threshold? 3 pts
(e) Clearly a normal distribution does not fit. Result is very significant: p-value < 1%. The lognormal fit is good. 3 pts

11. REGRESSION !!!!
a) The least-squares estimates of α and β are a = 92.56 and b = 5.73. 3+3 pts
b) The unbiased estimate of the variance of ε is s_e² = 4020 = (63.4)². 3 pts
R² = fraction of variability in y explained = 0.303 = r² where r = Σ(R_i - R̄)(P_i - P̄)/[(n-1) s_x s_y] 3 pts
c) Var(b) = s_e²/[(n-1) s_x²] = 4.00 3 pts
d) Two-sided test, reject if |T| > t_{0.025,19} = 2.093 2 pts
From Var(b) = 4.00 = (2.00)²; t = 5.73/2.00 = 2.87 with df = 21-2 = 19. p-value about 2(0.005) = 1%. 3 pts
Choose a two-sided test because I am not willing to speculate whether or not more rain will increase production. Actually it would be very reasonable to argue that (to a point) more rain should increase production; if you make that case then a one-sided test is the proper choice. 2 pts
e) Using your fitted line, the natural estimate of the mean value of P for R = 35 is a + bR = 293. A 90 percent prediction interval for a future observation is

\[ a + b x \pm t_{0.05,19}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \]

which for n = 21, x = 35 and t_{0.05,19} = 1.729 yields 293 ± t_{0.05,19} (65.2), which is 181 to 406 (WHOW!). 5 pts
f) BAD idea. Adding any variable increases R² a little. It is only reasonable to add variables that make the model a better predictor. Variables appear to increase prediction accuracy if adjusted-R² increases because of a decrease in s_e.
It is also reasonable to require that new variables have coefficients statistically different from zero (or another default value): this is more demanding. Either of these last two choices can be defended statistically. More advanced analyses indicate the best choice is probably in between the two. 5 pts ...
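
The regression arithmetic in solution 11 a) through e) can be checked from the summary statistics alone. The following is a minimal Python sketch, not part of the original solutions: the function names `fit_from_summary` and `prediction_interval` are hypothetical, and the critical value 1.729 is copied from the solution rather than computed.

```python
import math


def fit_from_summary(n, x_bar, y_bar, s_x, s_y, sum_xy):
    """Simple linear regression y = a + b*x from summary statistics.

    sum_xy is the sum of cross-products, sum((x_i - x_bar)*(y_i - y_bar)).
    (Hypothetical helper written for this check.)
    """
    sxx = (n - 1) * s_x ** 2               # sum of squares of x about its mean
    b = sum_xy / sxx                       # least-squares slope
    a = y_bar - b * x_bar                  # least-squares intercept
    r = sum_xy / ((n - 1) * s_x * s_y)     # sample correlation
    sse = (n - 1) * s_y ** 2 * (1 - r ** 2)
    se2 = sse / (n - 2)                    # unbiased residual variance
    var_b = se2 / sxx                      # variance of the slope estimator
    return {
        "n": n, "x_bar": x_bar, "sxx": sxx,
        "a": a, "b": b, "r2": r ** 2,
        "se2": se2, "var_b": var_b,
        "t": b / math.sqrt(var_b),         # t statistic for H0: beta = 0
    }


def prediction_interval(fit, x0, t_crit):
    """Prediction interval for one future observation at x = x0."""
    y0 = fit["a"] + fit["b"] * x0
    se_pred = math.sqrt(
        fit["se2"] * (1 + 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"])
    )
    return y0 - t_crit * se_pred, y0 + t_crit * se_pred


# Summary statistics from problem 11: n, R-bar, P-bar, s_R, s_P, cross-product sum
fit = fit_from_summary(21, 32.0, 276.0, 7.1, 74.0, 5779.0)

# t_{0.05,19} = 1.729 is taken from the solution text (a table value)
lo, hi = prediction_interval(fit, 35.0, 1.729)

print(fit)
print(lo, hi)
```

With these inputs the slope, residual variance, t statistic and interval end points should agree with the solution's values (b = 5.73, s_e² = 4020, t = 2.87, 181 to 406) to within rounding.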

This note was uploaded on 02/02/2008 for the course CEE 3040 taught by Professor Stedinger during the Fall '08 term at Cornell University (Engineering School).
