CEE 304  UNCERTAINTY ANALYSIS IN ENGINEERING
1997 Final Exam
Friday, December 19, 1997

Exam is open notes and open book. It lasts 150 minutes and there are 150 points.
SHOW WORK!

1. (5 points) Give an example of a realistic physical situation and a corresponding
random variable (of interest to CEE's or ABEN's; but not a chain, floods, wind or
rain) that should have a WEIBULL distribution because of the way the variable's value is physically determined. WHY?

2. (10 points) Ken Hover's students love to cast and break concrete cylinders.
One year the average breaking strength for the class was 5,140 psi with a standard
deviation of 2,840 psi. Assume the strengths have a GUMBEL distribution with
those moments; what then is the strength that an engineer can be 98% sure that
another cylinder would exceed? [I promised a Gumbel problem]
Conceptually, is a Gumbel distribution a reasonable model for breaking strengths?

3. (15 points) Kate and Sam were trying to use a stratified sampling design to
estimate the level of student interest in an ASCE student chapter project. They
took the interest scores from 6 students and weighted them based upon the size of the groups each student represented. The estimator of student interest was:

SI = [4 I_1 + 4 I_2 + 2 I_3 + 2 I_4 + I_5 + I_6]/15

Assume for this analysis that there is NO difference between groups of students,
and the interest score for a randomly selected student is a number between 1 and 5, with an unknown mean μ_SI = 3.2, and a standard deviation σ_SI of 0.70.
[Note: 4+4+2+2+1+1 = 14 < 15.]
(a) What is the mean square error of the estimator SI for the mean μ_SI?
(b) What are advantages and disadvantages of maximum likelihood
estimators relative to method-of-moments estimators?

4. (6 points) The College of Engineering conducts end-of-semester surveys in all
of its courses each semester. The survey forms are distributed in classes by the
instructor on a day the instructor chooses, either at the beginning or the end of class, and then are collected by a student selected by the professor. For what
reasons might this procedure produce a biased description of student satisfaction with courses and instructors? [Can you give 3 good ones?]

5. (16 points) This year Boy Scouts sold Holiday Season wreaths at Pyramid Mall. Sales were slow: the scouts sold about 1 wreath every 20 minutes, on average.
a) Why might wreath sales during the afternoon be well described by a
Poisson process?
b) Using a Poisson process model, what are the mean and variance of the
number of sales in a 2-hour period?
c) What is the probability of selling 5 or more wreaths in a 1-hour shift?
d) On the last day the scouts have only 4 wreaths left. What are the mean and
variance of the time it will take to sell all four? What parametric family of
distributions describes that variable?

6. (15 points) Andy Goodwin and Prof. Stedinger went over the TA evaluations
Andy received this semester. On question #18 on the 1st evaluation Andy
obtained an average score of 4.66 (an estimator of μ_1) with a sample size of 26.
On the 2nd evaluation Andy received an average score of 4.48 on the same
question with a sample size of 30. In both cases the sample standard deviation
was about 0.34.
(a) Construct a 90% confidence interval for the true mean difference between
the two scores, μ_1 - μ_2.
(b) What is the probability that the actual value of μ_1 - μ_2 is contained in the interval
whose end points you calculated above?
(c) Test the hypothesis that there was no change in Andy's score on this
question with an α = 5% using the data given. What p-value do you obtain?
(d) Is this case, wherein individual scores have values {1, 2, 3, 4, 5} and the
sample sizes are both about 30, a situation where a two-sample t test is justified,
or should one use a nonparametric test? Explain WHY.

7. (8 points) The Ithaca Journal reported on 12/16/97 that quake tremors in the
Mammoth Lakes region of California increased the concentrations of arsenic in
water supplies from that area. A hydrologist wishes to test that hypothesis for a
small stream in the region. He has 5 pre-quake water quality samples which
have concentrations
4.1, 14.3, 5.6, 0.7, and 2.4.
He obtained 3 post-quake observations spaced over two weeks. The values were
33.2, 12.8, 19.8
For an appropriate nonparametric test:
(a) What are the appropriate null and alternative hypotheses?
(b) What is the rejection region for a test with a type I error of 5%?
(c) What is the p-value for this data set? What do you conclude?

8. (16 points) After two structural engineering firms merged, an internal review
obtained the two firms’ bids on 8 recent projects. Is there any significant
difference between the firms' estimated costs?

Project        Firm W     Firm V     W - V
1               177.0      131.0      46.0
2             2,230.0    1,400.0     830.0
3               129.0       78.0      51.0
4                11.2       11.9      -0.7
5               338.0      298.0      40.0
6               266.0      170.0      96.0
7               612.0      656.0     -44.0
8               408.0      282.0     126.0
Average         521.4      378.4     143.0
Stand. Dev.     714.4      457.6     282.5

(a) What are the appropriate null and alternative hypotheses?
(b) For the sign test, what p-value do you obtain with this data?
What is the rejection region for a sign test with a 10% type I error?
NOTE: IF K ~ Bin(p=0.5, n=8) then
Pr[K = 0, 1, 2, 3, 4, 5, ...] = 0.004; 0.031; 0.109; 0.219; 0.273; 0.219; 0.109
(c) For the Wilcoxon Signed-Rank test, what p-value do you obtain with this data?
What is the rejection region for a Wilcoxon Signed-Rank test with α = 10%?

9. (12 points) Using the data from problem #8:
(a) What is the rejection region for a t test with α = 5%?
What p-value do you obtain with this data?
(b) How big would the difference between the means need to be to have a type II error < 10% if σ_D for the differences D_i = W_i - V_i equals 300?
(c) How large a sample is needed to make
the type II error < 10% when σ_D = 300 and μ_W - μ_V = 120?

10. (15 points) Consider the 12 observations
274, 437, 404, 1683, 561, 305, 39, 155, 166, 71, 149, 72
which are the number of pages in the first twelve books on the bookshelf next to
my computer. [Webster's New World Dictionary (2nd ed.) 1980 has 1683 pages;
The Elements of Style has 71 pages. CRC Standard Mathematics Tables had 561
pages. The mean was 154 pages and the standard deviation was 96.]
Might the number of pages in my books follow a normal distribution?
(a) Plot these data on attached probability paper.
(b) From the plot what do you think about the normality of book lengths? WHY?
(c) For the probability-plot correlation test of normality,
specify the approximate rejection region for an α = 5% test.
(d) Based on the kinds of books by my computer, what might be
a reasonable probability distribution to describe their length? WHY?
(e) The r for the probability plot correlation test of the data with their n-scores was
0.797. The r for the probability plot correlation test of the logarithms of the data
with their n-scores was 0.986. What would you conclude from those two results?

11. (32 pts) Due to global climate change, parts of the world may receive less
rainfall. An environmental engineer is interested in predicting the impact on
production of decreased rainfall. Consider the data,

production, P:           214   415   170   220   kJ
seasonal rainfall, R:     23    42    17    26   inches

n = 21
Rbar = 32.0 inches     Pbar = 276 kJ
s_R = 7.1 inches       s_P = 74.0 kJ
Σ (R_i - Rbar)(P_i - Pbar) = 5779 (kJ*inches)

a) What are the least-squares estimates of α and β for the model
P = α + β R + ε where ε is the model error?
b) Compute an unbiased estimate of the variance of the model's residual error.
What fraction of the observed variability in P does the model explain?
c) What is the standard deviation of b, the estimator of β?
d) What is the rejection region for a 5% test of whether or not β = 0?
Why have you selected either a one-sided or two-sided test?
What is the p-value for this case?
e) Using the values of the parameter that you obtained, what is a 90% prediction
interval for P at a site with an annual rainfall rate of 35 inches?
f) Prof. Stedinger talked with engineers at the US Bureau of Reclamation about
seasonal snowmelt runoff prediction models they developed with multiple
regression. They used seasonal precipitation and measured snowpack water
equivalent depths at some 25-35 sites in their regression model, and searched for the set of sites that effectively maximized the R^2 value they obtained. What is
good or bad about that procedure?

Your exams should be over. I hope you enjoy the holiday season.
Drive safely.
Jery Stedinger

CEE 304  UNCERTAINTY ANALYSIS IN ENGINEERING
Solutions Final Exam 1997

1. Answers vary. A tall column or a long cable depends upon the strength of
MANY segments, and the overall strength is that of the WEAKEST (minimum value) of all such elements. 5 pts.
2. Gumbel distribution has σ^2 = 1.645/α^2, which yields α = 0.00045   2 pts
then μ = u + 0.5772/α yields u = 3862   2 pts
Hence x_0.02 = u - ln[-ln(0.02)]/α = 841.5   3 pts

Gumbel is not a good choice. Lower tail goes off to negative infinity, when strengths must
be bounded below by zero. Gumbel is limiting distribution for LARGEST from a
sample with a distribution UNBOUNDED above; strength often determined by
WEAKEST (smallest) link in chain. Unnatural choice for this situation. 3 pts

3. (a) E[SI] = (4+4+2+2+1+1) μ_SI/15 = (14/15) μ_SI.   3 pts
Bias = -μ_SI/15   1 pt
Var[SI] = (16+16+4+4+1+1) σ_SI^2/225 = 42 σ_SI^2/225 = 0.187 σ_SI^2   4 pts
MSE = (1/15)^2 μ_SI^2 + Var = (μ_SI/15)^2 + 0.187 σ_SI^2 = 0.137   3 pts

(b) In large samples MLEs should be more efficient and have strong theoretical
motivation. They work well with discrete data. Unfortunately MLEs are often
more difficult to compute than method-of-moments estimators. 4 pts

4. (i) Faculty pick the day and whether the questionnaire is distributed at the beginning or
end of class; they are likely to pick a favorable day and time, perhaps after a
good lecture, or when attendance is lower. (ii) Only students who attend class
have the opportunity to respond: undercoverage! Students who are really unhappy
with the class may not be attending. (iii) Actually filling out the questionnaire is
voluntary; students can just leave, particularly if little time is allotted for the
evaluation by the professor. Therefore students with strong feelings are more likely
to respond: voluntary response or nonresponse bias. 6 pts

5. (a) Sales happen INDEPENDENTLY over time as different people arrive, the rate is
constant over the period, and two do not happen at the same instant: the assumptions of a Poisson process. 3 pts
(b) λ = 3 per hour. For a Poisson distribution with ν = λt = 6,
E[K] = Var[K] = ν = λt = 6.   4 pts
(c) Pr[K ≥ 5 | ν = λt = 3] = 1 - Pr[K ≤ 4 | ν = 3] = 1 - 0.815 (using table) = 0.185   4 pts
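
The table lookups in 5(b) and 5(c) can be reproduced with a short Python sketch. The helper name poisson_cdf and the variable names are illustrative only; the rate of 3 sales per hour is taken from the stated 1 wreath every 20 minutes.

```python
import math

def poisson_cdf(k, nu):
    """Pr[K <= k] for a Poisson count K with mean nu."""
    return sum(math.exp(-nu) * nu**j / math.factorial(j) for j in range(k + 1))

rate_per_hour = 3.0  # 1 sale every 20 minutes (from the problem statement)

# 5(b): for a Poisson count, mean = variance = rate * time
mean_2h = rate_per_hour * 2.0
var_2h = mean_2h

# 5(c): Pr[K >= 5] in a 1-hour shift = 1 - Pr[K <= 4]
p_5_or_more = 1.0 - poisson_cdf(4, rate_per_hour * 1.0)

print(mean_2h, var_2h, round(p_5_or_more, 3))
```

Summing the probability mass function directly avoids any table and should agree with the tabulated 0.815 to three decimals.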
(d) Time to the 4th arrival has a Gamma distribution (with α = 4; β = λ)   1 pt
E[T] = α/β = 4/3 hours; Var[T] = α/β^2 = 4/9 hour^2 [note units]   4 pts

6. (a) CI is given by xbar_1 - xbar_2 ± z_0.05*0.34*sqrt(1/26 + 1/30) = 0.030 to 0.33, where z_0.05 = 1.645.   5 pts
(b) The true difference in the means either is or is not in this interval;
we do not know which: p = 0 or 1.   2 pts
(c) Compute test statistic: Z = [xbar_1 - xbar_2] / [0.34*sqrt(1/26 + 1/30)]. For a two-sided test, reject Ho of no difference if |Z| > 1.960 = z_0.025 for α = 5%. Observe z = 1.98, so observe a p-value of 2*(0.025) = 5%!   5 pts
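
The interval in 6(a) and the test statistic in 6(c) can be checked numerically. This is a minimal sketch that assumes, as the solution does, a common standard deviation of 0.34 and a large-sample normal approximation; the variable names are illustrative.

```python
import math
from statistics import NormalDist

xbar1, n1 = 4.66, 26   # 1st evaluation
xbar2, n2 = 4.48, 30   # 2nd evaluation
s = 0.34               # sample standard deviation, both evaluations

diff = xbar1 - xbar2
se = s * math.sqrt(1 / n1 + 1 / n2)   # standard error of the difference

# 6(a): 90% interval uses z_0.05, the upper 5% point of the standard normal
z_crit = NormalDist().inv_cdf(0.95)
ci_low, ci_high = diff - z_crit * se, diff + z_crit * se

# 6(c): two-sided Z test of no difference
z = diff / se
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(ci_low, 3), round(ci_high, 3), round(z, 2), round(p_two_sided, 3))
```

The exact normal tail gives a p-value slightly under 5%, consistent with the rounded 2*(0.025) quoted in (c).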
(d) The range of test scores is constrained to a relatively narrow region, and the
observations are scattered across that range. Moreover the sample sizes are
reasonable for the central limit theorem to apply. Z test is very reasonable here!

On the exam almost everyone used a t test with 54 degrees of freedom.
There is no justification I can see for use of t over Z: the original data are
nothing like normal, which is what must be assumed to justify the use of the t
distribution with degrees of freedom n1+n2-2.

A nonparametric Wilcoxon test would NOT work well with all of the ties: one can only obtain the integers 1, 2, 3, 4, 5; how do you assign distinct ranks to a
set that looks like 1 1 1 1 1 2 2 2 2 3 3 3 3 3 ... 4 4 4 5 5 5 5?
A "large-sample" Z test seems to be the way to go here.   3 pts

7. (a) Ho: F_X = F_Y where X are pre-quake data, and Y are the post-quake data.
Ha: F_X stochastically less than F_Y.
Alternatively Ho: median_X = median_Y versus Ha: median_X < median_Y   2 pts
(b) Let W = sum of ranks of smaller sample (post-quake data), where large
observations have large ranks. Reject Ho if W ≥ 20 (from Table A.10).   3 pts
(c) Observe w = 20 → p-value = 3.6%   3 pts

8. (a) Ho: F_W = F_V versus Ha: F_W ≠ F_V.
Alternatively Ho: med_W = med_V versus Ha: med_W ≠ med_V   2 pts
(b) For sign test observe 2 negative signs. K ~ Bin(p=0.5, n=8) => Pr[K ≤ 2] = 0.144. Hence p-value 2-sided test = 2*(0.004 + 0.031 + 0.109) = 2*(0.144) = 29%.   4 pts
Rejection region for 10% test would be S = 0, 1 or 7, 8.   3 pts
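
Both the rank-sum result in solution 7 and the sign-test p-value in 8(b) can be checked by direct enumeration, without Tables A.10 or the binomial note. The sketch below is one way to do it in Python; the variable names are illustrative, and the enumeration relies on there being no ties in the pooled arsenic data.

```python
from itertools import combinations
from math import comb

# Problem 7: rank-sum statistic W = sum of the post-quake ranks
pre_quake = [4.1, 14.3, 5.6, 0.7, 2.4]
post_quake = [33.2, 12.8, 19.8]
pooled = sorted(pre_quake + post_quake)
w_obs = sum(pooled.index(v) + 1 for v in post_quake)  # valid because no ties

# Under Ho every set of 3 ranks out of 8 is equally likely
n_total, m = len(pooled), len(post_quake)
rank_sets = list(combinations(range(1, n_total + 1), m))
p_rank_sum = sum(1 for r in rank_sets if sum(r) >= w_obs) / len(rank_sets)

# Problem 8(b): sign test on the differences W - V
firm_w = [177.0, 2230.0, 129.0, 11.2, 338.0, 266.0, 612.0, 408.0]
firm_v = [131.0, 1400.0, 78.0, 11.9, 298.0, 170.0, 656.0, 282.0]
diffs = [w - v for w, v in zip(firm_w, firm_v)]
n = len(diffs)
n_neg = sum(d < 0 for d in diffs)

# Under Ho the number of negative signs is Bin(n, 0.5); double the lower tail
p_sign = 2 * sum(comb(n, k) for k in range(n_neg + 1)) / 2**n

print(w_obs, round(p_rank_sum, 3), n_neg, round(p_sign, 3))
```

The enumerated values are exact, so they can differ in the third decimal from sums of the rounded table entries.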
(c) For signed-rank test, sum ranks of negative values when smallest absolute value has
rank 1 yields S- = 1+3 = 4 → S+ = 8(9/2) - 4 = 32. P-value = 2*(0.027) = 5.4%   4 pts
Rejection region α = 10% is essentially [need to interpolate in table A.9.]
if S+ or S- is 30 or greater [two-sided test]. (Actual α = 11%; conservative choice is 31)   3 pts

9. (a) Compute T = sqrt(n)*(dbar)/s_d where D_i = W_i - V_i are the differences.
For 2-sided test, α = 5%, reject Ho: no difference for |T| > t_0.025,7 = 2.365.   3 pts
Obtain t = sqrt(8)*143/282 = 1.43 → p-value 2*(0.10) = 20%.   3 pts
(b) From Table A.13 for a two-sided test with α = 5%, β = 10%
requires d ≈ 1.5, hence need a mean difference of 1.5(300) ≈ ±450. It can be plus or minus for a two-sided test.   3 pts
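
The signed-rank statistic of 8(c) and the paired t statistic of 9(a) can be recomputed from the bid data as a check. This is a sketch using only the Python standard library; the exact signed-rank p-value is obtained by enumerating all 2^8 sign patterns rather than by interpolating in Table A.9, and the variable names are illustrative.

```python
import math
from itertools import product
from statistics import mean, stdev

firm_w = [177.0, 2230.0, 129.0, 11.2, 338.0, 266.0, 612.0, 408.0]
firm_v = [131.0, 1400.0, 78.0, 11.9, 298.0, 170.0, 656.0, 282.0]
diffs = [w - v for w, v in zip(firm_w, firm_v)]
n = len(diffs)

# 9(a): paired t statistic, T = sqrt(n) * dbar / s_d
t_stat = math.sqrt(n) * mean(diffs) / stdev(diffs)

# 8(c): rank the absolute differences (no ties here), smallest gets rank 1
order = sorted(range(n), key=lambda i: abs(diffs[i]))
ranks = [0] * n
for rank, i in enumerate(order, start=1):
    ranks[i] = rank
s_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
s_plus = n * (n + 1) // 2 - s_minus

# Under Ho each rank is negative with probability 1/2, independently
count = sum(
    1
    for signs in product([0, 1], repeat=n)
    if sum(r for r, neg in zip(range(1, n + 1), signs) if neg) <= s_minus
)
p_signed_rank = 2 * count / 2**n

print(round(t_stat, 2), s_minus, s_plus, round(p_signed_rank, 3))
```

The t statistic is left to be compared with the tabulated critical value 2.365, since the standard library has no t distribution.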
(c) d = 120/300 = 0.4 hence need to have df = 74 → n = 75.   3 pts

In this data set, bids for different jobs are orders of magnitude different. Hence the differences do not capture the real relative difference in the bids. The
t test does terribly because it is confused by the one or two very large differences.
The Wilcoxon test does better, but still fails to recognize that a difference of 51 in
numbers of about 100 is more serious than an error of 44 in numbers that
average 630, even though 44 and 51 are about the same size.

10. A probability plot can be obtained by plotting x_(i) as a function of Φ^-1(p_i), or
equivalently using normal probability paper and plotting x_(i) as a function of p_i. MINITAB and DataDesk can do it; I used EXCEL, even though plot is not as nice.

Rank i   p_i = (i-3/8)/(n+1/4)   Φ^-1(p_i)   x_(i)
 1       0.051                   -1.64          39
 2       0.133                   -1.11          71
 3       0.214                   -0.79          72
 4       0.296                   -0.54         149
 5       0.378                   -0.31         155
 6       0.459                   -0.10         166
 7       0.541                    0.10         274
 8       0.622                    0.31         305
 9       0.704                    0.54         404
10       0.786                    0.79         437
11       0.867                    1.11         561
12       0.949                    1.64        1683
3 pts

[Figure: Prob Plot of Data. Observations (0 to 2000) plotted against n-scores (-2.00 to 2.00).]   3 pts
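
The plotting positions, n-scores, and the probability-plot correlations quoted in part (e) of the problem can be reproduced with the sketch below. It uses the plotting-position formula from the table header; the function names n_scores and correlation are illustrative, not from the course materials.

```python
import math
from statistics import NormalDist, mean

pages = [274, 437, 404, 1683, 561, 305, 39, 155, 166, 71, 149, 72]

def n_scores(n):
    """Normal scores for plotting positions p_i = (i - 3/8)/(n + 1/4)."""
    return [NormalDist().inv_cdf((i - 3 / 8) / (n + 1 / 4)) for i in range(1, n + 1)]

def correlation(x, y):
    """Ordinary (Pearson) correlation coefficient between two lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x_sorted = sorted(pages)          # order statistics x_(1) <= ... <= x_(n)
z = n_scores(len(pages))

r_raw = correlation(z, x_sorted)                          # normal fit
r_log = correlation(z, [math.log(v) for v in x_sorted])   # lognormal fit

print([round(v, 2) for v in z])
print(round(r_raw, 3), round(r_log, 3))
```

Plotting x_sorted against z gives the probability plot; a nearly straight line (correlation close to 1) supports the assumed distribution.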
(b) Does not look very straight → likely not normal.   3 pts
(c) For n = 10 use r ≤ 0.92 and for n = 15 use r ≤ 0.94, hence r ≤ 0.93 about right   3 pts
(d) Should be nonnegative and positively skewed. Something like Weibull or Gamma. The tail should not be too thick. Books do not get much
bigger than twice my big dictionary. Maybe should truncate the distribution above some threshold?   3 pts
(e) Clearly a normal distribution does not fit. Result is very significant:
p-value < 1%. The lognormal fit is good.   3 pts

11. REGRESSION !!!!
a) The least-squares estimates of α and β are a = 92.56 and b = 5.73.   3+3 pts
b) The unbiased estimate of the variance of ε is s_e^2 = 4020 = (63.4)^2.   3 pts
R^2 = fraction of variability in y explained = 0.303 = r^2
where r = Σ (R_i - Rbar)(P_i - Pbar)/[(n-1) s_x s_y]   3 pts
c) Var(b) = s_e^2 / [(n-1) s_x^2] = 4.00   3 pts
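
The numbers in parts a) through c) follow from the summary statistics alone. A minimal Python sketch of that arithmetic, with illustrative variable names and x = rainfall R, y = production P:

```python
n = 21
r_bar, p_bar = 32.0, 276.0   # sample means of R and P
s_r, s_p = 7.1, 74.0         # sample standard deviations of R and P
s_rp = 5779.0                # sum of cross-products, Σ (R_i - Rbar)(P_i - Pbar)

sxx = (n - 1) * s_r**2       # Σ (R_i - Rbar)^2
syy = (n - 1) * s_p**2       # Σ (P_i - Pbar)^2

# a) least-squares slope and intercept
b = s_rp / sxx
a = p_bar - b * r_bar

# b) fraction of variance explained and unbiased residual variance
r_squared = s_rp**2 / (sxx * syy)
se2 = (syy - b * s_rp) / (n - 2)

# c) sampling variance of the slope estimator
var_b = se2 / sxx

print(round(a, 2), round(b, 2), round(r_squared, 3), round(se2), round(var_b, 2))
```

Small differences from the quoted values (for example in the second decimal of a) come from rounding intermediate results by hand.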
d) Two-sided test, reject if |T| > t_0.025,19 = 2.093   2 pts
From Var(b) = 4.00 = (2.00)^2; t = 5.73/2.00 = 2.87 with df = 21-2 = 19.
p-value about 2(0.005) = 1%.   3 pts
Choose a two-sided test because I am not willing to speculate whether or not
more rain will increase production. Actually it would be very reasonable to
argue that (to a point) more rain should increase production; if you make that
case then a one-sided test is the proper choice.   2 pts

e) Using your fitted line, the natural estimate of the mean value of P for
R = 35 is a + bR = 293. A 90 percent prediction interval for a future observation is

a + b x ± t_0.05,19 s_e sqrt( 1 + 1/n + (x - xbar)^2 / Σ_{i=1..n} (x_i - xbar)^2 )

which for n = 21, x = 35 and t_0.05,19 = 1.729 yields 293 ± t_0.05,19 (65.2), which is
181 to 406 (WHOW!).   5 pts

f) BAD idea. Adding any variable increases R^2 a little. It is only reasonable to
add variables that make the model a better predictor. Variables appear to increase prediction accuracy if adjusted R^2 increases because of a decrease in s_e. It is also reasonable to require that new variables have coefficients statistically
different from zero (or another default value): this is more demanding. Either
of these last two choices can be defended statistically. More advanced analyses
indicate the best choice is probably in between the two.   5 pts
This note was uploaded on 02/02/2008 for the course CEE 3040 taught by Professor Stedinger during the Fall '08 term at Cornell University (Engineering School).