CEE 304  UNCERTAINTY ANALYSIS IN ENGINEERING
** 2001 Final Exam **
Friday, December 21, 2001
Exam is open notes and open book.
It lasts 150 minutes and there are 150 points.
SHOW WORK!

1. (15 points) Domestic security is now an issue of national attention. (More to
come in CEE 597 Risk Analysis.) A water authority staff is going through its
records of requests for detailed information about the structural design and
operations. Requests have come from within the US and abroad, from utilities
and consultants. Might these requests be from terrorists hoping to discover
where the system is vulnerable? Over the past year the utility has received 2
requests per week (2 per 5 working days).

a) Using a Poisson process model, what are the mean and variance of the
number of requests in a month (with 21 working days)?

b) What is the probability that there are two or more requests in the first
week of February (5 working days)?

c) If the utility decided to examine the most recent 30 requests, what are the
mean and standard deviation of how far back (in working days) they would
need to go to find 30 requests?

d) Is a Poisson process likely to be a reasonable model of the arrival of
requests for detailed system information?

2. (6 points) If links in a chain have a Weibull distribution with parameter k = 4, and the mean strength of chains of length 50 m is 5,000 lbs with standard
deviation 1400 lbs, what then are the mean and standard deviation of the strength
of chains of length 800 m? Justify your answer.

3. After the class addressing sampling, Prof. Stedinger was asked how Gannett
Health Services conducted their surveys of campus drinking. I found out. The
last major cross-campus survey of undergraduate drinking patterns was the 2000
Fall Core Alcohol and Drug Survey. It proceeded as follows.

The undergraduate population is known to be 51% male and 49% female.
The University Registrar was asked to randomly select the names of 965 males
and 735 females corresponding to 1700 registered undergraduates. They were all
mailed surveys. Some 651 valid responses to the drinking question were
received. [Some surveys were returned by the Post Office (144), many surveys
were not returned (835), and some were returned with invalid responses (63 w/o
sex reported, 7 no answer or multiple answers).] The valid response rate was
38%. Here are those numbers and the response to the drinking question:

Survey Summary
                    Male     Female    Total    Fraction Male
Surveys sent         965       735      1700        57%
Valid responses      332       319       651        51%
Response rate        34%       43%       38%

What is the average number of drinks you consume per week?
mean               7.578     3.347     5.632
st. dev           11.760     4.982      9.23

The variance for the men is very large because some 4% report averaging more
than 40 drinks/week. No women reported averaging more than 28 drinks/week.
[More men were sampled because their response rate is generally lower.]

a) (8 points) For men and women the frequency distribution of the responses was
[note the clumps at 0, 8, 10, 12 and 15 (for men)]:

[Figure: histogram of response frequency versus Average # Drinks per Week, bins 0 through 18 and >19]

If you wanted to construct a compact and smoothed representation of these
responses (perhaps to estimate Pr[ D > 15 ]) using a nice analytical distribution,
what would be an appropriate choice and WHY that one and not an alternative?

b) (7 points) Use this data to construct a 95% confidence interval for the true value of the
average number of drinks a woman undergraduate at Cornell consumes per
week. What is the probability the true mean of the average number of drinks
consumed by Cornell women is in the interval you just computed?

c) (7 points) What is the rejection region of a t test of whether men and women
drink at the same rate? What p-value do you obtain?
[Note that women on average have less body weight and thus need to drink less to obtain the same
dose; and due to physiological differences women experience more of an effect from the same dose.]

d) (3 points) Is the test in (c) likely to be valid with this data? [Note, 5 of 332 men reported averages in excess of 60 drinks/week.]

e) (10 points) The overall mean of the average number of drinks/week was
obtained by pooling the data sets for men and women to obtain one large sample.
i) Is this the best estimator of that average that could be constructed?
ii) Please comment on the sampling strategy they employed.
Are there potentially any important problems? Is bias likely to be a concern?
What would you recommend they do differently?

4. (20 points) A coed living unit was surprised by the differences in men’s and
women’s drinking habits as described by the University survey. They surveyed
their members and obtained

                                      avg      std. dev.
Men:    0, 0.7, 2, 6, 8, 10, 50      10.93      17.63
Women:  0, 0, 0.5, 1, 3, 5            1.58       2.01

For an appropriate nonparametric test: (This data set has several ties, so in your computation, assign all 3 zero responses their average rank, equal to 2.)
(a) What are the appropriate null and alternative hypotheses?
(b) What is the rejection region for a test with a type I error of 5%?
(c) What is the p-value for this data set? What do you conclude?
(d) What value of t do you obtain for a two-sample t test? What is the p-value?
(e) If challenged, could you, and if you could, how would you justify use of a
nonparametric test instead of a t test with this data?

5. (20 points) Prof. Brutsaert uses satellite instrumentation to measure average
ground-surface temperature. A new sensor should provide smaller errors in
comparison with ground-based observations. The measurement errors [here
Old_i or New_i = (Measured_i - True_i) in °C] obtained with the Old and the New
instrument are contained in the table below. Consider whether the absolute
values of the measurement errors with the new instrument are indeed smaller.

              Actual Error        Absolute Error     D = difference
Trial        Old       New       |Old|     |New|     |Old| - |New|
  1          0.42      0.37      0.42      0.37          0.06
  2          0.01      0.02      0.01      0.02         -0.01
  3         -0.64     -0.40      0.64      0.40          0.24
  4         -0.60     -0.42      0.60      0.42          0.18
  5          0.00      0.10      0.00      0.10         -0.10
  6          4.10      2.14      4.10      2.14          1.96
  7          0.32      0.10      0.32      0.10          0.21
  8         -0.01     -0.07      0.01      0.07         -0.06
  9         -1.69     -1.06      1.69      1.06          0.64
 10         -2.68     -1.51      2.68      1.51          1.17
Average     -0.076    -0.071     1.048     0.618         0.430
St. Dev.     1.759     0.967     1.372     0.718         0.661

(a) What are the appropriate null and alternative hypotheses?
(b) What is the rejection region for a sign test with a type I error of about 5%?
What p-value do you obtain with this data?
If the probability of a negative sign is really only 0.3, what would the type II
error be for your α = 5% test? [For n = 10, Devore has binomial tables.]
(c) What is the rejection region for an appropriate nonparametric test with α = 5%?
What p-value do you obtain with this data?
(d) What is the rejection region for a t test with α = 5%?
What p-value do you obtain with this data?
(e) How big would the mean μ_D = E{ |Old_i| - |New_i| } need to be with the t test
to have a type II error < 10%, were σ_D for the differences equal to 0.7?
(f) How large a sample is needed to ensure β < 5% with the t test
when σ_D is 0.7 and the mean difference μ_D = 0.4?

6. (14 points) In the design of a pier, the structural engineer needs to determine
the character of earth surrounding the pilings and its ability to support loads.
Several samples were collected and the following resistances measured:

45, 75, 27, 45, 34, 43, 44, 35, 88, 40, 48, 65, 54, 56, 42

(a) Plot these data on the attached probability paper.
(b) For the probability-plot correlation test of normality, specify the approximate rejection region for an α = 5% test.
(c) The r for the probability plot correlation test (correlation of data with their
n-scores) was 0.948; r for the probability plot correlation test on the logarithms of
the observations was 0.982. What do you conclude from these results? Explain.

7. (35 pts) Prof. O’Rourke is studying the effect of soil properties on the
earthquake damage experienced by civil infrastructure including pipelines and
buildings. Let E_i be a measure of likely earthquake intensity that reflects local soil properties, and D_i a measure of the damage to residential houses in a quake.
Here are some possible data:

Quake intensity, E_i:   30.0    5.1   117.1   122.2   12.1   174.4
Damage, D_i:            12.7    4.6    38.2    25.1   14.5    37.3

n = 24     Dbar = 17.72     Ebar = 67.71
s_D = 14.28     s_E = 57.50     Sum (D_i - Dbar)(E_i - Ebar) = 11,400

a) Consider a linear model. What are the least-squares estimates of α and β for
D = α + β E + ε, where ε is the model error?
b) Compute an unbiased estimate of the variance of ε.
c) What fraction of the observed variability in D is explained by E?
d) What is the standard deviation of b, the estimator of β?
e) What is the rejection region for a 5% test of whether β = 0?
Why have you selected either a one-sided or two-sided test?
How small an α could one use and still reject Ho with this data set?
f) Using the values of the parameters that you obtained, what is a 90% prediction
interval for D at a site with an earthquake intensity of E = 120?

8. (5 pts) What fundamental problem does the collection of knowledge
and mathematical relationships called "statistics" address?

I hope you enjoy the holiday season. Drive safely.
Jery Stedinger

CEE 304  UNCERTAINTY ANALYSIS IN ENGINEERING
Solutions Final Exam 2001

1. Poisson process. Compute λ = 2/5 = 0.4 arrivals per day. 1 pt
a) For Poisson distribution, mean = ν = λt = 21 (0.4) = 8.4 = Var(K) 3 pts
b) Pr[ K >= 2 | ν = 5(0.4) = 2 ] = 1 - Pr[ K = 0 or 1 ] = 1 - e^(-ν) - ν e^(-ν) = 0.594 3 pts
c) Gamma distribution (not required). k = 30; λ = 0.4. E(T30) = 30/λ = 75 days; Var(T30) = 30/λ^2 = 187.5 days^2 => SD = 13.7 days 4 pts
d) They arrive separately (ignore time of day when mail arrives at office),
independently in time for different groups, with constant rate. 4 pts

2. Weibull for Weakest Links. Lectures found that for a chain of length n with
Weibull distributed links, the resulting Weibull for the chain has: u = v / n^(1/k). (2 pts)
We have k = 4, and effectively n = 800/50 = 16. Thus the change in u is a factor of 16^(1/4) = 2 (2 pts).
Using E[Y] = μ_Y = u Γ(1 + 1/k); Var[Y] = σ_Y^2 = u^2 [ Γ(1 + 2/k) - Γ^2(1 + 1/k) ],
both the mean and the standard deviation are proportional to u, so the
longer (weaker) chain has μ = (5000)/2 = 2,500 lbs & σ = (1400)/2 = 700 lbs. (2 pts)
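
The arithmetic in solutions 1 and 2 can be checked with a few lines of Python. This is only a sketch using the standard library; the function names are illustrative and not part of the course material, and the gamma waiting-time and Weibull weakest-link formulas are the ones quoted in the solutions above.

```python
import math

RATE_PER_DAY = 2 / 5  # lambda: 2 requests per 5 working days


def poisson_mean_var(rate, days):
    """Mean and variance of a Poisson count over `days` (they are equal)."""
    nu = rate * days
    return nu, nu


def prob_at_least_two(rate, days):
    """Pr[K >= 2] = 1 - e^(-nu) - nu e^(-nu) for a Poisson count K."""
    nu = rate * days
    return 1.0 - math.exp(-nu) - nu * math.exp(-nu)


def waiting_time_mean_sd(k, rate):
    """Mean and standard deviation of the gamma waiting time to the k-th arrival."""
    return k / rate, math.sqrt(k) / rate


def weakest_link_factor(length_ratio, k):
    """Factor applied to the Weibull scale u when the chain is length_ratio times longer."""
    return length_ratio ** (-1.0 / k)


def weibull_mean_sd(scale, k):
    """Mean and standard deviation of a Weibull variable with scale u and shape k."""
    g1 = math.gamma(1 + 1 / k)
    g2 = math.gamma(1 + 2 / k)
    return scale * g1, scale * math.sqrt(g2 - g1 ** 2)


if __name__ == "__main__":
    print(poisson_mean_var(RATE_PER_DAY, 21))      # month of 21 working days
    print(prob_at_least_two(RATE_PER_DAY, 5))      # one 5-day week
    print(waiting_time_mean_sd(30, RATE_PER_DAY))  # days back to the 30th request
    factor = weakest_link_factor(800 / 50, 4)
    print(5000 * factor, 1400 * factor)            # mean and sd of the 800 m chain
```

Because both the Weibull mean and standard deviation are multiples of the scale u, scaling u by the weakest-link factor scales both by the same amount, which is why the solution simply halves 5,000 and 1,400.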
3. Surveys, Confidence Intervals, Distributions.
a) Averages are continuous so that rules out Bernoulli (values of 0 & 1), binomial (no n), Poisson (variance = mean so σ too small), Geometric (variance < mean so σ too small). Plus they are discrete anyway. Gumbel and normal fail because need x >= 0. Lognormal has density function of zero at zero so fails to capture peak. Gamma with α < 1, or Weibull with k < 1 are reasonable. See which fits better if we can tell. (Weibull has thicker tail.) Neither reproduces the actual nonzero probability that some people never drink and thus have a true average of exactly zero. 8 pts
b) z = 1.96 so 95% CI = xbar ± z s/sqrt(319) = 2.80 to 3.89 drinks/week. 5 pts
This interval may or may not contain the true average; we do not know. 2 pts
c) Reject Ho if |Z| = |Xbar - Ybar| / sqrt( sx^2/nx + sy^2/ny ) > z0.025 = 1.960 for a 2-sided 5% test 2 pts
Obtain t = 6.02. P-value < 0.0004 (front cover). Not a lot of doubt here. 5 pts
d) Valid. Yes, there are big outliers. But when one averages 300 observations, the resulting
average IS normal, regardless of the original distribution of the data (provided σ < ∞). (2 pts)
That is the Central Limit Theorem. (1 pt)
e) They have not done the right thing. They have collected (appropriately) a stratified sample,
and should analyze it as such. Combining the two samples and pretending it is a Simple Random
Sample exaggerates the variance in the estimators of the mean.

One SHOULD collect a stratified sample to take advantage of the differences between men
and women. Furthermore, because men have such a large variance, one should sample many more
men than women and then weight the men’s and women’s averages by 0.51 and 0.49 (their
proportion in the student body). It may also be useful to stratify by class (Fr., Soph., Jun., Sen.), or
student age (18, 19, ..., 21, 21+). Why not? It costs little to do so and increases precision.

There is a tremendous opportunity for bias in this analysis. While the original sample from
the registrar is a Simple Random Sample, there is first a nonresponse bias for surveys returned,
and then there is a voluntary response bias because only about half of the students return the
surveys. To solve this problem we need a survey approach that has a greater return: survey by
phone or offer a gift for returned surveys. There may also be an honesty problem: do drinkers brag, or
exaggerate to skew the survey to justify their behavior? 10 pts
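
The interval in (b), the test statistic in (c), and the stratified estimate recommended in (e) can be reproduced from the survey summary table. The sketch below is illustrative only: the function names are made up for this example, and the stratified standard error assumes independent strata and ignores any finite-population correction, neither of which is stated in the exam.

```python
import math
from statistics import NormalDist


def mean_ci(mean, sd, n, level=0.95):
    """Large-sample z confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


def two_sample_z(mean_x, sd_x, n_x, mean_y, sd_y, n_y):
    """Large-sample statistic for H0: equal means, unequal variances allowed."""
    return (mean_x - mean_y) / math.sqrt(sd_x ** 2 / n_x + sd_y ** 2 / n_y)


def stratified_mean_se(means, sds, ns, weights):
    """Weighted stratum means and the standard error of that weighted mean.

    Assumes independent strata and no finite-population correction.
    """
    est = sum(w * m for w, m in zip(weights, means))
    var = sum((w * s) ** 2 / n for w, s, n in zip(weights, sds, ns))
    return est, math.sqrt(var)


if __name__ == "__main__":
    # Summary values from the survey table (men first, then women).
    men = (7.578, 11.760, 332)
    women = (3.347, 4.982, 319)
    print(mean_ci(*women))              # 95% CI for women
    print(two_sample_z(*men, *women))   # men versus women
    print(stratified_mean_se([men[0], women[0]], [men[1], women[1]],
                             [men[2], women[2]], [0.51, 0.49]))
```

The stratified estimate weights by the known population shares (51% and 49%) rather than by the shares among respondents, which is the distinction part (e) draws against the pooled mean of 5.632.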
4. Two-Sample Tests
a) Drinking: men versus women
H0: F_men = F_women (no difference); Ha: F_men > F_women. Want to check previous independent survey. 3 pts
b) Use Wilcoxon-Mann-Whitney Rank Sum test. W = sum of ranks of women.
Expect them to have small ranks so need to convert the value given in the tables so can reject Ho if w_women <= 6(14) - 54 = 30 3 pts
c) Observe W = 31. p-value a little larger than 5%. Maybe 6 to 7% 2+2 pts
Accept Ho 1 pt
d) Use T = [ (Xbar - Ybar) - (μx - μy) ] / sqrt( sx^2/nx + sy^2/ny ), with μx - μy = 0 under Ho. Find t = 1.40 2 pts
with df = 6.18 or 6.0. P-value = 11% (one-sided) 2+2 pts
e) Larger survey demonstrates that the averages are NOT normal. See the big values and spike in
histogram at zero. A small-sample t test is just not valid. Go with a nonparametric test.
Has good efficiency & avoids incorrect assumptions. 3 pts

5. Paired Data => One-Sample Tests
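
The rank-sum and two-sample t statistics quoted in solution 4, and the sign-test and paired t quantities used in solution 5, are short computations. The sketch below is a hedged illustration with made-up function names; it uses the rounded values printed in the exam tables, so results agree with the key only to rounding, and it leaves t-distribution p-values to the tables since the Python standard library has no t distribution.

```python
import math
from statistics import mean, stdev

MEN = [0, 0.7, 2, 6, 8, 10, 50]
WOMEN = [0, 0, 0.5, 1, 3, 5]
# D = |Old| - |New| as printed (rounded) in the problem 5 table.
D = [0.06, -0.01, 0.24, 0.18, -0.10, 1.96, 0.21, -0.06, 0.64, 1.17]


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for j in range(pos, end + 1):
            ranks[order[j]] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks


def rank_sum(sample, other):
    """Sum of the ranks of `sample` within the combined data."""
    return sum(midranks(list(sample) + list(other))[:len(sample)])


def welch_t(x, y):
    """Two-sample t statistic with unequal variances and its approximate df."""
    vx, vy = stdev(x) ** 2 / len(x), stdev(y) ** 2 / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def binom_cdf(k, n, p):
    """Pr[S <= k] for S ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))


def paired_t(diffs):
    """One-sample t statistic for H0: mean difference = 0."""
    return math.sqrt(len(diffs)) * mean(diffs) / stdev(diffs)


if __name__ == "__main__":
    print(rank_sum(WOMEN, MEN))             # W for the women
    print(welch_t(MEN, WOMEN))              # t and df for problem 4(d)
    s = sum(1 for d in D if d < 0)          # number of negative signs
    print(s, binom_cdf(s, len(D), 0.5))     # sign-test p-value
    print(binom_cdf(2, len(D), 0.5))        # type I error of "reject if S <= 2"
    print(1 - binom_cdf(2, len(D), 0.3))    # type II error when p(-) = 0.3
    print(paired_t(D))                      # t with df = 9
```

The signed-rank statistic is deliberately omitted here: with the rounded table values, trials 1 and 8 tie in absolute value, so the result would depend on digits the printed table does not show.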
a) Use test on differences D. Ho: median D = 0 (or H0: F_old = F_new). Both mean no difference.
Ha: median D > 0 (expect |Old| - |New| > 0) 2 pts
b) Sign test uses counts only.
H0: p(+) = p(-) = 0.5; Ha: p(-) < 0.5. From tables for α ~ 5%, reject if S <= 2 => α = 5.5%. 2 pts
Observe S = 3, p-value = Pr[ S <= 3 ] = 17.2% 2 pts
Were S ~ Bin(10, p = 0.3), β = Pr[ S >= 3 ] = 1 - Pr[ S <= 2 ] = 62% 2 pts
c) Use Wilcoxon Signed Rank Test on sum of ranks of the positive differences,
where big values have large ranks. Reject Ho if S+ >= 44 2 pts
Find S+ = 47. p-value = Pr[ S+ >= 47 ] ~ 2.4% 2 pts
d) For a t test, assume data is normal and consider H0: μ_D = 0; Ha: μ_D > 0; Reject if T > t0.05,9 = 1.833 2 pts
t = sqrt(10) (0.43)/0.66 = 2.06 with df = 9. 1-sided p-value = 3.3% 2 pts
e) For α = 5%, 1-sided t for df = 9, want β < 10%, need d = (μ_a - 0)/σ = 1.1; with σ = 0.7 yields μ_a = 0.7(1.1) = 0.77 2 pts
f) For α = 5%, 1-sided t to ensure β < 5% with d = 0.4/0.7 = 0.57 ~ 0.6,
find from table A.17 that df = 34 for n = 35. 2 pts

6. a) (8 pts) Plotting x(i) versus Φ^-1(p_i) yields a probability plot. Here is the data
  i    (i - 3/8)/15.25    Nscore    x(i)    log x(i)
  1        0.041          -1.74      27      1.44
  2        0.107          -1.25      34      1.54
 ...
  7        0.434          -0.17      44      1.64
  8        0.500           0.00      45      1.65
  9        0.566           0.17      45      1.66
 ...
 13        0.828           0.95      65      1.81
 14        0.893           1.25      75      1.88
 15        0.959           1.74      88      1.94

[Figure: Probability Plot, Geotech Data; Observation versus Nscore]

b) Table A.12, page 739, Devore 5 indicates reject hypothesis of normality for r <= 0.9383 for n = 15 at α = 5%. (2 pts)
c) r = 0.948 > 0.9383 so cannot reject normality. However, r = 0.982 obtained with the logarithms
of the data looks better. Data does curve smoothly upward in graph, suggesting a more highly skewed distribution than normal. It would be reasonable to use a lognormal with data, but cannot reject normality. (4 pts)

7. Regression (35 points - make sure you get to this one!)
a) Least squares estimators: b = 0.150; a = 7.567 8 pts
b) s_e^2 = 135.5; s_e = 11.64 4 pts
c) R^2 = 1 - (Residual sum-of-squares)/(Total sum-of-squares) = 1 - (n - k) s_e^2 / [(n - 1) s_y^2] = 0.365 = r^2 = (0.604)^2 4 pts
d) Need St.Dev.(b) = s_e / sqrt((n - 1) s_x^2) = 0.042 4 pts
e) Reject Ho: β = 0 versus Ha: β > 0 if t > t0.05,22 = 1.717 Okay! 2+1 pts
Observe t = 3.552. p-value close to 0.001 (actually 0.0009) 2+2 pts
f) For E = 120 the mean value of D = a + bE = 25.6; a 90% prediction interval for a
future D-value is 4.82 to 46.32, using t0.05,22 = 1.717 (w/ SE_prediction = 12.08) in
a + bx ± t0.05,22 s_e sqrt( 1 + 1/n + (x - xbar)^2 / Sum_{i=1..n} (x_i - xbar)^2 ) 8 pts

8. Statistics addresses the selection, summary and interpretation of observations to
infer characteristics of the real world. This requires descriptions of the precision / uncertainty of our estimators. What do we know and how well do we know it? Statistics includes inference which is an approach to decision making (science w/ data).
“Statistics is an effort to understand data and draw conclusions from it.” ...
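
As a check on the regression arithmetic in solution 7, every quantity there follows from the summary statistics given in the problem (n, the two means, the two standard deviations, and the cross-product sum). The sketch below is an illustration under that reading; the function and key names are made up, and the critical value t0.05,22 = 1.717 is taken from the solution because the Python standard library has no t quantile function.

```python
import math


def fit_from_summary(n, x_bar, y_bar, s_x, s_y, sum_xy):
    """Simple linear regression of y on x from summary statistics.

    sum_xy is Sum (x_i - x_bar)(y_i - y_bar).
    """
    sxx = (n - 1) * s_x ** 2           # Sum (x_i - x_bar)^2
    syy = (n - 1) * s_y ** 2           # total sum of squares
    b = sum_xy / sxx                   # slope
    a = y_bar - b * x_bar              # intercept
    sse = syy - b * sum_xy             # residual sum of squares
    s_e = math.sqrt(sse / (n - 2))     # unbiased residual variance, square-rooted
    sd_b = s_e / math.sqrt(sxx)
    return {"n": n, "x_bar": x_bar, "sxx": sxx, "a": a, "b": b,
            "r2": 1 - sse / syy, "s_e": s_e, "sd_b": sd_b, "t": b / sd_b}


def prediction_interval(fit, x0, t_crit):
    """Interval for a single future y at x0: y_hat +/- t_crit * SE_prediction."""
    y_hat = fit["a"] + fit["b"] * x0
    se_pred = fit["s_e"] * math.sqrt(
        1 + 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"])
    return y_hat - t_crit * se_pred, y_hat + t_crit * se_pred, se_pred


if __name__ == "__main__":
    # x = quake intensity E, y = damage D, values from the problem statement.
    fit = fit_from_summary(n=24, x_bar=67.71, y_bar=17.72,
                           s_x=57.50, s_y=14.28, sum_xy=11400)
    print(fit)
    print(prediction_interval(fit, x0=120, t_crit=1.717))
```

Working from summary statistics rather than the six listed data pairs is deliberate: the problem states n = 24, so the listed pairs are only part of the sample.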
 Fall '08
 Stedinger
