Biostatistics 100B
Homework Solutions 4
February 5th, 2007
Solutions To Homework Assignment 4
Warmup Problems
(1) Interval Basics:
A confidence interval (CI) gives you a range of values which you are (reasonably) sure includes the AVERAGE value of Y associated with a given X. In other words, if you found many data points with that X, then the average value of the corresponding Y’s would lie in the interval. A prediction interval (PI) gives you a range of values which you are (reasonably) certain includes the value of Y associated with a SINGLE data point or the NEXT data point at a particular value of X. To emphasize: you use a CI if you are interested in the AVERAGE Y for a given value of X and you use a PI if you are interested in a SINGLE or PARTICULAR Y for a given value of X.

As an example, suppose I want to predict people’s heights (Y) based on their ages (X). If I want to predict the AVERAGE height of all 10 year olds I use a confidence interval for Y when X=10. However, suppose my cousin has just had a baby girl named Susan and I want to predict what Susan’s height will be when she is 10. Then I want a prediction interval because I am trying to predict the height of a SINGLE 10 year old.

The prediction interval will always be wider than the confidence interval because it is harder to predict for a single person than for the average of lots of people. In the example above, think of it this way. Any single child could be really tall or short, making it hard to guess in advance. However, if I want the average for all children, some will be tall, some short, and these will balance each other out in the average. Similarly, it is much easier for me to guess what the average score for the class will be on a midterm than it is for me to guess what any individual student’s score will be. The formulas for the CI and PI reflect this difference. Look at the formulas for the variances or standard deviations: the extra 1 in the PI formula makes it wider.

Here are some additional comments that were not part of the problem but may help your understanding. If they aren’t useful, just ignore them.

I have written the formulas for the variances for the CIs and PIs below, rather than the standard deviations, because they are a little easier to explain. You of course take the square root to get the standard deviation which you use in the interval formula.
CI:
$$ s^2_{\hat{Y}_0} = s^2_{Y|X} \left( \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SSX} \right) $$
PI:
$$ s^2_{Y_0} = s^2_{Y|X} \left( 1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SSX} \right) = s^2_{Y|X} + s^2_{\hat{Y}_0} $$
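To make the two variance formulas concrete, here is a minimal Python sketch that evaluates both of them for a small data set. Several things here are assumptions and not part of the homework: the function name interval_variances is hypothetical, the ages and heights are made-up numbers chosen only for illustration, and the variability about the line is computed in the usual way as SSE/(n − 2) for a least squares line.

```python
def interval_variances(x, y, x0):
    """Return (s2_yx, ci_var, pi_var) for a simple linear regression at x0.

    s2_yx  : variability of the points about the fitted line, SSE / (n - 2)
    ci_var : variance used for the confidence interval at x0
    pi_var : variance used for the prediction interval at x0
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # SSX: sum of squared deviations of the X values from their mean
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Least squares slope and intercept
    slope = sxy / ssx
    intercept = y_bar - slope * x_bar

    # Residual variance about the line (assumes the usual n - 2 divisor)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s2_yx = sse / (n - 2)

    distance_term = (x0 - x_bar) ** 2 / ssx
    ci_var = s2_yx * (1 / n + distance_term)
    pi_var = s2_yx * (1 + 1 / n + distance_term)
    return s2_yx, ci_var, pi_var


# Made-up illustrative data: ages in years, heights in cm
ages = [4, 6, 8, 10, 12, 14]
heights_cm = [102, 115, 128, 138, 149, 163]

s2_yx, ci_var, pi_var = interval_variances(ages, heights_cm, x0=10)
print("CI standard deviation:", ci_var ** 0.5)
print("PI standard deviation:", pi_var ** 0.5)
```

Whatever data you feed in, the PI variance comes out equal to the CI variance plus the variability about the line, which is exactly the last equality in the PI formula and the reason the PI is always the wider of the two.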
Both formulas contain the variability of the points about the line, $s^2_{Y|X}$. This makes sense. The more variable the data are, the harder it will be to make predictions, so you will need the CIs and PIs to be wider.

Both formulas contain 1/n. This also makes sense. The more data you have, the more accurate your estimates, the better your predictions will be, and the narrower your CIs and PIs can be.
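The role of 1/n can be seen in a quick sketch. The numbers are assumptions chosen for illustration: the variability about the line is fixed at a hypothetical value of 4.0, and we predict exactly at the mean of the X values so that the distance term drops out. The function name ci_pi_variance_at_mean is also hypothetical.

```python
def ci_pi_variance_at_mean(s2_yx, n):
    """CI and PI variances when predicting at X0 equal to the mean of X,
    where the (X0 - X_bar)^2 / SSX term is zero."""
    ci_var = s2_yx * (1 / n)
    pi_var = s2_yx * (1 + 1 / n)
    return ci_var, pi_var


# Hypothetical residual variance, held fixed while the sample size grows
for n in (5, 50, 500):
    ci_var, pi_var = ci_pi_variance_at_mean(4.0, n)
    print(n, ci_var, pi_var)
```

Both variances shrink as n grows, but only the CI variance heads toward zero. The PI variance can never drop below the variability about the line itself, because even a perfectly estimated line cannot tell you how far a single new point will fall from it.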
Finally, both intervals contain a piece which says how far the value you are predicting at, $X_0$, is from the middle of your data set, $\bar{X}$. The further you get from your data, the less reliable your predictions will be. As we
have discussed many times, predicting too far outside the range of the data is risky. Thus the further
Spring '07, Sugar