Biostatistics 100B
Homework Solutions 4
February 5th, 2007

Solutions To Homework Assignment 4

Warmup Problems

(1) Interval Basics: A confidence interval (CI) gives you a range of values which you are (reasonably) sure includes the AVERAGE value of Y associated with a given X. In other words, if you found many data points with that X, then the average value of the corresponding Y's would lie in the interval. A prediction interval (PI) gives you a range of values which you are (reasonably) certain includes the value of Y associated with a SINGLE data point or the NEXT data point at a particular value of X. To emphasize: you use a CI if you are interested in the AVERAGE Y for a given value of X, and you use a PI if you are interested in a SINGLE or PARTICULAR Y for a given value of X.

As an example, suppose I want to predict people's heights (Y) based on their ages (X). If I want to predict the AVERAGE height of all 10 year olds, I use a confidence interval for Y when X=10. However, suppose my cousin has just had a baby girl named Susan and I want to predict what Susan's height will be when she is 10. Then I want a prediction interval because I am trying to predict the height of a SINGLE 10 year old.

The prediction interval will always be wider than the confidence interval because it is harder to predict a value for a single person than an average over lots of people. In the example above, think of it this way. Any single child could be really tall or really short, making it hard to guess in advance. However, if I want the average for all children, some will be tall, some short, and these will balance each other out in the average. Similarly, it is much easier for me to guess what the average score for the class will be on a midterm than it is for me to guess what any individual student's score will be. The formulas for the CI and PI reflect this difference. Look at the formulas for the variances or standard deviations: the extra 1 in the PI formula makes it wider.
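To see the "balancing out" argument in action, here is a small simulation sketch. It is not part of the original solution, and every number in it (a mean height of 138 cm, a standard deviation of 6 cm, groups of 25 children) is a made-up assumption chosen only for illustration. It compares how much a single simulated height varies with how much the average of a group varies.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population of 10 year olds (illustrative values, not real data).
MEAN_CM = 138.0
SD_CM = 6.0
GROUP_SIZE = 25
N_REPEATS = 2000

# Heights of single children drawn one at a time.
singles = [rng.gauss(MEAN_CM, SD_CM) for _ in range(N_REPEATS)]

# Average height of a whole group, repeated many times.
group_means = [
    statistics.fmean([rng.gauss(MEAN_CM, SD_CM) for _ in range(GROUP_SIZE)])
    for _ in range(N_REPEATS)
]

sd_single = statistics.stdev(singles)
sd_group_mean = statistics.stdev(group_means)

print(f"spread of single heights: {sd_single:.2f} cm")
print(f"spread of group averages: {sd_group_mean:.2f} cm")
```

The group averages should come out much less spread out than the single heights, which is the same reason the CI is narrower than the PI.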
Here are some additional comments that were not part of the problem but may help your understanding. If they aren't useful, just ignore them. I have written the formulas for the variances for the CIs and PIs below because they are a little easier to explain. You of course take the square root to get the standard deviation, which is what you use in the interval formula.

CI: $s_{\hat{Y}}^2 = s_{Y|X}^2 \left( \frac{1}{n} + \frac{(X - \bar{X})^2}{SSX} \right)$

PI: $s_{Y}^2 = s_{Y|X}^2 \left( 1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{SSX} \right) = s_{Y|X}^2 + s_{\hat{Y}}^2$

Both formulas contain the variability of the points about the line, $s_{Y|X}^2$. This makes sense. The more variable the data is, the harder it will be to make predictions, so you will need the CIs and PIs to be wider.

Both formulas contain 1/n. This also makes sense. The more data you have, the more accurate your estimates, the better your predictions will be, and the narrower your CIs and PIs can be.

Finally, both intervals contain a piece which says how far the value you are predicting at, X, is from the middle of your data set,...
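The two variance formulas above can be checked numerically with a short sketch. This is not part of the original solution. The ages and heights are invented for illustration, the function name `interval_variances` is hypothetical, and the code assumes the usual simple linear regression definitions: the least squares line, SSX as the sum of squared deviations of X from its mean, and $s_{Y|X}^2$ computed as SSE/(n - 2).

```python
import math

def interval_variances(xs, ys, x0):
    """Return (s2_yx, ci_var, pi_var) for a simple linear regression at X = x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # SSX: how spread out the X values are around their mean.
    ssx = sum((x - x_bar) ** 2 for x in xs)

    # Least squares slope and intercept.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / ssx
    intercept = y_bar - slope * x_bar

    # Variability of the points about the line (assumed here: SSE / (n - 2)).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s2_yx = sse / (n - 2)

    distance_term = (x0 - x_bar) ** 2 / ssx
    ci_var = s2_yx * (1 / n + distance_term)       # variance for the AVERAGE Y
    pi_var = s2_yx * (1 + 1 / n + distance_term)   # variance for a SINGLE Y
    return s2_yx, ci_var, pi_var

# Made-up data: ages in years and heights in cm.
ages = [2, 4, 6, 8, 10, 12, 14]
heights = [86, 102, 115, 128, 139, 149, 163]

s2_yx, ci_var, pi_var = interval_variances(ages, heights, x0=10)
_, ci_var_mid, _ = interval_variances(ages, heights, x0=8)  # 8 is the mean age

print(f"CI standard deviation at X=10: {math.sqrt(ci_var):.3f}")
print(f"PI standard deviation at X=10: {math.sqrt(pi_var):.3f}")
print(f"CI standard deviation at X=8:  {math.sqrt(ci_var_mid):.3f}")
```

The PI variance exceeds the CI variance by exactly $s_{Y|X}^2$, which is the "extra 1" in the formula, and the CI variance is smallest when X equals the mean of the X values, where the distance term is zero.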
This note was uploaded on 03/12/2008 for the course BIOSTAT 100B taught by Professor Sugar during the Spring '07 term at UCLA.