Biostatistics 100B
Homework Solutions 5
February 12th, 2007

Solutions To Homework Assignment 5

Warmup Problems

(1) Interpreting A Multiple Regression Equation:

(a) False. It is never safe to conclude from a regression equation alone that a change in X causes a change in Y. Just because X and Y are related does not mean one causes the other. The correct statement would be that a one-unit increase in X1 is associated with a 14-unit increase in Y, assuming X2 is held fixed. Remember that you do not even know whether X2 can be held fixed when X1 is increased!

(b) False. The sign of the coefficient has nothing to do with the strength of the relationship between X and Y; it simply tells you the direction of the relationship. If the coefficient is positive, increases in X are associated with increases in Y. If the coefficient is negative, increases in X are associated with decreases in Y. Even the size of the coefficient does not really tell you the strength of the relationship, because it depends on the units in which X is measured. Suppose X1 is measured in inches. If I change the units to feet, the coefficient is multiplied by 12, but nothing about the relationship will have changed. Beware of comparing magnitudes of coefficients!

(c) True (maybe). Suppose X1 is held fixed, say at 0. Then if X2 is large enough (say 3 or greater), Ŷ will be negative. This is not necessarily bad: Y may be a variable that takes on negative values. (Note that this does assume that X1 can take on the value 0 when X2 is 3, which need not be possible. Technically, you would need to be sure it was realistically possible to get a pair of X1 and X2 values that would make Ŷ negative.)

(2) Special Issues In Multiple Regression:

(a) Overfitting occurs when you add lots of predictor variables to your model that are not really significantly related to your response variable. Adding more X variables makes your model look better. For instance, R² can never go down and SSE can never go up as you add more predictors, because the least-squares fit could always set the new coefficients to zero; the bigger model cannot explain less variability.
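Two numerical claims above can be checked with a small simulation: that rescaling X1 from inches to feet multiplies its coefficient by 12 without changing the fit (1)(b), and that adding a predictor can never lower R² or raise SSE (2)(a). The sketch below is not part of the original solutions. It assumes NumPy is available, uses simulated data, and the helper name fit_ols and all variable names are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, SSE, R^2)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    return coef, sse, 1.0 - sse / sst

rng = np.random.default_rng(0)
n = 30
x1_inches = rng.uniform(50, 80, n)   # hypothetical predictor measured in inches
x2 = rng.normal(size=n)              # hypothetical second predictor
y = 2.0 + 0.5 * x1_inches - 1.0 * x2 + rng.normal(size=n)  # simulated response

# (1)(b): rescaling X1 from inches to feet multiplies its coefficient by 12,
# while SSE and R^2 (the quality of the fit) stay the same.
coef_in, sse_in, r2_in = fit_ols(np.column_stack([x1_inches, x2]), y)
coef_ft, sse_ft, r2_ft = fit_ols(np.column_stack([x1_inches / 12, x2]), y)
print(coef_ft[1] / coef_in[1])   # 12, up to floating-point rounding
print(r2_in, r2_ft)              # same R^2 in both unit systems

# (2)(a): adding a predictor that is pure noise cannot lower R^2 or raise SSE.
noise = rng.normal(size=n)
_, sse_big, r2_big = fit_ols(np.column_stack([x1_inches, x2, noise]), y)
print(r2_big >= r2_in, sse_big <= sse_in)
```

The same helper is reused for both checks so that the only thing changing between fits is the set of columns passed in.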
This may make it seem as if adding more predictors can only help and never hurt. However, if the X's are not really related to Y, your model may give lousy predictions for new data points even if it does a good job on your current data points. This is where the term overfitting comes from. The model is constructed to "fit" your existing data points very well, but it may have to go through so many contortions and wiggles to do so that it won't fit new data points well; it is "overfitted" to the original points. In fact, it turns out that if you have as many predictor variables as you have data points, your model will fit the data perfectly (you will get an R² of 100%) even if the predictors you are using have nothing to do with Y!

There are many ways to avoid overfitting. First, you shouldn't include in your model predictor variables that are not significantly related to Y. You can check this by looking at the individual t tests for the variables. Second, if you add lots of useless predictors, adjusted R² will tend to go down even though...
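A second hypothetical sketch illustrates the perfect-fit remark and the role of adjusted R². It regresses a pure-noise response on more and more pure-noise predictors (simulated data; NumPy assumed; r2_and_adjusted is an illustrative name, not from the original). It uses the common textbook definition adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1) for p predictors and n observations. With an intercept in the model, n - 1 predictors already give n parameters, which is enough for R² = 1 here.

```python
import numpy as np

def r2_and_adjusted(X, y):
    """Fit OLS with an intercept and return (R^2, adjusted R^2).

    Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1) for p predictors;
    it is undefined (returned as nan) when n - p - 1 <= 0.
    """
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    centered = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centered @ centered)
    df = n - p - 1
    adj = 1.0 - (1.0 - r2) * (n - 1) / df if df > 0 else float("nan")
    return float(r2), float(adj)

rng = np.random.default_rng(1)
n = 10
y = rng.normal(size=n)               # response unrelated to every predictor
junk = rng.normal(size=(n, n - 1))   # n - 1 pure-noise predictors

for p in (1, 3, 6, 8, 9):
    r2, adj = r2_and_adjusted(junk[:, :p], y)
    print(f"p={p}: R^2={r2:.3f}, adjusted R^2={adj:.3f}")
```

Raw R² can only climb as junk predictors are added and reaches 1.000 at p = 9, while adjusted R² is pulled down by the (n - 1)/(n - p - 1) factor and becomes undefined (nan) once no residual degrees of freedom remain.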
This note was uploaded on 03/12/2008 for the course BIOSTAT 110B taught by Professor Sugar during the Spring '08 term at UCLA.