hw7key - BIOST/EPI 513 Spring Quarter 2011 Dr. McKnight...

BIOST/EPI 513 Spring Quarter 2011 Dr. McKnight

HOMEWORK 7 KEY

See Appendix I for STATA commands and Appendix II for output.

1. Using the data in prosHW.dta on the Assignments page of the class website, use the training data set (for which training == 1) and the command swaic (type findit swaic) to improve upon the predictive model derived in class. The improvement will be made possible by allowing dummy variables for the dre variable (it was treated as grouped linear in class) and by also allowing a log(psa) term.

(a) Generate a new variable equal to log(psa), then fit a logistic model using only the training data set to predict the binary “penetration” outcome using all of the following variables: psa, log(psa), age, race, i.dre, i.dcaps, vol (main effects only). Present the value of the area under the ROC curve (AUC) for this model for both the training and validation data sets.

AUC for the training data set: 0.787
AUC for the validation data set: 0.738

(b) Use swaic and the training data set to determine the model with the lowest AIC from among those considered. List the subset of the original variables that are included in this model.

The model with the lowest AIC (209.9) contains log(psa), dre (as a factor variable), vol, and race.

(c) Fit the minimum AIC model derived using swaic in (b) to the training data set and present AIC and BIC for it.

AIC: 209.9
BIC: 232.2

(d) Compute and present the AUC of the model fit in (c) for both training and validation data sets.

AUC for the training data set: 0.783
AUC for the validation data set: 0.743

(e) Compare the values of AUC applied to the validation data set from both the full model fit in (a) and the reduced model fit in (c). What do you conclude?

The AUC from the reduced model applied to the validation data set (0.743) is higher than the AUC from the full model applied to the validation data set (0.738), so the reduced model is preferred for its accuracy and parsimony. This illustrates that a model containing more variables is not necessarily better in terms of prediction accuracy.
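
The AUC, AIC, and BIC values in Question 1 come from the STATA output referenced in Appendix II. As a rough illustration of what those quantities measure, the Python sketch below computes an AUC as the proportion of (positive, negative) pairs that the model ranks correctly, and computes AIC and BIC from a log-likelihood. It is not the course's STATA code: the function names are illustrative, and the labels, scores, log-likelihood, parameter count, and sample size are made-up toy values, not results from prosHW.dta.

```python
import math


def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Tied scores count as half a correctly ordered pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    correctly_ordered = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correctly_ordered += 1.0
            elif p == n:
                correctly_ordered += 0.5
    return correctly_ordered / (len(pos) * len(neg))


def aic(log_lik, k):
    """Akaike information criterion: -2 log L + 2k,
    where k is the number of estimated parameters."""
    return -2.0 * log_lik + 2.0 * k


def bic(log_lik, k, n):
    """Bayesian information criterion: -2 log L + k ln(n),
    where n is the number of observations."""
    return -2.0 * log_lik + k * math.log(n)


# Toy data (assumed for illustration only): 1 = event, 0 = no event,
# with hypothetical predicted probabilities from some fitted model.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print("toy AUC:", auc(toy_labels, toy_scores))

# Hypothetical log-likelihood, parameter count, and sample size.
print("toy AIC:", aic(-100.0, 5))
print("toy BIC:", bic(-100.0, 5, 100))
```

Both criteria add a penalty for each estimated parameter to -2 log L, and the BIC penalty is the larger one whenever ln(n) exceeds 2 (n of 8 or more). Applying the same auc function to predictions for held-out observations, rather than the observations used to fit the model, corresponds to the validation-set comparison made in part (e).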
2. The data file leuk.dta