1) Read in the NHANES II data and restrict to those followed for mortality. You will need to keep at least the variables listed in the lab exercise. Restructure the data into the Andersen-Gill counting process format (where each observation represents a full person-year of follow-up), creating the same variables as we did in the lab exercise during this step (start, end, fup_yr, cal_yr, age, event, and censor). Print the resulting data only for ID=27793 (variables: diab, born_yr, exam_yr, last_yr, cal_yr, die_yr, i, fupyrs, fup_yr, age, start, end, event, and censor) in the processed data set (use the statement: WHERE ID=27793; in PROC PRINT).

/* Q1 */
data HW6;
    set "P:\Spring 2014\EPI204\nh2fs2014.sas7bdat"
        (keep=seqno death born_yr exam_yr die_yr last_yr diab booze recex
              male race height wt smokever school serchol);

    /* restrict to those followed for mortality */
    if death ne .;

    /* baseline covariates (height is divided by 100 before squaring) */
    bmi = wt/((height/100)*(height/100));
    hichol = .;
    if . < serchol <= 240 then hichol = 0;
    else if serchol > 240 then hichol = 1;
    if school ^= . and school < 3 then lowed = 1;
    else if school >= 3 then lowed = 0;
    rename seqno=id;

    /* convert two-digit years to four-digit calendar years */
    last_yr = last_yr + 1900;
    die_yr  = die_yr + 1900;
    exam_yr = exam_yr + 1900;
    born_yr = born_yr + 1900;
    fupyrs  = last_yr - exam_yr;

    /* one output record per person-year (counting process format) */
    do i = 0 to fupyrs;
        fup_yr = i;
        start  = i;
        end    = i + 1;
        cal_yr = exam_yr + i;
        age    = cal_yr - born_yr;
        if die_yr = cal_yr then event = 1;
        else event = 0;
        if event = 1 then censor = 0;
        else censor = 1;
        output;
    end;
run;

proc print data=HW6;
    by id;
    where id = 27793;
    var diab born_yr exam_yr last_yr cal_yr die_yr i fupyrs fup_yr age start end event censor;
run;

id=27793

   Obs  DIAB  BORN_YR  EXAM_YR  LAST_YR  cal_yr  DIE_YR  i  fupyrs  fup_yr  age  start  end  event  censor
128691     0     1905     1978     1980    1978    1980  0       2       0   73      0    1      0       1
128692     0     1905     1978     1980    1979    1980  1       2       1   74      1    2      0       1
128693     0     1905     1978     1980    1980    1980  2       2       2   75      2    3      1       0
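As a sanity check on the restructuring in question 1, the sketch below tallies the person-year records per subject in HW6. It assumes the data step above has already run. The output names n_ids, n_rows, n_events, n_mismatch, and n_multi_event are hypothetical labels introduced only for this check. Because the loop runs from i = 0 to fupyrs, each subject should contribute fupyrs + 1 records (three records for ID 27793, whose fupyrs is 2), with at most one record flagged as an event.

```sas
/* Sketch of a consistency check on HW6; not part of the assigned output. */
proc sql;
    select count(*)                    as n_ids,          /* subjects in HW6 */
           sum(n_rows ne (fupyrs + 1)) as n_mismatch,     /* subjects with an unexpected record count */
           sum(n_events > 1)           as n_multi_event   /* subjects with more than one event record */
    from (select id,
                 max(fupyrs) as fupyrs,   /* constant within id */
                 count(*)    as n_rows,   /* person-year records for this id */
                 sum(event)  as n_events  /* death records for this id */
          from HW6
          group by id);
quit;
```

If the restructuring behaved as intended, n_mismatch and n_multi_event would both be expected to equal 0.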
2) Create BMI from the height and wt variables. Create a variable called hichol that equals 1 if baseline serum cholesterol is greater than 240, and 0 if it is less than or equal to 240. Make sure that any missing values for serchol are also missing for hichol. How many person-years were contributed to each level of hichol in the dataset?

I ran the following code on the non-Andersen-Gill data.

/* Q2 */
proc summary nway;
    class hichol;
    var fupyrs;
    output out=want (drop=_:) n=count sum=Follow_Up_Years;
run;

proc print data=want;
run;

Obs  hichol  count  Follow_Up_Years
  1       0   5987            77820
  2       1   3263            41638

For hichol = 0, 77820 person-years were contributed. For hichol = 1, 41638 person-years were contributed.

3) Run a Cox model to calculate the relative hazard of hichol with respect to mortality. Use time under study observation as the time scale/metameter. Report the unadjusted hazard ratio and 95% confidence interval and interpret these results. Use the EFRON approximation to handle ties throughout this homework.

/* Q3 */
proc phreg data=HW6;
    model (start, end)*censor(1) = hichol / rl ties=efron;
run;

Analysis of Maximum Likelihood Estimates

Parameter  DF  Parameter Estimate  Standard Error  Chi-Square  Pr > ChiSq  Hazard Ratio  95% Hazard Ratio Confidence Limits
hichol      1             0.20360         0.04417     21.2423      <.0001         1.226  1.124  1.337

The unadjusted hazard of death for a person with high cholesterol (baseline serum cholesterol greater than 240) is 1.226 times that of a person with cholesterol of 240 or below, with a 95% CI of (1.124, 1.337), over the study period. The interval excludes 1, consistent with the reported p-value of <.0001.
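The hazard ratio and confidence limits reported in question 3 can be recovered by hand from the parameter estimate and its standard error. The worked step below assumes the limits are Wald-type, using the normal quantile 1.96; the symbols \hat\beta and SE are notation introduced here for the estimate (0.20360) and standard error (0.04417) shown in the output.

```latex
\[
\widehat{HR} = \exp(\hat\beta) = \exp(0.20360) \approx 1.226
\]
\[
95\%\ \text{CI} = \exp\bigl(\hat\beta \pm 1.96 \cdot SE\bigr)
               = \exp(0.20360 \pm 1.96 \times 0.04417)
               = \exp(0.20360 \pm 0.08657)
               \approx (1.124,\ 1.337)
\]
\[
\chi^2_{\text{Wald}} = \left(\frac{\hat\beta}{SE}\right)^2
                     = \left(\frac{0.20360}{0.04417}\right)^2 \approx 21.2
\]
```

These agree with the printed values to the displayed precision. The small gap between 21.2 and the reported 21.2423 comes from the estimate and standard error being rounded in the output.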