1) Read in the NHANES II data and restrict to those followed for mortality. You will need to keep at least the variables listed in the lab exercise. Restructure the data into the Andersen-Gill counting process format (where each observation represents a full person-year of follow-up), creating the same variables as we did in the lab exercise during this step (start, end, fup_yr, cal_yr, age, event, and censor). Print the resulting data only for ID=27793 (variables: diab, born_yr, exam_yr, last_yr, cal_yr, die_yr, i, fupyrs, fup_yr, age, start, end, event, and censor) in the processed data set (use the statement WHERE ID=27793; in PROC PRINT).
/* Q1 */
data HW6;
    set "P:\Spring 2014\EPI204\nh2fs2014.sas7bdat"
        (keep=seqno death born_yr exam_yr die_yr last_yr diab booze recex male race height
              wt smokever school serchol);

    /* Restrict to those followed for mortality */
    if death ne .;

    /* BMI from wt and height (height is divided by 100 before squaring) */
    bmi = wt/((height/100)*(height/100));

    /* hichol: 1 if serchol > 240, 0 if serchol <= 240, missing if serchol is missing */
    hichol = .;
    if . < serchol <= 240 then hichol = 0;
    else if serchol > 240 then hichol = 1;

    /* lowed: 1 if school < 3, 0 if school >= 3, missing if school is missing */
    if school ^= . and school < 3 then lowed = 1;
    else if school >= 3 then lowed = 0;

    rename seqno=id;

    /* Years are stored as offsets from 1900; convert to four-digit years */
    last_yr = last_yr + 1900;
    die_yr  = die_yr + 1900;
    exam_yr = exam_yr + 1900;
    born_yr = born_yr + 1900;

    fupyrs = last_yr - exam_yr;

    /* Write one record per year of follow-up (counting process format) */
    do i = 0 to fupyrs;
        fup_yr = i;
        start  = i;
        end    = i + 1;
        cal_yr = exam_yr + i;
        age    = cal_yr - born_yr;
        if die_yr = cal_yr then event = 1;
        else event = 0;
        if event = 1 then censor = 0;
        else censor = 1;
        output;
    end;
run;

proc print data=HW6;
    by id;
    where id=27793;
    var diab born_yr exam_yr last_yr cal_yr die_yr i fupyrs fup_yr age start end event censor;
run;
id=27793

   Obs  DIAB  BORN_YR  EXAM_YR  LAST_YR  cal_yr  DIE_YR  i  fupyrs  fup_yr  age  start  end  event  censor
128691     0     1905     1978     1980    1978    1980  0       2       0   73      0    1      0       1
128692     0     1905     1978     1980    1979    1980  1       2       1   74      1    2      0       1
128693     0     1905     1978     1980    1980    1980  2       2       2   75      2    3      1       0
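As a check on the restructuring, the three records for ID 27793 can be reproduced by hand from the variable definitions in the DATA step. This is only a worked restatement of the printed values; $i$ is the loop counter from the DO loop.

```latex
\begin{align*}
\text{fupyrs} &= \text{last\_yr} - \text{exam\_yr} = 1980 - 1978 = 2
  \quad\Rightarrow\quad i = 0, 1, 2 \ \text{(three records)} \\
\text{cal\_yr}_i &= \text{exam\_yr} + i = 1978,\ 1979,\ 1980 \\
\text{age}_i &= \text{cal\_yr}_i - \text{born\_yr} = 73,\ 74,\ 75 \\
(\text{start}_i,\ \text{end}_i) &= (i,\ i+1) = (0,1),\ (1,2),\ (2,3) \\
\text{event}_i &= 1 \ \text{only when}\ \text{cal\_yr}_i = \text{die\_yr} = 1980,\ \text{i.e. } i = 2
\end{align*}
```

The censor variable is the complement of event, so it equals 1 for the first two records and 0 for the last.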

2) Create BMI from the height and wt variables. Create a variable called hichol that equals 1 if baseline serum cholesterol is greater than 240, and 0 if it is less than or equal to 240. Make sure that any missing values for serchol are also missing for hichol. How many person-years were contributed to each level of hichol in the dataset?
I ran the following code on the non-Andersen-Gill data.
/* Q2 */
proc summary nway;
    class hichol;
    var fupyrs;
    output out=want (drop=_:) n=count sum=Follow_Up_Years;
run;

proc print data=want;
run;
Obs  hichol  count  Follow_Up_Years
  1       0   5987            77820
  2       1   3263            41638
For hichol = 0, 77820 person-years were contributed.
For hichol = 1, 41638 person-years were contributed.
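A quick arithmetic check on these totals, using only the counts and sums in the PROC SUMMARY output. The per-person averages are derived here for illustration and are not part of the original output.

```latex
\begin{align*}
\text{Total person-years} &= 77820 + 41638 = 119458 \\
\text{Total persons with non-missing hichol} &= 5987 + 3263 = 9250 \\
\text{Mean follow-up, hichol} = 0 &: \ 77820 / 5987 \approx 13.0 \ \text{years} \\
\text{Mean follow-up, hichol} = 1 &: \ 41638 / 3263 \approx 12.8 \ \text{years}
\end{align*}
```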
3) Run a Cox model to calculate the relative hazard of hichol with respect to mortality. Use time under study observation as the time scale/metameter. Report the unadjusted hazard ratio and 95% confidence interval and interpret these results. Use the EFRON approximation to handle ties throughout this homework.
/* Q3 */
proc phreg data=HW6;
    model (start, end)*censor(1) = hichol / rl ties=efron;
run;
Analysis of Maximum Likelihood Estimates

Parameter  DF  Parameter Estimate  Standard Error  Chi-Square  Pr > ChiSq  Hazard Ratio  95% Hazard Ratio Confidence Limits
hichol      1             0.20360         0.04417     21.2423      <.0001         1.226                       1.124   1.337
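The unadjusted hazard of death for a person with high cholesterol (serum cholesterol > 240) is 1.226 times that of a person with cholesterol of 240 or below (95% CI: 1.124, 1.337) over the study period. Because the interval excludes 1, the association is statistically significant at the 0.05 level.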
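The hazard ratio and its confidence limits follow directly from the parameter estimate and standard error in the PHREG output. The sketch below assumes the limits are Wald limits computed with the normal quantile 1.96, which reproduces the reported values to three decimals.

```latex
\begin{align*}
\widehat{HR} &= \exp(\hat\beta) = \exp(0.20360) \approx 1.226 \\
95\%\ \text{CI} &= \exp\!\left(\hat\beta \pm 1.96 \cdot SE(\hat\beta)\right)
  = \exp(0.20360 \pm 1.96 \times 0.04417) \\
 &= \left(\exp(0.11703),\ \exp(0.29017)\right) \approx (1.124,\ 1.337) \\
\chi^2_{\text{Wald}} &= \left(\frac{\hat\beta}{SE(\hat\beta)}\right)^2
  = \left(\frac{0.20360}{0.04417}\right)^2 \approx 21.2
\end{align*}
```

The small difference between this hand-computed chi-square and the reported 21.2423 comes from rounding in the displayed estimate and standard error.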

- Spring '14
- Hernandez-Diaz
- Survival analysis, Proportional hazards models, Disadv