HOMEWORK 1 – Due Tuesday, April 1, 2014 at 10:30 AM
Table. List of variables

Variable Name   Description
servitc         Serum vitamin C level (continuous, µmol/L)
ageyrs          Age (years)
height          Height (cm)
wt              Weight (kg)
sex             Sex (1 = men; 2 = women)
race            Race (1 = white; 2 = black; 3 = other)
smokever        Smoking status (0 = never smoker; 1 = ever smoker)
booze           Alcohol consumption (number of drinks per week, continuous)

1. Create a new dataset by restricting the NHANES II dataset (nh2fs2014.sas7bdat, available from the course website under the Data 2014 page) to subjects with complete data on the covariates listed in the Table above. Then create the following variables:

- BMI (continuous): calculated as weight (kg) / [height (m)]^2
- BMI_CAT (categorical): code BMI_CAT as 0 if BMI < 19 (underweight), 1 if 19 ≤ BMI < 25 (normal weight), and 2 if BMI ≥ 25 (overweight)
- Gender: recode sex as 0 = male, 1 = female
- Alcohol consumption status (categorical): create categories for < 3 drinks/week vs. ≥ 3 drinks/week

Please provide SAS code as the answer to this question.

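/* Q1 and Q2 */
data HW1;
    set "P:\Spring 2014\EPI204\nh2fs2014.sas7bdat";

    /* Keep only subjects with complete data on every variable in the Table */
    where servitc ne . and ageyrs ne . and height ne . and wt ne .
      and sex ne . and race ne . and smokever ne . and booze ne .;

    /* Shift race codes from 1/2/3 to 0/1/2 */
    race = race - 1;

    /* BMI = weight (kg) / [height (m)]^2; height is recorded in cm */
    bmi = wt / ((height/100) * (height/100));

    /* BMI category: 0 = underweight, 1 = normal weight, 2 = overweight */
    if bmi < 19 then bmi_cat = 0;
    else if 19 <= bmi < 25 then bmi_cat = 1;
    else if bmi >= 25 then bmi_cat = 2;

    /* Recode sex from 1 = men, 2 = women to 0 = male, 1 = female */
    sex = sex - 1;

    /* Alcohol consumption: 0 = fewer than 3 drinks/week, 1 = 3 or more */
    if booze < 3 then booze_cat = 0;
    else if booze >= 3 then booze_cat = 1;

    /* Product terms of sex with BMI and with BMI category */
    sexbmi = sex*bmi;
    sexbmicat = sex*bmi_cat;
run;
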
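One way to confirm that the cutoffs in the data step behave as intended is to look at the range of the continuous variable within each new category. The sketch below is only a suggested check, not part of the required answer; it assumes the HW1 dataset created above.

```sas
/* Range of BMI within each BMI category: the ranges should not overlap
   and should respect the cutoffs of 19 and 25 */
proc means data=HW1 n min max;
    class bmi_cat;
    var bmi;
run;

/* Range of weekly drinks within each alcohol category (cutoff of 3) */
proc means data=HW1 n min max;
    class booze_cat;
    var booze;
run;
```
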
2. How many observations were there in the original dataset? How many observations are in your new dataset?
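
A minimal sketch of one way to obtain both counts, assuming the HW1 dataset from question 1 and the same folder used in its SET statement. The libref `epi` is a name introduced here for illustration only.

```sas
/* Hypothetical libref pointing at the folder that holds the original file */
libname epi "P:\Spring 2014\EPI204";

proc sql;
    /* Number of observations in the original NHANES II file */
    select count(*) as n_original from epi.nh2fs2014;

    /* Number of observations left after restricting to complete data */
    select count(*) as n_complete from HW1;
quit;
```

The difference between the two counts is the number of subjects dropped for missing data on at least one listed variable.
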
3. Compute descriptive statistics for serum vitamin C level (mean, median, maximum, and minimum). Graph the distribution in the form of a box plot and a histogram. Describe the distribution of serum vitamin C level.
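
The statistics and graphs requested here could be produced along the following lines. This is a sketch that assumes the HW1 dataset from question 1 and a SAS release that includes PROC SGPLOT; PROC UNIVARIATE would be an equally reasonable choice.

```sas
/* Mean, median, maximum, and minimum of serum vitamin C */
proc means data=HW1 n mean median max min;
    var servitc;
run;

/* Box plot of serum vitamin C */
proc sgplot data=HW1;
    vbox servitc;
run;

/* Histogram of serum vitamin C */
proc sgplot data=HW1;
    histogram servitc;
run;
```

Comparing the mean with the median, and checking the box plot for points beyond the whiskers, helps when describing skewness and outliers.
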
4. a. Graph and describe the relation of serum vitamin C with BMI as shown by a scatterplot.
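
A scatterplot of this kind could be drawn as sketched below, again assuming the HW1 dataset from question 1 with the bmi variable it creates. The axis labels are illustrative wording.

```sas
/* Serum vitamin C (y-axis) against BMI (x-axis), one point per subject */
proc sgplot data=HW1;
    scatter x=bmi y=servitc;
    xaxis label="BMI (kg/m^2)";
    yaxis label="Serum vitamin C (micromol/L)";
run;
```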
