The following SAS commands invoke the REG procedure and fit model (1) to
the data:
proc reg;
model y=x;
run;
where y is the response variable and x is the regressor variable. To fit a model to the data, you must
specify the MODEL statement.
3.1 Using proc reg
Create a simple regression model for FVC among children 5-18 years old. Start the model by
including only age as a covariate.
proc reg data=restricted;
model fvc = ageyrs;
title "Linear regression model";
run;
Interpret the beta coefficient for age.
3.2 Multiple regression
Now add more covariates to the model. Interpret the beta coefficient for sex.
proc reg data=restricted;
model fvc = ageyrs height sex wt;
title "Linear regression model with covariates";
run;
4. General Linear Models: The GLM procedure (PROC GLM)
The GLM procedure can also be used to fit linear regression models.
Some notable coding differences between the two procedures may affect the choice of one over
the other:
4.1 Parameter estimates output
PROC REG: By default, it outputs the parameter estimates, their standard errors, the test
statistics and p-values.
PROC GLM: When the model includes categorical variables and the CLASS statement is used, the
parameter estimates are not printed by default; the SOLUTION option in the MODEL statement
must be specified to request them.
4.2 Categorical variables as independent variables
PROC REG: Indicator variables must be created for the levels of the categorical variable (one
for each level except the reference level).
PROC GLM: It can accommodate categorical variables as long as these variables are included in
the CLASS statement. If the CLASS statement is not specified, SAS will treat the
categorical variable as a continuous variable with values equal to the levels of the
categorical variable.
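As an illustration of the PROC REG requirement, the sketch below shows one way indicator variables for race might be created in a DATA step. It assumes race is a numeric variable coded 1=white, 2=black, 3=other, and it writes to a hypothetical data set named restricted_ind; the indicator names black and other follow the ones used in this lab.

```sas
/* Sketch only: assumes race is numeric, coded 1=white, 2=black, 3=other. */
/* restricted_ind is a hypothetical output data set name.                 */
data restricted_ind;
  set restricted;
  if not missing(race) then do;   /* keep indicators missing if race is missing */
    black = (race = 2);           /* 1 if black, 0 otherwise */
    other = (race = 3);           /* 1 if other, 0 otherwise */
  end;
run;
```

No indicator is created for whites, so whites serve as the reference group when black and other are entered in the MODEL statement.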
4.3 Using proc reg and proc glm
Let’s check whether race (as indicator variables) is an important predictor of FVC using proc reg.
proc reg data=restricted;
model fvc = ageyrs height sex wt black other;
title "REG: Linear regression model";
run;
Let’s check whether race (as a CLASS variable) is an important predictor of FVC using proc glm.
proc glm data=restricted;
class race;
model fvc = ageyrs height sex wt race / solution;
title "GLM: Linear regression model";
run;
Are your coefficients the same?
Why or why not?
Parameter        Proc Reg Betas   Proc GLM Betas
Intercept        -389.5           -401.5
Ageyrs           4.7              4.7
Height           3.5              3.5
Sex              32.7             32.7
Wt               1.7              1.7
White/Race=1     NA               12.0
Black/Race=2     -46.7            -34.7
Other/Race=3     -12.0            NA
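SAS treats the highest category as the reference category for comparisons between levels of a
categorical variable when you use a class statement. PROC GLM therefore uses “other” (race=3) as
the reference, whereas the PROC REG model, which has no indicator for whites, uses “whites”. The
two models give the same fit; only the reference group differs. For example, the PROC GLM
intercept equals the PROC REG intercept plus the PROC REG coefficient for other:
-389.5 + (-12.0) = -401.5. To change the reference category for race to “whites” instead of
“other”, we need to re-order the data and use the order=data option in glm. Note that the
coefficients for age, height, sex, and weight are the same in both models.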
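The re-ordering described above might look like the following sketch. It assumes race is coded 1=white, 2=black, 3=other, and it uses a hypothetical data set name, restricted_sorted. With order=data, the CLASS levels are ordered as they first appear in the data, so sorting race in descending order puts race=1 last and makes it the reference level.

```sas
/* Sketch only: assumes race is coded 1=white, 2=black, 3=other. */
/* restricted_sorted is a hypothetical data set name.            */
proc sort data=restricted out=restricted_sorted;
  by descending race;            /* race=1 (white) now appears last */
run;

proc glm data=restricted_sorted order=data;
  class race;                    /* levels in order of appearance: 3, 2, 1 */
  model fvc = ageyrs height sex wt race / solution;
  title "GLM: whites as reference category";
run;
quit;
```

If the coding assumption holds, the race estimates from this run should be expressed relative to whites, which makes them directly comparable with the Proc Reg column of the table.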

