Go to File -> Save As to save your program.
Close SAS.
Open the folder where you saved lab1.sas7bdat.
Is it still there?
Where are the datasets original and restricted?
When you close SAS, it deletes any temporary datasets (i.e., those in the WORK library).
To get back to where we were, open SAS and then open your saved editor program.
Create permanent datasets
Replace
data original; set EPI204.lab1;
run;
by
data EPI204.original; set EPI204.lab1;
run;
You should see a file “original” (original.sas7bdat) on your P drive, or wherever you saved the lab1.sas7bdat file.
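As a minimal sketch of how the EPI204 library reference might be assigned and the new file checked: the path below is a placeholder (an assumption, not the actual course folder), so substitute the folder where lab1.sas7bdat is saved.

```sas
/* Hypothetical path: replace with the folder that holds lab1.sas7bdat */
libname EPI204 "P:\EPI204";

/* A two-level name (libref.dataset) writes a permanent dataset to that folder */
data EPI204.original;
    set EPI204.lab1;
run;

/* List the members of the library to confirm that original is there */
proc contents data=EPI204._all_ nods;
run;
```

A libref lasts only for the current session, so the LIBNAME statement has to be re-run each time SAS is reopened; the dataset files themselves stay on disk.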
1. Data Exploration
1.1 Summary statistics
Open your saved editor program and re-run it.
Let’s examine the summary statistics for FVC.
proc means data=restricted n mean std min median q1 q3 max nmiss;
tables fvc;
run;
What’s wrong with this code?
Where did you find this information?
Correct the code and describe the distribution of FVC in this cohort.
proc means data=restricted n mean std min median q1 q3 max nmiss;
var fvc;
run;
1.2 Histograms and box plots
Create histograms and box plots for forced vital capacity and height using the newly created
dataset restricted to children. Describe the distribution of forced vital capacity and height in
children.
proc univariate data=restricted plot;
var fvc height;
histogram;
run;
proc sgplot data=restricted;
title "Forced Vital Capacity distribution";
vbox fvc;
run;
Note that once you set a title, it will continue to be the title until replaced by something else.
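A short sketch of how the title can be cleared before the next plot. It reuses the restricted dataset and the fvc and height variables from this lab; the second box plot is only an illustration.

```sas
/* This title stays in effect for all later output... */
title "Forced Vital Capacity distribution";
proc sgplot data=restricted;
    vbox fvc;
run;

/* ...until it is replaced, or cleared with an empty TITLE statement */
title;
proc sgplot data=restricted;
    vbox height;
run;
```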
1.3 Scatter plot
Describe the relationship between FVC and height as shown by a scatter plot.
proc gplot data=restricted;
plot fvc*height;
run;
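PROC GPLOT is part of SAS/GRAPH. If it is not available in your installation, a similar scatter plot can be sketched with PROC SGPLOT. This is an alternative to the code above, not a replacement for it, and the title text is only a suggestion.

```sas
/* Same plot as fvc*height in GPLOT: FVC on the vertical axis, height on the horizontal axis */
proc sgplot data=restricted;
    title "FVC versus height";
    scatter x=height y=fvc;
run;

/* Clear the title so it does not carry over to later output */
title;
```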
2. Linear Regression Model
A statistical model equation is used to express the relation between a response variable and another set of variables. The model predicts the outcome variable (also called the dependent or response variable) from a function of regressor variables (also called independent variables, predictors, explanatory variables, or factors) and parameters.
In a linear regression model the predictor function is linear in the parameters. For example, a linear regression model equation has the following form:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \qquad (1)$$
where $Y_i$ is the response variable, $X_i$ is a regressor variable, $\beta_0$ and $\beta_1$ are unknown parameters to be estimated, and $\varepsilon_i$ is the error term for the observations $i = 1, 2, \ldots, n$.
Model (1) can be fit with different procedures in SAS, including PROC REG and PROC GLM.
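As a rough sketch of what fitting model (1) might look like in each procedure, assuming FVC is the outcome and height is the single regressor (a choice made here for illustration, following the scatter plot above):

```sas
/* Simple linear regression of FVC on height with PROC REG */
proc reg data=restricted;
    model fvc = height;
run;
quit;

/* The same model fit with PROC GLM */
proc glm data=restricted;
    model fvc = height;
run;
quit;
```

Both procedures are interactive and stay active after RUN, so the QUIT statement is used to end them. The two should report the same estimates of the intercept and slope for this model.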
3. Linear Models: The REG procedure (PROC REG)
The REG procedure is a general-purpose procedure for regression. It is used when the outcome
variable is continuous while the predictor variables may be categorical variables, which divide the
observations into discrete groups, or continuous variables.
There are numerous statements and options available in PROC REG, and you can look them up in the SAS online documentation. In this lab session, we are going to focus on the options you will be using during the class.

