C532: ANOVA & Regression Modeling
Nov, 201
Y. Cao
Topics: Learn more about multiple linear regression and how to choose the “best” model
Datasets: bodyfa.sav
To select the best possible regression model, the first and most important step is to use all available resources to assess which variables should be included because they are clinically important.
Secondly, we need to resort to statistical tools and theory. This is our main topic today. That is, we use the block method in the Linear Regression procedure in SPSS: we add one explanatory variable at a time to the model and compare the R square ($R^2$) change to learn how much (as a percentage of the variation in the outcome) each new variable contributes to the model!
Of course, each block can include more than one variable, but today we always stick to this rule: include only one variable per block, which simplifies our solutions.
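The R square change described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SPSS procedure itself: the helper names (`solve`, `r_squared`, `block_r2_changes`) are hypothetical, and the eight data rows are made-up numbers, not values from the course dataset. Each block adds one predictor, the model is refitted by ordinary least squares, and the change is the new R square minus the previous one.

```python
# Sketch of the block method: enter one predictor per block and track
# the R-square change. The data are made-up numbers for illustration only,
# NOT the course dataset.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def r_squared(columns, y):
    """R^2 of the least-squares fit of y on an intercept plus the given columns."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * y[i] for i, r in enumerate(rows)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def block_r2_changes(blocks, y):
    """Return (name, cumulative R^2, R^2 change) for each block, in order of entry."""
    out, entered, prev = [], [], 0.0
    for name, col in blocks:
        entered.append(col)          # one new variable per block
        r2 = r_squared(entered, y)
        out.append((name, r2, r2 - prev))
        prev = r2
    return out

# Hypothetical values for 8 subjects (assumed, for illustration only)
x1 = [23, 35, 41, 52, 29, 60, 47, 38]                    # e.g. age
x2 = [21.5, 24.0, 27.3, 29.1, 22.8, 30.5, 26.0, 25.2]    # e.g. BMI
y = [12.1, 18.4, 24.9, 28.7, 15.0, 31.2, 22.5, 20.3]     # e.g. % body fat

results = block_r2_changes([("x1", x1), ("x2", x2)], y)
for name, r2, change in results:
    print(f"{name}: R^2 = {r2:.3f}, R^2 change = {change:.3f}")
```

Because each block only adds a variable, R square cannot decrease, so every change is non-negative and the changes sum to the R square of the final model. When predictors are correlated, the size of each change depends on the order of entry.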
Realistically speaking, we try to make the model as good as we possibly can; one might even say that there is no such thing as “the best model” at all!
Once we have decided which variables are “the best candidates” to predict the outcome variable, we use the regular method to do the modeling. That is, we enter all the chosen variables into the model simultaneously (separate blocks are no longer useful or, more precisely, we now use only one block).
Finally, we can assess our models with the help of residual plots and influence analysis, as we learned in the previous course, C531.
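As a reminder of what these diagnostics compute, here is a minimal sketch for the one-predictor case, where leverage has a simple closed form. The function name `influence_measures` and the data are assumptions made for illustration (the last point is deliberately unusual); this is not SPSS output and not the course dataset.

```python
# Residuals, leverage and Cook's distance for a one-predictor least-squares fit.
# Data are made-up; the last observation is deliberately far from the others.

def influence_measures(x, y):
    """Return (residuals, leverage, Cook's distances) for the fit y = b0 + b1*x."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)  # residual mean square
    # Leverage (diagonal of the hat matrix) in simple linear regression
    leverage = [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    # Cook's distance: D_i = e_i^2 * h_i / (p * MSE * (1 - h_i)^2)
    cooks = [e ** 2 * h / (p * mse * (1.0 - h) ** 2)
             for e, h in zip(residuals, leverage)]
    return residuals, leverage, cooks

x = [1, 2, 3, 4, 5, 6, 7, 15]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 20.0]

residuals, leverage, cooks = influence_measures(x, y)
for i, (e, h, d) in enumerate(zip(residuals, leverage, cooks), start=1):
    print(f"case {i}: residual = {e:6.3f}, leverage = {h:.3f}, Cook's D = {d:.3f}")
```

Plotting the residuals against the fitted values gives the residual plot; cases with high leverage and a large Cook's distance are the ones to examine in an influence analysis.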
In the following examples, we will not investigate interactions, for instance terms such as $x_1 x_2$, $x_1 x_2 x_3$, $x_1 x_2 x_3 x_4$, and so on, nor higher-order terms such as $x_1^2$, $x_1^2 x_2$, …
Suppose now that, after some clinical consideration, we decide to include four important explanatory variables: Age (= $x_1$, years), Body Mass Index (or BMI = $x_2$, weight (pounds) / [height (inches)]$^2$), Abdominal circumference (= $x_3$, cm) and Hip circumference (= $x_4$, cm), to predict the outcome variable, % body fat ($y$).
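Written out, the model implied by this choice of variables (main effects only, with no interactions or higher-order terms) has the standard multiple linear regression form. The coefficient symbols $\beta_0, \dots, \beta_4$ and the error term $\varepsilon$ are notation assumed here; they do not appear in the text above.

```latex
\[
  y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \varepsilon
\]
```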