Advanced Topics in Forest Biometrics – FOR6934
Logistic Regression
What is Logistic Regression?
A form of regression that allows the prediction of
discrete (categorical) variables using continuous
and/or discrete predictors
Addresses the same questions that discriminant
analysis and multiple regression do but without
distributional assumptions on the predictors
the predictors need not be linearly related to the dependent
variable
Homoscedasticity (equal variance in each group) not
necessary
Uses a logit link function to transform the probability, p:
$\operatorname{logit}(p) = \ln\left(\frac{p}{1 - p}\right)$
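As a minimal sketch, the logit link, logit(p) = ln(p / (1 - p)), and its inverse can be written in a few lines of Python. The function names logit and inv_logit are chosen here for illustration and do not come from the slides.

```python
import math

def logit(p):
    """Log-odds of a probability p, valid for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Inverse of the logit: maps any real number back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Probabilities below 0.5 give negative log-odds, above 0.5 positive log-odds.
for p in (0.1, 0.5, 0.9):
    print(p, round(logit(p), 3), round(inv_logit(logit(p)), 3))
```

The transform stretches the interval (0, 1) onto the whole real line, which is what lets a linear combination of predictors be related to a probability.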
Why use Logistic Regression?
Logistic regression is often used when:
The dependent variable is discrete
The relationship between the dependent and
independent variables is non-linear
Example: the probability
of forking changes very
little with height for
slow-growing trees, but a
30-cm change in height
greatly increases the
probability of forking
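The non-linear response in the forking example can be illustrated with a logistic curve. This is only a sketch: the coefficients b0 and b1 and the heights used below are hypothetical values chosen for illustration, not estimates from the course data.

```python
import math

def prob_forking(height_cm, b0=-6.0, b1=0.04):
    """Hypothetical logistic curve for the probability of forking.

    b0 and b1 are made-up coefficients; the curve's midpoint is at 150 cm.
    """
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * height_cm)))

def change_over_30cm(start_cm):
    """Change in predicted probability for a 30-cm increase in height."""
    return prob_forking(start_cm + 30.0) - prob_forking(start_cm)

small = change_over_30cm(20.0)    # far below the midpoint of the curve
large = change_over_30cm(135.0)   # straddling the midpoint of the curve
print(round(small, 3), round(large, 3))
```

The same 30-cm change moves the probability very little on the flat part of the curve and substantially near its midpoint, which a straight-line model cannot represent.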
Questions
Can categories be correctly predicted given a set of
predictors?
Usually once this is established the predictors are
manipulated to see if the equation can be simplified.
Comparison of equation with predictors plus intercept to a
model with just the intercept
What is the strength of association between the
outcome variable and a set of predictors?
What is the relative importance of each predictor?
How does each variable affect the outcome?
Does a predictor make the solution better / worse / have no
effect?
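The comparison of a model with predictors plus intercept against an intercept-only model is commonly done with a likelihood ratio test. The sketch below assumes a single continuous predictor, fits the model by Newton-Raphson in plain Python, and uses a small made-up data set (xs, ys); none of these choices come from the slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood of the model logit(p) = b0 + b1 * x."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        ll += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return ll

def fit_logistic(xs, ys, n_iter=50, tol=1e-10):
    """Maximum likelihood fit for one predictor via Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient with respect to b0
            g1 += (y - p) * x      # gradient with respect to b1
            h00 += w               # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det   # Newton step for b0
        d1 = (h00 * g1 - h01 * g0) / det   # Newton step for b1
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data: a predictor value and a 0/1 outcome for ten trees.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
ll_full = log_likelihood(b0, b1, xs, ys)

# Intercept-only model: every case gets the overall proportion of 1s.
p0 = sum(ys) / len(ys)
ll_null = sum(y * math.log(p0) + (1 - y) * math.log(1.0 - p0) for y in ys)

lr_stat = 2.0 * (ll_full - ll_null)
# Chi-square upper tail with 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(lr_stat / 2.0))
print(round(lr_stat, 3), round(p_value, 4))
```

A small p-value suggests that the predictor improves on the intercept-only model. With more than one predictor, the degrees of freedom equal the number of added parameters and a general chi-square tail function would be needed.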
Questions - 2
Are there interactions among predictors?
Does adding interactions among predictors
(continuous or categorical) improve the model?
Continuous predictors may have to be
“centered” (further reading) in order to avoid
multicollinearity when interactions are present
How good is the model?
Can parameters be accurately estimated?
Can the model accurately classify cases for
which the outcome is known?
How good is the “fit” of the model?
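The remark above about centering continuous predictors can be illustrated numerically. In this sketch the two predictors (height and diameter) and their values are hypothetical; it only shows that the product term is much less correlated with its parent predictor once both are centered on their means.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical continuous predictors for six trees.
height = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
diameter = [22.0, 18.0, 25.0, 20.0, 27.0, 24.0]

# Interaction term built from the raw predictors.
raw_interaction = [h * d for h, d in zip(height, diameter)]

# Interaction term built from mean-centered predictors.
height_c = [h - mean(height) for h in height]
diameter_c = [d - mean(diameter) for d in diameter]
centered_interaction = [h * d for h, d in zip(height_c, diameter_c)]

r_raw = pearson(height, raw_interaction)
r_centered = pearson(height_c, centered_interaction)
print(round(r_raw, 3), round(r_centered, 3))
```

Centering changes the interpretation of the main effects (they become effects at the mean of the other predictor) but does not change the fit of the model.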
Assumptions
The outcome must be discrete
There must be “enough” responses in every given
category
If there are too many cells with no responses:
parameter estimates and standard errors can “blow up”
groups may be perfectly separable (e.g. multicollinear)
which makes maximum likelihood estimation impossible
If the distributional assumptions of discriminant
analysis (DA) or multiple regression are met, they may
be more powerful
Note, however, that DA has been shown to overestimate the
association using discrete predictors in some cases
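The point above about perfectly separable groups can be seen directly in the log-likelihood. The sketch below uses made-up data in which every case with x above 3.5 has outcome 1 and every case below has outcome 0; the data and the slope values tried are assumptions for illustration only.

```python
import math

def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood of logit(p) = b0 + b1 * x.

    Uses log p = -log(1 + exp(-z)) and log(1 - p) = -log(1 + exp(z)).
    """
    ll = 0.0
    for x, y in zip(xs, ys):
        z = b0 + b1 * x
        if y == 1:
            ll -= math.log1p(math.exp(-z))
        else:
            ll -= math.log1p(math.exp(z))
    return ll

# Hypothetical, perfectly separated data: outcome is 1 exactly when x > 3.5.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]

# Keep the decision boundary at x = 3.5 and make the slope steeper each time.
slopes = [1.0, 2.0, 4.0, 8.0]
lls = [log_likelihood(-3.5 * s, s, xs, ys) for s in slopes]
for s, ll in zip(slopes, lls):
    print(s, round(ll, 4))
```

The log-likelihood keeps rising toward zero as the slope grows, so no finite slope maximizes it. In practice this shows up as very large parameter estimates and standard errors.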
