Advanced Topics in Forest Biometrics – FOR6934
Logistic Regression
What is Logistic Regression?
A form of regression that allows the prediction of discrete (categorical) variables using continuous and/or discrete predictors
Addresses the same questions that discriminant analysis and multiple regression do, but without distributional assumptions on the predictors
The predictors need not be linearly related to the dependent variable
Homoscedasticity (equal variance in each group) is not necessary
Uses a logit link function to transform the probability, p:
$\operatorname{logit}(p) = \ln\left(\frac{p}{1 - p}\right)$
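A minimal sketch of the logit link and its inverse (the logistic function), assuming probabilities strictly between 0 and 1. The function names `logit` and `inv_logit` are illustrative, not from any particular package.

```python
import math

def logit(p: float) -> float:
    """Log-odds ln(p / (1 - p)); defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inv_logit(eta: float) -> float:
    """Inverse of the logit: maps any real number back into (0, 1).
    (math.exp overflows for eta below about -709; fine for illustration.)"""
    return 1.0 / (1.0 + math.exp(-eta))

for p in (0.1, 0.5, 0.9):
    eta = logit(p)
    print(f"p = {p:.1f}  logit(p) = {eta:+.3f}  back-transformed = {inv_logit(eta):.1f}")
```

The logit stretches the bounded interval (0, 1) onto the whole real line, which is what allows a linear predictor to be used on the right-hand side. Note that logit(0.5) = 0 and logit(1 − p) = −logit(p).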
Why use Logistic Regression?
Logistic regression is often used when:
The dependent variable is discrete
The relationship between the dependent and independent variables is nonlinear
Example: the probability of forking changes very little for slow-growing trees, but a 30 cm change in height greatly increases the probability of forking
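The forking example describes an S-shaped response. The sketch below assumes forking probability is a logistic function of height with hypothetical coefficients `B0` and `B1` (chosen for illustration, not estimated from data), and compares the effect of the same 30 cm increase at two different starting heights.

```python
import math

# Hypothetical coefficients for illustration only:
# logit(p) = B0 + B1 * height_cm
B0 = -6.0
B1 = 0.04

def prob_fork(height_cm: float) -> float:
    """Predicted probability of forking under the hypothetical logistic model."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * height_cm)))

# The same 30 cm increase, starting from a short tree and from a mid-sized tree
for start in (50.0, 135.0):
    p0 = prob_fork(start)
    p1 = prob_fork(start + 30.0)
    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
    print(f"{start:.0f} -> {start + 30:.0f} cm: p {p0:.3f} -> {p1:.3f} "
          f"(change {p1 - p0:+.3f}, odds ratio {odds_ratio:.2f})")
```

The change in probability is small near the flat lower tail of the curve and large near its middle, even though the odds ratio for a 30 cm increase is the same, exp(30 × B1), in both cases.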
Questions
Can categories be correctly predicted given a set of predictors?
Usually, once this is established, the predictors are manipulated to see whether the equation can be simplified
Compare the model with predictors plus intercept to a model with just the intercept
What is the strength of association between the outcome variable and a set of predictors?
What is the relative importance of each predictor?
How does each variable affect the outcome?
Does a predictor make the solution better, worse, or have no effect?
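Comparing the model with predictors to the intercept-only model is commonly done with a likelihood-ratio chi-square, 2(log L_full − log L_null), with degrees of freedom equal to the number of added predictors. The sketch below is a from-scratch illustration, assuming one continuous predictor and simulated (hypothetical) data; it fits both models by Newton-Raphson and also reports the proportion of cases correctly classified at a 0.5 cutoff. In practice a statistics package (e.g. R's `glm` with a binomial family, or statsmodels) would be used instead.

```python
import math
import numpy as np

def fit_logistic(X, y, n_iter=25, tol=1e-10):
    """Maximum likelihood logistic regression by Newton-Raphson.
    X must already include a column of ones for the intercept."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)                   # Bernoulli variances (IRLS weights)
        grad = X.T @ (y - p)                # score vector
        hess = X.T @ (X * W[:, None])       # information matrix X'WX
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def log_likelihood(X, y, beta):
    eta = X @ beta
    # log L = sum(y*eta - log(1 + e^eta)); logaddexp(0, eta) = log(1 + e^eta)
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

# Simulated data (hypothetical): true intercept -0.5, true slope 1.5
rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=n)
true_p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.5 * x)))
y = rng.binomial(1, true_p).astype(float)

X_full = np.column_stack([np.ones(n), x])   # intercept + predictor
X_null = np.ones((n, 1))                    # intercept only

b_full = fit_logistic(X_full, y)
b_null = fit_logistic(X_null, y)

# Likelihood-ratio chi-square with 1 df (one added predictor)
chi2 = 2.0 * (log_likelihood(X_full, y, b_full) - log_likelihood(X_null, y, b_null))
p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df

# Classification: predict "1" when the fitted probability is at least 0.5
p_hat = 1.0 / (1.0 + np.exp(-X_full @ b_full))
accuracy = float(np.mean((p_hat >= 0.5) == (y == 1)))

print("coefficients:", np.round(b_full, 3))
print(f"LR chi-square = {chi2:.2f}, p = {p_value:.3g}")
print(f"proportion correctly classified = {accuracy:.3f}")
```

For the intercept-only model the fitted intercept is simply the logit of the observed proportion of 1s, which makes a convenient check on the fitting routine.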
Questions (cont.)
Are there interactions among predictors?
Does adding interactions among predictors (continuous or categorical) improve the model?
Continuous predictors may have to be “centered” (more on this later!) in order to avoid multicollinearity when interactions are present
How good is the model?
Can parameters be accurately estimated?
Can the model accurately classify cases for which the outcome is known?
How good is the “fit” of the model?
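The centering point in the interaction bullets can be seen numerically. This sketch assumes two hypothetical, independent continuous predictors (labelled height and dbh, in cm) whose means are far from zero, and compares the correlation between a predictor and its interaction term before and after centering.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Hypothetical predictors with means far from zero
height = rng.normal(150.0, 20.0, size=n)
dbh = rng.normal(30.0, 5.0, size=n)

def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Raw product term: strongly correlated with height itself
r_raw = corr(height, height * dbh)

# Center each predictor at its mean before forming the product
h_c = height - height.mean()
d_c = dbh - dbh.mean()
r_centered = corr(h_c, h_c * d_c)

print(f"corr(height, interaction), raw:      {r_raw:.3f}")
print(f"corr(height, interaction), centered: {r_centered:.3f}")
```

Centering leaves the fitted interaction coefficient unchanged but removes most of the correlation between the main-effect and interaction columns, so their standard errors are no longer inflated by it.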
Assumptions
The outcome must be discrete
There must be “enough” responses in every given category
If there are too many cells with no responses:
parameter estimates and standard errors can “blow up”
groups may be perfectly separable, which makes maximum likelihood estimation impossible (the estimates diverge instead of converging to finite values)
If the distributional assumptions of discriminant analysis (DA) or multiple regression are met, those methods may be more powerful
Note, however, that in some cases DA has been shown to overestimate the association when predictors are discrete
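Why perfect separation breaks maximum likelihood can be shown with a tiny hypothetical data set in which every tree above 100 cm forked and none below did. For simplicity this sketch assumes a single predictor and fixes the curve's midpoint at the gap (100 cm), so only the slope `b1` varies.

```python
import math

# Hypothetical, perfectly separated data (illustration only)
heights = [60, 70, 80, 90, 110, 120, 130, 140]
forked  = [0, 0, 0, 0, 1, 1, 1, 1]

def log_lik(b1, cut=100.0):
    """Log-likelihood of logit(p) = b1 * (height - cut)."""
    total = 0.0
    for h, y in zip(heights, forked):
        eta = b1 * (h - cut)
        # log p = -log(1 + e^-eta); log(1 - p) = -log(1 + e^eta)
        total += -math.log1p(math.exp(-eta)) if y == 1 else -math.log1p(math.exp(eta))
    return total

for b1 in (0.1, 0.5, 1.0, 2.0):
    print(f"slope {b1:4.1f}: log-likelihood {log_lik(b1):.3e}")
```

The log-likelihood keeps rising toward 0 as the slope grows, so there is no finite maximum: an iterative fitter will push the slope (and its standard error) toward infinity instead of converging.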
Assumptions (cont.)
There must be a linear relationship in the logit
The systematic component of the equation should have a linear relationship with the logit form of the dependent variable
The “usual assumptions”:
Lack of multicollinearity
Lack of “driving” outliers
Independent data
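One informal way to examine linearity in the logit is to group a continuous predictor into classes and plot the empirical logit of each class against the class midpoint. The sketch below uses hypothetical grouped counts (not real data) and the common 0.5 continuity correction; roughly constant logit changes between adjacent classes are consistent with a linear relationship in the logit, even though the raw proportions are not linear.

```python
import math

# Hypothetical grouped data, for illustration only
midpoints = [50, 100, 150, 200, 250]   # height class midpoints (cm)
n_trees   = [40, 40, 40, 40, 40]       # trees per class
n_forked  = [4, 9, 18, 29, 35]         # forked trees per class

def empirical_logit(y, n):
    """Empirical log-odds with a 0.5 correction so classes with 0 or n events stay finite."""
    return math.log((y + 0.5) / (n - y + 0.5))

logits = [empirical_logit(y, n) for y, n in zip(n_forked, n_trees)]

for x, y, n, lg in zip(midpoints, n_forked, n_trees, logits):
    print(f"{x:4d} cm: proportion {y / n:.3f}, empirical logit {lg:+.3f}")

# Under linearity in the logit, the change in logit per cm should be roughly constant
slopes = [(logits[i + 1] - logits[i]) / (midpoints[i + 1] - midpoints[i])
          for i in range(len(midpoints) - 1)]
print("logit change per cm between adjacent classes:", [round(s, 4) for s in slopes])
```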
Odds ratios
The odds in favor of an event are $p / (1 - p)$, where $p$ is the probability of the event
Odds are usually written as:
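The odds definition on this slide translates directly into code. This is a minimal sketch; the group probabilities and the coefficient `b` below are hypothetical values used only to show the arithmetic of an odds ratio and its link to a logistic regression coefficient.

```python
import math

def odds(p: float) -> float:
    """Odds in favor of an event with probability p (requires 0 <= p < 1)."""
    return p / (1.0 - p)

print(odds(0.75))   # 3.0, i.e. 3 to 1 in favor
print(odds(0.5))    # 1.0, i.e. even odds

# Odds ratio between two groups, using hypothetical forking probabilities
p_group_a, p_group_b = 0.40, 0.20
odds_ratio = odds(p_group_a) / odds(p_group_b)
print(f"odds ratio = {odds_ratio:.3f}")

# In a fitted logistic regression, exp(b) is the odds ratio for a one-unit
# increase in that predictor; b = 0.04 is hypothetical, not an estimate
b = 0.04
print(f"odds ratio per unit increase = {math.exp(b):.4f}")
```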
 Spring '08
 Staff
