π_1 = 0.3, π_2 = 0.7
Why discriminant analysis?

• When the classes are well-separated, the parameter estimates for the logistic regression model are surprisingly unstable. Linear discriminant analysis does not suffer from this problem.

• If n is small and the distribution of the predictors X is approximately normal in each of the classes, the linear discriminant model is again more stable than the logistic regression model.

• Linear discriminant analysis is popular when we have more than two response classes, because it also provides low-dimensional views of the data.
Linear Discriminant Analysis when p = 1

The Gaussian density has the form

f_k(x) = \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\!\left(-\frac{1}{2\sigma_k^2}(x - \mu_k)^2\right).

Here µ_k is the mean, and σ_k² is the variance (in class k). We will assume that all the σ_k = σ are the same.
Linear Discriminant Analysis when p = 1

Plugging this into Bayes' formula, we get a rather complex expression for p_k(x) = Pr(Y = k | X = x):

p_k(x) = \frac{\pi_k \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{1}{2\sigma^2}(x - \mu_k)^2\right)}{\sum_{l=1}^{K} \pi_l \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{1}{2\sigma^2}(x - \mu_l)^2\right)}

Happily, there are simplifications and cancellations.
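As a concrete check of Bayes' formula above, here is a small Python sketch that evaluates p_k(x) for a two-class setup. The parameter values are taken from the worked example later in these slides (µ_1 = −1.5, µ_2 = 1.5, π_1 = π_2 = 0.5, σ² = 1):

```python
import math

# Two-class example using the parameters from the slides' 1-D example.
priors = [0.5, 0.5]          # pi_k
means = [-1.5, 1.5]          # mu_k
sigma = 1.0                  # shared standard deviation

def gaussian_density(x, mu, sigma):
    """Gaussian density f_k(x) with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

def posterior(x):
    """p_k(x) = pi_k f_k(x) / sum_l pi_l f_l(x)  (Bayes' formula)."""
    numerators = [p * gaussian_density(x, m, sigma) for p, m in zip(priors, means)]
    total = sum(numerators)
    return [num / total for num in numerators]

# By symmetry, at x = 0 the two posteriors are equal.
p = posterior(0.0)
```

The denominator is the same for every class, which is why the classification rule below can discard it.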
Discriminant functions

To classify at the value X = x, we need to see which of the p_k(x) is largest. Taking logs, and discarding terms that do not depend on k, we see that this is equivalent to assigning x to the class with the largest discriminant score:

\delta_k(x) = x \cdot \frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log(\pi_k)

Note that δ_k(x) is a linear function of x.
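A minimal sketch of the discriminant-score rule, again with the slides' illustrative parameters (µ_1 = −1.5, µ_2 = 1.5, equal priors, σ² = 1):

```python
import math

# Illustrative parameters from the slides' 1-D example.
priors = [0.5, 0.5]
means = [-1.5, 1.5]
sigma2 = 1.0                 # shared variance sigma^2

def delta(x, k):
    """delta_k(x) = x * mu_k / sigma^2 - mu_k^2 / (2 sigma^2) + log(pi_k)."""
    return x * means[k] / sigma2 - means[k] ** 2 / (2 * sigma2) + math.log(priors[k])

def classify(x):
    """Assign x to the class with the largest discriminant score."""
    scores = [delta(x, k) for k in range(len(means))]
    return scores.index(max(scores))

label = classify(2.0)        # x = 2.0 lies right of the boundary, so class 2 (index 1)
```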
Discriminant functions

If there are K = 2 classes and π_1 = π_2 = 0.5, then one can see that the decision boundary is at

x = \frac{\mu_1 + \mu_2}{2}.

(show this)
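The "show this" step follows by setting the two discriminant scores equal. A sketch of the algebra:

```latex
\delta_1(x) = \delta_2(x)
\;\Longrightarrow\;
x\,\frac{\mu_1}{\sigma^2} - \frac{\mu_1^2}{2\sigma^2} + \log\pi_1
= x\,\frac{\mu_2}{\sigma^2} - \frac{\mu_2^2}{2\sigma^2} + \log\pi_2 .
```

Since π_1 = π_2, the log terms cancel, leaving x(µ_1 − µ_2)/σ² = (µ_1² − µ_2²)/(2σ²); dividing both sides by (µ_1 − µ_2)/σ² gives x = (µ_1 + µ_2)/2.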
[Figure] Example with µ_1 = −1.5, µ_2 = 1.5, π_1 = π_2 = 0.5, and σ² = 1.
Typically we don’t know these
parameters; we just have the
training
data. In that case we simply estimate the
parameters
and plug them into the rule.
Estimating the parameters

\hat\pi_k = \frac{n_k}{n}, \qquad
\hat\mu_k = \frac{1}{n_k}\sum_{i:\,y_i = k} x_i, \qquad
\hat\sigma^2 = \sum_{k=1}^{K} \frac{n_k - 1}{n - K}\,\hat\sigma_k^2,

where \hat\sigma_k^2 = \frac{1}{n_k - 1}\sum_{i:\,y_i = k}(x_i - \hat\mu_k)^2 is the usual formula for the estimated variance in the kth class.
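A sketch of these plug-in estimates in Python; the tiny training sample below is made up purely for illustration:

```python
# Estimating the LDA parameters from training data (toy, made-up sample).
x = [-2.1, -1.3, -0.4, 1.9, 0.8, 1.2]     # predictor values
y = [0, 0, 0, 1, 1, 1]                     # class labels
classes = sorted(set(y))
n, K = len(x), len(classes)

pi_hat = {}       # pi_hat_k = n_k / n
mu_hat = {}       # mu_hat_k = mean of x in class k
sigma2_hat = 0.0  # pooled variance, divided by n - K
for k in classes:
    xk = [xi for xi, yi in zip(x, y) if yi == k]
    pi_hat[k] = len(xk) / n
    mu_hat[k] = sum(xk) / len(xk)
for xi, yi in zip(x, y):
    sigma2_hat += (xi - mu_hat[yi]) ** 2 / (n - K)
```

These estimates are then plugged into the discriminant scores in place of the true parameters.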
Linear Discriminant Analysis when p > 1

Density:

f(x) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)

Discriminant function:

\delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2}\mu_k^T \Sigma^{-1} \mu_k + \log \pi_k
Linear Discriminant Analysis when p > 1

Despite its complex form,

\delta_k(x) = c_{k0} + c_{k1} x_1 + c_{k2} x_2 + \dots + c_{kp} x_p

is a linear function.
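The p > 1 discriminant score can be evaluated directly; a sketch with an assumed 2×2 covariance matrix and two assumed class means (all values here are illustrative, not from the slides):

```python
import numpy as np

# Illustrative parameters for p = 2, K = 2 (assumed, not from the slides).
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])      # shared covariance matrix
Sigma_inv = np.linalg.inv(Sigma)
mus = [np.array([-1.0, 0.0]), np.array([1.0, 0.0])]
priors = [0.5, 0.5]

def delta(x, k):
    """delta_k(x) = x^T Sigma^{-1} mu_k - (1/2) mu_k^T Sigma^{-1} mu_k + log(pi_k)."""
    return x @ Sigma_inv @ mus[k] - 0.5 * mus[k] @ Sigma_inv @ mus[k] + np.log(priors[k])

# The score is linear in x: delta_k(x) = c_k0 + c_k1 x_1 + c_k2 x_2.
x = np.array([2.0, 0.5])
label = int(np.argmax([delta(x, k) for k in range(2)]))
```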
Illustration: p = 2 and K = 3 classes

[Figure: two panels plotting X_2 against X_1, showing the three classes.]

• Here π_1 = π_2 = π_3 = 1/3.

• The dashed lines are known as the Bayes decision boundaries. Were they known, they would yield the fewest misclassification errors, among all possible classifiers.
Fisher’s Iris Data

• 4 variables
• 3 species: Setosa, Versicolor, Virginica
• 50 samples/class

LDA classifies all but 3 of the 150 training samples correctly.
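A self-contained sketch of the whole fit-and-classify pipeline on simulated three-class data (the class means, counts, and random seed below are made up; this is not the iris data itself):

```python
import numpy as np

# Simulate 3 Gaussian classes with a shared covariance, 50 samples each (assumed setup).
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X = np.vstack([rng.normal(m, 1.0, size=(50, 2)) for m in means])
y = np.repeat([0, 1, 2], 50)

# Estimate priors, class means, and the pooled covariance.
priors = np.array([np.mean(y == k) for k in range(3)])
mu_hat = np.array([X[y == k].mean(axis=0) for k in range(3)])
centered = X - mu_hat[y]
Sigma_hat = centered.T @ centered / (len(X) - 3)
Sigma_inv = np.linalg.inv(Sigma_hat)

# delta_k(x) = x^T Sigma^{-1} mu_k - (1/2) mu_k^T Sigma^{-1} mu_k + log(pi_k),
# computed for all samples and classes at once.
scores = (X @ Sigma_inv @ mu_hat.T
          - 0.5 * np.sum(mu_hat @ Sigma_inv * mu_hat, axis=1)
          + np.log(priors))
accuracy = np.mean(scores.argmax(axis=1) == y)
```

With well-separated classes, the training accuracy is high, mirroring the near-perfect classification on the iris data.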
Fisher’s Discriminant Plot

[Figure: scatter plot of the data in the space of the discriminant variables.]