Why discriminant analysis?
• When the classes are well-separated, the parameter estimates for the logistic regression model are surprisingly unstable. Linear discriminant analysis does not suffer from this problem (a small simulation below illustrates this).
• If n is small and the distribution of the predictors X is approximately normal in each of the classes, the linear discriminant model is again more stable than the logistic regression model.
• Linear discriminant analysis is popular when we have more than two response classes, because it also provides low-dimensional views of the data.
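To see the instability in the first point, one can fit a (nearly) unregularized logistic regression to well-separated data. This is a minimal sketch of my own, not from the slides; the data, seed, and the use of scikit-learn are all assumptions.

```python
# Sketch: with well-separated classes, the unregularized logistic-regression
# MLE diverges, while the LDA estimates (sample means and a pooled variance)
# stay finite and stable.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two widely separated 1-D Gaussian classes.
x = np.concatenate([rng.normal(-5, 1, 50), rng.normal(5, 1, 50)]).reshape(-1, 1)
y = np.repeat([0, 1], 50)

# C=1e10 makes the ridge penalty negligible: the fitted slope grows very large.
logit = LogisticRegression(C=1e10, max_iter=10_000).fit(x, y)
print("logistic slope:", logit.coef_[0, 0])

# LDA parameters are simple sample statistics, hence stable.
lda = LinearDiscriminantAnalysis().fit(x, y)
print("LDA class means:", lda.means_.ravel())  # close to -5 and 5
```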
Linear Discriminant Analysis when p = 1

The Gaussian density has the form

f_k(x) = 1/(√(2π) σ_k) · exp( −(x − µ_k)² / (2σ_k²) ).

Here µ_k is the mean, and σ_k² is the variance (in class k). We will assume that all the σ_k = σ are the same.
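The density is a one-liner in code; a minimal sketch (the function name is mine):

```python
# Sketch: the class-k Gaussian density written out directly.
import numpy as np

def gaussian_density(x, mu_k, sigma):
    """f_k(x) for a normal with mean mu_k and standard deviation sigma."""
    return np.exp(-((x - mu_k) ** 2) / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

print(gaussian_density(0.0, mu_k=1.5, sigma=1.0))  # density of N(1.5, 1) at x = 0
```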
Plugging this into Bayes formula, we get a rather complex expression for p_k(x) = Pr(Y = k | X = x):

p_k(x) = [ π_k · 1/(√(2π)σ) · exp(−(x − µ_k)²/(2σ²)) ] / [ Σ_{l=1}^{K} π_l · 1/(√(2π)σ) · exp(−(x − µ_l)²/(2σ²)) ].

Happily, there are simplifications and cancellations: the 1/(√(2π)σ) factor appears in every term and cancels, and after taking logs, any remaining term that is the same for all classes can be discarded.
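A numeric sketch of this posterior (my own code; note that it exploits exactly the cancellation just described):

```python
# Sketch: p_k(x) via Bayes' rule with a shared sigma.
import numpy as np

def posteriors(x, mus, pis, sigma):
    """p_k(x) = pi_k f_k(x) / sum_l pi_l f_l(x)."""
    # The common factor 1/(sqrt(2*pi)*sigma) cancels, so it is omitted.
    f = np.exp(-((x - np.asarray(mus)) ** 2) / (2 * sigma ** 2))
    w = np.asarray(pis) * f
    return w / w.sum()

print(posteriors(0.5, mus=[-1.5, 1.5], pis=[0.5, 0.5], sigma=1.0))
```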
Discriminant functions

To classify at the value X = x, we need to see which of the p_k(x) is largest. Taking logs, and discarding terms that do not depend on k, we see that this is equivalent to assigning x to the class with the largest discriminant score:

δ_k(x) = x · µ_k/σ² − µ_k²/(2σ²) + log(π_k).

Note that δ_k(x) is a linear function of x.
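The classification rule itself is then a simple argmax; a minimal sketch (names are mine):

```python
# Sketch: compute the discriminant scores and assign x to the argmax class.
import numpy as np

def delta(x, mus, pis, sigma):
    """delta_k(x) = x * mu_k / sigma^2 - mu_k^2 / (2 sigma^2) + log(pi_k)."""
    mus, pis = np.asarray(mus), np.asarray(pis)
    return x * mus / sigma**2 - mus**2 / (2 * sigma**2) + np.log(pis)

scores = delta(0.5, mus=[-1.5, 1.5], pis=[0.5, 0.5], sigma=1.0)
print("assign to class", scores.argmax() + 1)  # class with the largest score
```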
If there are K = 2 classes and π_1 = π_2 = 0.5, then one can see that the decision boundary is at

x = (µ_1 + µ_2)/2.

To show this, set δ_1(x) = δ_2(x): the log(π_k) terms are equal and cancel, leaving x·µ_1/σ² − µ_1²/(2σ²) = x·µ_2/σ² − µ_2²/(2σ²), i.e. x(µ_1 − µ_2) = (µ_1² − µ_2²)/2, which gives x = (µ_1 + µ_2)/2.
[Figure] Example with µ_1 = −1.5, µ_2 = 1.5, π_1 = π_2 = 0.5, and σ² = 1. Typically we don’t know these parameters; we just have the training data. In that case we simply estimate the parameters and plug them into the rule.
Estimating the parameters

π̂_k = n_k/n, where n_k is the number of training observations in class k;
µ̂_k = (1/n_k) Σ_{i: y_i = k} x_i;
σ̂² = (1/(n − K)) Σ_{k=1}^{K} Σ_{i: y_i = k} (x_i − µ̂_k)² = Σ_{k=1}^{K} ((n_k − 1)/(n − K)) · σ̂_k²,

where σ̂_k² = (1/(n_k − 1)) Σ_{i: y_i = k} (x_i − µ̂_k)² is the usual formula for the estimated variance in the k-th class.
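Putting the example and the estimators together, a minimal simulation sketch (the seed and sample sizes are my own choices):

```python
# Sketch: simulate from the example above, then estimate the LDA parameters.
import numpy as np

rng = np.random.default_rng(1)
# mu_1 = -1.5, mu_2 = 1.5, pi_1 = pi_2 = 0.5, sigma^2 = 1
x = np.concatenate([rng.normal(-1.5, 1, 100), rng.normal(1.5, 1, 100)])
y = np.repeat([1, 2], 100)

n, K = len(x), 2
pi_hat = np.array([(y == k).mean() for k in (1, 2)])   # n_k / n
mu_hat = np.array([x[y == k].mean() for k in (1, 2)])  # class means
# Pooled variance: weighted combination of the per-class variances.
sigma2_hat = sum((np.sum(y == k) - 1) * x[y == k].var(ddof=1)
                 for k in (1, 2)) / (n - K)
print(pi_hat, mu_hat, sigma2_hat)  # roughly [0.5 0.5], [-1.5 1.5], 1.0
```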
Linear Discriminant Analysis when p > 1

Density (multivariate Gaussian):

f(x) = 1/((2π)^{p/2} |Σ|^{1/2}) · exp( −½ (x − µ)ᵀ Σ⁻¹ (x − µ) ).

Discriminant function:

δ_k(x) = xᵀ Σ⁻¹ µ_k − ½ µ_kᵀ Σ⁻¹ µ_k + log(π_k).
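A direct numpy transcription of the multivariate score (a sketch; the names are mine, and np.linalg.solve avoids forming Σ⁻¹ explicitly):

```python
# Sketch: delta_k(x) = x^T Sigma^{-1} mu_k - 0.5 mu_k^T Sigma^{-1} mu_k + log(pi_k)
import numpy as np

def delta_k(x, mu_k, Sigma, pi_k):
    a = np.linalg.solve(Sigma, mu_k)  # a = Sigma^{-1} mu_k
    return x @ a - 0.5 * mu_k @ a + np.log(pi_k)

x = np.array([0.2, -0.4])
print(delta_k(x, mu_k=np.array([1.0, 1.0]), Sigma=np.eye(2), pi_k=0.5))
```

The vector a = Σ⁻¹µ_k holds exactly the coefficients c_k1, …, c_kp of the linear form below.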
Despite its complex form, δ_k(x) = c_k0 + c_k1 x_1 + c_k2 x_2 + … + c_kp x_p is a linear function.
Illustration: p = 2 and K = 3 classes

[Figure: two panels plotting X_2 against X_1, both axes from −4 to 4.] Here π_1 = π_2 = π_3 = 1/3. The dashed lines are known as the Bayes decision boundaries. Were they known, they would yield the fewest misclassification errors, among all possible classifiers.
Fisher’s Iris Data

4 variables, 3 species, 50 samples per class: Setosa, Versicolor, Virginica. LDA classifies all but 3 of the 150 training samples correctly.
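That error count is easy to check; a minimal sketch using scikit-learn’s built-in copy of the iris data (the choice of library is mine):

```python
# Sketch: fit LDA to Fisher's iris data and count training misclassifications.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 150 samples, 4 variables, 3 species
lda = LinearDiscriminantAnalysis().fit(X, y)
errors = int((lda.predict(X) != y).sum())
print(errors, "training misclassifications")  # the slides report 3
```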
Fisher’s Discriminant Plot

[Figure: scatterplot of the samples in discriminant coordinates; point and axis residue removed.]
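The low-dimensional view behind such a plot can be reproduced by projecting the data onto the (at most K − 1 = 2) discriminant directions; a sketch, again assuming scikit-learn:

```python
# Sketch: project the iris data onto the two discriminant directions.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
Z = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)
print(Z.shape)  # (150, 2): one point per sample in discriminant coordinates
# Plotting Z[:, 0] against Z[:, 1], colored by y, gives the view above.
```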
