# Linear Discriminant Analysis


Why discriminant analysis?

- When the classes are well-separated, the parameter estimates for the logistic regression model are surprisingly unstable. Linear discriminant analysis does not suffer from this problem.
- If $n$ is small and the distribution of the predictors $X$ is approximately normal in each of the classes, the linear discriminant model is again more stable than the logistic regression model.
- Linear discriminant analysis is popular when we have more than two response classes, because it also provides low-dimensional views of the data.
Linear Discriminant Analysis when p = 1

The Gaussian density has the form

$$f_k(x) = \frac{1}{\sqrt{2\pi}\,\sigma_k}\, e^{-\frac{1}{2}\left(\frac{x-\mu_k}{\sigma_k}\right)^2}.$$

Here $\mu_k$ is the mean and $\sigma_k^2$ is the variance in class $k$. We will assume that all the $\sigma_k = \sigma$ are the same.
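As a minimal sketch of the density above (the function name is illustrative, not from the source):

```python
import math

def gaussian_density(x, mu, sigma):
    """One-dimensional Gaussian density:
    f_k(x) = (1 / (sqrt(2*pi) * sigma)) * exp(-(x - mu)^2 / (2 * sigma^2))."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
```

At the class mean the density equals $1/(\sqrt{2\pi}\,\sigma)$, and it is symmetric about $\mu_k$.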
Plugging this into Bayes' formula, we get a rather complex expression for $p_k(x) = \Pr(Y = k \mid X = x)$:

$$p_k(x) = \frac{\pi_k \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2}\left(\frac{x-\mu_k}{\sigma}\right)^2}}{\sum_{l=1}^{K} \pi_l \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2}\left(\frac{x-\mu_l}{\sigma}\right)^2}}.$$

Happily, there are simplifications and cancellations.
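A small sketch of the posterior computation, assuming equal within-class variance (the shared normalizing constant $1/(\sqrt{2\pi}\,\sigma)$ cancels between numerator and denominator; the function name is illustrative):

```python
import math

def posterior(x, mus, sigma, pis):
    """Bayes posterior p_k(x) = pi_k * f_k(x) / sum_l pi_l * f_l(x),
    with Gaussian class densities sharing a single sigma."""
    def kernel(mu):
        # Common factor 1/(sqrt(2*pi)*sigma) cancels, so it is omitted.
        return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
    num = [pi * kernel(mu) for mu, pi in zip(mus, pis)]
    total = sum(num)
    return [v / total for v in num]
```

With $\mu_1 = -1.5$, $\mu_2 = 1.5$, $\sigma = 1$, and equal priors, the posteriors are exactly 0.5 each at $x = 0$.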
Discriminant functions

To classify at the value $X = x$, we need to see which of the $p_k(x)$ is largest. Taking logs, and discarding terms that do not depend on $k$, we see that this is equivalent to assigning $x$ to the class with the largest discriminant score:

$$\delta_k(x) = x \cdot \frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log(\pi_k).$$

Note that $\delta_k(x)$ is a linear function of $x$.
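The classification rule can be sketched directly from the score formula (function names are illustrative):

```python
import math

def discriminant_score(x, mu_k, sigma2, pi_k):
    """delta_k(x) = x * mu_k / sigma^2 - mu_k^2 / (2 * sigma^2) + log(pi_k)."""
    return x * mu_k / sigma2 - mu_k ** 2 / (2 * sigma2) + math.log(pi_k)

def classify(x, mus, sigma2, pis):
    """Assign x to the class index with the largest discriminant score."""
    scores = [discriminant_score(x, mu, sigma2, pi) for mu, pi in zip(mus, pis)]
    return max(range(len(scores)), key=lambda k: scores[k])
```

With the parameters of the example below ($\mu_1 = -1.5$, $\mu_2 = 1.5$, $\sigma^2 = 1$, equal priors), points left of 0 go to class 1 and points right of 0 go to class 2.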
If there are $K = 2$ classes and $\pi_1 = \pi_2 = 0.5$, then one can see that the decision boundary is at

$$x = \frac{\mu_1 + \mu_2}{2}$$

(show this).
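Sketch of the requested derivation: with $\pi_1 = \pi_2$, the log-prior terms in the two discriminant scores cancel, so setting $\delta_1(x) = \delta_2(x)$ gives

```latex
x\,\frac{\mu_1}{\sigma^2} - \frac{\mu_1^2}{2\sigma^2}
  = x\,\frac{\mu_2}{\sigma^2} - \frac{\mu_2^2}{2\sigma^2}
\;\Longrightarrow\;
x\,(\mu_1 - \mu_2) = \frac{\mu_1^2 - \mu_2^2}{2}
\;\Longrightarrow\;
x = \frac{\mu_1 + \mu_2}{2},
```

using $\mu_1^2 - \mu_2^2 = (\mu_1 - \mu_2)(\mu_1 + \mu_2)$ and dividing by $\mu_1 - \mu_2$ (assuming $\mu_1 \neq \mu_2$).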
*Figure: example with $\mu_1 = -1.5$, $\mu_2 = 1.5$, $\pi_1 = \pi_2 = 0.5$, and $\sigma^2 = 1$.* Typically we don't know these parameters; we just have the training data. In that case we simply estimate the parameters and plug them into the rule.
Estimating the parameters

$$\hat\pi_k = \frac{n_k}{n}, \qquad \hat\mu_k = \frac{1}{n_k} \sum_{i:\, y_i = k} x_i,$$

$$\hat\sigma^2 = \frac{1}{n - K} \sum_{k=1}^{K} \sum_{i:\, y_i = k} (x_i - \hat\mu_k)^2 = \sum_{k=1}^{K} \frac{n_k - 1}{n - K} \cdot \hat\sigma_k^2,$$

where $\hat\sigma_k^2 = \frac{1}{n_k - 1} \sum_{i:\, y_i = k} (x_i - \hat\mu_k)^2$ is the usual formula for the estimated variance in the $k$th class.
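The plug-in estimates can be sketched as follows, assuming class labels are coded $0, \dots, K-1$ (the function name is illustrative):

```python
def estimate_parameters(x, y, K):
    """Plug-in estimates for one-dimensional LDA:
    pi_hat_k = n_k / n, mu_hat_k = class sample mean,
    sigma2_hat = pooled within-class variance with n - K in the denominator."""
    n = len(x)
    pis, mus = [], []
    ss = 0.0  # pooled sum of squared deviations from the class means
    for k in range(K):
        xk = [xi for xi, yi in zip(x, y) if yi == k]
        nk = len(xk)
        mu = sum(xk) / nk
        pis.append(nk / n)
        mus.append(mu)
        ss += sum((xi - mu) ** 2 for xi in xk)
    sigma2 = ss / (n - K)
    return pis, mus, sigma2
```

For example, with `x = [0, 2, 4, 6]` and `y = [0, 0, 1, 1]`, the class means are 1 and 5, and the pooled variance is $4 / (4 - 2) = 2$.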
Linear Discriminant Analysis when p > 1

Density:

$$f(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \, e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)}$$

Discriminant function:

$$\delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \log \pi_k$$
Despite its complex form, $\delta_k(x) = c_{k0} + c_{k1} x_1 + c_{k2} x_2 + \dots + c_{kp} x_p$ is a linear function of $x$.
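A sketch of how the linear coefficients arise, assuming $\Sigma^{-1}$ is given as a nested list (in practice one would invert $\Sigma$ with a linear-algebra library; the function name is illustrative):

```python
import math

def lda_coefficients(mu_k, Sigma_inv, pi_k):
    """Coefficients (c_k0, c_k1, ..., c_kp) of the linear form
    delta_k(x) = c_k0 + sum_j c_kj * x_j, where
    (c_k1, ..., c_kp) = Sigma^{-1} mu_k and
    c_k0 = -0.5 * mu_k^T Sigma^{-1} mu_k + log(pi_k)."""
    p = len(mu_k)
    c = [sum(Sigma_inv[i][j] * mu_k[j] for j in range(p)) for i in range(p)]
    c0 = -0.5 * sum(mu_k[i] * c[i] for i in range(p)) + math.log(pi_k)
    return [c0] + c
```

With $\Sigma = I$, $\mu_k = (2, 0)^T$, and $\pi_k = 1/3$, this gives $c_{k0} = -2 + \log(1/3)$, $c_{k1} = 2$, $c_{k2} = 0$.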
Illustration: p = 2 and K = 3 classes

*Figure: three classes plotted in the $(X_1, X_2)$ plane with their decision boundaries.* Here $\pi_1 = \pi_2 = \pi_3 = 1/3$. The dashed lines are known as the Bayes decision boundaries. Were they known, they would yield the fewest misclassification errors, among all possible classifiers.
Fisher's Iris Data

4 variables, 3 species (setosa, versicolor, virginica), 50 samples per class. LDA classifies all but 3 of the 150 training samples correctly.
Fisher's Discriminant Plot

*Figure: scatterplot of the iris samples in the discriminant space.*
