Mixture Discriminant Analysis
Jia Li
Department of Statistics, The Pennsylvania State University
Email: [email protected]
http://www.stat.psu.edu/jiali

Mixture Discriminant Analysis

A method for classification (supervised) based on mixture models.
Extension of linear discriminant analysis.
The mixture of normals is used to obtain a density estimation for each class.

Linear Discriminant Analysis

Suppose we have $K$ classes. Let the training samples be $\{x_1, ..., x_n\}$ with classes $\{z_1, ..., z_n\}$, $z_i \in \{1, ..., K\}$.
Each class, with prior probability $a_k$, is assumed to follow a Gaussian distribution $\phi(x \mid \mu_k, \Sigma)$.
Model estimation:

$$a_k = \frac{\sum_{i=1}^{n} I(z_i = k)}{n}$$

$$\mu_k = \frac{\sum_{i=1}^{n} x_i I(z_i = k)}{\sum_{i=1}^{n} I(z_i = k)}$$

$$\Sigma = \frac{\sum_{i=1}^{n} (x_i - \mu_{z_i})(x_i - \mu_{z_i})^t}{n}$$

Given a test sample $X = x$, the Bayes classification rule is:

$$\hat{z} = \arg\max_k \; a_k \phi(x \mid \mu_k, \Sigma)$$

The decision boundary is linear because $\Sigma$ is shared by all the classes.

Mixture Discriminant Analysis

A single Gaussian to model a class, as in LDA, is too restricted.
Extend to a mixture of Gaussians. For class $k$, the within-class density is:

$$f_k(x) = \sum_{r=1}^{R_k} \pi_{kr} \phi(x \mid \mu_{kr}, \Sigma)$$

A common covariance matrix is still assumed.

A 2-class example. Class 1 is a mixture of 3 normals and class 2 a mixture of 2 normals. The variances for all the normals are 3.0.

Model Estimation

The overall model is:

$$P(X = x, Z = k) = a_k f_k(x) = a_k \sum_{r=1}^{R_k} \pi_{kr} \phi(x \mid \mu_{kr}, \Sigma)$$

where $a_k$ is the prior probability of class $k$.
The ML estimation of $a_k$ is the proportion of training samples in class $k$.
EM algorithm is used to estimate $\pi_{kr}$, $\mu_{kr}$, and $\Sigma$.
Roughly speaking, we estimate a mixture of normals by EM for each individual class.
$\Sigma$ needs to be estimated by combining all the classes.
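As a small illustration of the overall model, the sketch below evaluates $a_k f_k(x)$ for every class and applies the Bayes rule to pick the largest. It assumes the parameters are already known. The function names (`gaussian_pdf`, `mda_joint`, `mda_classify`) and the numeric values are hypothetical, chosen only to resemble the two-class example (3 and 2 components, variance 3.0); they are not taken from the slides.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of N(mean, cov) evaluated at a single point x."""
    d = len(mean)
    diff = x - mean
    # Solve cov @ y = diff rather than forming the inverse explicitly.
    maha = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm

def mda_joint(x, priors, weights, means, cov):
    """Return the vector of a_k * f_k(x), one entry per class.

    priors[k]  : a_k
    weights[k] : array of pi_kr for r = 1..R_k
    means[k]   : array of shape (R_k, d) holding mu_kr
    cov        : shared covariance Sigma, shape (d, d)
    """
    joint = []
    for a_k, pi_k, mu_k in zip(priors, weights, means):
        f_k = sum(p * gaussian_pdf(x, m, cov) for p, m in zip(pi_k, mu_k))
        joint.append(a_k * f_k)
    return np.array(joint)

def mda_classify(x, priors, weights, means, cov):
    """Bayes rule: index (0-based) of the class maximizing a_k * f_k(x)."""
    return int(np.argmax(mda_joint(x, priors, weights, means, cov)))

# Hypothetical parameters: class 0 has 3 components, class 1 has 2,
# and every normal has variance 3.0 (made-up means and weights).
priors = [0.5, 0.5]
weights = [np.array([1 / 3, 1 / 3, 1 / 3]), np.array([0.5, 0.5])]
means = [np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 12.0]]),
         np.array([[6.0, 0.0], [12.0, 6.0]])]
cov = 3.0 * np.eye(2)

label = mda_classify(np.array([5.5, 6.5]), priors, weights, means, cov)
```

Because each class density is a sum over several components, the resulting decision boundary is generally no longer linear even though $\Sigma$ is shared.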
EM iteration:

E-step: for each class $k$, collect samples in this class and compute the posterior probabilities of all the $R_k$ components. Suppose sample $i$ is in class $k$,

$$p_{i,r} = \frac{\pi_{kr} \phi(x_i \mid \mu_{kr}, \Sigma)}{\sum_{r'=1}^{R_k} \pi_{kr'} \phi(x_i \mid \mu_{kr'}, \Sigma)}, \quad r = 1, ..., R_k$$

M-step: compute the weighted MLEs for all the parameters.

$$\pi_{kr} = \frac{\sum_{i=1}^{n} I(z_i = k)\, p_{i,r}}{\sum_{i=1}^{n} I(z_i = k)}$$

$$\mu_{kr} = \frac{\sum_{i=1}^{n} x_i I(z_i = k)\, p_{i,r}}{\sum_{i=1}^{n} I(z_i = k)\, p_{i,r}}$$

$$\Sigma = \frac{\sum_{i=1}^{n} \sum_{r=1}^{R_{z_i}} p_{i,r} (x_i - \mu_{z_i r})(x_i - \mu_{z_i r})^t}{n}$$

Waveform Example

Three functions $h_1(\tau)$, $h_2(\tau)$, $h_3(\tau)$ are shifted versions of each other, as shown in the figure.
Each $h_j$ is specified by the equal-lateral right triangle function.
Its values at integers $\tau = 1 \sim 21$ are measured.

The three classes of waveforms are random convex combinations of two of these waveforms plus independent Gaussian noise. Each sample is a 21 dimensional vector containing the values of the random waveforms measured at $\tau = 1, 2, ..., 21$.

To generate a sample in class 1, a random number $u$ uniformly distributed in $[0, 1]$ and 21 random numbers $\epsilon_1, \epsilon_2, ..., \epsilon_{21}$ normally distributed with mean zero and variance 1 are generated.

$$x_j = u h_1(j) + (1 - u) h_2(j) + \epsilon_j, \quad j = 1, ..., 21.$$

To generate a sample in class 2, repeat the above process to generate a random number $u$ and 21 random numbers $\epsilon_1, ..., \epsilon_{21}$ and set

$$x_j = u h_1(j) + (1 - u) h_3(j) + \epsilon_j, \quad j = 1, ..., 21.$$

Class 3 vectors are generated by

$$x_j = u h_2(j) + (1 - u) h_3(j) + \epsilon_j, \quad j = 1, ..., 21.$$
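The sampling scheme above can be sketched in a few lines of NumPy. The text does not give the exact triangle functions, so the sketch assumes the shapes commonly used for this waveform benchmark, $h_1(j) = \max(6 - |j - 11|, 0)$ with $h_2$ and $h_3$ shifted by 4 to either side. The equal class probabilities, the seed, and the function names are likewise assumptions made for illustration.

```python
import numpy as np

def base_waveforms():
    """Return a (3, 21) array with h1, h2, h3 evaluated at j = 1..21.

    ASSUMPTION: the triangle shapes are not stated in the text; these are
    the ones commonly used for the waveform benchmark.
    """
    j = np.arange(1, 22)
    h1 = np.maximum(6 - np.abs(j - 11), 0)
    h2 = np.maximum(6 - np.abs(j - 15), 0)  # h1 shifted right by 4
    h3 = np.maximum(6 - np.abs(j - 7), 0)   # h1 shifted left by 4
    return np.vstack([h1, h2, h3]).astype(float)

# Rows of base_waveforms() mixed for class 1, 2, 3 respectively:
# class 1 uses (h1, h2), class 2 uses (h1, h3), class 3 uses (h2, h3).
CLASS_PAIRS = {1: (0, 1), 2: (0, 2), 3: (1, 2)}

def sample_waveforms(n, rng):
    """Draw n labelled waveform vectors of dimension 21."""
    h = base_waveforms()
    z = rng.integers(1, 4, size=n)            # ASSUMPTION: equal class probabilities
    u = rng.uniform(0.0, 1.0, size=n)         # convex combination weight
    eps = rng.normal(0.0, 1.0, size=(n, 21))  # independent N(0, 1) noise
    X = np.empty((n, 21))
    for i in range(n):
        a, b = CLASS_PAIRS[int(z[i])]
        X[i] = u[i] * h[a] + (1.0 - u[i]) * h[b] + eps[i]
    return X, z

rng = np.random.default_rng(0)
X_train, z_train = sample_waveforms(300, rng)
X_test, z_test = sample_waveforms(500, rng)
```

The sizes 300 and 500 mirror the training and test set sizes used in the experiment; any classifier can then be fit on `X_train` and scored on `X_test`.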
[Figure: Example random waveforms, shown in separate panels for Class 1, Class 2, and Class 3; the horizontal axis runs from 0 to 20.]

A three component mixture of normals is assumed for each class.
The Bayes risk has been estimated to be about 0.14.
MDA outperforms LDA, QDA, and CART.
Training data size: 300. Test data size: 500. Ten simulations are performed.
Error rates for MDA (3 components per class) and other methods are compared below.

Method   Training        Test
LDA      0.121 (0.006)   0.191 (0.006)
QDA      0.039 (0.004)   0.205 (0.006)
CART     0.072 (0.003)   0.289 (0.004)
MDA      0.087 (0.005)   0.169 (0.006)

Low dimension views are obtained from projecting on to canonical coordinates.
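To make the EM iteration for MDA described earlier more concrete, here is a minimal NumPy sketch of the E-step and M-step with a covariance matrix shared across all classes and components. It is not the code behind the reported error rates. The initialization (random training points as component means, pooled sample covariance), the fixed iteration count, the small ridge added to $\Sigma$, and the toy two-class data are all assumptions made so the example runs on its own; degenerate (empty) components are not handled.

```python
import numpy as np

def log_gauss(X, mean, cov_inv, logdet):
    """Log density of N(mean, Sigma) at each row of X."""
    d = X.shape[1]
    diff = X - mean
    maha = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return -0.5 * (maha + logdet + d * np.log(2.0 * np.pi))

def fit_mda(X, z, n_components=3, n_iter=50, seed=0):
    """EM for class-wise Gaussian mixtures with one shared covariance.

    z must hold integer labels 0..K-1.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    classes = np.unique(z)
    priors = np.array([np.mean(z == k) for k in classes])  # a_k
    idx = [np.flatnonzero(z == k) for k in classes]
    # ASSUMPTION: initialize mu_kr at random training points of class k.
    means = [X[rng.choice(ix, n_components, replace=False)] for ix in idx]
    weights = [np.full(n_components, 1.0 / n_components) for _ in classes]
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(d)
    for _ in range(n_iter):
        cov_inv = np.linalg.inv(cov)
        logdet = np.linalg.slogdet(cov)[1]
        scatter = np.zeros((d, d))
        for k, ix in enumerate(idx):
            Xk = X[ix]
            # E-step: posteriors p_{i,r} over the components of class k.
            logp = np.stack(
                [np.log(weights[k][r]) + log_gauss(Xk, means[k][r], cov_inv, logdet)
                 for r in range(n_components)], axis=1)
            logp -= logp.max(axis=1, keepdims=True)
            p = np.exp(logp)
            p /= p.sum(axis=1, keepdims=True)
            # M-step: weighted MLEs of pi_kr and mu_kr.
            weights[k] = p.mean(axis=0)
            means[k] = (p.T @ Xk) / p.sum(axis=0)[:, None]
            # Accumulate the weighted scatter for the shared Sigma.
            for r in range(n_components):
                diff = Xk - means[k][r]
                scatter += (p[:, r, None] * diff).T @ diff
        cov = scatter / n + 1e-6 * np.eye(d)  # pooled over all classes
    return priors, weights, means, cov

def predict_mda(X, priors, weights, means, cov):
    """Bayes rule on log(a_k f_k(x)); returns 0-based class indices."""
    cov_inv = np.linalg.inv(cov)
    logdet = np.linalg.slogdet(cov)[1]
    scores = []
    for a_k, pi_k, mu_k in zip(priors, weights, means):
        comp = np.stack([np.log(pi) + log_gauss(X, mu, cov_inv, logdet)
                         for pi, mu in zip(pi_k, mu_k)], axis=1)
        m = comp.max(axis=1)
        scores.append(np.log(a_k) + m + np.log(np.exp(comp - m[:, None]).sum(axis=1)))
    return np.argmax(np.stack(scores, axis=1), axis=1)

# Toy data (made up): class 0 is bimodal, class 1 sits between the two modes,
# so a single Gaussian per class with shared covariance cannot separate them.
rng = np.random.default_rng(1)
X0 = np.vstack([rng.normal([-4.0, 0.0], 1.0, size=(100, 2)),
                rng.normal([4.0, 0.0], 1.0, size=(100, 2))])
X1 = rng.normal([0.0, 0.0], 1.0, size=(200, 2))
X = np.vstack([X0, X1])
z = np.repeat([0, 1], 200)

params = fit_mda(X, z, n_components=2)
train_acc = np.mean(predict_mda(X, *params) == z)
```

Working in log densities and subtracting the row maximum before exponentiating is a numerical convenience, not part of the method as stated; the updates themselves follow the E-step and M-step formulas.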
...
This note was uploaded on 02/04/2012 for the course STAT 557 taught by Professor Jiali during the Fall '09 term at Penn State.