cse6328-w5 - Prepared by Prof. Hui Jiang (CSE6328 3.0 Speech & Language Processing)
Prepared by Prof. Hui Jiang (CSE6328), 12-02-01, Dept. of CSE, York Univ.

CSE6328 3.0 Speech & Language Processing
Prof. Hui Jiang
Department of Computer Science and Engineering, York University

No. 5: Pattern Classification (III) & Pattern Verification

Model Parameter Estimation
· Maximum Likelihood (ML) Estimation:
  - the ML method is the most popular model estimation method
  - EM (Expectation-Maximization) algorithm
  - examples: univariate Gaussian distribution; multivariate Gaussian distribution; multinomial distribution; Gaussian mixture model; Markov chain model (n-gram for language modeling); Hidden Markov Model (HMM)
· Discriminative Training:
  - Maximum Mutual Information (MMI)
  - Minimum Classification Error (MCE)
· Bayesian Model Estimation: based on Bayesian theory
· MDI (Minimum Discrimination Information): an alternative model estimation method
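To make the ML bullet concrete, here is a minimal sketch of closed-form ML estimation for the univariate Gaussian example: the sample mean and the 1/n sample variance maximize the log-likelihood. The function names are illustrative, not from the slides.

```python
import math

def fit_gaussian_ml(samples):
    """ML estimates for a univariate Gaussian: the sample mean and
    the biased (1/n) sample variance maximize the log-likelihood."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n  # ML uses 1/n, not 1/(n-1)
    return mu, var

def log_likelihood(samples, mu, var):
    """Total log-likelihood of the data under N(mu, var)."""
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
               for x in samples)
```

By the ML property, no other (mu, var) pair in the Gaussian family can achieve a higher `log_likelihood` on the same data than the fitted pair.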
Discriminative Training (I): Maximum Mutual Information Estimation (1)
· The model is viewed as a noisy data-generation channel that maps a class id ω (modeled by λ_1, λ_2, …, λ_N) to an observation feature X.
· Determine the model parameters that maximize the mutual information between ω and X (i.e., enforce a close relation between ω and X):

  I(\omega, X) = \sum_{\omega} \sum_{X} p(\omega, X) \log \frac{p(\omega, X)}{p(\omega)\, p(X)}
               = \sum_{\omega} \sum_{X} p(\omega, X) \log \frac{p(X \mid \omega)}{p(X)}
               = \sum_{\omega} \sum_{X} p(\omega, X) \log \frac{p(X \mid \omega)}{\sum_{\omega'} p(\omega')\, p(X \mid \omega')}

  \{\lambda_1, \ldots, \lambda_N\}_{MMI} = \arg\max_{\lambda_1, \ldots, \lambda_N} I(\omega, X)

Discriminative Training (I): Maximum Mutual Information Estimation (2)
· Difficulty: the joint distribution p(ω, X) is unknown.
· Solution: collect a representative training set (X_1, ω_1), (X_2, ω_2), …, (X_T, ω_T) to approximate the joint distribution:

  \{\lambda_1, \ldots, \lambda_N\}_{MMI} = \arg\max_{\lambda_1, \ldots, \lambda_N} I(\omega, X)
    \approx \arg\max_{\lambda_1, \ldots, \lambda_N} \sum_{t=1}^{T} \log \frac{p_{\lambda_{\omega_t}}(X_t \mid \omega_t)}{\sum_{\omega} p(\omega)\, p_{\lambda_{\omega}}(X_t \mid \omega)}

· Optimization:
  - iterative gradient-ascent method
  - growth-transformation method
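The empirical MMI objective on a training set can be sketched as follows, assuming univariate Gaussian class-conditional models; all function and variable names here are illustrative, not from the course materials.

```python
import math

def gauss_loglik(x, mu, var):
    """Log-density of a univariate Gaussian N(mu, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def mmi_objective(data, priors, params):
    """Empirical MMI objective:
        sum_t log [ p(X_t | w_t) / sum_w p(w) p(X_t | w) ]
    data:   list of (x, label) pairs
    priors: dict mapping class -> p(w)
    params: dict mapping class -> (mu, var) of its Gaussian model
    """
    total = 0.0
    for x, w in data:
        num = gauss_loglik(x, *params[w])                 # log p(X_t | w_t)
        den = math.log(sum(priors[c] * math.exp(gauss_loglik(x, *params[c]))
                           for c in params))              # log sum_w p(w) p(X_t | w)
        total += num - den
    return total
```

Maximizing this quantity over the Gaussian parameters (e.g., by gradient ascent) pulls each class model toward its own data while pushing it away from competing classes, which is what distinguishes MMI from plain ML training.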
Discriminative Training (II): Minimum Classification Error Estimation (1)
· In an N-class pattern classification problem, given a set of training data D = {(X_1, ω_1), (X_2, ω_2), …, (X_T, ω_T)}, estimate the model parameters for all classes so as to minimize the total number of classification errors on D.
  - MCE: minimize the empirical classification errors.
· Objective function: the total number of classification errors on D. For each training sample (X_t, ω_t), define a misclassification measure

  d(X_t, \omega_t) = -\,p(\omega_t)\, p(X_t \mid \omega_t; \lambda_{\omega_t}) + \max_{\omega' \neq \omega_t} p(\omega')\, p(X_t \mid \omega'; \lambda_{\omega'})

or, in the log domain,

  d(X_t, \omega_t) = -\ln\big[ p(\omega_t)\, p(X_t \mid \omega_t) \big] + \max_{\omega' \neq \omega_t} \ln\big[ p(\omega')\, p(X_t \mid \omega') \big]

  - if d(X_t, ω_t) > 0: X_t is misclassified (1 error);
  - if d(X_t, ω_t) <= 0: X_t is correctly classified (0 errors).

Discriminative Training (II): Minimum Classification Error Estimation (2)
· Approximate d(X_t, ω_t) by a differentiable function, replacing the hard max over the competing classes with a smooth maximum of order η:

  d(X_t, \omega_t) \approx -\ln\big[ p(\omega_t)\, p(X_t \mid \omega_t) \big] + \frac{1}{\eta} \ln\!\left[ \frac{1}{N-1} \sum_{\omega' \neq \omega_t} \exp\!\big( \eta \ln\big[ p(\omega')\, p(X_t \mid \omega') \big] \big) \right]
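The smoothed misclassification measure above can be sketched as follows, using a numerically stable log-sum-exp. The sigmoid that turns d into a soft 0/1 error count follows standard MCE practice (the visible preview stops before that step), and the function names are assumptions, not from the slides.

```python
import math

def misclassification_measure(log_scores, true_class, eta=2.0):
    """Smoothed MCE measure:
        -g_true + (1/eta) * ln[ (1/(N-1)) * sum_{w' != true} exp(eta * g_w') ]
    where g_w = ln[p(w) p(X|w)] are the per-class log discriminant scores."""
    g_true = log_scores[true_class]
    others = [g for w, g in enumerate(log_scores) if w != true_class]
    # Shift by the max before exponentiating (log-sum-exp trick) for stability.
    m = max(others)
    soft_max = m + math.log(sum(math.exp(eta * (g - m)) for g in others)
                            / len(others)) / eta
    return -g_true + soft_max

def smoothed_error(d, gamma=1.0):
    """Sigmoid loss: a differentiable surrogate for the 0/1 error count."""
    return 1.0 / (1.0 + math.exp(-gamma * d))
```

As η grows, the soft maximum approaches the hard max over the competing classes, recovering the original measure; a moderate η keeps the objective smooth so gradient-based optimization can be applied.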