$$\hat{b}_j(v_k) \;=\; \frac{\sum_{t=1\ \mathrm{s.t.}\ O_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)} \qquad (9.43)$$

We now have ways in Eq. 9.38 and Eq. 9.43 to re-estimate the transition $A$ and observation $B$ probabilities from an observation sequence $O$, assuming that we already have a previous estimate of $A$ and $B$. These re-estimations form the core of the iterative forward-backward algorithm.

The forward-backward algorithm (Fig. 9.16) starts with some initial estimate of the HMM parameters $\lambda = (A, B)$. We then iteratively run two steps. Like other cases of the EM (expectation-maximization) algorithm, these are the expectation step, or E-step, and the maximization step, or M-step. In the E-step, we compute the expected state occupancy count $\gamma$ and the expected state transition count $\xi$ from the earlier $A$ and $B$ probabilities. In the M-step, we use $\gamma$ and $\xi$ to recompute new $A$ and $B$ probabilities.

function FORWARD-BACKWARD(observations of len T, output vocabulary V, hidden state set Q) returns HMM = (A, B)
    initialize A and B
    iterate until convergence
        E-step
            $\gamma_t(j) = \frac{\alpha_t(j)\,\beta_t(j)}{\alpha_T(q_F)}$   for all $t$ and $j$
            $\xi_t(i,j) = \frac{\alpha_t(i)\,a_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j)}{\alpha_T(q_F)}$   for all $t$, $i$, and $j$
        M-step
            $\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1}\sum_{k=1}^{N} \xi_t(i,k)}$
            $\hat{b}_j(v_k) = \frac{\sum_{t=1\ \mathrm{s.t.}\ O_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$
    return A, B

Figure 9.16    The forward-backward algorithm.

Although in principle the forward-backward algorithm can do completely unsupervised learning of the $A$ and $B$ parameters, in practice the initial conditions are very important. For this reason the algorithm is often given extra information. For example, for speech recognition the HMM structure is often set by hand, and only the emission ($B$) and (non-zero) $A$ transition probabilities are trained from a set of observation sequences $O$. Section ?? in Chapter 31 also discusses how initial $A$ and $B$ estimates are derived in speech recognition. We also show that for speech the forward-backward algorithm can be extended to inputs that are non-discrete ("continuous observation densities").
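To make Fig. 9.16 concrete, the following is a minimal NumPy sketch of the re-estimation loop, not the book's own implementation. It assumes observations are encoded as integer indices into V, replaces the book's dedicated start and end states ($q_0$, $q_F$) with an initial-distribution vector pi (so $\alpha_T(q_F)$ becomes the sum of the final forward column), and runs for a fixed number of iterations instead of testing convergence; the name forward_backward and the parameter n_iter are illustrative.

import numpy as np

def forward_backward(obs, N, M, n_iter=50, seed=0):
    """Baum-Welch sketch: obs is a length-T sequence of indices in [0, M)."""
    obs = np.asarray(obs)
    rng = np.random.default_rng(seed)
    T = len(obs)
    # Random row-stochastic initial estimates of A (N x N) and B (N x M).
    A = rng.random((N, N)); A /= A.sum(axis=1, keepdims=True)
    B = rng.random((N, M)); B /= B.sum(axis=1, keepdims=True)
    pi = np.full(N, 1.0 / N)              # assumed initial state distribution
    for _ in range(n_iter):
        # E-step: forward pass, alpha[t, j] = P(o_1..o_t, q_t = j)
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        # Backward pass, beta[t, i] = P(o_{t+1}..o_T | q_t = i)
        beta = np.zeros((T, N))
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        likelihood = alpha[-1].sum()      # plays the role of alpha_T(q_F)
        # gamma[t, j] = alpha_t(j) beta_t(j) / P(O | lambda)
        gamma = alpha * beta / likelihood
        # xi[t, i, j] = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O | lambda)
        xi = (alpha[:-1, :, None] * A[None, :, :]
              * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood
        # M-step: Eq. 9.38 and Eq. 9.43 as ratios of expected counts.
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for k in range(M):                # occupancy of state j while emitting v_k
            B[:, k] = gamma[obs == k].sum(axis=0)
        B /= gamma.sum(axis=0)[:, None]
        pi = gamma[0]
    return A, B

On long sequences the raw probability products underflow, so practical implementations rescale alpha and beta at each time step or work in log space; the sketch omits this for clarity. A toy call such as forward_backward([0, 1, 1, 0, 2, 1], N=2, M=3) fits a two-state model over a three-symbol vocabulary.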
9.6 Summary

This chapter introduced the hidden Markov model for probabilistic sequence classification.

• Hidden Markov models (HMMs) are a way of relating a sequence of observations to a sequence of hidden classes or hidden states that explain the observations.
• The process of discovering the sequence of hidden states, given the sequence of observations, is known as decoding or inference. The Viterbi algorithm is commonly used for decoding (a minimal decoding sketch follows the notes below).
• The parameters of an HMM are the A transition probability matrix and the B observation likelihood matrix. Both can be trained with the Baum-Welch or forward-backward algorithm.

Bibliographical and Historical Notes

As we discussed at the end of Chapter 4, Markov chains were first used by Markov (1913, 2006) to predict whether an upcoming letter in Pushkin's Eugene Onegin would be a vowel or a consonant.
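Since this excerpt shows training but not the decoding step, here is a matching minimal sketch of Viterbi decoding under the same assumed conventions as the forward-backward sketch above (integer-coded observations; an initial-distribution vector pi standing in for the book's start state); it is an illustration, not the book's pseudocode.

import numpy as np

def viterbi(obs, A, B, pi):
    """Return the most probable hidden-state sequence for obs."""
    obs = np.asarray(obs)
    T, N = len(obs), A.shape[0]
    v = np.zeros((T, N))              # v[t, j]: probability of the best path ending in j at t
    bp = np.zeros((T, N), dtype=int)  # backpointers for recovering that path
    v[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = v[t - 1][:, None] * A    # scores[i, j] = v[t-1, i] * a_ij
        bp[t] = scores.argmax(axis=0)     # best predecessor i for each state j
        v[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(v[-1].argmax())]          # best final state
    for t in range(T - 1, 0, -1):         # follow backpointers right to left
        path.append(int(bp[t, path[-1]]))
    return path[::-1]

As with forward-backward, a practical decoder works in log space so that long observation sequences do not underflow.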