# hmm - Hidden Markov Model Hidden Markov Model Jia Li...


Hidden Markov Model

Jia Li
Department of Statistics, The Pennsylvania State University
Email: [email protected]
http://www.stat.psu.edu/jiali

Hidden Markov models have a close connection with mixture models. A mixture model generates every sample independently: a hidden component is drawn first, and the observation is then drawn from that component's distribution.

For sequence or spatial data, the assumption of independent samples is too restrictive. The statistical dependence among samples may carry critical information. Examples:

- Speech signals
- Genomic sequences

## Model Setup

Suppose we have sequential data $u = \{u_1, u_2, \ldots, u_t, \ldots, u_T\}$, $u_t \in \mathbb{R}^d$. As in the mixture model, every $u_t$, $t = 1, \ldots, T$, is generated by a hidden state $s_t$.

The underlying states follow a Markov chain. Given the present, the future is independent of the past:

$$P(s_{t+1} \mid s_t, s_{t-1}, \ldots, s_1) = P(s_{t+1} \mid s_t).$$

Transition probabilities:

$$a_{k,l} = P(s_{t+1} = l \mid s_t = k), \quad k, l = 1, 2, \ldots, M,$$

where $M$ is the total number of states. Initial probabilities of states: $\pi_k$, $k = 1, \ldots, M$. These satisfy

$$\sum_{l=1}^{M} a_{k,l} = 1 \text{ for any } k, \qquad \sum_{k=1}^{M} \pi_k = 1.$$

The probability of a state sequence is

$$P(s_1, s_2, \ldots, s_T) = P(s_1) P(s_2 \mid s_1) P(s_3 \mid s_2) \cdots P(s_T \mid s_{T-1}) = \pi_{s_1} a_{s_1,s_2} a_{s_2,s_3} \cdots a_{s_{T-1},s_T}.$$

Given the state $s_t$, the observation $u_t$ is independent of other observations and states. For a fixed state, the observation $u_t$ is generated according to a fixed probability law.

Given state $k$, the probability law of $U$ is specified by $b_k(u)$.

- Discrete: if $U$ takes finitely many possible values, $b_k(u)$ is specified by the pmf (probability mass function).
- Continuous: most often the Gaussian distribution is assumed.
$$b_k(u) = \frac{1}{\sqrt{(2\pi)^d |\Sigma_k|}} \exp\left(-\frac{1}{2}(u - \mu_k)^t \Sigma_k^{-1} (u - \mu_k)\right)$$

In summary:

$$P(u, s) = P(s) P(u \mid s) = \pi_{s_1} b_{s_1}(u_1) a_{s_1,s_2} b_{s_2}(u_2) \cdots a_{s_{T-1},s_T} b_{s_T}(u_T).$$

By the total probability formula,

$$P(u) = \sum_{s} P(s) P(u \mid s) = \sum_{s} \pi_{s_1} b_{s_1}(u_1) a_{s_1,s_2} b_{s_2}(u_2) \cdots a_{s_{T-1},s_T} b_{s_T}(u_T).$$

## Example

Suppose we have a video sequence and would like to decide automatically whether a speaker is in each frame.

- Two underlying states: with a speaker (state 1) vs. without a speaker (state 2).
- From frame 1 to $T$, let $s_t$, $t = 1, \ldots, T$, denote whether there is a speaker in frame $t$.
- It does not seem appropriate to assume that the $s_t$'s are independent. We may assume the state sequence follows a Markov chain.
- If one frame contains a speaker, it is highly likely that the next frame also contains a speaker because of the strong frame-to-frame dependence. Likewise, a frame without a speaker is much more likely to be followed by another frame without a speaker.

For a computer program, the states are unknown. Only features can be extracted for each frame. The features are the observations, which can be organized into a vector.

The goal is to figure out the state sequence given the observed sequence of feature vectors.

We expect the probability distribution of the feature vector to differ according to the state. However, these distributions may overlap, causing classification errors.

By using the dependence among states, we may make better guesses of the states than by guessing each state separately using only the feature vector of that frame.

## Model Estimation

Parameters involved:

- Transition probabilities: $a_{k,l}$, $k, l = 1, \ldots, M$.
- Initial probabilities: $\pi_k$, $k = 1, \ldots, M$.
- For each state $k$: the mean $\mu_k$ and covariance $\Sigma_k$.
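With a set of parameters in hand, the factorization of $P(u, s)$ and the total probability formula for $P(u)$ can be checked directly on a toy model. The sketch below assumes a discrete-observation HMM (so $b_k(u)$ is a table rather than a Gaussian); the names `joint_prob` and `marginal_prob_bruteforce`, the 0-based indexing, and all numeric values are illustrative assumptions, not taken from the notes.

```python
from itertools import product

# Toy two-state HMM with two observation symbols (illustrative values only).
pi = [0.6, 0.4]            # initial probabilities pi_k
a = [[0.9, 0.1],           # transition probabilities a[k][l]
     [0.2, 0.8]]
b = [[0.7, 0.3],           # emission pmf b[k][u]
     [0.1, 0.9]]


def joint_prob(s, u):
    """P(u, s) = pi_{s1} b_{s1}(u1) * prod_{t>=2} a_{s(t-1),s(t)} b_{s(t)}(u_t)."""
    p = pi[s[0]] * b[s[0]][u[0]]
    for t in range(1, len(u)):
        p *= a[s[t - 1]][s[t]] * b[s[t]][u[t]]
    return p


def marginal_prob_bruteforce(u):
    """P(u) by summing P(u, s) over all M^T state sequences (exponential cost)."""
    M = len(pi)
    return sum(joint_prob(s, u) for s in product(range(M), repeat=len(u)))


u = [0, 0, 1]
print(joint_prob((0, 0, 1), u))
print(marginal_prob_bruteforce(u))
```

The brute-force sum visits $M^T$ sequences, so it is only usable for very short sequences; it is meant as a reference for checking faster methods.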
## Definitions

Under a given set of parameters, let $L_k(t)$ be the conditional probability of being in state $k$ at position $t$ given the entire observed sequence $u = \{u_1, u_2, \ldots, u_T\}$:

$$L_k(t) = P(s_t = k \mid u) = \sum_{s} P(s \mid u) I(s_t = k).$$

Under a given set of parameters, let $H_{k,l}(t)$ be the conditional probability of being in state $k$ at position $t$ and being in state $l$ at position $t + 1$, i.e., seeing a transition from $k$ to $l$ at $t$, given the entire observed sequence $u$:

$$H_{k,l}(t) = P(s_t = k, s_{t+1} = l \mid u) = \sum_{s} P(s \mid u) I(s_t = k) I(s_{t+1} = l).$$

Note that

$$L_k(t) = \sum_{l=1}^{M} H_{k,l}(t) \ \text{ for } t = 1, \ldots, T-1, \qquad \sum_{k=1}^{M} L_k(t) = 1.$$

Maximum likelihood estimation by EM:

- E step: Under the current set of parameters, compute $L_k(t)$ and $H_{k,l}(t)$, for $k, l = 1, \ldots, M$, $t = 1, \ldots, T$.
- M step: Update parameters.

$$\mu_k = \frac{\sum_{t=1}^{T} L_k(t) u_t}{\sum_{t=1}^{T} L_k(t)}$$

$$\Sigma_k = \frac{\sum_{t=1}^{T} L_k(t) (u_t - \mu_k)(u_t - \mu_k)^t}{\sum_{t=1}^{T} L_k(t)}$$

$$a_{k,l} = \frac{\sum_{t=1}^{T-1} H_{k,l}(t)}{\sum_{t=1}^{T-1} L_k(t)}$$

Note: the initial probabilities of states $\pi_k$ are often manually determined. We can also estimate them by

$$\pi_k \propto \sum_{t=1}^{T} L_k(t) \quad \text{or} \quad \pi_k \propto L_k(1), \qquad \sum_{k=1}^{M} \pi_k = 1.$$

## Comparison with the Mixture Model

$L_k(t)$ plays the same role as the posterior probability of a component (state) given the observation, i.e., $p_{t,k}$:

$$L_k(t) = P(s_t = k \mid u_1, u_2, \ldots, u_t, \ldots, u_T), \qquad p_{t,k} = P(s_t = k \mid u_t).$$

If we view a mixture model as a special hidden Markov model whose underlying state process is i.i.d. (a reduced Markov chain), $p_{t,k}$ is exactly $L_k(t)$.

The posterior probabilities $p_{t,k}$ in the mixture model can be determined using only the sample $u_t$ because of the independent sample assumption. $L_k(t)$ depends on the entire sequence because of the underlying Markov process.
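The M-step updates for $\mu_k$, $\Sigma_k$ and $a_{k,l}$ can be sketched in a few lines. To stay within the Python standard library, this sketch assumes scalar observations ($d = 1$), so $\Sigma_k$ reduces to a variance. The function name `m_step`, the 0-based indexing, and the layout of `L` and `H` as nested lists are assumptions made for this illustration.

```python
def m_step(u, L, H):
    """One M-step for scalar observations.

    u: list of T floats.
    L[t][k] = L_k(t) for t = 0..T-1.
    H[t][k][l] = H_{k,l}(t) for t = 0..T-2.
    Returns (mu, var, a). Assumes every state has nonzero total weight.
    """
    T, M = len(u), len(L[0])
    mu, var = [], []
    for k in range(M):
        weight = sum(L[t][k] for t in range(T))
        m = sum(L[t][k] * u[t] for t in range(T)) / weight
        v = sum(L[t][k] * (u[t] - m) ** 2 for t in range(T)) / weight
        mu.append(m)
        var.append(v)
    a = []
    for k in range(M):
        # The denominator only runs to T-1, matching the transition update.
        denom = sum(L[t][k] for t in range(T - 1))
        a.append([sum(H[t][k][l] for t in range(T - 1)) / denom
                  for l in range(M)])
    return mu, var, a


# Hard (0/1) posteriors for the state path 0, 0, 1, 1; values are illustrative.
u = [0.0, 2.0, 10.0, 12.0]
L = [[1, 0], [1, 0], [0, 1], [0, 1]]
H = [[[1, 0], [0, 0]],   # transition 0 -> 0 at t = 0
     [[0, 1], [0, 0]],   # transition 0 -> 1 at t = 1
     [[0, 0], [0, 1]]]   # transition 1 -> 1 at t = 2
mu, var, a = m_step(u, L, H)
print(mu, var, a)
```

Passing per-sample posteriors $p_{t,k}$ in place of `L` gives the corresponding mean and variance updates for a mixture model.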
For a mixture model, we have

$$\mu_k = \frac{\sum_{t=1}^{T} p_{t,k} u_t}{\sum_{t=1}^{T} p_{t,k}}, \qquad \Sigma_k = \frac{\sum_{t=1}^{T} p_{t,k} (u_t - \mu_k)(u_t - \mu_k)^t}{\sum_{t=1}^{T} p_{t,k}}.$$

## Derivation from EM

The incomplete data are $u = \{u_t : t = 1, \ldots, T\}$. The complete data are $x = \{s_t, u_t : t = 1, \ldots, T\}$.

Note $Q(\theta' \mid \theta) = E(\log f(x \mid \theta') \mid u, \theta)$, where $\theta$ denotes the current parameters and $\theta'$ the parameters being optimized.

Let $\mathcal{M} = \{1, 2, \ldots, M\}$.

The function $f(x \mid \theta')$ is

$$f(x \mid \theta') = P(s \mid \theta') P(u \mid s, \theta') = P(s \mid a'_{k,l} : k, l \in \mathcal{M}) \, P(u \mid s, \mu'_k, \Sigma'_k : k \in \mathcal{M}) = \pi'_{s_1} \prod_{t=2}^{T} a'_{s_{t-1},s_t} \times \prod_{t=1}^{T} P(u_t \mid \mu'_{s_t}, \Sigma'_{s_t}).$$

We then have

$$\log f(x \mid \theta') = \log(\pi'_{s_1}) + \sum_{t=2}^{T} \log a'_{s_{t-1},s_t} + \sum_{t=1}^{T} \log P(u_t \mid \mu'_{s_t}, \Sigma'_{s_t}) \qquad (1)$$

Taking the conditional expectation:

$$E(\log f(x \mid \theta') \mid u, \theta) = \sum_{s} P(s \mid u, \theta) \left[ \log(\pi'_{s_1}) + \sum_{t=2}^{T} \log a'_{s_{t-1},s_t} + \sum_{t=1}^{T} \log P(u_t \mid \mu'_{s_t}, \Sigma'_{s_t}) \right]$$

$$= \sum_{k=1}^{M} L_k(1) \log(\pi'_k) + \sum_{t=2}^{T} \sum_{k=1}^{M} \sum_{l=1}^{M} H_{k,l}(t-1) \log a'_{k,l} + \sum_{t=1}^{T} \sum_{k=1}^{M} L_k(t) \log P(u_t \mid \mu'_k, \Sigma'_k).$$

Here the transition from position $t-1$ to $t$ is counted by $H_{k,l}(t-1)$, in line with the definition $H_{k,l}(t) = P(s_t = k, s_{t+1} = l \mid u)$; the middle term equals $\sum_{t=1}^{T-1} \sum_{k} \sum_{l} H_{k,l}(t) \log a'_{k,l}$.

Proof of the equality for the second term, that is,

$$\sum_{s} P(s \mid u, \theta) \sum_{t=2}^{T} \log a'_{s_{t-1},s_t} = \sum_{t=2}^{T} \sum_{k=1}^{M} \sum_{l=1}^{M} H_{k,l}(t-1) \log a'_{k,l}.$$

A similar proof applies to the equalities for the other terms.

$$\sum_{s} P(s \mid u, \theta) \sum_{t=2}^{T} \log a'_{s_{t-1},s_t} = \sum_{s} P(s \mid u, \theta) \sum_{t=2}^{T} \sum_{k=1}^{M} \sum_{l=1}^{M} I(s_{t-1} = k) I(s_t = l) \log a'_{k,l}$$

$$= \sum_{t=2}^{T} \sum_{k=1}^{M} \sum_{l=1}^{M} \left[ \sum_{s} P(s \mid u, \theta) I(s_{t-1} = k) I(s_t = l) \right] \log a'_{k,l}$$

$$= \sum_{t=2}^{T} \sum_{k=1}^{M} \sum_{l=1}^{M} H_{k,l}(t-1) \log a'_{k,l}.$$

The maximization of the above expectation gives the update formulas in the M-step.

Note that the optimization of $\mu'_k, \Sigma'_k$ can be separated from that of $a'_{k,l}$ and $\pi'_k$. The optimization of $a'_{k,l}$ can be separated for different $k$.

The optimization of $\mu'_k$ and $\Sigma'_k$ is the same as for the mixture model with $p_{t,k}$ replaced by $L_k(t)$.

## Forward-Backward Algorithm

The forward-backward algorithm is used to compute $L_k(t)$ and $H_{k,l}(t)$ efficiently.
The amount of computation needed is of order $M^2 T$. The memory required is of order $MT$.

Define the forward probability $\alpha_k(t)$ as the joint probability of observing the first $t$ vectors $u_\tau$, $\tau = 1, \ldots, t$, and being in state $k$ at time $t$:

$$\alpha_k(t) = P(u_1, u_2, \ldots, u_t, s_t = k).$$

This probability can be evaluated by the following recursive formula:

$$\alpha_k(1) = \pi_k b_k(u_1), \quad 1 \le k \le M,$$

$$\alpha_k(t) = b_k(u_t) \sum_{l=1}^{M} \alpha_l(t-1) a_{l,k}, \quad 1 < t \le T, \ 1 \le k \le M.$$

Proof:

$$\alpha_k(t) = P(u_1, u_2, \ldots, u_t, s_t = k)$$

$$= \sum_{l=1}^{M} P(u_1, u_2, \ldots, u_t, s_t = k, s_{t-1} = l)$$

$$= \sum_{l=1}^{M} P(u_1, \ldots, u_{t-1}, s_{t-1} = l) \, P(u_t, s_t = k \mid s_{t-1} = l, u_1, \ldots, u_{t-1})$$

$$= \sum_{l=1}^{M} \alpha_l(t-1) P(u_t, s_t = k \mid s_{t-1} = l)$$

$$= \sum_{l=1}^{M} \alpha_l(t-1) P(u_t \mid s_t = k, s_{t-1} = l) \, P(s_t = k \mid s_{t-1} = l)$$

$$= \sum_{l=1}^{M} \alpha_l(t-1) P(u_t \mid s_t = k) P(s_t = k \mid s_{t-1} = l)$$

$$= \sum_{l=1}^{M} \alpha_l(t-1) b_k(u_t) a_{l,k}.$$

The fourth equality (dropping $u_1, \ldots, u_{t-1}$ from the conditioning) comes from the fact that, given $s_{t-1}$, $s_t$ is independent of all $s_\tau$, $\tau = 1, 2, \ldots, t-2$, and hence of $u_\tau$, $\tau = 1, \ldots, t-2$. Also, $s_t$ is independent of $u_{t-1}$ since $s_{t-1}$ is given.

Define the backward probability $\beta_k(t)$ as the conditional probability of observing the vectors after time $t$, $u_\tau$, $\tau = t+1, \ldots, T$, given that the state at time $t$ is $k$:

$$\beta_k(t) = P(u_{t+1}, \ldots, u_T \mid s_t = k), \quad 1 \le t \le T-1.$$

Set $\beta_k(T) = 1$ for all $k$.

As with the forward probability, the backward probability can be evaluated using the following recursion:

$$\beta_k(T) = 1,$$

$$\beta_k(t) = \sum_{l=1}^{M} a_{k,l} b_l(u_{t+1}) \beta_l(t+1), \quad 1 \le t < T.$$
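As a rough illustration of the two recursions above, the following Python sketch implements them for a discrete-observation HMM. The function names `forward` and `backward`, the 0-based indexing, and the toy parameter values are assumptions made for this example, not part of the notes.

```python
def forward(pi, a, b, u):
    """alpha[t][k] = P(u_1..u_{t+1}, s_{t+1} = k), with 0-based t and k."""
    M, T = len(pi), len(u)
    alpha = [[pi[k] * b[k][u[0]] for k in range(M)]]
    for t in range(1, T):
        prev = alpha[-1]
        alpha.append([b[k][u[t]] * sum(prev[l] * a[l][k] for l in range(M))
                      for k in range(M)])
    return alpha


def backward(a, b, u):
    """beta[t][k] = P(observations after position t | state k at t); last row is 1."""
    M, T = len(a), len(u)
    beta = [[1.0] * M for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for k in range(M):
            beta[t][k] = sum(a[k][l] * b[l][u[t + 1]] * beta[t + 1][l]
                             for l in range(M))
    return beta


# Toy two-state model with two observation symbols (illustrative values only).
pi = [0.6, 0.4]
a = [[0.9, 0.1], [0.2, 0.8]]
b = [[0.7, 0.3], [0.1, 0.9]]
u = [0, 0, 1]

alpha = forward(pi, a, b, u)
beta = backward(a, b, u)
# sum_k alpha_k(t) * beta_k(t) should give the same value at every t.
print([sum(x * y for x, y in zip(alpha[t], beta[t])) for t in range(len(u))])
```

Each step costs $M^2$ operations, giving the $M^2 T$ total. For long sequences the raw probabilities become very small, so implementations commonly rescale at each step or work in log space; that refinement is omitted here.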
Proof:

$$\beta_k(t) = P(u_{t+1}, \ldots, u_T \mid s_t = k)$$

$$= \sum_{l=1}^{M} P(u_{t+1}, \ldots, u_T, s_{t+1} = l \mid s_t = k)$$

$$= \sum_{l=1}^{M} P(s_{t+1} = l \mid s_t = k) P(u_{t+1}, \ldots, u_T \mid s_{t+1} = l, s_t = k)$$

$$= \sum_{l=1}^{M} a_{k,l} P(u_{t+1}, \ldots, u_T \mid s_{t+1} = l)$$

$$= \sum_{l=1}^{M} a_{k,l} P(u_{t+1} \mid s_{t+1} = l) P(u_{t+2}, \ldots, u_T \mid s_{t+1} = l, u_{t+1})$$

$$= \sum_{l=1}^{M} a_{k,l} P(u_{t+1} \mid s_{t+1} = l) P(u_{t+2}, \ldots, u_T \mid s_{t+1} = l)$$

$$= \sum_{l=1}^{M} a_{k,l} b_l(u_{t+1}) \beta_l(t+1).$$

The probabilities $L_k(t)$ and $H_{k,l}(t)$ are obtained from

$$L_k(t) = P(s_t = k \mid u) = \frac{P(u, s_t = k)}{P(u)} = \frac{1}{P(u)} \alpha_k(t) \beta_k(t),$$

$$H_{k,l}(t) = P(s_t = k, s_{t+1} = l \mid u) = \frac{P(u, s_t = k, s_{t+1} = l)}{P(u)} = \frac{1}{P(u)} \alpha_k(t) a_{k,l} b_l(u_{t+1}) \beta_l(t+1).$$

Proof for $L_k(t)$:

$$P(u, s_t = k) = P(u_1, \ldots, u_t, \ldots, u_T, s_t = k)$$

$$= P(u_1, \ldots, u_t, s_t = k) P(u_{t+1}, \ldots, u_T \mid s_t = k, u_1, \ldots, u_t)$$

$$= \alpha_k(t) P(u_{t+1}, \ldots, u_T \mid s_t = k)$$

$$= \alpha_k(t) \beta_k(t).$$

Proof for $H_{k,l}(t)$:

$$P(u, s_t = k, s_{t+1} = l) = P(u_1, \ldots, u_t, \ldots, u_T, s_t = k, s_{t+1} = l)$$

$$= P(u_1, \ldots, u_t, s_t = k) \, P(u_{t+1}, s_{t+1} = l \mid s_t = k, u_1, \ldots, u_t) \, P(u_{t+2}, \ldots, u_T \mid s_{t+1} = l, s_t = k, u_1, \ldots, u_{t+1})$$

$$= \alpha_k(t) P(u_{t+1}, s_{t+1} = l \mid s_t = k) P(u_{t+2}, \ldots, u_T \mid s_{t+1} = l)$$

$$= \alpha_k(t) P(s_{t+1} = l \mid s_t = k) P(u_{t+1} \mid s_{t+1} = l, s_t = k) \beta_l(t+1)$$

$$= \alpha_k(t) a_{k,l} P(u_{t+1} \mid s_{t+1} = l) \beta_l(t+1)$$

$$= \alpha_k(t) a_{k,l} b_l(u_{t+1}) \beta_l(t+1).$$

Note that the amount of computation for $L_k(t)$ and $H_{k,l}(t)$, $k, l = 1, \ldots, M$, $t = 1, \ldots, T$, is of order $M^2 T$.

Note:

$$P(u) = \sum_{k=1}^{M} \alpha_k(t) \beta_k(t) \quad \text{for any } t.$$

In particular, if we let $t = T$,

$$P(u) = \sum_{k=1}^{M} \alpha_k(T) \beta_k(T) = \sum_{k=1}^{M} \alpha_k(T).$$
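The formulas for $L_k(t)$ and $H_{k,l}(t)$ can be checked numerically with a short, self-contained sketch. It again assumes a discrete-observation HMM; the function name `posteriors`, the 0-based indexing, and the toy values are hypothetical choices for this illustration.

```python
def posteriors(pi, a, b, u):
    """Return (L, H, P_u) for a discrete-observation HMM.

    L[t][k] = P(s_t = k | u); H[t][k][l] = P(s_t = k, s_{t+1} = l | u),
    using 0-based positions, so H has T-1 entries.
    """
    M, T = len(pi), len(u)
    # Forward pass.
    alpha = [[pi[k] * b[k][u[0]] for k in range(M)]]
    for t in range(1, T):
        alpha.append([b[k][u[t]] * sum(alpha[t - 1][l] * a[l][k] for l in range(M))
                      for k in range(M)])
    # Backward pass.
    beta = [[1.0] * M for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(a[k][l] * b[l][u[t + 1]] * beta[t + 1][l] for l in range(M))
                   for k in range(M)]
    # P(u) from the last forward column, where beta is 1.
    P_u = sum(alpha[T - 1])
    L = [[alpha[t][k] * beta[t][k] / P_u for k in range(M)] for t in range(T)]
    H = [[[alpha[t][k] * a[k][l] * b[l][u[t + 1]] * beta[t + 1][l] / P_u
           for l in range(M)] for k in range(M)] for t in range(T - 1)]
    return L, H, P_u


# Toy parameters (illustrative values only).
pi = [0.6, 0.4]
a = [[0.9, 0.1], [0.2, 0.8]]
b = [[0.7, 0.3], [0.1, 0.9]]
u = [0, 0, 1]

L, H, P_u = posteriors(pi, a, b, u)
print(P_u)
print(L)
```

Two quick sanity checks follow from the definitions: each row of `L` sums to 1, and summing `H[t][k]` over `l` recovers `L[t][k]`.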
Proof:

$$P(u) = P(u_1, \ldots, u_t, \ldots, u_T)$$

$$= \sum_{k=1}^{M} P(u_1, \ldots, u_t, \ldots, u_T, s_t = k)$$

$$= \sum_{k=1}^{M} P(u_1, \ldots, u_t, s_t = k) P(u_{t+1}, \ldots, u_T \mid s_t = k, u_1, \ldots, u_t)$$

$$= \sum_{k=1}^{M} \alpha_k(t) P(u_{t+1}, \ldots, u_T \mid s_t = k)$$

$$= \sum_{k=1}^{M} \alpha_k(t) \beta_k(t).$$

## The Estimation Algorithm

The estimation algorithm iterates the following steps:

1. Compute the forward and backward probabilities $\alpha_k(t)$, $\beta_k(t)$, $k = 1, \ldots, M$, $t = 1, \ldots, T$, under the current set of parameters.

$$\alpha_k(1) = \pi_k b_k(u_1), \quad 1 \le k \le M,$$

$$\alpha_k(t) = b_k(u_t) \sum_{l=1}^{M} \alpha_l(t-1) a_{l,k}, \quad 1 < t \le T, \ 1 \le k \le M,$$

$$\beta_k(T) = 1,$$

$$\beta_k(t) = \sum_{l=1}^{M} a_{k,l} b_l(u_{t+1}) \beta_l(t+1), \quad 1 \le t < T.$$

2. Compute $L_k(t)$, $H_{k,l}(t)$ using $\alpha_k(t)$, $\beta_k(t)$. Let $P(u) = \sum_{k=1}^{M} \alpha_k(1) \beta_k(1)$.

$$L_k(t) = \frac{1}{P(u)} \alpha_k(t) \beta_k(t),$$

$$H_{k,l}(t) = \frac{1}{P(u)} \alpha_k(t) a_{k,l} b_l(u_{t+1}) \beta_l(t+1).$$

3. Update the parameters using $L_k(t)$, $H_{k,l}(t)$.

$$\mu_k = \frac{\sum_{t=1}^{T} L_k(t) u_t}{\sum_{t=1}^{T} L_k(t)}$$

$$\Sigma_k = \frac{\sum_{t=1}^{T} L_k(t) (u_t - \mu_k)(u_t - \mu_k)^t}{\sum_{t=1}^{T} L_k(t)}$$

$$a_{k,l} = \frac{\sum_{t=1}^{T-1} H_{k,l}(t)}{\sum_{t=1}^{T-1} L_k(t)}$$

## Multiple Sequences

If we estimate an HMM using multiple sequences, the previous estimation algorithm can be extended naturally.

For brevity, assume all the sequences are of length $T$. Denote the $i$th sequence by $u_i = \{u_{i,1}, u_{i,2}, \ldots, u_{i,T}\}$, $i = 1, \ldots, N$.

In each iteration, we compute the forward and backward probabilities for each sequence separately, in the same way as previously described.

Compute $L_k(t)$ and $H_{k,l}(t)$ separately for each sequence, also in the same way as previously described.

Update parameters similarly.

Compute the forward and backward probabilities $\alpha_k^{(i)}(t)$, $\beta_k^{(i)}(t)$, $k = 1, \ldots, M$, $t = 1, \ldots, T$, $i = 1, \ldots, N$, under the current set of parameters.

$$\alpha_k^{(i)}(1) = \pi_k b_k(u_{i,1}), \quad 1 \le k \le M, \ 1 \le i \le N.$$
$$\alpha_k^{(i)}(t) = b_k(u_{i,t}) \sum_{l=1}^{M} \alpha_l^{(i)}(t-1) a_{l,k}, \quad 1 < t \le T, \ 1 \le k \le M, \ 1 \le i \le N.$$

$$\beta_k^{(i)}(T) = 1, \quad 1 \le k \le M, \ 1 \le i \le N.$$

$$\beta_k^{(i)}(t) = \sum_{l=1}^{M} a_{k,l} b_l(u_{i,t+1}) \beta_l^{(i)}(t+1), \quad 1 \le t < T, \ 1 \le k \le M, \ 1 \le i \le N.$$

Compute $L_k^{(i)}(t)$, $H_{k,l}^{(i)}(t)$ using $\alpha_k^{(i)}(t)$, $\beta_k^{(i)}(t)$. Let $P(u_i) = \sum_{k=1}^{M} \alpha_k^{(i)}(1) \beta_k^{(i)}(1)$.

$$L_k^{(i)}(t) = \frac{1}{P(u_i)} \alpha_k^{(i)}(t) \beta_k^{(i)}(t),$$

$$H_{k,l}^{(i)}(t) = \frac{1}{P(u_i)} \alpha_k^{(i)}(t) a_{k,l} b_l(u_{i,t+1}) \beta_l^{(i)}(t+1).$$

Update the parameters using $L_k^{(i)}(t)$, $H_{k,l}^{(i)}(t)$.

$$\mu_k = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T} L_k^{(i)}(t) u_{i,t}}{\sum_{i=1}^{N} \sum_{t=1}^{T} L_k^{(i)}(t)}$$

$$\Sigma_k = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T} L_k^{(i)}(t) (u_{i,t} - \mu_k)(u_{i,t} - \mu_k)^t}{\sum_{i=1}^{N} \sum_{t=1}^{T} L_k^{(i)}(t)}$$

$$a_{k,l} = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T-1} H_{k,l}^{(i)}(t)}{\sum_{i=1}^{N} \sum_{t=1}^{T-1} L_k^{(i)}(t)}$$

## HMM with Discrete Data

Given a state $k$, the distribution of the data $U$ is discrete, specified by a pmf.

Assume $U \in \mathcal{U} = \{1, 2, \ldots, J\}$. Denote $b_k(j) = q_{k,j}$, $j = 1, \ldots, J$.

Parameters in the HMM: $a_{k,l}$ and $q_{k,j}$, $k, l = 1, \ldots, M$, $j = 1, \ldots, J$.

Model estimation proceeds by the following iteration:

1. Compute the forward and backward probabilities $\alpha_k(t)$, $\beta_k(t)$. Note that $b_k(u_t) = q_{k,u_t}$.
2. Compute $L_k(t)$, $H_{k,l}(t)$ using $\alpha_k(t)$, $\beta_k(t)$.
3. Update the parameters as follows:

$$a_{k,l} = \frac{\sum_{t=1}^{T-1} H_{k,l}(t)}{\sum_{t=1}^{T-1} L_k(t)}, \quad k, l = 1, \ldots, M,$$

$$q_{k,j} = \frac{\sum_{t=1}^{T} L_k(t) I(u_t = j)}{\sum_{t=1}^{T} L_k(t)}, \quad k = 1, \ldots, M; \ j = 1, \ldots, J.$$

## Viterbi Algorithm

In many applications using HMMs, we need to predict the state sequence $s = \{s_1, \ldots, s_T\}$ based on the observed data $u = \{u_1, \ldots, u_T\}$.

Optimization criterion: find the $s$ that maximizes $P(s \mid u)$:

$$s^* = \arg\max_{s} P(s \mid u) = \arg\max_{s} \frac{P(s, u)}{P(u)} = \arg\max_{s} P(s, u).$$

This criterion is called the rule of Maximum A Posteriori (MAP).

The optimal sequence $\{s_1^*, s_2^*, \ldots, s_T^*\}$ can be found by the Viterbi algorithm.
The amount of computation in the Viterbi algorithm is of order $M^2 T$. The memory required is of order $MT$.

The Viterbi algorithm maximizes an objective function $G(s)$, where $s = \{s_1, \ldots, s_T\}$, $s_t \in \{1, \ldots, M\}$, is a state sequence and $G(s)$ has a special property.

Brute-force optimization of $G(s)$ involves an exhaustive search of all the $M^T$ possible sequences.

Property of $G(s)$:

$$G(s) = g_1(s_1) + g_2(s_2, s_1) + g_3(s_3, s_2) + \cdots + g_T(s_T, s_{T-1}).$$

The key is that the objective function can be written as a sum of "merit" functions, each depending on one state and its preceding one.

A Markovian kind of property: suppose that in the optimal state sequence $s^*$, the $t$th position is $s_t^* = k$. To maximize $G(s_1, s_2, \ldots, s_T)$, we can maximize the following two functions separately:

$$G_{t,k}(s_1, \ldots, s_{t-1}) = g_1(s_1) + g_2(s_2, s_1) + \cdots + g_t(k, s_{t-1}),$$

$$\bar{G}_{t,k}(s_{t+1}, \ldots, s_T) = g_{t+1}(s_{t+1}, k) + \cdots + g_T(s_T, s_{T-1}).$$

The first function involves only states before $t$, and the second only states after $t$.

Also note the recursion of $G_{t,k}(s_1, \ldots, s_{t-1})$:

$$G_{t,l}(s_1, \ldots, s_{t-2}, k) = G_{t-1,k}(s_1, \ldots, s_{t-2}) + g_t(l, k).$$

[Figure: trellis of states over time, with one path colored.]

Every state sequence $s$ corresponds to a path from $t = 1$ to $t = T$.

We put weight $g_t(k, l)$ on the link from state $l$ at $t - 1$ to state $k$ at $t$. At the starting node, we put weight $g_1(k)$ for state $k$.

$G(s)$ is the sum of the weights on the links in path $s$.

In the figure, suppose the colored path is the optimal one. At $t = 3$, this path passes through state 2. Then the sub-path before $t = 3$ should be the best among all paths from $t = 1$ to $t = 3$ that end at state 2. The sub-path after $t = 3$ should be the best among all paths from $t = 3$ to $t = 6$ that start at state 2.
## How the Viterbi Algorithm Works (Pseudocode)

- At $t = 1$, for each node (state) $k = 1, \ldots, M$, record $G_{1,k}^* = g_1(k)$.
- At $t = 2$, for each node $k = 1, \ldots, M$, we only need to record which node is the best preceding one. Suppose node $k$ is linked to node $l^*$ at $t = 1$; record $l^*$ and $G_{2,k}^* = \max_{l=1,2,\ldots,M} [G_{1,l}^* + g_2(k, l)] = G_{1,l^*}^* + g_2(k, l^*)$.
- The same procedure is applied successively for $t = 2, 3, \ldots, T$. At every node, link it to its best preceding one. Set $G_{t,k}^* = \max_{l=1,2,\ldots,M} [G_{t-1,l}^* + g_t(k, l)] = G_{t-1,l^*}^* + g_t(k, l^*)$. $G_{t,k}^*$ is the sum of weights of the best path up to $t$ with its end tied at state $k$, and $l^*$ is the best preceding state. Record $l^*$ and $G_{t,k}^*$.
- At the end, only $M$ paths are formed, each ending with a different state at $t = T$. The objective function for a path ending at node $k$ is $G_{T,k}^*$. Pick the $k^*$ that maximizes $G_{T,k}^*$.
- Trace the path backwards from the last state $k^*$.

## Proof for the Viterbi Algorithm

Notation:

Let $s^*(t, k)$ be the sequence $\{s_1, \ldots, s_{t-1}\}$ that maximizes $G_{t,k}(s_1, \ldots, s_{t-1})$:

$$s^*(t, k) = \arg\max_{s_1, \ldots, s_{t-1}} G_{t,k}(s_1, \ldots, s_{t-1}).$$

Let $G_{t,k}^* = \max_{s_1, \ldots, s_{t-1}} G_{t,k}(s_1, \ldots, s_{t-1})$.

Let $\bar{s}^*(t, k)$ be the sequence $\{s_{t+1}, \ldots, s_T\}$ that maximizes $\bar{G}_{t,k}(s_{t+1}, \ldots, s_T)$:

$$\bar{s}^*(t, k) = \arg\max_{s_{t+1}, \ldots, s_T} \bar{G}_{t,k}(s_{t+1}, \ldots, s_T).$$

Key facts for proving the Viterbi algorithm:

If the optimal state sequence $s^*$ has the last state $s_T^* = k$, then the subsequence of $s^*$ from 1 to $T - 1$ should be $s^*(T, k)$ and

$$\max_{s} G(s) = G_{T,k}(s^*(T, k)).$$

Since we do not know what $s_T^*$ should be, we compare all the possible states $k = 1, \ldots, M$:

$$\max_{s} G(s) = \max_{k} G_{T,k}(s^*(T, k)).$$

$G_{t,k}(s^*(t, k))$ and $s^*(t, k)$ can be obtained recursively for $t = 1, \ldots, T$.
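The pseudocode steps above can be sketched in Python for generic merit functions. The function name `viterbi`, its signature, and the 0-based state labels are assumptions for this illustration. The usage example plugs in log-probabilities of a toy discrete HMM as merit functions, with made-up parameter values.

```python
from math import log


def viterbi(M, T, g1, g):
    """Maximize G(s) = g1(s_1) + sum_{t=2..T} g(t, s_t, s_{t-1}).

    States are 0..M-1. g(t, k, l) is the weight on the link from state l
    at t-1 to state k at t, with t counted from 1 as in the text.
    Returns (best value, best path as a list of T states).
    """
    G = [g1(k) for k in range(M)]        # best score of paths ending at each k
    back = []                            # back[i][k]: best predecessor of k
    for t in range(2, T + 1):
        new_G, ptr = [], []
        for k in range(M):
            best_l = max(range(M), key=lambda l: G[l] + g(t, k, l))
            ptr.append(best_l)
            new_G.append(G[best_l] + g(t, k, best_l))
        G = new_G
        back.append(ptr)
    k = max(range(M), key=lambda j: G[j])  # best final state
    best = G[k]
    path = [k]
    for ptr in reversed(back):             # trace the path backwards
        k = ptr[k]
        path.append(k)
    path.reverse()
    return best, path


# Toy discrete HMM (illustrative values only); u is a 0-based list.
pi = [0.6, 0.4]
a = [[0.9, 0.1], [0.2, 0.8]]
b = [[0.7, 0.3], [0.1, 0.9]]
u = [0, 0, 1]
M, T = 2, len(u)


def g1(k):
    return log(pi[k]) + log(b[k][u[0]])


def g(t, k, l):
    # Link from state l at t-1 to state k at t: log a_{l,k} + log b_k(u_t).
    return log(a[l][k]) + log(b[k][u[t - 1]])


best, path = viterbi(M, T, g1, g)
print(best, path)
```

Only the current column of scores and one back-pointer per node are kept, which matches the $M^2 T$ computation and $MT$ memory orders.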
Proof for the recursion:

Suppose $G_{t-1,k}(s^*(t-1, k))$ and $s^*(t-1, k)$ for $k = 1, \ldots, M$ have been obtained. For any $l = 1, \ldots, M$:

$$G_{t,l}(s^*(t, l)) = \max_{s_1, \ldots, s_{t-1}} G_{t,l}(s_1, \ldots, s_{t-1})$$

$$= \max_{k} \max_{s_1, \ldots, s_{t-2}} G_{t,l}(s_1, \ldots, s_{t-2}, k)$$

$$= \max_{k} \max_{s_1, \ldots, s_{t-2}} \left( G_{t-1,k}(s_1, \ldots, s_{t-2}) + g_t(l, k) \right)$$

$$= \max_{k} \left( g_t(l, k) + \max_{s_1, \ldots, s_{t-2}} G_{t-1,k}(s_1, \ldots, s_{t-2}) \right)$$

$$= \max_{k} \left( g_t(l, k) + G_{t-1,k}(s^*(t-1, k)) \right).$$

Suppose $k^*$ achieves the maximum, that is, $k^* = \arg\max_{k} \left( g_t(l, k) + G_{t-1,k}(s^*(t-1, k)) \right)$. Then $s^*(t, l) = \{s^*(t-1, k^*), k^*\}$; that is, for $s^*(t, l)$, the last state is $s_{t-1} = k^*$ and the subsequence from position 1 to $t - 2$ is $s^*(t-1, k^*)$.

The amount of computation involved in deciding $G_{t,l}(s^*(t, l))$ and $s^*(t, l)$ for all $l = 1, \ldots, M$ is of order $M^2$. For each $l$, we have to exhaust $M$ possible $k$'s to find $k^*$.

To start the recursion, we have

$$G_{1,k}(\cdot) = g_1(k), \qquad s^*(1, k) = \{\}.$$

Note: at $t = 1$, there is no preceding state.

## Optimal State Sequence for HMM

We want to find the optimal state sequence $s^*$:

$$s^* = \arg\max_{s} P(s, u) = \arg\max_{s} \log P(s, u).$$

The objective function:

$$G(s) = \log P(s, u) = \log[\pi_{s_1} b_{s_1}(u_1) a_{s_1,s_2} b_{s_2}(u_2) \cdots a_{s_{T-1},s_T} b_{s_T}(u_T)]$$

$$= [\log \pi_{s_1} + \log b_{s_1}(u_1)] + [\log a_{s_1,s_2} + \log b_{s_2}(u_2)] + \cdots + [\log a_{s_{T-1},s_T} + \log b_{s_T}(u_T)].$$

If we define

$$g_1(s_1) = \log \pi_{s_1} + \log b_{s_1}(u_1),$$

$$g_t(s_t, s_{t-1}) = \log a_{s_{t-1},s_t} + \log b_{s_t}(u_t),$$

then $G(s) = g_1(s_1) + \sum_{t=2}^{T} g_t(s_t, s_{t-1})$. Hence, the Viterbi algorithm can be applied.

## Viterbi Training

Viterbi training of an HMM resembles classification EM estimation of a mixture model.

Replace the "soft" classification reflected by $L_k(t)$ and $H_{k,l}(t)$ by "hard" classification.
In particular:

- Replace the step of computing forward and backward probabilities by selecting the optimal state sequence $s^*$ under the current parameters using the Viterbi algorithm.
- Let $L_k(t) = I(s_t^* = k)$, i.e., $L_k(t)$ equals 1 when the optimal state sequence is in state $k$ at $t$, and zero otherwise. Similarly, let $H_{k,l}(t) = I(s_t^* = k) I(s_{t+1}^* = l)$, in line with the definition of $H_{k,l}(t)$ as a transition from $k$ at $t$ to $l$ at $t + 1$.
- Update parameters using $L_k(t)$ and $H_{k,l}(t)$ and the same formulas.

## Applications

Speech recognition:

- Goal: identify the words spoken according to speech signals.
  - Automatic voice recognition systems used by airline companies
  - Automatic stock price reporting
- Raw data: voice amplitude sampled at discrete time spots (a time sequence).
- Input data: speech feature vectors computed at the sampling times.

Methodology:

- Estimate a hidden Markov model (HMM) for each word, e.g., State College, San Francisco, Pittsburgh. The training provides a dictionary of models $\{W_1, W_2, \ldots\}$.
- For a new word, find the HMM that yields the maximum likelihood. Denote the sequence of feature vectors extracted for this voice signal by $u = \{u_1, \ldots, u_T\}$. Classify to word $i^*$ if $W_{i^*}$ maximizes $P(u \mid W_i)$.
- Recall that $P(u) = \sum_{k=1}^{M} \alpha_k(T)$, where $\alpha_k(T)$ are the forward probabilities at $t = T$, computed using the parameters specified by $W_i$.

In the above example, the HMM is used for "profiling". Similar ideas have been applied to genomic sequence analysis, e.g., profiling families of protein sequences by HMMs.

Supervised learning:

Use image classification as an example. The image is segmented into man-made and natural regions.

Training data: the original images and their manually labeled segmentation.

Associate each block in the image with a class label.
A block is the basic element of interest for learning.

At each block, compute a feature vector that is anticipated to reflect the difference between the two classes (man-made vs. natural).

For the purpose of classification, each image is an array of feature vectors, whose true classes are known in training.

If we ignore the spatial dependence among the blocks, an image becomes a collection of independent samples $\{u_1, u_2, \ldots, u_T\}$. For training data, we know the true classes $\{z_1, \ldots, z_T\}$. Any classification algorithm can be applied.

Mixture discriminant analysis: model each class by a mixture model.

What if we want to take spatial dependence into consideration?

- Use a hidden Markov model. A 2-D HMM would be even better.
- Assume each class contains several states. The underlying states follow a Markov chain. We need to scan the image in a certain way, say row by row or zig-zag.
- This HMM is an extension of mixture discriminant analysis with spatial dependence taken into consideration.

Details:

- Suppose we have $M$ states, each belonging to a certain class. Use $C(k)$ to denote the class that state $k$ belongs to. If a block is in a certain class, it can only exist in one of the states that belong to its class.
- Train the HMM using the feature vectors $\{u_1, u_2, \ldots, u_T\}$ and their classes $\{z_1, z_2, \ldots, z_T\}$. There are some minor modifications from the training algorithm described before, since no class labels are involved there.
- For a test image, find the optimal sequence of states $\{s_1^*, s_2^*, \ldots, s_T^*\}$ with maximum a posteriori probability (MAP) using the Viterbi algorithm.
- Map the state sequence into classes: $\hat{z}_t = C(s_t^*)$.

Unsupervised learning:

Since a mixture model can be used for clustering, an HMM can be used for the same purpose.
The difference lies in the fact that an HMM takes spatial dependence into consideration.

For a given number of states, fit an HMM to the sequential data. Find the optimal state sequence $s^*$ by the Viterbi algorithm. Each state represents a cluster.

Examples: image segmentation, etc.

## This note was uploaded on 02/04/2012 for the course STAT 557 taught by Professor Jiali during the Fall '09 term at Penn State.
