HMM

Course: CS 536, Fall 2009
School: Rutgers
Digital Speech Processing, Lecture 20
The Hidden Markov Model (HMM)

Lecture Outline

- Theory of Markov Models
  - discrete Markov processes
  - hidden Markov processes
- Solutions to the Three Basic Problems of HMMs
  - computation of observation probability
  - determination of optimal state sequence
  - optimal training of model
- Variations of elements of the HMM
  - model types
  - densities
- Implementation Issues
  - scaling
  - multiple observation sequences
  - initial parameter estimates
  - insufficient training data
- Implementation of Isolated Word Recognizer Using HMMs

Stochastic Signal Modeling

- Reasons for interest:
  - basis for theoretical description of signal processing algorithms
  - can learn about signal source properties
  - models work well in practice in real-world applications
- Types of signal models:
  - deterministic, parametric models
  - stochastic models

Discrete Markov Processes

System of $N$ distinct states, $\{S_1, S_2, \ldots, S_N\}$

    Time (t):  1    2    3    4    5    ...
    State:     q_1  q_2  q_3  q_4  q_5  ...

Markov property:

$$P[q_t = S_i \mid q_{t-1} = S_j, q_{t-2} = S_k, \ldots] = P[q_t = S_i \mid q_{t-1} = S_j]$$

Properties of State Transition Coefficients

Consider processes where state transitions are time independent, i.e.,

$$a_{ji} = P[q_t = S_i \mid q_{t-1} = S_j], \quad 1 \le i, j \le N$$

$$a_{ji} \ge 0 \quad \forall\, j, i$$

$$\sum_{i=1}^{N} a_{ji} = 1 \quad \forall\, j$$

Example of Discrete Markov Process

Once each day (e.g., at noon), the weather is observed and classified as being one of the following:

- State 1: Rain (or snow; i.e., precipitation)
- State 2: Cloudy
- State 3: Sunny

with state transition probabilities:

$$A = \{a_{ij}\} = \begin{bmatrix} 0.4 & 0.3 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{bmatrix}$$

Discrete Markov Process

Problem: Given that the weather on day 1 is sunny, what is the probability (according to the model) that the weather for the next 7 days will be sunny-sunny-rain-rain-sunny-cloudy-sunny?

Solution: We define the observation sequence, $O$, as:

$$O = \{S_3, S_3, S_3, S_1, S_1, S_3, S_2, S_3\}$$

and we want to calculate $P(O \mid \text{Model})$.
That is:

$$\begin{aligned}
P(O \mid \text{Model}) &= P[S_3, S_3, S_3, S_1, S_1, S_3, S_2, S_3 \mid \text{Model}] \\
&= P[S_3]\, P[S_3 \mid S_3]^2\, P[S_1 \mid S_3]\, P[S_1 \mid S_1]\, P[S_3 \mid S_1]\, P[S_2 \mid S_3]\, P[S_3 \mid S_2] \\
&= \pi_3\, (a_{33})^2\, a_{31}\, a_{11}\, a_{13}\, a_{32}\, a_{23} \\
&= 1 \cdot (0.8)^2 (0.1)(0.4)(0.3)(0.1)(0.2) \\
&= 1.536 \times 10^{-4}
\end{aligned}$$

where $\pi_i = P[q_1 = S_i]$, $1 \le i \le N$.

Discrete Markov Process

Problem: Given that the model is in a known state, what is the probability it stays in that state for exactly $d$ days?

Solution: The observation sequence is

$$O = \{S_i, S_i, S_i, \ldots, S_i, S_j \ne S_i\}$$

with state $S_i$ at times $1, 2, \ldots, d$ and a different state at time $d + 1$. Then

$$P(O \mid \text{Model}, q_1 = S_i) = (a_{ii})^{d-1} (1 - a_{ii}) = p_i(d)$$

$$\bar{d}_i = \sum_{d=1}^{\infty} d\, p_i(d) = \frac{1}{1 - a_{ii}}$$

Exercise

Given a single fair coin, i.e., P(H = Heads) = P(T = Tails) = 0.5, which you toss once and observe Tails:

a) What is the probability that the next 10 tosses will provide the sequence {H H T H T T H T T H}?

SOLUTION: For a fair coin, with independent coin tosses, the probability of any specific observation sequence of length 10 (10 tosses) is $(1/2)^{10}$, since there are $2^{10}$ such sequences and all are equally probable. Thus:

$$P(\text{H H T H T T H T T H}) = (1/2)^{10}$$

b) What is the probability that the next 10 tosses will produce the sequence {H H H H H H H H H H}?

SOLUTION: Similarly:

$$P(\text{H H H H H H H H H H}) = (1/2)^{10}$$

Thus a specified run of length 10 is just as likely as a specified sequence of interlaced H and T.

c) What is the probability that 5 of the next 10 tosses will be tails? What is the expected number of tails over the next 10 tosses?

SOLUTION: The probability of 5 tails in the next 10 tosses is the number of observation sequences with 5 tails and 5 heads (in any order) times the probability of each such sequence:

$$P(5\text{H}, 5\text{T}) = \binom{10}{5} \left(\frac{1}{2}\right)^{10} = \frac{252}{1024} \approx 0.25$$

since there are $\binom{10}{5}$ combinations (ways of getting 5H and 5T) for 10 coin tosses, and each sequence has probability $(1/2)^{10}$.
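The arithmetic in the weather example and in the 5-tails calculation can be checked with a few lines of Python. This is only a sketch: the helper name `sequence_probability` and the 1-based state numbering are choices made here for illustration, not part of the lecture.

```python
from math import comb

# Weather model from the example: state 1 = rain, 2 = cloudy, 3 = sunny.
A = [
    [0.4, 0.3, 0.3],
    [0.2, 0.6, 0.2],
    [0.1, 0.1, 0.8],
]

def sequence_probability(states, A, first_state_prob=1.0):
    """P(state sequence | model) for a first-order Markov chain.

    `states` uses 1-based state numbers, as in the slides;
    `first_state_prob` plays the role of pi for the first state.
    """
    p = first_state_prob
    for prev, cur in zip(states, states[1:]):
        p *= A[prev - 1][cur - 1]
    return p

# Day 1 is known to be sunny, so pi_3 = 1.
p_weather = sequence_probability([3, 3, 3, 1, 1, 3, 2, 3], A)

# Probability of exactly 5 tails in 10 fair tosses.
p_five_tails = comb(10, 5) / 2**10

print(f"{p_weather:.4e}")     # 1.5360e-04
print(f"{p_five_tails:.4f}")  # 0.2461
```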
The expected number of tails in 10 tosses is:

$$E(\text{number of T in 10 coin tosses}) = \sum_{d=0}^{10} d \binom{10}{d} \left(\frac{1}{2}\right)^{10} = 5$$

Thus, on average, there will be 5H and 5T in 10 tosses, but the probability of exactly 5H and 5T is only about 0.25.

Coin Toss Models

A series of coin tossing experiments is performed. The number of coins is unknown; only the results of each coin toss are revealed. Thus a typical observation sequence is:

$$O = O_1 O_2 O_3 \ldots O_T = \text{HHTTTHTTH} \ldots \text{H}$$

Problem: Build an HMM to explain the observation sequence.

Issues:

1. What are the states in the model?
2. How many states should be used?
3. What are the state transition probabilities?

Coin Toss Models [figures]

Coin Toss Models

Problem: Consider an HMM representation (model $\lambda$) of a coin tossing experiment. Assume a 3-state model (corresponding to 3 different coins) with probabilities:

              P(H)    P(T)
    State 1   0.5     0.5
    State 2   0.75    0.25
    State 3   0.25    0.75

and with all state transition probabilities equal to 1/3. (Assume initial state probabilities of 1/3.)

a) You observe the sequence O = H H H H T H T T T T. What state sequence is most likely? What is the probability of the observation sequence and this most likely state sequence?

Coin Toss Problem Solution

SOLUTION: Given O = HHHHTHTTTT, the most likely state sequence is the one for which the probability of each individual observation is maximum. Thus for each H the most likely state is $S_2$, and for each T the most likely state is $S_3$. The most likely state sequence is therefore:

$$S = S_2 S_2 S_2 S_2 S_3 S_2 S_3 S_3 S_3 S_3$$

The probability of $O$ and $S$ (given the model) is:

$$P(O, S \mid \lambda) = (0.75)^{10} \left(\frac{1}{3}\right)^{10}$$

Coin Toss Models

b) What is the probability that the observation sequence came entirely from state 1?
SOLUTION: The probability of $O$ given that the state sequence is of the form

$$\hat{S} = S_1 S_1 S_1 S_1 S_1 S_1 S_1 S_1 S_1 S_1$$

is:

$$P(O, \hat{S} \mid \lambda) = (0.50)^{10} \left(\frac{1}{3}\right)^{10}$$

The ratio of $P(O, S \mid \lambda)$ to $P(O, \hat{S} \mid \lambda)$ is:

$$R = \frac{P(O, S \mid \lambda)}{P(O, \hat{S} \mid \lambda)} = \left(\frac{3}{2}\right)^{10} = 57.67$$

Coin Toss Models

c) Consider the observation sequence:

$$\tilde{O} = \text{H T T H T H H T T H}$$

How would your answers to parts a and b change?

SOLUTION: Given $\tilde{O}$, which has the same number of H's and T's, the answers to parts a and b would remain the same, as the most likely states occur the same number of times in both cases.

Coin Toss Models

d) If the state transition probabilities were of the form:

    a_11 = 0.9     a_12 = 0.05    a_13 = 0.05
    a_21 = 0.45    a_22 = 0.1     a_23 = 0.45
    a_31 = 0.45    a_32 = 0.45    a_33 = 0.1

i.e., a new model $\lambda'$, how would your answers to parts a-c change? What does this suggest about the type of sequences generated by the models?

Coin Toss Problem Solution

SOLUTION: The new probability of $O$ and $S$ becomes:

$$P(O, S \mid \lambda') = \frac{1}{3} (0.75)^{10} (0.1)^6 (0.45)^3$$

The new probability of $O$ and $\hat{S}$ becomes:

$$P(O, \hat{S} \mid \lambda') = \frac{1}{3} (0.50)^{10} (0.9)^9$$

The ratio is:

$$R = \left(\frac{3}{2}\right)^{10} \left(\frac{1}{9}\right)^{6} \left(\frac{1}{2}\right)^{3} = 1.36 \times 10^{-5}$$

Coin Toss Problem Solution

Now the probability of $\tilde{O}$ and $\tilde{S}$ (the sequence of part c and its most likely state sequence) is not the same as the probability of $O$ and $S$. We now have:

$$P(\tilde{O}, \tilde{S} \mid \lambda') = \frac{1}{3} (0.75)^{10} (0.45)^6 (0.1)^3$$

$$P(\tilde{O}, \hat{S} \mid \lambda') = \frac{1}{3} (0.50)^{10} (0.9)^9$$

with the ratio:

$$R = \left(\frac{3}{2}\right)^{10} \left(\frac{1}{2}\right)^{6} \left(\frac{1}{9}\right)^{3} = 1.24 \times 10^{-3}$$

Model $\lambda$, the initial model, clearly favors long runs of H's or T's, whereas model $\lambda'$, the new model, clearly favors random sequences of H's and T's. Thus even a run of H's or T's is more likely to occur in state 1 for model $\lambda'$, and a random sequence of H's and T's is more likely to occur in states 2 and 3 for model $\lambda'$.

Balls in Urns Model [figure]

Elements of an HMM

1. $N$, number of states in the model
   - states, $S = \{S_1, S_2, \ldots, S_N\}$
   - state at time $t$, $q_t \in S$
2. $M$, number of distinct observation symbols per state
   - observation symbols, $V = \{v_1, v_2, \ldots, v_M\}$
   - observation at time $t$, $O_t \in V$
3. State transition probability distribution, $A = \{a_{ij}\}$,
   $$a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i), \quad 1 \le i, j \le N$$
4. Observation symbol probability distribution in state $j$, $B = \{b_j(k)\}$,
   $$b_j(k) = P[v_k \text{ at } t \mid q_t = S_j], \quad 1 \le j \le N, \ 1 \le k \le M$$
5. Initial state distribution, $\pi = \{\pi_i\}$,
   $$\pi_i = P[q_1 = S_i], \quad 1 \le i \le N$$

HMM Generator of Observations

1. Choose an initial state, $q_1 = S_i$, according to the initial state distribution, $\pi$.
2. Set $t = 1$.
3. Choose $O_t = v_k$ according to the symbol probability distribution in state $S_i$, namely $b_i(k)$.
4. Transit to a new state, $q_{t+1} = S_j$, according to the state transition probability distribution for state $S_i$, namely $a_{ij}$.
5. Set $t = t + 1$; return to step 3 if $t \le T$; otherwise terminate the procedure.

    t:            1    2    3    4    5    6    ...  T
    state:        q_1  q_2  q_3  q_4  q_5  q_6  ...  q_T
    observation:  O_1  O_2  O_3  O_4  O_5  O_6  ...  O_T

Notation: $\lambda = (A, B, \pi)$ denotes the HMM.

Three Basic HMM Problems

Problem 1: Given the observation sequence, $O = O_1 O_2 \ldots O_T$, and a model $\lambda = (A, B, \pi)$, how do we (efficiently) compute $P(O \mid \lambda)$, the probability of the observation sequence?

Problem 2: Given the observation sequence, $O = O_1 O_2 \ldots O_T$, how do we choose a state sequence $Q = q_1 q_2 \ldots q_T$ which is optimal in some meaningful sense?

Problem 3: How do we adjust the model parameters $\lambda = (A, B, \pi)$ to maximize $P(O \mid \lambda)$?

Interpretation:

- Problem 1: evaluation or scoring problem.
- Problem 2: learn structure problem.
- Problem 3: training problem.

Solution to Problem 1: $P(O \mid \lambda)$

Consider the fixed state sequence (there are $N^T$ such sequences):

$$Q = q_1 q_2 \ldots q_T$$

Then

$$P(O \mid Q, \lambda) = b_{q_1}(O_1)\, b_{q_2}(O_2) \cdots b_{q_T}(O_T)$$

$$P(Q \mid \lambda) = \pi_{q_1}\, a_{q_1 q_2}\, a_{q_2 q_3} \cdots a_{q_{T-1} q_T}$$

and

$$P(O, Q \mid \lambda) = P(O \mid Q, \lambda)\, P(Q \mid \lambda)$$

Finally

$$P(O \mid \lambda) = \sum_{\text{all } Q} P(O, Q \mid \lambda) = \sum_{q_1, q_2, \ldots, q_T} \pi_{q_1} b_{q_1}(O_1)\, a_{q_1 q_2} b_{q_2}(O_2) \cdots a_{q_{T-1} q_T} b_{q_T}(O_T)$$

Calculations required $\approx 2T \cdot N^T$; for $N = 5$, $T = 100$ this is $2 \cdot 100 \cdot 5^{100} \approx 10^{72}$ computations!
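For a short sequence the direct sum over all state sequences is still feasible, which makes it a useful reference when checking faster methods. The sketch below is illustrative only: it assumes the three-coin model from the coin toss problem (uniform transitions and initial probabilities), and the function name `brute_force_likelihood` and the dictionary layout of `B` are choices made here, not part of the lecture.

```python
from itertools import product

# Three-coin model from the coin toss problem (used here for illustration).
pi = [1 / 3, 1 / 3, 1 / 3]
A = [[1 / 3] * 3 for _ in range(3)]
B = [
    {"H": 0.50, "T": 0.50},  # state 1
    {"H": 0.75, "T": 0.25},  # state 2
    {"H": 0.25, "T": 0.75},  # state 3
]

def brute_force_likelihood(obs, pi, A, B):
    """P(O | lambda) as the sum of P(O, Q | lambda) over all N**T sequences Q."""
    n_states = len(pi)
    total = 0.0
    for q in product(range(n_states), repeat=len(obs)):
        # pi_{q1} b_{q1}(O_1), then a_{q(t-1) q(t)} b_{q(t)}(O_t) for each later t
        p = pi[q[0]] * B[q[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[q[t - 1]][q[t]] * B[q[t]][obs[t]]
        total += p
    return total

obs = "HHHHTHTTTT"
likelihood = brute_force_likelihood(obs, pi, A, B)
print(likelihood)
```

With $N = 3$ and $T = 10$ this enumerates $3^{10} = 59{,}049$ sequences. Because every transition and initial probability is 1/3 and the three coins average to P(H) = P(T) = 0.5, the result should agree with $(0.5)^{10}$ up to rounding.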
The Forward Procedure

Consider the forward variable, $\alpha_t(i)$, defined as the probability of the partial observation sequence (until time $t$) and state $S_i$ at time $t$, given the model, i.e.,

$$\alpha_t(i) = P(O_1 O_2 \ldots O_t, q_t = S_i \mid \lambda)$$

Inductively solve for $\alpha_t(i)$ as:

1. Initialization
   $$\alpha_1(i) = \pi_i\, b_i(O_1), \quad 1 \le i \le N$$
2. Induction
   $$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(O_{t+1}), \quad 1 \le t \le T - 1, \ 1 \le j \le N$$
3. Termination
   $$P(O \mid \lambda) = \sum_{i=1}^{N} P(O_1 O_2 \ldots O_T, q_T = S_i \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$

Computation: $N^2 T$ versus $2T N^T$; for $N = 5$, $T = 100$ this is 2500 versus $10^{72}$.

The Forward Procedure [figure]

The Backward Algorithm

Consider the backward variable, $\beta_t(i)$, defined as the probability of the partial observation sequence from $t + 1$ to the end, given state $S_i$ at time $t$ and the model, i.e.,

$$\beta_t(i) = P(O_{t+1} O_{t+2} \ldots O_T \mid q_t = S_i, \lambda)$$

Inductive solution:

1. Initialization
   $$\beta_T(i) = 1, \quad 1 \le i \le N$$
2. Induction
   $$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j), \quad t = T - 1, T - 2, \ldots, 1, \ 1 \le i \le N$$

This requires $N^2 T$ calculations, the same as in the forward case.

Solution to Problem 2: Optimal State Sequence

1. Choose states, $q_t$, which are individually most likely: maximize the expected number of correct individual states.
2. Choose states, $q_t$, which are pair-wise most likely: maximize the expected number of correct state pairs.
3. Choose states, $q_t$, which are triple-wise most likely: maximize the expected number of correct state triples.
4. Choose states, $q_t$, which are $T$-wise most likely: find the single best state sequence, which maximizes $P(Q, O \mid \lambda)$.

The last solution is often called the Viterbi state sequence because it is found using the Viterbi algorithm.
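The forward and backward recursions from the preceding slides can be sketched in a few lines of Python. This is a minimal, unscaled version under stated assumptions: it uses the three-coin emission probabilities with the part (d) transition matrix as an example model, indexes time and states from 0, and the names `forward` and `backward` are chosen here for illustration. Without scaling, the values underflow for long sequences, so it is only suitable for short ones.

```python
# Example model (assumed for illustration): coin emissions, part (d) transitions.
pi = [1 / 3, 1 / 3, 1 / 3]
A = [
    [0.90, 0.05, 0.05],
    [0.45, 0.10, 0.45],
    [0.45, 0.45, 0.10],
]
B = [
    {"H": 0.50, "T": 0.50},
    {"H": 0.75, "T": 0.25},
    {"H": 0.25, "T": 0.75},
]

def forward(obs, pi, A, B):
    """alpha[t][i] = P(O_1..O_(t+1), state i at that time | lambda), 0-based t."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]          # initialization
    for t in range(1, len(obs)):                                 # induction
        prev = alpha[-1]
        alpha.append([
            sum(prev[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
            for j in range(n)
        ])
    return alpha

def backward(obs, A, B):
    """beta[t][i] = P(remaining observations after time t | state i at t, lambda)."""
    n = len(A)
    T = len(obs)
    beta = [[1.0] * n for _ in range(T)]                         # initialization
    for t in range(T - 2, -1, -1):                               # induction
        for i in range(n):
            beta[t][i] = sum(
                A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n)
            )
    return beta

obs = "HHHHTHTTTT"
alpha = forward(obs, pi, A, B)
beta = backward(obs, A, B)
likelihood = sum(alpha[-1])                                      # termination
print(likelihood)
```

A handy consistency check is that $\sum_i \alpha_t(i)\, \beta_t(i)$ equals $P(O \mid \lambda)$ for every $t$, since each term is the joint probability of the full observation sequence and state $S_i$ at time $t$.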
Maximize Individual States

We define $\gamma_t(i)$ as the probability of being in state $S_i$ at time $t$, given the observation sequence and the model, i.e.,

$$\gamma_t(i) = P(q_t = S_i \mid O, \lambda) = \frac{P(q_t = S_i, O \mid \lambda)}{P(O \mid \lambda)}$$

then

$$\gamma_t(i) = \frac{P(q_t = S_i, O \mid \lambda)}{\sum_{i=1}^{N} P(q_t = S_i, O \mid \lambda)} = \frac{\alpha_t(i)\, \beta_t(i)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{i=1}^{N} \alpha_t(i)\, \beta_t(i)}$$

with

$$\sum_{i=1}^{N} \gamma_t(i) = 1 \quad \forall\, t$$

then

$$q_t^* = \underset{1 \le i \le N}{\operatorname{argmax}}\, [\gamma_t(i)], \quad 1 \le t \le T$$

Problem: $q_t^*$ need not obey state transition constraints.

Best State Sequence: The Viterbi Algorithm

Define $\delta_t(i)$ as the highest probability along a single path, at time $t$, which accounts for the first $t$ observations, i.e.,

$$\delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} P[q_1 q_2 \ldots q_{t-1}, q_t = i, O_1 O_2 \ldots O_t \mid \lambda]$$

We must keep track of the state sequence which gave the best path, at time $t$, to state $i$. We do this in the array $\psi_t(i)$.

The Viterbi Algorithm

Step 1: Initialization

$$\delta_1(i) = \pi_i\, b_i(O_1), \quad 1 \le i \le N$$
$$\psi_1(i) = 0, \quad 1 \le i \le N$$

Step 2: Recursion

$$\delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}]\, b_j(O_t), \quad 2 \le t \le T, \ 1 \le j \le N$$
$$\psi_t(j) = \underset{1 \le i \le N}{\operatorname{argmax}}\, [\delta_{t-1}(i)\, a_{ij}], \quad 2 \le t \le T, \ 1 \le j \le N$$

Step 3: Termination

$$P^* = \max_{1 \le i \le N} [\delta_T(i)]$$
$$q_T^* = \underset{1 \le i \le N}{\operatorname{argmax}}\, [\delta_T(i)]$$

Step 4: Path (State Sequence) Backtracking

$$q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T - 1, T - 2, \ldots, 1$$

Calculation: approximately $N^2 T$ operations (multiplications and additions).

Alternative Viterbi Implementation

Work with logarithms:

$$\tilde{\pi}_i = \log(\pi_i), \quad 1 \le i \le N$$
$$\tilde{b}_i(O_t) = \log[b_i(O_t)], \quad 1 \le i \le N, \ 1 \le t \le T$$
$$\tilde{a}_{ij} = \log[a_{ij}], \quad 1 \le i, j \le N$$

Step 1: Initialization

$$\tilde{\delta}_1(i) = \log(\delta_1(i)) = \tilde{\pi}_i + \tilde{b}_i(O_1), \quad 1 \le i \le N$$
$$\psi_1(i) = 0, \quad 1 \le i \le N$$

Step 2: Recursion

$$\tilde{\delta}_t(j) = \log(\delta_t(j)) = \max_{1 \le i \le N} [\tilde{\delta}_{t-1}(i) + \tilde{a}_{ij}] + \tilde{b}_j(O_t), \quad 2 \le t \le T, \ 1 \le j \le N$$
$$\psi_t(j) = \underset{1 \le i \le N}{\operatorname{argmax}}\, [\tilde{\delta}_{t-1}(i) + \tilde{a}_{ij}], \quad 2 \le t \le T, \ 1 \le j \le N$$

Step 3: Termination

$$\tilde{P}^* = \max_{1 \le i \le N} [\tilde{\delta}_T(i)]$$
$$q_T^* = \underset{1 \le i \le N}{\operatorname{argmax}}\, [\tilde{\delta}_T(i)]$$

Step 4: Backtracking

$$q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T - 1, T - 2, \ldots, 1$$

Calculation: approximately $N^2 T$ additions.

Problem

Given the model of the coin toss experiment used earlier (i.e., 3 different coins) with probabilities:

              P(H)    P(T)
    State 1   0.5     0.5
    State 2   0.75    0.25
    State 3   0.25    0.75

with all state transition probabilities equal to 1/3, and with initial state probabilities equal to 1/3. For the observation sequence O = H H H H T H T T T T, find the Viterbi path of maximum likelihood.

Problem Solution

Since all $a_{ij}$ terms are equal to 1/3, we can omit these terms (as well as the initial state probability term), giving:

$$\delta_1(1) = 0.5, \quad \delta_1(2) = 0.75, \quad \delta_1(3) = 0.25$$

The recursion for $\delta_t(j)$ gives ($2 \le t \le 10$):

    t    O_t   delta_t(1)         delta_t(2)          delta_t(3)
    2    H     (0.75)(0.5)        (0.75)^2            (0.75)(0.25)
    3    H     (0.75)^2 (0.5)     (0.75)^3            (0.75)^2 (0.25)
    4    H     (0.75)^3 (0.5)     (0.75)^4            (0.75)^3 (0.25)
    5    T     (0.75)^4 (0.5)     (0.75)^4 (0.25)     (0.75)^5
    6    H     (0.75)^5 (0.5)     (0.75)^6            (0.75)^5 (0.25)
    7    T     (0.75)^6 (0.5)     (0.75)^6 (0.25)     (0.75)^7
    8    T     (0.75)^7 (0.5)     (0.75)^7 (0.25)     (0.75)^8
    9    T     (0.75)^8 (0.5)     (0.75)^8 (0.25)     (0.75)^9
    10   T     (0.75)^9 (0.5)     (0.75)^9 (0.25)     (0.75)^10

This leads to a diagram (trellis). [figure]

Solution to Problem 3: The Training Problem

- no globally optimum solution is known
- all solutions yield local optima
  - can get solution via gradient techniques
  - can use a re-estimation procedure such as the Baum-Welch or EM method
- consider re-estimation procedures
  - basic idea: given a current model estimate, $\lambda$, compute expected values of model events, then refine the model based on the computed values:

$$\lambda^{(0)} \rightarrow E[\text{Model Events}] \rightarrow \lambda^{(1)} \rightarrow E[\text{Model Events}] \rightarrow \lambda^{(2)} \rightarrow \cdots$$

Define $\xi_t(i, j)$, the probability of being in state $S_i$ at time $t$ and state $S_j$ at time $t + 1$, given the model and the observation sequence, i.e.,

$$\xi_t(i, j) = P[q_t = S_i, q_{t+1} = S_j \mid O, \lambda]$$

The Training Problem [figure]

The Training Problem

$$\xi_t(i, j) = P[q_t = S_i, q_{t+1} = S_j \mid O, \lambda] = \frac{P[q_t = S_i, q_{t+1} = S_j, O \mid \lambda]}{P(O \mid \lambda)}$$

$$= \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}$$

$$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)$$

$$\sum_{t=1}^{T-1} \gamma_t(i) = \text{expected number of transitions from } S_i$$

$$\sum_{t=1}^{T-1} \xi_t(i, j) = \text{expected number of transitions from } S_i \text{ to } S_j$$

Re-estimation Formulas

$$\bar{\pi}_i = \text{expected number of times in state } S_i \text{ at } t = 1 = \gamma_1(i)$$

$$\bar{a}_{ij} = \frac{\text{expected number of transitions from state } S_i \text{ to state } S_j}{\text{expected number of transitions from state } S_i} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$

$$\bar{b}_j(k) = \frac{\text{expected number of times in state } j \text{ with symbol } v_k}{\text{expected number of times in state } j} = \frac{\sum_{t=1,\, O_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$$

Re-estimation Formulas

If $\lambda = (A, B, \pi)$ is the initial model, and $\bar{\lambda} = (\bar{A}, \bar{B}, \bar{\pi})$ is the re-estimated model, then it can be proven that either:

1. the initial model, $\lambda$, defines a critical point of the likelihood function, in which case $\bar{\lambda} = \lambda$, or
2. model $\bar{\lambda}$ is more likely than model $\lambda$ in the sense that $P(O \mid \bar{\lambda}) > P(O \mid \lambda)$, i.e., we have found a new model $\bar{\lambda}$ from which the observation sequence is more likely to have been produced.

Conclusion: Iteratively use $\bar{\lambda}$ in place of $\lambda$, and repeat the re-estimation until some limiting point is reached. The resulting model is called the maximum likelihood (ML) HMM.

Re-estimation Formulas

1. The re-estimation formulas can be derived by maximizing the auxiliary function $Q(\lambda, \bar{\lambda})$ over $\bar{\lambda}$, i.e.,
   $$Q(\lambda, \bar{\lambda}) = \sum_{q} P(O, q \mid \lambda) \log P(O, q \mid \bar{\lambda})$$
   It can be proved that:
   $$\max_{\bar{\lambda}} Q(\lambda, \bar{\lambda}) \ \Rightarrow\ P(O \mid \bar{\lambda}) \ge P(O \mid \lambda)$$
   Eventually the likelihood function converges to a critical point.
2. Relation to the EM algorithm:
   - the E (Expectation) step is the calculation of the auxiliary function, $Q(\lambda, \bar{\lambda})$
   - the M (Modification) step is the maximization over $\bar{\lambda}$

Notes on Re-estimation

1. Stochastic constraints on $\bar{\pi}_i$, $\bar{a}_{ij}$, $\bar{b}_j(k)$ are automatically met, i.e.,
   $$\sum_{i=1}^{N} \bar{\pi}_i = 1, \quad \sum_{j=1}^{N} \bar{a}_{ij} = 1, \quad \sum_{k=1}^{M} \bar{b}_j(k) = 1$$
2. At the critical points of $P = P(O \mid \lambda)$:
   $$\pi_i = \frac{\pi_i \dfrac{\partial P}{\partial \pi_i}}{\sum_{k=1}^{N} \pi_k \dfrac{\partial P}{\partial \pi_k}} = \bar{\pi}_i$$
   $$a_{ij} = \frac{a_{ij} \dfrac{\partial P}{\partial a_{ij}}}{\sum_{k=1}^{N} a_{ik} \dfrac{\partial P}{\partial a_{ik}}} = \bar{a}_{ij}$$
   $$b_j(k) = \frac{b_j(k) \dfrac{\partial P}{\partial b_j(k)}}{\sum_{\ell=1}^{M} b_j(\ell) \dfrac{\partial P}{\partial b_j(\ell)}} = \bar{b}_j(k)$$
   Hence, at critical points, the re-estimation formulas are exactly correct.

Variations on HMMs

1. Types of HMM: model structures
2. Continuous observation density models: mixtures
3. Autoregressive HMMs: LPC links
4. Null transitions and tied states
5. Inclusion of explicit state duration density in HMMs
6. Optimization criterion: ML, MMI, MDI

Types of HMM

1. Ergodic models: no transient states.
2. Left-right models: all transient states (except the last state), with the constraints:
   $$\pi_i = \begin{cases} 1, & i = 1 \\ 0, & i \ne 1 \end{cases}$$
   $$a_{ij} = 0, \quad j < i$$
   Controlled transitions imply:
   $$a_{ij} = 0, \quad j > i + \Delta \quad (\Delta = 1, 2 \text{ typically})$$
3. Mixed forms of ergodic and left-right models (e.g., parallel branches).

Note: Constraints of left-right models don't affect the re-estimation formulas (i.e., a parameter initially set to 0 remains at 0 during re-estimation).

Types of HMM [figure: Ergodic Model, Left-Right Model, Mixed Model]

Continuous Observation Density HMMs

The most general form of pdf with a valid re-estimation procedure is:

$$b_j(x) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}[x, \mu_{jm}, U_{jm}], \quad 1 \le j \le N$$

where

- $x$ = observation vector $= \{x_1, x_2, \ldots, x_D\}$
- $M$ = number of mixture densities
- $c_{jm}$ = gain of the $m$-th mixture in state $j$
- $\mathcal{N}$ = any log-concave or elliptically symmetric density (e.g., a Gaussian)
- $\mu_{jm}$ = mean vector for mixture $m$, state $j$
- $U_{jm}$ = covariance matrix for mixture $m$, state $j$

with the constraints:

$$c_{jm} \ge 0, \quad 1 \le j \le N, \ 1 \le m \le M$$
$$\sum_{m=1}^{M} c_{jm} = 1, \quad 1 \le j \le N$$
$$\int_{-\infty}^{\infty} b_j(x)\, dx = 1, \quad 1 \le j \le N$$

State Equivalence Chart [figure: equivalence of a state with a mixture density to the multi-state, single-mixture case]

Re-estimation for Mixture Densities

$$\bar{c}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k)}{\sum_{t=1}^{T} \sum_{k=1}^{M} \gamma_t(j, k)}$$

$$\bar{\mu}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k)\, O_t}{\sum_{t=1}^{T} \gamma_t(j, k)}$$

$$\bar{U}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k)\, (O_t - \mu_{jk})(O_t - \mu_{jk})'}{\sum_{t=1}^{T} \gamma_t(j, k)}$$

$\gamma_t(j, k)$ is the probability of being in state $j$ at time $t$ with the $k$-th mixture component accounting for $O_t$:

$$\gamma_t(j, k) = \left[ \frac{\alpha_t(j)\, \beta_t(j)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)} \right] \left[ \frac{c_{jk}\, \mathcal{N}(O_t, \mu_{jk}, U_{jk})}{\sum_{m=1}^{M} c_{jm}\, \mathcal{N}(O_t, \mu_{jm}, U_{jm})} \right]$$

Autoregressive HMM

Consider an observation vector $O = (x_0, x_1, \ldots, x_{K-1})$ where each $x_k$ is a waveform sample, and $O$ represents a frame of the signal (e.g., $K = 256$ samples).
We assume $x_k$ is related to previous samples of $O$ by a Gaussian autoregressive process of order $p$, i.e.,

$$O_k = -\sum_{i=1}^{p} a_i\, O_{k-i} + e_k, \quad 0 \le k \le K - 1$$

where $e_k$ are Gaussian, independent, identically distributed random variables with zero mean and variance $\sigma^2$, and $a_i$, $1 \le i \le p$, are the autoregressive or predictor coefficients.

As $K \rightarrow \infty$,

$$f(O) = (2\pi\sigma^2)^{-K/2} \exp\left\{ -\frac{1}{2\sigma^2}\, \delta(O, a) \right\}$$

where

$$\delta(O, a) = r_a(0)\, r(0) + 2 \sum_{i=1}^{p} r_a(i)\, r(i)$$

Autoregressive HMM

$$r_a(i) = \sum_{n=0}^{p-i} a_n\, a_{n+i}, \quad (a_0 = 1), \ 1 \le i \le p$$

$$r(i) = \sum_{n=0}^{K-i-1} x_n\, x_{n+i}, \quad 0 \le i \le p$$

$$[a]' = [1, a_1, a_2, \ldots, a_p]$$

The prediction residual is:

$$\alpha = E\left[ \sum_{i=1}^{K} (e_i)^2 \right] = K\sigma^2$$

Consider the normalized observation vector

$$\hat{O} = \frac{O}{\sqrt{\alpha}} = \frac{O}{\sqrt{K\sigma^2}}$$

$$f(\hat{O}) = \left( \frac{2\pi}{K} \right)^{-K/2} \exp\left( -\frac{K}{2}\, \delta(\hat{O}, a) \right)$$

In practice, $K$ is replaced by $\hat{K}$, the effective frame length, e.g., $\hat{K} = K/3$ for a frame overlap of 3 to 1.

Application of Autoregressive HMM

$$b_j(O) = \sum_{m=1}^{M} c_{jm}\, b_{jm}(O)$$

$$b_{jm}(O) = \left( \frac{2\pi}{K} \right)^{-K/2} \exp\left\{ -\frac{K}{2}\, \delta(O, a_{jm}) \right\}$$

Each mixture is characterized by a predictor vector, or by an autocorrelation vector from which the predictor vector can be derived. The re-estimation formulas for $r_{jk}$ are:

$$\bar{r}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k)\, r_t}{\sum_{t=1}^{T} \gamma_t(j, k)}$$

$$\gamma_t(j, k) = \left[ \frac{\alpha_t(j)\, \beta_t(j)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)} \right] \left[ \frac{c_{jk}\, b_{jk}(O_t)}{\sum_{k=1}^{M} c_{jk}\, b_{jk}(O_t)} \right]$$

Null Transitions and Tied States

Null transitions: transitions which produce no output and take no time, denoted by $\phi$.

Tied states: set up an equivalence relation between HMM parameters in different states:

- the number of independent parameters of the model is reduced
- parameter estimation becomes simpler
- useful in cases where there is insufficient training data for reliable estimation of all model parameters

Null Transitions [figure]

Inclusion of Explicit State Duration Density

For standard HMMs, the duration density is:

$$p_i(d) = \text{probability of exactly } d \text{ observations in state } S_i = (a_{ii})^{d-1} (1 - a_{ii})$$

With an arbitrary state duration density, $p_i(d)$, observations are generated as follows:

1. an initial state, $q_1 = S_i$, is chosen according to the initial state distribution, $\pi_i$
2. a duration $d_1$ is chosen according to the state duration density $p_{q_1}(d_1)$
3. observations $O_1 O_2 \ldots O_{d_1}$ are chosen according to the joint density $b_{q_1}(O_1 O_2 \ldots O_{d_1})$. Generally we assume independence, so
   $$b_{q_1}(O_1 O_2 \ldots O_{d_1}) = \prod_{t=1}^{d_1} b_{q_1}(O_t)$$
4. the next state, $q_2 = S_j$, is chosen according to the state transition probabilities, $a_{q_1 q_2}$, with the constraint that $a_{q_1 q_1} = 0$, i.e., no transition back to the same state can occur.

Explicit State Duration Density [figure: standard HMM versus HMM with explicit state duration density]

Explicit State Duration Density

    t:             1                  d_1 + 1                      d_1 + d_2 + 1
    state:         q_1                q_2                          q_3
    duration:      d_1                d_2                          d_3
    observations:  O_1 ... O_{d_1}    O_{d_1+1} ... O_{d_1+d_2}    O_{d_1+d_2+1} ... O_{d_1+d_2+d_3}

Assume:

1. the first state, $q_1$, begins at $t = 1$
2. the last state, $q_r$, ends at $t = T$

so that entire duration intervals are included within the observation sequence $O_1 O_2 \ldots O_T$.

Modified $\alpha$:

$$\alpha_t(i) = P(O_1 O_2 \ldots O_t, S_i \text{ ending at } t \mid \lambda)$$

Assume $r$ states in the first $t$ observations, i.e.,

$$Q = \{q_1 q_2 \ldots q_r\} \text{ with } q_r = S_i$$
$$D = \{d_1 d_2 \ldots d_r\} \text{ with } \sum_{s=1}^{r} d_s = t$$

Explicit State Duration Density

Then we have

$$\alpha_t(i) = \sum_{q} \sum_{d} \pi_{q_1}\, p_{q_1}(d_1)\, P(O_1 O_2 \ldots O_{d_1} \mid q_1)\; a_{q_1 q_2}\, p_{q_2}(d_2)\, P(O_{d_1+1} \ldots O_{d_1+d_2} \mid q_2) \cdots a_{q_{r-1} q_r}\, p_{q_r}(d_r)\, P(O_{d_1+d_2+\cdots+d_{r-1}+1} \ldots O_t \mid q_r)$$

By induction:

$$\alpha_t(j) = \sum_{i=1}^{N} \sum_{d=1}^{D} \alpha_{t-d}(i)\, a_{ij}\, p_j(d) \prod_{s=t-d+1}^{t} b_j(O_s)$$

Initialization of $\alpha_t(i)$:

$$\alpha_1(i) = \pi_i\, p_i(1)\, b_i(O_1)$$

$$\alpha_2(i) = \pi_i\, p_i(2) \prod_{s=1}^{2} b_i(O_s) + \sum_{j=1,\, j \ne i}^{N} \alpha_1(j)\, a_{ji}\, p_i(1)\, b_i(O_2)$$

$$\alpha_3(i) = \pi_i\, p_i(3) \prod_{s=1}^{3} b_i(O_s) + \sum_{d=1}^{2} \sum_{j=1,\, j \ne i}^{N} \alpha_{3-d}(j)\, a_{ji}\, p_i(d) \prod_{s=4-d}^{3} b_i(O_s)$$

$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$

Explicit State Duration Density

- re-estimation formulas for $a_{ij}$, $b_i(k)$, and $p_i(d)$ can be formulated and appropriately interpreted
- modifications to Viterbi scoring are required, i.e.,
  $$\delta_t(i) = P(O_1 O_2 \ldots O_t, q_1 q_2 \ldots q_r = S_i \text{ ending at } t \mid O)$$
  Basic recursion:
  $$\delta_t(i) = \max_{1 \le j \le N,\, j \ne i}\ \max_{1 \le d \le D} \left[ \delta_{t-d}(j)\, a_{ji}\, p_i(d) \prod_{s=t-d+1}^{t} b_i(O_s) \right]$$
- storage is required for $\delta_{t-1}, \ldots, \delta_{t-D}$, i.e., $N \cdot D$ locations
- maximization involves all terms, not just the old $\delta$'s and $a_{ji}$ as in the previous case, giving a significantly larger computational load: approximately $(D^2/2)\, N^2\, T$ computations involving $b_j(O)$

Example: $N = 5$, $D = 20$

                   implicit duration    explicit duration
    storage        5                    100
    computation    2500                 500,000

Issues with Explicit State Duration Density

1. quality of signal modeling is often improved significantly
2. significant increase in the number of parameters per state ($D$ duration estimates)
3. significant increase in the computation associated with probability calculation ($\approx D^2/2$)
4. insufficient data to give good $p_i(d)$ estimates

Alternatives:

1. use a parametric state duration density:
   $$p_i(d) = \mathcal{N}(d, \mu_i, \sigma_i^2) \quad \text{(Gaussian)}$$
   $$p_i(d) = \frac{\eta_i^{\nu_i}\, d^{\nu_i - 1}\, e^{-\eta_i d}}{\Gamma(\nu_i)} \quad \text{(Gamma)}$$
2. incorporate state duration information after the probability calculation, e.g., in a post-processor

Alternatives to ML Estimation

Assume we wish to design $V$ different HMMs, $\lambda_1, \lambda_2, \ldots, \lambda_V$. Normally we design each HMM, $\lambda_v$, based on a training set of observations, $O^v$, using a maximum likelihood (ML) criterion, i.e.,

$$P_v^* = \max_{\lambda_v} P[O^v \mid \lambda_v]$$

Consider the mutual information, $I_v$, between the observation sequence, $O^v$, and the complete set of models $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_V)$:

$$I_v = \log P(O^v \mid \lambda_v) - \log \sum_{w=1}^{V} P(O^v \mid \lambda_w)$$

Consider maximizing $I_v$ over $\lambda$, giving

$$I_v^* = \max_{\lambda} \left[ \log P(O^v \mid \lambda_v) - \log \sum_{w=1}^{V} P(O^v \mid \lambda_w) \right]$$

- choose $\lambda$ so as to separate the correct model, $\lambda_v$, from all other models, as much as possible, for the training set, $O^v$.

Alternatives to ML Estimation

Sum over all such training sets to give models according to an MMI criterion, i.e.,

$$I^* = \max_{\lambda} \left\{ \sum_{v=1}^{V} \left[ \log P(O^v \mid \lambda_v) - \log \sum_{w=1}^{V} P(O^v \mid \lambda_w) \right] \right\}$$

- solution via steepest descent methods.

Comparison of HMMs

Problem: given two HMMs, $\lambda_1$ and $\lambda_2$, is it possible to give a measure of how similar the two models are?

Example: [figure]

For $(A_1, B_1)$ to be equivalent to $(A_2, B_2)$ we require $P(O_t = v_k)$ to be the same for both models and for all symbols $v_k$.
Thus we require

$$pq + (1 - p)(1 - q) = rs + (1 - r)(1 - s)$$
$$2pq - p - q = 2rs - r - s$$
$$s = \frac{p + q - 2pq - r}{1 - 2r}$$

Let $p = 0.6$, $q = 0.7$, $r = 0.2$; then $s = 13/30 \approx 0.433$.

Comparison of HMMs

Thus the two models have very different $A$ and $B$ matrices, but are equivalent in the sense that all symbol probabilities (averaged over time) are the same.

We generalize the concept of model distance (dissimilarity) by defining a distance measure, $D(\lambda_1, \lambda_2)$, between two Markov sources, $\lambda_1$ and $\lambda_2$, as

$$D(\lambda_1, \lambda_2) = \frac{1}{T} \left[ \log P(O_T^{(2)} \mid \lambda_1) - \log P(O_T^{(2)} \mid \lambda_2) \right]$$

where $O_T^{(2)}$ is a sequence of observations generated by model $\lambda_2$ and scored by both models.

We symmetrize $D$ by using the relation:

$$D_S(\lambda_1, \lambda_2) = \frac{1}{2} \left[ D(\lambda_1, \lambda_2) + D(\lambda_2, \lambda_1) \right]$$

Implementation Issues for HMMs

1. Scaling: to prevent underflow and/or overflow.
2. Multiple observation sequences: to train left-right models.
3. Initial estimates of HMM parameters: to provide robust models.
4. Effects of insufficient training data.

Scaling

- $\alpha_t(i)$ is a sum of a large number of terms, each of t...

Find millions of documents on Course Hero - Study Guides, Lecture Notes, Reference Materials, Practice Exams and more. Course Hero has millions of course specific materials providing students with the best way to expand their education.

Below is a small sample set of documents:

Rutgers - PH - 343
Ph 343Lab 3Monitoring the Suns Emission and the Telescopes Pointing Accuracy Continued IIThis part due: Thursday, October 17, 2002Purpose: In this part of the lab you will use your data to investigate the brightness of the Sun and the perform
Rutgers - PHYSICS - 397
Solar Energy 1Solar EnergyPhysics of Modern Devices February 11, 2009Solar Energy 2The SunOn a bright, sunny day, the sun shines approximately 1,000 watts of energy per square meter of the planet's surface, and if we could collect all of th
Rutgers - PHYSICS - 278
Sustainable Energy - without the hot airVersion 3.5.2. November 3, 2008. This Cover-sheet must not appear in the printed book.low-resolution edition.The quest for safe, secure and sustainable energy poses one of the most critical challenges of o
Penn State - SLF - 5134
From Pittsburgh, PA Live near the Pgh Zoo. High School: CAPA (Pittsburgh High school for the Creative and Performing Arts)Chemical Engineering Need to get away from gasoline, as it will eventually run out. I want to get involved in the new
North-West Uni. - BWR - 789
Curriculum Vitae ofBARON REEDCONTACT INFORMATION Department of Philosophy Northwestern University 1880 Campus Drive Evanston, IL 60208 phone: (847) 467-6370 email: b-reed@northwestern.edu webpage: http:/faculty.wcas.northwestern.edu/~bwr789 EMPLOY
North-West Uni. - BWR - 789
Curriculum Vitae ofBARON REEDOFFICE: Department of Philosophy Northwestern University 1880 Campus Drive Evanston, IL 60208 phone: (847) 467-6370 email: b-reed@northwestern.edu webpage: http:/www.philosophy.northwestern.edu/people/reed.htm EMPLOYME
North-West Uni. - TKC - 491
THEODORE CHRISTOV NORTHWESTERN UNIVERSITY DEPARTMENT OF POLITICAL SCIENCE 601 UNIVERSITY PLACE 402 SCOTT HALL EVANSTON, IL 60208 PHONE: 310-359-3526 E-MAIL: TCHRISTOV@NORTHWESTERN.EDUACADEMIC APPOINTMENTNorthwestern University Visiting Assista
North-West Uni. - BWR - 789
Curriculum Vitae ofBARON REEDCONTACT INFORMATION Department of Philosophy Northwestern University 1880 Campus Drive Evanston, IL 60208 phone: (847) 467-6370 email: b-reed@northwestern.edu webpage: http:/faculty.wcas.northwestern.edu/~bwr789 EMPLOY
UC Davis - MD - 448
MITALI DASDepartment of Economics, University of California-Davis, Davis CA 95616 Phone: (530)-752-2129, Fax: (530)-752-9382, Email: das@ucdavis.edu EDUCATIONPh.D. Economics, M.I.T, 1998 Dissertation: Nonparametric Instrumental Variables and Sampl
Rutgers - ECE - 572
Distributed Computing: Fault ToleranceManish Parashar parashar@ece.rutgers.edu Department of Electrical & Computer Engineering Rutgers UniversityBasic ConceptsDependability Includes Availability Reliability Safety MaintainabilityFailure Mod
Rutgers - ECE - 572
Semantic Framework for Rudder Middleware InfrastructureZhen Li The Applied Software Systems Laboratory Rutgers University Email: zhljenny@caip.rutgers.eduAbstractRudder is a generative agent-based middleware with intelligent deductive capabilitie
Rutgers - ECE - 572
Final Report for Class Project ECE 5721/10N-Body Simulation Using Tree CodesHao Tang, Student Member, IEEEHere, the simulation proceeds over time steps, each time computing the net force on every particle and thereby updating its position and
Rutgers - LECTURE - 123