Lecture Notes: Introduction to Hidden Markov Models

Introduction

A Hidden Markov Model (HMM), as the name suggests, is a Markov model in which the states cannot be observed directly; only the symbols consumed or produced by transitions are observable. A speech generation system might, for example, be implemented as an HMM and speak a word as it transitions from one state to another. Similarly, a speech understanding system might "recognize" a word on a transition. In this sense HMMs can be thought of as either generative or interpretive: the HMM is the same, but in one case the transitions emit symbols (such as words) and in the other case they consume symbols. We will therefore treat observations and actions interchangeably in what follows.

Hidden Markov Models, and simple extensions of them, are very popular in a variety of fields, including computer vision, natural language understanding, and speech recognition and synthesis (to name a few). Often an HMM is a natural way of modeling a system; in other cases it is "force-fit" to a problem for which it is not quite ideal. The immense popularity of HMMs is due to the fact that very fast, linear-time algorithms exist for some of the most important HMM problems. This allows, for example, speech recognition systems to operate in real time.

An HMM is defined as the four-tuple <s1, S, W, E>, where s1 is the start state, S is the set of states, W is the set of observation symbols, and E is the set of transitions. A transition is also a four-tuple, such as <s2, "had", s3, 0.3>. This example describes a transition from state s2 to s3 in which the word "had" is either emitted or consumed, and the probability of taking the transition is 0.3. We will usually write a transition as T(s2, "had", s3, 0.3) or as:

    P(s2 --"had"--> s3) = 0.3

Sometimes we do not know the starting state, but we have a pdf over starting states. For improved generality we can therefore replace s1 with a starting-state pdf.
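To make the four-tuple definition concrete, it can be written out directly as a data structure. The sketch below is illustrative only; the names `Transition` and `HMM` are not from the notes, and this is just one possible encoding of the definition above:

```python
from dataclasses import dataclass, field

# A transition, mirroring the four-tuple <s2, "had", s3, 0.3> written T(s2, "had", s3, 0.3).
@dataclass(frozen=True)
class Transition:
    src: str      # source state
    symbol: str   # observation symbol emitted (generative) or consumed (interpretive)
    dst: str      # destination state
    prob: float   # probability of taking this transition

# The HMM itself: the four-tuple <s1, S, W, E>.
@dataclass
class HMM:
    start: str                                   # s1, the start state
    states: set = field(default_factory=set)     # S
    symbols: set = field(default_factory=set)    # W
    transitions: list = field(default_factory=list)  # E

# The example transition from the text:
t = Transition("s2", "had", "s3", 0.3)
print(t.prob)  # 0.3
```

Note that the same `Transition` serves both readings of an HMM: whether `symbol` is emitted or consumed depends only on whether the model is being used generatively or interpretively.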
We will continue to assume that we know the starting state for the remainder of this discussion in order to simplify the examples, but generalizing the starting condition to a pdf adds no additional complexity to the algorithms presented.

Consider the very simple example below:

[Figure: a two-state HMM with states a and b. From state a there are three transitions: a self-loop on "1" with probability 0.48, a self-loop on "0" with probability 0.48, and a transition to b on "0" with probability 0.04. From state b there is a single transition on "1" with probability 1.0.]

The transitions are depicted as arcs labeled with the symbol that is consumed or emitted, in quotes, and with the probability that the transition is taken. A couple of points are worth noting here. First, the probabilities of all transitions out of a state must sum to 1.0; second, multiple transitions out of a state can carry the same symbol. Looking at state "a" in the figure above, the symbol "0" can be consumed by two different transitions. One of them changes the state to "b" while the other leaves the system in state "a". It is because of this that the states cannot be known...
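Both points above can be checked mechanically. The sketch below is a minimal illustration, not part of the original notes; it assumes the "1"/1.0 transition out of b leads back to a (the text does not state its destination). It encodes the example HMM, verifies that each state's outgoing probabilities sum to 1.0, and then propagates state probabilities after consuming a single "0" from the known start state a, showing that two states are now possible:

```python
from collections import defaultdict

# Transitions of the example HMM as (src, symbol, dst, prob).
# Assumption: the "1"/1.0 transition out of b returns to a.
transitions = [
    ("a", "1", "a", 0.48),
    ("a", "0", "a", 0.48),
    ("a", "0", "b", 0.04),
    ("b", "1", "a", 1.00),
]

# Point 1: the probabilities of all transitions out of a state sum to 1.0.
out_prob = defaultdict(float)
for src, _, _, p in transitions:
    out_prob[src] += p
assert all(abs(total - 1.0) < 1e-9 for total in out_prob.values())

# Point 2: start in state a with certainty, then consume the symbol "0".
# Two transitions out of a carry "0", so the resulting state is uncertain.
belief = {"a": 1.0, "b": 0.0}
new_belief = defaultdict(float)
for src, sym, dst, p in transitions:
    if sym == "0":
        new_belief[dst] += belief[src] * p

# Unnormalized joint probabilities of observing "0" and landing in each state:
print(dict(new_belief))  # {'a': 0.48, 'b': 0.04}
```

After one observation of "0" both states have nonzero probability, which is exactly why the state is "hidden": the observation alone does not determine which transition was taken.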
- Fall '05