# 7 Hidden Markov models (HMMs)


Let us start by recalling that a Markov chain is a random process generating a sequence S = s_1 s_2 ··· s_n such that the probability of generating, or *emitting*, each s_i is fixed beforehand and depends only on the previously generated symbols s_1 ··· s_{i-1}; we will denote this probability by P(s_i | s_1 ··· s_{i-1}). An important subclass of Markov chains consists of those of *finite history*, namely those in which the probability of generating any symbol s_i depends only on the k previous symbols, for some constant k; that is, P(s_i | s_1 ··· s_{i-1}) = P(s_i | s_{i-k} ··· s_{i-1}) (k is called the *order* of the Markov chain).

Consider now for concreteness a Markov chain model of order 1, that is, P(s_i | s_1 ··· s_{i-1}) = P(s_i | s_{i-1}). In such a Markov chain, the probability of emitting s_i depends only on the previous symbol s_{i-1}. In an HMM, however, the model can have different states, in which the probability of emitting s_i depends instead only on the current state, which is unknown, or *hidden*, from an observer. Moreover, at each step, the model can also probabilistically choose to transition to another hidden state. In other words, if we observe S, we cannot specify the state of the HMM in which each s_i was generated. Assuming that we know the HMM which generated S, that is, we know the possible transitions between the hidden states and their probabilities, and the emission probabilities in each state, we can instead try to find the most probable sequence of states of the HMM which generated S. We will solve this problem in Section 7.2,
with the Viterbi algorithm, which is in close kinship with the Bellman–Ford algorithm for finding a shortest path in a graph. Another interesting question is how to compute the probability that the HMM generated S, which we solve in Section 7.3 by adapting the Viterbi algorithm into the forward and backward algorithms. Finally, in Section 7.4 we discuss how we get to know the parameters of an HMM in the first place, by showing a technique for learning the transition and emission probabilities from data.

## 7.1 Definition and basic problems

Before giving the formal definition of an HMM, let us give a concrete example for the segmentation problem mentioned above.

**Example 7.1** The segmentation problem

In prokaryote genomes, the coding and non-coding regions have easily recognizable statistical properties in their nucleotide content. The simplest statistical property to consider is the *GC content*, since coding regions typically have more G and C nucleotides than A and T. The segmentation problem is to partition a genome into coding and non-coding regions.
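To make the connection between the segmentation example and the HMM definition concrete, the following sketch models a genome generator as a two-state HMM, with one hidden state for coding regions (GC-rich emissions) and one for non-coding regions (AT-rich emissions). All state names and probability values here are illustrative assumptions for the sake of the example, not values taken from the text.

```python
import random

# Hypothetical two-state HMM for the segmentation example.
# Transition probabilities between hidden states (each row sums to 1);
# high self-transition probabilities make regions tend to be long.
TRANS = {
    'coding':     {'coding': 0.95, 'non-coding': 0.05},
    'non-coding': {'coding': 0.05, 'non-coding': 0.95},
}

# Emission probabilities: the coding state is GC-rich, the non-coding
# state AT-rich, reflecting the GC-content property of Example 7.1.
EMIT = {
    'coding':     {'A': 0.15, 'C': 0.35, 'G': 0.35, 'T': 0.15},
    'non-coding': {'A': 0.35, 'C': 0.15, 'G': 0.15, 'T': 0.35},
}

def generate(n, start='non-coding', rng=random):
    """Generate (sequence, hidden path) of length n.

    Each symbol s_i is emitted according to the *current* hidden state;
    an observer who sees only the sequence S cannot tell which state
    emitted each s_i -- recovering the most probable hidden path is
    exactly the problem the Viterbi algorithm solves.
    """
    state, seq, path = start, [], []
    for _ in range(n):
        # Emit one symbol from the current state's emission distribution.
        symbols, weights = zip(*EMIT[state].items())
        seq.append(rng.choices(symbols, weights=weights)[0])
        path.append(state)
        # Probabilistically transition to the next hidden state.
        nexts, tweights = zip(*TRANS[state].items())
        state = rng.choices(nexts, weights=tweights)[0]
    return ''.join(seq), path
```

Here the hidden path is returned only so the example can be inspected; in the actual segmentation problem we observe the sequence alone and must infer the path.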