ly, the sequences of exons, splice sites and introns must have different statistical properties. Let's imagine some simple differences: say that exons have a uniform base composition on average (25% each base), introns are A/T rich (say, 40% each for A/T, 10% each for C/G), and the 5′SS consensus nucleotide is almost always a G (say, 95% G and 5% A).

NATURE BIOTECHNOLOGY VOLUME 22 NUMBER 10 OCTOBER 2004
PRIMER © 2004 Nature Publishing Group http://www.nature.com/naturebiotechnology

Starting from this information, we can draw an HMM (Fig. 1). The HMM invokes three states, one for each of the three labels we might assign to a nucleotide: E (exon), 5 (5′SS) and I (intron). Each state has its own emission probabilities (shown above the states), which model the base composition of exons, introns and the consensus G at the 5′SS. Each state also has transition probabilities (arrows), the probabilities of moving from this state to a new state. The transition probabilities describe t...
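To make the three-state model concrete, here is a minimal Python sketch that encodes the states E, 5 and I with the emission probabilities given above, then scores a sequence along a chosen state path as the product of one emission and one transition probability per position (summed in log space to avoid underflow). The transition values (E→E 0.9, E→5 0.1, 5→I 1.0, I→I 0.9, I→end 0.1), the rule that every path starts in E, and the names `EMISSIONS`, `TRANSITIONS`, `END` and `log_joint` are illustrative assumptions, not values or names taken from this passage.

```python
import math

# Hypothetical label for leaving the model after the intron (assumption).
END = "end"

# Emission probabilities from the text: uniform exons, A/T-rich introns,
# and a 5'SS state that emits G 95% of the time and A 5% of the time.
EMISSIONS = {
    "E": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "5": {"A": 0.05, "C": 0.00, "G": 0.95, "T": 0.00},
    "I": {"A": 0.40, "C": 0.10, "G": 0.10, "T": 0.40},
}

# HYPOTHETICAL transition probabilities, chosen only for illustration.
# Each row sums to 1; the intron is the only state that can exit to END.
TRANSITIONS = {
    "E": {"E": 0.9, "5": 0.1},
    "5": {"I": 1.0},
    "I": {"I": 0.9, END: 0.1},
}


def log_joint(seq: str, path: str) -> float:
    """Return ln P(seq, path) under the toy model, or -inf if impossible.

    `path` is a string of state labels ("E", "5", "I"), one per base.
    Assumption: the model always starts in state E.
    """
    if len(seq) != len(path):
        raise ValueError("sequence and state path must have equal length")
    if not seq or path[0] != "E":
        return float("-inf")
    total = 0.0
    for i, (base, state) in enumerate(zip(seq, path)):
        p_emit = EMISSIONS[state].get(base, 0.0)
        # Transition to the next state on the path, or to END after the last base.
        nxt = path[i + 1] if i + 1 < len(path) else END
        p_trans = TRANSITIONS[state].get(nxt, 0.0)
        if p_emit == 0.0 or p_trans == 0.0:
            return float("-inf")
        total += math.log(p_emit) + math.log(p_trans)
    return total


if __name__ == "__main__":
    # Arbitrary example: compare two candidate placements of the 5'SS.
    seq = "CTTCATGTGAAAGCAGACGTAAGTCA"
    for path in ("EEEEEEEEEEEEEEEEEE5IIIIIII", "EEEEEEEEEEEEEEEEEEEEEE5III"):
        print(path, round(log_joint(seq, path), 2))
```

For a three-base sequence the only path this toy model allows is `E5I`, so `log_joint("AGT", "E5I")` is ln(0.25 × 0.1 × 0.95 × 1.0 × 0.4 × 0.1). Any path that puts a C or T in the 5 state, or that ends outside the intron, scores negative infinity.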
