
# MIT6_047f08_pset03 - MIT OpenCourseWare http/ocw.mit.edu...


MIT OpenCourseWare
http://ocw.mit.edu

6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution
Fall 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

6.047/6.878 Fall 2008 - Problem Set 3
Due: Monday October 6, 2008 at 8pm

1. Hidden Markov models and protein structure

One biological application of hidden Markov models is to determine the secondary structure (i.e. the general three-dimensional shape) of a protein. This general shape is made up of alpha helices, beta sheets, turns and other structures. In this problem we will assume that the amino acid composition of these regions is governed by an HMM. To keep this problem relatively simple, we do not use actual transition or emission probabilities. We will use the following state transition probabilities (rows give the current state, columns the next state):

| From \ To | Alpha Helix | Beta Sheet | Turn | Other |
|---|---|---|---|---|
| Alpha Helix | 0.7 | 0.1 | 0.2 | 0.0 |
| Beta Sheet | 0.2 | 0.6 | 0.0 | 0.2 |
| Turn | 0.3 | 0.3 | 0.1 | 0.3 |
| Other | 0.3 | 0.3 | 0.0 | 0.4 |

where, for example, $P(\text{Alpha Helix} \to \text{Turn}) = 0.2$. The start state is always Other, and the emission probabilities are:

| Amino Acid | Alpha Helix | Beta Sheet | Turn | Other |
|---|---|---|---|---|
| M | 0.35 | 0.10 | 0.00 | 0.05 |
| L | 0.30 | 0.05 | 0.15 | 0.15 |
| N | 0.15 | 0.30 | 0.10 | 0.20 |
| E | 0.10 | 0.40 | 0.10 | 0.15 |
| A | 0.05 | 0.00 | 0.35 | 0.20 |
| G | 0.05 | 0.15 | 0.30 | 0.25 |

(a) Draw this HMM. Include the state transition probabilities and the emission probabilities for each state.

(b) What is the probability of the second emitted character being an A?

(c) Give the most likely state transition path for the amino acid sequence MLAN using the Viterbi algorithm. What is $P(X, Y)$ for $X$ = MLAN and $Y$ = the Viterbi path? (Include the Viterbi algorithm table in your solution.)

(d) How many other paths could give rise to the same sequence MLAN? Compute the total probability $P(X)$ for $X$ = MLAN, defined as $P(X) = \sum_{Y} P(X, Y)$ (although this is not how you should compute it!). Compare this to $P(X, Y)$ for the Viterbi path in part (c). What does this say about the reliability of the Viterbi path?

2. Markov chains for CpG classification

In this problem, we will simulate and test the use of Markov chains for classifying DNA sequences based on nucleotide frequencies.
In these Markov chains, we have four states, one for each nucleotide, and transition probabilities between the states. This is a degenerate case of an HMM in which the state fully determines the emission (thus removing the “hiddenness”). Page 50 of Durbin gives the following transition probabilities for the models of being inside a CpG island (M1) and outside a CpG island (M2):

M1 (inside a CpG island):

| From \ To | A | C | G | T |
|---|---|---|---|---|
| A | 0.180 | 0.274 | 0.426 | 0.120 |
| C | 0.171 | 0.367 | 0.274 | 0.188 |
| G | 0.161 | 0.339 | 0.375 | 0.125 |
| T | 0.079 | 0.355 | 0.384 | 0.182 |

M2 (outside a CpG island):

| From \ To | A | C | G | T |
|---|---|---|---|---|
| A | 0.300 | 0.205 | 0.285 | 0.210 |
| C | 0.322 | 0.298 | 0.078 | 0.302 |
| G | 0.248 | 0.246 | 0.298 | 0.208 |
| T | 0.177 | 0.239 | 0.292 | 0.292 |

where, for example, $P_{M1}(A \to G) = 0.426$. We will additionally assume a uniform distribution among nucleotides in the initial state (that is, the chains start in each state with probability 0.25). Write code to (1) generate random sequences from Markov chains defined by M1 and M2 and (2) find the probability of a given sequence being generated by M1 or M2.
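The two coding tasks in Problem 2 can be sketched in Python as follows. This is a minimal illustration, not a reference solution: it assumes a first-order chain whose first base is drawn uniformly (probability 0.25 each, as stated above) and whose later bases follow the M1 or M2 rows; the names `generate`, `log_prob`, and `log_odds` are hypothetical. Probabilities are accumulated as natural logarithms, and the two models are compared through the log-likelihood ratio $\log P(x \mid M1) - \log P(x \mid M2)$, where a positive score favours M1.

```python
import math
import random

NUCS = "ACGT"

# Transition tables from the problem statement (rows: current base, columns: next base).
M1 = {  # inside a CpG island
    "A": {"A": 0.180, "C": 0.274, "G": 0.426, "T": 0.120},
    "C": {"A": 0.171, "C": 0.367, "G": 0.274, "T": 0.188},
    "G": {"A": 0.161, "C": 0.339, "G": 0.375, "T": 0.125},
    "T": {"A": 0.079, "C": 0.355, "G": 0.384, "T": 0.182},
}
M2 = {  # outside a CpG island
    "A": {"A": 0.300, "C": 0.205, "G": 0.285, "T": 0.210},
    "C": {"A": 0.322, "C": 0.298, "G": 0.078, "T": 0.302},
    "G": {"A": 0.248, "C": 0.246, "G": 0.298, "T": 0.208},
    "T": {"A": 0.177, "C": 0.239, "G": 0.292, "T": 0.292},
}
INITIAL = 0.25  # uniform start distribution over A, C, G, T


def generate(model, length, rng=random):
    """Sample a sequence of `length` bases from a first-order Markov chain."""
    if length <= 0:
        return ""
    seq = [rng.choice(NUCS)]  # first base: uniform initial distribution
    for _ in range(length - 1):
        row = model[seq[-1]]  # next-base distribution given the current base
        seq.append(rng.choices(NUCS, weights=[row[b] for b in NUCS])[0])
    return "".join(seq)


def log_prob(model, seq):
    """Natural log of P(seq | model) = log P(x1) + sum_i log P(x_i | x_{i-1})."""
    if not seq:
        return 0.0
    lp = math.log(INITIAL)
    for prev, cur in zip(seq, seq[1:]):
        lp += math.log(model[prev][cur])
    return lp


def log_odds(seq):
    """log P(seq | M1) - log P(seq | M2); positive values favour the CpG-island model."""
    return log_prob(M1, seq) - log_prob(M2, seq)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are reproducible
    for name, model in (("M1", M1), ("M2", M2)):
        seq = generate(model, 100, rng)
        print(name, seq[:30] + "...", "log-odds M1 vs M2:", round(log_odds(seq), 2))
```

Summing logarithms instead of multiplying probabilities matters here: every factor is below 0.5, so the raw product for a few thousand bases underflows to zero in double precision, while the log score stays finite.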

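For Problem 1, the quantities in parts (b) through (d) can be checked numerically. The Python sketch below is an illustration, not the course's reference solution: it encodes the transition and emission tables from Problem 1 as dictionaries and assumes that "the start state is always Other" means the first residue is emitted by Other (the alternative reading, a silent begin state, is noted in a code comment). The names `forward`, `viterbi`, `joint`, `brute_force`, and `second_char_prob` are hypothetical. `brute_force` evaluates $P(X) = \sum_{Y} P(X, Y)$ literally over all $4^n$ paths, which is feasible only for short inputs such as MLAN; it serves as a check on the forward recursion.

```python
from itertools import product

STATES = ["Alpha Helix", "Beta Sheet", "Turn", "Other"]

# Transition probabilities (rows: current state, columns: next state).
TRANS = {
    "Alpha Helix": {"Alpha Helix": 0.7, "Beta Sheet": 0.1, "Turn": 0.2, "Other": 0.0},
    "Beta Sheet": {"Alpha Helix": 0.2, "Beta Sheet": 0.6, "Turn": 0.0, "Other": 0.2},
    "Turn": {"Alpha Helix": 0.3, "Beta Sheet": 0.3, "Turn": 0.1, "Other": 0.3},
    "Other": {"Alpha Helix": 0.3, "Beta Sheet": 0.3, "Turn": 0.0, "Other": 0.4},
}

# Emission table, one row per amino acid, columns in STATES order.
EMIT_ROWS = {
    "M": (0.35, 0.10, 0.00, 0.05),
    "L": (0.30, 0.05, 0.15, 0.15),
    "N": (0.15, 0.30, 0.10, 0.20),
    "E": (0.10, 0.40, 0.10, 0.15),
    "A": (0.05, 0.00, 0.35, 0.20),
    "G": (0.05, 0.15, 0.30, 0.25),
}
EMIT = {s: {aa: row[i] for aa, row in EMIT_ROWS.items()} for i, s in enumerate(STATES)}

# ASSUMPTION: the first residue is emitted by Other. If Other is meant as a
# silent begin state instead, use START = dict(TRANS["Other"]).
START = {s: (1.0 if s == "Other" else 0.0) for s in STATES}


def forward(seq):
    """P(X), summed over all state paths with the forward recursion."""
    f = {s: START[s] * EMIT[s][seq[0]] for s in STATES}
    for aa in seq[1:]:
        f = {s: EMIT[s][aa] * sum(f[r] * TRANS[r][s] for r in STATES) for s in STATES}
    return sum(f.values())


def viterbi(seq):
    """Most probable state path Y* and its joint probability P(X, Y*)."""
    v = {s: START[s] * EMIT[s][seq[0]] for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for aa in seq[1:]:
        ptr, nxt = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: v[r] * TRANS[r][s])
            ptr[s] = best
            nxt[s] = v[best] * TRANS[best][s] * EMIT[s][aa]
        back.append(ptr)
        v = nxt
    state = max(STATES, key=lambda s: v[s])
    prob, path = v[state], [state]
    for ptr in reversed(back):  # trace back-pointers from the last position
        state = ptr[state]
        path.append(state)
    return path[::-1], prob


def joint(seq, path):
    """P(X, Y) for one explicit state path Y."""
    p = START[path[0]] * EMIT[path[0]][seq[0]]
    for i in range(1, len(seq)):
        p *= TRANS[path[i - 1]][path[i]] * EMIT[path[i]][seq[i]]
    return p


def brute_force(seq):
    """Literal sum of P(X, Y) over every path; exponential, for checking only."""
    return sum(joint(seq, path) for path in product(STATES, repeat=len(seq)))


def second_char_prob(aa):
    """P(second residue == aa): one transition out of START, then an emission."""
    return sum(START[r] * TRANS[r][s] * EMIT[s][aa] for r in STATES for s in STATES)


if __name__ == "__main__":
    path, p_xy = viterbi("MLAN")
    p_x = forward("MLAN")
    print("Viterbi path:", " -> ".join(path), " P(X, Y*) =", p_xy)
    print("P(X) =", p_x, " brute force =", brute_force("MLAN"), " ratio =", p_xy / p_x)
    print("P(second residue = A) =", second_char_prob("A"))
```

Because `viterbi` keeps only the current column of scores plus the back-pointers, it does not print the full Viterbi table that part (c) asks for; appending each `v` dictionary to a list inside the loop would recover it.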
• Fall '08
• Manolis Kellis
