MIT6_047f08_pset03

MIT OpenCourseWare
http://ocw.mit.edu

6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution
Fall 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

6.047/6.878 Fall 2008 - Problem Set 3
Due: Monday October 6, 2008 at 8pm

1. Hidden Markov models and protein structure

One biological application of hidden Markov models is to determine the secondary structure (i.e. the general three-dimensional shape) of a protein. This general shape is made up of alpha helices, beta sheets, turns and other structures. In this problem we will assume that the amino acid composition of these regions is governed by an HMM. In order to keep this problem relatively simple, we do not use actual transition values or emission probabilities. We will use the state transition probabilities of:

                   Alpha Helix   Beta Sheet   Turn   Other
    Alpha Helix    0.7           0.1          0.2    0.0
    Beta Sheet     0.2           0.6          0.0    0.2
    Turn           0.3           0.3          0.1    0.3
    Other          0.3           0.3          0.0    0.4

where, for example, $P(\text{Alpha Helix} \to \text{Turn}) = 0.2$. The start state is always Other and the emission probabilities are:

    Amino Acid   Alpha Helix   Beta Sheet   Turn   Other
    M            0.35          0.10         0.00   0.05
    L            0.30          0.05         0.15   0.15
    N            0.15          0.30         0.10   0.20
    E            0.10          0.40         0.10   0.15
    A            0.05          0.00         0.35   0.20
    G            0.05          0.15         0.30   0.25

(a) Draw this HMM. Include the state transition probabilities and the emission probabilities for each state.

(b) What is the probability of the second emitted character being an A?

(c) Give the most likely state transition path for the amino acid sequence MLAN using the Viterbi algorithm. What is $P(X, Y)$ for X = MLAN and Y = (the Viterbi path)? (Include the Viterbi algorithm table in your solution.)

(d) How many other paths could give rise to the same sequence MLAN? Compute the total probability $P(X)$ for X = MLAN, defined as $P(X) = \sum_Y P(X, Y)$ (although this is not how you should compute it!). Compare this to $P(X, Y)$ for the Viterbi path in part (c). What does this say about the reliability of the Viterbi path?

2. Markov chains for CpG classification

In this problem, we will simulate and test the use of Markov chains for classifying DNA sequences based on nucleotide frequencies. In these Markov chains, we have four states, one for each nucleotide, and transition probabilities between the states. This is a degenerate case of an HMM in which the state fully determines the emission (thus removing the hiddenness). Page 50 of Durbin gives the following transition probabilities for the models of being inside a CpG island (M1) and outside a CpG island (M2):

    M1   A       C       G       T
    A    0.180   0.274   0.426   0.120
    C    0.171   0.367   0.274   0.188
    G    0.161   0.339   0.375   0.125
    T    0.079   0.355   0.384   0.182

    M2   A       C       G       T
    A    0.300   0.205   0.285   0.210
    C    0.322   0.298   0.078   0.302
    G    0.248   0.246   0.298   0.208
    T    0.177   0.239   0.292   0.292

where, for example, $P_{M1}(\text{A} \to \text{G}) = 0.426$. We will additionally assume a uniform distribution among nucleotides in the initial state (that is, the chains start in each state...
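
The preview is cut off before it says how sequences are to be scored, so the following is only a sketch under an assumption: that a sequence is classified by the log-likelihood ratio of the two chains, a common choice for this kind of model. The transition values and the uniform initial distribution come from the problem statement; the function names `log_likelihood` and `log_odds` are hypothetical and chosen for illustration.

```python
import math

NUCLEOTIDES = "ACGT"  # column order used in the tables of the problem statement

# Transition probabilities copied from the problem statement.
# Each row is the "from" nucleotide; entries follow the column order A, C, G, T.
M1 = {  # inside a CpG island
    "A": [0.180, 0.274, 0.426, 0.120],
    "C": [0.171, 0.367, 0.274, 0.188],
    "G": [0.161, 0.339, 0.375, 0.125],
    "T": [0.079, 0.355, 0.384, 0.182],
}
M2 = {  # outside a CpG island
    "A": [0.300, 0.205, 0.285, 0.210],
    "C": [0.322, 0.298, 0.078, 0.302],
    "G": [0.248, 0.246, 0.298, 0.208],
    "T": [0.177, 0.239, 0.292, 0.292],
}


def log_likelihood(seq, model):
    """Log-probability of seq under a first-order Markov chain.

    The first nucleotide is drawn from a uniform distribution, as the
    problem statement specifies; every later one follows `model`.
    """
    if not seq:
        raise ValueError("sequence must contain at least one nucleotide")
    logp = math.log(1.0 / len(NUCLEOTIDES))  # uniform initial state
    for prev, cur in zip(seq, seq[1:]):
        logp += math.log(model[prev][NUCLEOTIDES.index(cur)])
    return logp


def log_odds(seq):
    """Assumed scoring rule: log P(seq | M1) - log P(seq | M2).

    A positive score favours the CpG-island model M1, a negative score
    favours the background model M2.
    """
    return log_likelihood(seq, M1) - log_likelihood(seq, M2)


if __name__ == "__main__":
    for s in ("CGCG", "ATAT"):
        label = "M1 (island)" if log_odds(s) > 0 else "M2 (non-island)"
        print(s, round(log_odds(s), 3), label)
```

Working in log space avoids numerical underflow on long sequences, and because both chains share the same uniform start, the initial term cancels in the ratio. With these tables a CG-rich string such as CGCG scores positive and an AT-rich string such as ATAT scores negative.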

This note was uploaded on 09/24/2010 for the course EECS 6.047 / 6. taught by Professor Manolis Kellis during the Fall '08 term at MIT.
