MIT6_047f08_lec06_note06

MIT OpenCourseWare
http://ocw.mit.edu

6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution
Fall 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

6.047/6.878 Lecture 6: HMMs I
September 24, 2008

1 Introduction

To this point, the class has dealt mainly with known, characterized gene sequences. We have learned several methods of comparing genes in order to quantify divergence, highlight similar gene segments that have been preserved among species, and find optimal alignments of sequences so that we can better understand the "evolution" of a sequence. We have used BLAST, hashing, projections and neighborhood search to find sequences in a database. Having learned these methods, we can now begin to address the question:

1.1 What do we do with a newly found piece of DNA?

The first approach is usually to compare the DNA to known sequences, for which the methods mentioned above work well. But often the new sequence cannot be matched to any gene in a database. In that case, one could compare several unknown sequences and try to group them for later use. All of these techniques build on comparisons of DNA sequences and on the preservation of homologous parts. But why not take an orthogonal approach and try to understand the given piece itself better? This could involve analyzing the sequence for unusual properties such as k-mer frequencies, motifs or recurring patterns. Here we want to explore the more constructive approach of capturing such properties in a generative model.

1.2 Modeling!

Modeling involves making the hypothesis that the sequence is a concatenation of distinct functional subsequences. These subsequences share properties and can be classified into a small number of different types. For example, there are several biological sequence types, such as promoters, first exons, and intergenic sequences.
One distinctive feature of these types is that they have different frequencies of the four bases. In this lecture we focus on this property and assume that the likelihood of a base occurring in a sequence depends only on the type of its subsequence (and not, e.g., on preceding or following bases). Before giving a more formal definition of this model, let's see what we can use it for:

• We can generate "typical" DNA sequences (of a certain type).
• By comparing how likely it is that a given sequence was generated by one type or another, we can infer the types of unknown sequences.
• Allowing transitions between certain types with given probabilities lets us infer not only the types of subsequences but also how a whole sequence splits into typed subsequences.
• Using (annotated) examples, we can train our model so that it is most likely to generate the existing sequences, and thus learn the characteristics of the types.

1.3 Why use a probabilistic model?...
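
The first two uses listed in Section 1.2 (generating a "typical" sequence of a given type, and inferring the type of an unknown sequence by comparing likelihoods) can be illustrated with a minimal Python sketch. It is not taken from the lecture notes: the type names `gc_rich` and `at_rich`, their base frequencies, and the function names are all hypothetical, and the types are assumed to be equally likely a priori. The only modeling assumption it shares with the text is that each base depends solely on the type of its subsequence.

```python
import math
import random

# Hypothetical base frequencies per sequence type. These numbers are
# illustrative assumptions, not values from the lecture notes.
BASE_FREQS = {
    "gc_rich": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
    "at_rich": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
}


def generate(seq_type, length, rng):
    """Generate a sequence by drawing each base independently
    from the base frequencies of the given type."""
    freqs = BASE_FREQS[seq_type]
    bases = list(freqs)
    weights = [freqs[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))


def log_likelihood(seq, seq_type):
    """Return log P(seq | type). Because bases are assumed independent
    given the type, this is a sum of per-base log probabilities."""
    freqs = BASE_FREQS[seq_type]
    return sum(math.log(freqs[b]) for b in seq)


def infer_type(seq):
    """Return the type under which seq has the highest likelihood
    (equal prior probability for every type is assumed)."""
    return max(BASE_FREQS, key=lambda t: log_likelihood(seq, t))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same sample
    sample = generate("gc_rich", 200, rng)
    print(sample[:40], infer_type(sample))
    print("ATTATAAT", infer_type("ATTATAAT"))
```

Log probabilities are summed rather than multiplying raw probabilities, since the product over a long sequence would underflow floating-point arithmetic. The sketch does not cover transitions between types or training from annotated examples.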

This note was uploaded on 09/24/2010 for the course EECS 6.047 / 6. taught by Professor Manolis Kellis during the Fall '08 term at MIT.
