MIT6_047f08_lec08_slide08

MIT OpenCourseWare
http://ocw.mit.edu
6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution
Fall 2008
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
Computational Gene Prediction and Generalized Hidden Markov Models
Computational Biology: Genomes, Networks, Evolution
Lecture 8
September 30, 2008
Today
• Gene Prediction Overview
• HMMs for Gene Prediction
• GHMMs for Gene Prediction
• Genscan
Genome Annotation Genome Sequence
Eukaryotic Gene Structure
[Figure: a gene drawn as exons separated by introns. Labeled signals: start codon (ATG), donor site (GT), acceptor site (AG), stop codon (TGA). The complete mRNA contains the coding segment, which runs from ATG to TGA.]
http://geneprediction.org/book/classroom.html
Courtesy of William Majoros. Used with permission.
Translation
Genetic Code
Each amino acid is encoded by one or more codons.
Each codon encodes a single amino acid.
The third position of the codon is the most likely to vary, for a given amino acid.

Acid  Codons
A     GCA GCC GCG GCT
C     TGC TGT
D     GAC GAT
E     GAA GAG
F     TTC TTT
G     GGA GGC GGG GGT
H     CAC CAT
I     ATA ATC ATT
K     AAA AAG
L     CTA CTC CTG CTT TTA
M     ATG
N     AAC AAT
P     CCA CCC CCG CCT
Q     CAA CAG
R     AGA AGG CGA CGC CGG CGT
S     AGC AGT TCA TCC TCG TCT
T     ACA ACC ACG ACT
V     GTA GTC GTG GTT
W     TGG
Y     TAC TAT

http://geneprediction.org/book/classroom.html
Figure by MIT OpenCourseWare.
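As a rough illustration of how the codon table is used, the Python sketch below builds a codon-to-amino-acid lookup from the codons listed on this slide and translates a coding sequence three bases at a time. The names `CODONS_BY_ACID`, `CODON_TO_ACID`, and `translate` are illustrative assumptions, not part of the lecture. Codons that do not appear on the slide (including the stop codons) are reported as `?` rather than guessed.

```python
# Amino acid -> codons, exactly as listed on the slide.
# Assumption: anything not listed here (e.g. stop codons) is left unmapped.
CODONS_BY_ACID = {
    "A": "GCA GCC GCG GCT",
    "C": "TGC TGT",
    "D": "GAC GAT",
    "E": "GAA GAG",
    "F": "TTC TTT",
    "G": "GGA GGC GGG GGT",
    "H": "CAC CAT",
    "I": "ATA ATC ATT",
    "K": "AAA AAG",
    "L": "CTA CTC CTG CTT TTA",
    "M": "ATG",
    "N": "AAC AAT",
    "P": "CCA CCC CCG CCT",
    "Q": "CAA CAG",
    "R": "AGA AGG CGA CGC CGG CGT",
    "S": "AGC AGT TCA TCC TCG TCT",
    "T": "ACA ACC ACG ACT",
    "V": "GTA GTC GTG GTT",
    "W": "TGG",
    "Y": "TAC TAT",
}

# Invert the table: each codon maps to exactly one amino acid.
CODON_TO_ACID = {
    codon: acid
    for acid, codons in CODONS_BY_ACID.items()
    for codon in codons.split()
}


def translate(cds):
    """Translate a coding sequence codon by codon.

    Trailing bases that do not fill a whole codon are ignored, and
    codons missing from the table are written as '?'.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(
        CODON_TO_ACID.get(cds[i:i + 3], "?") for i in range(0, usable, 3)
    )
```

For example, `translate("ATGGCAGCC")` gives `"MAA"`: the two alanine codons GCA and GCC differ only in their third position.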
Gene Prediction as “Parsing”
• Given a genome sequence, we wish to label each nucleotide according to the parts of genes
  – Exon, intron, intergenic, etc.
• The sequence of labels must follow the syntax of genes
  – e.g. exons must be followed by introns or intergenic regions, not by other exons
• We wish to find the optimal parsing of a sequence by some measure
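The syntax constraint can be made concrete with a small Python sketch. It assumes a hypothetical one-letter labeling (`N` for intergenic, `E` for exon, `I` for intron) and a hand-written table of allowed segment-to-segment transitions; neither comes from the lecture. Runs of identical labels are first collapsed into segments, because "exon followed by exon" is a statement about whole segments, not about adjacent nucleotides.

```python
from itertools import groupby

# Assumed label alphabet: N = intergenic, E = exon, I = intron.
# Assumed gene syntax: which segment type may follow which.
ALLOWED_NEXT = {
    "N": {"E"},        # intergenic is followed by an exon
    "E": {"I", "N"},   # an exon is followed by an intron or intergenic
    "I": {"E"},        # an intron is followed by an exon
}


def segments(labels):
    """Collapse a per-nucleotide label string into (label, length) runs."""
    return [(label, len(list(run))) for label, run in groupby(labels)]


def is_legal_parse(labels):
    """Return True if consecutive segments obey ALLOWED_NEXT."""
    kinds = [label for label, _ in segments(labels)]
    if any(kind not in ALLOWED_NEXT for kind in kinds):
        return False
    return all(b in ALLOWED_NEXT[a] for a, b in zip(kinds, kinds[1:]))
```

Under these assumptions, `is_legal_parse("NNEEEIIEENN")` is accepted, while `is_legal_parse("NNIIEE")` is rejected because an intron cannot directly follow an intergenic region.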
Features
A feature is any DNA subsequence of biological significance. For practical reasons, we recognize two broad classes of features:
• signals: short, fixed-length features
• content regions: variable-length features
http://geneprediction.org/book/classroom.html
Courtesy of William Majoros. Used with permission.
Signals
[Figure: 5’ to 3’ gene diagram with alternating exons (E) and introns (I)]
Associated with short fixed(-ish) length sequences
Content Regions
[Figure: 5’ to 3’ gene diagram with alternating exons (E) and introns (I)]
• Recall: there are often multiple codons for each amino acid
• Not all codons are used equally
• Content regions often have characteristic base composition
Example: characteristic higher-order nucleotide statistics in coding sequences (hexanucleotides)
$P_{\text{exon}}(X_i \mid X_{i-1}, X_{i-2}, X_{i-3}, X_{i-4}, X_{i-5})$
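One way to estimate such a conditional probability is to count hexanucleotides in known coding sequences. The Python sketch below is a minimal illustration under stated assumptions: the function names `train_fifth_order` and `log_score` are hypothetical, and add-one pseudocounts are used for smoothing, a choice the slide does not specify.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"
ORDER = 5  # condition each base on the five bases before it


def train_fifth_order(sequences, pseudocount=1.0):
    """Estimate P(X_i | X_{i-5..i-1}) from training sequences.

    Returns a function prob(context, base). Smoothing with a
    pseudocount is an assumption of this sketch.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        seq = seq.upper()
        for i in range(ORDER, len(seq)):
            # Each window of six bases is one hexanucleotide observation.
            counts[seq[i - ORDER:i]][seq[i]] += 1

    def prob(context, base):
        seen = counts.get(context, {})
        total = sum(seen.values()) + pseudocount * len(ALPHABET)
        return (seen.get(base, 0.0) + pseudocount) / total

    return prob


def log_score(prob, seq):
    """Sum of log P(X_i | previous five bases) over a sequence."""
    seq = seq.upper()
    return sum(
        math.log(prob(seq[i - ORDER:i], seq[i]))
        for i in range(ORDER, len(seq))
    )
```

Training one such model on exons and another on non-coding sequence, then comparing `log_score` under each, is one common way to turn these statistics into a content score.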
Extrinsic Evidence
HMMs for Gene Prediction
• States correspond to gene and genomic regions (exons, introns, intergenic, etc.)
• State transitions ensure legal parses
• Emission matrices describe nucleotide statistics for each state
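These three ingredients can be sketched as a toy HMM in Python with a Viterbi decoder that returns the most probable state path. Every number below is made up for illustration, and the three-state layout is an assumption rather than the model used in the lecture. Zero entries in the transition table are what forbid illegal parses, such as going directly from intergenic to intron.

```python
import math

STATES = ("intergenic", "exon", "intron")

# Illustrative parameters only; none of these values come from the lecture.
START = {"intergenic": 1.0, "exon": 0.0, "intron": 0.0}
TRANS = {
    "intergenic": {"intergenic": 0.90, "exon": 0.10, "intron": 0.00},
    "exon":       {"intergenic": 0.05, "exon": 0.90, "intron": 0.05},
    "intron":     {"intergenic": 0.00, "exon": 0.10, "intron": 0.90},
}
EMIT = {
    "intergenic": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "exon":       {"A": 0.20, "C": 0.30, "G": 0.30, "T": 0.20},
    "intron":     {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
}


def _log(p):
    """Log that maps probability 0 to -infinity instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq):
    """Return the most probable state path for seq under the toy HMM."""
    if not seq:
        return []
    score = {s: _log(START[s]) + _log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for base in seq[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + _log(TRANS[r][s]))
            pointers[s] = best
            score[s] = prev[best] + _log(TRANS[best][s]) + _log(EMIT[s][base])
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

Working in log space avoids numerical underflow on long sequences, which matters for genome-scale input.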
A (Very) Simple HMM
[Figure: the Markov model, with the following states]
• q0
• Intergenic
• Start Codon A, Start Codon T, Start Codon G
• Exon
• Donor G, Donor T
• Intron
• Acceptor A, Acceptor G
• Stop Codon A, Stop Codon T, Stop Codon G
http://geneprediction.org/book/classroom.html
Courtesy of William Majoros. Used with permission.