lec09 - RNA Search and Motif Discovery CSE 427 Winter 2008...


RNA Search and Motif Discovery
CSE 427 Winter 2008

Outline
  Task 1: RNA secondary structure prediction (last time)
  Task 2: RNA motif models
    Covariance models
    Training & mutual information
  Task 3: Search
    Rigorous & heuristic filtering
  Task 4: Motif discovery

Task 2: Motif Description

How to model an RNA Motif?
  Conceptually, start with a profile HMM:
    from a multiple alignment, estimate nucleotide/insert/delete preferences for each position
    given a new seq, estimate likelihood that it could be generated by the model, & align it to the model
  [Figure labels: all G, mostly G, del, ins]

How to model an RNA Motif?
  Add column pairs and pair emission probabilities for base-paired regions
  [Figure labels: paired columns, <<<<<<< >>>>>>>]

RNA Motif Models
  Covariance Models (Eddy & Durbin 1994)
    aka profile stochastic context-free grammars
    aka hidden Markov models on steroids
  Model position-specific nucleotide preferences and base-pair preferences
  Pro: accurate
  Con: model building hard, search sloooow

RNA sequence analysis using covariance models
  Eddy & Durbin, Nucleic Acids Research, 1994, vol 22 #11, 2079-2088
  (see also Ch 10 of Durbin et al.)
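A minimal sketch of the mutual information idea named in the outline: two alignment columns that covary (as base-paired columns tend to) carry high mutual information, while conserved columns carry none. The toy alignment, the function names, and the decision to ignore gaps are all assumptions made for illustration; they are not taken from the lecture or from Eddy & Durbin.

```python
import math
from collections import Counter

# Hypothetical toy alignment: column 0 is always G, column 2 always C,
# and column 4 is the Watson-Crick complement of column 1.
ALIGNMENT = ["GACUU", "GCCUG", "GGCUC", "GUCUA"]


def column(alignment, i):
    """Return the residues found in column i, one per sequence."""
    return [seq[i] for seq in alignment]


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j.

    Frequencies are raw counts with no pseudocounts, and gap characters
    are not treated specially; both are simplifications.
    """
    n = len(alignment)
    fi = Counter(column(alignment, i))
    fj = Counter(column(alignment, j))
    fij = Counter(zip(column(alignment, i), column(alignment, j)))
    mi = 0.0
    for (a, b), count in fij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi


if __name__ == "__main__":
    print(mutual_information(ALIGNMENT, 1, 4))  # covarying pair of columns
    print(mutual_information(ALIGNMENT, 0, 2))  # two fully conserved columns
```

In this toy case the covarying pair scores the maximum possible for four nucleotides, while the conserved pair scores zero even though both columns are informative on their own. That is why pair emission probabilities are modeled separately from single-column preferences.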
What
  A probabilistic model for RNA families
    The Covariance Model
    A Stochastic Context-Free Grammar
    A generalization of a profile HMM
  Algorithms for training
    From aligned or unaligned sequences
    Automates comparative analysis
    Complements Nussinov/Zuker RNA folding
  Algorithms for searching

Main Results
  Very accurate search for tRNA
    (Precursor to tRNAscanSE, the current favorite)
  Given sufficient data, model construction comparable to, but not quite as good as, human experts
  Some quantitative info on importance of pseudoknots and other tertiary features

Probabilistic Model Search
  As with HMMs, given a sequence, you calculate the likelihood ratio that the model could generate the sequence, vs a background model
  You set a score threshold
    Anything above threshold is a hit
  Scoring:
    Forward / Inside algorithm: sum over all paths
    Viterbi approximation: find single best path
    (Bonus: alignment & structure prediction)

Example: searching for tRNAs

Alignment Quality

Comparison to TRNASCAN
  Fichant & Burks: best heuristic then
    97.5% true positives
    0.37 false positives per MB
  CM A1415 (trained on trusted alignment)
    > 99.98% true positives
    < 0.2 false positives per MB
  Current method-of-choice is tRNAscanSE, a CM-based scan with heuristic pre-filtering (including TRNASCAN?) for performance reasons.
  Slightly different evaluation criteria

Profile HMM Structure
  M_j: Match states (20 emission probabilities)
  I_j: Insert states (background emission probabilities)
  D_j: Delete states (silent, no emission)

CM Structure
  A: Sequence + structure
  B: the CM guide tree
  C: probabilities of letters/pairs & of indels
  Think of each branch being an HMM emitting both sides of a helix (but 3' side emitted in reverse order)

Overall CM Architecture
  One box (node) per node of guide tree
  BEG/MATL/INS/DEL just like an HMM...
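The likelihood-ratio search described under Probabilistic Model Search can be sketched as follows. This is a deliberately reduced illustration, not a covariance model: it has match columns only (no insert or delete states, no paired columns), so each window has a single path and Viterbi and Forward scores coincide. The three-column MODEL, the uniform BACKGROUND, and the function names are hypothetical.

```python
import math

# Assumed uniform background model over RNA nucleotides.
BACKGROUND = {base: 0.25 for base in "ACGU"}

# Hypothetical 3-column model: per-position emission probabilities.
MODEL = [
    {"A": 0.05, "C": 0.05, "G": 0.85, "U": 0.05},  # "mostly G" column
    {"A": 0.70, "C": 0.10, "G": 0.10, "U": 0.10},
    {"A": 0.10, "C": 0.70, "G": 0.10, "U": 0.10},
]


def log_odds(window):
    """Log2 likelihood ratio of the window under MODEL vs BACKGROUND."""
    return sum(
        math.log2(col[base] / BACKGROUND[base])
        for col, base in zip(MODEL, window)
    )


def scan(seq, threshold):
    """Slide the model along seq; return (start, score) for each hit."""
    width = len(MODEL)
    hits = []
    for start in range(len(seq) - width + 1):
        score = log_odds(seq[start:start + width])
        if score > threshold:  # anything above threshold is a hit
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    for start, score in scan("UUGACUU", threshold=0.0):
        print(start, round(score, 2))
```

With insert and delete states a window would have many possible paths through the model, and the choice between summing over all of them (Forward / Inside) and keeping only the best one (Viterbi) would start to matter.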

This note was uploaded on 04/22/2008 for the course CSE 427 taught by Professor Ruzzo during the Winter '08 term at University of Washington.
