MIT6_047f08_lec08_slide08

Inference
• Genscan uses the same basic forward, backward, and Viterbi algorithms as generic GHMMs.
• However, its assumptions about C and D states reduce the algorithmic complexity.

Genscan Inference – C-state List
[Figure: a trellis over the sequence x1 x2 x3 … xN. D states 1 … K are tracked in the table Vk(i); C states are tracked in the list Lk(i), with entries such as "C1, duration 2, previous D-state = 1" and "C2, duration 3, previous D-state = 2".]

Genscan Viterbi Induction

$$V_k(i+1) = \max\left[\, V_k(i)\, p_k\, e_k(x_{i+1}),\;\; \max_{c \,\in\, \text{C-type states}} \left\{ V_{y_c}(i-d-1)\, P(c \mid y_c)\,(1-p_{y_c})\, P(d \mid c)\, P(x_{i-d} \ldots x_i \mid c)\, P(k \mid c)\, p_k\, e_k(x_{i+1}) \right\} \right]$$

First term (was in state $k$ at the previous step):
• $V_k(i)$: max from the previous step in this state
• $p_k$: extend the D-type (factorable) state by one step
• $e_k(x_{i+1})$: probability of emitting one more nucleotide from state $k$

Second term (just transitioned from C-type state $c$ of duration $d$, which was itself entered from D-type state $y_c$):
• $V_{y_c}(i-d-1)$: maximum probability of D-type state $y_c$ ending at position $i-d-1$
• $P(c \mid y_c)$: probability of the transition from $y_c$ to $c$
• $1-p_{y_c}$: terminate the D-type state $y_c$
• $P(d \mid c)$: probability that C state $c$ has duration $d$
• $P(x_{i-d} \ldots x_i \mid c)$: probability of the subsequence emitted by state $c$
• $P(k \mid c)$: probability of the transition from $c$ to $k$
• $p_k\, e_k(x_{i+1})$: probability of the nucleotide $x_{i+1}$ from state $k$

Training A Gene Predictor
During training of a gene finder, only a subset K of an organism's gene set will be available for training. The gene finder will later be deployed to predict the rest of the organism's genes, so the way in which the model parameters are inferred during training can significantly affect the accuracy of the deployed program.
Courtesy of William Majoros. Used with permission.
http://geneprediction.org/book/classroom.html
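The induction can be sketched in Python as below. This is a minimal, hypothetical illustration, not Genscan's implementation: the names `CState`, `seg_prob`, and `genscan_viterbi` are invented here, each C state's segment is scored with a simple i.i.d. emission model (an assumption), and the path is assumed to start and end in a D state. The sketch keeps the slide's factors, including the $p_k$ applied after $P(k \mid c)$, but indexes a C state of duration $d$ so that it emits exactly $d$ symbols ($x_{i-d+1} \ldots x_i$, after a D state ending at $i-d$); the slide's indices $x_{i-d} \ldots x_i$ span $d+1$ symbols. Probabilities are multiplied directly for readability; a practical decoder would work in log space to avoid underflow.

```python
from dataclasses import dataclass


@dataclass
class CState:
    """Hypothetical explicit-duration (C-type) state."""
    prev_d: int       # y_c: the D state that precedes c
    p_enter: float    # P(c | y_c)
    durations: dict   # d -> P(d | c)
    emit: dict        # symbol -> probability (i.i.d. segment model, an assumption)
    p_exit: list      # p_exit[k] = P(k | c)


def seg_prob(c, x, start, end):
    """P(x[start:end] | c) under the assumed i.i.d. segment model."""
    prob = 1.0
    for s in x[start:end]:
        prob *= c.emit[s]
    return prob


def genscan_viterbi(x, init, p, e, cstates):
    """Return (best path probability, per-position state labels).

    init[k]: start probability of D state k; p[k]: self-transition of D state k;
    e[k][s]: emission probability of symbol s from D state k.
    The path is assumed to start and end in a D state.
    """
    K, N = len(p), len(x)
    V = [[0.0] * N for _ in range(K)]
    back = [[None] * N for _ in range(K)]
    for k in range(K):
        V[k][0] = init[k] * e[k][x[0]]
    for i in range(N - 1):
        for k in range(K):
            step_k = p[k] * e[k][x[i + 1]]  # p_k * e_k(x_{i+1})
            # Term 1: extend D state k by one step.
            best, arg = V[k][i] * step_k, ("D",)
            # Term 2: D state y_c ended at j = i - d, C state c emitted
            # x[j+1 .. i] (exactly d symbols), then the path entered k.
            for ci, c in enumerate(cstates):
                y = c.prev_d
                for d, p_d in c.durations.items():
                    j = i - d
                    if j < 0:
                        continue
                    score = (V[y][j] * (1 - p[y]) * c.p_enter * p_d
                             * seg_prob(c, x, j + 1, i + 1) * c.p_exit[k] * step_k)
                    if score > best:
                        best, arg = score, ("C", ci, d)
            V[k][i + 1], back[k][i + 1] = best, arg
    # Traceback from the best final D state.
    k = max(range(K), key=lambda s: V[s][N - 1])
    best_prob = V[k][N - 1]
    labels = [None] * N
    i = N - 1
    while True:
        labels[i] = f"D{k}"
        step = back[k][i]
        if step is None:  # position 0: start of the path
            break
        if step[0] == "D":
            i -= 1
        else:
            _, ci, d = step
            for t in range(i - d, i):  # C segment occupies x[i-d .. i-1]
                labels[t] = f"C{ci}"
            k, i = cstates[ci].prev_d, i - d - 1
    return best_prob, labels


if __name__ == "__main__":
    # Toy model: one D state (0) and one C state of fixed duration 2 that prefers "G".
    exon = CState(prev_d=0, p_enter=1.0, durations={2: 1.0},
                  emit={"A": 0.01, "G": 0.99}, p_exit=[1.0])
    prob, labels = genscan_viterbi("AAGGAA", init=[1.0], p=[0.9],
                                   e=[{"A": 0.9, "G": 0.1}], cstates=[exon])
    print(labels)  # ['D0', 'D0', 'C0', 'C0', 'D0', 'D0']
```

D states never need a duration loop because their geometric (factorable) length is handled one step at a time by the first term; only the C states and their allowed durations are enumerated in the second term, which is where the restrictions on C and D states save work.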
Gene Prediction Accuracy
Gene predictions can be evaluated in terms of true positives (predicted features that are real), true negatives (non-predicted features that are not real), false positives (predicted features that are not real), and false negatives (real features that were not predicted). These definitions can be applied at the whole-gene, whole-exon, or individual-nucleotide level to arrive at three sets of statistics.

Accuracy Metrics

$$\text{Sn} = \frac{TP}{TP+FN} \qquad \text{Sp} = \frac{TP}{TP+FP} \qquad F = \frac{2 \times \text{Sn} \times \text{Sp}}{\text{Sn}+\text{Sp}}$$

$$\text{SMC} = \frac{TP+TN}{TP+TN+FP+FN}$$

$$CC = \frac{(TP \times TN) - (FN \times FP)}{\sqrt{(TP+FN) \times (TN+FP) \times (TP+FP) \times (TN+FN)}}$$

$$\text{ACP} = \frac{1}{n}\left(\frac{TP}{TP+FN} + \frac{TP}{TP+FP} + \frac{TN}{TN+FP} + \frac{TN}{TN+FN}\right), \qquad AC = 2(\text{ACP} - 0.5)$$

More Information
• http://genes.mit.edu/burgelab/links.html
• http://www.geneprediction.org/book/classroom.html
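At the individual-nucleotide level, the accuracy metrics can be computed from two coding masks. The sketch below assumes the real and predicted annotations are given as equal-length 0/1 lists (1 = coding), a representation chosen here for illustration; `confusion_counts` and `accuracy_metrics` are hypothetical names. CC uses the usual square-root denominator of a correlation coefficient, and because n in ACP is not defined on the slide, the code assumes it counts only the ratios whose denominators are nonzero. Undefined ratios come back as NaN. Note that Sp here is TP/(TP+FP), the quantity other fields call precision.

```python
import math


def confusion_counts(real, predicted):
    """Nucleotide-level (TP, TN, FP, FN) from two equal-length 0/1 coding masks."""
    pairs = list(zip(real, predicted))
    tp = sum(1 for r, p in pairs if r and p)
    tn = sum(1 for r, p in pairs if not r and not p)
    fp = sum(1 for r, p in pairs if not r and p)
    fn = sum(1 for r, p in pairs if r and not p)
    return tp, tn, fp, fn


def _ratio(num, den):
    """num / den, or NaN when the denominator is zero."""
    return num / den if den else math.nan


def accuracy_metrics(tp, tn, fp, fn):
    sn = _ratio(tp, tp + fn)
    sp = _ratio(tp, tp + fp)  # gene-finding "specificity" (precision elsewhere)
    f = _ratio(2 * sn * sp, sn + sp)
    smc = _ratio(tp + tn, tp + tn + fp + fn)
    cc = _ratio(tp * tn - fn * fp,
                math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn)))
    # ACP: n is assumed to be the number of the four ratios with nonzero denominators.
    terms = [(tp, tp + fn), (tp, tp + fp), (tn, tn + fp), (tn, tn + fn)]
    defined = [num / den for num, den in terms if den > 0]
    acp = sum(defined) / len(defined) if defined else math.nan
    ac = 2 * (acp - 0.5)
    return {"Sn": sn, "Sp": sp, "F": f, "SMC": smc, "CC": cc, "ACP": acp, "AC": ac}


if __name__ == "__main__":
    real      = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
    predicted = [0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    counts = confusion_counts(real, predicted)  # (TP, TN, FP, FN) = (3, 5, 1, 1)
    # Sn = Sp = F = 0.75, SMC = 0.8, CC and AC both equal 7/12
    print(accuracy_metrics(*counts))
```

The same functions apply at the exon or gene level once TP, TN, FP, and FN are counted over those features instead of over nucleotides.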