129_Lecture5_2014



1/28/14

Original BLAST

G A T A A G T A A G G T C C A G T
T T C A A C T A A G G T C C T C A

An example: k = 4, T = 4
1) The matching word AGGT initiates an alignment
2) The alignment is extended to the left and right, with no gaps, until the alignment score falls below 50%
3) Output:
   AAGTAAGGTCC
   AACTAAGGTCC

Gapped BLAST

A C G A A G T A A G G T C C A G T
A G C G T T A G G T C C T A G T C

An example: k = 4, T = 4
1) The matching word GGTC initiates an alignment
2) The alignment is extended in a band around the anchor
3) Output:
   GTAAGGTCC-AGT
   GTTAGGTCCTAGT

BLAST Portal
BLAST: Input
BLAST Parameters
BLAST Results

Statistics of Protein Sequence Alignment

•  Statistics of global alignment: Unfortunately, not much is known! Statistics are based on Monte Carlo simulations (shuffle one sequence and recompute the alignment to get a distribution of scores).
•  Statistics of local alignment: Well understood for ungapped alignment. The same theory probably applies to gapped alignment.

What is a local alignment?

“Pair of equal length segments, one from each sequence, whose scores can not be improved by extension or trimming. These are called high-scoring pairs, or HSP”
http://www.people.virginia.edu/~wrp/cshl98/Altschul/Altschul-1.html

The E-value for a sequence alignment

HSP scores follow an extreme value distribution, characterized by two parameters, K and λ.
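Returning to the Original BLAST example (k = 4, seed word AGGT), the ungapped extension step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say exactly how the "falls below 50%" stopping rule is computed, so the sketch swaps in a simple X-drop rule (stop once the running score drops more than `x_drop` below the best score seen, then keep the best-scoring endpoint). The scoring values (+1 match, -1 mismatch), the `x_drop` default, and the function name `extend_ungapped` are all assumptions made for this example.

```python
MATCH, MISMATCH = 1, -1  # assumed scoring scheme, not taken from the slides


def extend_ungapped(a, b, i, j, k, x_drop=2):
    """Extend the exact word match a[i:i+k] == b[j:j+k] without gaps.

    Uses an X-drop stopping rule (an assumption standing in for the
    slide's "50%" rule). Returns (start_a, start_b, length).
    """

    def walk(pa, pb, step):
        # Walk one direction along the diagonal, remembering how many
        # positions gave the best running score.
        score = best = best_len = n = 0
        while 0 <= pa < len(a) and 0 <= pb < len(b):
            score += MATCH if a[pa] == b[pb] else MISMATCH
            n += 1
            if score > best:
                best, best_len = score, n
            elif best - score > x_drop:
                break  # score fell too far below the best: stop extending
            pa += step
            pb += step
        return best_len

    right = walk(i + k, j + k, +1)
    left = walk(i - 1, j - 1, -1)
    return i - left, j - left, left + k + right


seq_a = "GATAAGTAAGGTCCAGT"
seq_b = "TTCAACTAAGGTCCTCA"
word = "AGGT"

# Locate the seed word in both sequences (both at index 8 here).
i, j = seq_a.find(word), seq_b.find(word)
start_a, start_b, length = extend_ungapped(seq_a, seq_b, i, j, len(word))

print(seq_a[start_a:start_a + length])  # AAGTAAGGTCC
print(seq_b[start_b:start_b + length])  # AACTAAGGTCC
```

With these assumed settings the extension tolerates the single G/C mismatch to the left of the seed and stops at the runs of mismatches on either end, which reproduces the output shown in step 3 of the example.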
The expected number of HSPs with score at least S is given by:

$$E = K m n \, e^{-\lambda S}$$

m, n: sequence lengths
E: E-value

[Plot: horizontal axis S, from -10 to 10]

The Bit Score of a sequence alignment

Raw scores have little meaning without knowledge of the scoring scheme used for the alignment, or equivalently of the parameters K and λ. Scores can be normalized according to:

$$S' = \frac{\lambda S - \ln K}{\ln 2}$$

S' is the bit score of the alignment. The E-value can be expressed as:...
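The two formulas above are easy to try numerically. The sketch below computes the E-value from a raw score and the corresponding bit score. The values of K, λ, the sequence lengths, and the raw score are illustrative assumptions chosen for the example, not numbers from the lecture, and the function names are hypothetical. Substituting the bit-score definition into the E-value formula gives, by algebra, E = m n 2^(-S'); the code checks that identity numerically. Whether this is the exact form the truncated slide goes on to show cannot be confirmed from this preview.

```python
import math


def e_value(raw_score, m, n, K, lam):
    """Expected number of HSPs with score >= raw_score: E = K m n exp(-lam S)."""
    return K * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score, K, lam):
    """Normalized score: S' = (lam S - ln K) / ln 2."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def e_value_from_bits(bits, m, n):
    """E-value rewritten in terms of the bit score: E = m n 2^(-S')."""
    return m * n * 2.0 ** (-bits)


# Illustrative values only (assumptions for this example).
K, lam = 0.134, 0.318
m, n = 250, 1_000_000  # query length and database length
S = 60                 # raw alignment score

E = e_value(S, m, n, K, lam)
S_bits = bit_score(S, K, lam)

print(f"raw score {S}: bit score = {S_bits:.2f}, E-value = {E:.3g}")
print(math.isclose(E, e_value_from_bits(S_bits, m, n)))
```

The bit-score form needs only m, n, and S', which is why bit scores can be compared across searches that used different scoring schemes.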

This document was uploaded on 03/12/2014 for the course CSCI 129 at UC Davis.
