E = mn 2^(−S')

1/28/14

The P-value of a sequence alignment
The number of random HSPs with score greater than or equal to S follows a
Poisson distribution:
   P(X random HSPs with score ≥ S) = exp(−E) E^X / X!     (E: E-value)
Then:
   P(0 random HSPs with score ≥ S) = exp(−E)
   Pval = P(at least 1 random HSP with score ≥ S) = 1 − exp(−E)
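As a numerical check of the Poisson model above, the following minimal Python sketch evaluates the probability of exactly X random HSPs and the resulting P-value for a few E-values. The function names `poisson_prob` and `p_value` are illustrative choices, not part of BLAST or any library.

```python
import math

def poisson_prob(x: int, e_value: float) -> float:
    """P(exactly x random HSPs with score >= S) for a Poisson mean equal to E."""
    return math.exp(-e_value) * e_value**x / math.factorial(x)

def p_value(e_value: float) -> float:
    """P(at least one random HSP with score >= S) = 1 - exp(-E).

    expm1 computes exp(x) - 1 accurately for small x, so -expm1(-E)
    equals 1 - exp(-E) without losing precision when E is tiny.
    """
    return -math.expm1(-e_value)

# Compare the P-value with the E-value over several orders of magnitude.
for e in (10.0, 1.0, 0.01, 1e-5):
    print(f"E = {e:g}  ->  P = {p_value(e):.6g}")
```

For large E the P-value saturates near 1, while for small E it becomes nearly indistinguishable from E itself.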
Note: when E << 1, P ≈ E

The database E-value for a sequence alignment
Database search, where the database contains N_S sequences
corresponding to N_R residues:
1) All sequences are a priori equally likely to be related to the query:
   E_DB = N_S K m n exp(−λS)
2) Longer sequences are more likely to be related to the query:
   E_DB2 = K m N_R exp(−λS)
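The two database E-values can be compared with a short Python sketch. Everything numeric here is an assumption made for illustration: the values of K and λ are placeholders rather than parameters of a real scoring system, and m and n are read as the query length and the length of the matched database sequence, which is the usual convention but is not stated in this excerpt.

```python
import math

def e_db(n_seqs: int, m: int, n: int, K: float, lam: float, score: float) -> float:
    """E_DB = N_S * K * m * n * exp(-lambda * S): every sequence equally likely."""
    return n_seqs * K * m * n * math.exp(-lam * score)

def e_db2(n_residues: int, m: int, K: float, lam: float, score: float) -> float:
    """E_DB2 = K * m * N_R * exp(-lambda * S): weighted by database size in residues."""
    return K * m * n_residues * math.exp(-lam * score)

# Hypothetical values, chosen only to exercise the formulas.
K, lam = 0.1, 0.3            # placeholder statistical parameters
m, score = 250, 60.0         # assumed query length and raw score
n_seqs, n_residues = 1000, 300_000

for n in (100, 300, 900):    # assumed length of the matched database sequence
    print(n, e_db(n_seqs, m, n, K, lam, score), e_db2(n_residues, m, K, lam, score))
```

The two formulas give the same number when n equals the average database sequence length N_R / N_S (300 in this made-up example). E_DB is larger for longer hits and smaller for shorter ones, while E_DB2 does not depend on n.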
BLAST reports E_DB2.

Sequence Analysis
1. Why do we compare sequences?
2. Sequence comparison: from qualitative to quantitative methods
3. Deterministic methods: Dynamic programming
4. Heuristic methods: BLAST
5. Multiple Sequence Alignment
   1. Concept
   2. Dynamic programming
   3. Heuristics

Why multiple sequence alignment?

Seq1: AALGCLVKDYFPEPVTVSWNSG
Seq2: VSLTCLVKGFYPSDIAVEWWSNG
Seq3: VTISCTGSSSNIGAGNHVKWYQQLPG
Seq4: VTISCTGTSSNIGSITVNWYQQLPG
Seq5: LRLSCSSSGFIFSSYAMYWVRQAPG
Seq6: LSLTCTVSGTSFDDYYSTWVRQPPG
Seq7: PEVTCVVVDVSHEDPQVKFNWYVDG
Seq8: ATLVCLISDFYPGAVTVAWKADS

MSA: Dynamic programming?
Theoretically, it is possible to extend the dynamic programming
technique to N sequences.

• One of the most important properties of an algorithm is how
  its execution time increases as the problem is made larger.
  This is the computational complexity of the algorithm.
• There is a notation to describe the algorithmic complexity,
  called the big-O notation.
  If we have a problem of size n (i.e. number of input data points),
  then an algorithm takes O(n) time if the time increases
  linearly with n.
• It is important to realize that an algorithm that is quick on
  small problems may be totally useless on large problems if
  it has a bad O() behavior.
Standard description of algorithms, where n is the size of the
problem, and c is a constant:

Complexity   Type            Computing time for n = 1000 (1 operation = 1 s)
O(c)         Dream...        Seconds
O(log(n))    Really good     10 seconds
O(n)         Good            1000 seconds ≈ 17 mins
O(n^2)       Not so good     10^6 seconds ≈ 11.5 days
O(n^3)       Bad             10^9 seconds ≈ 31 years
O(c^n)       Catastrophic!   Millions of years!!
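The computing times in the table can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: one operation takes one second as in the table, the logarithm is taken in base 2, and the constant c in the exponential row is set to 2. The helper name `operation_counts` is made up for this example.

```python
import math

SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

def operation_counts(n: int, c: int = 2) -> dict:
    """Number of operations for each complexity class at problem size n.

    Assumes base-2 logarithm and c = 2 for the exponential row.
    """
    return {
        "O(log n)": math.log2(n),
        "O(n)": n,
        "O(n^2)": n**2,
        "O(n^3)": n**3,
        "O(c^n)": c**n,
    }

# With 1 operation = 1 second, the operation count is the time in seconds.
counts = operation_counts(1000)
for name, seconds in counts.items():
    print(
        f"{name:9s} {seconds:.3g} s"
        f" = {seconds / 60:.3g} min"
        f" = {seconds / SECONDS_PER_DAY:.3g} days"
        f" = {seconds / SECONDS_PER_YEAR:.3g} years"
    )
```

The polynomial rows stay within a human time scale, whereas the exponential row exceeds any practical time budget by hundreds of orders of magnitude.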
Computational complexity of dynamic programming:
• Two sequences of length M: O(M^2)
• Three sequences of length M: O(M^3)
• N sequences of length M: O(M^N)
=> Dynamic programming is not a reasonable option for
aligning multiple sequences!

MSA: Approximate methods
1. Progressive global alignment
   Start with the most similar sequences and build the alignment by
   adding the rest of the sequences
2. Iterative methods
   Start by making alignments of small groups of sequences and then
   revise the alignment for better results
3. Alignment based on small conserved domains
4. Alignment based on statistical or probabilistic models of the s...
This document was uploaded on 03/12/2014 for the course CSCI 129 at UC Davis (Winter '14, Patrice Koehl).