129_Lecture5_2014

$E = m n \, 2^{-S'}$

1/28/14

The P-value of a sequence alignment

The number of random HSPs with score greater than or equal to S follows a Poisson distribution with mean E (E: E-value):

$P(X \text{ random HSPs with score} \geq S) = \exp(-E) \, \frac{E^X}{X!}$

Then:

$P(0 \text{ random HSPs with score} \geq S) = \exp(-E)$

$P_{val} = P(\text{at least 1 random HSP with score} \geq S) = 1 - \exp(-E)$

Note: when $E \ll 1$, $P_{val} \approx E$.

The database E-value for a sequence alignment

Database search, where the database contains $N_S$ sequences corresponding to $N_R$ residues:

1) All sequences are a priori equally likely to be related to the query:
   $E_{DB} = N_S K m n \exp(-\lambda S)$
2) Longer sequences are more likely to be related to the query:
   $E_{DB2} = K m N_R \exp(-\lambda S)$

BLAST reports $E_{DB2}$.

Sequence Analysis

1. Why do we compare sequences?
2. Sequence comparison: from qualitative to quantitative methods
3. Deterministic methods: Dynamic programming
4. Heuristic methods: BLAST
5. Multiple Sequence Alignment
   1. Concept
   2. Dynamic programming
   3. Heuristics

Why multiple sequence alignment?

Seq1: AALGCLVKDYFPEP--VTVSWNSG---
Seq2: VSLTCLVKGFYPSD--IAVEWWSNG--

Why multiple sequence alignment?

Seq1: AALGCLVKDYFPEP--VTVSWNSG---
Seq2: VSLTCLVKGFYPSD--IAVEWWSNG--
Seq3: VTISCTGSSSNIGAG-NHVKWYQQLPG
Seq4: VTISCTGTSSNIGS--ITVNWYQQLPG
Seq5: LRLSCSSSGFIFSS--YAMYWVRQAPG
Seq6: LSLTCTVSGTSFDD--YYSTWVRQPPG
Seq7: PEVTCVVVDVSHEDPQVKFNWYVDG--
Seq8: ATLVCLISDFYPGA--VTVAWKADS--

MSA: Dynamic programming?

Theoretically, it is possible to extend the dynamic programming technique to N sequences.

- One of the most important properties of an algorithm is how its execution time increases as the problem gets larger. This is the computational complexity of the algorithm.
- Algorithmic complexity is described with big-O notation. If a problem has size n (i.e. the number of input data points), then an algorithm takes $O(n)$ time if its running time increases linearly with n.
- It is important to realize that an algorithm that is quick on small problems may be totally useless on large problems if it has bad O() behavior.

Standard description of algorithms, where n is the size of the problem and c is a constant:

Complexity     Type            Computing time for n = 1000 (1 operation = 1 s)
$O(c)$         Dream...        Seconds
$O(\log n)$    Really good     About 10 seconds (log base 2)
$O(n)$         Good            1000 seconds ≈ 17 minutes
$O(n^2)$       Not so good     $10^6$ seconds ≈ 11.6 days
$O(n^3)$       Bad             $10^9$ seconds ≈ 31.7 years
$O(c^n)$       Catastrophic!   Millions of years!!

Computational complexity of dynamic programming:

- Two sequences of length M: $O(M^2)$
- Three sequences of length M: $O(M^3)$
- N sequences of length M: $O(M^N)$

-> dynamic programming is not a reasonable option for aligning multiple sequences!

MSA: Approximate methods

1. Progressive global alignment
   Start with the most similar sequences and build the alignment by adding the rest of the sequences.
2. Iterative methods
   Start by aligning small groups of sequences, then revise the alignment for better results.
3. Alignment based on small conserved domains
4. Alignment based on statistical or probabilistic models of the s...
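The P-value and database E-value formulas from the alignment statistics slides can be checked numerically. The following is a minimal Python sketch, not BLAST's actual implementation: the function names are hypothetical, and the parameter values in the usage lines (K, lambda, S, and the lengths) are made up purely for illustration.

```python
import math


def hsp_count_probability(e_value: float, x: int) -> float:
    """Poisson probability of exactly x random HSPs with score >= S,
    where the mean of the distribution is the E-value."""
    return math.exp(-e_value) * e_value ** x / math.factorial(x)


def p_value(e_value: float) -> float:
    """P(at least one random HSP with score >= S) = 1 - exp(-E).
    expm1 keeps precision when E is very small."""
    return -math.expm1(-e_value)


def database_e_values(K, lam, S, m, n, N_S, N_R):
    """Return (E_DB, E_DB2) for a query of length m.

    E_DB  = N_S * K * m * n * exp(-lam * S)  (all sequences equally likely)
    E_DB2 = K * m * N_R * exp(-lam * S)      (longer sequences more likely)
    """
    tail = math.exp(-lam * S)
    return N_S * K * m * n * tail, K * m * N_R * tail


if __name__ == "__main__":
    # For small E, the P-value is close to E itself.
    for e in (1e-3, 0.1, 1.0, 5.0):
        print(f"E = {e:g}  P-value = {p_value(e):.6f}")

    # Illustrative (made-up) parameters: K, lam, S, m, n, N_S, N_R.
    e_db, e_db2 = database_e_values(0.1, 0.3, 50, 100, 200, 10, 2000)
    print(f"E_DB = {e_db:.3e}  E_DB2 = {e_db2:.3e}")
```

When every database sequence has the same length n, N_R = N_S * n and the two database E-values coincide, which is the case in the usage lines above. They differ only when sequence lengths vary.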
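The running times in the complexity table, and the growth of the dynamic programming table with the number of sequences, are simple arithmetic that can be reproduced directly. The sketch below assumes one operation per second, as the table does, takes the logarithm in base 2, and uses a 365.25-day year; the helper names are hypothetical.

```python
import math

SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY  # assumption: Julian year


def operation_counts(n: int) -> dict:
    """Number of operations for a problem of size n under each complexity class."""
    return {
        "O(log n)": math.log2(n),
        "O(n)": float(n),
        "O(n^2)": float(n ** 2),
        "O(n^3)": float(n ** 3),
    }


def dp_table_cells(M: int, N: int) -> int:
    """Order-of-magnitude size of the dynamic programming table
    for aligning N sequences of length M: M**N cells."""
    return M ** N


if __name__ == "__main__":
    counts = operation_counts(1000)
    print("O(n):   %.1f minutes" % (counts["O(n)"] / SECONDS_PER_MINUTE))
    print("O(n^2): %.1f days" % (counts["O(n^2)"] / SECONDS_PER_DAY))
    print("O(n^3): %.1f years" % (counts["O(n^3)"] / SECONDS_PER_YEAR))

    # Table size for 2, 3, and 8 sequences of length 100.
    for n_seqs in (2, 3, 8):
        print(n_seqs, "sequences:", dp_table_cells(100, n_seqs), "cells")
```

Even for short sequences of length 100, going from two sequences to eight takes the table from 10^4 cells to 10^16, which is why the slides turn to approximate methods.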

This document was uploaded on 03/12/2014 for the course CSCI 129 at UC Davis.
