A GGC AATGC AGGC

Statistical Significance of alignment: Shuffling

Score: 355

Shuffling a sequence:
THISISTHECORRECTSEQUENCE
TSTCRTQNHIHOESUCISERCEEE

1/28/14

Gap penalty

Most common model: W_N = G_0 + N * G_1
  W_N : gap penalty for a gap of size N
  G_0 : cost of opening a gap
  G_1 : cost of extending the gap by one
  N   : size of the gap

Global versus Local Alignment

Global alignment finds the arrangement that maximizes the total score. Best known algorithm: Needleman and Wunsch.

Local alignment identifies the highest-scoring subsequences, sometimes at the expense of the overall score. Best known algorithm: Smith and Waterman.

Local alignment algorithm is just a variation of the global alignment algorithm!

Modifications for local alignment

1) The scoring matrix has negative values for mismatches
2) The minimum score for any (i,j) in the alignment matrix is 0
3) The best score is found anywhere in the filled alignment matrix

These 3 modifications cause the algorithm to search for matching sub-sequences which are not penalized by other regions (modif. 2), with minimal poor matches (modif. 1), which can occur anywhere (modif. 3).

Global versus Local Alignment

Match: +1; Mismatch: -2; Gap: -1

Global:

       A   C   C   N   S
   A   1  -3  -3  -3  -3
   C  -3   2   1  -2  -2
   C  -3   1   3  -1  -1
   T  -3  -2  -1   1   0
   G  -3  -2  -1   0  -1
   S  -3  -2  -1   0   1

   ACCTGS     ACCTGS
   ACC-NS     ACCN-S

Local:

       A   C   C   T   G   S
   A   1   0   0   0   0   0
   C   0   2   1   0   0   0
   C   0   1   3   0   0   0
   N   0   0   0   1   0   0
   S   0   0   0   0   0   1

   ACC
   ACC

Sequence Analysis

1. Why do we compare sequences?
2. Sequence comparison: from qualitative to quantitative methods
3. Deterministic methods: Dynamic programming
4. Heuristics: BLAST
   1. Concept
   2. Ungapped BLAST
   3. Gapped BLAST
5. Multiple Sequence Alignment

BLAST (Basic Local Alignment Search Tool)

Main ideas:
1. Construct a list of all words in the query sequence
2. Scan database for sequences that contain one or more of the query words
3. Initiate a local alignment for each word match between query and database

[Figure: Database; Query sequence]

Original BLAST

1. Define dictionary: all words of length k (typically k = 11)
2. Scan database sequences for matches with alignment score ≥ T (typically T = k)
3. Generate ungapped alignment extensions until the score falls below the statistical threshold
4. Output all local alignments with scores above the statistical threshold

[Figure: Database sequence; query]

Original BLAST

An example: k = 4, T = 4

G A T A A G T A A G G T C C A G T
T T C A A C T A A G G T C C T C A

1) The matching word AGGT initiates an alignment
2) Extension of the alignment to the left and right with...
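
The word-matching and ungapped extension steps of the original BLAST can be sketched roughly as follows, using the two sequences from the k = 4 example. This is only an illustration under stated assumptions, not the actual BLAST implementation: the function names, the exact-match seeding, the +1/-2 scores and the max_drop stopping rule are all hypothetical choices, since the slides stop scoring "below the statistical threshold" without giving the rule.

```python
from collections import defaultdict


def build_word_index(query, k):
    """Map every length-k word of the query to its start positions."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    return index


def find_seeds(query, subject, k):
    """Return (query_pos, subject_pos) pairs where a length-k word matches exactly."""
    index = build_word_index(query, k)
    seeds = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            seeds.append((i, j))
    return seeds


def extend_ungapped(query, subject, i, j, k, match=1, mismatch=-2, max_drop=2):
    """Extend the seed at (i, j) to the left and right without gaps.

    Assumed stopping rule: a direction is abandoned once the running score
    falls more than max_drop below the best score seen in that direction.
    Returns (query_start, subject_start, length, score).
    """
    def walk(qi, sj, step):
        best = running = 0
        best_len = length = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            running += match if query[qi] == subject[sj] else mismatch
            length += 1
            if running > best:
                best, best_len = running, length
            elif best - running > max_drop:
                break
            qi += step
            sj += step
        return best, best_len

    right_score, right_len = walk(i + k, j + k, 1)
    left_score, left_len = walk(i - 1, j - 1, -1)
    score = k * match + left_score + right_score
    return i - left_len, j - left_len, left_len + k + right_len, score


# The two sequences from the example; which one is the query is an assumption.
seq_a = "GATAAGTAAGGTCCAGT"
seq_b = "TTCAACTAAGGTCCTCA"

seeds = find_seeds(seq_a, seq_b, k=4)
print(seeds)  # includes (8, 8), the AGGT word match

q_start, s_start, length, score = extend_ungapped(seq_a, seq_b, 8, 8, k=4)
print(seq_a[q_start:q_start + length], score)
```

With these assumed parameters, the AGGT seed grows into the ungapped segment TAAGGTCC. A different drop-off value or scoring scheme would end the extension elsewhere.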

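
To make the link between global and local alignment concrete, here is a minimal sketch of the dynamic programming recurrence with a switch for the local-alignment modifications listed earlier. It assumes the common formulation in which each gap position costs the same amount (a gap of size N costs N * gap, i.e. the gap model with G_0 = 0), so intermediate cells need not match the matrices on the slides. The function name and its parameters are illustrative, not taken from the lecture.

```python
def alignment_score(a, b, match=1, mismatch=-2, gap=-1, local=False):
    """Fill the dynamic programming matrix and return the best score.

    local=False: global alignment (Needleman-Wunsch style).
    local=True:  local alignment (Smith-Waterman style).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment charges for gaps at the start of either sequence.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            # Modification 1: mismatches carry a negative score.
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            if local:
                cell = max(cell, 0)      # modification 2: no cell below 0
                best = max(best, cell)   # modification 3: best score anywhere
            score[i][j] = cell

    # Global: score of aligning both sequences end to end.
    # Local: highest-scoring cell found anywhere in the matrix.
    return best if local else score[-1][-1]


print(alignment_score("ACCTGS", "ACCNS"))              # global score
print(alignment_score("ACCTGS", "ACCNS", local=True))  # local score
```

For the slide example (match +1, mismatch -2, gap -1) this gives a global score of 1 and a local score of 3, the latter corresponding to the ACC/ACC match.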