08-cs481-global_alignment-local_alignment.pdf - MORE ON...

MORE ON PAIRWISE ALIGNMENT
From LCS to Alignment: Change up the Scoring
The Longest Common Subsequence (LCS) problem, the simplest form of sequence alignment, allows only insertions and deletions (no mismatches).
In the LCS problem, we scored 1 for matches and 0 for indels.
Consider penalizing indels and mismatches with negative scores.
Simplest scoring schema:
+1 : match premium
-μ : mismatch penalty
-σ : indel penalty
Simple Scoring
When mismatches are penalized by -μ, indels are penalized by -σ, and matches are rewarded with +1, the resulting score is:
#matches - μ(#mismatches) - σ(#indels)
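The scoring rule above can be sketched in a few lines of Python. This is only an illustration: the function name `simple_score`, the use of "-" as the gap character, and the default values of `mu` and `sigma` are assumptions, not part of the slides.

```python
def simple_score(row_v, row_w, mu=1, sigma=1):
    """Score a gapped alignment given as two equal-length rows.

    Assumption: '-' marks a gap; mu and sigma are the (positive)
    mismatch and indel penalties from the simple scoring schema.
    """
    if len(row_v) != len(row_w):
        raise ValueError("alignment rows must have equal length")
    matches = mismatches = indels = 0
    for a, b in zip(row_v, row_w):
        if a == "-" or b == "-":
            indels += 1          # a column containing a gap is an indel
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    # #matches - mu * #mismatches - sigma * #indels
    return matches - mu * mismatches - sigma * indels
```

For example, the rows "AT-GC" and "ATTGA" contain 3 matches, 1 mismatch and 1 indel, so with mu = 1 and sigma = 2 the score is 3 - 1 - 2 = 0.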
The Global Alignment Problem
Find the best alignment between two strings under a given scoring schema.
Input: Strings v and w and a scoring schema
Output: Alignment of maximum score
Edit graph edge weights: ↑ and → edges = -σ; diagonal edges = +1 if match (or +score), -μ if mismatch.
Initialize:
$$s_{0,0} = 0, \qquad s_{i,0} = -i\sigma, \qquad s_{0,j} = -j\sigma$$
Recurrence:
$$s_{i,j} = \max \begin{cases} s_{i-1,j-1} + 1 & \text{if } v_i = w_j \\ s_{i-1,j-1} - \mu & \text{if } v_i \neq w_j \\ s_{i-1,j} - \sigma \\ s_{i,j-1} - \sigma \end{cases}$$
μ : mismatch penalty
σ : indel penalty
This is the Needleman-Wunsch algorithm.
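A minimal sketch of the recurrence, assuming the simple scoring schema (+1 match, -μ mismatch, -σ indel). It fills the dynamic programming table and returns only the optimal score; recovering the alignment itself would need a traceback, which is omitted here. The function name and default penalties are illustrative assumptions.

```python
def global_alignment_score(v, w, mu=1, sigma=1):
    """Fill the global alignment table and return s[n][m].

    Assumption: mu and sigma are positive penalties, so they are
    subtracted from the score.
    """
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]

    # Boundary: aligning a prefix against the empty string costs one
    # indel per character.
    for i in range(1, n + 1):
        s[i][0] = -i * sigma
    for j in range(1, m + 1):
        s[0][j] = -j * sigma

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Strings are 0-indexed in Python, so v_i is v[i - 1].
            diagonal = s[i - 1][j - 1] + (1 if v[i - 1] == w[j - 1] else -mu)
            s[i][j] = max(diagonal,
                          s[i - 1][j] - sigma,   # gap in w
                          s[i][j - 1] - sigma)   # gap in v
    return s[n][m]
```

With mu = sigma = 1, aligning "AT" against "AAT" gives a score of 1: two matches and one indel. The table uses O(nm) time and space.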
Percent Sequence Identity
The extent to which two nucleotide or amino acid sequences are invariant.
[Figure: example alignment built from the letters A C C T G A G A G A C G T G G C A G, with mismatch and indel columns labeled]
Alignment length = 10
Matches = 7
70% identical
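Percent identity is the number of matching columns divided by the alignment length. The sketch below assumes a gapped alignment given as two equal-length rows with "-" as the gap character. The function name is illustrative, and the gapped rows in the usage note are a hypothetical arrangement chosen to reproduce the counts above, not necessarily the alignment shown on the slide.

```python
def percent_identity(row_v, row_w):
    """Return 100 * (#matching columns) / (alignment length).

    Assumption: '-' is the gap character; a column with a gap never
    counts as a match.
    """
    if len(row_v) != len(row_w) or not row_v:
        raise ValueError("rows must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(row_v, row_w) if a == b and a != "-")
    return 100.0 * matches / len(row_v)
```

For the hypothetical rows "ACCTGAG-AG" and "ACGTG-GCAG", there are 7 matches in 10 columns, so the function returns 70.0.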
Scoring Matrices
To generalize scoring, consider a (4+1) x (4+1) scoring matrix δ. For an amino acid sequence alignment, the scoring matrix would be of size (20+1) x (20+1). The additional row and column hold the scores for comparison with the gap character "-". This simplifies the recurrence as follows:
$$s_{i,j} = \max \begin{cases} s_{i-1,j-1} + \delta(v_i, w_j) \\ s_{i-1,j} + \delta(v_i, -) \\ s_{i,j-1} + \delta(-, w_j) \end{cases}$$
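The generalized recurrence can be sketched by passing δ in as a function. The names `global_score_with_delta` and `simple_delta` are assumptions made for illustration. The boundary initialization (accumulating gap scores along the first row and column) is also an assumption, since the slide gives only the recurrence.

```python
def global_score_with_delta(v, w, delta):
    """Global alignment score under an arbitrary scoring function.

    delta(a, b) returns the score of aligning symbols a and b, where
    either symbol may be the gap character '-'.
    """
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]

    # Assumed boundary: prefixes aligned entirely against gaps.
    for i in range(1, n + 1):
        s[i][0] = s[i - 1][0] + delta(v[i - 1], "-")
    for j in range(1, m + 1):
        s[0][j] = s[0][j - 1] + delta("-", w[j - 1])

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(
                s[i - 1][j - 1] + delta(v[i - 1], w[j - 1]),
                s[i - 1][j] + delta(v[i - 1], "-"),
                s[i][j - 1] + delta("-", w[j - 1]),
            )
    return s[n][m]


def simple_delta(a, b, mu=1, sigma=1):
    """The simple schema (+1, -mu, -sigma) expressed as a delta function."""
    if a == "-" or b == "-":
        return -sigma
    return 1 if a == b else -mu
```

Passing `simple_delta` reproduces the simple scoring schema, so the same table-filling code serves both cases; only the scoring function changes.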
Making a Scoring Matrix
Scoring matrices are created based on biological evidence.
Alignments can be thought of as two sequences that differ due to mutations.
Some of these mutations have little effect on the protein's function, so some penalties, δ(v_i, w_j), will be less harsh than others.
Scoring Matrix: Example
     A    R    N    K
A    5   -2   -1   -1
R    -    7   -1    3
N    -    -    7    0
K    -    -    -    6
Notice that although R and K are different amino acids, they have a positive score.
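One way to use a table like this in code is to store only the listed upper-triangle entries and look up pairs in either order. The sketch assumes the matrix is symmetric, which the "-" placeholders in the lower triangle suggest but the slide does not state. The names `UPPER` and `delta` are illustrative, and gap scores are left out because the example does not give them.

```python
# Upper-triangle entries copied from the example table.
UPPER = {
    ("A", "A"): 5, ("A", "R"): -2, ("A", "N"): -1, ("A", "K"): -1,
    ("R", "R"): 7, ("R", "N"): -1, ("R", "K"): 3,
    ("N", "N"): 7, ("N", "K"): 0,
    ("K", "K"): 6,
}


def delta(a, b):
    """Look up the score for (a, b), assuming delta(a, b) == delta(b, a)."""
    if (a, b) in UPPER:
        return UPPER[(a, b)]
    return UPPER[(b, a)]  # raises KeyError for symbols not in the table
```

Here `delta("R", "K")` and `delta("K", "R")` both return 3, the positive score noted above for two different amino acids.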