# MSA - Computational Molecular Biology Multiple Sequence...


Computational Molecular Biology Multiple Sequence Alignment

My T. Thai (mythai@cise.ufl.edu)

## Sequence Alignment

Problem definition:

- Given: two DNA or protein sequences.
- Find: the best match between them.

What is an alignment?

- Given: two strings S and S′.
- Goal: make S and S′ the same length by inserting spaces (written `--`, sometimes denoted ∆) into these strings.

Example:

    A   --  T   C   --  A
    --  C   T   C   A   A
## Matches, Mismatches and Indels

- Match: two aligned, identical characters in an alignment.
- Mismatch: two aligned, unequal characters.
- Indel: a character aligned with a space.

Example:

    A   A   C   T   A   C   T   --  C   C   T   A   A   C   A   C   T   --  --
    --  --  C   T   C   C   T   A   C   C   T   --  --  T   A   C   T   T   T

This alignment has 10 matches, 2 mismatches, and 7 indels.
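The three column types can be counted mechanically. The sketch below is illustrative only: the function name `count_columns` is hypothetical, and it assumes each aligned row is written as a string with a single `-` standing for a space.

```python
def count_columns(row1, row2):
    """Classify every column of a pairwise alignment.

    Assumption: '-' stands for a space, and both rows already have
    the same length (that is, they form a valid alignment).
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    matches = mismatches = indels = 0
    for a, b in zip(row1, row2):
        if a == "-" and b == "-":
            raise ValueError("a column cannot consist of two spaces")
        if a == "-" or b == "-":
            indels += 1          # character aligned with a space
        elif a == b:
            matches += 1         # two identical characters
        else:
            mismatches += 1      # two unequal characters
    return matches, mismatches, indels


# The example alignment from the slide, with '--' written as '-'.
row1 = "AACTACT-CCTAACACT--"
row2 = "--CTCCTACCT--TACTTT"
print(count_columns(row1, row2))  # (10, 2, 7)
```

The printed triple agrees with the counts stated on the slide.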

## Basic Algorithmic Problem

Find the alignment of the two strings that either

- maximizes m, where m = (# matches) − (# mismatches) − (# indels), or
- minimizes m, where m is the SP-score of the alignment.

The optimal value of m defines the similarity of the two strings; this problem is also called optimal global alignment.

Biologically, a mismatch represents a mutation, whereas an indel represents a historical insertion or deletion of a single character.
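One standard way to maximize m for two strings is dynamic programming in the style of Needleman and Wunsch. The slides in this preview do not name an algorithm, so the following is only a minimal sketch under that assumption; the function name and the default weights (+1 per match, −1 per mismatch, −1 per indel, mirroring the definition of m) are choices made for illustration.

```python
def global_alignment_score(s, t, match=1, mismatch=-1, indel=-1):
    """Return the maximum of m = matches - mismatches - indels
    over all global alignments of s and t (default weights).

    Sketch of a Needleman-Wunsch-style recurrence, keeping only
    two rows of the table to save space.
    """
    # Row 0: the empty prefix of s aligned against prefixes of t (all indels).
    prev = [j * indel for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * indel]  # prefix of s aligned against the empty prefix of t
        for j in range(1, len(t) + 1):
            diagonal = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            up = prev[j] + indel        # s[i-1] aligned with a space
            left = curr[j - 1] + indel  # t[j-1] aligned with a space
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("ATCA", "CTCAA"))
```

For the strings ATCA and CTCAA the best achievable value is 1 (three matches, one mismatch, one indel), which is higher than the value 0 of the alignment with three matches and three indels.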
## Multiple Sequence Alignment

- Problem definition: the same as the sequence alignment problem, except that the input has more than two strings.
- Challenges: the problem is NP-hard and MAX-SNP-hard.
- Approximation guarantee: a factor of 2 − 2/k, where k is the number of input sequences.
- Further work aims to reduce the time and space complexity.

## Sum of Pairs Score (SP-Score)

- Let Σ be a finite alphabet and let $\Gamma = \Sigma \cup \{\Delta\}$, where ∆ denotes a space.
- Consider k sequences over Σ that we want to align. After alignment, each sequence has length l.
- A score d is assigned to each pair of letters.
## SP-Score

The SP-score of an alignment A is defined as follows:

- Consider a matrix with l columns and k rows, where the rows represent the sequences and the columns represent the letters.
- The SP-score is the sum of the scores of all columns.
- The score of each column is the sum of the scores of all distinct unordered pairs of letters in that column.
- Equivalently, the SP-score can be viewed as the sum of the pairwise sequence alignment values.

Goal: find an (optimal) alignment that minimizes the SP-score.
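The two views of the SP-score (sum over columns, or sum over pairs of rows) can be compared directly in code. This is a sketch under stated assumptions: the concrete score d is not preserved in this preview, so `unit_cost` (0 for identical symbols, 1 otherwise, with a space treated like any other symbol) is a stand-in, and the three-row alignment is a made-up example.

```python
from itertools import combinations


def unit_cost(x, y):
    """Assumed example score d: 0 if the symbols are equal, 1 otherwise."""
    return 0 if x == y else 1


def sp_score(rows, d=unit_cost):
    """Sum, over all columns, of d over every unordered pair in the column."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*rows):
        total += sum(d(x, y) for x, y in combinations(column, 2))
    return total


def sp_score_pairwise(rows, d=unit_cost):
    """Same quantity, computed as the sum of induced pairwise alignment costs."""
    return sum(
        sum(d(x, y) for x, y in zip(r1, r2))
        for r1, r2 in combinations(rows, 2)
    )


# Hypothetical alignment of k = 3 sequences, each of length l = 6.
alignment = ["A-TC-A", "-CTCAA", "ACTC-A"]
print(sp_score(alignment), sp_score_pairwise(alignment))
```

Both functions return the same number because each one adds up d over exactly the same set of (column, row pair) combinations, only in a different order.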

## Proving That MSA with a Metric SP-Score Is NP-hard

## Some Notations

## Some Basic Properties

Lemma 1: Let s₁, s₂ be two sequences over Σ such that l₁ = |s₁|, l₂ = |s₂|, l₂ ≥ l₁, and there are m symbols of s₁ that are not in s₂. Then every alignment of the set {s₁, s₂} has at least m + l₂ − l₁ mismatches.
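The bound in Lemma 1 can be sanity-checked on small inputs. The sketch below rests on two interpretive assumptions that the preview does not confirm: "mismatch" is read as any column whose two entries differ (including a letter aligned with a space), and "symbols of s₁ that are not in s₂" is read as positions of s₁ whose letter does not occur anywhere in s₂. Under that reading, the minimum number of mismatching columns is the unit-cost edit distance, computed here by a standard dynamic program. Both function names are hypothetical.

```python
def lemma1_bound(s1, s2):
    """Lower bound m + l2 - l1 from Lemma 1 (assumed reading of m)."""
    m = sum(1 for ch in s1 if ch not in s2)
    return m + len(s2) - len(s1)


def min_mismatching_columns(s1, s2):
    """Fewest columns with differing entries over all alignments.

    This equals the unit-cost edit distance between s1 and s2.
    """
    prev = list(range(len(s2) + 1))
    for i, a in enumerate(s1, 1):
        curr = [i]
        for j, b in enumerate(s2, 1):
            curr.append(min(
                prev[j - 1] + (a != b),  # a aligned with b
                prev[j] + 1,             # a aligned with a space
                curr[j - 1] + 1,         # b aligned with a space
            ))
        prev = curr
    return prev[-1]


# Small hypothetical test pairs with len(s2) >= len(s1).
for s1, s2 in [("AXB", "ABCD"), ("AC", "ACGT"), ("XY", "ACGT")]:
    print(s1, s2, lemma1_bound(s1, s2), min_mismatching_columns(s1, s2))
```

In each pair the bound does not exceed the true minimum, as the lemma predicts; the first pair shows that the bound need not be tight.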

## The Construction

Reduce vertex cover (also called node cover) to MSA.

Vertex cover:

## This note was uploaded on 05/20/2011 for the course CAP 5515 taught by Professor Ungor during the Spring '08 term at University of Florida.
