MSA - Computational Molecular Biology Multiple Sequence...

Computational Molecular Biology Multiple Sequence Alignment

My T. Thai mythai@cise.ufl.edu

Sequence Alignment
Problem definition:
- Given: 2 DNA or protein sequences
- Find: the best match between them
What is an alignment:
- Given: 2 strings S and S’
- Goal: make the lengths of S and S’ the same by inserting spaces (--; sometimes denoted as ∆) into these strings

A  -- T  C  -- A
-- C  T  C  A  A

Matches, Mismatches and Indels
- Match: two aligned, identical characters in an alignment
- Mismatch: two aligned, unequal characters
- Indel: a character aligned with a space

A  A  C  T  A  C  T  -- C  C  T  A  A  C  A  C  T  -- --
-- -- C  T  C  C  T  A  C  C  T  -- -- T  A  C  T  T  T

10 matches, 2 mismatches, 7 indels
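The match, mismatch and indel counts for the example alignment can be checked mechanically. The sketch below is only an illustration: the function name `classify_columns`, the single-character gap symbol `-` (standing in for the slide's `--`), and the rule that a column may not pair two spaces are assumptions, not part of the slides.

```python
GAP = "-"  # assumption: one character stands for the slide's "--" space symbol


def classify_columns(row_a, row_b):
    """Return (matches, mismatches, indels) for two aligned rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = indels = 0
    for a, b in zip(row_a, row_b):
        if a == GAP and b == GAP:
            # Assumption: a pairwise alignment never aligns a space with a space.
            raise ValueError("a column may not pair two spaces")
        if a == GAP or b == GAP:
            indels += 1        # a character aligned with a space
        elif a == b:
            matches += 1       # two aligned, identical characters
        else:
            mismatches += 1    # two aligned, unequal characters
    return matches, mismatches, indels


# The two rows of the 19-column example alignment.
ROW_A = "AACTACT-CCTAACACT--"
ROW_B = "--CTCCTACCT--TACTTT"

if __name__ == "__main__":
    print(classify_columns(ROW_A, ROW_B))
```

Run on the two example rows, this reproduces the counts stated on the slide (10 matches, 2 mismatches, 7 indels).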

Basic Algorithmic Problem
Find the alignment of the two strings that either:
- maximizes m, where m = (# matches - # mismatches - # indels), or
- minimizes m, where m is the SP-score of the alignment
In both cases m defines the similarity of the two strings; the problem is also called Optimal Global Alignment.
Biologically: a mismatch represents a mutation, whereas an indel represents a historical insertion or deletion of a single character.
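One way to compute the maximum of m = (# matches - # mismatches - # indels) over all alignments of two strings is the standard dynamic program for global alignment (Needleman-Wunsch style). The slide does not name an algorithm, so this is a sketch of one common technique, and the function name `best_global_score` is hypothetical.

```python
def best_global_score(s, t):
    """Maximum of (#matches - #mismatches - #indels) over all global alignments.

    prev[j] holds the best score for aligning the processed prefix of s
    with the first j characters of t.
    """
    # Aligning the empty prefix of s with t[:j] costs j indels.
    prev = [-j for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        # Aligning s[:i] with the empty prefix of t costs i indels.
        curr = [-i] + [0] * len(t)
        for j in range(1, len(t) + 1):
            # Last column pairs s[i-1] with t[j-1]: +1 for a match, -1 for a mismatch.
            diagonal = prev[j - 1] + (1 if s[i - 1] == t[j - 1] else -1)
            # Last column pairs s[i-1] with a space, or a space with t[j-1].
            gap_in_t = prev[j] - 1
            gap_in_s = curr[j - 1] - 1
            curr[j] = max(diagonal, gap_in_t, gap_in_s)
        prev = curr
    return prev[len(t)]


if __name__ == "__main__":
    print(best_global_score("ATCA", "CTCAA"))
```

The table has (|s| + 1) x (|t| + 1) cells and each cell is filled in constant time, so the running time is proportional to the product of the two lengths. Only two rows are kept in memory because the sketch returns the score, not the alignment itself.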
Multiple Sequence Alignment
Problem definition: similar to the sequence alignment problem, but the input has more than 2 strings.
Challenges:
- NP-hard and MAX-SNP-hard
- Approximation guarantee (factor): 2 - 2/k, where k is the number of input sequences
- More work is needed to reduce the time and space complexity

Sum of Pairs Score (SP-Score)
- Given a finite alphabet Σ and Γ = Σ ∪ {∆}, where ∆ denotes a space.
- Consider k sequences over Σ that we want to align. After an alignment, each sequence has length l.
- A score d is assigned to each pair of letters in Γ.

SP-Score
The SP-Score of an alignment A is defined as follows:
- Consider a matrix of l columns and k rows, where the rows represent the sequences and the columns represent the letters.
- The SP-Score is the sum of the scores of all columns.
- The score of each column is the sum of the scores of all distinct unordered pairs of letters in the column.
- Equivalently, the SP-Score can be viewed as the sum of the pairwise sequence alignment values.
Goal: find an (optimal) alignment that minimizes the SP-Score value.
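The verbal definition can be read as SP(A) = sum over columns c = 1..l of the sum over row pairs i < j of d(A[i][c], A[j][c]). The sketch below computes this both column by column and pair of rows by pair of rows, to illustrate that the two views agree. The score function `unit_cost` (0 for identical symbols, 1 otherwise, including a letter against a space) is only a hypothetical choice of d, and all function names are illustrative.

```python
from itertools import combinations

GAP = "-"  # stands for the space symbol ∆


def unit_cost(x, y):
    """Hypothetical score d: 0 if the two symbols are identical, 1 otherwise."""
    return 0 if x == y else 1


def sp_score(rows, d=unit_cost):
    """Sum, over all columns, of d over all unordered pairs of symbols in the column."""
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all aligned rows must have the same length l")
    total = 0
    for column in zip(*rows):
        total += sum(d(x, y) for x, y in combinations(column, 2))
    return total


def sp_score_by_pairs(rows, d=unit_cost):
    """Same value, computed as the sum of the induced pairwise alignment scores."""
    total = 0
    for row_i, row_j in combinations(rows, 2):
        total += sum(d(x, y) for x, y in zip(row_i, row_j))
    return total


if __name__ == "__main__":
    alignment = ["AC-T", "A-GT", "ACGT"]  # k = 3 rows, l = 4 columns
    print(sp_score(alignment), sp_score_by_pairs(alignment))
```

Both functions visit every (column, row pair) combination exactly once, just in a different order, which is why the column view and the pairwise view give the same value.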

Proving that MSA with an SP-Score that is a Metric is NP-hard

Some Notations

Some Basic Properties
Lemma 1: Let s1, s2 be two sequences over Σ such that l1 = |s1|, l2 = |s2|, l2 ≥ l1, and there are m symbols of s1 that are not in s2. Then every alignment of the set {s1, s2} has at least m + l2 - l1 mismatches.
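Lemma 1 can be sanity-checked numerically under two assumptions that are stated here rather than taken from the slide: the length condition is read as l2 ≥ l1, and "mismatches" counts every non-match column, indels included. Under that reading, the fewest non-match columns over all alignments is the Levenshtein (edit) distance, so the lemma says the edit distance is at least m + l2 - l1. The function names are hypothetical.

```python
def min_nonmatch_columns(s1, s2):
    """Fewest non-match columns (mismatches plus indels) over all alignments.

    With unit cost for mismatches and indels this is the Levenshtein distance.
    """
    prev = list(range(len(s2) + 1))
    for i, a in enumerate(s1, start=1):
        curr = [i]
        for j, b in enumerate(s2, start=1):
            curr.append(min(prev[j - 1] + (a != b),  # pair a with b
                            prev[j] + 1,             # pair a with a space
                            curr[j - 1] + 1))        # pair a space with b
        prev = curr
    return prev[-1]


def lemma1_bound(s1, s2):
    """m + l2 - l1, where m counts the symbols of s1 that do not occur in s2."""
    if len(s2) < len(s1):
        raise ValueError("this sketch assumes l2 >= l1")
    m = sum(1 for ch in s1 if ch not in s2)
    return m + len(s2) - len(s1)


if __name__ == "__main__":
    s1, s2 = "AXB", "ABCA"
    print(lemma1_bound(s1, s2), min_nonmatch_columns(s1, s2))
```

The reasoning behind the bound, under the same assumptions: an alignment has at least l2 columns, and at most l1 - m of them can be matches, because a symbol of s1 that does not occur in s2 can never be matched. That leaves at least l2 - (l1 - m) = m + l2 - l1 non-match columns.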

The construction
Reduce the vertex cover (or node cover) problem to MSA.
Vertex cover:

This note was uploaded on 05/20/2011 for the course CAP 5515 taught by Professor Ungor during the Spring '08 term at University of Florida.
