Lect5 Scoring matrices

An Introduction to Bioinformatics Algorithms (Computational Molecular Biology)

Fa05 CSE 182
CSE182-L5: Scoring matrices, Dictionary Matching

Scoring DNA
DNA has structure.
DNA scoring matrices
• So far, we have considered a simple match/mismatch criterion.
• The nucleotides can be grouped into purines (A, G) and pyrimidines (C, T).
• Nucleotide substitutions within a group (transitions) are more likely than those across groups (transversions).
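Here is a minimal sketch of a DNA scoring function that separates matches, transitions, and transversions. The numeric scores (2, -1, -3) are hypothetical placeholders because the slide gives no values. They are chosen only so that a transition costs less than a transversion.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Hypothetical scores: the slide only says transitions are more likely than transversions.
MATCH, TRANSITION, TRANSVERSION = 2, -1, -3


def dna_score(x: str, y: str) -> int:
    """Score for aligning nucleotide x against nucleotide y."""
    x, y = x.upper(), y.upper()
    valid = PURINES | PYRIMIDINES
    if x not in valid or y not in valid:
        raise ValueError(f"unexpected nucleotide: {x!r} or {y!r}")
    if x == y:
        return MATCH
    # A transition stays within purines or within pyrimidines.
    same_group = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return TRANSITION if same_group else TRANSVERSION


# Print the full 4x4 matrix, rows and columns in the order A, C, G, T.
for a in "ACGT":
    print(a, " ".join(f"{dna_score(a, b):3d}" for b in "ACGT"))
```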

Scoring proteins
• Scoring protein sequence alignments is a much more complex task than scoring DNA: not all substitutions are equal.
• The problem was first worked on by Pauling and collaborators.
• In the 1970s, Margaret Dayhoff created the first similarity matrices.
• "One size does not fit all":
  – Homologous proteins that are evolutionarily close should be scored differently from proteins that are evolutionarily distant.
  – Different proteins may evolve at different rates, and we need to normalize for that.
PAM 1 distance
Two sequences are 1 PAM apart if they differ in 1% of the residues.
• $\mathrm{PAM}_1(a,b) = \Pr[\text{residue } b \text{ substitutes residue } a \mid \text{the sequences are 1 PAM apart}]$
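The percent difference in this definition can be computed directly from a gap-free alignment. This is a minimal sketch that assumes the two sequences are already aligned and have equal length. The function name is illustrative.

```python
def percent_difference(s: str, t: str) -> float:
    """Percentage of columns at which two gap-free, equal-length aligned sequences differ."""
    if len(s) != len(t) or not s:
        raise ValueError("expected two non-empty aligned sequences of equal length")
    mismatches = sum(1 for a, b in zip(s, t) if a != b)
    return 100.0 * mismatches / len(s)


# Hypothetical 100-residue sequences that differ at one position.
print(percent_difference("A" * 99 + "C", "A" * 100))
```

Two 100-residue sequences that differ at a single position are 1% different, which makes them about 1 PAM apart. The observed difference tracks the PAM distance only while it is small, because repeated substitutions at the same site are invisible in the alignment.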

PAM1 matrix
Align many proteins that are very similar. Is this a problem?
The PAM1 matrix gives the probability of a substitution when 1% of the residues have changed.
• Estimate the frequency $P(b \mid a)$ of residue $a$ being substituted by residue $b$.
• $S(a,b) = \log_{10}\frac{P_{ab}}{P_a P_b} = \log_{10}\frac{P(b \mid a)}{P_b}$
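The following is a simplified, count-based sketch of the estimate above. It assumes the input is a list of gap-free alignments of closely related sequences. It is not Dayhoff's full procedure, which among other steps scales the probabilities so that 1% of residues change. It only shows how $P_{ab}$, $P_a$, and $S(a,b)$ relate. The function name and toy input are hypothetical.

```python
import math
from collections import Counter


def log_odds_from_alignments(alignments):
    """Estimate S(a, b) = log10(P_ab / (P_a * P_b)) from gap-free pairwise alignments.

    alignments: iterable of (s, t) pairs of aligned, equal-length sequences.
    """
    pair_counts = Counter()
    for s, t in alignments:
        for a, b in zip(s, t):
            # Count each aligned column in both directions so the estimate is symmetric.
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    total = sum(pair_counts.values())
    p_ab = {pair: count / total for pair, count in pair_counts.items()}  # joint P_ab
    p_a = Counter()
    for (a, _), p in p_ab.items():
        p_a[a] += p  # marginal P_a
    # Since P(b|a) = P_ab / P_a, each score also equals log10(P(b|a) / P_b).
    return {(a, b): math.log10(p / (p_a[a] * p_a[b])) for (a, b), p in p_ab.items()}


# Hypothetical toy input: one alignment of two very similar sequences.
scores = log_odds_from_alignments([("AAV", "AVV")])
print(scores)
```

Pairs that are never observed get no entry, since $\log_{10} 0$ is undefined. A practical estimate needs some way to handle them.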
PAM 1

PAM distance
Two sequences are 1 PAM apart when they differ in 1% of the residues. When are two sequences 2 PAMs apart?
[Figure: 1 PAM followed by 1 PAM gives 2 PAM]
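Higher PAMs
• $\mathrm{PAM}_2(a,b) = \sum_c \mathrm{PAM}_1(a,c)\,\mathrm{PAM}_1(c,b)$
• $\mathrm{PAM}_2 = \mathrm{PAM}_1 \times \mathrm{PAM}_1$ (matrix multiplication)
• $\mathrm{PAM}_{250}$
  – $= \mathrm{PAM}_1 \times \mathrm{PAM}_{249}$
  – $= \mathrm{PAM}_1^{250}$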
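Under the model on this slide, $\mathrm{PAM}_n$ is the $n$-th power of $\mathrm{PAM}_1$. Below is a pure-Python sketch that computes it by repeated squaring. The 2×2 matrix is a hypothetical two-letter toy over (A, V) in which each residue changes with probability 0.01 per PAM. These are not real PAM1 values.

```python
from typing import List

Matrix = List[List[float]]


def mat_mul(x: Matrix, y: Matrix) -> Matrix:
    """Ordinary matrix product: (xy)[a][b] = sum over c of x[a][c] * y[c][b]."""
    n = len(x)
    return [[sum(x[a][c] * y[c][b] for c in range(n)) for b in range(n)] for a in range(n)]


def mat_pow(m: Matrix, k: int) -> Matrix:
    """m raised to the k-th power by repeated squaring."""
    if k < 0:
        raise ValueError("k must be non-negative")
    n = len(m)
    result = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]  # identity
    base = m
    while k:
        if k & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        k >>= 1
    return result


# Hypothetical toy PAM1 over the alphabet (A, V); each row sums to 1.
PAM1 = [[0.99, 0.01],
        [0.01, 0.99]]

PAM2 = mat_pow(PAM1, 2)      # same as mat_mul(PAM1, PAM1)
PAM250 = mat_pow(PAM1, 250)  # same as mat_mul(PAM1, mat_pow(PAM1, 249))
print(PAM2)
print(PAM250)
```

For this symmetric toy, the off-diagonal entry of $\mathrm{PAM}_n$ has the closed form $(1 - 0.98^n)/2$. That puts $\mathrm{PAM}_{250}$ close to 0.5 everywhere: after 250 PAMs, a residue is almost independent of the one it started as.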

PAM250-based scoring matrix
• $S_{250}(a,b) = \log_{10}\frac{P_{ab}}{P_a P_b} = \log_{10}\frac{\mathrm{PAM}_{250}(b \mid a)}{P_b}$
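Turning a $\mathrm{PAM}_n$ matrix into scores is an entrywise log-odds computation. This minimal sketch assumes `pam_n[a][b]` holds $\mathrm{PAM}_n(b \mid a)$, with each row summing to 1, and `background[b]` holds $P_b$. The names and the toy numbers are hypothetical.

```python
import math
from typing import List


def log_odds_scores(pam_n: List[List[float]], background: List[float]) -> List[List[float]]:
    """S_n(a, b) = log10(PAM_n(b|a) / P_b) for every residue pair (a, b)."""
    size = len(background)
    return [[math.log10(pam_n[a][b] / background[b]) for b in range(size)]
            for a in range(size)]


# Hypothetical two-letter example over (A, V) with uniform background frequencies.
toy_pam = [[0.9, 0.1],
           [0.1, 0.9]]
S = log_odds_scores(toy_pam, [0.5, 0.5])
print(S)
```

A positive entry marks a pair that occurs more often than chance at that distance. A negative entry marks a pair that occurs less often than chance.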
Scoring using PAM matrices
• Suppose we know that two sequences are 250 PAMs apart. Then
  $S(a,b) = \log_{10}\frac{P_{ab}}{P_a P_b} = \log_{10}\frac{P(b \mid a)}{P_b} = \log_{10}\frac{\mathrm{PAM}_{250}(a,b)}{P_b}$
• How does this help? $S_{250}(A,V) \gg S_1(A,V)$: a substitution that is heavily penalized between close sequences is penalized much less between distant ones.
• Scoring human vs. Drosophila should use a higher PAM matrix than scoring human vs. mouse.
• An alignment with a smaller % identity could still have a higher score and be more significant.
[Figure: human (hum), mouse (mus), Drosophila (dros)]
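Here is a toy illustration of why the matrix should match the distance. It rests on explicitly hypothetical assumptions: a two-letter alphabet (A, V), uniform background frequencies, and a symmetric PAM1 in which each residue changes with probability 0.01. In that model $\mathrm{PAM}_n$ has a closed form, so each $S_n$ is easy to compute. The sketch scores one gap-free alignment with 10% mismatches under every $S_n$ for $n$ from 1 to 250. It then reports which $n$ gives the highest total. In this model the total peaks at the distance where about 10% mismatches are expected.

```python
import math

P1 = 0.01  # hypothetical per-PAM change probability in a two-letter toy model (A, V)


def pam_off_diagonal(n: int) -> float:
    """Pr[a residue is replaced by the other letter after n PAMs] in the toy model."""
    return (1 - (1 - 2 * P1) ** n) / 2


def toy_scores(n: int) -> tuple:
    """(match, mismatch) scores S_n = log10(PAM_n(b|a) / P_b) with uniform P_b = 1/2."""
    q = pam_off_diagonal(n)
    return math.log10((1 - q) / 0.5), math.log10(q / 0.5)


def alignment_score(s: str, t: str, n: int) -> float:
    """Sum of S_n over the columns of a gap-free alignment of s and t."""
    match, mismatch = toy_scores(n)
    return sum(match if a == b else mismatch for a, b in zip(s, t))


# Hypothetical alignment with 90% identity (10 mismatches in 100 columns).
s = "A" * 100
t = "A" * 90 + "V" * 10
best_n = max(range(1, 251), key=lambda n: alignment_score(s, t, n))
print("S_1 mismatch:", round(toy_scores(1)[1], 3), "S_250 mismatch:", round(toy_scores(250)[1], 3))
print("best-scoring PAM distance:", best_n)
```

The same toy also reproduces $S_{250}(A,V) \gg S_1(A,V)$: the mismatch score rises from about -1.7 at $n = 1$ to nearly 0 at $n = 250$.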

BLOSUM series of matrices
• Henikoff & Henikoff: sequence substitutions in evolutionarily distant proteins do not seem to follow the PAM distributions.
• BLOSUM uses a more direct method, based on hand-curated multiple alignments of distantly related proteins from the BLOCKS database.
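The core of the BLOSUM construction counts aligned residue pairs within the columns of ungapped blocks. It compares those counts with the counts expected from residue composition, $s(a,b) = 2\log_2(q_{ab}/e_{ab})$, in half-bit units. Below is a minimal sketch of that counting step on one hypothetical block. It omits the clustering of sequences above an identity threshold (the N in BLOSUM-N) and any pseudocounts, so pairs that are never observed get no score. The function name is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_like_scores(block):
    """Half-bit log-odds scores from one ungapped block of equal-length aligned rows."""
    pair_counts = Counter()
    for column in zip(*block):
        # Every unordered pair of residues within a column counts once.
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1
    total_pairs = sum(pair_counts.values())
    q = {pair: c / total_pairs for pair, c in pair_counts.items()}  # observed q_ab
    # p_a: probability that a randomly chosen member of a pair is residue a.
    p = Counter()
    for (a, b), q_ab in q.items():
        p[a] += q_ab / 2
        p[b] += q_ab / 2
    scores = {}
    for (a, b), q_ab in q.items():
        e_ab = p[a] * p[b] if a == b else 2 * p[a] * p[b]  # expected e_ab
        scores[(a, b)] = round(2 * math.log2(q_ab / e_ab))
    return scores


# Hypothetical block of three aligned rows.
print(blosum_like_scores(["AA", "AA", "AV"]))
```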

This note was uploaded on 02/14/2008 for the course CSE 182 taught by Professor Bafna during the Fall '06 term at UCSD.
