9 Scoring Matrices

Introduction to Bioinformatics / Elements of Bioinformatics: Scoring Matrices
References
• Mount, D.W. (2004) Bioinformatics: Sequence and Genome Analysis, 2nd ed. Cold Spring Harbor Laboratory Press, N.Y. Chapter 3.
• Baxevanis, A.D., and Ouellette, B.F.F. (2005) Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd ed. John Wiley and Sons, N.Y. Chapter 11.
• Eddy, S.R. (2004) Where did the BLOSUM62 alignment score matrix come from? Nature Biotechnology 22: 1035-1036.
• Dayhoff, M.O. (1978) Atlas of Protein Sequence and Structure, vol. 5, supplement 3, pp. 345-352. National Biomedical Research Foundation.
• Henikoff, S., and Henikoff, J.G. (1992) Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89: 10915-10919.
[Figure: BLOSUM62 matrix]
Background
• Is an alignment evolutionarily related or random?
• The odds ratio gives us an idea of which one is more likely to be correct:

$$\text{odds ratio} = \frac{\text{likelihood that the alignment is found in related sequences}}{\text{likelihood that the alignment arises from a chance match}}$$

Example alignment:

···QVKGH···
    || |
···KVKAH···

Example sequences:

···QVKGH···
···KVKAH···
···QVKAH···
Odds ratio
• Odds ratio for aligning i with j:

$$\text{odds ratio} = \frac{q_{ij}}{p_i \, p_j}$$

$q_{ij}$ = probability of residue i being substituted by residue j in related sequences. It is estimated by counting (i, j) pairs in alignments of related sequences.
$p_i \, p_j$ = probability of randomly aligning i against j, where $p_i$ and $p_j$ are the frequencies of occurrence of i and j, respectively.
• Take the logarithm of the odds ratio (the log odds score) for each (i, j) pair, then add the log odds scores together to score the alignment:

···QVKGH···
    || |
···KVKAH···
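Why the log odds scores are added: if the aligned positions are assumed to be independent (an assumption stated here, not on the slide), the odds ratio of the whole alignment is the product of the per-position odds ratios, and the logarithm turns that product into a sum. In the sketch below, $a_k$ and $b_k$ are names introduced for illustration; they denote the residues aligned at position k of an ungapped alignment of length n.

```latex
\log \prod_{k=1}^{n} \frac{q_{a_k b_k}}{p_{a_k} \, p_{b_k}}
  \;=\; \sum_{k=1}^{n} \log \frac{q_{a_k b_k}}{p_{a_k} \, p_{b_k}}
```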
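Scoring matrices
Scoring matrices are log-odds matrices:

$$s(i,j) = \frac{1}{\lambda} \ln\left(\frac{q_{ij}}{p_i \, p_j}\right)$$

s(i,j) = log-odds score for aligning i with j
• s(i,j) > 0 if $q_{ij} > p_i \, p_j$ (the pair is more frequent in related sequences than expected by chance)
• s(i,j) < 0 if $q_{ij} < p_i \, p_j$ (the pair is less frequent than expected by chance)
• s(i,j) = 0 if $q_{ij} = p_i \, p_j$
λ = scaling constant, chosen so that the scores can be rounded to integers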
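The log-odds formula can be sketched in a few lines of Python. The function name `log_odds_score` and the sample probabilities are illustrative assumptions, not values from the slides; the point is only to show how the sign of the score follows from comparing q_ij with p_i * p_j.

```python
import math


def log_odds_score(q_ij, p_i, p_j, lam=1.0):
    """Return s(i,j) = (1/lam) * ln(q_ij / (p_i * p_j)).

    q_ij : probability of seeing the (i, j) pair in related sequences
    p_i, p_j : background frequencies of i and j
    lam : scaling constant (lambda); 1.0 means no rescaling
    """
    return math.log(q_ij / (p_i * p_j)) / lam


# Hypothetical inputs, chosen only to show the three cases.
above_chance = log_odds_score(0.10, 0.25, 0.25)    # q_ij > p_i * p_j, positive score
below_chance = log_odds_score(0.03, 0.25, 0.25)    # q_ij < p_i * p_j, negative score
at_chance = log_odds_score(0.0625, 0.25, 0.25)     # q_ij = p_i * p_j, zero score

print(above_chance, below_chance, at_chance)
```

In practice the raw values are divided by λ and rounded to the nearest integer before being stored in the matrix.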
Nucleotide scoring matrix

% Identity    Match/Mismatch
99%           1/-3
95%           1/-2
90%           2/-3
85%           3/-4
80%           4/-5
75%           1/-1
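A match/mismatch pair such as 1/-3 is applied position by position along an alignment. The sketch below scores an ungapped alignment this way; the function name `score_ungapped` and the two example sequences are made up for illustration, and gap penalties are deliberately left out.

```python
def score_ungapped(seq_a, seq_b, match=1, mismatch=-3):
    """Score two equal-length, ungapped DNA sequences.

    Each identical position adds `match`; each differing position
    adds `mismatch` (a negative number).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(match if a == b else mismatch for a, b in zip(seq_a, seq_b))


# Hypothetical sequences: 7 identical positions and 1 mismatch.
print(score_ungapped("ACGTACGT", "ACGTACGA"))                      # 1/-3 scheme: 7*1 - 3 = 4
print(score_ungapped("ACGTACGT", "ACGTACGA", match=1, mismatch=-1))  # 1/-1 scheme: 7*1 - 1 = 6
```

The same mismatch costs more under 1/-3 than under 1/-1, which is why the stricter scheme suits searches for nearly identical sequences.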
To construct a log-odds matrix that is optimized to find 75% identity in DNA alignment:

Assume the frequency of occurrence of each nucleotide = 0.25

Mutation matrix (probability that i changes to j)

     A      C      G      T
A    0.75   0.083  0.083  0.083
C    0.083  0.75   0.083  0.083
G    0.083  0.083  0.75   0.083
T    0.083  0.083  0.083  0.75

$q_{ij}$ (probability of finding (i, j) pairs in related sequences)

     A       C       G       T
A    0.1875  0.0208  0.0208  0.0208
C    0.0208  0.1875  0.0208  0.0208
G    0.0208  0.0208  0.1875  0.0208
T    0.0208  0.0208  0.0208  0.1875

Log-odds matrix ($S_{ij}$)

     A    C    G    T
A    1   -1   -1   -1
C   -1    1   -1   -1
G   -1   -1    1   -1
T   -1   -1   -1    1
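The step from the mutation matrix to the q_ij table is a multiplication of each mutation probability by the background frequency of the starting nucleotide (0.25 × 0.75 = 0.1875 and 0.25 × 0.083 ≈ 0.0208). A minimal Python sketch of that step follows; the function name `joint_pair_probabilities` is an illustrative choice, and the sketch assumes, as the slide does, uniform nucleotide frequencies and equiprobable mismatches.

```python
BASES = "ACGT"


def joint_pair_probabilities(identity, freq=0.25):
    """Return q[(i, j)] = freq * P(i changes to j) for all nucleotide pairs.

    identity : target fraction of identical positions (e.g. 0.75)
    freq     : background frequency of every nucleotide (assumed uniform)
    """
    mismatch = (1.0 - identity) / 3.0  # remaining probability split over 3 other bases
    return {
        (i, j): freq * (identity if i == j else mismatch)
        for i in BASES
        for j in BASES
    }


q = joint_pair_probabilities(0.75)
print(round(q[("A", "A")], 4))  # 0.1875
print(round(q[("A", "C")], 4))  # 0.0208
```

The sixteen q_ij values sum to 1, which is a quick sanity check that they form a probability distribution over aligned pairs.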
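To construct a log-odds matrix that is optimized to find 75% identity in DNA alignment, apply

$$s(i,j) = \frac{1}{\lambda} \ln\left(\frac{q_{ij}}{p_i \, p_j}\right)$$

Matches:

$$s(A,A) = s(C,C) = s(G,G) = s(T,T) = \frac{1}{\lambda} \ln\left(\frac{0.1875}{0.25 \times 0.25}\right) = \frac{1}{\lambda}(1.0986) \approx 1 \quad \text{if we set } \lambda = 1$$

Mismatches:

$$s(i,j) \text{ where } i \neq j = \frac{1}{\lambda} \ln\left(\frac{0.0208}{0.25 \times 0.25}\right) = \frac{1}{\lambda}(-1.0986) \approx -1 \quad \text{for } \lambda = 1$$

After rounding to integers, every match scores 1 and every mismatch scores -1, which gives the 1/-1 log-odds matrix ($S_{ij}$).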
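To construct a log-odds matrix that is optimized to find 95% identity in DNA alignment:

$q_{AA} = q_{CC} = q_{GG} = q_{TT} = 0.25 \times 0.95 = 0.2375$

Assume all mismatches are equiprobable: $q_{ij} = 0.25 \times (0.05/3) = 0.0042$ for $i \neq j$

Matches:

$$s(A,A) = s(C,C) = s(G,G) = s(T,T) = \frac{1}{\lambda} \ln\left(\frac{0.2375}{0.25 \times 0.25}\right) = 1 \quad \text{if we set } \lambda = 1.335$$

Mismatches:

$$s(i,j) \text{ where } i \neq j = \frac{1}{\lambda} \ln\left(\frac{0.0042}{0.25 \times 0.25}\right) \approx -2 \quad \text{for } \lambda = 1.335$$

Log-odds matrix ($S_{ij}$)

     A    C    G    T
A    1   -2   -2   -2
C   -2    1   -2   -2
G   -2   -2    1   -2
T   -2   -2   -2    1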
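The 75% and 95% constructions follow the same recipe, so they can be checked with one small Python sketch. The function name `nucleotide_scores` is hypothetical. The sketch assumes uniform nucleotide frequencies and equiprobable mismatches, and it chooses λ so that the match score is exactly 1, as in the 95% example; other choices of λ are possible. It is only meaningful for target identities above 25%, where a match is more likely than chance.

```python
import math


def nucleotide_scores(identity, freq=0.25):
    """Return (lambda, match score, mismatch score) for a target identity.

    lambda is set to ln(q_match / (p_i * p_j)) so the match score is exactly 1;
    both scores are then rounded to the nearest integer.
    """
    q_match = freq * identity
    q_mismatch = freq * (1.0 - identity) / 3.0  # mismatches assumed equiprobable
    background = freq * freq                    # p_i * p_j under uniform frequencies

    lam = math.log(q_match / background)
    match = math.log(q_match / background) / lam
    mismatch = math.log(q_mismatch / background) / lam
    return lam, round(match), round(mismatch)


for identity in (0.75, 0.95):
    lam, match, mismatch = nucleotide_scores(identity)
    print(f"{identity:.0%} identity: lambda = {lam:.3f}, match/mismatch = {match}/{mismatch}")
```

For 95% identity the unrounded mismatch score is close to -2.03, so rounding gives the 1/-2 scheme shown in the matrix.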
PAM and BLOSUM matrices
• These are the two most commonly used protein log-odds scoring matrices.

This note was uploaded on 07/29/2010 for the course BIOC BIOC1805 taught by Professor Dr.brianwong during the Summer '09 term at HKU.
