9 Scoring Matrices

1 Introduction to Bioinformatics / Elements of Bioinformatics: Scoring Matrices

2 Reference
• Mount, D.W. (2004) Bioinformatics: Sequence and Genome Analysis, 2nd ed. Cold Spring Harbor Lab. Press, N.Y. Chapter 3.
• Baxevanis, A.D., and Ouellette, B.F.F. (2005) Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd ed. John Wiley and Sons, N.Y. Chapter 11.
• Eddy, S.R. (2004) Where did the BLOSUM62 alignment score matrix come from? Nature Biotechnology 22: 1035-1036.
• Dayhoff, M.O. (1978) Atlas of Protein Sequence and Structure, vol. 5, supplement 3, pp. 345-352. National Biomedical Research Foundation.
• Henikoff, S., and Henikoff, J.G. (1992) Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89: 10915-10919.
3 BLOSUM62 matrix

4 Background
• Is the alignment evolutionarily related, or is it a random alignment?
• The odds ratio gives us an idea of which one is more likely to be correct:

$$\text{odds ratio} = \frac{\text{likelihood that the alignment is found in related sequences}}{\text{likelihood that the alignment arises from a chance match}}$$

Example alignment:

    ···QVKGH···
        || |
    ···KVKAH···

(Figure: the sequences ···QVKGH···, ···KVKAH··· and ···QVKAH···.)
5 Odds ratio
• Odds ratio for aligning $i$ with $j$:

$$\text{odds ratio} = \frac{q_{ij}}{p_i \, p_j}$$

  $q_{ij}$ = probability of residue $i$ being substituted by residue $j$ in related sequences. It is estimated by counting $(i, j)$ pairs in alignments of related sequences.
  $p_i \, p_j$ = probability of randomly aligning $i$ against $j$, where $p_i$ and $p_j$ are the frequencies of occurrence of $i$ and $j$, respectively.
• Take the logarithm of the odds ratio (the log-odds score) for each $(i, j)$ pair, then add the log-odds scores together.

    ···QVKGH···
        || |
    ···KVKAH···
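
One way to see why the log-odds scores are added rather than multiplied: if the alignment columns are treated as independent (an assumption made here for illustration, not stated on the slide), the odds ratio of the whole alignment is the product of the per-column odds ratios, and the logarithm turns that product into a sum. The symbols $a_k$ and $b_k$ are illustrative names for the two residues in column $k$ of an alignment of length $n$.

$$\log \prod_{k=1}^{n} \frac{q_{a_k b_k}}{p_{a_k} \, p_{b_k}} = \sum_{k=1}^{n} \log \frac{q_{a_k b_k}}{p_{a_k} \, p_{b_k}}$$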

6 Scoring matrices
Scoring matrices are log-odds matrices:

$$s(i,j) = \frac{1}{\lambda} \ln\left(\frac{q_{ij}}{p_i \, p_j}\right)$$

• $s(i,j)$ = log-odds score for aligning $i$ with $j$
• $s(i,j) > 0$ if $q_{ij} > p_i \, p_j$
• $s(i,j) < 0$ if $q_{ij} < p_i \, p_j$
• $s(i,j) = 0$ if $q_{ij} = p_i \, p_j$
• $\lambda$ = scaling constant, chosen so that the scores can be rounded to integers
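
The sign rules above can be checked with a minimal sketch. The function name `log_odds_score` and the probabilities 0.10 and 0.02 are hypothetical values chosen only for illustration; the background frequency 0.25 is also an assumption here.

```python
import math

def log_odds_score(q_ij, p_i, p_j, lam=1.0):
    """Return (1/lam) * ln(q_ij / (p_i * p_j)), the unrounded log-odds score."""
    return math.log(q_ij / (p_i * p_j)) / lam

# Hypothetical numbers: both residues have background frequency 0.25,
# so the chance expectation p_i * p_j is 0.0625.
p = 0.25
more_than_chance = log_odds_score(0.10, p, p)    # q_ij > p_i * p_j
less_than_chance = log_odds_score(0.02, p, p)    # q_ij < p_i * p_j
exactly_chance = log_odds_score(0.0625, p, p)    # q_ij = p_i * p_j

print(more_than_chance > 0, less_than_chance < 0, exactly_chance == 0.0)
```

A pair seen more often in related sequences than chance predicts gets a positive score, a pair seen less often gets a negative score, and a pair seen exactly as often as chance predicts scores zero.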
7 Nucleotide scoring matrix

| % Identity | Match/Mismatch |
|---|---|
| 99% | 1/-3 |
| 95% | 1/-2 |
| 90% | 2/-3 |
| 85% | 3/-4 |
| 80% | 4/-5 |
| 75% | 1/-1 |
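
As a rough check on how identity levels relate to match/mismatch schemes, the sketch below computes unscaled log-odds scores for each identity level and prints the mismatch-to-match ratio. It assumes equal base frequencies of 0.25 and equiprobable mismatches; the function name `unscaled_scores` is hypothetical. Integer schemes are roundings, so the computed ratios only approximate the ratios of the listed pairs.

```python
import math

def unscaled_scores(identity, base_freq=0.25):
    """Unscaled (lambda = 1) log-odds scores for a target fraction of identical positions.

    Assumes equal base frequencies and equiprobable mismatches.
    """
    background = base_freq * base_freq                 # p_i * p_j
    q_match = base_freq * identity                     # q_ii
    q_mismatch = base_freq * (1.0 - identity) / 3.0    # q_ij, i != j
    return math.log(q_match / background), math.log(q_mismatch / background)

# Identity levels paired with the match/mismatch schemes from the table.
for identity, scheme in [(0.99, "1/-3"), (0.95, "1/-2"), (0.90, "2/-3"),
                         (0.85, "3/-4"), (0.80, "4/-5"), (0.75, "1/-1")]:
    match, mismatch = unscaled_scores(identity)
    print(f"{identity:.0%}  table {scheme:>5}  mismatch/match = {mismatch / match:.2f}")
```

The higher the target identity, the more heavily a mismatch is penalized relative to the reward for a match.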

8 To construct a log-odds matrix that is optimized to find 75% identity in DNA alignment:
Assume the frequency of occurrence of each nucleotide = 0.25.

Mutation matrix (probability that $i$ changes to $j$):

|   | A | C | G | T |
|---|---|---|---|---|
| A | 0.75 | 0.083 | 0.083 | 0.083 |
| C | 0.083 | 0.75 | 0.083 | 0.083 |
| G | 0.083 | 0.083 | 0.75 | 0.083 |
| T | 0.083 | 0.083 | 0.083 | 0.75 |

$q_{ij}$ (probability of finding $i, j$ pairs in related sequences):

|   | A | C | G | T |
|---|---|---|---|---|
| A | 0.1875 | 0.0208 | 0.0208 | 0.0208 |
| C | 0.0208 | 0.1875 | 0.0208 | 0.0208 |
| G | 0.0208 | 0.0208 | 0.1875 | 0.0208 |
| T | 0.0208 | 0.0208 | 0.0208 | 0.1875 |

Log-odds matrix ($S_{ij}$):

|   | A | C | G | T |
|---|---|---|---|---|
| A | 1 | -1 | -1 | -1 |
| C | -1 | 1 | -1 | -1 |
| G | -1 | -1 | 1 | -1 |
| T | -1 | -1 | -1 | 1 |
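
The $q_{ij}$ values appear to follow from multiplying each nucleotide's frequency by the corresponding mutation probability (0.25 × 0.75 = 0.1875 and 0.25 × 0.083 ≈ 0.0208). The sketch below reproduces that step under this reading; the variable names are illustrative.

```python
BASES = "ACGT"
BASE_FREQ = 0.25   # assumed frequency of each nucleotide (from the slide)
IDENTITY = 0.75    # target fraction of identical positions

# Mutation matrix: probability that base i changes to base j.
# The three possible mismatches share the remaining probability equally.
mutation = {
    i: {j: IDENTITY if i == j else (1.0 - IDENTITY) / 3.0 for j in BASES}
    for i in BASES
}

# Joint probability of finding the pair (i, j) in related sequences.
q = {i: {j: BASE_FREQ * mutation[i][j] for j in BASES} for i in BASES}

print(round(q["A"]["A"], 4), round(q["A"]["C"], 4))

# Sanity check: the 16 pair probabilities should sum to 1.
total = sum(q[i][j] for i in BASES for j in BASES)
print(round(total, 6))
```

Checking that the pair probabilities sum to 1 is a quick way to catch a mis-specified mutation matrix before taking logarithms.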
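9 To construct a log-odds matrix that is optimized to find 75% identity in DNA alignment:

$$s(i,j) = \frac{1}{\lambda} \ln\left(\frac{q_{ij}}{p_i \, p_j}\right)$$

$$s(A,A) = s(C,C) = s(G,G) = s(T,T) = \frac{1}{\lambda} \ln\left(\frac{0.1875}{0.25 \times 0.25}\right) = \frac{1}{\lambda}(1.0986) \approx 1 \quad \text{if we set } \lambda = 1$$

$$s(i,j) \text{ where } i \neq j = \frac{1}{\lambda} \ln\left(\frac{0.0208}{0.25 \times 0.25}\right) = \frac{1}{\lambda}(-1.0986) \approx -1 \quad \text{for } \lambda = 1$$

Log-odds matrix ($S_{ij}$), after rounding to integers:

|   | A | C | G | T |
|---|---|---|---|---|
| A | 1 | -1 | -1 | -1 |
| C | -1 | 1 | -1 | -1 |
| G | -1 | -1 | 1 | -1 |
| T | -1 | -1 | -1 | 1 |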

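10 To construct a log-odds matrix that is optimized to find 95% identity in DNA alignment:

$$q_{AA} = q_{CC} = q_{GG} = q_{TT} = 0.25 \times 0.95 = 0.2375$$

Assume all mismatches are equiprobable: $q_{ij} = 0.25 \times (0.05/3) = 0.0042$ for $i \neq j$.

$$s(A,A) = s(C,C) = s(G,G) = s(T,T) = \frac{1}{\lambda} \ln\left(\frac{0.2375}{0.25 \times 0.25}\right) \approx 1 \quad \text{if we set } \lambda = 1.335$$

$$s(i,j) \text{ where } i \neq j = \frac{1}{\lambda} \ln\left(\frac{0.0042}{0.25 \times 0.25}\right) \approx -2 \quad \text{for } \lambda = 1.335$$

Log-odds matrix ($S_{ij}$), after rounding to integers:

|   | A | C | G | T |
|---|---|---|---|---|
| A | 1 | -2 | -2 | -2 |
| C | -2 | 1 | -2 | -2 |
| G | -2 | -2 | 1 | -2 |
| T | -2 | -2 | -2 | 1 |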
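
The 75% and 95% constructions can be put side by side in a short sketch. It assumes equal base frequencies, equiprobable mismatches, and rounding to the nearest integer; the function names and the two four-base sequences are hypothetical and used only for illustration.

```python
import math

BASES = "ACGT"

def identity_log_odds_matrix(identity, lam, base_freq=0.25):
    """Build an integer DNA log-odds matrix tuned to a target identity.

    Scores are (1/lam) * ln(q_ij / (p_i * p_j)), rounded to the nearest integer.
    """
    background = base_freq * base_freq
    q_match = base_freq * identity
    q_mismatch = base_freq * (1.0 - identity) / 3.0
    match = round(math.log(q_match / background) / lam)
    mismatch = round(math.log(q_mismatch / background) / lam)
    return {i: {j: match if i == j else mismatch for j in BASES} for i in BASES}

def score_alignment(seq_a, seq_b, matrix):
    """Sum the per-column scores of an ungapped alignment of equal-length sequences."""
    return sum(matrix[a][b] for a, b in zip(seq_a, seq_b))

m75 = identity_log_odds_matrix(0.75, lam=1.0)     # lambda as used for 75% identity
m95 = identity_log_odds_matrix(0.95, lam=1.335)   # lambda as used for 95% identity

print(m75["A"]["A"], m75["A"]["C"])
print(m95["A"]["A"], m95["A"]["C"])

# Hypothetical ungapped alignment: three matches and one mismatch.
print(score_alignment("ACGT", "ACGA", m95))
```

The same mismatch costs more under the 95% matrix than under the 75% matrix, which is why a matrix tuned to high identity is stricter toward divergent alignments.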
11 PAM and BLOSUM matrices
• The two commonly used protein log-odds scoring matrices.

## This note was uploaded on 07/29/2010 for the course BIOC BIOC1805 taught by Professor Dr.brianwong during the Summer '09 term at HKU.
