Lecture 9: Protein Sequence Profiles and Motif Applications

Calculating profiles of protein sequences: Average Score Method
Pattern and profile applications
PSI-BLAST
Identifying new sequence motifs: Gibbs sampling

Some slides adapted from slides by Dr. Keith Dunker
Some slides adapted from slides created by Dr. Zhiping Weng (Boston University)

Protein Sequence Profiles

A profile is a position-specific scoring matrix that gives a quantitative description of a sequence motif.
For protein sequences, the profile scoring matrix has N rows and 20+ columns, N being the length of the profile (number of sequence positions).
The first 20 columns give the score (or probability) of finding each of the 20 amino acids at that position in the target sequence.
Additional columns contain gap penalties for insertions/deletions at that position in the target sequence.
$M_{kj}$ = score for the $j$th amino acid (or gap) at the $k$th position in the sequence.

Calculating the Profile Matrix for Protein Sequences: Average Score Method

$$M_{kj} = \sum_{i=1}^{20} \frac{C_{ki}}{Z} S_{ij}$$

$M_{kj}$ = profile matrix element (score for the $j$th amino acid at the $k$th position)
$C_{ki}$ = number of amino acids of type $i$ at position $k$ in the sequence/profile
$Z$ = number of aligned sequences
$S_{ij}$ = score between the $i$th and the $j$th amino acids based on a scoring matrix (e.g., PAM250 or BLOSUM62)

Derived from the paper by Gribskov et al. (1987) PNAS 84:4355-8.

Example alignment (position 7 is set apart):

1    AGGCTH F WKGESM
2    SGACSR W YRGQSL
3    TGSCLK F FHG-LM
4    SGACSR M YRGESL
5    TGGCSK W MRGQSV
6    SGNCSK M WKGNSI
7    FGACSH W YKGDSL
8    SGQCSR F YRGQSL

Z = 8

Average Score Method: Example

Position k = 7

$$M_{kj} = \sum_{i=1}^{20} \frac{C_{ki}}{Z} S_{ij}$$

$C_{7F} = 3$, $C_{7W} = 3$, $C_{7M} = 2$, all other $C_{7i} = 0$

$$M_{7F} = \frac{3}{8} S_{FF} + \frac{3}{8} S_{WF} + \frac{2}{8} S_{MF}$$
$$M_{7W} = \frac{3}{8} S_{FW} + \frac{3}{8} S_{WW} + \frac{2}{8} S_{MW}$$
$$M_{7M} = \frac{3}{8} S_{FM} + \frac{3}{8} S_{WM} + \frac{2}{8} S_{MM}$$
$$M_{7j} = \frac{3}{8} S_{Fj} + \frac{3}{8} S_{Wj} + \frac{2}{8} S_{Mj}$$

Using BLOSUM62: $S_{FF} = 6$; $S_{WF} = 1$; $S_{MF} = 0$

$$M_{7F} = \frac{3}{8}(6) + \frac{3}{8}(1) + \frac{2}{8}(0) = 2.625$$

Average Score Method: Example

Calculating the profile values for two unobserved amino acids (Y and E):

$$M_{7Y} = \frac{3}{8} S_{FY} + \frac{3}{8} S_{WY} + \frac{2}{8} S_{MY} = \frac{3}{8}(3) + \frac{3}{8}(2) + \frac{2}{8}(-1) \approx 1.6$$
$$M_{7E} = \frac{3}{8} S_{FE} + \frac{3}{8} S_{WE} + \frac{2}{8} S_{ME} = \frac{3}{8}(-3) + \frac{3}{8}(-3) + \frac{2}{8}(-2) \approx -2.8$$

From the above two equations, it is easy to predict that $M_{7Y}$ is much more favorable than $M_{7E}$, even though neither Y nor E has been observed at this position (k = 7). Why?

Searching for PSSM/Profile Matches

If we do not allow gaps (i.e., no insertions or deletions):
Can simply do a linear scan, scoring the match to the position-specific scoring matrix (PSSM) at each position in the sequence.

If we allow gaps:
Can use dynamic programming to align the profile to the protein sequence(s) (with gap penalties); see Mount, Bioinformatics: Sequence and Genome Analysis (2004).
Can use hidden Markov model-based methods; see Durbin et al., Biological Sequence Analysis (1998).

Sequence Pattern and Profile Applications

Predicting structural or functional domains in protein sequences...
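
As a rough check on the Average Score Method example above (position k = 7), the calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `average_score` and the dictionary layout are hypothetical, and the substitution table holds only the BLOSUM62 entries quoted in the example (pairs with F, Y, and E as the candidate residue), so it is not a complete scoring matrix.

```python
from collections import Counter
from fractions import Fraction

# The eight aligned sequences from the lecture example (Z = 8).
alignment = [
    "AGGCTHFWKGESM",
    "SGACSRWYRGQSL",
    "TGSCLKFFHG-LM",
    "SGACSRMYRGESL",
    "TGGCSKWMRGQSV",
    "SGNCSKMWKGNSI",
    "FGACSHWYKGDSL",
    "SGQCSRFYRGQSL",
]

# BLOSUM62 entries S[i][j] quoted in the example:
# i = residue observed in the column, j = candidate residue being scored.
# Assumption: only these nine pairs are listed; a full profile row
# would need the complete 20 x 20 matrix.
S = {
    "F": {"F": 6, "Y": 3, "E": -3},
    "W": {"F": 1, "Y": 2, "E": -3},
    "M": {"F": 0, "Y": -1, "E": -2},
}

def average_score(column, sub_matrix, j):
    """Return M_kj = sum_i (C_ki / Z) * S_ij for one alignment column."""
    counts = Counter(column)   # C_ki for every residue type i seen in the column
    z = len(column)            # Z, the number of aligned sequences
    return sum(Fraction(c, z) * sub_matrix[i][j] for i, c in counts.items())

k = 7  # 1-based position used in the lecture
column_7 = [seq[k - 1] for seq in alignment]

profile_7 = {j: float(average_score(column_7, S, j)) for j in ("F", "Y", "E")}
print(profile_7)  # {'F': 2.625, 'Y': 1.625, 'E': -2.75}
```

The exact values 1.625 and -2.75 round to the approximately 1.6 and -2.8 given in the example. Exact fractions are used here only so that the printed results are not affected by floating-point rounding during the sum.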
- Fall '11