Lecture 9: Protein Sequence Profiles and Motif Applications

• Calculating profiles of protein sequences
  - Average Score Method
• Pattern and Profile applications
• PSI-BLAST
• Identifying new sequence motifs:
  - Gibbs sampling

Some slides adapted from slides by Dr. Keith Dunker
Some slides adapted from slides created by Dr. Zhiping Weng (Boston University)

Protein Sequence Profiles

• A profile is a position-specific scoring matrix that gives a quantitative description of a sequence motif
• For protein sequences, the profile scoring matrix has N rows and 20+ columns, N being the length of the profile (number of sequence positions)
• The first 20 columns indicate the score (or probability) for finding, at that position in the target sequence, one of the 20 amino acids
• Additional columns contain gap penalties for insertions/deletions at that position in the target sequence
• $M_{kj}$ = score for the $j$th amino acid (or gap) at the $k$th position in the sequence

Calculating the Profile Matrix for Protein Sequences: Average Score Method

$$M_{kj} = \sum_{i=1}^{20} \frac{C_{ki}}{Z}\, S_{ij}$$

• $M_{kj}$ = profile matrix element (score for the $j$th amino acid at the $k$th position)
• $C_{ki}$ = number of amino acids of the $i$th type at position $k$ in the sequence/profile
• $Z$ = number of aligned sequences
• $S_{ij}$ = score between the $i$th and the $j$th amino acids based on a scoring matrix (e.g., PAM250 or BLOSUM62)

Derived from the paper by Gribskov et al. (1987) PNAS 84:4355-8

Average Score Method: Example

Aligned sequences (Z = 8), with position 7 set apart:

  1   AGGCTH  F  WKGESM
  2   SGACSR  W  YRGQSL
  3   TGSCLK  F  FHG-LM
  4   SGACSR  M  YRGESL
  5   TGGCSK  W  MRGQSV
  6   SGNCSK  M  WKGNSI
  7   FGACSH  W  YKGDSL
  8   SGQCSR  F  YRGQSL

Position k = 7:

$$M_{kj} = \sum_{i=1}^{20} \frac{C_{ki}}{Z}\, S_{ij}$$

$C_{7F} = 3$, $C_{7W} = 3$, $C_{7M} = 2$, all other $C_{7i} = 0$

$$M_{7F} = \tfrac{3}{8} S_{FF} + \tfrac{3}{8} S_{WF} + \tfrac{2}{8} S_{MF}$$
$$M_{7W} = \tfrac{3}{8} S_{FW} + \tfrac{3}{8} S_{WW} + \tfrac{2}{8} S_{MW}$$
$$M_{7M} = \tfrac{3}{8} S_{FM} + \tfrac{3}{8} S_{WM} + \tfrac{2}{8} S_{MM}$$
$$M_{7j} = \tfrac{3}{8} S_{Fj} + \tfrac{3}{8} S_{Wj} + \tfrac{2}{8} S_{Mj}$$

Using BLOSUM62: $S_{FF} = 6$; $S_{WF} = 1$; $S_{MF} = 0$

$$M_{7F} = \tfrac{3}{8}(6) + \tfrac{3}{8}(1) + \tfrac{2}{8}(0) = 2.625$$
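
The position-7 calculation above can be checked with a short script. This is a minimal sketch rather than the lecture's own code: the function names and the `BLOSUM62_SUBSET` dictionary are illustrative, and only the handful of BLOSUM62 entries needed for the F, W and M columns are included. The values S_FF, S_WF and S_MF come from the slide; S_WW = 11, S_MM = 5 and S_MW = -1 are taken from the standard BLOSUM62 table. Skipping gap characters when summing is an assumption of this sketch, not something the slides specify.

```python
from collections import Counter

# The eight aligned sequences from the slide (Z = 8).
ALIGNMENT = [
    "AGGCTHFWKGESM",
    "SGACSRWYRGQSL",
    "TGSCLKFFHG-LM",
    "SGACSRMYRGESL",
    "TGGCSKWMRGQSV",
    "SGNCSKMWKGNSI",
    "FGACSHWYKGDSL",
    "SGQCSRFYRGQSL",
]

# Hypothetical helper table: only the BLOSUM62 entries S_ij needed here,
# keyed as (observed residue i, scored residue j).
BLOSUM62_SUBSET = {
    ("F", "F"): 6, ("W", "F"): 1, ("M", "F"): 0,
    ("F", "W"): 1, ("W", "W"): 11, ("M", "W"): -1,
    ("F", "M"): 0, ("W", "M"): -1, ("M", "M"): 5,
}


def column_counts(alignment, k):
    """Return C_ki: how often each residue occurs at 1-based column k."""
    return Counter(seq[k - 1] for seq in alignment)


def average_score(alignment, k, j, scores):
    """Return M_kj = sum_i (C_ki / Z) * S_ij for column k and residue j."""
    z = len(alignment)  # Z counts all aligned sequences
    counts = column_counts(alignment, k)
    # Assumption: gap characters contribute nothing to the sum.
    return sum(c / z * scores[(i, j)] for i, c in counts.items() if i != "-")


if __name__ == "__main__":
    for residue in "FWM":
        print(residue, average_score(ALIGNMENT, 7, residue, BLOSUM62_SUBSET))
```

With these inputs the F column evaluates to 2.625, matching the slide; the same call gives 4.25 for W and 0.875 for M.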
Average Score Method: Example

• Calculating the profile values for two unobserved amino acids (Y and E):

$$M_{7Y} = \tfrac{3}{8} S_{FY} + \tfrac{3}{8} S_{WY} + \tfrac{2}{8} S_{MY} = \tfrac{3}{8}(3) + \tfrac{3}{8}(2) + \tfrac{2}{8}(-1) \approx 1.6$$
$$M_{7E} = \tfrac{3}{8} S_{FE} + \tfrac{3}{8} S_{WE} + \tfrac{2}{8} S_{ME} = \tfrac{3}{8}(-3) + \tfrac{3}{8}(-3) + \tfrac{2}{8}(-2) \approx -2.8$$

• From the above two equations, it is easy to predict that $M_{7Y}$ is much more favorable than $M_{7E}$, even though neither Y nor E has been observed at this position (k = 7). Why?

Searching for PSSM/Profile Matches

• If we do not allow gaps (i.e., no insertions or deletions):
  - Can simply do a linear scan, scoring the match to the position-specific scoring matrix (PSSM) at each position in the sequence
• If we allow gaps:
  - Can use dynamic programming to align the profile to the protein sequence(s) (with gap penalties); see Mount, Bioinformatics: Sequence and Genome Analysis (2004)
  - Can use hidden Markov model-based methods; see Durbin et al., Biological Sequence Analysis (1998)

Sequence Pattern and Profile Applications

• Predicting structural or functional domains in protein sequences
  - Example: PROSITE database of protein sequence motifs
• Predicting protein-protein interaction motifs
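
The ungapped case described under "Searching for PSSM/Profile Matches" can be sketched as a sliding-window scan. The code below is an illustration under stated assumptions, not a method given in the lecture: `scan_pssm`, `TOY_PSSM`, the example sequence and every score in the toy matrix are invented for the example, and the fallback score of -4 for residues missing from a column is an arbitrary choice.

```python
def scan_pssm(pssm, sequence, default=-4):
    """Score every ungapped window of the sequence against the PSSM.

    pssm is a list of per-position dicts mapping residue -> score.
    Returns a list of (0-based start index, total score) tuples.
    """
    n = len(pssm)
    hits = []
    for start in range(len(sequence) - n + 1):
        window = sequence[start:start + n]
        # Sum the position-specific score of each residue in the window;
        # residues absent from a column get the (assumed) default score.
        score = sum(col.get(res, default) for col, res in zip(pssm, window))
        hits.append((start, score))
    return hits


# Hypothetical three-position profile; the scores are made up.
TOY_PSSM = [
    {"G": 5, "A": 1},
    {"F": 3, "W": 4, "M": 1, "Y": 2, "E": -3},
    {"Y": 4, "W": 3, "F": 2},
]

if __name__ == "__main__":
    hits = scan_pssm(TOY_PSSM, "MKGWYAGFE")
    best_start, best_score = max(hits, key=lambda h: h[1])
    print(best_start, best_score)
```

For this toy input the best window is "GWY" at index 2 with a score of 13. The scan does a number of score lookups proportional to the sequence length times the profile length; allowing gaps requires the dynamic programming or hidden Markov model approaches mentioned in the slide.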