423f11-profilehmm

Motif Search CMSC 423

Sequence Profiles

The CCT domain is often found near one end of plant proteins. Suppose we want to search for other examples of this domain. How can we represent the pattern implied by these sequences? One way is a sequence profile.
Sequence Profiles (PSSM)

[Figure: heat map of the profile. Rows: amino acids (A, C, D, E, ..., T, V, W, Y). Columns: motif positions 1 to 19.]
Color = probability that the i-th position has the given amino acid = $e_i(x)$.
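The slides do not say how the entries $e_i(x)$ are estimated. A minimal sketch, assuming they are plain column frequencies taken from a set of aligned example sequences, could look like this. The alignment and the name `build_profile` are made up for illustration.

```python
from collections import Counter

# Hypothetical toy alignment (not from the slides): four aligned sequences.
alignment = ["ACD", "ACE", "ATD", "ACD"]

def build_profile(seqs):
    """Return a list of dicts; profile[i][x] is an estimate of e_i(x).

    Assumption: e_i(x) is the fraction of sequences with letter x in column i.
    """
    length = len(seqs[0])
    assert all(len(s) == length for s in seqs), "sequences must be aligned"
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)  # letter counts in column i
        profile.append({x: n / len(seqs) for x, n in counts.items()})
    return profile

profile = build_profile(alignment)
print(profile[1])  # {'C': 0.75, 'T': 0.25}
```

Letters never seen in a column get no entry here, so they have probability zero. In practice, small pseudocounts are often added to avoid that.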

Sequence Logos

[Figure: sequence logo; horizontal axis: motif position.]
The height of a letter reflects the fraction of the time that letter is observed at that position. (The combined height of all the letters in a column reflects how conserved the column is.)
Scoring a Sequence

x = MRGSAMASINDSKILSLQNKKNALVDTSGYNAEVRVGDNVQLNTIYTNDFKLSSSGDKIIVN
M = the profile, where color = probability that the i-th position has the given amino acid = $e_i(x)$.

$$\mathrm{Score}(x) = \Pr(x \mid M) = \prod_{i=1}^{L} e_i(x_i)$$

The score of a string according to profile M is the product of the probabilities of observing the given letters at each position.
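As a small illustration of the product formula, the sketch below scores strings against a made-up three-position profile. The profile values and the function name `score` are assumptions for the example, and letters missing from a column are treated as having probability 0.

```python
import math

# Hypothetical 3-position profile: profile[i][x] plays the role of e_i(x).
profile = [
    {"A": 1.0},
    {"C": 0.75, "T": 0.25},
    {"D": 0.75, "E": 0.25},
]

def score(x, profile):
    """Pr(x | M): the product over positions i of e_i(x_i)."""
    assert len(x) == len(profile), "string and profile must have equal length"
    # Assumption: a letter absent from column i has e_i = 0.
    return math.prod(e_i.get(x_i, 0.0) for e_i, x_i in zip(profile, x))

print(score("ACD", profile))  # 0.5625
print(score("ATE", profile))  # 0.0625
print(score("GCD", profile))  # 0.0
```

A single zero-probability position forces the whole score to zero, which is one reason raw products are rarely used without smoothing.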

Background Frequencies

We are interested in how different each motif position is from what we expect by chance. Correct for "expect by chance" by dividing by the probability of observing x in a random string:

$$\mathrm{ScoreCorrected}(x) = \frac{\Pr(x \mid M)}{\Pr(x \mid \text{background})} = \prod_{i=1}^{L} \frac{e_i(x_i)}{b(x_i)}$$

$b(x_i)$ := probability of observing character $x_i$ at random. Usually computed as (# of $x_i$ in entire string) / (length of string).

Often, to avoid multiplying lots of terms, we take the log and then sum:

$$\mathrm{ScoreCorrectedLog}(x) = \log \prod_{i=1}^{L} \frac{e_i(x_i)}{b(x_i)} = \sum_{i=1}^{L} \log \frac{e_i(x_i)}{b(x_i)}$$
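The corrected log score can be sketched as follows. The profile, the background string, and the function names are made up for illustration. The natural log is used because the slides do not fix a base, and a position with $e_i(x_i) = 0$ is assumed to give a score of negative infinity.

```python
import math
from collections import Counter

# Hypothetical 2-position profile: profile[i][x] plays the role of e_i(x).
profile = [
    {"A": 0.5, "C": 0.5},
    {"C": 0.75, "T": 0.25},
]

def background(text):
    """b(x) = (# of x in text) / (length of text)."""
    counts = Counter(text)
    return {x: n / len(text) for x, n in counts.items()}

def log_odds(x, profile, b):
    """Sum over positions i of log(e_i(x_i) / b(x_i))."""
    total = 0.0
    for e_i, x_i in zip(profile, x):
        p = e_i.get(x_i, 0.0)
        if p == 0.0:
            return float("-inf")  # assumption: impossible under the profile
        total += math.log(p / b[x_i])  # x_i must occur in the background text
    return total

b = background("AACCGGTT")      # every letter has b(x) = 0.25
s = log_odds("AC", profile, b)  # log(0.5/0.25) + log(0.75/0.25) = log 6
print(round(s, 4))  # 1.7918
```

A positive score means the string is more likely under the profile than under the background, and a negative score means the opposite.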
The PSSM doesn't handle either:
  - insertions of characters in the string that are not in the profile.
  - deletions of positions in the profile (that don't have a match in the string).

A solution: use an HMM to model the profile!

Example: AMASINDSKILSLQ-NKKNALVD
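To see why this matters, the sketch below slides a fixed-length profile along a string and keeps the best window score. The motif, the probabilities, and the small `floor` value for unlisted letters are all made up for illustration. It is not the HMM approach itself, only a demonstration of the limitation.

```python
import math

# Hypothetical profile that strongly prefers the motif "ACDE".
motif = "ACDE"
profile = [{c: 0.9} for c in motif]
floor = 0.005  # assumed small probability for any letter not listed in a column

def window_score(window):
    """Product of e_i(x_i) over one window of the same length as the profile."""
    return math.prod(e_i.get(x_i, floor) for e_i, x_i in zip(profile, window))

def best_window(text):
    """Best ungapped score over all windows of the text."""
    L = len(profile)
    return max(window_score(text[s:s + L]) for s in range(len(text) - L + 1))

exact = best_window("GGACDEGG")      # motif present unchanged
inserted = best_window("GGACWDEGG")  # one extra letter inside the motif
deleted = best_window("GGADEGG")     # one motif position missing

print(exact > 1000 * inserted, exact > 1000 * deleted)  # True True
```

With one inserted or one deleted character, no window lines up with more than two motif positions, so the best score collapses even though most of the motif is still present.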

This note was uploaded on 01/13/2012 for the course CMSC 423 taught by Professor Staff during the Fall '07 term at Maryland.
