423f11-profilehmm

# 423f11-profilehmm - Motif Search CMSC 423 Sequence Profiles...

Motif Search CMSC 423

Sequence Profiles

The CCT domain is often found near one end of plant proteins. Suppose we want to search for other examples of this domain. How can we represent the pattern implied by these sequences? One way is a sequence profile.

Sequence Profiles (PSSM)

[Figure: a profile matrix with amino acids (A, C, D, E, …, T, V, W, Y) as rows and motif positions 1 to 19 as columns. The color of each cell shows $e_i(x)$, the probability that the $i$th position has amino acid $x$; each column sums to 1.]
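
A minimal Python sketch of how the profile entries $e_i(x)$ might be estimated from a gap-free alignment of example sequences. It assumes each $e_i(x)$ is the fraction of aligned sequences with amino acid $x$ in column $i$, plus an optional pseudocount; the slides do not specify the estimator, and the names `build_profile`, `AMINO_ACIDS`, and `pseudocount` are hypothetical.

```python
from collections import Counter

# Hypothetical 20-letter amino-acid alphabet (an assumption of this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned, pseudocount=0.0):
    """Estimate e_i(x) for every column i of equal-length, gap-free sequences.

    Assumed estimator: (count of x in column i + pseudocount) divided by
    (number of sequences + pseudocount * alphabet size).
    Returns a list of dicts with profile[i][x] = e_i(x); each column sums to 1
    as long as every letter comes from AMINO_ACIDS.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same length")
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned)  # letters seen in column i
        profile.append({a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS})
    return profile


prof = build_profile(["ACDA", "ACDC", "ACEA", "GCDA"])
print(prof[0]["A"])  # 0.75: three of the four sequences have A in the first column
```

With `pseudocount=0`, an amino acid never seen in a column gets probability 0, so any string containing it there scores 0 under a product score; a small pseudocount is one common way to avoid that.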

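Sequence Logos

[Figure: a sequence logo drawn over the motif positions.]

The height of each letter is proportional to the fraction of the time that letter is observed at that position. The total height of all the letters in a column is proportional to how conserved the column is.
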
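
A sketch of per-letter logo heights, assuming the usual information-content convention: a column's total height is $\log_2 20 - H_i$, where $H_i = -\sum_x e_i(x)\log_2 e_i(x)$, and each letter gets a share proportional to its frequency. The slides only say the heights are proportional; this particular conservation measure and the function name `logo_heights` are assumptions.

```python
import math


def logo_heights(column, alphabet_size=20):
    """Letter heights (in bits) for one column of a sequence logo.

    column: dict mapping letter -> observed fraction e_i(x), summing to 1.
    Assumed convention: column height = log2(alphabet_size) - entropy,
    split among letters in proportion to their frequency.
    No small-sample correction is applied.
    """
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    info = math.log2(alphabet_size) - entropy  # conservation of the column
    return {a: p * info for a, p in column.items() if p > 0}


print(logo_heights({"W": 1.0}))  # W gets the full log2(20), about 4.32 bits
```

A fully conserved column reaches the maximum height, while a column where all 20 amino acids are equally frequent has height 0.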
Scoring a Sequence

[Figure: a profile $M$ aligned against a window $x$ of the sequence MRGSAMASINDSKILSLQNKKNALVDTSGYNAEVRVGDNVQLNTIYTNDFKLSSSGDKIIVN; cell color shows $e_i(x)$, the probability that the $i$th position has the given amino acid.]

$$\mathrm{Score}(x) = \Pr(x \mid M) = \prod_{i=1}^{L} e_i(x_i)$$

The score of a string $x$ of length $L$ according to profile $M$ is the product of the probabilities that you would observe the given letters.
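
A direct transcription of $\mathrm{Score}(x) = \prod_{i=1}^{L} e_i(x_i)$ in Python. It assumes the profile is stored as a list of per-position dicts (a hypothetical representation), and treats a letter missing from a column as probability 0.

```python
import math


def score(x, profile):
    """Score(x) = Pr(x | M) = product over i of e_i(x_i).

    profile: list of dicts with profile[i][a] = e_i(a) (assumed layout).
    The string must have exactly as many characters as the profile has positions.
    """
    if len(x) != len(profile):
        raise ValueError("string length must equal profile length")
    return math.prod(col.get(ch, 0.0) for ch, col in zip(x, profile))


profile = [{"A": 0.5, "C": 0.5}, {"C": 0.9, "G": 0.1}]
print(score("AC", profile))  # 0.45
print(score("GC", profile))  # 0.0, since G never appears in the first column
```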

Background Frequencies

We are interested in how different this motif position is from what we expect by chance. To correct for “expected by chance,” divide by the probability of observing $x$ in a random string:

$$\mathrm{ScoreCorrected}(x) = \frac{\Pr(x \mid M)}{\Pr(x \mid \text{background})} = \prod_{i=1}^{L} \frac{e_i(x_i)}{b(x_i)}$$

where $b(x_i)$ is the probability of observing character $x_i$ at random, usually computed as (number of occurrences of $x_i$ in the entire string) / (length of the string).

Often, to avoid multiplying lots of terms, we take the log and then sum:

$$\mathrm{ScoreCorrectedLog}(x) = \log \prod_{i=1}^{L} \frac{e_i(x_i)}{b(x_i)} = \sum_{i=1}^{L} \log \frac{e_i(x_i)}{b(x_i)}$$

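
A sketch of the log-odds score, with $b$ computed from the string being searched as described above. Applying it to every length-$L$ window of a longer sequence and keeping the best window is an assumption about how the score would be used; the names `background`, `log_odds`, and `best_window` are hypothetical.

```python
import math
from collections import Counter


def background(seq):
    """b(a) = (number of occurrences of a in seq) / len(seq)."""
    counts = Counter(seq)
    return {a: c / len(seq) for a, c in counts.items()}


def log_odds(x, profile, b):
    """ScoreCorrectedLog(x) = sum over i of log(e_i(x_i) / b(x_i)).

    Returns -inf if some e_i(x_i) is 0 (x is impossible under the profile).
    """
    total = 0.0
    for ch, col in zip(x, profile):
        e = col.get(ch, 0.0)
        if e == 0.0:
            return -math.inf
        total += math.log(e / b[ch])  # summing logs instead of multiplying ratios
    return total


def best_window(seq, profile):
    """Assumed usage: score every length-L window of seq, return (start, score) of the best."""
    L = len(profile)
    if len(seq) < L:
        raise ValueError("sequence is shorter than the profile")
    b = background(seq)  # background computed from the whole string
    scores = [(i, log_odds(seq[i:i + L], profile, b)) for i in range(len(seq) - L + 1)]
    return max(scores, key=lambda t: t[1])


profile = [{"A": 1.0}, {"C": 1.0}]
print(best_window("GGGACGGG", profile))  # start 3, score 2*ln(8), about 4.159
```

The log base only rescales every score by the same constant, so it does not change which window ranks highest; the natural log is used here.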
The PSSM handles neither insertions of characters in the string that are not in the profile, nor deletions of positions in the profile (positions that have no match in the string).

A solution: use an HMM to model the profile!

Example: AMASINDSKILSLQ-NKKNALVD, where “-” marks a profile position with no matching character in the string.
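
A small illustration of the limitation, using a hypothetical 4-position profile: the ungapped product score assumes character $i$ of the string lines up with profile position $i$, so a single inserted residue pushes every later character out of register. The profile values and the name `pssm_score` are made up for this sketch.

```python
import math


def pssm_score(x, profile):
    """Ungapped PSSM score: product of e_i(x_i); x must have exactly L characters."""
    if len(x) != len(profile):
        raise ValueError("string length must equal profile length")
    return math.prod(col.get(ch, 0.0) for ch, col in zip(x, profile))


# Hypothetical 4-position profile that strongly prefers "ACDE".
profile = [
    {"A": 0.9, "G": 0.1},
    {"C": 0.9, "S": 0.1},
    {"D": 0.9, "N": 0.1},
    {"E": 0.9, "Q": 0.1},
]

print(pssm_score("ACDE", profile))  # about 0.6561 (0.9 ** 4)

# One inserted residue (K) shifts every later character by one position,
# so no length-4 window of the string lines up with the profile.
inserted = "ACKDE"
best = max(pssm_score(inserted[i:i + 4], profile) for i in range(len(inserted) - 3))
print(best)  # 0.0
```

A deletion fails the same way: a string such as "ACE", missing the D position, is shorter than the profile, so no position-by-position alignment exists. Handling both cases is what the slides propose the profile HMM for.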

## This note was uploaded on 01/13/2012 for the course CMSC 423 taught by Professor Staff during the Fall '07 term at Maryland.
