HMM-Lec8-091604

Hidden Markov model BioE 480 Sept 16, 2004
In general, we have Bayes' theorem: $P(X \mid Y) = P(Y \mid X)\,P(X) / P(Y)$
Event X: the die is loaded. Event Y: 3 sixes.

Example: Assume we know that, on average, extracellular proteins have a slightly different amino acid composition than intracellular ones, e.g. more cysteines. How do we use this information to predict whether a new protein sequence $x = x_1 x_2 \ldots x_n$ is intracellular or extracellular?

We first split the training examples from Swiss-Prot into intracellular and extracellular proteins, leaving aside those that are unclassifiable. We then estimate a set of frequencies $q_a^{\text{int}}$ for intracellular proteins and a set of frequencies $q_a^{\text{ext}}$ for extracellular proteins. We also estimate the probability that any new sequence is extracellular, $p^{\text{ext}}$, and the probability that it is intracellular, $p^{\text{int}}$. These are called prior probabilities, because they are best guesses about a sequence before we actually see the sequence itself.
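The estimation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the training sequences are made up (not Swiss-Prot data), the function name `residue_frequencies` is hypothetical, the pseudocount is an addition to avoid zero frequencies, and estimating the priors as the fraction of training sequences in each class is one possible choice that the lecture does not specify.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def residue_frequencies(sequences, pseudocount=1.0):
    """Estimate q_a for each amino acid a from a list of sequences.

    The pseudocount is an assumption added here so that no residue
    ends up with frequency zero; it is not part of the lecture.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(seq)  # counts each character of the sequence
    total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


# Toy, made-up training sets standing in for the two Swiss-Prot classes.
ext_train = ["CCKACD", "MCCGCA"]
int_train = ["MKLAEE", "AKDELL", "MEEKLA"]

q_ext = residue_frequencies(ext_train)
q_int = residue_frequencies(int_train)

# Assumed prior estimate: fraction of training sequences in each class.
n_total = len(ext_train) + len(int_train)
p_ext = len(ext_train) / n_total
p_int = len(int_train) / n_total
```

With these toy sets, cysteine (`"C"`) gets a higher frequency in `q_ext` than in `q_int`, which mirrors the "more cysteines" observation in the example.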
We now have:

$$P(x \mid \text{ext}) = \prod_i q_{x_i}^{\text{ext}}, \qquad P(x \mid \text{int}) = \prod_i q_{x_i}^{\text{int}}$$

Because we assume that every sequence must be either extracellular or intracellular, we have:

$$P(x) = p^{\text{ext}}\,P(x \mid \text{ext}) + p^{\text{int}}\,P(x \mid \text{int})$$

By Bayes' theorem,

$$P(\text{ext} \mid x) = \frac{p^{\text{ext}} \prod_i q_{x_i}^{\text{ext}}}{p^{\text{ext}} \prod_i q_{x_i}^{\text{ext}} + p^{\text{int}} \prod_i q_{x_i}^{\text{int}}}$$

This is the number we want: the posterior probability that a sequence is extracellular. It is our best guess after we have seen the data.

More complicated: transmembrane proteins have both intracellular and extracellular components.
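As a rough illustration of the posterior calculation, the sketch below evaluates $P(\text{ext} \mid x)$ for a short sequence. The two-letter alphabet, the frequency values, the equal priors, and the function names are all made-up assumptions for the example. The products are computed as sums of logarithms, a common implementation choice (not something the lecture prescribes) that avoids numerical underflow on long sequences.

```python
import math


def log_likelihood(seq, q):
    """log P(x | class) = sum over positions i of log q[x_i]."""
    return sum(math.log(q[a]) for a in seq)


def posterior_ext(seq, q_ext, q_int, p_ext, p_int):
    """Posterior P(ext | x) by Bayes' theorem, computed in log space."""
    log_ext = math.log(p_ext) + log_likelihood(seq, q_ext)
    log_int = math.log(p_int) + log_likelihood(seq, q_int)
    # Subtract the larger term before exponentiating to avoid underflow.
    m = max(log_ext, log_int)
    w_ext = math.exp(log_ext - m)
    w_int = math.exp(log_int - m)
    return w_ext / (w_ext + w_int)


# Hypothetical frequencies over a toy two-letter alphabet.
q_ext = {"C": 0.6, "A": 0.4}
q_int = {"C": 0.2, "A": 0.8}

post = posterior_ext("CCA", q_ext, q_int, p_ext=0.5, p_int=0.5)
print(round(post, 4))
```

By hand, the numerator is 0.5 × 0.6 × 0.6 × 0.4 = 0.072 and the intracellular term is 0.5 × 0.2 × 0.2 × 0.8 = 0.016, so the posterior is 0.072 / 0.088 = 9/11, about 0.82.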
Random Model R: Consider two sequences x and y, of lengths n and m. Let $x_i$ be the i-th symbol in x and $y_j$ the j-th symbol in y. Assume that each letter a occurs independently with some frequency $q_a$. The probability of the two sequences x and y is then just the product of the probabilities of each amino acid:

$$P(x, y \mid R) = \prod_i q_{x_i} \prod_j q_{y_j}$$

An alternative model, the Match Model M: Aligned pairs of residues occur with a joint probability $p_{ab}$. Its value can be thought of as the probability that the residues a and b have each independently been derived from some unknown original residue c in their common ancestor. c might be the same as a and/or b. The probability of the whole alignment is:

$$P(x, y \mid M) = \prod_i p_{x_i y_i}$$

The ratio of these two likelihoods is the odds ratio. Here each $x_i$ is aligned with $y_i$, so all products run over the same positions:

$$\frac{P(x, y \mid M)}{P(x, y \mid R)} = \frac{\prod_i p_{x_i y_i}}{\prod_i q_{x_i} \prod_i q_{y_i}} = \prod_i \frac{p_{x_i y_i}}{q_{x_i} q_{y_i}}$$

To make this additive, we take the logarithm of this ratio, the log-odds ratio. S =
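Since the logarithm of a product is the sum of the logarithms, the log of the odds ratio above becomes a sum of per-position terms $\log\big(p_{x_i y_i} / (q_{x_i} q_{y_i})\big)$. The sketch below computes that sum for an ungapped alignment. The two-letter alphabet, the values in `p` and `q`, the function name, and the use of the natural logarithm are all assumptions made for illustration.

```python
import math


def log_odds_score(x, y, p, q):
    """Log of P(x,y|M) / P(x,y|R) for an ungapped alignment of x and y."""
    if len(x) != len(y):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    return sum(math.log(p[(a, b)] / (q[a] * q[b])) for a, b in zip(x, y))


# Hypothetical background frequencies q_a and joint probabilities p_ab.
q = {"A": 0.5, "B": 0.5}
p = {("A", "A"): 0.4, ("B", "B"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1}

score = log_odds_score("AAB", "ABB", p, q)
print(score)
```

A positive score means the match model explains the aligned pair better than the random model, and a negative score means the opposite. In this toy case the odds ratio is 1.6 × 0.4 × 1.6 = 1.024, so the score is slightly above zero.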