Hidden Markov model
BioE 480
Sept 16, 2004
• In general, we have Bayes' theorem:
$P(X \mid Y) = P(Y \mid X)\,P(X) / P(Y)$
• Event X: the die is loaded. Event Y: 3 sixes.
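As a worked sketch of the die example: the slide gives no numbers, so the values below (1% of dice are loaded, and a loaded die shows a six half the time) are assumptions chosen purely for illustration.

```python
from fractions import Fraction

# Assumed values for illustration only (not given in the slide).
P_LOADED = Fraction(1, 100)     # prior P(X): the die is loaded
P_SIX_LOADED = Fraction(1, 2)   # P(six | loaded die)
P_SIX_FAIR = Fraction(1, 6)     # P(six | fair die)


def posterior_loaded(n_sixes: int) -> Fraction:
    """P(loaded | n_sixes sixes in a row), by Bayes' theorem."""
    like_loaded = P_SIX_LOADED ** n_sixes   # P(Y | X)
    like_fair = P_SIX_FAIR ** n_sixes       # P(Y | not X)
    # P(Y) by total probability over the two kinds of die.
    evidence = like_loaded * P_LOADED + like_fair * (1 - P_LOADED)
    return like_loaded * P_LOADED / evidence


print(posterior_loaded(3))  # 3/14
```

Under these assumed numbers, three sixes in a row still leave the die more likely fair than loaded (3/14 is about 0.21), because the prior for a loaded die is so small.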
• Example: Assume we know that, on average, extracellular proteins have a slightly different amino acid composition than intracellular ones (e.g., more cysteines). How do we use this information to predict whether a new protein sequence $x = x_1 x_2 \ldots x_n$ is intracellular or extracellular?
– We first split the training examples from SwissProt into intracellular and extracellular proteins, leaving aside those that are unclassifiable.
– We then estimate a set of frequencies $q_a^{\text{int}}$ for intracellular proteins and a set of extracellular frequencies $q_a^{\text{ext}}$.
– We also estimate the probability that any new sequence is extracellular, $p^{\text{ext}}$, and intracellular, $p^{\text{int}}$. These are called prior probabilities, because they are our best guesses about a sequence before we actually see the sequence itself.
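The estimation steps above can be sketched as follows. The slide does not say how the frequencies or priors are computed, so this sketch assumes simple counting: residue frequencies are normalized counts (with a pseudocount, an added assumption, so that no frequency is zero), and the priors are the class fractions in the training set. The function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_frequencies(sequences, pseudocount=1.0):
    """Estimate q_a for one class from its training sequences.

    The pseudocount is an assumption of this sketch, not part of the slide.
    """
    counts = Counter()
    for seq in sequences:
        # Ignore any character that is not one of the 20 standard residues.
        counts.update(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def class_priors(n_ext, n_int):
    """Assumed estimate of (p_ext, p_int): class fractions in the training set."""
    total = n_ext + n_int
    return n_ext / total, n_int / total


# Toy training data, made up for illustration.
ext_train = ["CCAK", "CGCA"]
int_train = ["AKLE", "GALK", "KKEA"]

q_ext = residue_frequencies(ext_train)
q_int = residue_frequencies(int_train)
p_ext, p_int = class_priors(len(ext_train), len(int_train))
```

With this toy data the cysteine frequency comes out higher in the extracellular table than in the intracellular one, which is the kind of compositional difference the example relies on.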
• We now have:
$P(x \mid \text{ext}) = \prod_i q^{\text{ext}}_{x_i}, \qquad P(x \mid \text{int}) = \prod_i q^{\text{int}}_{x_i}$
• Because we assume that every sequence must be either extracellular or intracellular, we have:
$P(x) = p^{\text{ext}} P(x \mid \text{ext}) + p^{\text{int}} P(x \mid \text{int})$
• By Bayes' theorem,
$P(\text{ext} \mid x) = \dfrac{p^{\text{ext}} \prod_i q^{\text{ext}}_{x_i}}{p^{\text{ext}} \prod_i q^{\text{ext}}_{x_i} + p^{\text{int}} \prod_i q^{\text{int}}_{x_i}}$
• This is the number we want: the posterior probability that a sequence is extracellular.
– It is our best guess after we have seen the data.
• More complicated: transmembrane proteins have both intracellular and extracellular components.
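The posterior probability that a sequence is extracellular, as derived above, can be sketched in Python. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the two-letter frequency tables are toy values, and working in log space is a design choice of this sketch (products of many small frequencies underflow for long sequences).

```python
import math


def log_likelihood(seq, q):
    """log P(x | class) = sum over positions of log q_{x_i}."""
    return sum(math.log(q[a]) for a in seq)


def posterior_ext(seq, q_ext, q_int, p_ext, p_int):
    """P(ext | x) by Bayes' theorem, computed in log space."""
    log_ext = math.log(p_ext) + log_likelihood(seq, q_ext)
    log_int = math.log(p_int) + log_likelihood(seq, q_int)
    # Subtract the larger term before exponentiating to avoid underflow.
    m = max(log_ext, log_int)
    w_ext = math.exp(log_ext - m)
    w_int = math.exp(log_int - m)
    return w_ext / (w_ext + w_int)


# Toy two-letter alphabet and made-up frequencies, for illustration only.
TOY_Q_EXT = {"C": 0.5, "A": 0.5}
TOY_Q_INT = {"C": 0.25, "A": 0.75}

print(posterior_ext("CC", TOY_Q_EXT, TOY_Q_INT, p_ext=0.5, p_int=0.5))
```

For the sequence "CC" with equal priors, the two numerator terms are 0.5 × 0.25 and 0.5 × 0.0625, so the posterior is 0.8 up to floating-point rounding.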
• Random Model $R$: For two sequences $x$ and $y$, of lengths $n$ and $m$, let $x_i$ be the $i$th symbol in $x$ and $y_i$ the $i$th symbol in $y$. Assume that letter $a$ occurs independently with some frequency $q_a$.
– The probability of the two sequences $x$ and $y$ is just the product of the probabilities of each amino acid:
$P(x, y \mid R) = \prod_i q_{x_i} \prod_i q_{y_i}$
• An alternative model, the Match Model $M$: Aligned pairs of residues occur with a joint probability $p_{ab}$. Its value can be thought of as the probability that the residues $a$ and $b$ have each independently been derived from some unknown original residue $c$ in their common ancestor.
– $c$ might be the same as $a$ and/or $b$.
– The probability of the whole alignment is:
$P(x, y \mid M) = \prod_i p_{x_i y_i}$
• The ratio of these two likelihoods is the odds ratio:
$\dfrac{P(x, y \mid M)}{P(x, y \mid R)} = \dfrac{\prod_i p_{x_i y_i}}{\prod_i q_{x_i} \prod_i q_{y_i}} = \prod_i \dfrac{p_{x_i y_i}}{q_{x_i} q_{y_i}}$
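The two likelihoods and their ratio can be sketched for an ungapped alignment of two equal-length sequences. The function names, the two-letter alphabet, and the probability tables are all made up for illustration; real values of $p_{ab}$ and $q_a$ would be estimated from data.

```python
import math


def random_likelihood(x, y, q):
    """P(x, y | R): product of the independent residue frequencies."""
    return math.prod(q[a] for a in x) * math.prod(q[b] for b in y)


def match_likelihood(x, y, p):
    """P(x, y | M): product of joint probabilities of the aligned pairs."""
    if len(x) != len(y):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    return math.prod(p[(a, b)] for a, b in zip(x, y))


def odds_ratio(x, y, p, q):
    """P(x, y | M) / P(x, y | R), as a product of per-position ratios."""
    if len(x) != len(y):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    return math.prod(p[(a, b)] / (q[a] * q[b]) for a, b in zip(x, y))


# Toy tables (assumed): q sums to 1, and p sums to 1 over all ordered pairs.
TOY_Q = {"A": 0.5, "C": 0.5}
TOY_P = {("A", "A"): 0.4, ("C", "C"): 0.4, ("A", "C"): 0.1, ("C", "A"): 0.1}

print(odds_ratio("AC", "AC", TOY_P, TOY_Q))  # identical residues: ratio > 1
print(odds_ratio("AC", "CA", TOY_P, TOY_Q))  # mismatches: ratio < 1
```

A ratio above 1 means the aligned pairs are better explained by the match model than by chance; a ratio below 1 favors the random model.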
• To make this additive, we take the logarithm of this ratio, the log-odds ratio.
S =
This note was uploaded on 02/13/2012 for the course CS 91.510 taught by Professor Staff during the Fall '09 term at UMass Lowell.