This preview shows page 1. Sign up to view the full content.
Unformatted text preview: Computational Biology, Part 9
BaumWelch Algorithm and
HMMER
Robert F. Murphy
Copyright © 20052006.
Copyright
All rights reserved. Parameter estimation for HMMs
s Simple when state sequence is known for
Simple
training examples
training
s Can be very complex for unknown paths
Can Estimation when state sequence
known
s Count number of times each transition
Count
occurs, Akl s Count number of times each emission
Count
occurs from each state, Ek(b) s Convert to probabilities E k (b)
Akl
ek (b) =
akl =
å E k (b' )
å Akl '
l' b' BaumWelch
s Make initial parameter estimates
s Use forward algorithm and backward
Use
algorithm to calculate probability of each
sequence according to the model
sequence
s Calculate new model parameters
s Repeat until termination criteria met
Repeat
(change in log likelihood < threshold)
(change Estimating transition frequencies
Probability that akl is used as position i in
Probability
sequence x
f k (i) akl el ( x i +1)bl (i + 1)
P (p i = k, p i +1 = l  x,q ) =
P( x) s s Sum over all positions (i) and all sequences
Sum
(j) to get expected number of times akl is used
1
j
j
j
Akl = å
f k (i) akl el ( x i +1 )bl (i + 1)
jå
j P( x ) i Estimating emission frequencies
s Sum over all positions for which the emitted
Sum
character is b and all sequences E k (b) = å
j 1
j
j
åj f k (i)bk (i)
j
P ( x ) i x = b
i Updating model parameters
s Convert expected numbers to probabilities
Convert
as if expected numbers were actual counts
as Akl
E k (b)
akl =
ek (b) =
å Akl '
å E k (b' )
l' b' Test for termination
s Calculate the log likelihood of the model for all of
Calculate
the sequences using the new parameters
the n å j log P ( x  q ) j =1
s If the change in log likelihood exceeds some
If
threshold, go back and make new estimates of a
and e Putting multiple sequence
alignment and HMMs together
s Given set of (unaligned) sequences
s Can use CLUSTALW to create multiple
Can
sequence alignment
sequence
x http://www.ebi.ac.uk/clustalw/ s Then use HMMER to create HMM to
Then
represent conserved features
represent Building ClustalW input file
s Get collection of sequence files in FASTA
Get
format
format
s Convert from Mac to Unix format if
Convert
necessary
necessary
s Ensure each has a unique name (may
Ensure
already be true, depends on source of file)
already
s Concatenate to create input file Convert.pl
#!/usr/bin/perl
# for all .fa files in the current directory do two things:
#
convert from Mac to Unix style using "tr"
#
convert " NAME = " to " " using "sed"
# then concate them into a single file called "input.ux"
opendir(T, ".");
while($name = readdir(T))
{
if (d $name) { next; }
if
$newname = $name;
$newname
if ($newname =~ s/\.fa/\.1fa/) {
if
print "$name\n";
print
system "/usr/bin/tr '\015' '\n' < $name > $newname";
system
$newname2 = $name;
$newname2
($newname2 =~ s/\.fa/\.2fa/);
($newname2
system "/usr/bin/sed 's/ NAME = / /' $newname > $newname2";
system
}
}
system "cat *.2fa > input.ux";
system "rm *.1fa"; system "rm *.2fa"; ClustalW
s Use either ClustalW from command line or
Use
ClustalX via graphical interface to align set
of sequences and output as .aln file
of
s The file can be used as input to build an
The
HMM
HMM HMMER
s Free HMM builder and searcher from Sean
Free
Eddy at Washington University
Eddy
s http://hmmer.wustl.edu ...
View
Full
Document
This note was uploaded on 01/13/2012 for the course BIO 101 taught by Professor Staff during the Fall '10 term at DePaul.
 Fall '10
 Staff
 Biology, Computational Biology

Click to edit the document details