LecturesPart09 - Computational Biology, Part 9 Baum-Welch...

Info iconThis preview shows page 1. Sign up to view the full content.

View Full Document Right Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: Computational Biology, Part 9 Baum-Welch Algorithm and HMMER Robert F. Murphy Copyright © 2005-2006. Copyright All rights reserved. Parameter estimation for HMMs s Simple when state sequence is known for Simple training examples training s Can be very complex for unknown paths Can Estimation when state sequence known s Count number of times each transition Count occurs, Akl s Count number of times each emission Count occurs from each state, Ek(b) s Convert to probabilities E k (b) Akl ek (b) = akl = å E k (b' ) å Akl ' l' b' Baum-Welch s Make initial parameter estimates s Use forward algorithm and backward Use algorithm to calculate probability of each sequence according to the model sequence s Calculate new model parameters s Repeat until termination criteria met Repeat (change in log likelihood < threshold) (change Estimating transition frequencies Probability that akl is used as position i in Probability sequence x f k (i) akl el ( x i +1)bl (i + 1) P (p i = k, p i +1 = l | x,q ) = P( x) s s Sum over all positions (i) and all sequences Sum (j) to get expected number of times akl is used 1 j j j Akl = å f k (i) akl el ( x i +1 )bl (i + 1) jå j P( x ) i Estimating emission frequencies s Sum over all positions for which the emitted Sum character is b and all sequences E k (b) = å j 1 j j åj f k (i)bk (i) j P ( x ) i| x = b i Updating model parameters s Convert expected numbers to probabilities Convert as if expected numbers were actual counts as Akl E k (b) akl = ek (b) = å Akl ' å E k (b' ) l' b' Test for termination s Calculate the log likelihood of the model for all of Calculate the sequences using the new parameters the n å j log P ( x | q ) j =1 s If the change in log likelihood exceeds some If threshold, go back and make new estimates of a and e Putting multiple sequence alignment and HMMs together s Given set of (unaligned) sequences s Can use CLUSTALW to create multiple Can sequence alignment sequence x http://www.ebi.ac.uk/clustalw/ s Then use HMMER to create HMM to Then represent conserved features represent Building ClustalW input file s Get collection of sequence files in FASTA Get format format s Convert from Mac to Unix format if Convert necessary necessary s Ensure each has a unique name (may Ensure already be true, depends on source of file) already s Concatenate to create input file Convert.pl #!/usr/bin/perl # for all .fa files in the current directory do two things: # convert from Mac to Unix style using "tr" # convert " NAME = " to " " using "sed" # then concate them into a single file called "input.ux" opendir(T, "."); while($name = readdir(T)) { if (-d $name) { next; } if $newname = $name; $newname if ($newname =~ s/\.fa/\.1fa/) { if print "$name\n"; print system "/usr/bin/tr '\015' '\n' < $name > $newname"; system $newname2 = $name; $newname2 ($newname2 =~ s/\.fa/\.2fa/); ($newname2 system "/usr/bin/sed 's/ NAME = / /' $newname > $newname2"; system } } system "cat *.2fa > input.ux"; system "rm *.1fa"; system "rm *.2fa"; ClustalW s Use either ClustalW from command line or Use ClustalX via graphical interface to align set of sequences and output as .aln file of s The file can be used as input to build an The HMM HMM HMMER s Free HMM builder and searcher from Sean Free Eddy at Washington University Eddy s http://hmmer.wustl.edu ...
View Full Document

This note was uploaded on 01/13/2012 for the course BIO 101 taught by Professor Staff during the Fall '10 term at DePaul.

Ask a homework question - tutors are online