HMMBiologicalSeq_Baldi94PNAS

HMMBiologicalSeq_Baldi94PNAS - Proc. Natl. Acad. Sci. USA


Proc. Natl. Acad. Sci. USA
Vol. 91, pp. 1059-1063, February 1994
Biochemistry

Hidden Markov models of biological primary sequence information
(multiple sequence alignments/protein modeling/adaptive algorithms/sequence classification)

PIERRE BALDI*†, YVES CHAUVIN‡§, TIM HUNKAPILLER*¶, AND MARCELLA A. MCCLURE‖**

*Division of Biology, California Institute of Technology, Pasadena, CA 91125; ‡NetID, Inc., San Francisco, CA 94107; ¶Department of Molecular Biotechnology, University of Washington, Seattle, WA 98195; ‖Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92717; †Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109; and §Department of Psychology, Stanford University, Stanford, CA 94025

Communicated by Leroy Hood, October 12, 1993 (received for review January 14, 1993)

ABSTRACT Hidden Markov model (HMM) techniques are used to model families of biological sequences. A smooth and convergent algorithm is introduced to iteratively adapt the transition and emission parameters of the models from the examples in a given family. The HMM approach is applied to three protein families: globins, immunoglobulins, and kinases. In all cases, the models derived capture the important statistical characteristics of the family and can be used for a number of tasks, including multiple alignments, motif detection, and classification. For K sequences of average length N, this approach yields an effective multiple-alignment algorithm which requires O(KN^2) operations, linear in the number of sequences.

Comparative analysis of primary sequence information is a major tool in the elucidation of the molecular mechanisms of replication and evolution of organisms and the structure and function of proteins. For the simple case of pairwise sequence comparison, good algorithms exist (see refs. 1 and 2 for recent reviews) that can align two sequences of length N in roughly O(N^2) steps. Most of these algorithms are based on dynamic programming (3), with location-independent substitution and gap penalties.
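As a rough illustration of the pairwise dynamic programming described above, the following Python sketch computes a global alignment score with location-independent substitution and gap penalties. The function name and the match, mismatch, and gap values are assumptions chosen for illustration; they are not taken from the paper or from refs. 1-3.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the best global alignment of strings a and b.

    Hypothetical scoring scheme: the same match/mismatch/gap values
    apply at every position (location-independent penalties).
    """
    n, m = len(a), len(b)
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        # Aligning a[:i] with the empty prefix of b costs i gaps.
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # a[i-1] against a gap
                curr[j - 1] + gap,  # b[j-1] against a gap
            )
        prev = curr
    return prev[m]
```

The two nested loops visit each of the (n + 1)(m + 1) table cells once, which is where the roughly O(N^2) step count for two sequences of length N comes from. Under the assumed scores, global_alignment_score("ACGT", "AGT") evaluates to 1 (three matches and one gap).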
Unfortunately, when dynamic programming is applied to a family of K sequences its behavior scales like O(N^K), exponentially in the number of sequences (4).

A number of algorithms have been devised to try to tackle the multiple alignment problem (see refs. 5-7 for some of the most recent ones). Most protein sequence relationships exhibiting >50% identical residues can be aligned by several of these algorithms. Many of the most interesting protein families, however, exhibit conservation far below 50% identity. To date, alignment methods have not been developed that can correctly identify all the motifs that define each protein family (2).

Here, we apply a different approach, based on hidden Markov models (HMMs), to the problem of modeling and aligning a family by using primary structure information only. Initial results were presented (8). Markov models and the related expectation-maximization (EM) (9) algorithm in statistics have already been applied to biocomputational problems (10-13). Krogh et al. (14) were the first to demonstrate the power of a similar method on the globin family. Rather than starting from pairwise alignments, the approach seeks to take advantage of the massive amount of information typi-...

This note was uploaded on 04/06/2010 for the course COMPUTER S COSC1520 taught by Professor Paul during the Spring '09 term at York University.
