9_ref - Proc.Natl.Acad. Sci. USA Vol.89,pp. 10915-10919,...


Proc. Natl. Acad. Sci. USA Vol. 89, pp. 10915-10919, November 1992 Biochemistry

Amino acid substitution matrices from protein blocks
(amino acid sequence/alignment algorithms/data base searching)

STEVEN HENIKOFF* AND JORJA G. HENIKOFF
Howard Hughes Medical Institute, Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98104

Communicated by Walter Gilbert, August 28, 1992 (received for review July 13, 1992)

ABSTRACT Methods for alignment of protein sequences typically measure similarity by using a substitution matrix with scores for all possible exchanges of one amino acid with another. The most widely used matrices are based on the Dayhoff model of evolutionary rates. Using a different approach, we have derived substitution matrices from about 2000 blocks of aligned sequence segments characterizing more than 500 groups of related proteins. This led to marked improvements in alignments and in searches using queries from each of the groups.

Among the most useful computer-based tools in modern biology are those that involve sequence alignments of proteins, since these alignments often provide important insights into gene and protein function. There are several different types of alignments: global alignments of pairs of proteins related by common ancestry throughout their lengths, local alignments involving related segments of proteins, multiple alignments of members of protein families, and alignments made during data base searches to detect homology. In each case, competing alignments are evaluated by using a scoring scheme for estimating similarity. Although several different scoring schemes have been proposed (1-6), the mutation data matrices of Dayhoff (1, 7-9) are generally considered the standard and are often the default in alignment and searching programs. In the Dayhoff model, substitution rates are derived from alignments of protein sequences that are at least 85% identical. However, the most common task involving substitution matrices is the detection of much more distant relationships, which are only inferred from substitution rates in the Dayhoff model.
Therefore, we wondered whether a better approach might be to use alignments in which these relationships are explicitly represented. An incentive for investigating this possibility is that implementation of an improved matrix in numerous important applications requires only trivial effort.

METHODS

Deriving a Frequency Table from a Data Base of Blocks. Local alignments can be represented as ungapped blocks with each row a different protein segment and each column an aligned residue position. Previously, we described an automated system, PROTOMAT, for obtaining a set of blocks given a group of related proteins (10). This system was applied to a catalog of several hundred protein groups, yielding a data base of >2000 blocks. Consider a single block representing a conserved region of a protein family. For a new member of this family, we seek a set of scores for matches and mismatches that best favors a correct alignment with each of the other segments in the block relative to an incorrect alignment. For each column of the block, we first count the number of matches and mismatches of each type between the new sequence and every other sequence in the block. For example, if the residue of the new sequence that aligns with...
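The column-by-column counting step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `count_pairs`, the choice to treat a pair such as (A, S) as unordered, and the three-residue toy block are all assumptions made only for illustration, and the sketch stops at the counting step because the preview text is cut off before the rest of the procedure.

```python
from collections import Counter


def count_pairs(new_segment, block):
    """Count matches and mismatches of each type between a new segment
    and every row of an ungapped block, one column at a time.

    Assumption (not stated in the text): pairs are unordered, so
    ('A', 'S') and ('S', 'A') are tallied under the same key.
    """
    counts = Counter()
    for row in block:
        if len(row) != len(new_segment):
            raise ValueError("all segments in a block must have the same width")
        # zip walks the aligned columns of the two segments together
        for new_residue, block_residue in zip(new_segment, row):
            pair = tuple(sorted((new_residue, block_residue)))
            counts[pair] += 1
    return counts


# Hypothetical toy block: each string is one protein segment (row),
# each character position is one aligned column.
toy_block = ["AVL", "AIL", "SVL"]
counts = count_pairs("AVM", toy_block)

for pair, n in sorted(counts.items()):
    kind = "match" if pair[0] == pair[1] else "mismatch"
    print(pair, n, kind)
```

Keying the counter on sorted residue pairs keeps the table symmetric, which is a common convention for substitution matrices; an ordered tally would need only the `sorted` call removed.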

This note was uploaded on 07/29/2010 for the course BIOC BIOC1805 taught by Professor Dr.brianwong during the Summer '09 term at HKU.
