

Proc. Natl. Acad. Sci. USA
Vol. 89, pp. 10915-10919, November 1992
Biochemistry

Amino acid substitution matrices from protein blocks
(amino acid sequence/alignment algorithms/data base searching)

STEVEN HENIKOFF* AND JORJA G. HENIKOFF
Howard Hughes Medical Institute, Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98104

Communicated by Walter Gilbert, August 28, 1992 (received for review July 13, 1992)

ABSTRACT Methods for alignment of protein sequences typically measure similarity by using a substitution matrix with scores for all possible exchanges of one amino acid with another. The most widely used matrices are based on the Dayhoff model of evolutionary rates. Using a different approach, we have derived substitution matrices from about 2000 blocks of aligned sequence segments characterizing more than 500 groups of related proteins. This led to marked improvements in alignments and in searches using queries from each of the groups.

Among the most useful computer-based tools in modern biology are those that involve sequence alignments of proteins, since these alignments often provide important insights into gene and protein function. There are several different types of alignments: global alignments of pairs of proteins related by common ancestry throughout their lengths, local alignments involving related segments of proteins, multiple alignments of members of protein families, and alignments made during data base searches to detect homology. In each case, competing alignments are evaluated by using a scoring scheme for estimating similarity. Although several different scoring schemes have been proposed (1-6), the mutation data matrices of Dayhoff (1, 7-9) are generally considered the standard and are often the default in alignment and searching programs. In the Dayhoff model, substitution rates are derived from alignments of protein sequences that are at least 85% identical.
However, the most common task involving substitution matrices is the detection of much more distant relationships, which are only inferred from substitution rates in the Dayhoff model. Therefore, we wondered whether a better approach might be to use alignments in which these relationships are explicitly represented. An incentive for investigating this possibility is that implementation of an improved matrix in numerous important applications requires only trivial effort.

METHODS

Deriving a Frequency Table from a Data Base of Blocks. Local alignments can be represented as ungapped blocks with each row a different protein segment and each column an aligned residue position. Previously, we described an automated system, PROTOMAT, for obtaining a set of blocks given a group of related proteins (10). This system was applied to a catalog of several hundred protein groups, yielding a data base of >2000 blocks. Consider a single block representing a conserved region of a protein family. For a new member of this family, we seek a set of scores for matches and mismatches that best favors a correct alignment with each of the other segments in the
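
The block representation described above (rows are protein segments, columns are aligned residue positions) can be illustrated with a small Python sketch. The function name `pair_counts` and the toy block are hypothetical. Tallying unordered residue pairs column by column is an assumption about how a frequency table might be built from a block, since the description of the procedure is cut off in this excerpt.

```python
from collections import Counter
from itertools import combinations


def pair_counts(block):
    """Count unordered amino acid pairs within each column of an ungapped block.

    `block` is a list of equal-length strings: each string is one protein
    segment (a row), and each character position is one aligned column.
    This is an illustrative tally, not the paper's confirmed procedure.
    """
    width = len(block[0])
    if any(len(segment) != width for segment in block):
        raise ValueError("all segments in a block must have the same width")

    counts = Counter()
    for col in range(width):
        column = [segment[col] for segment in block]
        # Every pair of rows contributes one observed pair for this column.
        for a, b in combinations(column, 2):
            # Sort so that (A, S) and (S, A) are counted as the same pair.
            counts[tuple(sorted((a, b)))] += 1
    return counts


# Hypothetical toy block: 3 segments, 3 aligned columns.
block = ["AAC", "ASC", "SAC"]
counts = pair_counts(block)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

A block with s segments and w columns contributes w·s(s−1)/2 pairs in total under this scheme, so the toy block above yields 9 pairs.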