Having serious trouble with this assignment and could use some help.
Structure Short Article

Conservation of Protein Structure over Four Billion Years

Alvaro Ingles-Prieto,1,5 Beatriz Ibarra-Molero,1 Asuncion Delgado-Delgado,1 Raul Perez-Jimenez,2,6 Julio M. Fernandez,2 Eric A. Gaucher,3 Jose M. Sanchez-Ruiz,1,* and Jose A. Gavira4,*

1 Departamento de Química Física, Facultad de Ciencias, Universidad de Granada, Granada 18071, Spain
2 Department of Biological Sciences, Columbia University, New York, NY 10027, USA
3 Georgia Institute of Technology, School of Biology, School of Chemistry and Biochemistry, and Parker H. Petit Institute for Bioengineering and Biosciences, Atlanta, GA 30332, USA
4 Laboratorio de Estudios Cristalográficos, Instituto Andaluz de Ciencias de la Tierra (Consejo Superior de Investigaciones Científicas – Universidad de Granada), Avenida de las Palmeras 4, Armilla, Granada 18100, Spain
5 Present address: IST Austria, Am Campus 1, 3400 Klosterneuburg, Austria
6 Present address: Ikerbasque Research Foundation, CIC nanoGUNE, Tolosa Hiribidea 76, 20018 San Sebastian, Spain
*Correspondence: [email protected] (J.M.S.-R.), [email protected] (J.A.G.)
http://dx.doi.org/10.1016/j.str.2013.06.020

SUMMARY
Little is known about the evolution of protein structures and the degree of protein structure conservation over planetary time scales. Here, we report the X-ray crystal structures of seven laboratory resurrections of Precambrian thioredoxins dating up to approximately four billion years ago. Despite considerable sequence differences compared with extant enzymes, the ancestral proteins display the canonical thioredoxin fold, whereas only small structural changes have occurred over four billion years. This remarkable degree of structure conservation since a time near the last common ancestor of life supports a punctuated-equilibrium model of structure evolution in which the generation of new folds occurs over comparatively short periods and is followed by long periods of structural stasis.
INTRODUCTION
Little is known with certainty about the evolution of protein structures, despite the substantial number of different protein folds revealed by the structures deposited in the Protein Data Bank (PDB). As elaborated below, several facts contribute to this undesirable situation. While it is generally admitted that structures change at a slower pace than sequences do, evidence has accumulated in recent years supporting that protein structures are not invariant and, therefore, that they may change during the course of evolution (Grishin, 2001; Murzin, 2008; Sikosek et al., 2012; Taylor, 2007; Tokuriki and Tawfik, 2009; Valas et al., 2009). In fact, due to the so-called shape-covering properties of the mapping of sequence into structure (Caetano-Anollés et al., 2009), different structures may be just a few mutational steps away in sequence space, as has been experimentally demonstrated (Cordes et al., 1999; He et al., 2012). Moreover, the possibility of convergent evolution of folds is generally accepted and, hence, common ancestry does not necessarily follow from structural similarity (Grishin, 2001; Krishna and Grishin, 2004; Murzin, 2008; Orengo et al., 1994; Schaeffer and Daggett, 2011; Taylor, 2007). That is, transitions between folds and convergent evolution of folds may both conceivably occur during protein evolution; therefore, the identification of basic principles of structure evolution may be difficult to extract from the study of extant protein structures (Caetano-Anollés et al., 2009; Murzin, 2008). Consequently, many current fold classifications are phenetic (based on a metric of structure similarity) and the viability of phyletic classifications (based on evolutionary relationships) remains an open issue (Murzin, 2008; Valas et al., 2009).
As a result, age estimates for protein folds are uncertain and based on indirect methods, such as the census of (assigned) folds in genomes (Caetano-Anollés et al., 2009; Winstanley et al., 2005). Even the usefulness of the fold concept is at stake, as several authors have discussed that fold space must be viewed as continuous rather than discrete (Honig, 2007; Sadreyev et al., 2009; Xie and Bourne, 2008). The above observations summarize what may be viewed as a particularly clear example of the limitations of “horizontal” approaches (i.e., based on the comparison between extant proteins) to molecular evolution (Harms and Thornton, 2010). In fact, some recent work has used sequence reconstruction analyses targeting ancestral states represented by nodes in phylogenetic trees and the subsequent laboratory “resurrection” of their encoded proteins (Benner et al., 2007; Harms and Thornton, 2010) to address important issues in protein evolution, such as the role of epistasis in formation of new function (Ortlund et al., 2007), the evolution of complex biomolecular machines (Finnigan et al., 2012), the mechanisms of evolutionary innovation through gene duplication (Voordeckers et al., 2012), and the adaptation of proteins to changing environments over planetary time scales (Gaucher et al., 2008; Perez-Jimenez et al., 2011; Risso et al., 2013).

Structure 21, 1690–1697, September 3, 2013 © 2013 Elsevier Ltd All rights reserved

Here we explore the potential of this “vertical” approach to probe the evolution of protein structures. To this end, we have obtained the three-dimensional (3D) structures of several laboratory resurrections of Precambrian enzymes dating up to approximately four billion years (Gyr) ago, i.e., up to a time close to the origin of life. In particular, we target thioredoxin enzymes corresponding to the last bacterial common ancestor
(LBCA); the last archaeal common ancestor (LACA); the archaeal-eukaryotic common ancestor (AECA); the last eukaryotic common ancestor (LECA); the last common ancestor of fungi and animals (LAFCA); the last common ancestor of the cyanobacterial, deinococcus and thermus groups (LPBCA); and the last common ancestor of γ-proteobacteria (LGPCA). As briefly described subsequently, we recently “resurrected” and characterized these proteins in terms of stability and function (Perez-Jimenez et al., 2011). We used ~200 diverse extant thioredoxin sequences encompassing the three domains of life to construct a highly articulated phylogenetic tree and subsequently perform a maximum likelihood sequence reconstruction targeting several Precambrian nodes during thioredoxin evolution (Perez-Jimenez et al., 2011). The resultant phylogenetic tree was sufficiently close to an accepted organism phylogeny to allow us to assign the reconstructed nodes to well-defined Precambrian ancestors (see previous) and to date those nodes (see Figure 1A; Hedges and Kumar, 2009; for further details, see Perez-Jimenez et al., 2011). In the laboratory, we resurrected the proteins encoded by the reconstructed sequences and determined their stability and catalytic features. We found an increase in denaturation temperature of ~30 °C when “traveling back in time” several billion years. This result afforded support for our ancestral reconstruction exercise, because it is consistent with the generally proposed thermophilic character of Precambrian life and, indeed, similar stability enhancements have been reported in Precambrian resurrection studies on other protein systems, such as elongation factors (Gaucher et al., 2008) and β-lactamases (Risso et al., 2013). It is also noteworthy that some

Figure 1.
Overall Structural Features of Extant Thioredoxins and Laboratory Resurrections of Precambrian Thioredoxins
(A) Schematic phylogenetic tree showing the geological time (Perez-Jimenez et al., 2011) and the phylogenetic nodes targeted in this work.
(B) Spatial course of the polypeptide chain for the human and E. coli thioredoxins, as well as for the several laboratory resurrections of Precambrian thioredoxins studied in this work. The color code is that given in (A).
(C) Sequences (Perez-Jimenez et al., 2011) and secondary structure assignments for the extant thioredoxins and the laboratory resurrections of Precambrian thioredoxins studied in this work.
See also Table S1 for root-mean-square deviation and sequence identity values for all thioredoxin structure pairs.
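The two quantities the caption refers to (sequence identity and root-mean-square deviation between structure pairs) can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' procedure: the function names are hypothetical, the sequences are assumed to be pre-aligned with `-` for gaps, and the coordinates are assumed to be already superimposed and paired atom by atom.

```python
import math


def sequence_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns (columns with a gap are skipped).

    Assumes both strings come from the same alignment, so they have equal
    length and use '-' for gaps. Hypothetical helper, not from the paper.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no aligned (gap-free) columns to compare")
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired coordinate lists.

    Assumes the structures are already superimposed; no fitting is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

A low RMSD combined with a low sequence identity is the pattern the article describes as fold conservation despite sequence divergence. Real comparisons would first superimpose the structures, which this sketch deliberately leaves out.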
A minimal sequence code for switching protein structure and function

Patrick A. Alexander, Yanan He, Yihong Chen, John Orban, and Philip N. Bryan1

Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville, MD 20850

Edited by Adriaan Bax, National Institutes of Health, Bethesda, MD, and approved October 12, 2009 (received for review June 9, 2009)

We present here a structural and mechanistic description of how a protein changes its fold and function, mutation by mutation. Our approach was to create 2 proteins that (i) are stably folded into 2 different folds, (ii) have 2 different functions, and (iii) are very similar in sequence. In this simplified sequence space we explore the mutational path from one fold to another. We show that an IgG-binding, 4β+α fold can be transformed into an albumin-binding, 3-α fold via a mutational pathway in which neither function nor native structure is completely lost. The stabilities of all mutants along the pathway are evaluated, key high-resolution structures are determined by NMR, and an explanation of the switching mechanism is provided. We show that the conformational switch from 4β+α to 3-α structure can occur via a single amino acid substitution. On one side of the switch point, the 4β+α fold is >90% populated (pH 7.2, 20 °C). A single mutation switches the conformation to the 3-α fold, which is >90% populated (pH 7.2, 20 °C). We further show that a bifunctional protein exists at the switch point with affinity for both IgG and albumin.

evolution | NMR | protein design | protein folding

Protein molecules are capable of self-organizing into 3D topologies that create biologic functions. The fundamental principles of how the sequence of amino acids in a protein determines its structure remain poorly understood, however, despite its central importance to biology.
The primary approach to the “folding problem” has been to determine a detailed structural and energetic description of the equilibrium between the native state and the random population of disordered, unfolded states. It is well known that the equilibrium between folded and unfolded can be radically shifted in either direction with a few mutations. There is accumulating evidence, however, that a few mutations sometimes can dramatically shift the equilibrium into new tertiary (and/or quaternary) structures (1, 2). Understanding the capacity of a protein to acquire a completely different structure as a result of minor mutagenic perturbation is central to understanding both protein folding in general and more specifically how new protein structures and functions evolve. Most natural proteins populate only the native state significantly, with ΔG_unfolding ≥ 5 kcal/mol. It is also generally assumed that many mutations are required to shift the equilibrium such that ΔG_unfolding for some alternative state is ≥ 5 kcal/mol. This assumption underpins most bioinformatics methods, in fact. Most mutations in a protein that increase its propensity toward an alternative fold destabilize the original fold. Thus, it seems intuitive that a pathway of single amino acid substitutions would result in a long series of mutants that would be unfolded before enough folding information accumulates to significantly populate an alternative fold. Both natural and engineered examples demonstrate, however, that the sequence space separating 2 proteins with different structures can be quite small (3–5). To understand this seemingly paradoxical situation, one needs to methodically examine the sequence space separating 2 stable folds. In concept this is simple. One begins with 2 stable proteins of similar size but different folds and mutates one to be more like the other until a switch in structure occurs. In practice this approach is not trivial, however.
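To make the stability figures above concrete, the following sketch converts an unfolding free energy into a folded-state population. It assumes a simple two-state (folded versus unfolded) equilibrium, which is a simplification introduced here rather than something the paper states in this form; the function names are hypothetical, and the 20 °C default mirrors the conditions quoted in the text.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def folded_fraction(delta_g_unfolding, temp_c=20.0):
    """Fraction of molecules folded under an assumed two-state equilibrium.

    delta_g_unfolding is in kcal/mol; positive values favor the folded state.
    """
    rt = R_KCAL * (temp_c + 273.15)
    k_unfold = math.exp(-delta_g_unfolding / rt)  # [unfolded] / [folded]
    return 1.0 / (1.0 + k_unfold)


def delta_g_for_fraction(fraction, temp_c=20.0):
    """Free energy gap (kcal/mol) needed to populate one state at `fraction`."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    rt = R_KCAL * (temp_c + 273.15)
    return rt * math.log(fraction / (1.0 - fraction))


if __name__ == "__main__":
    for dg in (5.0, 3.0, 1.0):
        print(f"dG = {dg} kcal/mol -> folded fraction {folded_fraction(dg):.4f}")
    print(f"90% populated needs about {delta_g_for_fraction(0.9):.2f} kcal/mol")
```

Under this assumption, a protein with ΔG_unfolding of 5 kcal/mol is more than 99.9% folded at 20 °C, whereas a state that is 90% populated only needs a gap of roughly 1.3 kcal/mol, which hints at why a small number of mutations could plausibly tip the balance between two folds.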
Any mutation in a protein will change the context of other amino acids. This is the essence of the folding problem. Our approach, therefore, was to create a simplified sequence space in which the mutational path from one fold to another can be explored and shifts in the equilibrium between the 2 folded states (and unfolded states) can be measured as a function of mutation. Previously we and others have studied the structure, folding, and stability of 2 binding domains of Streptococcus protein G (6). Protein G contains 2 types of domains that bind to serum proteins in blood: the G_A domain of 45 structured amino acids that bind to human serum albumin (HSA) (7, 8), and the G_B domain of 56 structured amino acids that bind to the constant (Fc) region of IgG (9, 10). The natural versions of G_A and G_B domains share no significant sequence homology and have different folds, 3-α and 4β+α, respectively. From these studies we have been able to create high-identity versions of G_A and G_B, which have wild-type stabilities and binding function but which are 77% identical. These proteins are denoted G_A77 and G_B77. G_A77 binds to HSA with a K_d = 100 nM and has a ΔG_unfolding of 5 kcal/mol (20 °C, 0.1 M KPO4, pH 7.2) (11). Amino acids 1–8 and 54–56 are disordered in G_A77. The remaining 45 aa are well ordered in a 3-α helix bundle (12). G_B77 binds to the constant (Fc) region of IgG with a K_d = 100 nM and has a ΔG_unfolding of 5 kcal/mol (20 °C, 0.1 M KPO4, pH 7.2) (13, 14). All 56 aa of G_B77 are well ordered in a 4-stranded β-sheet with an α-helix connecting strands 2 and 3 (12). The proteins were engineered such that the IgG and HSA binding epitopes are encoded in both proteins. The IgG-binding epitope is functional in the 4β+α fold and latent in the 3-α fold, whereas the albumin-binding epitope is functional in the 3-α fold and latent in the 4β+α fold.
This results in an experimental system in which unmasking the latent function is linked to a switch in conformation (Fig. 1). This work is described in refs. 3, 4, 11, 12, and 15. The fact that G_A77 and G_B77 are so close in mutational space greatly simplifies subsequent searches of the sequence space that separates them. The context problem is not eliminated but is reduced to a practicable level. This allowed a systematic exploration of the sequence space separating these 2 functional folds.

Author contributions: P.A.A., Y.H., J.O., and P.N.B. designed research; P.A.A., Y.H., Y.C., J.O., and P.N.B. performed research; P.A.A., Y.H., J.O., and P.N.B. analyzed data; and P.N.B. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Freely available online through the PNAS open access option.
See Commentary on page 21011.
1 To whom correspondence should be addressed. E-mail: [email protected]
This article contains supporting information online at www.pnas.org/cgi/content/full/0906408106/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.0906408106 | PNAS | December 15, 2009 | vol. 106 | no. 50 | 21149–21154 | BIOPHYSICS AND COMPUTATIONAL BIOLOGY

Results
The positions of nonidentity between the G_A77 and G_B77 proteins are shown in Fig. 1. Our approach was to explore binary sequence space (choice of either the G_A or G_B amino acid at positions of nonidentity). Obviously making 13 sequential substitutions in any order for the corresponding amino acid at a position of nonidentity will result in a conformational switch. Finding the path with the fewest unstructured intermediates required a systematic approach, however. We first analyzed all 13 single-site mutants in G_A77 and G_B77. We were able to produce and purify folded proteins for approximately half of these mutants. Proteins were purified using an affinity-cleavage tag system that we developed (16), essentially
as described in ref. 3. The system enabled the rapid, standardized purification of mutant proteins, even of low stability. Mutants were characterized by circular dichroism (CD) to assess secondary structure (Fig. S1), thermal denaturation by CD to assess stability (Fig. S2), the ability to bind HSA and IgG to assess function, and by 2D heteronuclear single quantum correlation (HSQC) spectra using NMR to assess tertiary structure. Monomeric state was established using size exclusion chromatography and multiple-angle laser-light scattering. High-resolution structures were determined by standard 3D NMR methods for key proteins. Midpoints of thermal denaturation (T_M) (0.1 M KPO4, pH 7.2, 20 °C) are reported for all folded mutants in Table 1. We found that every position in one fold or the other could be mutated without uncoding the native structure. Key mutants in the pathway to a conformational switch are shown in Fig. 2. A heteromorphic pair of 88% identity (G_A88 and G_B88b) was found in which stability and function remain similar to some naturally occurring IgG and HSA binding domains. Assembling heteromorphic pairs of greater than 88% identity was also possible, although additional mutation causes stability to fall below the threshold observed for most natural proteins. The pair G_A91 and G_B91 have a ΔG_unfolding of > 3 kcal/mol at 20 °C. Both proteins show undiminished binding affinity to their respective ligands and are monomeric. The binary mutational space separating G_A91 and G_B91 comprises only 32 sequences. We constructed, expressed, and purified 17 of these variants to effectively sample the sequence space (Tables 1 and S1). The main observations are as follows. The mutation Y33I has little effect on the stability of the 4β+α fold (−0.1 kcal/mol), and the mutation L50K has the least effect on the stability of the 3-α fold (−1.0 kcal/mol).
Mutation at position 20, 33, or 45 is not tolerated in the 3-α fold in any of the 32 contexts (Table S1). Mutation at position 30, 45, or 50 is not tolerated in the 4β+α fold in any of the contexts. The only position that cannot be changed in either fold (without unfolding it) was position 45. Eight of the 17 proteins were predominantly folded into one of the native structures: four were 3-α and 4 were 4β+α. Of the 9 “unfolded” proteins, all were purified and can be seen by CD to have significant secondary structure content. The exact nature of this residual structure can probably be determined in the future using 3D NMR techniques. The variants denoted G_A95 and G_B95 differ only at positions 20, 30, and 45, yet are fully folded, with ΔG_unfolding of ≈ 3 kcal/mol at 20 °C (Fig. S2). Both proteins show binding affinity to their respective ligands (K_D < 1 μM) and are monomeric. High-resolution structures of these proteins were determined to better understand how so few amino acids control the conformational switch. This is discussed in detail below. Statistics for the G_B95 and G_A95 ensembles of 20 structures are shown in Table S2. Protein Data Bank accession codes for G_A95 and G_B95 are 2KDL and 2KDM, respectively. Mutation of I30F in G_A95 (G_A98) and A20L in G_B95 (G_B98) leads to a heteromorphic pair differing at only 1 amino acid. G_A98 is folded into the 3-α conformation (> 90% populated, pH 7.2, 20 °C), and G_B98 is folded into the 4β+α conformation (> 90% populated, pH 7.2, 20 °C). The CD spectra of G_A98 and G_B98 are essentially identical to the spectra of their parent G_A and G_B proteins (Fig. S1). The assigned HSQC spectra of G_A98 and G_B98 are compared in Fig. 3. G_A98 exhibits diminished affinity for HSA (K_D > 1 μM) but has acquired affinity for IgG (K_D < 1 μM). G_B98 binds tightly to IgG but not HSA.
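The arithmetic behind the "binary sequence space" and the percent-identity labels (77, 91, 95, 98) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the 56-residue length comes from the text, the five positions used for the 91% pair (20, 30, 33, 45, 50) are inferred from the positions named in the passage, and the function names are hypothetical.

```python
from itertools import product

LENGTH = 56  # residues per construct, as stated in the text


def percent_identity(n_differences, length=LENGTH):
    """Percent identity of two equal-length sequences with n differing sites."""
    return 100.0 * (length - n_differences) / length


def binary_space(positions):
    """Yield every assignment of the 'A' or 'B' residue at each position.

    'A' stands for the G_A amino acid and 'B' for the G_B amino acid.
    """
    for choice in product("AB", repeat=len(positions)):
        yield dict(zip(positions, choice))


if __name__ == "__main__":
    # Differing-site counts chosen to match the labels used in the text.
    for n in (13, 5, 3, 1):
        print(f"{n:2d} differences -> {percent_identity(n):.1f}% identity")

    # Positions inferred from the passage (an assumption of this sketch).
    variants = list(binary_space([20, 30, 33, 45, 50]))
    print(len(variants), "variants between the 91% pair")
    print(2 ** 13, "variants in the full 13-position binary space")
```

With 56 residues, 13, 5, 3, and 1 differing positions round to 77%, 91%, 95%, and 98% identity, and five binary positions give the 32 sequences mentioned for the G_A91/G_B91 pair, which is small enough to sample experimentally in a way the full 13-position space would not be.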
The ability of G_A98 to bind IgG as well as HSA may reveal a hidden propensity to switch into the 4β+α conformation and unmask the IgG-binding epitope. The population of the 4β+α conformation is too low to detect in the unliganded state of G_A98, but IgG binding may shift the equilibrium away from 3-α and toward 4β+α. It is also possible that G_A98 binds IgG via a new mechanism. This will be sorted out using NMR to map the binding epitope (17). In either case, exploration of binary sequence space shows that one functional protein can switch into a completely different conformation with a different function via a mutational pathway in which neither function nor native structure is completely lost. The G_A98 and G_B98 proteins are only marginally stable, but restoration of close to wild-type stability and function in either direction can be attained with only 3 additional mutations (e.g., to G_A88 and G_B88b).

Overall Change in Topology. High-resolution structures were determined for G_A95 and G_B95 and provide insight into the structural

Fig. 1. G_A77 in complex with HSA (from 1TFO.pdb) (36) and G_B77 in complex with IgG (from 1fcc.pdb) (37). The side chains of amino acids at the 13 positions of nonidentity are depicted as yellow sticks.
BCH 335 Name:_________________________________
Writing Prompt 1: Protein Structure Prediction (30 pts)
1.) Read the following two papers and synthesize a cohesive argument that discusses the seemingly incompatible statements: “Sequence defines fold” and “Fold is more conserved than sequence.” Discuss how this paradigm relates to the prediction of tertiary structure. Please limit argument to 2 pages double spaced.
Alexander, et al. A minimal sequence code for switching protein structure and function. PNAS. 2009. Vol. 106. Pgs. 21149-21154
Ingles-Prieto, et al. Conservation of Protein Structure over Four Billion Years. Structure. 2013. Vol. 21. Pgs. 1690-1697.