lecture8_S2009 - 18.417: Introduction to Computational...


18.417: Introduction to Computational Structural Biology

Evolution of RNA sequences

Jerome Waldispuhl
Department of Mathematics, MIT

Principles

Central assumptions:
• The structure of a sequence is determined only by its (minimum) free energy.
• The structure determines the function.
• Evolution tends to preserve and optimize the function.

Figure from (Cowperthwaite&Meyers,2007)

Sequence evolution

For short sequences, the set of evolutionary operations can be restricted to:
• Insertion
• Insertion/Deletion
• Mutation

We also limit their effect to single nucleotides.

Figure from (Gobel,2000)

Modeling the mutation landscape

When the length of the sequence is fixed, the set of operations can be restricted to mutations.

The mutation landscape is represented with Hamming graphs, where nodes are the sequences and edges connect sequences that differ by a single nucleotide (i.e. 1 mutation).

Figure from (Gobel,2000)

Fitness model

Objective: Evaluate the dynamics of the evolution of shapes.

Requirement: a metric to compare a predicted structure and a target shape.

Models:
• simple: The predicted structure is the m.f.e. structure.
• plastic: Suboptimal structures can be considered.

Figure from (Cowperthwaite&Meyers,2007)

Structure comparison

Metrics: Hamming distance and base pair distance.

Base pair distance is the standard. It corresponds to the number of base pairs we have to remove and add to obtain one structure from the other.

Both metrics have to be applied to structures of equal length.

Figure from (Schuster&Stadler,2007)

Neutral network

Figure panels: Genotype network, Phenotype network

• A structure is associated with each node (sequence) of the Hamming graph.
• Connected nodes labeled with the same structure form a neutral network.
• Introduced by P. Schuster and the Vienna group in 1992.

Figure from (Cowperthwaite&Meyers,2007)

Compatible mutations and structures

• Mutations in neutral networks must conserve the phenotype.
• But it is hard to decide whether a mutation conserves the m.f.e. structure, and hence the phenotype.
• The networks have been explored through simulations.
• The number of acceptable structures can be recursively computed: hairpin minimum length required and length of stacks bounded.

Figure from (Gobel,2000)

Role of neutral networks

• Evolution tends to select mutations improving the structure.
• A smooth landscape (few maxima) favors this strategy.
• Neutral networks facilitate evolution by allowing populations to explore genotype space while the structure is preserved.

Figure from (Gobel,2000)

Properties of neutral networks

• More sequences than structures.
• Few common and many rare structures.
• The distribution of neutral genotypes is approximately random.
• Neutral networks are connected, except where specific features of the RNA structure prevent it.
• The fraction of neutral neighbors $\bar{\lambda}$ characterizes the neutral networks. Theory predicts a phase transition in their structure at $\lambda_c = 1 - \kappa^{-1/(\kappa-1)}$, where $\kappa$ is the alphabet size.
  $\bar{\lambda} < \lambda_c$: many isolated parts and one giant component.
  $\bar{\lambda} > \lambda_c$: generally connected.
• A few mutations almost certainly lead to a change of the structure.
• The number of disjoint components in a phenotype's neutral network does not appear to correlate with its abundance.

Neutral network and shape space covering: Examples

Full neutral network of GC sequence space with length=30.
$\lambda_u$: fraction of neutral mutations in unpaired regions.
$\lambda_p$: fraction of neutral mutations in paired regions.
Grey: fragmented networks ($\lambda$ below threshold).
Red: 4 connected components ($\lambda$ above threshold).

Shape space covering radius (radius of the sphere containing on average at least one sequence per possible structure)

Data from (Gruner et al.,1999)
Figure from (Hofacker&Stadler,2006)

Comparison of exhaustively folded sequence spaces

Values computed on five different alphabets: GC, UGC, AUG, AU. Structures with a single base pair are excluded from the enumeration.
Data from (Schuster&Stadler,2007)

Estimation of the degree of neutrality on tRNAs

Fraction of neutral neighbors (degree of neutrality) computed from 1,000 random sequences fitting the structures using an inverse folding algorithm.
• Weak structure dependence.
• Different network structures for 2-letter ($\lambda_c = 0.5$) and 4-letter ($\lambda_c = 0.37$) alphabets.

Data from (Schuster&Stadler,2007)

Length of neutral paths

• Neutral paths connect neutral sequences that differ by 1 mutation.
• The Hamming distance from the origin strictly increases along the path.
• The path ends when all neighbors are closer to the reference sequence.

Data computed from 1,200 random sequences of length 100.

It demonstrates the influence of multiple constraints on neutrality. It provides an explanation of why functional tRNAs tolerate very limited variability of the sequence (unlike the ribozymes of Schultes&Bartel).

Data from (Schuster&Stadler,2007)

Properties of phenotype networks

• Nodes are structures.
• Two nodes A, B are connected if there exist two sequences a, b with phenotypes A, B that differ by 1 mutation.
• Highly irregular, with few nodes connected to many others and most nodes connected to few others (standard assumption used in population genetics).
• Abundant shapes are connected to almost every other shape (c.f. shape space covering).
• The degree of mutational connectivity is not a binary property: some connections are preferential. Moreover, these connections are always asymmetrical.
• The plastic model showed that neutral networks are not homogeneous. The probability of the m.f.e. structure in the suboptimal ensemble varies. The most thermodynamically stable sequences lie in the center of the neutral network.

Evolutionary dynamics

• Exploration of the sequence/structure network through simulations.
• Populations evolving toward a target shape experience long periods of phenotypic stasis and short periods of rapid change.
• On large neutral networks, the population subdivides into several subpopulations exploring different regions of the network.
• The size of a neutral network increases the probability of evolving to this particular phenotype and/or from this phenotype to another one.
• The needle in the haystack: Populations evolving on large neutral networks do not adapt more quickly than those evolving on smaller networks (due to a larger search space).

Phenotype abundance of “real” RNAs

• The phenotypic abundance correlates (b) with the contiguity statistic measure (a).
• This estimator is used to evaluate the abundance of phenotypes in RNAs from Rfam.
• Values are higher for Rfam RNAs than for sequences with the same length and base composition.

Figure from (Cowperthwaite&Meyers,2007)

Evolutionary dynamics

• The model favors mutations evolving toward the target shape.
• Long periods of stasis are punctuated by short periods of rapid phenotypic change.
• Two types of transitions: Continuous (nearby phenotypes) and Discontinuous (radical change).
• Continuous transitions appear essentially in the initial period of the simulation, while discontinuous transitions are predominant later.
• Phenomena mediated through neutral drifts (genotypes that can radically change the phenotype through a single mutation). But these sequences are hard to find.

Figure from (Cowperthwaite&Meyers,2007)

Genetic robustness

• Sequences carrying phenotypes should be robust to environmental and genetic perturbations.
• Unlike environmental robustness, genetic robustness is hard to justify. Three potential scenarios:
  a. Adaptive robustness: natural selection.
  b. Intrinsic robustness: correlated byproduct of character selection.
  c. Congruent robustness: correlated byproduct of selection for environmental robustness.
• Adaptive robustness (a) is possible. The trans-generational cost of deleterious mutations drives sequences toward the heart of the neutral network.
• Congruent robustness (c) is tested using the plastic model.
  Simulations showed that models targeting a shape lead to a reduction of plasticity. They also highlight a slow-down and possible halting of the evolutionary process.
• Reduction of plasticity leads to an extreme modularity (side-effect?).

Plastogenetic congruence

(1) A A’: makes the m.f.e.
(2) A B: makes stronger, exits .
(3) B B’: same mutation brings back , but keeps on top.

(1) correlates structures in the plastic repertoire to mutational neighbors.
(3) shows the epistatic control of neutrality. The more time spent in m.f.e., the higher the fraction of neutral neighbors.

• “plastogenetic congruence”: the set of shapes realized by a sequence correlates with the m.f.e. shapes of its 1-mutants.
• RNAs insensitive to thermal noise are also insensitive to mutations.

List suboptimal structures and weight them by the time spent by the molecule in that fold (energy).

Figure from (Ancel&Fontana,2000)

Survival of the flattest

• How do mutation rates (rapidity of mutations) shape evolution?
• Under low mutation rates, fitness considerations dictate dynamics.
• Under high mutation rates, the breadth of the neutral network can be as important as the fitness: the survival of the flattest.
• Simulations showed that populations having evolved under low mutation rates have a better adaptation potential than populations having always evolved under a high mutation rate (Wilke et al.,2001).
• Genotypes located in flatter regions are more robust to mutations.

Local mutational structure

• Theory and computational experiments differ on the distribution of beneficial mutations. While the beneficial effect of mutations is predicted to be exponentially distributed, in-silico experiments showed an overabundance of small-effect mutations.
• Although they tend to be eliminated, at high mutation rates deleterious mutations (mutations that radically change the structure) are fixed through compensatory evolution. In other words, evolution tends to “repair” the damage… sometimes even before.
• Epistasis regulates the effect of mutations.

Epistasis in RNA model

• The presence, magnitude and direction of epistasis are key elements of many evolutionary theories.
• Antagonistic epistasis: simultaneous mutations produce a smaller effect than the sum of their individual effects.
• Synergistic epistasis: the effect is greater than the sum.
• In RNA models, antagonistic epistasis seems dominant. The rate of fitness decline decreases with the accumulation of deleterious mutations, regardless of their order.

Figure from (Cowperthwaite&Meyers,2007) ...
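
Several of the definitions in these slides are concrete enough to sketch in code. The Python below is a minimal illustration, not the lecture's own implementation. It assumes structures are written in dot-bracket notation (a convention the slides do not state), and every function name is hypothetical. It shows the base pair distance as the size of the symmetric difference of two base-pair sets, the one-mutant neighbors that form the edges of the Hamming graph, the fraction of neutral neighbors for a user-supplied folding function `fold` (any callable mapping a sequence to a structure, standing in for an m.f.e. predictor), and the threshold $\lambda_c = 1 - \kappa^{-1/(\kappa-1)}$ for alphabet size $\kappa$.

```python
from typing import Callable, Iterator, Set, Tuple


def base_pairs(dot_bracket: str) -> Set[Tuple[int, int]]:
    """Return the set of (i, j) base pairs in a dot-bracket string (assumed notation)."""
    stack, pairs = [], set()
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def base_pair_distance(s1: str, s2: str) -> int:
    """Number of base pairs to remove and add to turn s1 into s2."""
    if len(s1) != len(s2):
        raise ValueError("structures must have equal length")
    return len(base_pairs(s1) ^ base_pairs(s2))  # symmetric difference


def one_mutant_neighbors(sequence: str, alphabet: str = "ACGU") -> Iterator[str]:
    """Yield every sequence at Hamming distance 1 (edges of the Hamming graph)."""
    for i, current in enumerate(sequence):
        for letter in alphabet:
            if letter != current:
                yield sequence[:i] + letter + sequence[i + 1:]


def neutral_fraction(sequence: str, fold: Callable[[str], str],
                     alphabet: str = "ACGU") -> float:
    """Fraction of one-mutant neighbors that fold into the same structure.

    `fold` is a hypothetical placeholder for an m.f.e. structure predictor.
    """
    target = fold(sequence)
    neighbors = list(one_mutant_neighbors(sequence, alphabet))
    if not neighbors:
        raise ValueError("sequence has no neighbors")
    return sum(fold(n) == target for n in neighbors) / len(neighbors)


def critical_neutrality(kappa: int) -> float:
    """Threshold lambda_c = 1 - kappa**(-1/(kappa-1)) for alphabet size kappa."""
    if kappa < 2:
        raise ValueError("alphabet size must be at least 2")
    return 1.0 - kappa ** (-1.0 / (kappa - 1))
```

With these definitions, `critical_neutrality(2)` evaluates to 0.5 and `critical_neutrality(4)` to about 0.37, the two thresholds quoted for 2-letter and 4-letter alphabets. In practice `fold` would wrap a real m.f.e. folding routine; it is left as a parameter here so the sketch stays self-contained.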

This note was uploaded on 06/16/2011 for the course MATH 18.417 taught by Professor Jérôme Waldispühl during the Spring '11 term at MIT.
