18.417: Introduction to
Computational Structural Biology
Evolution of RNA sequences
Jerome Waldispuhl
Department of Mathematics, MIT

Principles
• The structure of a sequence is determined only by its (minimum) free energy.
• The structure determines the function.
• Evolution tends to preserve and optimize the function.
Figure from (Cowperthwaite&Meyers,2007)

Sequence evolution
For short sequences, the set of
evolutionary operations can be [...]
We also limit their effect to single nucleotides.
Figure from (Gobel,2000)

Modeling the mutation landscape
When the length of the sequence is fixed, the set of operations can be
restricted to mutations.
The mutation landscape is represented with Hamming graphs, where nodes are the
sequences and edges connect sequences differing by a single nucleotide
(i.e. 1 mutation).
Figure from (Gobel,2000)

Fitness model
Objective: Evaluate the dynamics of the evolution of [...]
Requirement: a metric to compare a predicted structure and a target structure.
• simple: The predicted structure is the m.f.e. structure.
• plastic: Suboptimal structures can be [...]
Figure from (Cowperthwaite&Meyers,2007)

Structure comparison
[Figure panels: Hamming distance, Base pair distance]
Base pair distance is the standard. It corresponds to the number of base pairs we
have to remove and add to obtain one structure from the other. Both metrics have
to be applied to structures of equal length.
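The two metrics can be sketched in a few lines of Python. This is a minimal illustration, assuming structures are written in dot-bracket notation (an assumption; the slide does not fix a notation). The function names are illustrative.

```python
def pair_set(structure):
    """Return the set of base pairs (i, j) encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def hamming_distance(s1, s2):
    """Number of positions at which the two dot-bracket strings differ."""
    if len(s1) != len(s2):
        raise ValueError("structures must have equal length")
    return sum(a != b for a, b in zip(s1, s2))


def base_pair_distance(s1, s2):
    """Number of base pairs to remove and add to turn s1 into s2."""
    if len(s1) != len(s2):
        raise ValueError("structures must have equal length")
    # Pairs present in exactly one of the two structures.
    return len(pair_set(s1) ^ pair_set(s2))


print(base_pair_distance("((..))", "(....)"))  # 1: one pair must be removed
print(hamming_distance("((..))", "(....)"))    # 2: two characters differ
```

Removing one base pair changes two characters of the string, which is why the two metrics disagree even on this small example.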
Figure from (Schuster&Stadler,2007)

Neutral network
[Figure panels: Genotype network, Phenotype network]
• A structure is associated with each node (sequence) of the Hamming graph.
• Connected nodes labeled with the same structure form a neutral network.
• Introduced by P. Schuster and the Vienna group in 1992.
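To make the definition concrete, here is a minimal sketch that enumerates a tiny sequence space, labels every sequence with a phenotype, and groups 1-mutant neighbors sharing a phenotype into neutral networks. `toy_fold` is a hypothetical stand-in for a real folding routine (it is not an energy model), and all function names are illustrative.

```python
from itertools import product


def one_mutant_neighbors(seq, alphabet):
    """Yield every sequence at Hamming distance 1 from seq."""
    for i, old in enumerate(seq):
        for new in alphabet:
            if new != old:
                yield seq[:i] + new + seq[i + 1:]


def neutral_networks(length, fold, alphabet="GC"):
    """Return a list of (structure, [sequences]) connected components."""
    phenotype = {"".join(s): fold("".join(s))
                 for s in product(alphabet, repeat=length)}
    seen, components = set(), []
    for start in phenotype:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:  # depth-first search restricted to one phenotype
            seq = stack.pop()
            members.append(seq)
            for nb in one_mutant_neighbors(seq, alphabet):
                if nb not in seen and phenotype[nb] == phenotype[start]:
                    seen.add(nb)
                    stack.append(nb)
        components.append((phenotype[start], members))
    return components


def toy_fold(seq):
    """Hypothetical phenotype: the two ends pair only if they are G and C."""
    return "(..)" if {seq[0], seq[-1]} == {"G", "C"} else "...."


for structure, members in neutral_networks(4, toy_fold):
    print(structure, sorted(members))
```

With this toy map, the 16 GC sequences of length 4 split into four components of four sequences each, so each of the two phenotypes has a disconnected neutral network.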
Figure from (Cowperthwaite&Meyers,2007)

Compatible mutations and structures
• Mutations in neutral networks must conserve [...]
• But it is hard to decide whether a mutation conserves
the m.f.e. structure and hence the phenotype.
• The networks have been explored through [...]
• The number of acceptable structures can be computed recursively, with a
required minimum hairpin length and a bounded length of stacks.
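The recursion itself did not survive in this text. As a stand-in, the sketch below uses the classic Waterman-style count of secondary structures with a minimum hairpin loop length only. It does not implement the bound on stack length mentioned on the slide, and it assumes any two positions may pair, so it is a sequence-independent count. The function name and the default `min_hairpin=3` are assumptions.

```python
def count_structures(n, min_hairpin=3):
    """Count secondary structures on n nucleotides.

    counts[k] = counts[k-1]                       (position k unpaired)
              + sum over j of counts[j-1] * counts[k-j-1]
                                                  (position k paired with j)
    where at least min_hairpin positions lie strictly between j and k.
    """
    counts = [1] * (n + 1)
    for k in range(1, n + 1):
        total = counts[k - 1]
        # j ranges over 1 .. k - min_hairpin - 1 (1-based positions).
        for j in range(1, k - min_hairpin):
            total += counts[j - 1] * counts[k - j - 1]
        counts[k] = total
    return counts[n]


for n in (5, 10, 20, 30):
    print(n, count_structures(n))
```

The count grows exponentially with length but far more slowly than the number of sequences, which is consistent with the later remark that there are more sequences than structures.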
Figure from (Gobel,2000)

Role of neutral networks
• Evolution tends to select mutations improving the structure.
• A smooth landscape (few maxima) favors this strategy.
• Neutral networks facilitate evolution by allowing populations to explore genotype
space while the structure is preserved.
Figure from (Gobel,2000)

Properties of neutral networks
• More sequences than structures.
• Few common and many rare structures.
• The distribution of neutral genotypes is approximately random.
• Neutral networks are connected, except where specific features of the RNA structure prevent it.
• The fraction of neutral neighbors $\bar{\lambda}$ characterizes the neutral networks. Theory
predicts a phase transition in their structure at the threshold
$\lambda_{cr} = 1 - \kappa^{-1/(\kappa-1)}$, where $\kappa$ is the alphabet size.
  $\bar{\lambda} < \lambda_{cr}$: many isolated parts and one giant component.
  $\bar{\lambda} > \lambda_{cr}$: generally connected.
• A few mutations almost certainly lead to a change of the structure.
• The number of disjoint components in a phenotype's neutral network does not
appear to correlate with its abundance.

Neutral network and shape space
covering: Examples
Full neutral network of GC sequence space with length=30.
$\lambda_u$: fraction of neutral mutations in unpaired regions.
$\lambda_p$: fraction of neutral mutations in paired regions.
Grey: fragmented networks ($\lambda$ below threshold).
Red: 4 connected components ($\lambda$ above threshold).
Shape space covering radius (radius of the sphere
containing on average at least one sequence per
possible structure).
Data from (Gruner et al.,1999)
Figure from (Hofacker&Stadler,2006)

Comparison of exhaustively folded sequence spaces
Values computed on five different alphabets: GC, UGC, AUG, AU.
Structures with a single base pair are excluded from the enumeration.
Data from (Schuster&Stadler,2007)

Estimation of the degree of neutrality on tRNAs
Fraction of neutral neighbors (degree of neutrality) computed from 1,000
random sequences fitting the structures using an inverse folding algorithm.
• Weak structure dependence.
• Different network structures for 2-letter ($\lambda_{cr}=0.5$) and 4-letter ($\lambda_{cr}=0.37$) alphabets.
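The two threshold values quoted above follow from the connectivity threshold of random-graph theory for neutral networks, 1 - κ^(-1/(κ-1)), where κ is the alphabet size. The short sketch below evaluates it. The symbol names and the helper `is_expected_connected` are illustrative assumptions, not notation taken from the slides.

```python
def critical_neutrality(kappa):
    """Connectivity threshold for an alphabet of kappa letters."""
    if kappa < 2:
        raise ValueError("alphabet needs at least two letters")
    return 1.0 - kappa ** (-1.0 / (kappa - 1))


def is_expected_connected(mean_neutrality, kappa):
    """True if the mean fraction of neutral neighbors exceeds the threshold."""
    return mean_neutrality > critical_neutrality(kappa)


for kappa in (2, 3, 4):
    print(kappa, round(critical_neutrality(kappa), 2))
# 2 0.5
# 3 0.42
# 4 0.37
```

A larger alphabet lowers the threshold, so the same degree of neutrality can yield a connected network over four letters and a fragmented one over two.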
Data from (Schuster&Stadler,2007)

Length of neutral paths
• Neutral paths connect neutral sequences differing by 1 mutation.
• The Hamming distance from the origin strictly increases along the path.
• The path ends when no neutral neighbor lies farther from the reference sequence.
Data computed from 1,200 random sequences of length 100.
It demonstrates the influence of multiple constraints on neutrality.
This helps explain why functional tRNAs tolerate only very limited sequence variability
(unlike the ribozymes of Schultes&Bartel).
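The neutral-path procedure described above can be sketched as a simple random walk. The `fold` argument is a placeholder for any function mapping a sequence to its structure. The seeded random choice among candidates is an assumption, since the slides do not say how the next step is picked.

```python
import random


def neutral_path(reference, fold, alphabet="ACGU", rng=None):
    """Walk away from reference through neutral 1-mutants.

    Each step keeps the structure of the reference and increases the
    Hamming distance to it by exactly one. The walk stops when no such
    neighbor exists. Returns the list of visited sequences.
    """
    rng = rng or random.Random(0)
    target = fold(reference)
    current, path = reference, [reference]
    while True:
        candidates = []
        for i, ref_base in enumerate(reference):
            if current[i] != ref_base:
                continue  # already mutated: changing it cannot add distance
            for base in alphabet:
                if base == ref_base:
                    continue
                mutant = current[:i] + base + current[i + 1:]
                if fold(mutant) == target:
                    candidates.append(mutant)
        if not candidates:
            return path
        current = rng.choice(candidates)
        path.append(current)


# Toy check: if every sequence has the same phenotype, the path can run
# until all positions differ from the reference.
path = neutral_path("GGCC", lambda s: "." * len(s), alphabet="GC")
print(len(path) - 1)  # 4 steps for a sequence of length 4
```

The length of the returned path, minus one, is the Hamming distance reached before the walk gets stuck.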
Data from (Schuster&Stadler,2007)

Properties of phenotype networks
• Nodes are structures.
• Two nodes A, B are connected if there exist 2 sequences a, b with
phenotypes A, B that differ by 1 mutation.
• Highly irregular, with few nodes connected to many others and most nodes connected to
few others (unlike the standard assumption used in population genetics).
• Abundant shapes are connected to almost every other shape (cf. shape space covering).
• The degree of mutational connectivity is not a binary property. There exist some preferential
connections. Moreover, these connections are always asymmetrical.
• The plastic model showed that neutral networks are not homogeneous. The probability of the m.f.e.
structure in the suboptimal ensemble varies. The most thermodynamically stable sequence lies
in the center of the neutral network.

Evolutionary dynamics
• Exploration of the sequence/structure network through simulations.
• Populations evolving toward a target shape experience long periods of phenotypic stasis
and short periods of rapid change.
• On large neutral networks, the population subdivides into several subpopulations exploring
different regions of the network.
• The size of a neutral network increases the probability of evolving to this particular phenotype
and/or from this phenotype to another one.
• The needle in the haystack: Populations evolving on large neutral networks do not adapt
more quickly than those evolving on smaller networks (due to a larger search space).

Phenotype abundance of “real” RNAs
• The phenotypic abundance correlates (b) with the contiguity statistic measure (a).
• This estimator is used to evaluate the abundance of phenotypes in RNAs from Rfam.
• Higher values on Rfam than on sequences with the same length and base composition.
Figure from (Cowperthwaite&Meyers,2007)

Evolutionary dynamics
• The model favors mutations evolving toward
the target shape.
• Short periods of rapid phenotypic change are punctuated by long periods of stasis.
• Two types of transitions: Continuous
(nearby phenotypes) and Discontinuous [...]
• Continuous transitions appear essentially in the initial period of the simulation, while
discontinuous transitions are predominant later.
• These phenomena are mediated by neutral drift toward genotypes that can radically change the
phenotype through a single mutation. But such sequences are hard to find.
Figure from (Cowperthwaite&Meyers,2007)

Genetic robustness
• Sequences carrying phenotypes should be robust to environmental and genetic perturbations.
• Unlike environmental robustness, genetic robustness is hard to justify. 3 potential scenarios:
a. Adaptive robustness: natural selection.
b. Intrinsic robustness: correlated byproduct of character selection.
c. Congruent robustness: correlated byproduct of selection for environmental robustness.
• Adaptive robustness (a) is possible. The trans-generational cost of deleterious mutations drives
sequences into the heart of the neutral network.
• Congruent robustness (c) is tested using the plastic model. Simulations showed that models
targeting a shape lead to a reduction of plasticity. They also highlight a slow-down and possible
halting of the evolutionary process.
• Reduction of plasticity leads to an extreme modularity (side-effect?).

Plastogenetic congruence
(1) A → A’: makes [...] the m.f.e.
(2) A → B: makes [...] stronger, [...] exits.
(3) B → B’: same mutation brings [...] back,
but keeps [...] on top.
(1) correlates structures in the plastic
repertoire to mutational neighbors.
(3) shows the epistatic control of
neutrality. The more time spent in the m.f.e.,
the higher the fraction of neutral neighbors.
• “plastogenetic congruence”: the set of
shapes realized by a sequence correlates to
the m.f.e. shapes of 1-mutants.
• RNAs insensitive to thermal noise are
also insensitive to mutations.
List suboptimal structures and weight them by the
time spent by the molecule in that fold (energy).
Figure from (Ancel&Fontana,2000)

Survival of the flattest
• How do mutation rates (rapidity of mutations) shape evolution?
• Under low mutation rates, fitness considerations dictate dynamics.
• Under high mutation rates, the breadth of the neutral network can be as important as,
or more important than, the fitness: the survival of the flattest.
• Simulations showed that populations having evolved under low mutation rates have a
better adaptation potential than populations having always evolved under a high
mutation rate (Wilke et al.,2001).
• Genotypes located in flatter regions are more robust to mutations.

Local mutational structure
• Theory and computational experiments differ on the distribution of beneficial mutations.
While the beneficial effect of mutations is predicted to be exponentially distributed, in-silico
experiments showed an overabundance of small-effect mutations.
• Although they tend to be eliminated, at high mutation rates deleterious mutations
(mutations radically changing the structure) are fixed through compensatory evolution. In
other words evolution tends to “repair” the damage… sometimes even before.
• Epistasis regulates the effect of mutations.

Epistasis in RNA model
• Presence, magnitude and direction of
epistasis are key elements of many [...]
• Antagonistic epistasis: simultaneous
mutations produce a smaller effect than
the sum of their individual effects.
• Synergistic epistasis: the effect is greater
than the sum.
• In RNA models antagonistic epistasis
seems dominant. The rate of fitness loss
decreases as deleterious mutations
accumulate, regardless of their order.
Figure from (Cowperthwaite&Meyers,2007) ...
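The two definitions above can be written as a small calculation. This is a minimal sketch on an additive scale, matching the wording "sum of individual effects" (epistasis is also often measured on a multiplicative or log scale, which is not shown here). The function names, the fitness values and the tolerance are illustrative assumptions.

```python
def epistasis(w_wild, w_a, w_b, w_ab):
    """Deviation of the double-mutant effect from the sum of single effects.

    Effects are fitness losses relative to the wild type. A negative value
    means the double mutant loses less than expected, a positive value
    means it loses more.
    """
    effect_a = w_wild - w_a
    effect_b = w_wild - w_b
    effect_ab = w_wild - w_ab
    return effect_ab - (effect_a + effect_b)


def classify_epistasis(eps, tol=1e-9):
    """Label the sign of the deviation, treating tiny values as zero."""
    if eps > tol:
        return "synergistic"
    if eps < -tol:
        return "antagonistic"
    return "none"


# Single mutants lose 0.2 and 0.3, the double mutant loses only 0.4.
print(classify_epistasis(epistasis(1.0, 0.8, 0.7, 0.6)))  # antagonistic
# Single mutants lose 0.1 each, the double mutant loses 0.4.
print(classify_epistasis(epistasis(1.0, 0.9, 0.9, 0.6)))  # synergistic
```

The tolerance avoids labeling floating-point noise as epistasis when the double-mutant effect equals the sum of the single effects.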
This note was uploaded on 06/16/2011 for the course MATH 18.417 taught by Professor Jérôme Waldispühl during the Spring '11 term at MIT.