Journal of Economic Perspectives, Volume 25, Number 4, Fall 2011, Pages 57–82

Molecular Genetics and Economics†

Jonathan P. Beauchamp, David Cesarini, Magnus
Johannesson, Matthijs J. H. M. van der Loos,
Philipp D. Koellinger, Patrick J. F. Groenen, James
H. Fowler, J. Niels Rosenquist, A. Roy Thurik, and
Nicholas A. Christakis

Jonathan P. Beauchamp just completed his Ph.D. in Economics at Harvard University,
Cambridge, Massachusetts. David Cesarini is Assistant Professor of Economics, Center
for Experimental Social Science, New York University, New York City, New York, and an
Affiliated Researcher, Institute for Industrial Economics (IFN), Stockholm, Sweden. Magnus
Johannesson is Professor of Economics, Stockholm School of Economics, Stockholm, Sweden.
Matthijs J. H. M. van der Loos is a Ph.D. candidate in Applied Economics, Erasmus
School of Economics, Rotterdam, Netherlands. Philipp D. Koellinger is Assistant Professor
of Economics, Erasmus School of Economics, Rotterdam, Netherlands. Patrick J. F. Groenen
is Professor of Statistics, Erasmus School of Economics, Rotterdam, Netherlands. James H.
Fowler is Professor of Medical Genetics and Political Science, University of California at San
Diego, La Jolla, California. J. Niels Rosenquist is Instructor at Harvard Medical School
and the Massachusetts General Hospital’s Psychiatric and Neurodevelopmental Genetics
Unit and a Research Fellow at the Institute for Quantitative Social Science at Harvard,
Cambridge, Massachusetts. A. Roy Thurik is Professor of Economics and Entrepreneurship,
Erasmus School of Economics, Rotterdam, Netherlands. Nicholas A. Christakis is Professor
of Medicine and of Medical Sociology, Harvard Medical School, Cambridge, Massachusetts.
The corresponding author is 〈[email protected]〉.
■ † To access the Appendix, see doi:10.1257/jep.25.4.57.

The question of how traits and behaviors pass from one generation to the
next has been the subject of intense interest throughout the history of
science. Simple parent–child correlations are open to multiple interpretations, as parents transmit both environment and genome to their children. Until
recently, genotyping, or the direct measurement of variation in an individual’s DNA
sequence through biological assays, was exorbitantly expensive; distinguishing the
roles of genetics and environment was the realm of behavioral genetics, in which
samples of twin, adoption, or other pedigree data were analyzed. However, with the
completion of the Human Genome Project in the early 2000s (Venter et al., 2001;
Lander et al., 2001) and the advent of inexpensive, genome-wide scans of variation, it is now increasingly feasible to examine specific genetic variants that predict
individual differences directly.
In fact, the costs of comprehensively genotyping human subjects have fallen
to the point where major funding bodies, even in the social sciences, are beginning to incorporate genetic and biological markers into major social surveys. The
National Longitudinal Study of Adolescent Health, the Wisconsin Longitudinal
Study, and the Health and Retirement Study have launched, or are in the process
of launching, datasets with comprehensively genotyped subjects. Similar efforts
are also underway in Europe, for example with the Biobank Project in the United
Kingdom (Ollier, Sprosen, and Peakman, 2005) and the large-scale genotyping of
subjects at several European twin registries. These samples contain, or will soon
contain, data on hundreds of thousands of genetic markers for each individual in
the sample as well as, in most cases, basic economic variables. How, if at all, should
economists use and combine molecular genetic and economic data? What challenges arise when analyzing genetically informative data?
In this article, we lay out the terrain for such questions. We use the term
“genoeconomics,” originally proposed by Benjamin et al. (2007), to describe the
use of molecular genetic information in economics. To illustrate some of the challenges that researchers in this field are likely to encounter, we present results from
a “genome-wide association study” of educational attainment, one of the first of its
kind in economics.1 This type of study involves analyzing hundreds of thousands
of genetic markers and seeking to understand their association with some trait of
interest. We use a sample of 7,500 individuals from the Framingham Heart Study.
After quality controls, our dataset contains over 360,000 genetic markers per person.
Despite some initially promising results, the main findings from this dataset fail to
replicate in a second large replication sample of 9,500 people from the Rotterdam
Study, suggesting that the original results were probably spurious. These findings
are unfortunately typical in these types of studies of molecular genetics and therefore also cautionary.
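
The scale of the multiple-testing problem behind such spurious findings can be illustrated with simple arithmetic. The sketch below takes the marker count from the text; the 5 percent per-test level is an assumed convention, not a figure from the study, and the function names are hypothetical.

```python
# Illustrative arithmetic only, not the procedure used in the study.
N_MARKERS = 360_000  # markers per person after quality control (from the text)
ALPHA = 0.05         # conventional per-test significance level (an assumption)


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of tests that pass `alpha` when every null hypothesis is true.

    By linearity of expectation this holds whether or not the tests are independent.
    """
    return n_tests * alpha


def bonferroni_threshold(n_tests: int, alpha: float) -> float:
    """Per-test level that keeps the family-wise error rate at or below `alpha`."""
    return alpha / n_tests


if __name__ == "__main__":
    print("Expected false positives at the 5% level:",
          expected_false_positives(N_MARKERS, ALPHA))
    print("Bonferroni per-test threshold:",
          bonferroni_threshold(N_MARKERS, ALPHA))
```

With roughly 360,000 markers, an uncorrected 5 percent test would be expected to flag about 18,000 markers even if none were truly associated with the trait, which is why a far stricter per-test threshold is needed.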
1 Preliminary results from a genome-wide association study of educational attainment have been reported
by Posthuma et al. (2008), Beauchamp, Cesarini, Rosenquist, Fowler, and Christakis (2010), and most
recently by Martin et al. (2011). A genome-wide association study of self-employment has been initiated
by van der Loos, Koellinger, Groenen, and Thurik (2010).

The frequent replication failures in this molecular genetics literature are
likely a result of several forces, the most important of which is probably that the
samples used in research are too small to ensure that there is adequate power
to detect true associations (Ioannidis, 2005, 2007). When true effect sizes are
small, the power to detect true associations will of course be poor, and the ratio
of true to false signals will hence be low. Indeed, an important implication of the
genome-wide association study results reported in this paper is that they confirm
that common genetic variants with large main effects are likely to be extremely
rare for economic variables, which tend to be far removed from the molecular
genetic variant in the chain of causation. We perform power analyses to demonstrate this point and show that under plausible assumptions about the effect sizes
of a specific type of common variation in the human genome, samples in the tens
of thousands, perhaps more, may be necessary to detect genetic influences on
most complex economic variables in a robust manner. This insight suggests that
most existing genoeconomic studies, which are based on samples in the hundreds,
are dramatically underpowered and that we should expect a high false discovery
rate until this is remedied. Our choice of educational attainment as the outcome
variable of study was determined by the widespread availability of this characteristic in cohorts that have already been genotyped. An important next step of a
successful genoeconomic research agenda is to start measuring more biologically
proximate variables—such as preferences—in large samples. Variables which are
less distant from the genome in the chain of causation are more likely to require
smaller samples in order for genetic associations to be detected reliably, and any
detected associations are more likely to have a biologically meaningful interpretation and economically meaningful implications.
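
The link between effect size and required sample size can be sketched with a textbook normal approximation. This is not the power analysis reported in the paper: the effect sizes are hypothetical, the 80 percent power target is an assumption, and the significance level is a Bonferroni correction for the 360,000 markers mentioned earlier.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(r2: float, alpha: float, power: float = 0.80) -> int:
    """Approximate n needed to detect a marker explaining a share `r2` of trait variance.

    Normal approximation to a two-sided test of a single regression slope:
        n ~ (z_{1 - alpha/2} + z_{power})^2 * (1 - r2) / r2
    """
    if not 0.0 < r2 < 1.0:
        raise ValueError("r2 must lie strictly between 0 and 1")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_power = std_normal.inv_cdf(power)
    return ceil((z_alpha + z_power) ** 2 * (1.0 - r2) / r2)


if __name__ == "__main__":
    alpha = 0.05 / 360_000  # Bonferroni level for 360,000 markers (assumed correction)
    for r2 in (0.01, 0.001, 0.0002):  # hypothetical effect sizes, not estimates
        print(f"R^2 = {r2}: n of roughly {required_sample_size(r2, alpha)}")
```

Under these assumptions, a marker explaining one-tenth of one percent of the variance already calls for a sample in the tens of thousands, and the requirement grows in inverse proportion to the effect size.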
The empirical results in this paper are also used to discuss several other methodological issues that arise in the analysis of molecular genetic data. Our overall
assessment is cautiously optimistic: this new data source has the potential not only
to complement traditional behavioral genetic studies but also to add a new dimension to our understanding of heterogeneity in economic behaviors and outcomes,
especially when it comes to traits that are close to the underlying biology. But for
this ample potential to be realized, researchers and consumers of this literature
should be wary of the pitfalls that lie ahead (Benjamin et al., 2007). The most
urgent of these challenges is the difficulty of doing reliable inference when faced
with multiple hypothesis problems, which are on a scale that has never before been
encountered in social science.

Behavioral Genetics

Over the past few decades, behavioral geneticists have produced a compelling
array of evidence that genetic endowments influence economic behaviors, outcomes,
and preferences. The general approach in these studies is to make assumptions
about the extent to which the different sibling types share genetic and environmental
conditions and infer the fraction of variance that can be statistically accounted for
by genetic variation (heritability, denoted $h^2$), rearing conditions (common environment, denoted $c^2$), and idiosyncratic factors (unique environment, denoted $e^2$).
These studies often compare the resemblance of adoptees reared in the same
family to that of biological siblings reared in the same family, or the resemblance of
identical (monozygotic) twins, who share their entire genomes, to that of fraternal
(dizygotic) twins, who share approximately half their genomes. Sacerdote (2010) provides an accessible introduction for economists. A standard textbook is Plomin,
DeFries, McClearn, and McGuffin (2008).
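
One simple version of this variance decomposition can be sketched using Falconer's formulas for twin correlations. The sketch assumes purely additive genetic effects, equal shared environments for both twin types, and no assortative mating, so that the identical-twin correlation equals $h^2 + c^2$ and the fraternal-twin correlation equals $h^2/2 + c^2$. The function name and the example correlations are hypothetical.

```python
from typing import Dict


def falconer_ace(r_mz: float, r_dz: float) -> Dict[str, float]:
    """Decompose trait variance from twin correlations using Falconer's formulas.

    Assumes r_mz = h2 + c2 and r_dz = h2 / 2 + c2, which follows from additive
    genetic effects, equal environments across twin types, and random mating.
    """
    h2 = 2.0 * (r_mz - r_dz)  # heritability
    c2 = 2.0 * r_dz - r_mz    # common (rearing) environment
    e2 = 1.0 - r_mz           # unique environment, including measurement error
    return {"h2": h2, "c2": c2, "e2": e2}


if __name__ == "__main__":
    # Hypothetical correlations chosen for illustration only.
    shares = falconer_ace(r_mz=0.6, r_dz=0.4)
    print({name: round(value, 3) for name, value in shares.items()})
```

The three shares sum to one by construction. A negative value for `c2` in real data is usually read as a sign that one of the simplifying assumptions does not hold.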
The simplest behavioral genetic model is based on a host of strong assumptions,
including the independence of genetic and family effects, and functional form
assumptions; it fails to take assortative mating and nonlinear genetic effects into
account. In the 1970s, when the debate between environmentalists and hereditarians reached its peak, there was much controversy over whether the high heritability
estimates, especially for IQ, were artifacts of the simplistic behavioral genetic
framework that would go away in more elaborate designs and with better data. In
response, behavioral geneticists have built much richer datasets and expanded their
models, relaxing the various problematic assumptions. They have consistently found
that personality, IQ, and most other traits remain highly, or at least moderately,
correlated with genetic endowments (Bouchard and McGue, 2003). In fact, the
consensus in behavioral genetics that virtually all traits are associated with genotype
is so strong that it has been elevated to the status of a “law” (Turkheimer, 2000).2
Economic behaviors, preferences, and outcomes are no exception. Behavioral
genetic methods originally made limited inroads into economics through the work
of Taubman and coauthors (for example, Taubman, 1976), who demonstrated that
genetically identical (monozygotic) twins exhibit greater similarity than fraternal
(dizygotic) twins in both educational attainment and income. Since then, a number
of papers have followed suit in applying behavioral genetic research designs to
the study of economic outcomes. Many of these studies rely on quasi-experiments
such as adoption (Sacerdote, 2007; Björklund, Lindahl, and Plug, 2006; Björklund,
Jäntti, and Solon, 2005), twinning (Taubman, 1976; Lichtenstein, Pedersen, and
McClearn, 1992), or comparisons of multiple sibling types (Björklund, Jäntti,
and Solon, 2005). More recent work has also demonstrated that economic preferences elicited from incentivized experiments or surveys are heritable, with
estimates typically in the 20–30 percent range (Wallace, Cesarini, Johannesson,
and Lichtenstein, 2007; Cesarini, Dawes, Johannesson, Lichtenstein, and Wallace,
2009a, b). These estimates are biased downward because they do not take into
account measurement error in the preference elicitation.3 Two other studies of
portfolio choice data found heritability estimates of about 0.25–0.60 for various
financial decision-making variables (Barnea, Cronqvist, and Siegel, 2010; Cesarini,
Johannesson, Lichtenstein, Sandewall, and Wallace, 2010).

2 Estimates of heritability based on family data, such as twins, have also recently been corroborated by
techniques that utilize molecular genetic data in ingenious ways to estimate heritability (Visscher et al.,
2006; Yang et al., 2010).

3 Adjusting for noise appears to approximately double the heritability estimates (Beauchamp, Cesarini, and Johannesson, 2011). Test-retest data (unpublished) have been collected for a sample of
about 100 twins who participated in the experiments reported in Wallace, Cesarini, Johannesson, and
Lichtenstein (2007) and Cesarini, Dawes, Johannesson, Lichtenstein, and Wallace (2009a, b). These data
suggest a test-retest correlation of about 0.5. Adjusting for measurement noise would thus approximately
double the heritability estimates.

In interpreting heritability estimates, it is crucial to appreciate the possibility
that genetic effects may operate via environmental effects, because genotypes may
either evoke environmental responses or cause an individual to select a particular
environment endogenously (Becker and Tomes, 1979; Dickens and Flynn, 2001;
Fowler, Settle, and Christakis, 2011; Jencks, 1980; Ridley, 2003). This possibility has
given rise to the expression “nature via nurture”—as opposed to “nature versus
nurture.” Estimates of the behavioral genetic model can therefore be thought of as
reduced form coefficients from a more general model in which some environments
are endogenous to genotype (Dickens and Flynn, 2001; Jencks and Brown, 1977;
Jencks, 1980; Lizzeri and Siniscalchi, 2008).
As pointed out by Jencks (1980), a common mistake is to equate “genetic”
with “immutable”: the fact that a person’s DNA sequence is in some sense fixed
does not mean that the effects of that sequence are fixed. Goldberger (1979)
provides several examples of how the implications of heritability estimates have
been misstated and notes that high estimates do not imply that interventions are
doomed to failure. While genetic variation can statistically account for a moderate
to large share of the variation in income in contemporary Western societies, this does not mean
that it would be infeasible to use redistributive policies or policies that encourage
human capital formation to change the distribution of income. Heritability is a
population parameter that depends on both the environmental effects operating
in a specific population at a certain point in time and on the genetic variation in
that population. It says little about what would happen to the mean and variance
of the trait were the environment to change. Therefore, there is no contradiction
between observing a high heritability for height, say, and secular increases in height
over time as the environment changes. Heritability estimates do not tell us how
the genetic effects operate, of course, nor do they tell us much about whether the
mechanisms are easy or hard to modify. But far from being useless, as has sometimes
been asserted, heritability estimates tell us that for most traits, a sizable fraction of
the within-family resemblance can ultimately be traced to shared DNA. We suspect
that were it not for the impressive cumulative progress in behavioral genetics over
the last couple of decades, the issue would still be contentious. Figuring out how
and why genetic factors matter is an interesting scientific activity, and molecular
genetic methods are an exciting tool to bring to bear on these questions.

Elementary Molecular Genetic Concepts

Molecular genetics is the branch of genetics that studies the structure and
function of DNA at its most basic level. Recent decades have seen major advances,
allowing researchers to better understand the numerous ways in which genomes
vary between individuals. The human genome consists of 23 pairs of chromosomes
that package DNA. One member of each pair of chromosomes is inherited from the
mother and the other from the father. DNA itself consists of two strands of elementary building blocks that together form a double helix structure. The elementary building blocks, called nucleotides, each contain one of four bases—A (adenine),
C (cytosine), T (thymine), or G (guanine)—resulting in four distinct nucleotides.
Due to a property of DNA called complementarity, a nucleotide with the base A is
always paired with a nucleotide with the base T and a nucleotide with the base C
is always paired with a nucleotide with the base G, forming so-called base pairs and
holding the two strands of DNA together.
A locus is a specific position of a DNA sequence on a (pair of) chromosome(s).
A locus thus refers to a pair of base pairs (or nucleotide pairs), one base pair coming
from the paternal chromosome and the other base pair coming from the maternal
chromosome. The human genome consists of approximately three billion such
pairs of base pairs arranged into the 23 (pairs of) chromosomes. Because of complementarity, the second base of a base pair can be directly identified from knowledge
of the first one, and so it is common practice to refer to a locus as consisting of two
single bases rather than of a pair of base pairs. For example, the genotype AT-AT
would be referred to as AA or as TT.4
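
The complementarity rule and the resulting naming shortcut can be made concrete with a short sketch. The function names are hypothetical, and treating the part before the hyphen as the paternal base pair is only a labeling convention adopted for the example.

```python
from typing import Tuple

# Watson-Crick pairing as described in the text: A with T, C with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand (no reversal applied)."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def genotype_labels(base_pair_genotype: str) -> Tuple[str, str]:
    """Turn a genotype written as two base pairs, such as 'AT-AT', into the two
    equivalent single-base labels, one read from each strand."""
    paternal, maternal = base_pair_genotype.upper().split("-")
    for pair in (paternal, maternal):
        # Each base pair must consist of a base followed by its complement.
        if len(pair) != 2 or COMPLEMENT.get(pair[0]) != pair[1]:
            raise ValueError(f"{pair!r} is not a complementary base pair")
    first_strand = paternal[0] + maternal[0]
    second_strand = paternal[1] + maternal[1]
    return first_strand, second_strand


if __name__ == "__main__":
    print(complement_strand("GATTACA"))  # CTAATGT
    print(genotype_labels("AT-AT"))      # ('AA', 'TT')
```

Because either label determines the other, recording one base per chromosome loses no information.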
Genes are sequences of nucleotide base pairs that code for some types of
RNA products, many of which in turn code for proteins. These RNA products and
proteins begin cascades of interactions that regulate bodily structures and functions. Only a small portion of the genome actually consists of genes, and both
genetic variation in the genes and in the remaining portion of the genome can
account for variation in behaviors and traits. However, because of genes’ functional
importance, many researchers have focused their attention on genetic variation in
the genes; also, it is often said, loosely, that “a gene causes” a behavior or trait even
though what is meant is that genetic variation at a given locus—often not even in a
gene—accounts for some of the variation in the behavior or trait.
Humans share most, but not all, of their genetic material: approximately
99.6 percent of common genetic variants are the same when comparing any two
unrelated individuals (Kidd et al., 2008). Genetic variation comes in many forms, but
most can be traced to one of two types of mutation events. The simplest mutation
event is a base substitution, in which the base pair of a nucleotide pair is substituted
for another. Whenever a nucleotide varies at a specific locus across individuals in
the population, it is said to be a single nucleotide polymorphism, or SNP (pronounced
“snip”), with the different genetic variants of a SNP called “alleles.” Most other forms
of genetic variation are due to repeated segments of DNA. In variable number of
tandem repeat (VNTR) polymorphisms, there are differences across individuals in the
number of times that particular short segments of DNA are repeated. In copy number
variation (CNV) polymorphisms, there are differences in the number of repetitions
of a long segment of DNA—of at least 1,000 base pairs and often many more.
4 For an accessible introduction to the basic concepts in molecular genetics, see Strachan and Read
(2003) or Carey (2003).

Genotyping SNPs and other genetic variants is performed with technology that
allows high-throughput typing of hundreds of thousands of genetic variants per individual. Current technologies type around 500,000 SNPs, but versions with over one
million SNPs and other variants are already available, and this number is expected to
increase in the very near future. Within a decade, it will be possible to genotype entire
genomes at relatively low cost. Because SNPs in the vicinity of each other are often
highly correlated, it is generally possible to impute unobserved SNPs with high accuracy if a neighboring set of SNPs has been genotyped; for that reason, even though
most arrays type only a minute fraction of the three billion base pairs in the human
genome, they can in principle capture a large part of the relevant genetic variation.5
In some rare cases, a difference at a specific locus on a chromosome can singlehandedly lead...