Journal of Economic Perspectives, Volume 25, Number 4, Fall 2011, Pages 57–82

Molecular Genetics and Economics†

Jonathan P. Beauchamp, David Cesarini, Magnus Johannesson, Matthijs J. H. M. van der Loos, Philipp D. Koellinger, Patrick J. F. Groenen, James H. Fowler, J. Niels Rosenquist, A. Roy Thurik, and Nicholas A. Christakis

■ Jonathan P. Beauchamp just completed his Ph.D. in Economics at Harvard University, Cambridge, Massachusetts. David Cesarini is Assistant Professor of Economics, Center for Experimental Social Science, New York University, New York City, New York, and an Affiliated Researcher, Institute for Industrial Economics (IFN), Stockholm, Sweden. Magnus Johannesson is Professor of Economics, Stockholm School of Economics, Stockholm, Sweden. Matthijs J. H. M. van der Loos is a Ph.D. candidate in Applied Economics, Erasmus School of Economics, Rotterdam, Netherlands. Philipp D. Koellinger is Assistant Professor of Economics, Erasmus School of Economics, Rotterdam, Netherlands. Patrick J. F. Groenen is Professor of Statistics, Erasmus School of Economics, Rotterdam, Netherlands. James H. Fowler is Professor of Medical Genetics and Political Science, University of California at San Diego, La Jolla, California. J. Niels Rosenquist is Instructor at Harvard Medical School and the Massachusetts General Hospital’s Psychiatric and Neurodevelopmental Genetics Unit and a Research Fellow at the Institute for Quantitative Social Science at Harvard, Cambridge, Massachusetts. A. Roy Thurik is Professor of Economics and Entrepreneurship, Erasmus School of Economics, Rotterdam, Netherlands. Nicholas A. Christakis is Professor of Medicine and of Medical Sociology, Harvard Medical School, Cambridge, Massachusetts. The corresponding author is 〈[email protected]〉.

† An Appendix accompanies this article online. doi: 10.1257/jep.25.4.57

The question of how traits and behaviors pass from one generation to the next has been the subject of intense interest throughout the history of science. Simple parent–child correlations are open to multiple interpretations, as parents transmit both environment and genome to their children. Until recently, genotyping (the direct measurement of variation in an individual’s DNA sequence through biological assays) was exorbitantly expensive; distinguishing the roles of genetics and environment was the realm of behavioral genetics, in which samples of twin, adoption, or other pedigree data were analyzed. However, with the completion of the Human Genome Project in the early 2000s (Venter et al., 2001; Lander et al., 2001) and the advent of inexpensive, genome-wide scans of variation, it is now increasingly feasible to examine specific genetic variants that predict individual differences directly.

In fact, the costs of comprehensively genotyping human subjects have fallen to the point where major funding bodies, even in the social sciences, are beginning to incorporate genetic and biological markers into major social surveys. The National Longitudinal Study of Adolescent Health, the Wisconsin Longitudinal Study, and the Health and Retirement Study have launched, or are in the process of launching, datasets with comprehensively genotyped subjects. Similar efforts are also underway in Europe, for example with the Biobank Project in the United Kingdom (Ollier, Sprosen, and Peakman, 2005) and the large-scale genotyping of subjects at several European twin registries. These samples contain, or will soon contain, data on hundreds of thousands of genetic markers for each individual in the sample as well as, in most cases, basic economic variables.

How, if at all, should economists use and combine molecular genetic and economic data?
What challenges arise when analyzing genetically informative data? In this article, we lay out the terrain for such questions. We use the term “genoeconomics,” originally proposed by Benjamin et al. (2007), to describe the use of molecular genetic information in economics. To illustrate some of the challenges that researchers in this field are likely to encounter, we present results from a “genome-wide association study” of educational attainment, one of the first of its kind in economics.¹ This type of study involves analyzing hundreds of thousands of genetic markers and seeking to understand their association with some trait of interest. We use a sample of 7,500 individuals from the Framingham Heart Study. After quality controls, our dataset contains over 360,000 genetic markers per person. Despite some initially promising results, the main findings from this dataset fail to replicate in a second large sample of 9,500 people from the Rotterdam Study, suggesting that the original results were probably spurious. Such findings are unfortunately typical of molecular genetic studies of this type and should therefore be read as cautionary.

¹ Preliminary results from a genome-wide association study of educational attainment have been reported by Posthuma et al. (2008), Beauchamp, Cesarini, Rosenquist, Fowler, and Christakis (2010), and most recently by Martin et al. (2011). A genome-wide association study of self-employment has been initiated by van der Loos, Koellinger, Groenen, and Thurik (2010).

The frequent replication failures in this molecular genetics literature are likely a result of several forces, the most important of which is probably that the samples used in research are too small to ensure adequate power to detect true associations (Ioannidis, 2005, 2007). When true effect sizes are small, the power to detect true associations will be poor, and the ratio of true to false signals will hence be low. Indeed, an important implication of the genome-wide association study results reported in this paper is that they confirm that common genetic variants with large main effects are likely to be extremely rare for economic variables, which tend to be far removed from the molecular genetic variant in the chain of causation. We perform power analyses to demonstrate this point and show that under plausible assumptions about the effect sizes of a specific type of common variation in the human genome, samples in the tens of thousands, perhaps more, may be necessary to detect genetic influences on most complex economic variables in a robust manner. This insight suggests that most existing genoeconomic studies, which are based on samples in the hundreds, are dramatically underpowered and that we should expect a high false discovery rate until this is remedied.

Our choice of educational attainment as the outcome variable of study was determined by the widespread availability of this characteristic in cohorts that have already been genotyped. An important next step for a successful genoeconomic research agenda is to start measuring more biologically proximate variables (such as preferences) in large samples. Variables that are less distant from the genome in the chain of causation are likely to require smaller samples in order for genetic associations to be detected reliably, and any detected associations are more likely to have a biologically meaningful interpretation and economically meaningful implications. The empirical results in this paper are also used to discuss several other methodological issues that arise in the analysis of molecular genetic data.
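The sample-size argument above can be illustrated with a rough power calculation. The sketch below is not the authors' power analysis: it assumes a two-sided z-test on a single marker, a Bonferroni correction over 360,000 markers at a 5 percent family-wise level, and a purely hypothetical marker that explains 0.05 percent of the variance in the trait (`r2=0.0005`). The function name `gwas_power` and all parameter names are illustrative.

```python
from statistics import NormalDist


def gwas_power(n, r2, n_tests=360_000, family_alpha=0.05):
    """Approximate power to detect one marker explaining share r2 of trait variance.

    Assumptions (not taken from the article): normal approximation to the
    test statistic and a Bonferroni-corrected two-sided significance threshold.
    """
    std_normal = NormalDist()
    alpha = family_alpha / n_tests               # per-test threshold after Bonferroni
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-sided critical value
    ncp = (n * r2 / (1 - r2)) ** 0.5             # expected z-statistic if the effect is real
    # Probability that the statistic falls beyond either critical value.
    return std_normal.cdf(ncp - z_crit) + std_normal.cdf(-ncp - z_crit)


# Hypothetical effect size: the marker explains 0.05 percent of the variance.
for n in (500, 7_500, 50_000, 200_000):
    print(f"n = {n:>7,}: power = {gwas_power(n, r2=0.0005):.4f}")
```

Under these assumptions, power is negligible for samples in the hundreds or the low thousands and only approaches one for samples well into the tens or hundreds of thousands, which is the qualitative pattern the text describes. The exact numbers depend heavily on the assumed effect size and significance threshold.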
Our overall assessment is cautiously optimistic: this new data source has the potential not only to complement traditional behavioral genetic studies but also to add a new dimension to our understanding of heterogeneity in economic behaviors and outcomes, especially when it comes to traits that are close to the underlying biology. But for this ample potential to be realized, researchers and consumers of this literature should be wary of the pitfalls that lie ahead (Benjamin et al., 2007). The most urgent of these challenges is the difficulty of doing reliable inference when faced with multiple hypothesis problems, which are on a scale that has never before been encountered in social science.

Behavioral Genetics

Over the past few decades, behavioral geneticists have produced a compelling array of evidence that genetic endowments influence economic behaviors, outcomes, and preferences. The general approach in these studies is to make assumptions about the extent to which the different sibling types share genetic and environmental conditions and infer the fraction of variance that can be statistically accounted for by genetic variation (heritability, denoted h²), rearing conditions (common environment, denoted c²), and idiosyncratic factors (unique environment, denoted e²). These studies often compare the resemblance of adoptees reared in the same family to that of biological siblings reared in the same family, or the resemblance of identical (monozygotic) twins, who share their entire genomes, to that of fraternal (dizygotic) twins, who share approximately half their genomes. Sacerdote (2010) provides an accessible introduction for economists. A standard textbook is Plomin, DeFries, McClearn, and McGuffin (2008).
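To make the variance decomposition concrete, the sketch below uses the classic twin-correlation estimator often called Falconer's formulas. The article does not state these formulas, so this is an illustration under the textbook assumptions (additive genetic effects, equal shared environments for both twin types, no assortative mating) rather than the authors' method. The function name `falconer_ace` and the twin correlations in the example are hypothetical.

```python
def falconer_ace(r_mz, r_dz):
    """Split trait variance into h2, c2, and e2 from twin correlations.

    r_mz: trait correlation between identical (monozygotic) twins
    r_dz: trait correlation between fraternal (dizygotic) twins
    Assumes identical twins share all, and fraternal twins half, of the
    additive genetic variation, and that both types share rearing
    conditions to the same extent.
    """
    h2 = 2 * (r_mz - r_dz)   # heritability
    c2 = 2 * r_dz - r_mz     # common (rearing) environment
    e2 = 1 - r_mz            # unique environment, including measurement error
    return h2, c2, e2


# Hypothetical correlations chosen only for illustration.
h2, c2, e2 = falconer_ace(r_mz=0.6, r_dz=0.4)
print(f"h2 = {h2:.2f}, c2 = {c2:.2f}, e2 = {e2:.2f}")
```

The three shares sum to one by construction. Nothing constrains `c2` to be nonnegative, so implausible inputs can produce a negative share, which is one visible symptom of the model's strong assumptions being violated.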
The simplest behavioral genetic model is based on a host of strong assumptions, including the independence of genetic and family effects, and functional form assumptions; it fails to take assortative mating and nonlinear genetic effects into account. In the 1970s, when the debate between environmentalists and hereditarians reached its peak, there was much controversy over whether the high heritability estimates, especially for IQ, were artifacts of the simplistic behavioral genetic framework that would go away in more elaborate designs and with better data. In response, behavioral geneticists have built much richer datasets and expanded their models, relaxing the various problematic assumptions. They have consistently found that personality, IQ, and most other traits remain highly, or at least moderately, correlated with genetic endowments (Bouchard and McGue, 2003). In fact, the consensus in behavioral genetics that virtually all traits are associated with genotype is so strong that it has been elevated to the status of a “law” (Turkheimer, 2000).² Economic behaviors, preferences, and outcomes are no exception.

Behavioral genetic methods originally made limited inroads into economics through the work of Taubman and coauthors (for example, Taubman, 1976), who demonstrated that genetically identical (monozygotic) twins exhibit greater similarity than fraternal (dizygotic) twins in both educational attainment and income. Since then, a number of papers have followed suit in applying behavioral genetic research designs to the study of economic outcomes. Many of these studies rely on quasi-experiments such as adoption (Sacerdote, 2007; Björklund, Lindahl, and Plug, 2006; Björklund, Jäntti, and Solon, 2005), twinning (Taubman, 1976; Lichtenstein, Pedersen, and McClearn, 1992), or comparisons of multiple sibling types (Björklund, Jäntti, and Solon, 2005).
More recent work has also demonstrated that economic preferences elicited from incentivized experiments or surveys are heritable, with estimates typically in the 20–30 percent range (Wallace, Cesarini, Johannesson, and Lichtenstein, 2007; Cesarini, Dawes, Johannesson, Lichtenstein, and Wallace, 2009a, b). These estimates are biased downward because they do not take into account measurement error in the preference elicitation.³ Two other studies of portfolio choice data found heritability estimates of about 0.25–0.60 for various financial decision-making variables (Barnea, Cronqvist, and Siegel, 2010; Cesarini, Johannesson, Lichtenstein, Sandewall, and Wallace, 2010).

² Estimates of heritability based on family data, such as twins, have also recently been corroborated by techniques that utilize molecular genetic data in ingenious ways to estimate heritability (Visscher et al., 2006; Yang et al., 2010).

³ Adjusting for noise appears to approximately double the heritability estimates (Beauchamp, Cesarini, and Johannesson, 2011). Test-retest data (unpublished) have been collected for a sample of about 100 twins who participated in the experiments reported in Wallace, Cesarini, Johannesson, and Lichtenstein (2007) and Cesarini, Dawes, Johannesson, Lichtenstein, and Wallace (2009a, b). These data suggest a test-retest correlation of about 0.5. Adjusting for measurement noise would thus approximately double the heritability estimates.

In interpreting heritability estimates, it is crucial to appreciate the possibility that genetic effects may operate via environmental effects, because genotypes may either evoke environmental responses or cause an individual to select a particular environment endogenously (Becker and Tomes, 1979; Dickens and Flynn, 2001; Fowler, Settle, and Christakis, 2011; Jencks, 1980; Ridley, 2003).
This possibility has given rise to the expression “nature via nurture”—as opposed to “nature versus nurture.” Estimates of the behavioral genetic model can therefore be thought of as reduced form coefficients from a more general model in which some environments are endogenous to genotype (Dickens and Flynn, 2001; Jencks and Brown, 1977; Jencks, 1980; Lizzeri and Siniscalchi, 2008). As pointed out by Jencks (1980), a common mistake is to equate “genetic” with “immutable”: the fact that a person’s DNA sequence is in some sense fixed does not mean that the effects of that sequence are fixed. Goldberger (1979) provides several examples of how the implications of heritability estimates have been misstated and notes that high estimates do not imply that interventions are doomed to failure. While genetic variation can statistically account for a moderate to large share of income in contemporary Western societies, this does not mean that it would be infeasible to use redistributive policies or policies that encourage human capital formation to change the distribution of income. Heritability is a population parameter that depends on both the environmental effects operating in a specific population at a certain point in time and on the genetic variation in that population. It says little about what would happen to the mean and variance of the trait were the environment to change. Therefore, there is no contradiction between observing a high heritability for height, say, and secular increases in height over time as the environment changes. Heritability estimates do not tell us how the genetic effects operate, of course, nor do they tell us much about whether the mechanisms are easy or hard to modify. But far from being useless, as has sometimes been asserted, heritability estimates tell us that for most traits, a sizable fraction of the within-family resemblance can ultimately be traced to shared DNA. 
We suspect that were it not for the impressive cumulative progress in behavioral genetics over the last couple of decades, the issue would still be contentious. Figuring out how and why genetic factors matter is an interesting scientific activity, and molecular genetic methods are an exciting tool to bring to bear on these questions.

Elementary Molecular Genetic Concepts

Molecular genetics is the branch of genetics that studies the structure and function of DNA at its most basic level. Recent decades have seen major advances, allowing researchers to better understand the numerous ways in which genomes vary between individuals. The human genome consists of 23 pairs of chromosomes that package DNA. One member of each pair of chromosomes is inherited from the mother and the other from the father. DNA itself consists of two strands of elementary building blocks that together form a double helix structure. The elementary building blocks, called nucleotides, each contain one of four bases: A (adenine), C (cytosine), T (thymine), or G (guanine). There are thus four distinct nucleotides. Due to a property of DNA called complementarity, a nucleotide with the base A is always paired with a nucleotide with the base T and a nucleotide with the base C is always paired with a nucleotide with the base G, forming so-called base pairs and holding the two strands of DNA together.

A locus is a specific position of a DNA sequence on a (pair of) chromosome(s). A locus thus refers to a pair of base pairs (or nucleotide pairs), one base pair coming from the paternal chromosome and the other base pair coming from the maternal chromosome. The human genome consists of approximately three billion such pairs of base pairs arranged into the 23 (pairs of) chromosomes.
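The pairing rule just described (A with T, C with G) means that one strand fully determines the other. A minimal Python sketch of that rule follows; the names `COMPLEMENT` and `complement_strand` are illustrative, and the function returns the position-by-position partner bases without reversing the strand's direction.

```python
# Base-pairing rule from the text: A pairs with T, and C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand):
    """Return the partner base at each position of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand)


one_strand = "ACGT"
other_strand = complement_strand(one_strand)
print(one_strand, "pairs with", other_strand)
```

Because the mapping is its own inverse, applying `complement_strand` twice returns the original strand, which is why recording a single base per chromosome loses no information.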
Because of complementarity, the second base of a base pair can be directly identified from knowledge of the first one, and so it is common practice to refer to a locus as consisting of two single bases rather than of a pair of base pairs. For example, the genotype AT-AT would be referred to as AA or as TT.⁴

Genes are sequences of nucleotide base pairs that code for some types of RNA products, many of which in turn code for proteins. These RNA products and proteins begin cascades of interactions that regulate bodily structures and functions. Only a small portion of the genome actually consists of genes, and genetic variation both in the genes and in the remaining portion of the genome can account for variation in behaviors and traits. However, because of genes’ functional importance, many researchers have focused their attention on genetic variation in the genes; also, it is often said, loosely, that “a gene causes” a behavior or trait even though what is meant is that genetic variation at a given locus (often not even in a gene) accounts for some of the variation in the behavior or trait.

Humans share most, but not all, of their genetic material: approximately 99.6 percent of common genetic variants are the same when comparing any two unrelated individuals (Kidd et al., 2008). Genetic variation comes in many forms, but most can be traced to one of two types of mutation events. The simplest mutation event is a base substitution, in which one base pair is substituted for another. Whenever a nucleotide varies at a specific locus across individuals in the population, it is said to be a single nucleotide polymorphism, or SNP (pronounced “snip”), with the different genetic variants of a SNP called “alleles.” Most other forms of genetic variation are due to repeated segments of DNA.
In variable number of tandem repeat (VNTR) polymorphisms, there are differences across individuals in the number of times that particular short segments of DNA are repeated. In copy number variation (CNV) polymorphisms, there are differences in the number of repetitions of a long segment of DNA (at least 1,000 base pairs and often many more).

⁴ For an accessible introduction to the basic concepts in molecular genetics, see Strachan and Read (2003) or Carey (2003).

Genotyping SNPs and other genetic variants is performed with technology that allows high-throughput typing of hundreds of thousands of genetic variants per individual. Current technologies type around 500,000 SNPs, but versions with over one million SNPs and other variants are already available, and this number is expected to increase in the very near future. Within a decade, it will be possible to genotype entire genomes at relatively low cost. Because SNPs in the vicinity of each other are often highly correlated, it is generally possible to impute unobserved SNPs with high accuracy if a neighboring set of SNPs has been genotyped; for that reason, even though most arrays type only a minute fraction of the three billion base pairs in the human genome, they can in principle capture a large part of the relevant genetic variation.⁵

In some rare cases, a difference at a specific locus on a chromosome can singlehandedly lead...