Bi1_2009_Lecture12_full



Genomics & Genetic Engineering

DNA Interactive: http://www.dnai.org/

Using recombinant DNA technology, we can:
• Sequence all the genes in an organism.
• Amplify genes (e.g., for forensics or to find viruses).
• Express rare proteins or construct new proteins by splicing together pieces of genes, mutating genes, or synthesizing genes.
• Genetically alter animals (insert transgenes).
• Diagnose genetic diseases.
• Identify criminals or exonerate the innocent.

Genomics -- the first of the “omes”
One of the visions of modern biology is to construct accurate snapshots of the mRNA and protein profiles of an organism at different times in the cell cycle, after suffering different insults, or in diseased states.

What is a Genome?
The genome is the information set containing the totality of DNA sequence that specifies a species (on average) or an individual member of a species.
For more information, go to http://www.dnai.org/ and click on “Genome”.

An international consortium announced completion of the Human Genome Project in 2003 (started in 1990): http://www.genome.gov/11006929
• 3.2 billion base pairs (nearly 10 orders of magnitude)
• Major effort from information technology (60% of the professionals were software experts)
• . . . but annotating the human genome is not finished

Why sequence whole genomes?
• Molecular medicine
• Risk assessment
• Bioarcheology, anthropology, evolution of species, human migration
• DNA forensics
• Agriculture, livestock breeding, bioprocessing
• Energy sources, environmental applications
• Find genetic components of common traits and diseases

DNA sequencing now fully automated
• Dideoxy method: DNA synthesis in the presence of limiting amounts of chain-terminating dideoxyribonucleoside triphosphates (ddATP, ddTTP, ddCTP, ddGTP).
• New DNA strands terminate when a given nucleotide is reached, so we have a collection of DNA copies that terminate at every position in the original DNA.
• Separate newly-synthesized DNA copies by gel electrophoresis.
• Done today by robots: they mix reagents, then load, run, and read gels.
• Chain-terminators are each tagged with a different fluorescent dye.
Figures 10-7 and 10-8, Little Alberts

E.S. Lander et al. (*20 groups from US, UK, Japan, France, Germany, China): Public genome sequencing video
Draft sequence reported in 2001 by Celera, a private company: Celera genome sequencing video

Here’s how DNA is prepared for sequencing

How big is the human genome?
• The U.S. Department of Energy Human Genome Project Information Web site estimates it would take "about 9.5 years to read out loud (without stopping) the more than three billion pairs of bases in one person's genome sequence". [Source: Human Genome Project Information]
• "If our strands of DNA were stretched out in a line, the 46 chromosomes making up the human genome would extend more than six feet. If the ... length of the 100 trillion cells could be stretched out, it would be ... over 113 billion miles [182 billion kilometres]. That is enough material to reach to the sun and back 610 times." [Source: Centre for Integrated Genomics]

Getting the sequences is just the beginning
• Each sequencing run gives you the sequence of hundreds of nucleotides; genomes have billions (human genome: 3.2 billion bases; 30,000 genes). The short fragments must be pieced together in the correct order. Assembling a complete genome is an incredible computational problem!
• Annotation -- identifying genes and assigning function
  – Look for open reading frames (ORFs) that begin with an initiation codon (ATG for methionine) and end with a termination codon (TAA, TAG, or TGA).
  – Finding ORFs in eukaryotic DNA is difficult because of introns. Look for intron-exon boundaries or gene regulatory sequences. Or look at conservation across evolution.
  – Sometimes assign tentative function by homology to a gene of known function in another organism.
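The ORF search described above can be sketched in a few lines of Python. This is only a minimal illustration, not the annotation pipeline of any genome project: the function names (`find_orfs`, `reverse_complement`, `find_orfs_both_strands`) and the `min_codons` cutoff are hypothetical, and the scan ignores introns, so it is only meaningful for intron-free sequence.

```python
# Minimal ORF scan: ATG ... in-frame stop codon (TAA, TAG, TGA).
# All names and the min_codons cutoff are illustrative assumptions.

START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Scan the three reading frames of one strand.

    Returns a list of (frame, start, end) tuples; `end` is exclusive and
    includes the stop codon, so seq[start:end] is the full ORF.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # resume looking for the next ATG
    return orfs


def find_orfs_both_strands(seq, min_codons=2):
    """Run the scan on the given strand ("+") and its reverse complement ("-")."""
    return {
        "+": find_orfs(seq, min_codons),
        "-": find_orfs(reverse_complement(seq), min_codons),
    }


if __name__ == "__main__":
    example = "CCATGAAATAGGG"  # made-up sequence for illustration
    for frame, start, end in find_orfs(example):
        print(frame, example[start:end])
```

Coordinates on the "-" strand refer to the reverse-complemented string. A real annotation effort would also need the intron-exon and conservation evidence mentioned above.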
Figure: Reading frame on top strand doesn’t contain a stop codon -- might be an ORF

ORFs are not the whole story
Regulatory regions in DNA determine which genes are transcribed in which cells.

Genome comparisons
* Differences between individual humans: ~0.1%
* A humbling thought: Humans have ~30,000 genes -- are we only ~2x as complicated as a fly or a worm? What about plants?
Our genes are more complex than genes in invertebrates or plants -- more alternative splicing creates a larger number of protein products. The human proteome (full set of proteins encoded by a genome) is more complex than proteomes from these “lesser” organisms.
http://www.nature.ca/genome/03/c/20/03c_21_e.cfm#c18

Non-coding DNA
Much of the human genome is “non-coding” DNA. This resolves the “C-value paradox” -- the observation that genome size does not reflect complexity.
• Human genome: 1.5% protein-coding genes; 98.5% non-coding DNA.
  – Much of the non-coding DNA consists of transposable elements (transposons), sequences of DNA that can move to different positions within the genome; i.e., mobile genetic elements.
    • Discovered by Barbara McClintock in corn (1948).
  – Retrotransposons are one class of transposable element. They paste copies of themselves into the genome in multiple places.
    • Retrotransposon DNA is first transcribed into RNA.
    • RNA is copied into DNA by a reverse transcriptase (often encoded by the transposon itself).

This should sound familiar…

"Dr.
Watson requested that all gene information about apolipoprotein E be redacted, citing concerns about the association that has been shown with Alzheimer's disease. These data were redacted and were not analysed by the research team."

Pace of sequencing
• 2001, 2003: Draft, then complete, human genome sequence reported by Public Genome Sequencing Project (20 groups; ~$300M) and Celera (private company). 13 year effort.
• Spring 2008: James Watson’s genome -- 2 months; ~$1M
• Fall 2008: 8 weeks; ~$250K
  – Now have four human genome sequences
    • Two European-origin (Craig Venter and James Watson); 1 West African, 1 Han Chinese

DNA sequencing on a surface allows many sequences to be determined at once
Illumina sequencing instruments: 10^3 - 10^4 times more sequence per operation cycle than first instruments (~5 x 10^7 single 35-base reads)
Figure (panel labels): Form clonal single-molecule array; Multiple cycles of annealing, extension and denaturation to make clusters of identical DNA fragments; Remove original strand; Make new strand; Generate short fragments; Add forked linkers; Amplify; Anneal ssDNA to primer on surface, copy to make new strand; Cleave to linearize; Read sequence; Remove product of Read 1 in order to sequence other strand
DR Bentley et al. Nature 456, 53-59 (2008) doi:10.1038/nature07484
http://www.illumina.com/pages.ilmn?ID=203 (click on the “(click for sequencing-by-synthesis demo)” link at the bottom of the page)

Pace of sequencing
• 2001: Draft human genome sequence reported by Public Genome Sequencing Project (20 groups) and Celera (private company). >10 year effort.
• Spring 2008: James Watson’s genome -- 2 months; ~$1M
• Fall 2008: 8 weeks; ~$250K
  – Now have four human genome sequences
• What’s new: single molecule sequencing (no amplification, so no introduced errors)
  – Detect fluorescent bases built by enzymes into DNA
    • http://www.helicosbio.com/Technology/TrueSingleMoleculeSequencing/tabid/64/Default.aspx (click on the video “How It Works”)
  – Nanopores: Read DNA as it threads through a tiny hole

α-Hemolysin Nanopores
Sanderson, K. (2008) Standards and Pores. Nature 456: 23-25.

Bacterial pathogens secrete pore-forming toxins that kill cells by poking holes in membranes
• (Above) Toxin (α-hemolysin) from Staphylococcus aureus, causative agent of life-threatening infections; now often resistant to most antibiotics [MRSA - methicillin-resistant Staphylococcus aureus]. Attacks red blood cells.
• Related toxins are involved in gangrene.
• Cholera (Vibrio cholerae) symptoms result from a toxin that uses a pore to insert an enzyme (ADP ribosylase) into the host cell cytoplasm. Ribosylates G proteins --> raises cAMP levels, causes massive fluid and electrolyte efflux --> dehydration and often death.
Figure: Cholera toxin

Protein-based nanopores
• Staphylococcus aureus α-hemolysin pore
• DNA passes through pore -- interrupts flow of ions
• Changes in ion flow through pore read as electrical signal
• Put exonuclease at pore’s mouth
• Pore has cyclodextrin plug -- phosphate group on nucleotide binds briefly and blocks pore. Each base gives distinct readout.
  – Should work to detect modified bases (e.g., 5-methylcytosine involved in gene regulation)
• Scale up to make multichannel sequencer
  – Prototype: 10 cm^2 chip with 128 pores

The new genome vision
“New technologies that can sequence the entire genome of any person for less than $1,000.” http://www.genome.gov/11006929
Can’t yet routinely sequence entire genomes, so look at SNPs (single-nucleotide polymorphisms)
  – Usually two alleles for each SNP
  – Minor allele frequency usually 1%
  – ~8 x 10^6 common SNPs in European population

Clicker question
Domestic dogs exhibit a greater diversity in size than any other terrestrial vertebrate. What might explain the size difference?
1) Differences in a genetic locus on the X chromosome encoding ~50 size-determining proteins
2) Large dogs have more chromosomes
3) Small dogs do not respond to canine growth hormone
4) A single nucleotide polymorphism (SNP) in the insulin-like growth factor gene
5) Small dogs have been bred with rats

Genome-wide association studies
• Wellcome Trust Case Control Consortium -- DNA from 17,000 people in UK
• Try to find genetic variations for bipolar disorder, Crohn’s disease, coronary heart disease, hypertension, rheumatoid arthritis, type 1 diabetes, type 2 diabetes
• Look for SNPs (single-nucleotide polymorphisms)
“Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls” (2007) Nature 447, 661-678.
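To make the case-versus-control comparison concrete, here is a rough Python sketch of scoring a single two-allele SNP with an allelic chi-square test and converting the result to -log10(P), the quantity usually plotted in association studies. It is an assumption-laden illustration, not the statistical procedure of the Wellcome Trust study: the function names are hypothetical, the allele counts in the usage lines are made up, and every expected count is assumed to be nonzero.

```python
import math


def minor_allele_frequency(count_a, count_b):
    """Frequency of the rarer of two alleles, given their counts."""
    return min(count_a, count_b) / (count_a + count_b)


def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square on a 2x2 table of allele counts (1 degree of freedom).

    Rows are cases/controls, columns are allele A / allele B.
    Returns (chi2, p_value). Assumes no row or column total is zero.
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    chi2 = 0.0
    for observed, row, col in (
        (case_a, row_case, col_a),
        (case_b, row_case, col_b),
        (ctrl_a, row_ctrl, col_a),
        (ctrl_b, row_ctrl, col_b),
    ):
        expected = row * col / n
        chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, the chi-square tail probability
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def neg_log10_p(p_value):
    """Transform a p-value to the -log10(P) scale."""
    return -math.log10(p_value)


if __name__ == "__main__":
    # Made-up allele counts: 60 A / 40 B in cases, 40 A / 60 B in controls.
    chi2, p = allelic_chi_square(60, 40, 40, 60)
    print(chi2, p, neg_log10_p(p))
```

With millions of SNPs tested, a real study would also need a multiple-testing threshold far stricter than the usual 0.05; that step is not shown here.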
Figure (axis label: -log10(P)): Compare regions of the genome between matched cohorts with and without the disease

Personal Genome Project (PGP)
• See http://www.personalgenomes.org/
• Sequence & post genomes along with medical, mental & physical characteristics
• >9500 have volunteered. Want 100,000

Commercial gene-testing services
• 23andMe (2008 Time Magazine Invention of the Year)
  – Get genetic profile based on saliva swab ($399).
  – Information on 90 traits (baldness to blindness)
• deCODEme
• Promethease -- free online program to analyze the data (from e.g., 23andMe or deCODEme)
  – Uses data compiled in a wiki called SNPedia
1/11/09 article by Steven Pinker describing analysis of his genes. (Link to full text posted with today’s lecture slides)

Genomics and HIV
• Rapid sequencing of viral genomes
  – Study transmission, mutation, drug resistance
• Genome-wide association studies / genome sequencing of “elite controllers” (“long-term non-progressors”)
  – Elite controllers: ~1 in 3000
  – Maintain <50 HIV/mL without anti-retroviral drugs (typically 10^4-10^6 HIV/mL before drugs)

Sequel to PCR song: http://bio-rad.cnpg.com/Video/flatFiles/799/
...

This note was uploaded on 09/25/2010 for the course BIO 1 taught by Professor Bakorman during the Spring '09 term at Caltech.
