MCB 121 2010 Dr. Ted Powers Lecture 4: Genome Sequencing & Analysis
Reading: Watson: pp. 135-144; 758-764
Optional Mol. Biol. Cell 4th Ed online (free) http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=mboc4 Search for: “human genome”
Primary Literature:
“Finishing the Human Genome” (2004) Nature 431: 931-945
“Shotgun Sequence Assembly” (2004) Nature 431: 927-930
Shendure & Ji (2008) Next Generation DNA Sequencing
Interview with Paul Nurse & Jim Watson about the human genome:
Web-based information for some genome projects
Bacteria (E. coli): http://www.genome.wisc.edu/
Dog (Canis familiaris): http://www.ncbi.nlm.nih.gov/projects/genome/guide/dog/
Chimpanzee (Pan troglodytes): http://www.nature.com/nature/focus/chimpgenome/index.html
Fruit Fly (D. melanogaster): http://www.fruitfly.org; http://flybase.org
Humans (Homo sapiens): http://genomics.energy.gov/; http://genome.ucsc.edu/cgi-bin/hgGateway
Mouse (Mus musculus): http://www.informatics.jax.org/
Plant (A. thaliana): http://www.arabidopsis.org/
Rat (Rattus norvegicus): http://www.genome.gov/10001859
Worm (C. elegans): http://www.wormbase.org/
Yeast (S. cerevisiae): http://www.yeastgenome.org/
The ultimate in cloning: Sequencing the human genome
Human genome project:
• large-scale application of many molecular biology techniques presented here
• principles similar to sequencing projects of other organisms
Scale of the problem:
Organism    Genome size (bp)
E. coli     4 x 10^6
Yeast       1.4 x 10^7
Human       3.3 x 10^9

Sequencing of several organisms occurred in parallel

Problem with many genome projects: Complex gene structure in higher eukaryotes
Issues:
Promoters can be difficult to identify
Introns are not always easy to detect
Consequence: the ORF cannot always be deduced from the gene sequence
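To illustrate why an undetected intron hides the true ORF, here is a minimal sketch. The sequences are hypothetical toy strings (not real genes), and `first_orf` is an illustrative helper, not a gene-finding method used by any genome project. It simply reads codons from the first ATG until the first in-frame stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_orf(seq):
    """Return the first ATG-to-stop open reading frame in seq, or None."""
    start = seq.find("ATG")
    if start == -1:
        return None
    # Walk codon by codon from the start codon.
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return seq[start:i + 3]
    return None  # no in-frame stop codon found

# Hypothetical toy sequences (assumptions for illustration only).
exon1, intron, exon2 = "ATGGCT", "GTAAGTTAACAG", "GAATTCAAATAA"
genomic = exon1 + intron + exon2   # what a genome project sequences
mrna = exon1 + exon2               # what is translated after splicing

print(first_orf(mrna))     # spans both exons
print(first_orf(genomic))  # ends early at a stop codon inside the intron
```

Reading the genomic sequence naively gives a shorter, wrong ORF, which is why intron positions must be known before the coding sequence can be deduced.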
Another problem with many genome projects: Gene Density Decreases In Higher Eukaryotes

Contribution of Introns & Repeated Sequences To Different Genomes
Species          Gene density (genes/Mb)   Ave. introns/gene   % Repetitive DNA
E. coli          950                       0                   <1
S. cerevisiae    480                       0.04                3.4
Flies            80                        3.0                 12
Humans           8.5                       6                   46
Pufferfish       8.5                       5                   2.7

Composition of the Human Genome
Very short sequences (~13 bp) (e.g. dinucleotide repeats)
Transposable elements ~100-1000 bp
Focus of genome project: “Euchromatin”
Note: coding regions account for only ~1.5% of the genome!

Human genome sequencing: technological breakthroughs
1. Large-scale automated DNA sequencing
2. New vectors for carrying large genomic inserts
   plasmids: ~15 kbp
   BACs (Bacterial Artificial Chromosomes): ~300 kbp
   YACs (Yeast Artificial Chromosomes): ~200 kbp to 2 Mbp
3. Bioinformatics: improved algorithms, computer speed and memory to handle large amounts of data
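To see why insert size matters, a rough calculation helps. This sketch uses the insert sizes listed above and a human genome size of 3.3 x 10^9 bp. The function name and the idealized assumption of perfectly tiled, non-overlapping clones are mine; real libraries need several-fold more clones.

```python
import math

GENOME_BP = 3_300_000_000  # human genome size used in this lecture

# Approximate insert capacities from the list above (YAC taken at its upper end).
VECTOR_INSERT_BP = {
    "plasmid": 15_000,
    "BAC": 300_000,
    "YAC": 2_000_000,
}

def clones_needed(genome_bp, insert_bp, fold=1):
    """Minimum clones for `fold` coverage, assuming ideal non-overlapping tiling."""
    return math.ceil(fold * genome_bp / insert_bp)

for vector, insert_bp in VECTOR_INSERT_BP.items():
    print(vector, clones_needed(GENOME_BP, insert_bp))
```

Under these assumptions a plasmid library needs on the order of hundreds of thousands of clones for a single pass over the genome, while a BAC library needs on the order of ten thousand.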
Bacterial Artificial Chromosomes (BACs): Vector of choice for genome projects
1. Derived from the “F plasmid” of E. coli, which is involved in conjugation and gene transfer
2. Encodes genes for replication and maintenance
3. Inserts are stable and undergo limited recombination
Commercially available BAC (Epicentre)

Two strategies for sequencing the human genome
1. Public consortium (>23 centers in 6 countries)
   Took 10 years (still ahead of schedule)
   Process: “BAC to BAC”, a slow, laborious method (but tried and true)
   Physical map: BACs --> Contigs --> Shotgun clones --> Sequence
   Link to cytogenetic map: contigs mapped to chromosomes using visual methods (e.g. FISH: Fluorescence In Situ Hybridization)
2. Private corporation: Celera
   Complete shotgun approach [Whole-Genome Shotgun (WGS)]

Overview of public consortium method
From Lander and coworkers
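Both strategies ultimately rely on joining overlapping shotgun reads into contigs. The following is a minimal sketch of that idea using a greedy overlap merge. It is a toy illustration with hypothetical reads, not the assembler used by the public consortium or by Celera; real assemblers must also handle sequencing errors, both strands, and repeats.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that matches a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left: the remaining reads are separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

# Hypothetical reads sampled from the toy sequence ATGGCGTACGTTAGC.
contigs = greedy_assemble(["ATGGCGTA", "CGTACGTT", "CGTTAGC"])
print(contigs)
```

Reads that share no overlap stay as separate contigs, which is the situation the physical map is meant to resolve in the BAC-to-BAC method.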
Overview of Celera’s approach
1. Made 3 genomic shotgun libraries using only small inserts (~2-50 kbp)
2. Sequenced ~500 nucleotides from each end of each insert
3. Repeated the process 27 million times (resulted in 5-fold coverage of the genome)
4. Assembled “virtual contigs” by computer
From Venter and coworkers

Comparison of the two approaches: what did we learn?
1. 2001: About 35,000-40,000 genes in the human genome (previous estimates were ~100,000 genes)
2. 2004: Revised estimate of genes: 20,000-25,000
3. “Proteome” (actual proteins produced) is much larger
   Q: How? A: post-transcriptional events
4. Hundreds of human genes are the result of horizontal gene transfer from bacteria during vertebrate evolution
5. More than 3.0 million SNPs (single nucleotide polymorphisms) identified. These will aid in identification of disease-related genes.
“Comparative Genomics” at work
Human vs mouse:
Both contain <30,000 genes
~80% of these are orthologues
<1% of mouse genes have no human orthologue
Human vs mouse vs dog:
~4% of genes are conserved in humans and dogs but not in mice
~5% of the genomes are highly conserved with each other
Of this, only ~2% is composed of coding regions (genes)
Thus, non-coding regions, regulatory elements, chromosomal structural elements are highly constrained during evolution

Mapping human and mouse genes on dog chromosomes
Chromosome number (haploid autosomes): Human: 22 Mouse: 19 Dog: 38
1. Insight into chromosome reorganization during mammalian evolution (conservation called “synteny”)
2. Facilitates comparison of different physical and genetic markers related to diseases in all three organisms (Comparative Genomic Medicine)
Genome sequences: what else is now possible?
Gene identification
   homology searches (e.g. BLAST)
   (Genotype now independent from Phenotype)
Evolution and genetic diversity
   Microbial Genome Programs (DOE & NSF)
   Craig Venter Institute (http://www.jcvi.org/)
   Sorcerer II Expedition: Environmental Genomics http://www.sorcerer2expedition.org
Human medicine and genetics
   Disease gene discovery
   “RFLP” maps & “SNP” maps
   Pharmacogenomics and Nutrigenomics
“Reverse Genetics” in model systems (e.g. mouse knockouts)
Humans & other eukaryotes
Minimum complement of genes:
Basic cell functions: 500
Free-living bacterial cell: 1,500
Free-living eukaryotic cell: 5,000
Multicellular organism: >10,000
But most genes may not be essential!

Questions:
1. How do we explain this?
2. What experimental approaches can we take?
Answer to question 2: Genome-wide synthetic lethality analysis, termed “SGA”: Synthetic Genetic Array analysis

An additional phenotypic phenomenon: Synthetic Interactions
Synthetic interactions: mutations in different genes that by themselves confer weak phenotypes, but produce strong phenotypes when combined.
Two extreme general models:
1. Null mutations in genes A or B are viable
2. Null mutations in genes A or B are lethal

Synthetic Interactions (cont.)
1. Null mutation in gene A or B is viable
Generally indicative of parallel pathways

Synthetic Interactions (cont.)
2. Null mutation in gene A or B is lethal
May be indicative of interacting gene products/linear pathway
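The definition of a synthetic interaction can be written out as a small lookup. This is a minimal sketch assuming model 1 (each single null mutation is viable, with a weak phenotype) and haploid cells. The two-letter genotype notation (uppercase = wild-type allele, lowercase = mutant allele) and the `phenotype` helper are illustrative assumptions.

```python
from itertools import product

def phenotype(genotype):
    """Classify a haploid genotype such as 'Ab' by the number of mutant alleles."""
    n_mutant = sum(1 for allele in genotype if allele.islower())
    return {0: "wild type", 1: "weak mutant", 2: "strong mutant"}[n_mutant]

# All four haploid combinations of genes A and B: AB, Ab, aB, ab.
haploids = ["".join(pair) for pair in product("Aa", "Bb")]

for g in haploids:
    print(g, "-->", phenotype(g))
```

Only the double mutant shows the strong (synthetic) phenotype; neither single mutant predicts it on its own.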
Synthetic Interactions (cont.)
Like unlinked noncomplementation, synthetic interactions can reveal direct interactions between gene products
One primary difference between these phenomena is that synthetic phenotypes can be recessive:
Ab and aB haploid mutant cells --> weak mutant phenotype
AaBb diploid cells --> wild type phenotype
ab haploid cell --> strong mutant phenotype
Question: Consider the tetratype (T) from sporulation of an AaBb diploid: AB, Ab, aB, ab
Which spore is the best to observe a synthetic phenotype?

Specific example: SET2 histone methyltransferase
Results: Novel link between transcription & histone modification
Krogan et al. MCB 2003 Vol. 23 P. 4207-4218
Red lines: synthetic lethal interactions
Blue lines: synthetic sick interactions

Next Generation Sequencing Technologies
[Figure: Standard sequencing approach / New approach]

Next Generation Sequencing Technologies
Some Major Differences: Old versus New
Source of DNA
   OLD: Bacterially propagated plasmid clone
   NEW: In vitro produced copies (PCR or other enzymatic approaches) OR no amplification (native DNA)
Sequencing technology
   OLD: Sanger dideoxyNTP chain termination
   NEW: Many options in development: 1. Polymerase-dependent 2. Ligase-dependent
Data acquisition
   OLD: Radioactivity or fluorescence detector for single “run”
   NEW: Direct imaging of simultaneous sequencing reactions
Data analysis
   OLD: Can be used for genome assembly
   NEW: Requires reference genome for analysis

Basis for dideoxysequencing approach

New chemistries for DNA sequencing
1. 454 Sequencing
2. Solexa/Illumina Sequencing
3. SOLiD (Sequencing by Oligo Ligation & Detection)
4. HeliScope sequencing

All new methods: Data acquisition requires simultaneous monitoring of multiple sequencing reactions
Surface binding of DNA to be sequenced (example is from Solexa sequencing)
Sensitive fluorescence detection systems (CCD digital camera) to monitor sequencing progression
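The idea of monitoring many surface-bound reactions at once can be sketched as follows. This is a simplified toy model, not any vendor's actual chemistry or software: each cycle produces one "image" holding a dye colour for every cluster, and reads are recovered by following each cluster across images. The colour assignments and function names are hypothetical, and the model records the template base directly rather than the complementary base that is actually incorporated.

```python
# Hypothetical dye colours, one per base (assumption for illustration).
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE = {colour: base for base, colour in DYE.items()}

def image_cycles(templates, n_cycles):
    """Return one image per cycle: the colour observed at every cluster."""
    return [[DYE[t[cycle]] for t in templates] for cycle in range(n_cycles)]

def call_reads(images):
    """Rebuild one read per cluster by following its colour across all images."""
    n_clusters = len(images[0])
    return ["".join(BASE[img[i]] for img in images) for i in range(n_clusters)]

# Three hypothetical clusters bound to the surface, sequenced for 4 cycles.
templates = ["ACGT", "TTGA", "GGCC"]
images = image_cycles(templates, 4)
print(images[0])           # colours seen in the first cycle, one per cluster
print(call_reads(images))  # reads recovered for all clusters in parallel
```

The number of images depends on read length, not on the number of clusters, which is why throughput scales with how many clusters fit in the camera's field of view.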
Advantage of new approaches: lower cost
First Human Genome sequencing projects (Public Consortium + Celera): > $300,000,000
Relative costs of new technologies for human genome sequencing
Technology          Est. Cost (40x coverage)   Instrument Cost
Sanger sequencing   $57,000,000                <$100,000
454 sequencing      $5,700,000                 ~$500,000
Solexa/Illumina     $330,000                   ~$430,000
HeliScope           $69,000                    ~$1,350,000
SOLiD               <$10,000                   ~$600,000
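The size of the cost reduction is easier to see as a ratio to Sanger sequencing. This short calculation assumes the estimated costs pair with the technologies in the order listed, and treats the SOLiD figure (given only as an upper bound, <$10,000) as exactly $10,000, so its ratio is a lower bound.

```python
# Estimated cost of a 40x human genome, in dollars, from the table above.
EST_COST = {
    "Sanger": 57_000_000,
    "454": 5_700_000,
    "Solexa/Illumina": 330_000,
    "HeliScope": 69_000,
    "SOLiD": 10_000,  # upper bound in the table, so the ratio below is a lower bound
}

# Fold reduction in cost relative to Sanger sequencing.
fold_cheaper = {tech: EST_COST["Sanger"] / cost for tech, cost in EST_COST.items()}

for tech, fold in fold_cheaper.items():
    print(f"{tech}: {fold:,.0f}x cheaper than Sanger")
```

On these figures the reduction ranges from about 10-fold for 454 to more than three orders of magnitude for SOLiD, with no comparable drop in instrument cost.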
New sequencing approaches are bringing a revolution to human genetics & medicine
Lupski et al. (2010) Whole-Genome Sequencing in a Patient with Charcot-Marie-Tooth Neuropathy. New England Journal of Medicine 362, 1181-1191
Roach et al. (2010) Analysis of Genetic Inheritance in a Family Quartet by Whole-Genome Sequencing. Science Express (Epub ahead of print)
Conclusions/Issues
1. Generational mutation rate: 60/3.2 billion base pairs (~1 x 10^-8)
2. Candidate disease-causing genes may not be adjacent to documented SNPs. This calls into question the major SNP mapping effort
3. Raises immediacy of ethical/moral issues related to human genomics
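The mutation-rate figure in conclusion 1 can be checked with simple arithmetic. The slide does not say whether the 60 new mutations are counted against one haploid genome (3.2 billion bp) or against both copies inherited by a child, so this sketch computes both. The diploid denominator is my assumption, not something stated in the lecture.

```python
new_mutations = 60                          # per generation, from conclusion 1
haploid_genome_bp = 3.2e9                   # from conclusion 1
diploid_genome_bp = 2 * haploid_genome_bp   # assumption: both inherited copies counted

rate_per_haploid_bp = new_mutations / haploid_genome_bp
rate_per_diploid_bp = new_mutations / diploid_genome_bp

print(f"per bp, haploid denominator: {rate_per_haploid_bp:.2e}")
print(f"per bp, diploid denominator: {rate_per_diploid_bp:.2e}")
```

Either way the result is on the order of 10^-8 per base pair per generation, consistent with the rounded value quoted above.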
One very recent example…
This note was uploaded on 09/23/2010 for the course NPB 8746546 taught by Professor Goldberg during the Spring '10 term at UC Davis.