
2006 Nature Publishing Group http://www.nature.

Hi I need a 20 min presentation from this article. Thank you

Genome-wide analysis of mammalian promoter architecture and evolution

Piero Carninci 1,2,21, Albin Sandelin 1,3,21, Boris Lenhard 1,3,20,21, Shintaro Katayama 1, Kazuro Shimokawa 1, Jasmina Ponjavic 1,20, Colin A M Semple 1,4, Martin S Taylor 1,5, Pär G Engström 3, Martin C Frith 1,6, Alistair R R Forrest 6, Wynand B Alkema 3, Sin Lam Tan 7, Charles Plessy 2, Rimantas Kodzius 1,2, Timothy Ravasi 1,6,8, Takeya Kasukawa 1,9, Shiro Fukuda 1, Mutsumi Kanamori-Katayama 1, Yayoi Kitazume 1, Hideya Kawaji 1,9, Chikatoshi Kai 1, Mari Nakamura 1, Hideaki Konno 1, Kenji Nakano 1,9, Salim Mottagui-Tabar 3,20, Peter Arner 10, Alessandra Chesi 11, Stefano Gustincich 11, Francesca Persichetti 12, Harukazu Suzuki 1, Sean M Grimmond 6, Christine A Wells 19, Valerio Orlando 13, Claes Wahlestedt 3,20, Edison T Liu 14, Matthias Harbers 15, Jun Kawai 1,2, Vladimir B Bajic 1,7,16, David A Hume 1,6,21 & Yoshihide Hayashizaki 1,2,17,18

Mammalian promoters can be separated into two classes: conserved TATA box–enriched promoters, which initiate at a well-defined site, and more plastic, broad and evolvable CpG-rich promoters. We have sequenced tags corresponding to several hundred thousand transcription start sites (TSSs) in the mouse and human genomes, allowing precise analysis of the sequence architecture and evolution of distinct promoter classes. Different tissues and families of genes differentially use distinct types of promoters. Our tagging methods allow quantitative analysis of promoter usage in different tissues and show that differentially regulated alternative TSSs are a common feature in protein-coding genes and commonly generate alternative N termini. Among the TSSs, we identified new start sites associated with the majority of exons and with 3′ UTRs. These data permit genome-scale identification of tissue-specific promoters and analysis of the cis-acting elements associated with them.
With the completion of several mammalian genome sequences, the next challenge for mammalian genomics is to understand how transcription is controlled. Present algorithms aimed at TSS prediction have proven unsatisfactory 1. Although many TSSs from mouse can be inferred from the 5′ ends of full-length cDNAs and 5′ ESTs 2,3, the depth of coverage is limited. To increase the depth of coverage, we have carried out systematic 5′-end analysis of the mouse and human transcriptome using the cap analysis of gene expression (CAGE) approach 4. Here we redefine basic promoter features and analyze the diversity, evolutionary conservation and dynamic regulation of mammalian promoters on a genome-wide scale.

Received 9 December 2005; accepted 27 March 2006; published online 28 April 2006; corrected online 5 May 2006; doi:10.1038/ng1789

1 Genome Exploration Research Group, RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan. 2 Genome Science Laboratory, Discovery Research Institute, RIKEN Wako Institute, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan. 3 Center for Genomics and Bioinformatics, Karolinska Institutet, Berzelius v. 35, S-171 77 Stockholm, Sweden. 4 UK Medical Research Council (MRC) Human Genetics Unit, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK. 5 University of Oxford, Roosevelt Drive, Oxford, OX3 7BN, UK. 6 Australian Research Council (ARC) Special Research Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, The University of Queensland, Brisbane Qld, 4072, Australia. 7 Knowledge Extraction Laboratory, Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613, Singapore. 8 Department of Bioengineering, University of California, San Diego, 9500 Gilman Drive, 0412 La Jolla, California 92093, USA. 9 Broadband Communication Service Business Unit, Network Service Solution Business Group, NTT Software Corporation, Teisan Kannai Bldg.
209, Yamashita-cho Naka-ku, Yokohama, Kanagawa, 231-8551, Japan. 10 Department of Medicine, Karolinska Institute, Huddinge University Hospital, S 141 86 Huddinge, Sweden. 11 The Giovanni Armenise–Harvard Foundation Laboratory, Sector of Neurobiology, International School for Advanced Studies-Scuola Internazionale Superiore Studi Avanzati (I.S.A.S.-S.I.S.S.A.), AREA Science Park, Padriciano 99, 34012 Trieste, Italy. 12 Sector of Neurobiology, I.S.A.S.-S.I.S.S.A., AREA Science Park, Padriciano 99, 34012 Trieste, Italy. 13 Dulbecco Telethon Institute, Institute of Genetics and Biophysics, Consiglio Nazionale delle Ricerche (IGB CNR), Epigenetics and Genome Reprogramming Laboratory, Pietro Castellino Street 111, Napoli, 80131, Italy. 14 Genome Institute of Singapore, 60 Biopolis Street #02-01, Singapore 138672. 15 Kabushiki Kaisha Dnaform, 1-3-35, Mita, Minato-ku, Tokyo, 108-0073, Japan. 16 South African National Bioinformatics Institute, University of the Western Cape, Private Bag X17, Bellville, South Africa. 17 Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan. 18 Graduate School of Comprehensive Human Science, University of Tsukuba, 1-1-1 Tennodai, Tsukuba-shi, Ibaraki-ken, 305-8577, Japan. 19 The Eskitis Institute for Cell and Molecular Therapies, Griffith University, Nathan Campus, Kessels Road, Queensland 4111, Australia. 20 Present addresses: Bergen Center for Computational Science, Unifob AS, University of Bergen, Thormøhlensgate 55, N-5008 Bergen, Norway (B.L.), Scripps Florida, Jupiter, Florida 33458, USA (C.W.), Department of Molecular Medicine, National Public Health Institute, Department of Medical Genetics, University of Helsinki, Biomedicum, FIN-00251 Helsinki, Finland (S.M.-T.) and MRC Functional Genetics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3QX, UK (J.P.). 21 These authors contributed equally to this work. Correspondence should be addressed to D.H.
([email protected]) or Y.H. ([email protected]).

Nature Genetics, Volume 38, Number 6, June 2006, page 626. © 2006 Nature Publishing Group
RESULTS

Defining TSSs by CAGE tags

CAGE tags are 20- or 21-nt sequence tags that are derived from the mRNA sequenced in the proximity of the cap site, and their mapping onto unique genomic regions identifies TSSs 4,5. CAGE libraries are constructed from full-length cDNAs selected through a biotinylated cap. Second-strand synthesis is absolutely dependent upon the ligation, to the first-strand full-length cDNAs, of a primer that contains restriction sites allowing the cleavage of 5′ 20- to 21-bp tags from the resulting cDNA. These short fragments are concatemerized and sequenced.

We applied CAGE sequencing to 145 different mouse and 41 different human libraries. We mapped tags to the mouse and human genomes using a hierarchical data structure (Fig. 1a). CAGE tags that had an identical 5′ start site were grouped into a CAGE-tag starting site (CTSS), whereas CTSSs that overlap on the same strand form a tag cluster. Mapping cDNAs, ESTs and CAGE, GIS and GSC tags to mouse and human genomes allowed us to identify 729,504 potential mouse and 665,278 human TSSs (Table 1, and Supplementary Fig. 1 and Supplementary Table 1 online). Of these, 593,290 in the mouse genome and 629,716 in the human genome were defined by CAGE tag clusters. The majority of tag clusters identified by two or more tags (159,075 mouse and 177,563 human) were derived from independent libraries (Supplementary Note online). Therefore, we selected these tag clusters for detailed expression and promoter analysis.

We aimed to identify the major promoters of the widest possible diversity of genes by sampling at relatively low depth many tissues and conditions. Most single CAGE tags (singletons) reflect the fact that in most libraries the number of tags sequenced (~100,000) is lower than the total number of transcripts per cell 6. Therefore, rare transcripts were sampled randomly.
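The hierarchical grouping described above (tags with an identical 5′ start form a CTSS; overlapping CTSSs on the same strand form a tag cluster) can be sketched in Python. This is an illustrative sketch only, not the authors' pipeline. The function name `build_tag_clusters`, the `TagCluster` class, the fixed 20-nt tag length, the 1-based inclusive coordinates and the rule that "overlap" means two tag footprints share at least one base are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass

TAG_LENGTH = 20  # assumption: fixed length; the text says tags are 20 or 21 nt


@dataclass
class TagCluster:
    chrom: str
    strand: str
    start: int  # lowest genomic coordinate covered by any tag (inclusive)
    end: int    # highest genomic coordinate covered by any tag (inclusive)
    ctss: dict  # 5' start position -> number of tags starting there


def build_tag_clusters(tags, tag_length=TAG_LENGTH):
    """Group mapped tags into CTSSs, then merge overlapping CTSSs per strand.

    `tags` is an iterable of (chrom, strand, five_prime_position) tuples.
    """
    # Level 1: tags with an identical 5' start site collapse into one CTSS.
    ctss_counts = Counter(tags)

    # Collect CTSSs per (chromosome, strand); clusters never span strands.
    by_strand = {}
    for (chrom, strand, pos), count in ctss_counts.items():
        by_strand.setdefault((chrom, strand), []).append((pos, count))

    clusters = []
    for (chrom, strand), sites in by_strand.items():
        # Footprint of each tag on the genome: plus-strand tags extend to
        # higher coordinates, minus-strand tags to lower coordinates.
        footprints = []
        for pos, count in sites:
            if strand == "+":
                lo, hi = pos, pos + tag_length - 1
            else:
                lo, hi = pos - tag_length + 1, pos
            footprints.append((lo, hi, pos, count))
        footprints.sort()

        # Level 2: sweep left to right, merging footprints that overlap.
        current = None
        for lo, hi, pos, count in footprints:
            if current is not None and lo <= current.end:
                current.end = max(current.end, hi)
                current.ctss[pos] = count
            else:
                current = TagCluster(chrom, strand, lo, hi, {pos: count})
                clusters.append(current)
    return clusters


# Hypothetical input: five mapped tags on one chromosome.
example_tags = [
    ("chr1", "+", 100), ("chr1", "+", 100), ("chr1", "+", 110),
    ("chr1", "+", 200), ("chr1", "-", 105),
]
clusters = build_tag_clusters(example_tags)
for c in clusters:
    print(c.chrom, c.strand, c.start, c.end, sum(c.ctss.values()), c.ctss)
```

In this toy input the two plus-strand CTSSs at positions 100 and 110 overlap and merge into one cluster of three tags, while the tag at 200 and the minus-strand tag each remain singletons, which is the kind of cluster the text excludes from detailed analysis.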
We provide several lines of evidence demonstrating that CAGE identifies genuine transcription start sites, including (i) statistical analysis of reproducibility within and across species, (ii) experimental validation by distinct primer extension approaches, (iii)

[Figure 1, panels a to e; image not reproduced. Labels recoverable from the extraction: tag cluster (TC) made up of CTSSs, with number of tags; tags from tissue per bp in inner exons plotted against liver, lung and macrophage tags per bp in major promoter; number of tag clusters (total, TATA-associated and CpG-associated) plotted against tag cluster width; mean tags from tissue per bp in inner exons plotted against the percentage of liver, lung and macrophage tags in major promoter (0–25, 25–50, 50–75, 75–100); fraction of tag counts in tag cluster plotted against nucleotide position for the shape classes single dominant peak (SP), broad (BR), bi- or multimodal (MU) and broad with dominant peak (PB). Example tag clusters shown: Postn (T03F033EDB08), ORF61 (T10R04C2E25C), Ogdh (T11F005E65B7), Adsl (T15F04D83907), Peo1 (T19F02A4A00F), Ppap2b (T04F062B54A0), Trappc3 (T04F076DBE82), Pik3r5 (T11F040E47AA), Txndc7 (T12F0109631B), MrpI2 (T17F02ABA5E8), Cd164 (T10F0277FE1C), Zfp385 (T15R06303CFB), 137774 (T18R05237FC0), Myh3 (T11F03F99F20), Fth1 (T19F008A9C64), 130092 (T19R02C738F6).]

Figure 1 Definition and characteristics of CAGE tag clusters. (a) Tag clusters are produced by grouping overlapping tags on the same strand. Hence, tag clusters are defined by a start and end position, a count of tags and a distribution of these counts. Unique tag starts within the tag cluster form CAGE tag starting sites (CTSSs).
(b) Demonstration of the lack of correlation between the tag density in the ±100 region of the first exon and the tag density in inner exons. (c) Association of tag cluster width (minimal length of the sequence fragment containing >80% of all tags in the cluster) with TATA boxes and CpG islands for tag clusters with >100 tags. (d) Correlation between tissue specificity and exonic promoter activity. Genes expressed in lung, liver and macrophages were grouped in four categories depending on degree of tissue specificity. (e) Arrays of representative tag clusters for different shape classes. Histograms indicate the fraction of tags in the tag cluster mapping into each position in a 120-bp window centered on the tag cluster. The single peak (SP) class is characterized by a sharp peak, indicative of a single, well-defined TSS. The broad (BR) shape indicates multiple, weakly defined TSSs. The bimodal/multimodal (MU) shape class implies multiple well-defined TSSs within one cluster. Combination of a well-defined TSS surrounded by weaker TSSs results in a broad with dominant peak shape (PB). HUGO gene names or transcriptional unit identifiers for cognate genes and tag cluster identifiers are shown above each tag cluster.
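The tag cluster width used in panel c (the minimal length of the sequence fragment containing >80% of all tags in the cluster) can be computed with a two-pointer sweep over the CTSS positions. The following Python sketch is one possible reading of that definition, not the authors' code. The function name `cluster_width`, the dictionary input format and the use of a strict "more than 80%" threshold are assumptions for illustration.

```python
def cluster_width(ctss_counts, fraction=0.8):
    """Return the shortest window (in bp) holding more than `fraction` of tags.

    `ctss_counts` maps each CTSS position to a positive tag count.
    """
    positions = sorted(ctss_counts)
    total = sum(ctss_counts.values())
    if total <= 0:
        raise ValueError("cluster must contain at least one tag")
    threshold = fraction * total

    best = None
    left = 0
    in_window = 0  # tags between positions[left] and the current right end
    for pos in positions:
        in_window += ctss_counts[pos]
        # Drop positions from the left while the window stays above threshold.
        while in_window - ctss_counts[positions[left]] > threshold:
            in_window -= ctss_counts[positions[left]]
            left += 1
        if in_window > threshold:
            width = pos - positions[left] + 1
            if best is None or width < best:
                best = width
    return best


# Hypothetical clusters: one sharp peak and one evenly spread cluster.
sharp = {100: 1, 101: 18, 102: 1}
broad = {100: 5, 110: 5, 120: 5, 130: 5}
print(cluster_width(sharp), cluster_width(broad))
```

Under these assumptions a cluster dominated by a single position gets a width of 1 bp, consistent with the single peak (SP) class, whereas evenly spread tags give a width close to the full span of the cluster, as expected for the broad (BR) class. The sketch does not attempt to assign shape classes, since the text shown here does not state the classification rules.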