ORIGINAL ARTICLE

PANGEA: pipeline for analysis of next generation amplicons

Adriana Giongo¹, David B Crabb¹, Austin G Davis-Richardson¹, Diane Chauliac¹, Jennifer M Mobberley¹, Kelsey A Gano¹, Nabanita Mukherjee², George Casella²,³, Luiz FW Roesch⁴, Brandon Walts³,⁵, Alberto Riva³,⁵, Gary King⁶ and Eric W Triplett¹,³

¹ Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
² Department of Statistics, University of Florida, Gainesville, FL, USA
³ Genetics Institute, University of Florida, Gainesville, FL, USA
⁴ Centro de Ciencias Agricolas, Universidade Federal do Pampa, São Gabriel, Rio Grande do Sul, Brazil
⁵ Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, USA
⁶ Department of Biology, Louisiana State University, Baton Rouge, LA, USA

High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including pre-processing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OSX, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the χ² step, are joined into one program called the ‘backbone’.

The ISME Journal (2010) 4, 852–861; doi:10.1038/ismej.2010.16; published online 25 February 2010

Subject Category: microbial population and community ecology

Keywords: 16S rRNA; high throughput sequencing; microbial ecology; bioinformatics

Introduction

The analysis of amplified and sequenced 16S rRNA genes has become the most important single approach for the rapid identification and classification of prokaryotes. Amplicons from high-throughput sequencing by 454/Roche can generate many thousands of 16S rRNA sequences per sample, and unlike Sanger sequencing it does not require time-consuming clone library construction (Roesch et al., 2007, 2009a; Hamady et al., 2008; Liu et al., 2008)....
This note was uploaded on 01/15/2012 for the course STA 6126 taught by Professor Yesilcay during the Spring '08 term at University of Florida.