Importing ArrayExpress datasets into R/Bioconductor

Audrey Kauffmann (1,*), Tim F. Rayner (2), Helen Parkinson (1), Misha Kapushesky (1), Margus Lukk (1), Alvis Brazma (1), Wolfgang Huber (1)

(1) EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
(2) Cambridge Institute for Medical Research, Addenbrooke's Hospital, Cambridge CB2 0XY, UK.

ABSTRACT

Summary: ArrayExpress is one of the largest public repositories of microarray datasets. R/Bioconductor provides a comprehensive suite of microarray analysis and integrative bioinformatics software. However, there has been no easy way to import datasets from ArrayExpress into R/Bioconductor. Here we present such a tool, suitable for both interactive and automated use.

Availability: The ArrayExpress package is available from the Bioconductor project at http://www.bioconductor.org. A user's guide and examples are provided with the package.

Contact: [email protected]

INTRODUCTION

ArrayExpress is a public database for high-throughput functional genomics data (Parkinson et al., 2009). It consists of a repository, which is a MIAME-supportive (Brazma et al., 2001) public archive of microarray data, and an added-value gene expression Atlas built from the repository data. Currently, nearly 8000 experiments comprising 230 000 arrays are available from ArrayExpress.

Retrieving publicly available data for analysis is a repetitive and error-prone task, for which automation is desirable. Because Bioconductor (Gentleman et al., 2004) contains many widely used data analysis tools, tools that connect it to public databases are useful. The GEOquery package (Davis and Meltzer, 2007) was developed to load GEO datasets into Bioconductor, and the RMAGEML package (Durinck et al., 2004) was designed to import the MAGE-ML files that ArrayExpress previously used for data transfer.
The ArrayExpress database now supports the MAGE-TAB format (Rayner et al., 2006), a metadata-rich but much simpler and more resource-efficient format based on tab-delimited files, and all data are made available in this format. We have developed the ArrayExpress package for R/Bioconductor to query ArrayExpress and to convert MAGE-TAB-formatted datasets from the ArrayExpress repository into objects of eSet, the Bioconductor class for microarray datasets.

MIAME

MIAME is a guideline that describes the Minimum Information About a Microarray Experiment needed to ensure that a microarray dataset can be interpreted. It has five elements: (i) the raw data for each hybridisation; (ii) the final processed data for the set of hybridisations in the experiment; (iii) the experiment design, including sample-data relationships and the essential sample annotation, such as experimental factors and their values; (iv) sufficient annotation of the array design; (v) essential laboratory and data processing protocols....
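To make the conversion from tab-delimited MAGE-TAB into per-sample annotation more concrete, here is a minimal Python sketch. It is not the ArrayExpress package's implementation, which is written in R. It assumes a tiny, made-up table in the style of a MAGE-TAB SDRF (Sample and Data Relationship Format) file. The column headers `Source Name`, `Factor Value[...]` and `Array Data File` follow SDRF conventions, while the sample names, values and the helper functions `parse_sdrf` and `sample_annotation` are hypothetical. The sketch reads the tab-delimited text and collects each sample's experimental factors (MIAME element iii), roughly the kind of per-sample information that ends up in the phenoData slot of an eSet.

```python
import csv
import io

# Hypothetical SDRF-style table: made-up samples, for illustration only.
SDRF_TEXT = (
    "Source Name\tCharacteristics[organism]\tFactor Value[treatment]\tArray Data File\n"
    "sample1\tHomo sapiens\tcontrol\tsample1.CEL\n"
    "sample2\tHomo sapiens\tdrug\tsample2.CEL\n"
)

FACTOR_PREFIX = "Factor Value["


def parse_sdrf(text):
    """Read tab-delimited SDRF-style text into one dict per row, keyed by column header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


def sample_annotation(rows):
    """Map each Source Name to its experimental factors (loosely analogous to eSet phenoData)."""
    annotation = {}
    for row in rows:
        factors = {
            # Strip the "Factor Value[" prefix and the closing "]" to get the factor name.
            key[len(FACTOR_PREFIX):-1]: value
            for key, value in row.items()
            if key.startswith(FACTOR_PREFIX) and key.endswith("]")
        }
        annotation[row["Source Name"]] = factors
    return annotation


if __name__ == "__main__":
    rows = parse_sdrf(SDRF_TEXT)
    print(sample_annotation(rows))
    # {'sample1': {'treatment': 'control'}, 'sample2': {'treatment': 'drug'}}
    print([row["Array Data File"] for row in rows])
    # ['sample1.CEL', 'sample2.CEL']
```

Because `csv.DictReader` keeps the column headers as keys, factor columns can be found by their `Factor Value[...]` prefix without knowing the factor names in advance, and the `Array Data File` column links each sample to its raw data file.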