lect 16.6 - Lab: From Affymetrix array CEL files to...

Info iconThis preview shows page 1. Sign up to view the full content.

View Full Document Right Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: Lab: From Affymetrix array CEL files to annotated list of interesting genes Jean Wu October 6, 2006 1 Introduction Identifying a list of differentially expressed genes is one of the major applications of microarray experiments. In this lab we will start with Affymetrix CEL files, which contain the raw probe level (feature level) data and create an annotated report of interesting genes. This lab is based on Chapter 25, From CEL files to annotated list of interesting genes by R. A. Irizarry. 2 Reading CEL files Load the affy package, which is needed to import the data into Bioconductor. > library("affy") The probe level data provided by Affymetrix is contained in what we call the CEL files, because these usually have extension .CEL. The function ReadAffy can be used to import these data into AffyBatch objects. One easy way of reading in the CEL files is to have all CEL files in one directory and set the R working directory to that directory. To set the working directory you can use the function setwd, or use the menu commands on either Windows or OS X. Then you can simply use the ReadAffy function, as shown in the code chunk below. > MyData <- ReadAffy() If instead, you have many CEL files and only want to select some of them, you can supply the names of the ones you would like to use as the filenames argument to the function. An example is given in the code chunk below. > ReadAffy(filenames = filenames.of.your.data) The help file for ReadAffy gives more details. For illustration of the rest of this lab we will use an AffyBatch named spikein95 that is already read-in, available through the package SpikeInSubset. You are welcome to try out your own data. > library("SpikeInSubset") > data(spikein95) > [email protected] <- "hgu95av2" 1 2.1 The phenoData These data are a six array subset (2 sets of triplicate arrays) from a calibration experiment performed by Affymetrix. For the purpose of this lab we generate a simple object, pd of class phenoData describing the arrays. The first three arrays are from one population and the next three are from the other population. We will arbitrarily label the first population . > + > > + > pd <- data.frame(population = factor(c("Pop1", "Pop1", "Pop1", "Pop2", "Pop2", "Pop2")), replicate = c(1, 2, 3, 1, 2, 3)) rownames(pd) <- sampleNames(spikein95) vl <- list(population = "Pop1 is control, Pop2 is treatment", replicate = "1, 2, 3, arbitrary numbering") phenoData(spikein95) <- new("phenoData", pData = pd, varLabels = vl) If you have your own data, generate a phenoData object according to your experimental design. The phenoData object can also be created using the function read.phenoData. For more details on this object, please refer to Chapter 7. 3 Preprocessing To convert probe level data to expression measures (preprocessing), we use RMA as example. > eset <- rma(spikein95) If your computer has limited memory this command might fail. In that case you can try to use just.rma, but to use it you must have CEL files. So, if you brought your own CEL files, you can also use something similar to the code below. just.rma. > eset <- just.rma(filenames = list.celfiles()) After this function has run eset is an instance of the class exprSet which contains the expression values and other important experimental information. The expression values calculated by rma are in log base 2 scale. A matrix with the expression information is readily available. The following code extracts the expression data and obtains the dimensions of the matrix containing the data. > e <- exprs(eset) > dim(e) [1] 12625 6 Exercise 1 How many probe sets are there in this dataset? The phenoData information that was stored with the raw data has been copied to the output of the preprocessing function. > pData(eset) > pData(spikein95) You can conveniently use $ to access each column and create indexes denoting which columns represent each population: > Index1 <- which(eset$population == "Pop1") > Index2 <- which(eset$population == "Pop2") We will use this information in the next section. 2 Optional exercise: other preprocessing methods Here we are only demonstrating the use of RMA. Other options are available through the functions mas5 and expresso. If you find expresso too slow, the package affyPLM provides an alternative, threestep, that is faster due to the use of C code. In the next code chunk we create another expression set object, this time using gcrma to normalize the data. > library("gcrma") > eset2 <- gcrma(spikein95) Then, in this next code chunk we make use of threestep from the affyPLM package. There are a number of different options that can be set. In the code chunk below we do not do any background correction, normalization is done using quantile normalization (the default) and a one step Tukey biweight function is used to summarize the data, per probeset. > library("affyPLM") > eset3 <- threestep(spikein95, background = FALSE, summary.method = "tukey.biweight") 4 Ranking and filtering probesets Now we have, in e, a measurement xijk of log (base 2) expression from each probeset j on each array i for both populations k = 1, 2. You can rank the probesets as soon as you select a statistic. 4.1 Summary statistics and tests for ranking Log fold-change A naive first choice is simply the average log fold-change: the rowMeans function works on matrix and provides a much faster alternative to the commonly used function apply: > d <- rowMeans(e[, Index2]) - rowMeans(e[, Index1]) The variability of fold-change measurements often depends on over-all intensity of the probeset in question. An example is average log expression, which we compute in the code chunk below. > a <- rowMeans(e) Exercise 2 Plot d against a (see Figure 1) to see if this applies to your data. Also plot d versus rank(a). Does the variability of the d-values depend on a? Another simple statistic that takes into account the probeset-specific variation is the t-statistic. You can use the function rowttests from the package genefilter, which applies the t–test to each row: > library("genefilter") > tt <- rowttests(eset, "population") The first argument is the exprSet , the second indicates the covariate which defines the two groups to be compared. When there are few replicates the variance is not well estimated and the t-statistic can perform poorly. And alternative statistics that borrow strength across all genes often provide better results. An example of such a modified t-statistic is based on an empirical Bayes approach, implemented in the eBayes function in the limma package. 3 2 1 d 0 d 2 4 6 8 a 10 12 14 −1 −2 −2 0 −1 0 1 2 4000 8000 12000 rank(a) Figure 1: Scatterplot of d against a. > > > > library("limma") design <- model.matrix(~eset$population) fit <- lmFit(eset, design) ebayes <- eBayes(fit) Because the example data set is from a calibration experiment, the truly differentially expressed genes are known. Interested users can refer to the optional exercise to carry out the comparison and confirm the better performance of the moderated versus classical t-statistics in this case. When sample sizes are moderate, say ten in each group there is generally no advantage to using the Bayesian approach. 4.2 Visualization of Differential Expression The volcano plot is a useful way to see the estimate of the log fold-change and statistic you choose to rank the genes simultaneously. Figure 2 plots p-values (more specifically − log10 versus the p-value) versus effect size. For simplicity we assume the t-statistic follows a t-distribution to obtain the p-values. To create a volcano plot you can use the following simple code: > lod <- -log10(tt$p.value) > plot(d, lod, cex = 0.25, main = "Volcano plot for $t$-test") > abline(h = 2) Exercise 3 Generate the volcano plot using a moderated t-statistic (Figure 3). Do they look the same? You can highlight a subset of genes with the points function. For example, you could set the graphical parameters to use blue, (col="blue") diamonds, (pch=18), to denote the top 25 genes ranked by average fold change. 4.3 Highlighting interesting genes Exercise 4 Can you highlight the top 20 genes ranked by smallest p-value or the largest statistic of your choice, with a different color and symbol (as in Figure 4)? (Hint: look at the function points 4 Volcano plot for $t$−test q 4 5 6 q q q lod q 3 q q q q qq q q q q q q q q q q q 0 q qq q q q qq q q q qq q q q q q qq q q qq q q q qq q qq q q qq q q q q qqq q qq qq q q q qq q qq q q q q q qq q q q qq qq q q q q qq q qq q q q q q q qq q q q q qq qqq q q q q q qq qq qqq q q qqqqq qq qq q qqqqq qq q qq q q q q qqq q q q qq qq q q qqqqq q q qq qqq qqq q q q q q qq q qq q q qqqqq q qqqq q q qq qq q q q q qq q qq q qq qq qqqq q qq q q qqqq q qq q q q qq qq q qq q q qq q qq qq qqqq qqqq q q qq qq qq qq qq qqqq q qq q q q qq qq q qq q qq q qqqqq qqqqq q qq q q q qqqq q qq qqqq qqqqq q qq q q q qqq qq qqq q q q qqqqqq q q qqqq q qq qq q q q q q q qqq q qq qqqqq q q qqqqqqq q q qqqq q qq q qqq qqq qq qqqq q q qq qqq q qqq qqq qq q q qq q qqq qqqq qqqq qqq qq q q qq q q qqqq q q qq qq q q qqqqq q q qq qqqq q qq qq q qq q qq q qqqqqqq qqqq q qq q qqqqqqqq qqq q qq q q qq qqqqqq qqqqq qq q qq q qqqq q qq qq qq q qqqq q qq q qq q q q qqqq qqqq q q q qqqqqq qqqqqqq q qqqq qqqqqq q qq q qqqq qqqqq q qqqqqqqq qqqqqqq q qq q qq q qq q qq q q qqqq qqqqqq qqqqqq q qqqqqq qqqqq q qqqqq q q q q qq qq q qq qqqqqqq qqqqq qq q qq qqqqqqqq qqqqqqq q qqqq qqqq qq qqqq qq qqq q qqqqqq q qqqqq q q qq q qq qq qqqqqqqqqq qqqqqqqqq qqqqq qqqq qq qq qq qqqq q qq qq q qqqqqqq qqqqqq q q qqq qqqqqq qqqqqqq qq q qq q qqq qqqqqq q qqqqq qqqqqq q qqqqqqq qqqqqq q q qqqqq qqqq q qqqq qqqqqq qq qqqq qq qq qqqqqq qqqqq q q q q qq qqqq qqqqqqq qqqqqqq qqqqqq qqqq q q q qqqq qqqqqq qqqqqqq qqqq qq qqqqq qqqqq qq q q qq qqq qqqq q q q qqqqqq qqqqqq q q qqq qqqqq q qq q qqq qqq q qq q qq qq q qqqq qqqqqq qqqqqq q qqq qq qqqqqqq qqqqqq q qqqqq qqqqqq q qqqqq qqqq q qq qqqq qqqq q qqqqqqq qqqqqqqqq qqqq qqqq qqqq qqq q qqqqq qq q q qqqqqq qqqqq q qqqqqqq qqqqqq qq qqqqqq qqqqq qqq qqqq qqqq qqqq qq qqqqq qqqqq qqq qqqq q q qq qq qq q qq qqqqqqqq qqqqqq q qqqq qqqq q qqqqqqqq qqqqq q q qqqqqq qqqqq q qqqqqqqqqq qqqqqq qq q q q q qqqqqqq qqqqqq q qqqqq qqqq qq qqqqq qqqqq qq qq q q qqqqqqqq qqqqqq qq qq qq qqq qqqqq qqqqqqqq qqqqqqq qqqq qqqqq q q qqqq qqqq qqqqq q q qqqq q q qqq qqqq q qq q qq qqqqqq qq qq qqq qqqq q qqqqq q qqqqq qqqq q qq qqqqq qqqq q q q qqqqqqqq qqqqqq qqqqqqqqq qqqqqq qq qq q qq q q qqqq qqq q qqqqq qqqq q q q qqqq qq q qq qqqqqq qqqqq qqq q qqqqqqq qqqqqq q qq qq qqqqq qqqq q qqqqqq qqqqqq qqqqqq qqqqq qqq q qqqqqqqq qqqqqqq qqq qq q qq qqqqqqq qqqqqq qqqqq qqqq qqqq qqq qqqqqq qqqqqq qqqq qqqq q qqqq qqqqqq qq q qqqqq qqqqqq qqqqq qqqq qqqqqqq qqqqqqq q q qq qqqqqqq qqqqqqq qqq qq qq q q qqqq qqqqqqq qqqq qqqqq q qqqq qqqq qq qq qqq qqqq qqqqqqqqqqqqqq q qqqq qqqqq qqqqq qqqq q q qqqqq qqqqq qqq qqq q qqqqqq qqqqqq qqq qqq qq q q q qq q qq q qqqq qqq qqqq qqqq qqqqqq qqqqqq qq q qqq qqqqq qqqqq q qqqqqq qqqqqq qqqq qqqq qqq qqq qqqqqq qqqqq q qqqq qqqq qqq qqq qqq q q qqq qq qqqq qqqq qq q qqq q q qqqqqqqqqqq qqqq qqqq qqqqq qqqqq q qqqqqqqqqq qqqqq qqqqq q qqq qqq q qq qqqqq qqqq qqq qqq q q qq qqq qqq qqqqqqqqqqq q qqqqq qqqqq qqq qqq qqqqq qqqq qq qqq qqqq q qq qq qqq qqq qqq qqq qqq q qqqqqqqqq qqqqq qqqq qqq qqq q q qqq qqq qqq qq qq qqqqq qqqqq qqqqqqqq qqq qqq qq q q q q qq qqqqqqq qqq qqqq qqq qqq qq qqqqqqqqq qqq qqq qqq qq qq qq qq q q qqqqqqqq qq qq qq qq qqq qqq qqqqqq qqqqqqq qq qq qqq qq qq qqq qqq qq qq qqqqq qqq qqq qqqqqqqq q qq qqq q qq qqq qqqq q qq qqq q qqqq q qq qqqq qq qqq qq qq qqqq qqqq qqq qqq qq qqqqq qqqq qqq qq qqqqq qqqq qqqq qqq qqq qq qq qq q qqqq q qq qq q qq qq qq q q q q q q q q q 1 2 −6 −4 d −2 0 Figure 2: Volcano plot. with options col="blue" and pch=18.) 4.4 Selecting cut-offs When you get to this section, you have generated three statistics that can be used to rank genes. Now we turn our attention to deciding on a cutoff. A naive approach is to consider genes attaining p-values less than 0.01. However, because we are testing many hypotheses, the p-values no longer have the typical meaning. There are 12625 null hypotheses (this is the number of probesets on the array) in the example data set. If they are all true (no genes are differentially expressed) we expect 0.01 × 12625 = 126.25 false positives. How many probesets with p < 0.01 do you actually see using the t-test? Or with the moderated t-test? > table(tt$p.value <= 0.01) FALSE 12579 TRUE 46 Various approaches have been suggested for dealing with the multiple testing problem. Chapter 15 gives a detailed discussion and the multtest package provides extensive implementations of the different options. Alternatively, the topTable in limma package provides some adjustment to the raw p-values, including Benjamini & Hochberg’s False Discovery Rate (FDR), simple Bonferroni correction and several others. For details look at help files for topTable and p.adjust. The following example lists the top 10 genes and creates a report. > tab <- topTable(ebayes, coef = 2, adjust.method = "BH", n = 10) > genenames <- as.character(tab$ID) Here, coef=2 specifies that we care for the second coefficient in the linear mode fit, that is the line slope, as the parameter of interest. The first coefficient is the intercept. The parameter n indicates how many genes should be selected. 5 Volcano plot for t−test −log10(ebayes$p.value[, 2]) 10 15 q q 5 q q q q q q q q q q q q q q q q 0 q qq q q qq q q q qq q q q q q qq q q q qq q q q qq q qqq qq q q q q qq q q q q qq q q q qq q q q q q qq q q qq q q q qq qq q qq q q q q qq q qqq qqq qq qq qqq qq q qq q q qq q q qqq q qq qq q q q qq q q q q q q q qqqqqq q qq qqq q q q q q qq qqq qqqqq q qq qqq qqq q qq q q qqq qqqqqqq q qqqq q q q q qq q q qq qq qq q q q qqqqqqqqq q q qq q q qq qq q qqqqqqq q q qq qqqqq qq qq qq qq q qq q q q q qqqqqqqqq qqq q qqqqqqqqq qqqq qq q q q qqqqq q qq q qq q q q qqqqqqq qq qq q q qq qq q qq qqq qqqq qqqq qq q q qqq qq qqqq q q q qq qqqqqqq q qq qq q q qqq qqq qqq q q qq q q qq qqqqqqqqqq q qq qq q q q q q q qqqq q qqqq qq q qqqqqqq q q q q q q qqq q qqqqqqqqqqqq q qq q qq q qq qqqqqqqqq qq qq q q qq qq q qq qq q q q q q q qqq q qqqqqqqqqqqq qq q qq q qq qqqq qqqqqqqqqq q q qqqqqqq q qqq qqqqqqqqq qqqqqqqq qq q qq q q q qqq qq q q qqqq q q q q q q q q qq qqqqq qqqqqqqq qqq q q q qq q q qqqqqqq qqq qq q q q q qqqqq qqqqqq q q q q q qqq q qq q qq qq qqqqqqq qqqqqqq q q qqqqqqq q qq qqqqqqq qq q qq q qq q q q q qqqqqq qq q q qqqqqqqqqqqqq qqqqqqq qq qq q qqqqqq q q qqqqqq qqq q qq qqqqqqqqqqqqq qqqqqqqqqqqqq q qqqqqqq qq q qqqqqqqqqqqq qqqq qqq qqqqq q q qqqqq qqqqqqq q qq qq q qq q q qqq q qqqqqqqq qq q qqq q qqq q qqqqq qq qq qqqqqq qqq q qq qq qqqqqqqqqq qqqq q q q qqqqqq qq q qqqqqqqq q qqqqqqqq q q qqqqqqq qqqq qq qq q qqq q qqq qqqqqqq qq qqqqqqqqq q q q qqqqqqqqqqqqqq q qqqqqq q q qqqqqqq q qqqqqqqqqqq q q q q qqqqqqqqqq q q q qq qqqqqqq q qq qqqqqqqqqqq qqqqq qqq q q q qqqqqqqqqq q qqqqqqqqqqqq qqqqqqqq qq qqqq q qqqqqqqq qqqq qqqqqqqqq qqqqqq q qqqqqqq qqq q q qqqq q q qqqqqqqq qqq q q qqqqqq qq q qq q q qqqqqq qq qqqqq qq q qqqqqqqq q qq qqqq q q qqq qqq qqq q q q qqqqqqqq qqqqqqq q qq q q qqqqqq qqqqqqq q qqqqqqqq qqq qqqqq qq qqqqq q qq qqqqqqqqqq qqqqqqqq q qq qqqqqqqqqqqqq qqqqqqq q qqqqq qqqqqqqqq qqqqqqqqq q qqqqqqqq qqqqqq q qqqq q q q qq qq qqqqqq q q q qqqqqqq q q qqqqqqqqq q qq qqq q q qqqqqqqq qqqq q q qqqqqqqqq qqqqqqq q qqq qqqq q qqqqqq q qqq qqqqqq q qq q q qqqqqq qqq q qqqqqq q q qqqqqq q q qqqqqqqqq q q qqqq qqqqqq q q q q qqqqqqqqqq qq qq q q qqqqqqq q q qqqq qqqqqq q qq q qq q qqqqqqqq qqqq q qqqqqq qqqq q qq qqqqq qq q qqqqq qqq qqqqqqqqqq qq q q qqqqqqq qq qqqqqq qq qqqqqqq q qqqqq qqqqqqqq qqq qq qq qqqqqqq q qqqqqq qqqqqqqq qqqq qqqqqqqqq qqqqqqqq qq qqq qqqq qqqqqqq qq q qqqqq qqqqq qq q qqqqq q q qqqqq qq q qqq q qqqqq q qqqqqq qq qq qqqqq qq qqqqq qqqqqq qqqq qq qq q qqqqqq q qqqqqqqq qqq qqq q qqqqq qqq qqqq qqqqq q q qqq q qqqqqq q q qqq qqq qqqqqq q qqqqqq qqqqqqq qq q q qqq qq qqqqqq qqqqqqqqq q qq q qq q qqqqqq q qqqqqq q qqqqqq q qqqqq qq qq q q q q q −1.0 −0.5 0.0 d 0.5 1.0 Figure 3: Volcano plot when using moderated t-statistic. 4.5 Annotation The following is an optional exercise that you can do when you have internet access. First we load the annotate package > library("annotate") and find out which metadata package we need. > annotation(eset) [1] "hgu95a" For most platforms the character returned by annotate function is the name of the annotation package you will need. However, Bioconductor maintains annotation information for both HGU95A and HG-U95Av2 platforms in the same package hgu95av2. Now load the package hgu95av2. > library("hgu95av2") Now you can add more annotation information about our top 10 interesting genes. For example, the EntrezGene ID and gene symbol, > ll <- getLL(genenames, "hgu95av2") 1708_at 36202_at 36311_at 33264_at 32660_at 38734_at 5602 5569 5136 55556 9881 5350 33818_at 39058_at 7415 29 > sym <- getSYMBOL(genenames, "hgu95av2") 1024_at 36085_at 1543 2779 6 Volcano plot 4 5 6 q q q lod q 3 q q q qq q q q q q q q q q q q q q q q q q q q q q q 0 q q q q qq qq q q q qq q q q q q q q q q q q qq q qq q q qq q qq q q qq q q q q q qq q qq q qq q qq q q q q q q q qqq q q qq q q q q q q q q qq qq qq q q q q q q q q qq q q qq qq qq qq qq q qq q q q q q q q q q qq q qq q q qq q q q qq q q q q q q q q q q qqq q qq qq q qq q q q qq qqqq qq qq q q q q q qqq qq q q q q q q q q qq q qq q q qq q q qqq q q qq q q qq q q qq q qq q q q q q qqq q qqqq q qq qq q qq q qq q qq q qq qqqq q q q q qq q q q q qq qq q q q q qq q q q q qq q q qq q q qq q q qq q qqq q qqq q qq q q qq qq q q q q q q qq q q qq qq q q q q qq qqq qq qq q q qq q q q q q qq q q q q q q q q q qq qqq qqqq qqq q q q qq q q q q q qq q q qq q qqq qq q q q q q q q q qq qqq q qq q q qq q q q q q q q qqq q q q qq qq q q qq q qq q qq q qq qq q q q qq qq q qq qqqq q q q q qq qq qqqqqq q q q q qq q q q qq qq q q q q qq q qq q q q qq q q q qq q qq q qq qqqq q qq q q q qq q qqq q qqq qqqqq qqqq qqq q q qq q q q q q q qq q q q q qq q qqq q q q q qqq q qqqqqqqqqqqq q qqq q q q q q q q qq q q qq q qq q qq q qqqq qq qq q q q q q qq qqq q qqq q q q q q qq qqq q qqqqqqq q q q q qqq q q q qqq q q qqq q q q qq q q q q qqq q q q qq q q qq q qq q q qq qq q q q qq q qq q qq qq qq q q q q qq q q q q qq q q qq q q q q q q qqqqqqqqq q q qqqqqqq q qqqqqq qqq qq q q q q q q qq q qqqqqqqqqqqqq q qq q qq q q q qq q q qq q q q q q qqq qqqqqq qq q q qq q q q q q q q q q qqqqq qqq qqqq q qqq q q q q q q q qqq q qq q qq q q q q qqqqqq qqqqqqqqq q qq q q q qqqq qqqq qq qqqqq q q qq q q qq qq q qqqqqqqqqqq q qq q q q q qq q q q qqqqqq qqq q q q q q qq qqqq q qqqqqqq qqqqq q q q qq q q qq qqqq q q qqqq qqq qq q q q q q q qqq q q q q qqq q qqqqqqqqqq q q q qq qq qq q q q q q q qqq q q q q qq q q qqq qqq qqqqq q q qq qq qq qqqq q q qq q qq q q qqq q qq q qqq qq q q q q q q qq qq qq q q qq q q q q q q q q qq q q qq qq q q qq q q qqqqqqq qqqq q q qq qq q q qq qqqqqq q q qq q q q q q q qqqqq qqqqqq qqq q q qq qqq qqq q qq q qqq q qq q q qqqqqqq qqqqq qq qqqq q qq q q qqqq q qqqq qqqq q q qq q qq q q q qq q qq q q qqqq qqqqqqqqqqqqq q qq qq qq q q q q q qq q qqqqqq q q qqq qqqqqqqqqq q q q q q q q qqqq q q qq q q q q q q qq qqqqq qqq qqq qqq q q q q q qqq qqqqqqqqqqqqqqqqq qqq qqq qqq q q q q qqq qqqqqqqqqq qq qqqq qq q q q qq qqq qqqqqqq q qqq qq q qq q q q qqqqqqqqqqqqq q q q qqq qq q q qq q q q qqq qq q qqqq q qq q qqqqqqqqqqqqqqqqq qqq qq q qq q q qq qqq qq q q q q q qq q qqqq q q qq q qqqqqqqqqqqqqqqqq q q qqq q qqqqqqq q qqq q qqqqqqqqqqqqqqqqq qq qq q qqqqqqqqqqqqqq qqq q qq q qqq q q q q qqqqqqqq qqqqqqqq q q q qq q q qqq qq qqq q q q q qqqqqqqq q q q qqqqqqqqqqqqqqqq qq q qq q qq q qqqq q qqqqqqqqqqqqq qqq q q qqq qqqqqqqqqq qq q qq q q q q qqqqqqqqq q qq q q q q qq q q qqqqqqqqqqqqqqqqq q q q q q qqqqqqqqqqqqq qq qqqqq q qq qq q qqqqqqqqqqqqqq qqqqqqqqqqqqqqqqq q q q q q qqq q q qq q q q q q q q q qq qqqqqqqqqqqqqqqq q q qqqqq qqqq q q qq qqq qqqqqqqqqqqq qqq qqqqqqqqqqq q q qqqq q qqqq qqqqq qqqqqq q q qq q q qq q qqq qqqq q q qq q qqqqqqqq qqq q q qqqqqqqqqqqqqqqqqq qqq qq qqq qqq q qqqqqq qq q qqqq q q q q qq qq q qqq qqqqqqqqqqqq qqqqqqqqq qqqq q q q qqq q qqqqq qqqq q qqqqqqqqqq q qqqq qqq q q qq qq q q q qqq qq q q qq qq q qqqq q q qqqqqq qqqqqqqq q qq qq qqqqqqq qq q q q q qqq q qqqqqqqqq qqq qqqq q qq qq q qq q q q qq q q qqq q qq q q q qq qqqqqqqq qqqqq qq q qqqqqqqqqq qqqqq q q q qqq qqq qq q qq q q qqq q qqqqqqqqqqqqqqqqqqqqqq q q qqqqqqq qqq q q q q q qqqqqqqqqqq qqqqq qq q qqqqqqqqqqqq q q q q qq qqqqqqqqqqqqqqq q qq q q q qq q q qqqqqqqqqqqqqqq q qqqqqqq q q q qqqqqqqqq qqq q qqqqqq qq q q qq q qq q q qqq qqqqqqqqqqqq qq qqqqqqqqqqqq qq qqqqqqqqqq q q qqqqq qqqqqqqqqqq qqqq q q q q qqqq qqqqq qq q qq q qqq q q q q q qqqqqq qqqqqqqqqqq qq q q qqqqqqqqqqqqqq q qqqqqqqqqqqqqqq q qq q qqqq qqqq q qq qqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqq qqq qq q q q q q qqq q qqq qq qqqq qqqqqqqqqqqqqqqqq q qqqqqqqqqqqqqqqqq q q qqqq qq qq qqqqqqqqqqqqqqq q qq q qq q q q q q qq q q q qq qq qqqqqqqqq qq q q qq q q q q q qq qq qqqqqqqqqqqqqqqqqq q qqqqqqqqqqqqqqqqqqq qq q q q q q qqqqqq qqqq q q q q qqq qqqqqqqqqq qqqq q qqqqqqqqqqqqqqqqqqqqq qq q qqqqqqqqqqqqqq qqqqq qqqq q q q qqqqqqqqqqq q qqq qq qqq q qqqqqqqqqqqqqqqqqq qq q qqq qqqqqqqqqqq q q qq q q qqqqq qq q q q qqqqqqqqqqqqqqq q q qqqqqqqqqqqqqqq qq q q qq q qq qqqqqqqqqq qq q qq qq qqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqq q qqqqqqqqqqqqqqq q qq q qq qqq qqqqqqqq qq q qqqq qqqqq qqq q q q q qq q q qqqqqqqqqq q qq qqqqqq qqq qqqqqqqqqqqqqqq qqqqqqqqqqqqqqq qq qq qqqq q qq q q qqqqq qqqqqqqqqqq q qqqqqqqqqqqqqqqqqq q qqqqqqqqqqqqq qq qq qqqqqqqqqqqqq q q q q q qqqqqqqqqqqqqqqqqqq q q qqqqqqqqqq q qqqq qqq q q q qqqq qqqqqqqqq qq q q qqqqqq qqq qq q qqq qq qqqqq q qqqqqqqqqqqqq qq q qqqqqqqqqqq qq qqq qq qqqqqqqqqqqqqqqqq q qq qq q qqqqqqqqqqqqqqqq qq qq qqqqqq q qqqqqqqqqq q q qqqqqqqqqq qqqqqqqqq qqqqqqqqqq q qqqqqqq q q qqqqqqqqqqqqq q q q qq qqqq qq q qqqqqqqqqqqqq qq q qqqqqqqqqq q qqqqqq qq q qq q q qqq q qqqqqqqqqq q q qq qqqqqqqqqqqqqq q q q q qqqqqqqqqqqqqq qqqqqq qq q qq q q qqqqqqqqqqqqqqq qqq q qqqqqqqqqqq q q qqqqqqqqqqqqq q qqqqqqqqq q q qqqqqqqqq q q qqqqqqqqqqqqq qqqqqqqqqqqqq q q q qqqqqqqq q qq q qqqq qqqqqq q qqqqqqqqqqqqq qqqqqqqqqqqqqqqq qq q qqqq qq q q qq qq qqqqqqqqq qqq q q qqqqqqqq qqq qqqqqqqqq qqqqqq q q q qqqqq qqq q q q qqqqqqqqqqqqqqq qqqqqqqqqqqqq qq q qqqqqqqqqqqqq qqqqqqqqqqqqq q qqqqq q qqqqqqqqqq q q qqqqqqqqqqq q qqq q q qqqqq qqqqq q q qqqqqqq q qqqqqqqqq q qqqqqqqqqqq qqqqqqqqqqq qqqqqqqqq qqqqqq q qq q q q qq q qqqqqqqqqqq qqqqqqqqqqqqq q q qqqqqqq q qqqqqqqq q qqq qqqqqqqqq qqqqqqqq q q qqqqqq qqqqqq qq qq q q q qqqqqqqqqqqqq qqqqqqqqqqqqq q qqqqqq qq qqqqqqqqq qqqqqqqqqqq qqqqqq qqqqqq q qqqqqqqqqqqq qqqqqqqqqqqq q qq qqq q qq qqq qq q qqqqqqqqqq qqqqqqqqq q qqqqqqq qqqqqq q qqqqqqqqq qqqqqqqqq qq q qqqqqqq qqqqqqqq qqqqqqqq qqqqqqqq qq qqqq qqqq q qqqqqqq qqqqqqq qqqqqqqqq qqqqqqqqqq qqqqqqqqq qqqqqqqq q qq q q qqqqqqq qqqqqqq q qqq qqqqq q q qqqqqqqq qqqqqqqq q qqqqqq qqqqqq q qq qqqq qqqq qq qqqqqq qqqqqq qq qq qqqqq qqqq q qqqqqq qqqqq q qqqqqqqqq qqqqqqqq q qqqqq qqqq q qqqqq qqqqqq q qqq qqq qqqqqq qqqqqq q qqqq qqqq qqqqqqq qqqqqqq qqq qq q qqqq qqqqq qqq qqqqq qqqqqqqqqq qqqqqqqqqq qqq qqq qqqqqqq qqqqqqqq qqqqq qq qqqq qq qqqqq qqqqqqq qqq q q qqq qqq qqq q q qqq q q q q q q qq qq q q 2 q q q 1 q q −1.0 −0.5 0.0 d 0.5 1.0 Figure 4: Volcano plot with highlighting of the top 25 probesets. 1708_at 36202_at 36311_at 33264_at 32660_at 38734_at 1024_at 36085_at "MAPK10" "PKIA" "PDE1A" "ENOSF1" "LBA1" "PLN" "CYP1A1" "GNAT1" 33818_at 39058_at "VCP" "ABR" With these values available, we can use the following code to create an HTML page, useful for instance to share results with collaborators. > tab <- data.frame(sym, signif(tab[, -1], 3)) > htmlpage(ll, filename = "GeneList1.html", title = "HTML report", + othernames = tab, table.head = c("Locus ID", colnames(tab)), + table.center = TRUE) Look for the resulting file GeneList1.html in your R working directory. > browseURL("GeneList1.html") Look for a file named GeneList1.html in your R working directory. The above HTML report only connects with EntrezGene. To create a report with more annotation information, we can use the annaffy package. Below are a few lines of code that create a useful HTML report HTML report with links to various annotation sites. > > > > > library("KEGG") library("GO") library("annaffy") atab <- aafTableAnn(genenames, "hgu95av2", aaf.handler()) saveHTML(atab, file = "GeneList2.html") aaf.handler() returns 13 annotation types. You can also select a subset instead of using all of these. 7 > aaf.handler() > atab <- aafTableAnn(genenames, "hgu95av2", aaf.handler()[c(2, + 5, 8, 12)]) > saveHTML(atab, file = "GeneList3.html") 8 ...
View Full Document

This note was uploaded on 07/29/2010 for the course BIOC BIOC2808 taught by Professor Dr.jjwang during the Fall '09 term at HKU.

Ask a homework question - tutors are online