Normalization Methods for Two-Color Microarray Data
1/24/2008
Peng Liu

Normalization does not necessarily have anything to do with the normal distribution that plays a prominent role in statistics.
Possible levels of normalization
- Within-slide normalization
- Between-slide normalization
- Paired-slides normalization for dye-swap experiments

Assumptions for normalization
Idea: some genes have comparable (constant) expression levels between dyes and between slides, so we can use them to 'calibrate' the (transformed) intensity data and make meaningful biological comparisons.

Assumptions for the genes used for normalization:
- We have equal total amounts of RNA in each sample.
- Average gene expression does not change across treatments.
- The arrayed elements are a random sample of genes.
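Under the first assumption (equal total RNA in each sample), one simple global adjustment is to rescale one channel so that both channels have the same summed intensity before taking log ratios. The sketch below is a minimal illustration under that assumption only; the function name `total_intensity_log_ratios` is hypothetical, and total-intensity scaling is just one of several possible global adjustments.

```python
import math


def total_intensity_log_ratios(red, green):
    """Log2 ratios after rescaling so both channels have equal total intensity.

    Assumes equal total RNA in each sample, so any difference in the summed
    intensities is treated as non-biological and removed.
    """
    if len(red) != len(green) or not red:
        raise ValueError("red and green must be non-empty and the same length")
    # Scale factor k = sum(R) / sum(G); dividing R by k equalises the totals.
    k = sum(red) / sum(green)
    # log2(R / (k * G)) = log2(R / G) - log2(k): a constant shift of every ratio.
    return [math.log2(r / (k * g)) for r, g in zip(red, green)]
```

Because the correction is one constant shift for every spot, it cannot remove a bias whose size changes with intensity.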
Normalization is the process of removing (or minimizing) non-biological variation in measured signal intensity levels so that biological differences in gene expression can be detected appropriately.

Aim: Normalization
The ideal procedure would achieve a 'universal' measurement of mRNA copies per cell, so that biological differences could be compared independently of genes, technology and experiments. However, this cannot be done at present. Here we discuss only normalization within a single experiment that includes a set of slides.

Sources of non-biological variation
- Dye bias: differences in heat and light sensitivity, efficiency of dye incorporation
- Differences in the amount of labeled cDNA hybridized to each channel in a microarray experiment (here "channel" refers to a particular slide/dye combination)
- Variation across replicate slides
- Variation across hybridization conditions
- Variation in scanning conditions
- Variation among technicians doing the lab work
Which genes to use
- All genes on the array: suitable if we expect …
- Constantly expressed genes, usually 'housekeeping' genes. These genes are not differentially expressed among treatments/conditions, so they are a good standard for calibrating between channels. However, they are often expressed at relatively high levels and may not represent the whole range of expression levels.
- Control genes, e.g.,
  - spiked controls: synthetic sequences (or sequences from a different organism) are spotted on the slide and included in the samples for hybridization;
  - a titration series of control sequences, corresponding to genes whose expression will not change in response to the treatments of interest, spotted on the slide.

Notation:
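For a spot $j$, $j = 1, \ldots, N$ ($N$ is the total number of spots on a slide), let $R_j$ and $G_j$ denote the background-corrected intensities for the red and green dyes, respectively. If we use local background correction, $R_j = R_{fj} - R_{bj}$ and $G_j = G_{fj} - G_{bj}$, where $f$ denotes the foreground (spot) intensity and $b$ the local background intensity for red and green, respectively.
Log intensity ratio: $M_j = \log_2(R_j / G_j) = \log_2 R_j - \log_2 G_j$
Average log intensity: $A_j = \frac{1}{2}\log_2(R_j G_j) = \frac{1}{2}(\log_2 R_j + \log_2 G_j)$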
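A minimal sketch of the notation for spot-level M and A, assuming base-2 logarithms (the usual convention for M and A) and local background subtraction, i.e. $R_j = R_{fj} - R_{bj}$ and $G_j = G_{fj} - G_{bj}$. The function name `ma_values` and the choice to return NaN for spots whose corrected intensity is not positive are illustrative assumptions, not part of the original notes.

```python
import math


def ma_values(r_fg, r_bg, g_fg, g_bg):
    """Return (M, A) lists from per-spot foreground/background intensities.

    M_j = log2(R_j / G_j) and A_j = 0.5 * log2(R_j * G_j), where
    R_j = R_fj - R_bj and G_j = G_fj - G_bj (local background subtraction).
    """
    m_vals, a_vals = [], []
    for rf, rb, gf, gb in zip(r_fg, r_bg, g_fg, g_bg):
        r = rf - rb  # background-corrected red intensity R_j
        g = gf - gb  # background-corrected green intensity G_j
        if r <= 0 or g <= 0:
            # Logs are undefined here; flag the spot with NaN (illustrative choice).
            m_vals.append(math.nan)
            a_vals.append(math.nan)
            continue
        m_vals.append(math.log2(r) - math.log2(g))          # log red - log green
        a_vals.append(0.5 * (math.log2(r) + math.log2(g)))  # mean log intensity
    return m_vals, a_vals
```

For example, a spot with $R_j = 1024$ and $G_j = 256$ has $M_j = 10 - 8 = 2$ and $A_j = (10 + 8)/2 = 9$.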
Dye effect

Possible reasons:
- Physical properties of dyes (heat and light sensitivity, half-life)
- Efficiency of dye incorporation
- Scan settings (laser voltage setting, photon emission response to laser excitation)
- Etc.

A real data example

The idea is to determine the adjustment necessary to normalize the control genes and then make that same adjustment to all genes on the array.
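A minimal sketch of the control-gene idea (determine the adjustment from the control spots, then apply it to every spot), assuming the adjustment is a single constant shift of M estimated by the median M of the control spots. The median is an illustrative choice (the notes do not say which summary to use), and the function name `control_gene_normalize` is hypothetical.

```python
import statistics


def control_gene_normalize(m_values, is_control):
    """Shift every spot's M by the median M of the flagged control spots.

    m_values:   log ratios M_j for all spots on one slide
    is_control: parallel list of booleans marking the control spots
    """
    control_m = [m for m, c in zip(m_values, is_control) if c]
    if not control_m:
        raise ValueError("no control spots flagged")
    # Control genes should show no real change, so their median M is
    # treated as pure dye/channel bias ...
    shift = statistics.median(control_m)
    # ... and the same correction is applied to every gene on the array.
    return [m - shift for m in m_values]
```

Flagging every spot as a control turns this into global median normalization using all genes on the array.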
Result: between two channels on the same slide
[Figure: log red vs. log green intensities]

Within-slide normalization
This is done separately for each slide, to make the red and green intensities comparable. It is known that for self-self experiments (in which the same sample is labeled with both Cy3 and Cy5), the green intensities tend to be higher than the red intensities, and the difference is not constant across spots.
[Figure 2 from Dudoit et al. (2002), Statistica Sinica]

Normalization: M vs. A plot (45° rotation)

LOWESS normalization
LOWESS stands for "LOcally WEighted polynomial regreSSion". The original reference for LOWESS is:
Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 74, 829–836.

[Figure: M vs. A plot, with M = log Red - log Green and A = (log Green + log Red)/2]

LOWESS normalization was recommended for correcting the intensity-dependent dye bias by Yang et al. (2002, Nucleic Acids Research, 30(4), e15) and Dudoit et al. (2002, Statistica Sinica).
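The sketch below illustrates LOWESS normalization under simplifying assumptions: for each spot it fits a locally weighted linear regression of M on A (tricube weights over the nearest `frac * N` spots), then subtracts the fitted value from M. Unlike Cleveland's (1979) procedure it omits the robustness iterations, and the span `frac=0.3` is an arbitrary illustrative value; the function names are hypothetical. A tested implementation (for example R's `lowess`) would be preferable in practice.

```python
import math


def _tricube(u):
    # Tricube kernel: (1 - u^3)^3 on [0, 1), zero outside.
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def lowess_fit(x, y, frac=0.3):
    """Locally weighted linear fit of y on x, evaluated at each x[i].

    Simplified: a single pass with tricube weights, no robustness iterations.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    # Number of nearest neighbours that define each local window.
    k = min(n, max(2, math.ceil(frac * n)))
    fitted = []
    for xi in x:
        d = [abs(xj - xi) for xj in x]
        h = sorted(d)[k - 1]  # distance to the k-th nearest point
        if h > 0:
            w = [_tricube(dj / h) for dj in d]
        else:
            # The k nearest points all share x = xi; weight only those ties.
            w = [1.0 if dj == 0 else 0.0 for dj in d]
        # Weighted least squares for y = b0 + b1 * x within the window.
        sw = sum(w)  # > 0, because the point itself has weight 1
        xbar = sum(wj * xj for wj, xj in zip(w, x)) / sw
        ybar = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - xbar) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - xbar) * (yj - ybar)
                  for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (xi - xbar))
    return fitted


def lowess_normalize(m_values, a_values, frac=0.3):
    """Normalized M: subtract the LOWESS fit c(A) of M on A from each M."""
    trend = lowess_fit(a_values, m_values, frac)
    return [m - c for m, c in zip(m_values, trend)]
```

Usage: `lowess_normalize(m, a)` returns normalized M values centred around zero at every intensity level A. Each local fit scans all N spots, so this sketch costs roughly O(N² log N) and is meant for illustration, not for full-size arrays.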
[Figure: LOWESS fit, log Red - log Green vs. (log Green + log Red)/2]
[Figure: after normalization, normalized M vs. A]

Assigned Readings
- Yang et al. (2002). Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research, 30(4), e15.
- Yang et al. (2002). Normalization for cDNA microarray data. Technical Report 589, Department of Statistics, UC Berkeley.