Singular value decomposition for genome-wide expression data processing and modeling

Orly Alter*†, Patrick O. Brown‡, and David Botstein*

Departments of *Genetics and ‡Biochemistry, Stanford University, Stanford, CA 94305

Contributed by David Botstein, June 15, 2000

We describe the use of singular value decomposition in transforming genome-wide expression data from genes × arrays space to reduced diagonalized "eigengenes" × "eigenarrays" space, where the eigengenes (or eigenarrays) are unique orthonormal superpositions of the genes (or arrays). Normalizing the data by filtering out the eigengenes (and eigenarrays) that are inferred to represent noise or experimental artifacts enables meaningful comparison of the expression of different genes across different arrays in different experiments. Sorting the data according to the eigengenes and eigenarrays gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype, respectively. After normalization and sorting, the significant eigengenes and eigenarrays can be associated with observed genome-wide effects of regulators, or with measured samples, in which these regulators are overactive or underactive, respectively.

DNA microarray technology (1, 2) and genome sequencing have advanced to the point that it is now possible to monitor gene expression levels on a genomic scale (3). These new data promise to enhance fundamental understanding of life on the molecular level, from regulation of gene expression and gene function to cellular mechanisms, and may prove useful in medical diagnosis, treatment, and drug design. Analysis of these new data requires mathematical tools that are adaptable to the large quantities of data, while reducing the complexity of the data to make them comprehensible. Analysis so far has been limited to identification of genes and arrays with similar expression patterns by using clustering methods (4–9).

We describe the use of singular value decomposition (SVD) (10) in analyzing genome-wide expression data. SVD is also known as Karhunen–Loève expansion in pattern recognition (11) and as principal-component analysis in statistics (12). SVD is a linear transformation of the expression data from the genes × arrays space to the reduced "eigengenes" × "eigenarrays" space. In this space the data are diagonalized, such that each eigengene is expressed only in the corresponding eigenarray, with the corresponding "eigenexpression" level indicating their relative significance. The eigengenes and eigenarrays are unique, and therefore also data-driven, orthonormal superpositions of the genes and arrays, respectively....
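
As an illustration of the transformation described above, the following is a minimal NumPy sketch, not the authors' code. The synthetic matrix `e`, its dimensions, the function name `svd_expression`, and the use of squared, normalized singular values as a measure of relative significance are all assumptions made for this example. It assumes the standard convention e = U diag(s) V^T with genes as rows and arrays as columns, so that the rows of V^T play the role of eigengenes and the columns of U the role of eigenarrays.

```python
import numpy as np


def svd_expression(e):
    """Decompose a genes x arrays matrix into U, singular values, and V^T.

    Columns of U act as eigenarrays (patterns across genes); rows of V^T
    act as eigengenes (patterns across arrays). The singular values are
    the diagonal "eigenexpression" levels linking each pair.
    """
    u, s, vt = np.linalg.svd(e, full_matrices=False)
    return u, s, vt


# Hypothetical synthetic data: 50 genes measured on 6 arrays.
rng = np.random.default_rng(0)
n_genes, n_arrays = 50, 6
e = rng.normal(size=(n_genes, n_arrays))

u, s, vt = svd_expression(e)

# Assumed significance measure: share of each squared singular value.
fraction = s**2 / np.sum(s**2)

# Filtering sketch: zero out one eigenexpression level (here the first,
# chosen arbitrarily) and rebuild the data without that eigengene.
s_filtered = s.copy()
s_filtered[0] = 0.0
e_filtered = u @ np.diag(s_filtered) @ vt

print("eigenexpression levels:", np.round(s, 3))
print("relative significance:", np.round(fraction, 3))
```

Which eigengenes, if any, represent noise or experimental artifacts has to be inferred from the data; the sketch removes the first one only to show the mechanics of filtering and reconstruction.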