Lecture 19: Data Mining and Clustering
§ Microarray data mining
§ Hierarchical Clustering

Some slides adapted from notes created by Dr. Jaideep Chaudhary

Data Mining
§ Goal: Find novel patterns and correlations in microarray (gene expression) data
§ Identify groups of genes with the same expression pattern
  • Co-regulated genes
§ Use various clustering algorithms to group genes that have similar expression patterns across multiple experiments
  • Hierarchical clustering
  • k-means clustering
  • Self-organizing Maps (SOMs)
§ Many clustering methods are unsupervised
§ Clustering simplifies data analysis by grouping together genes with similar expression patterns
§ Clustering acts as a data reduction method: instead of analyzing the expression levels of 10,000 genes, analyze the expression of 25 gene clusters
§ Clusters of genes can be represented as an average of the microarray data for the genes in that cluster
§ Co-regulated (clustered) genes tend to have similar:
  • Gene functions
  • Promoter regulatory sequences
  • Protein complexes
§ Exception: genes with apparently similar expression patterns may have different control strategies (promoter sequences, etc.)

Data Clustering

Clustering Procedure
§ Assemble set of genome-wide expression data to be clustered
  • Includes multiple (2 to 1000+) expression data sets for different experimental conditions or cell types
§ Normalize/transform and filter expression data to construct a data matrix
§ Calculate pair-wise distance matrix using the gene expression data matrix (Euclidean distance, etc.)
§ Use distance matrix to group genes into clusters

Gene expression data matrix

Gene   Expt. 1    Expt. 2    ...   Expt. 20
A      x_{A,1}    x_{A,2}    ...   x_{A,20}
B      x_{B,1}    x_{B,2}    ...   x_{B,20}
C      x_{C,1}    x_{C,2}    ...   x_{C,20}
...    ...        ...        ...   ...

Distance matrix

Gene   A         B         C         ...
A      d(A,A)    d(A,B)    d(A,C)    ...
B      d(B,A)    d(B,B)    d(B,C)    ...
C      d(C,A)    d(C,B)    d(C,C)    ...
...    ...       ...       ...       ...

Gene Clustering Methods
§ Bottom-up, agglomerative clustering methods
  • Hierarchical (agglomerative) Clustering
§ Partitioning cluster methods
  • k-means clustering
  • Self-organizing Maps (SOMs)

[Figure: panels illustrating k-means, SOM, and hierarchical clustering. Image from Patrik D'haeseleer (2005) Nature Biotechnology 23, 1499-1501]

Data Preparation for Clustering
§ Gene Filtering
  • Usually cluster only genes that show significant expression changes in at least one of the experimental conditions being examined
  • Otherwise the clustering mostly groups data noise (particularly for variance normalization or scaling methods)
  • Include a filter to identify those genes that are differentially expressed in...
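
The clustering procedure outlined in these slides (filter genes, compute a pair-wise Euclidean distance matrix, then merge genes bottom-up) can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the method used in the lecture: the expression values are made-up toy numbers, the range-based filter and its `min_range` threshold are assumptions (the slides do not state a filter criterion), and average linkage is one assumed choice among several possible linkage rules. The function names `filter_genes`, `distance_matrix`, and `agglomerate` are hypothetical.

```python
import math

def filter_genes(expr, min_range=1.0):
    """Keep genes whose expression range across experiments is >= min_range.
    (Assumed filter; the slides do not specify the criterion.)"""
    return {g: v for g, v in expr.items() if max(v) - min(v) >= min_range}

def euclidean(u, v):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_matrix(expr):
    """Pair-wise distance d(g, h) for every ordered pair of genes."""
    genes = sorted(expr)
    return {(g, h): euclidean(expr[g], expr[h]) for g in genes for h in genes}

def agglomerate(expr, n_clusters):
    """Bottom-up clustering: start with one cluster per gene and repeatedly
    merge the two closest clusters (average linkage, an assumed choice)."""
    dist = distance_matrix(expr)
    clusters = [[g] for g in sorted(expr)]

    def avg_link(c1, c2):
        # Mean of all gene-to-gene distances between the two clusters.
        return sum(dist[g, h] for g in c1 for h in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: avg_link(clusters[p[0]], clusters[p[1]]))
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)

# Toy log-ratio data (invented for illustration): rows are genes,
# columns are three experiments.
expr = {
    "A": [2.0, 2.1, -1.0],
    "B": [1.9, 2.0, -1.1],
    "C": [-2.0, -1.8, 1.5],
    "D": [-2.1, -2.0, 1.4],
    "E": [0.1, 0.0, 0.1],   # nearly flat profile, removed by the filter
}

filtered = filter_genes(expr, min_range=1.0)
dist = distance_matrix(filtered)
clusters = agglomerate(filtered, n_clusters=2)
print(clusters)
```

On this toy input the flat gene E is filtered out, and the remaining genes fall into two groups with similar profiles. In practice a library routine would normally replace the quadratic-time loop above; the pure-Python version is kept here only so the steps map directly onto the procedure in the slides.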
This note was uploaded on 01/20/2012 for the course MBIOS 478 taught by Professor Staff during the Fall '11 term at Washington State University.