Lecture 19: Data Mining and Clustering

§ Microarray data mining
§ Hierarchical Clustering

Some slides adapted from notes created by Dr. Jaideep Chaudhary

Data Mining

§ Goal: Find novel patterns and correlations in microarray (gene expression) data
§ Identify groups of genes with the same expression pattern
  • Co-regulated genes
§ Use various clustering algorithms to group genes that have similar expression patterns across multiple experiments
  • Hierarchical clustering
  • k-means clustering
  • Self-organizing Maps (SOMs)
§ Many clustering methods are unsupervised

Data Clustering

§ Clustering simplifies data analysis by grouping together genes with similar expression patterns
§ Clustering acts as a data reduction method: instead of analyzing the expression levels of 10,000 genes, analyze the expression of 25 gene clusters
§ Clusters of genes can be represented as an average of the microarray data for the genes in that cluster
§ Co-regulated (clustered) genes tend to have similar:
  • Gene functions
  • Promoter regulatory sequences
  • Protein complexes
§ Exception: genes with apparently similar expression patterns may have different control strategies (promoter sequences, etc.)

Clustering Procedure

§ Assemble set of genome-wide expression data to be clustered
  • Includes multiple (2 to 1000+) expression data sets for different experimental conditions or cell types
§ Normalize/transform and filter expression data to construct a data matrix
§ Calculate pair-wise distance matrix using the gene expression data matrix (Euclidean distance, etc.)
§ Use distance matrix to group genes into clusters

Gene expression data matrix

Gene   Expt. 1   Expt. 2   ...   Expt. 20
A      x A,1     x A,2     ...   x A,20
B      x B,1     x B,2     ...   x B,20
C      x C,1     x C,2     ...   x C,20
...    ...       ...       ...   ...

Distance matrix

Gene   A        B        C        ...
A      d(A,A)   d(A,B)   d(A,C)   ...
B      d(B,A)   d(B,B)   d(B,C)   ...
C      d(C,A)   d(C,B)   d(C,C)   ...
...    ...      ...      ...      ...

Clustering Methods

§ Bottom-up, agglomerative clustering methods
  • Hierarchical (agglomerative) Clustering
§ Partitioning cluster methods
  • k-means clustering
  • Self-organizing Maps (SOMs)

Image from Patrik D'haeseleer (2005) Nature Biotechnology 23, 1499-1501 (panels: k-means, SOM, Hierarchical clustering)

Data Preparation for Clustering

§ Gene Filtering
  • Often want to cluster genes that show significant expression changes in at least one of the experimental conditions being examined
  • Otherwise just clustering data noise (particularly for variance normalization or scaling methods)
  • Include a filter to identify those genes that are differentially expressed in...
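
The clustering procedure outlined in these slides (filter genes, build a pair-wise Euclidean distance matrix, group genes bottom-up, then represent each cluster by its average profile) can be sketched in a few lines of standard-library Python. This is only an illustrative sketch under stated assumptions, not the method used in the course: the toy `expression` values, the function names, the range-based filter with its `min_range` cutoff, and the choice of average linkage are all assumptions made here for the example. The slides do not specify a filter criterion or a linkage rule.

```python
import math
from itertools import combinations

def euclidean(u, v):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def filter_genes(data, min_range=1.0):
    """Keep genes whose expression varies by at least min_range.
    The range criterion and cutoff are assumptions for illustration."""
    return {g: x for g, x in data.items() if max(x) - min(x) >= min_range}

def distance_matrix(data):
    """Symmetric pair-wise distance matrix, stored as d[(gene_g, gene_h)]."""
    d = {(g, g): 0.0 for g in data}
    for g, h in combinations(data, 2):
        d[(g, h)] = d[(h, g)] = euclidean(data[g], data[h])
    return d

def hierarchical_clusters(data, n_clusters):
    """Bottom-up (agglomerative) clustering with average linkage (assumed).
    Starts with one cluster per gene and merges the closest pair of
    clusters until only n_clusters remain."""
    d = distance_matrix(data)
    clusters = [[g] for g in data]

    def avg_link(c1, c2):
        # Mean distance over all gene pairs drawn from the two clusters
        return sum(d[(g, h)] for g in c1 for h in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_link(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

def cluster_average(data, cluster):
    """Average expression profile of the genes in one cluster."""
    n_expts = len(data[cluster[0]])
    return [sum(data[g][e] for g in cluster) / len(cluster)
            for e in range(n_expts)]

# Toy data matrix: rows are genes, columns are experiments (made-up values)
expression = {
    "A": [1.0, 2.0, 3.0],
    "B": [1.0, 2.0, 4.0],
    "C": [5.0, 1.0, 0.0],
    "D": [5.0, 2.0, 0.0],
    "E": [2.0, 2.1, 2.0],  # nearly flat profile, dropped by the filter
}

filtered = filter_genes(expression, min_range=1.0)
clusters = hierarchical_clusters(filtered, n_clusters=2)
for cluster in clusters:
    print(cluster, cluster_average(filtered, cluster))
```

Stopping at a fixed number of clusters is one simple way to cut the hierarchy; a full hierarchical clustering would instead record every merge and its distance so that a dendrogram can be drawn and cut at any level.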

This note was uploaded on 01/20/2012 for the course MBIOS 478 taught by Professor Staff during the Fall '11 term at Washington State University.
