MIT6_047f08_lec04_slide04

MIT OpenCourseWare
http://ocw.mit.edu

6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution
Fall 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

Clustering
Lecture 3
September 16, 2008
Computational Biology: Genomes, Networks, Evolution
Structure in High-Dimensional Data
[Figure from Gyulassy, Atilla, et al. "Topologically Clean Distance Fields." IEEE Transactions on Visualization and Computer Graphics 13, no. 6 (2007): 1432-1439. ©2007 IEEE. Used with permission.]
• Structure can be used to reduce the dimensionality of data
• Structure can tell us something useful about the underlying phenomena
• Structure can be used to make inferences about new data
Clustering vs. Classification
• Objects are characterized by one or more features
• Classification
 – Have labels for some points
 – Want a “rule” that will accurately assign labels to new points
 – Supervised learning
• Clustering
 – No labels
 – Group points into clusters based on how “near” they are to one another
 – Identify structure in data
 – Unsupervised learning
[Figure: points plotted by expression in Exp 1 vs. expression in Exp 2]
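As a minimal sketch of the supervised side, the Python below uses a 1-nearest-neighbor rule (an illustrative choice, not a method from these slides): each labeled training point is a hypothetical (expression in Exp 1, expression in Exp 2) pair, and a new point takes the label of the closest training point under Euclidean distance. The function names and data are made up for illustration.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest_neighbor_label(labeled, point):
    """Supervised: assign `point` the label of its closest labeled example."""
    _, label = min(labeled, key=lambda item: euclidean(item[0], point))
    return label

# Hypothetical training data: (expression in Exp 1, expression in Exp 2) -> label
training = [((1.0, 1.2), "A"), ((0.8, 1.0), "A"), ((5.0, 4.8), "B"), ((5.2, 5.1), "B")]
print(nearest_neighbor_label(training, (4.7, 5.0)))  # B
```

Clustering, by contrast, would receive the same points without the "A"/"B" labels and would have to discover the two groups from distances alone.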
Today
• Microarray data
• K-means clustering
• Expectation Maximization
• Hierarchical clustering
Central Dogma
DNA → mRNA → protein → phenotype
We can measure the amount of mRNA for every gene in a cell.
Expression Microarrays
• A way to measure the level of mRNA for every gene
• Two basic types
 – Affymetrix gene chips
 – Spotted oligonucleotides
• Both work on the same principle
 – Put a DNA probe on a slide
 – Complementary hybridization
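Complementary hybridization rests on base pairing: a probe binds a target strand whose sequence is its reverse complement (both written 5' to 3'). A minimal Python sketch, assuming exact Watson-Crick matches only; real hybridization tolerates some mismatches and depends on conditions such as temperature. The function names and sequences are hypothetical.

```python
# Watson-Crick base pairing for DNA
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def hybridizes(probe, target):
    """Simplified model: the probe binds only a target that is its exact reverse complement."""
    return reverse_complement(probe) == target.upper()

probe = "ATGCGT"
print(reverse_complement(probe))     # ACGCAT
print(hybridizes(probe, "ACGCAT"))   # True
```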
Expression Microarrays
Measure the level of mRNA messages in a cell
[Figure: RNA from the cell (RNA 4, RNA 6) is reverse transcribed (RT) into cDNA (cDNA 4, cDNA 6), which hybridizes to the DNA probes for Genes 1-6 (DNA 1-6) on the array; the bound signal is then measured]
Expression Microarray Data Matrix
• Genes are typically given as rows
• Experiments are given as columns
[Figure: data matrix of m genes (rows) by n experiments (columns)]
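The m × n layout maps directly onto a nested list. A minimal sketch with hypothetical values (3 genes, 4 experiments): a gene's expression profile is a row, and an experiment's profile across all genes is a column. The helper names are illustrative.

```python
# Hypothetical expression matrix: m = 3 genes (rows) x n = 4 experiments (columns)
genes = ["gene1", "gene2", "gene3"]
experiments = ["exp1", "exp2", "exp3", "exp4"]
expr = [
    [2.1, 0.5, 1.8, 3.0],
    [2.0, 0.4, 1.9, 2.9],
    [0.1, 3.2, 0.2, 0.3],
]

m, n = len(expr), len(expr[0])

def gene_profile(i):
    """Row i: expression of gene i across all n experiments."""
    return expr[i]

def experiment_profile(j):
    """Column j: expression of all m genes in experiment j."""
    return [row[j] for row in expr]

print(m, n)                   # 3 4
print(gene_profile(2))        # [0.1, 3.2, 0.2, 0.3]
print(experiment_profile(1))  # [0.5, 0.4, 3.2]
```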
Clustering and Classification in Genomics
• Classification
 – Microarray data: classify cell state (e.g., AML vs. ALL) using expression data
 – Protein/gene sequences: predict function, localization, etc.
• Clustering
 – Microarray data: groups of genes that share similar function have similar expression patterns; identify regulons
 – Protein sequences: group related proteins to infer function
 – EST data: collapse redundant sequences
Clustering Expression Data
• Cluster experiments
 – Group by similar expression profiles
• Cluster genes
 – Group by similar expression in different conditions
[Figure: panels labeled Gene 1 and Gene 2 (axis: Experiment) and Experiment 1 and Experiment 2 (axis: Genes)]
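Clustering genes and clustering experiments use the same matrix along different axes: genes are compared row against row, experiments column against column (equivalently, rows of the transpose). The sketch below computes both pairwise distance matrices, assuming Euclidean distance as the dissimilarity measure (correlation-based measures are also common); the data and helper names are hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def transpose(matrix):
    """Swap rows and columns so experiments become the objects being compared."""
    return [list(col) for col in zip(*matrix)]

def pairwise_distances(objects):
    """Symmetric distance matrix between every pair of row vectors."""
    return [[euclidean(a, b) for b in objects] for a in objects]

# Hypothetical matrix: rows are genes, columns are experiments
expr = [
    [2.0, 0.5, 1.8],
    [2.0, 0.4, 1.8],
    [0.0, 3.0, 0.0],
]

gene_dist = pairwise_distances(expr)            # clustering genes: compare rows
exp_dist = pairwise_distances(transpose(expr))  # clustering experiments: compare columns
# gene_dist[0][1] is small (about 0.1): genes 0 and 1 have nearly identical profiles
```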
Why Cluster Genes by Expression?
• Data exploration
 – Summarize data
 – Explore without getting lost in each data point
 – Enhance visualization
• Co-regulated genes
 – Common expression may imply common regulation
 – Predict cis-regulatory promoter sequences
• Functional annotation
 – Similar function from similar expression
[Figure: GCN4 with His2, His3, and an Unknown gene, each labeled "Amino Acids"]
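The functional-annotation idea can be sketched as "guilt by association": give an unannotated gene the annotation of the known gene whose expression profile correlates best with it. This sketch assumes Pearson correlation as the similarity measure; the gene names, profiles, and annotations are hypothetical, not data from the slides.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def guess_annotation(profile, annotated):
    """Return (gene, annotation) for the annotated gene best correlated with `profile`."""
    best = max(annotated, key=lambda gene: pearson(profile, annotated[gene][0]))
    return best, annotated[best][1]

# Hypothetical annotated genes: name -> (expression profile, functional annotation)
annotated = {
    "geneA": ([1.0, 2.0, 3.0, 4.0], "amino acid biosynthesis"),
    "geneB": ([4.0, 3.0, 2.0, 1.0], "ribosome biogenesis"),
}
unknown = [1.1, 2.0, 2.9, 4.2]
print(guess_annotation(unknown, annotated))  # ('geneA', 'amino acid biosynthesis')
```

Such a prediction is only a hypothesis: similar expression suggests, but does not prove, shared function or regulation.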
Clustering Algorithms
• Partitioning
 – Divides objects into non-overlapping …
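K-means, one of today's topics, is a standard partitioning method: every object ends up in exactly one cluster. Below is a minimal sketch of Lloyd's iteration, assuming Euclidean distance and either caller-supplied starting centroids (the hypothetical `init` parameter, which must contain k points) or k data points drawn at random. It can converge to a local optimum, so in practice it is often rerun from several starts.

```python
import math
import random

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]

def kmeans(points, k, init=None, max_iters=100, seed=0):
    """Lloyd's k-means. Returns (centroids, labels); labels[i] is the single cluster of points[i]."""
    rng = random.Random(seed)
    # Start from caller-supplied centroids, or from k distinct data points chosen at random
    starts = init if init is not None else rng.sample(points, k)
    centroids = [list(c) for c in starts]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = mean(members)
    return centroids, labels
```

For example, `kmeans([(1, 1), (1.2, 0.8), (0.9, 1.1), (8, 8), (8.2, 7.9), (7.8, 8.1)], 2, init=[(1, 1), (8, 8)])` assigns the first three points to cluster 0 and the last three to cluster 1.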