6.874/6.807/7.90 Computational functional genomics, lecture 7 (Jaakkola)

Expression profiles, clustering, and latent processes

If we have multiple array measurements concerning cell populations under different treatments relative to control, we can put together an expression profile for each gene: its expression across the different treatments. Such profiles are useful in finding genes that behave similarly across the different experiments, presumably because they participate in the same processes that are activated (or suppressed) due to the treatments. We can also identify treatments that lead to similar expression responses, i.e., have similar consequences as far as transcriptional regulation is concerned.

There are many difficulties associated with this type of cluster analysis. Since biological processes are not independent of each other, many genes participate in multiple different processes. Each gene should therefore be assigned to multiple clusters whenever clusters are identified with processes. We also shouldn't necessarily expect to find gene profiles that look the same over all the experiments. The similarities would be restricted to those experiments that tap into the processes common to both genes. The available experiments may exercise only a fraction of the underlying processes. Thus the profiles of two genes might look the same because we have not carried out the experiments where they would differ. Finally, the cell populations are not uniform but may involve different cell types. Genes with similar aggregate profiles, averaged over the cell types, may look the same even though they differ substantially for each cell type.

Clustering

For simplicity we will pay attention only to the log-ratio measurements from each array experiment, omitting the fact that it would be better to look at the actual intensity measurements from the two channels (see lecture 6).
Let $x_{it}$ denote the log-expression ratio for gene $i$ in experiment $t$. We assume that there are $n$ experiments (on the order of tens) and $m$ genes (in the thousands). The available expression data can be put into matrix form

$$
X = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1n} \\
x_{21} & x_{22} & \cdots & x_{2n} \\
\vdots & \vdots &        & \vdots \\
x_{m1} & x_{m2} & \cdots & x_{mn}
\end{bmatrix}
$$

where each row represents a gene profile and each column defines a treatment/tissue/experiment profile. We use a special notation for gene profiles, $g_i = [x_{i1}, \ldots, x_{in}]^T$, cast here as column vectors. Our goal here is to group together (cluster) gene profiles so as to capture genes that participate in the same biological processes.

Hierarchical clustering

Perhaps the simplest approach to clustering is hierarchical agglomerative clustering. Each profile initially represents a separate singleton cluster. The algorithm successively merges the two most similar clusters into a larger one. The resulting clustering...
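
The merging procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the preview does not specify how similarity between clusters is measured, so the sketch assumes Euclidean distance between gene profiles and average linkage between clusters. The function name `agglomerative`, the stopping parameter `num_clusters`, and the toy matrix are all hypothetical and not taken from the lecture.

```python
import numpy as np

def agglomerative(X, num_clusters):
    """Repeatedly merge the two closest clusters until num_clusters remain.

    X: (m, n) array with one gene profile g_i per row.
    Cluster-to-cluster distance is the average pairwise Euclidean distance
    (average linkage). This choice is an assumption, not the lecture's.
    """
    m = X.shape[0]
    # Pairwise Euclidean distances between all gene profiles.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))

    clusters = [[i] for i in range(m)]  # each profile starts as a singleton
    merges = []                         # record of (cluster_a, cluster_b, distance)
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average distance over all pairs (one gene from each cluster).
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters], merges

# Toy data (made-up values): 4 genes (rows) x 3 experiments (columns).
X = np.array([
    [ 1.0,  1.1, -0.9],
    [ 0.9,  1.0, -1.0],
    [-1.0, -0.8,  1.2],
    [-1.1, -1.0,  1.0],
])
clusters, merges = agglomerative(X, num_clusters=2)
print(clusters)  # [[0, 1], [2, 3]]
```

The `merges` list records the order in which clusters were joined, which is the information a dendrogram would display. The brute-force search over cluster pairs is kept for readability and would be too slow for thousands of genes.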
This note was uploaded on 11/11/2011 for the course BIO 7.344 taught by Professor Bobsauer during the Spring '08 term at MIT.