ps4answers2002 - Harvard-MIT Division of Health Sciences...

Problem Set 4

Problem 1: Clustering (33 points)

Microarray and DNA chip technologies have made it possible to study the expression patterns of thousands of genes simultaneously, and the amount of data coming out of these efforts is overwhelming. A powerful strategy for analyzing microarray data is to cluster expression profiles, either by gene or by condition. Golub et al. (Science, 286, 531-7; PDF, supplemental website) clustered expression data from different types of leukemia using non-hierarchical self-organizing maps (SOMs). Now you will write a Perl program that clusters the same data with an alternative, hierarchical clustering algorithm.

I) Briefly describe the two major goals of this paper. (2 pts)
a. Cancer class discovery (1 pt)
b. Cancer class prediction (1 pt)

II) Describe the major steps of the SOM training algorithm without using code. (4 pts)
a. Define the map: choose the topology (the neighborhood relations between neurons) and the number of neurons according to the input data and the expected number of clusters. (1 pt)
b. Initialization: initialize the weight vectors with randomly chosen sample vectors from the training dataset. (1 pt)
c. Random selection: randomly choose one sample vector from the input dataset and calculate the similarity between it and every weight vector in the map. (1 pt)
d. Update the map: find the weight vector most similar to the input vector, then move it and the weight vectors surrounding it on the map toward the input vector. (1 pt)
e. Repeat steps c and d for a predefined number of steps.

III) The authors used Affymetrix GeneChips, which measure RNA expression levels very differently from ratio-based cDNA microarrays. Data from several different GeneChip microarrays must be normalized before they can be compared with each other. Describe why normalization is needed and how the authors normalized their data. (4 pts)
a. The Affymetrix GeneChip is a ‘one-channel’ platform in which only one fluorescent dye is used.
Expression levels are determined from the difference in fluorescence intensity between the ‘perfect match’ (PM) and ‘mismatch’ (MM) probes for each gene, and absolute values are reported rather than the ratios reported by cDNA microarrays. The overall brightness (intensity) of a chip may vary from experiment to experiment for reasons ranging from sample preparation to chip scanning. Normalization that brings all chips to the same overall brightness is therefore needed before these absolute values can be compared across experiments. (2 pts)
b. Quoted from http://www- s.txt: “Intensity values have been re-scaled such that overall intensities for each chip are equivalent. This is done by fitting a linear regression model using the intensities of all genes with "P" (present) calls in both the first sample (baseline) and each of the other samples. The inverse of the "slope" of the linear regression line becomes the (multiplicative) re-
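The re-scaling described in the quote can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: it assumes an ordinary least-squares fit with an intercept, regressing each sample's intensities on the baseline's, and the names fit_line and rescale_to_baseline are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rescale_to_baseline(baseline, sample, baseline_calls, sample_calls):
    """Rescale one chip so its overall intensity matches the baseline chip.

    All four lists are indexed by gene in the same order; calls are
    "P" (present), "A" (absent) or "M" (marginal).
    Returns (rescaled sample intensities, multiplicative factor).
    """
    # Fit only on genes called present on BOTH chips, as the quote describes.
    present = [(b, s) for b, s, cb, cs
               in zip(baseline, sample, baseline_calls, sample_calls)
               if cb == "P" and cs == "P"]
    xs = [b for b, _ in present]   # assumption: baseline is the x-axis
    ys = [s for _, s in present]
    slope, _intercept = fit_line(xs, ys)   # assumption: fit includes an intercept
    factor = 1.0 / slope                   # inverse slope = multiplicative factor
    # Every gene on the chip is rescaled, not just the ones used in the fit.
    return [v * factor for v in sample], factor


# Hypothetical example: a chip that is uniformly twice as bright as the baseline.
baseline = [100.0, 200.0, 300.0, 400.0, 50.0]
sample = [200.0, 400.0, 600.0, 800.0, 999.0]
scaled, factor = rescale_to_baseline(
    baseline, sample, ["P", "P", "P", "P", "A"], ["P"] * 5)
print(factor)   # 0.5
print(scaled)   # [100.0, 200.0, 300.0, 400.0, 499.5]
```

Genes with an absent call still get re-scaled; the present-call filter only decides which genes enter the regression. The quote does not say whether the fit included an intercept, so a fit through the origin would be an equally plausible reading.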
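Part III a says that each gene's expression level comes from the difference between its PM and MM probe intensities. A simplified sketch of that idea, assuming a gene's probe pairs arrive as two equal-length lists, averages the PM - MM differences. The name average_difference is hypothetical, and the vendor software applied additional processing (such as handling outlier probe pairs) that this sketch leaves out.

```python
def average_difference(pm, mm):
    """Simplified expression estimate for one gene: mean of PM - MM over its probe pairs.

    pm[i] and mm[i] are the intensities of the i-th perfect-match / mismatch pair.
    """
    if not pm or len(pm) != len(mm):
        raise ValueError("pm and mm must be non-empty and of equal length")
    # The MM probe estimates non-specific hybridization, so it is subtracted.
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)


# Hypothetical intensities for one gene with four probe pairs.
pm = [520, 610, 480, 700]
mm = [120, 210, 180, 300]
print(average_difference(pm, mm))                 # 375.0

# The same RNA on a chip scanned 1.5 times brighter gives a 1.5 times larger
# value, which is why chips must be normalized before they are compared.
print(average_difference([1.5 * p for p in pm],
                         [1.5 * m for m in mm]))   # 562.5
```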
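The SOM training steps a through e in Part II can be sketched in Python as follows. This is an illustrative sketch, not the software Golub et al. used: the Euclidean distance as the similarity measure, the rectangular grid, the Gaussian neighborhood, and the linearly decaying learning rate and radius are all assumptions, as are the names train_som, squared_distance and assign_clusters.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance (smaller = more similar)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def train_som(data, rows, cols, steps, lr0=0.5, seed=0):
    """Train a rows x cols self-organizing map on a list of equal-length vectors."""
    rng = random.Random(seed)
    # a. Define the map: one neuron per cell of a rectangular grid.
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    radius0 = max(rows, cols) / 2.0
    # b. Initialization: each weight vector is a copy of a random sample.
    weights = [list(rng.choice(data)) for _ in grid]
    for t in range(steps):
        frac = t / steps
        lr = lr0 * (1.0 - frac)                     # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)   # neighborhood shrinks, floor 0.5
        # c. Random selection: pick one sample and compare it with every neuron.
        x = rng.choice(data)
        # d. Update: the best-matching unit (BMU) is the most similar neuron...
        bmu = min(range(len(grid)), key=lambda i: squared_distance(weights[i], x))
        br, bc = grid[bmu]
        # ...and it and its grid neighbors move toward x, weighted by a
        # Gaussian of their grid distance from the BMU.
        for i, (r, c) in enumerate(grid):
            h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * radius ** 2))
            weights[i] = [w + lr * h * (xi - w) for w, xi in zip(weights[i], x)]
        # e. The loop repeats c and d for a fixed number of steps.
    return weights


def assign_clusters(data, weights):
    """Label each sample with the index of its best-matching neuron."""
    return [min(range(len(weights)), key=lambda i: squared_distance(weights[i], x))
            for x in data]


# Hypothetical two-condition profiles forming two obvious groups.
data = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [8.0, 8.2], [8.1, 7.9], [7.9, 8.0]]
weights = train_som(data, rows=1, cols=2, steps=1000)
print(assign_clusters(data, weights))
```

On these two well-separated groups the two neurons of the 1 x 2 map should settle one near each group, so the first three samples share one label and the last three the other; which group gets label 0 depends on the random initialization.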
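The problem itself asks for a hierarchical alternative to SOMs, to be written in Perl; the sketch below uses Python only to illustrate the idea. It performs agglomerative clustering with average linkage and 1 - Pearson correlation as the distance between profiles. Both choices are assumptions, since the problem text does not fix a linkage or a distance, and pearson and average_linkage are hypothetical names. Profiles must not be constant, or the correlation is undefined.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def average_linkage(profiles):
    """Agglomerative clustering with average linkage and 1 - Pearson distance.

    Returns the merges in order as (cluster_id_a, cluster_id_b, distance);
    leaves are 0..n-1 and each merge creates the next unused id.
    """
    n = len(profiles)
    dist = {(i, j): 1.0 - pearson(profiles[i], profiles[j])
            for i in range(n) for j in range(i + 1, n)}
    clusters = {i: [i] for i in range(n)}     # cluster id -> member leaves
    merges = []
    next_id = n
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                # Average linkage: mean distance over all cross-cluster pairs.
                total = sum(dist[min(p, q), max(p, q)]
                            for p in clusters[a] for q in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)   # merge closest pair
        merges.append((a, b, d))
        next_id += 1
    return merges


# Two profiles rise together and two fall together (hypothetical data).
profiles = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [8, 6, 4, 2]]
for a, b, d in average_linkage(profiles):
    print(a, b, round(d, 6))
```

For these profiles the sketch should merge 0 with 1 and 2 with 3 at a distance near 0, then join the two resulting clusters (ids 4 and 5) at a distance near 2. Recomputing every cluster-pair average at each step costs roughly cubic time, which is acceptable for a few dozen samples but would call for a cached distance matrix with thousands of genes.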

This note was uploaded on 01/24/2010 for the course HST.508, taught by Professor George Church during the Fall '02 term at MIT.
