CS573: Homework 1 Solution

1 Elements of Data Mining (5 pts)

Read the following paper at a high level (don't worry about the low-level details):

M. Deodhar and J. Ghosh (2006). Consensus Clustering for Detection of Overlapping Clusters in Microarray Data. Proceedings of the ICDM 2006 Workshop on Data Mining in Bioinformatics (DMB 2006). (http://www.ideal.ece.utexas.edu/papers/deodhar06overlap.pdf)

Identify the following components of the work:

1. The task
The task in this paper is to cluster genes into groups such that each gene can be a member of more than one group.

2. The data representation
The data representation is tabular, iid data with a set of individual measurements for each gene: specifically, microarray data with gene expression levels under a range of environmental stress conditions.

3. The knowledge representation
MCLA: Cluster assignments for each data point; each instance can be associated with more than one cluster.
SKK: Probability distribution for each data point, representing the likelihood that the instance was generated from each cluster.

4. The learning technique (search method + scoring function)
MCLA: Search is inside the METIS hypergraph partitioning algorithm; the scoring function is not described.
SKK: Search is expectation maximization (EM) to maximize likelihood with unknown cluster assignments; the scoring function is $\sum_{i=1}^{n} \sum_{j=1}^{k} h_{c_j x_i} \log\big( P(c_j)\, P(\psi(x_i) \mid \theta_j) \big)$.

5. The inference technique (if applicable) and evaluation method
The results of the clustering are evaluated using precision, recall, and F-value. The clustering is not applied to new data; the cluster assignments on the sample data are compared to ground truth groupings (i.e., true class labels).

2 Probability (3 pts)

Suppose that we have three colored boxes r (red), b (blue), and g (green). Box r contains 3 apples, 4 oranges, and 3 limes; box b contains 1 apple, 1 orange, and 0 limes; box g contains 3 apples, 3 oranges, and 4 limes.
If a box is chosen at random with probabilities p(r) = 0.2, p(b) = 0.2, p(g) = 0.6, and a piece of fruit is removed from the box (with equal probability of selecting any of the items in the box), then what is the probability of selecting an apple? If we observe that the selected fruit is in fact an orange, what is the probability that it came from the green box?

Solution:

$P(a) = P(r)P(a \mid r) + P(b)P(a \mid b) + P(g)P(a \mid g) = 0.2 \cdot 0.3 + 0.2 \cdot 0.5 + 0.6 \cdot 0.3 = 0.34$

$P(g \mid o) = \frac{P(g, o)}{P(o)} = \frac{P(g)P(o \mid g)}{P(o)} = \frac{0.6 \cdot 0.3}{0.2 \cdot 0.4 + 0.2 \cdot 0.5 + 0.6 \cdot 0.3} = 0.5$

3 Probability distributions (3 pts)

The form of the Bernoulli(p) distribution is not symmetric between the two values of X ....
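Returning to the Problem 2 solution, the two quantities can be checked numerically. The sketch below is only an illustration: the dictionary layout and the function names `likelihood`, `marginal`, and `posterior` are assumptions introduced here, not part of the original solution. It applies the law of total probability for P(a) and Bayes' rule for P(g | o), using the box contents and priors given in the problem statement.

```python
# Fruit counts per box, taken from the problem statement.
boxes = {
    "r": {"apple": 3, "orange": 4, "lime": 3},
    "b": {"apple": 1, "orange": 1, "lime": 0},
    "g": {"apple": 3, "orange": 3, "lime": 4},
}

# Prior probability of choosing each box, taken from the problem statement.
prior = {"r": 0.2, "b": 0.2, "g": 0.6}


def likelihood(fruit, box):
    """P(fruit | box): every item in the box is equally likely to be drawn."""
    counts = boxes[box]
    return counts[fruit] / sum(counts.values())


def marginal(fruit):
    """P(fruit) via the law of total probability, summing over boxes."""
    return sum(prior[box] * likelihood(fruit, box) for box in boxes)


def posterior(box, fruit):
    """P(box | fruit) via Bayes' rule."""
    return prior[box] * likelihood(fruit, box) / marginal(fruit)


p_apple = marginal("apple")
p_green_given_orange = posterior("g", "orange")
print(round(p_apple, 4), round(p_green_given_orange, 4))
```

Up to floating-point rounding, the two printed values should agree with the hand calculation (0.34 and 0.5). Keeping the counts rather than the conditional probabilities in the table makes it easy to change a box's contents and rerun the check.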
 Spring '11
 Dewey
 Normal Distribution, probability density function, Maximum likelihood, Estimation theory, euclidean distance
