mcl - MCL (and other clustering algorithms) 858L Comparing...

MCL 858L (and other clustering algorithms)

Comparing Clustering Algorithms
Brohee and van Helden (2006) compared 4 graph clustering algorithms for the task of finding protein complexes:
• MCODE
• RNSC – Restricted Neighborhood Search Clustering
• SPC – Super Paramagnetic Clustering
• MCL – Markov Clustering
They used the same MIPS complexes that we’ve seen before as a test set, and created a simulated network data set.
Simulated Data Set
Started from 220 MIPS complexes (similar to the set used when we discussed VI-Cut and graph summarization) and created a clique for each complex, giving graph A.
(Figure: graph A and $A_{100,40}$; Brohee and van Helden, 2006.)
$A_{add,del}$ := this clique graph with add% random edges added and del% of its edges deleted.
Also created a (!?) random graph R by shuffling edges, and created $R_{add,del}$ for the same choices of add and del.
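A minimal Python sketch of how $A_{add,del}$ could be generated. It assumes that both percentages are relative to the number of edges in A, that deletions are sampled uniformly from A's edges, and that added edges are drawn uniformly from pairs that were non-edges in A; the helper names `clique_graph` and `perturb` are hypothetical. The shuffled random graph R is not shown.

```python
import itertools
import random


def clique_graph(complexes):
    """Graph A: one clique per complex. Edges are stored as frozensets {u, v}."""
    edges = set()
    for members in complexes:
        for u, v in itertools.combinations(sorted(set(members)), 2):
            edges.add(frozenset((u, v)))
    return edges


def perturb(edges, nodes, add_pct, del_pct, seed=0):
    """Hypothetical A_{add,del}: delete del_pct% of A's edges, then add add_pct%
    new random edges. Assumption: both percentages are relative to |E(A)|."""
    rng = random.Random(seed)
    original = sorted(edges, key=sorted)          # deterministic order for sampling
    m = len(original)
    n_del = round(m * del_pct / 100)
    n_add = round(m * add_pct / 100)
    nodes = sorted(nodes)
    n_non_edges = len(nodes) * (len(nodes) - 1) // 2 - m
    if n_add > n_non_edges:
        raise ValueError("not enough non-edges to add")
    kept = set(rng.sample(original, m - n_del))   # uniform random deletions
    added = set()
    while len(added) < n_add:
        u, v = rng.sample(nodes, 2)
        e = frozenset((u, v))
        if e not in edges:                        # assumption: only former non-edges of A
            added.add(e)
    return kept | added
```

With 12 clique edges, `perturb(A, nodes, 50, 25)` keeps 9 of them and adds 6 new ones.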

RNSC
RNSC (King et al., 2004) is similar in spirit to the Kernighan-Lin heuristic, but more complicated:
1. Start with a random partitioning.
2. Repeat: move a node u from one cluster to another cluster C, trying to minimize this (naive) cost function:
   # neighbors of u that are not in the same cluster + # of nodes co-clustered with u that are not its neighbors
3. Add u to the “FIXED” list for some number of moves.
4. Occasionally, based on a user-defined schedule, destroy some clusters, moving their nodes to random clusters.
5. If no improvement is seen for X steps, start over from Step 2, but use a more sensitive cost function (approximately: the naive cost function scaled by the size of cluster C).
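A heavily simplified sketch of steps 1–3 in Python. It uses only the naive cost (summed over all nodes), treats the “FIXED” list as a fixed-length tabu list, and omits cluster destruction (step 4) and the scaled cost (step 5). The number of clusters `k`, `n_moves`, and `tabu_len` are hypothetical parameters of this sketch, not RNSC's actual interface.

```python
import random


def naive_cost(adj, labels):
    """Sum over nodes u of: (# neighbors of u outside u's cluster)
    + (# nodes co-clustered with u that are not its neighbors)."""
    members = {}
    for v, c in labels.items():
        members.setdefault(c, set()).add(v)
    total = 0
    for u, nbrs in adj.items():
        same = members[labels[u]] - {u}
        total += len(nbrs - same) + len(same - nbrs)
    return total


def rnsc_sketch(adj, k, n_moves=100, tabu_len=2, seed=0):
    """Simplified RNSC-style local search (hypothetical parameters)."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    labels = {v: rng.randrange(k) for v in nodes}   # step 1: random partition
    fixed = []                                      # step 3: FIXED (tabu) list
    best, best_cost = dict(labels), naive_cost(adj, labels)
    for _ in range(n_moves):
        moves = []
        for u in nodes:                             # step 2: try every non-FIXED move
            if u in fixed:
                continue
            old = labels[u]
            for c in range(k):
                if c != old:
                    labels[u] = c
                    moves.append((naive_cost(adj, labels), u, c))
            labels[u] = old
        if not moves:
            break
        cost, u, c = min(moves)                     # take the cheapest move, even if worse
        labels[u] = c
        fixed.append(u)
        if len(fixed) > tabu_len:                   # u becomes movable again later
            fixed.pop(0)
        if cost < best_cost:
            best, best_cost = dict(labels), cost
    return best, best_cost
```

Accepting the best allowed move even when it raises the cost, while the FIXED list blocks immediate reversals, is what lets this kind of search escape local minima.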
MCODE
Bader and Hogue (2003) use a heuristic to find dense regions of the graph.
Key Idea. A k-core of G is an induced subgraph of G such that every vertex has degree ≥ k.
(Figure: a 2-core, and vertices that are not part of a 2-core.)
A local k-core(u, G) is a k-core in the subgraph of G induced by {u} ∪ N(u).
A highest k-core is a k-core such that there is no (k+1)-core.
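The k-core definitions above can be computed with the standard peeling procedure: repeatedly delete any vertex with fewer than k neighbors among the vertices that remain, which leaves the maximal k-core (possibly empty). The sketch below uses hypothetical helper names and represents the graph as a dict mapping each vertex to its set of neighbors.

```python
def k_core(adj, k):
    """Vertices of the maximal k-core: peel vertices with < k remaining neighbors."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if len(adj[v] & alive) < k:
                alive.discard(v)
                changed = True
    return alive


def highest_k_core(adj):
    """Largest k with a non-empty k-core, together with that core."""
    k, core = 0, set(adj)
    while True:
        nxt = k_core(adj, k + 1)
        if not nxt:                 # no (k+1)-core exists
            return k, core
        k, core = k + 1, nxt


def local_highest_k_core(adj, u):
    """Highest k-core of the subgraph induced by {u} ∪ N(u)."""
    vs = {u} | adj[u]
    local = {v: adj[v] & vs for v in vs}
    return highest_k_core(local)
```

For a triangle a, b, c with a pendant vertex d attached to c, the 2-core is {a, b, c}, and the local highest k-core around d is the 1-core {c, d}.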

MCODE, continued
1. The core clustering coefficient CCC(u) is computed for each vertex u: CCC(u) = the density of the highest local k-core of u. In other words, it’s the density of the highest k-core in the graph induced by {u} ∪ N(u). “Density” is the ratio of existing edges to possible edges.
2. Vertices are weighted by w_u = k_highest(u) × CCC(u), where k_highest(u) is the largest k for which there is a local k-core around u.
3. Do a BFS starting from the vertex v with the highest weight w_v, including vertices with weight ≥ TWP × w_v.
4. Repeat step 3, starting with the next-highest-weighted seed, and so on.
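A minimal sketch of steps 1–4, reusing the peeling idea for k-cores. Assumptions not taken from the slides: density uses n(n−1)/2 possible edges, ties between seeds are broken by input order, and vertices already placed in a cluster are never revisited by a later seed. `twp` stands for the slide's TWP parameter; the function names are hypothetical.

```python
from collections import deque


def k_core(adj, k):
    """Maximal k-core via peeling."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if len(adj[v] & alive) < k:
                alive.discard(v)
                changed = True
    return alive


def density(adj, vs):
    """Existing edges / possible edges (n(n-1)/2) inside vertex set vs."""
    n = len(vs)
    if n < 2:
        return 0.0
    m = sum(len(adj[v] & vs) for v in vs) // 2
    return m / (n * (n - 1) / 2)


def vertex_weight(adj, u):
    """w_u = k_highest(u) * CCC(u), on the graph induced by {u} ∪ N(u)."""
    vs = {u} | adj[u]
    local = {v: adj[v] & vs for v in vs}
    k, core = 0, set(vs)
    while True:
        nxt = k_core(local, k + 1)
        if not nxt:
            break
        k, core = k + 1, nxt
    return k * density(local, core)


def mcode_grow(adj, twp):
    """Seeds in decreasing weight; BFS adds unvisited neighbors with
    weight >= twp * w_seed (assumption: assigned vertices are not reused)."""
    w = {u: vertex_weight(adj, u) for u in adj}
    seen, clusters = set(), []
    for seed in sorted(adj, key=lambda v: -w[v]):
        if seed in seen:
            continue
        threshold = twp * w[seed]
        cluster, queue = {seed}, deque([seed])
        seen.add(seed)
        while queue:
            v = queue.popleft()
            for x in adj[v]:
                if x not in seen and w[x] >= threshold:
                    seen.add(x)
                    cluster.add(x)
                    queue.append(x)
        clusters.append(cluster)
    return clusters, w
```

For example, on a 4-clique with one pendant vertex attached, every clique vertex gets weight 3 × 1.0 = 3.0 and the pendant vertex gets 1 × 1.0 = 1.0, so with `twp=0.5` the pendant vertex is left out of the clique's cluster.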
MCODE, final step
Post-process clusters according to some options:
Filter. Discard clusters if they do not contain a 2-core.
Fluff. For every u in a cluster C, if the density of {u} ∪ N(u) exceeds a threshold, add the nodes in N(u) to C if they are not part of C_1, C_2, ..., C_q. (This may cause clusters to overlap.)
(Figure: nodes u and v, clusters C_i and C_j.)
Haircut. Reduce each final cluster to its 2-core (removes tree-like regions).
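The Filter and Haircut options both come down to taking the 2-core of the subgraph induced by a cluster. A minimal sketch with hypothetical function names and flags; Fluff is omitted because its threshold and bookkeeping are not fully specified on the slide.

```python
def two_core(adj, vs):
    """2-core of the subgraph induced by vs: peel vertices with < 2 neighbors."""
    alive = set(vs)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if len(adj[v] & alive) < 2:
                alive.discard(v)
                changed = True
    return alive


def post_process(adj, clusters, do_filter=True, do_haircut=True):
    """Hypothetical Filter/Haircut post-processing of MCODE clusters."""
    result = []
    for c in clusters:
        core = two_core(adj, c)
        if do_filter and not core:
            continue                                # Filter: no 2-core inside the cluster
        result.append(core if do_haircut else set(c))  # Haircut: keep only the 2-core
    return result
```

A 4-clique with a pendant vertex loses the pendant under Haircut, while a cluster that is a single edge has an empty 2-core and is dropped by Filter.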

Comparison – 40% of edges removed; varied % added
(Figure: geometric accuracy = GeoMean(PPV, Sn) = √(PPV × Sn) plotted against % of added edges, for MCL, RNSC, SPC, and MCODE.)
This is a representative test; MCL generally outperformed the others.
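Geometric accuracy can be computed from a contingency table T, where T[i][j] is the number of proteins shared by complex i and cluster j. The sketch below assumes the definitions as we understand them from Brohee and van Helden (2006): Sn = Σ_i max_j T[i][j] / Σ_i |complex_i| and PPV = Σ_j max_i T[i][j] / Σ_{i,j} T[i][j]. Treat these as an assumption to check against the paper; the function name is hypothetical, and it expects non-empty lists of complexes and clusters.

```python
import math


def geometric_accuracy(complexes, clusters):
    """Return (Sn, PPV, Acc) for lists of protein sets.
    Assumed definitions (our reading of Brohee and van Helden, 2006):
      Sn  = sum_i max_j T[i][j] / sum_i |complex_i|
      PPV = sum_j max_i T[i][j] / sum_{i,j} T[i][j]
      Acc = sqrt(Sn * PPV), the geometric mean of the two."""
    # T[i][j] = number of proteins shared by complex i and cluster j
    T = [[len(cx & cl) for cl in clusters] for cx in complexes]
    sn = sum(max(row) for row in T) / sum(len(cx) for cx in complexes)
    col_max = [max(T[i][j] for i in range(len(complexes)))
               for j in range(len(clusters))]
    ppv = sum(col_max) / sum(sum(row) for row in T)
    return sn, ppv, math.sqrt(sn * ppv)
```

For complexes {1, 2, 3}, {4, 5} and clusters {1, 2}, {3, 4, 5}, this gives Sn = PPV = 0.8 and an accuracy of about 0.8; a clustering identical to the complexes scores 1.0 on all three.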

This note was uploaded on 01/13/2012 for the course CMSC 423 taught by Professor Staff during the Fall '07 term at Maryland.
