modularity



Modularity
CMSC 858L

Module-detection for Function Prediction

• Biological networks are generally modular (Hartwell+, 1999).
• We can try to find the modules within a network.
• Once we find modules, we can look at over-represented functions within a module, e.g.:
  • If a majority of the proteins within a module have annotation A, predict annotation A for the other proteins in the module.

⇒ Graph clustering methods
  - Min Multiway Cut, Graph Summarization, VI-Cut: examples we've already seen.
  - Methods are often borrowed from other "community detection" applications.

Modularity

e_ii = fraction of edges in module i:

  $e_{ii} = |\{(u,v) : u \in V_i,\ v \in V_i,\ (u,v) \in E\}| \,/\, |E|$

a_i = fraction of edges with at least 1 end in module i:

  $a_i = |\{(u,v) : u \in V_i,\ (u,v) \in E\}| \,/\, |E|$

Here each undirected edge is counted as two ordered pairs (u,v) and (v,u), so a_i is the fraction of edge ends that lie in module i.

Modularity is:

  $Q = \sum_{i=1}^{k} \left( e_{ii} - a_i^2 \right)$

where e_ii is the probability that an edge is in module i, and a_i^2 is the probability that a random edge would fall into module i.

High modularity ⇒ more edges within the modules than you would expect by chance.

Examples

[Figure: communities assigned to a small graph]
[Figure: communities assigned to a random graph]

Note: maximizing modularity will find its own # of clusters.

Modularity Algorithm #1

• Modularity is NP-hard to optimize (Brandes, 2007)
• Greedy Heuristic: (Newman, 2003)
  - C = trivial clustering with each node in its own cluster
  - Repeat:
    • Merge the two clusters that will increase the modularity by the largest amount
    • Stop when all merges would reduce the modularity.

Karate Club (again)

Newman-Girvan, 2004. Only node 3 is in the "wrong" community.

Maximizing Modularity via a Spectral Technique

Another View of Modularity

  $Q = \frac{1}{2m} \sum_{i,j \text{ in same module}} \left( A_{ij} - \frac{k_i k_j}{2m} \right)$

where A is the adjacency matrix, 1/(2m) is the normalization, and k_i k_j / 2m is the probability a random edge would go between i and j.

m = # edges in graph
k_i = degree(i)

Consider the case of only 2 modules.
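Before turning to the two-module case, the definition Q = Σ (e_ii − a_i²) and the greedy merge heuristic above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from Newman (2003): the function names `modularity` and `greedy_merge` are hypothetical, the graph is assumed to be undirected, unweighted, and given as a list of edges, and Q is recomputed from scratch for every candidate merge, which is far slower than the incremental updates a practical implementation would use.

```python
from itertools import combinations

def modularity(edges, communities):
    """Q = sum_i (e_ii - a_i^2) for an undirected, unweighted edge list.

    e_ii = fraction of edges with both ends in module i
    a_i  = fraction of edge ends attached to module i
    """
    m = len(edges)
    module_of = {v: idx for idx, group in enumerate(communities) for v in group}
    internal = [0] * len(communities)  # edges inside module i
    ends = [0] * len(communities)      # edge ends attached to module i
    for u, v in edges:
        ends[module_of[u]] += 1
        ends[module_of[v]] += 1
        if module_of[u] == module_of[v]:
            internal[module_of[u]] += 1
    return sum(internal[i] / m - (ends[i] / (2 * m)) ** 2
               for i in range(len(communities)))

def greedy_merge(edges):
    """Start from singletons; repeatedly apply the best merge."""
    nodes = sorted({v for e in edges for v in e})
    communities = [{v} for v in nodes]
    current = modularity(edges, communities)
    while len(communities) > 1:
        best_q, best_pair = None, None
        for i, j in combinations(range(len(communities)), 2):
            trial = [c for k, c in enumerate(communities) if k not in (i, j)]
            trial.append(communities[i] | communities[j])
            q = modularity(edges, trial)
            if best_q is None or q > best_q:
                best_q, best_pair = q, (i, j)
        if best_q < current:  # every merge would reduce the modularity
            break
        i, j = best_pair
        merged = communities[i] | communities[j]
        communities = [c for k, c in enumerate(communities) if k not in (i, j)]
        communities.append(merged)
        current = best_q
    return communities, current

# Two triangles joined by a single bridge edge (a made-up toy graph).
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
found, q = greedy_merge(toy_edges)
print(sorted(sorted(c) for c in found), round(q, 4))
```

On this toy graph the heuristic stops with the two triangles as modules, since merging them would lower Q.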
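With two modules, let s_i = 1 if node i is in module 1, and s_i = -1 if node i is in module 2. Then (s_i s_j + 1)/2 is 1 when i and j are in the same module and 0 otherwise, so

  $Q = \frac{1}{4m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) (s_i s_j + 1) = \frac{1}{4m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) s_i s_j$

The "+1" term drops out because $\sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) = 2m - \frac{(2m)^2}{2m} = 0$.

Goal: Maximize modularity

• Try to find a ±1 vector s that maximizes the modularity.
• Start with the case above: only two groups.
• Then show how to extend to ≥ 2 groups.
• Will use some ideas from linear algebra.

  $Q = \frac{1}{4m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) s_i s_j = \frac{1}{4m} s^T B s$

where $B_{ij} = A_{ij} - \frac{k_i k_j}{2m}$ is the "modularity" matrix and s is a {-1,1} membership vector.

Let u_i (i = 1,...,n) be the eigenvectors of matrix B, with eigenvalue β_i for vector u_i. (Assume β_1 ≥ β_2 ≥ β_3 ≥ β_4 ≥ ... ≥ β_n.) Because B is symmetric, the u_i can be taken to be orthonormal.

Write s as:

  $s = \sum_i a_i u_i$, where $a_i = u_i^T s$

Dropping the (1/4m):

  $Q = s^T B s = \left( \sum_i a_i u_i^T \right) B \left( \sum_j a_j u_j \right) = \sum_i \sum_j a_i a_j\, u_i^T B u_j$

Note:
1. $B u_j = \beta_j u_j$
2. When i ≠ j, $u_i^T B u_j = \beta_j\, u_i^T u_j = 0$ because u_i ⊥ u_j

So only the i = j terms survive:

  $Q = \sum_i (u_i^T s)^2 \beta_i$

To Maximize Q

  $Q = \sum_i (u_i^T s)^2 \beta_i$

• If we were allowed to choose any s, we'd pick the one that is parallel to u_1.
• But: s_i must be +1 or -1. This is a severe restriction.
• So: maximize u_1⋅s, the projection of s along vector u_1.
• To do this: choose s_i = 1 if the i-th entry of u_1 is > 0, and s_i = -1 if it is ≤ 0.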
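The two-module procedure above (build B, take the eigenvector of its largest eigenvalue, split by sign) can be sketched as follows. This is an illustrative sketch under stated assumptions, not code from the slides: the function names are hypothetical, nodes are assumed to be numbered 0..n-1, and the leading eigenvector is approximated by plain power iteration on the shifted matrix B + cI, a swapped-in stand-in for whatever eigensolver one would normally use. The shift c is a Gershgorin bound on the eigenvalue magnitudes, chosen so that the largest eigenvalue of B is also the dominant one after shifting.

```python
import random

def modularity_matrix(n, edges):
    """B[i][j] = A[i][j] - k_i * k_j / (2m) for an undirected simple graph."""
    m = len(edges)
    A = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1.0
    k = [sum(row) for row in A]  # node degrees
    return [[A[i][j] - k[i] * k[j] / (2 * m) for j in range(n)]
            for i in range(n)]

def leading_eigenvector(B, iters=500, seed=0):
    """Approximate u_1 by power iteration on B + c*I (assumed method)."""
    n = len(B)
    # Gershgorin bound: every eigenvalue of B has magnitude <= c.
    c = max(sum(abs(x) for x in row) for row in B)
    rng = random.Random(seed)
    # Random start so we are not stuck on the all-ones eigenvector (beta = 0).
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(B[i][j] * v[j] for j in range(n)) + c * v[i] for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def spectral_bisection(n, edges):
    """Return the +/-1 vector s from the sign of u_1, and Q = s^T B s / (4m)."""
    B = modularity_matrix(n, edges)
    u1 = leading_eigenvector(B)
    s = [1 if x > 0 else -1 for x in u1]
    m = len(edges)
    q = sum(B[i][j] * s[i] * s[j] for i in range(n) for j in range(n)) / (4 * m)
    return s, q

# Two triangles joined by a single bridge edge (a made-up toy graph).
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
s, q = spectral_bisection(6, toy_edges)
print(s, round(q, 4))
```

The overall sign of an eigenvector is arbitrary, so which triangle is labelled +1 depends on the starting vector; only the grouping is meaningful.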
Subsequent Splits

To split a module g further, compare the modularity if this module were split according to s with the modularity of module g as it stands now. Omitting the constant normalization factor, the change is

  $\left[ \frac{1}{2} \sum_{i,j \in g} B_{ij} s_i s_j + \frac{1}{2} \sum_{i,j \in g} B_{ij} \right] - \sum_{i,j \in g} B_{ij} = \frac{1}{2} \sum_{i,j \in g} \left[ B_{ij} - \delta_{i,j} \sum_{k \in g} B_{ik} \right] s_i s_j$

where the bracketed term on the left is the modularity after the split, the subtracted term is the modularity of g as it stands now, and the equality uses

  $\sum_{i,j \in g} B_{ij} = \sum_{i,j \in g} s_i s_j\, \delta_{i,j} \sum_{k \in g} B_{ik}$

which holds because s_i^2 = 1.

Karate Club Results: Exactly Right (Newman, 2006)

Greedy Improvement

• Given a partition of the network
• Repeat:
  • Find the vertex that would yield the largest modularity increase if it were moved into a different community AND that has not yet been moved (the largest increase might be negative)
  • Move the vertex into that new community
• Return the best partitioning ever observed

Similar to the Kernighan-Lin graph partitioning heuristic (details in a few slides).

Additional Results

[Figure: results for Girvan-Newman (betweenness), Newman Spectral, and Greedy Hierarchical; Newman, 2006]

Krebs Political Books

Nodes = political books; shape = conservative (squares) / liberal (circles) / "centrist" (triangles)
Edges = books frequently bought by the same readers on Amazon.com

Complexes

"+" indicates parameters tuned to maximize precision.

Biological Processes

• All GS predictions are Pareto optimal
• Many unique predictions made by each algorithm

% Modules Enriched

A lower % of GS modules are enriched for some annotation, but this is not indicative of predictive performance. It is "easy" to get legitimate statistically significant enrichment.

Summary: Modularity

• Modularity is widely used as a measure of how good a clustering is.
• Particularly popular in social network analysis, but used in other contexts as well (e.g. brain networks).
• Has a "resolution" preference: for a given network, it will tend to prefer clusters of a particular size.
• Often this means the clusters are too big.
• A good example of where a spectral clustering technique can work.
...
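The greedy improvement step described above (move each vertex at most once, always taking the best available move even when it lowers Q, and keep the best partition seen) can be sketched as below. This is a minimal sketch under assumptions, not the procedure from Newman (2006): the names `modularity` and `greedy_improvement` are hypothetical, the partition is assumed to be a dict mapping each node to a community label, only a single pass is shown, and Q is recomputed from scratch for every candidate move rather than updated incrementally.

```python
def modularity(edges, label):
    """Q = sum_c (e_cc - a_c^2) for an undirected, unweighted edge list."""
    m = len(edges)
    internal, ends = {}, {}
    for u, v in edges:
        for x in (u, v):
            ends[label[x]] = ends.get(label[x], 0) + 1
        if label[u] == label[v]:
            internal[label[u]] = internal.get(label[u], 0) + 1
    return sum(internal.get(c, 0) / m - (ends[c] / (2 * m)) ** 2 for c in ends)

def greedy_improvement(edges, label):
    """One Kernighan-Lin-style pass over a node -> community mapping."""
    label = dict(label)  # work on a copy
    communities = sorted(set(label.values()))
    best_label, best_q = dict(label), modularity(edges, label)
    unmoved = set(label)
    while unmoved:
        move, move_q = None, None
        for v in sorted(unmoved):
            original = label[v]
            for c in communities:
                if c == original:
                    continue
                label[v] = c  # try the move
                q = modularity(edges, label)
                if move_q is None or q > move_q:
                    move, move_q = (v, c), q
            label[v] = original  # undo the trial
        if move is None:  # only one community, so no move exists
            break
        v, c = move
        label[v] = c  # commit the best move, even if Q went down
        unmoved.discard(v)
        if move_q > best_q:
            best_label, best_q = dict(label), move_q
    return best_label, best_q

# Two triangles joined by a bridge; node 2 starts in the wrong community.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
start = {0: "a", 1: "a", 2: "b", 3: "b", 4: "b", 5: "b"}
improved, q = greedy_improvement(toy_edges, start)
print(improved, round(q, 4))
```

Accepting moves that lower Q is what lets the pass climb out of a local optimum; returning the best partition observed ensures the result is never worse than the starting point.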

This note was uploaded on 01/13/2012 for the course CMSC 423 taught by Professor Staff during the Fall '07 term at Maryland.
