Modularity
CMSC 858L

Module Detection for Function Prediction
• Biological networks are generally modular (Hartwell+, 1999).
• We can try to find the modules within a network.
• Once we find modules, we can look at overrepresented functions within a module, e.g.:
  • If a majority of the proteins within a module have annotation A, predict annotation A for the other proteins in the module.
⇒ Graph clustering methods
  - Min Multiway Cut, Graph Summarization, VI-Cut: examples we’ve already seen.
  - Methods often borrowed from other “community detection” applications.

Modularity
$e_{ii}$ = fraction of edges in module $i$:

$$e_{ii} = \frac{|\{(u,v) : u \in V_i,\ v \in V_i,\ (u,v) \in E\}|}{|E|}$$

$a_i$ = fraction of edges with at least 1 end in module $i$:

$$a_i = \frac{|\{(u,v) : u \in V_i,\ (u,v) \in E\}|}{|E|}$$

Modularity is:

$$Q = \sum_{i=1}^{k} \left( e_{ii} - a_i^2 \right)$$

• $e_{ii}$: probability an edge is in module $i$
• $a_i^2$: probability a random edge would fall into module $i$

High modularity ⇒ more edges within the modules than you would expect by chance.

Examples
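As a small worked example of the definition $Q = \sum_i (e_{ii} - a_i^2)$, the following Python sketch computes the modularity of a given partition. The function name `modularity` and the input format (an edge list plus a node-to-module dictionary) are illustrative choices, not part of the slides. One assumption is made explicit: $a_i$ is computed as the fraction of edge ends that lie in module $i$, which corresponds to counting every undirected edge once in each direction in the set $\{(u,v) : u \in V_i, (u,v) \in E\}$.

```python
from collections import defaultdict

def modularity(edges, community):
    """Return Q = sum_i (e_ii - a_i^2) for an undirected graph.

    edges: list of (u, v) pairs, one per undirected edge (illustrative format).
    community: dict mapping each node to its module label.
    Assumption: a_i is the fraction of edge ends attached to module i.
    """
    m = len(edges)
    internal = defaultdict(int)  # edges with both ends in module i
    ends = defaultdict(int)      # edge ends attached to module i
    for u, v in edges:
        cu, cv = community[u], community[v]
        ends[cu] += 1
        ends[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (ends[c] / (2 * m)) ** 2 for c in ends)

# Two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_module = {n: "A" for n in range(6)}

print(modularity(edges, two_modules))  # 5/14, about 0.357
print(modularity(edges, one_module))   # 0.0
```

Putting each triangle in its own module gives a positive Q, while lumping everything into one module gives Q = 0, in line with the idea that high modularity means more within-module edges than expected by chance.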
[Figure: communities assigned to a small graph]
Note: maximizing modularity will find its own number of clusters.
[Figure: communities assigned to a random graph]

Modularity Algorithm #1
• Modularity is NP-hard to optimize (Brandes, 2007)
• Greedy Heuristic: (Newman, 2003)
  C = trivial clustering with each node in its own cluster
Repeat:
• Merge the two clusters that will increase the modularity
by the largest amount
• Stop when all merges would reduce the modularity.

Karate Club (again)
Newman-Girvan, 2004: only node 3 is in the “wrong” community.

Maximizing Modularity via a Spectral Technique

Another View of Modularity
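$$Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)$$

• $A_{ij}$: adjacency matrix
• $\frac{1}{2m}$: normalization
• $\frac{k_i k_j}{2m}$: probability a random edge would go between $i$ and $j$
• $\delta(c_i, c_j)$: 1 if $i$ and $j$ are in the same module, 0 otherwise
• $m$ = # edges in graph
• $k_i$ = degree($i$)

Consider the case of only 2 modules.
Let $s_i = 1$ if node $i$ is in module 1; $-1$ if node $i$ is in module 2. Then $\delta(c_i, c_j) = \frac{1}{2}(s_i s_j + 1)$, so

$$Q = \frac{1}{4m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) (s_i s_j + 1) = \frac{1}{4m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) s_i s_j$$

The second equality holds because $\sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) = 2m - \frac{(2m)^2}{2m} = 0$.

Goal: Maximize modularity
• Try to find a ±1 vector $s$ that maximizes the modularity.
• Start with the case above: only two groups.
• Then show how to extend to > 2 groups.
• Will use some ideas from linear algebra.

$$Q = \frac{1}{4m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) s_i s_j = \frac{1}{4m} s^T B s$$

• $B$, with $B_{ij} = A_{ij} - \frac{k_i k_j}{2m}$, is the “modularity” matrix
• $s$ is a {−1, 1} membership vector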
vector Let ui (i = 1,...,n) be the eigenvectors of matrix B with eigenvalue
βi for vector ui. (Assume β1 ≥ β2 ≥ β3 ≥ β4 ≥ ... ≥ βn)
Write s as: s=
i where: ai ui ai = T
ui s s= ai = ai ui T
ui s i drop the (1/4m) 1T
Q=
s Bs
4m
T
aj uj =
ai ui B j i =
i =
aj uj ai uT B i
i j ai aj uT Buj
i j Note:
1. Buj = βi uj
2. When i ≠ j, uiTBuj = 0 because ui ⊥ uj Q=
i T2
(ui s) βi To Maximize Q
$$Q = \sum_i (u_i^T s)^2 \beta_i$$

• If we were allowed to choose any $s$, we’d pick the one that is parallel to $u_1$.
• But: $s_i$ must be +1 or −1. This is a severe restriction.
• So: maximize $u_1 \cdot s$, the projection of $s$ along vector $u_1$. To do this: choose $s_i = 1$ if the $i$-th entry of $u_1$ is > 0, and $s_i = -1$ if it is ≤ 0.

Subsequent Splits
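Change in modularity from splitting module $g$ according to $s$ (each $s_i$ is +1 or −1), up to a constant factor:

$$\Delta Q \propto \left[ \frac{1}{2} \sum_{i,j \in g} B_{ij} s_i s_j + \frac{1}{2} \sum_{i,j \in g} B_{ij} \right] - \sum_{i,j \in g} B_{ij} = \frac{1}{2} \sum_{i,j \in g} \left[ B_{ij} - \delta_{ij} \sum_{k \in g} B_{ik} \right] s_i s_j$$

• Bracketed term: the modularity if this module was split according to $s$
• Subtracted term: the modularity of module $g$ as it stands now

The last step uses $s_i^2 = 1$:

$$\sum_{i,j \in g} B_{ij} = \sum_{i,j \in g} s_i s_j \, \delta_{ij} \sum_{k \in g} B_{ik}$$

Karate Club Results: Exactly Right (Newman, 2006)

Greedy Improvement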
• Given a partition of the network
• Repeat:
  • Find the vertex that would yield the largest modularity increase if it were moved into a different community AND that has not yet been moved (the largest increase might be negative)
  • Move the vertex into that new community
• Return the best partitioning ever observed

Similar to the Kernighan-Lin graph partitioning heuristic (details in a few slides)

Additional Results
[Figure: panels labeled Girvan-Newman (betweenness), Newman Spectral, and Greedy Hierarchical; Newman, 2006]

Krebs Political Books
Nodes = political books; shape = conservative (squares) / liberal (circles) / “centrist” (triangles)
Edges = books frequently bought by the same readers on Amazon.com

Complexes
[Figure: “+” indicates parameters tuned to maximize precision]

Biological Processes
• All GS predictions are Pareto optimal
• Many unique predictions made by each algorithm

% Modules Enriched
A lower % of GS modules are enriched for some annotation, but this is not indicative of predictive performance.
“Easy” to get legitimate statistically significant enrichment.

Summary: Modularity
• Modularity is widely used as a measure for how good a clustering is.
• Particularly popular in social network analysis, but used in other contexts as well (e.g. brain networks).
• Has a “resolution” preference: for a given network, will tend to prefer clusters of a particular size.
  • Often this means the clusters are too big.
• A good example of where a spectral clustering technique can work.
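To make the spectral technique concrete, here is a minimal Python sketch of the two-module split described earlier: build the modularity matrix $B$, approximate its leading eigenvector $u_1$, and set $s_i = +1$ or $-1$ by the sign of the $i$-th entry. The function names and the assumption that nodes are labeled 0..n−1 are illustrative, not from the slides. The eigenvector is found with a shifted power iteration in pure Python, a stand-in chosen to keep the example dependency-free; a library eigen-solver would normally be used instead.

```python
def modularity_matrix(n, edges):
    """Build B with B[i][j] = A[i][j] - k_i * k_j / (2m) for an undirected graph."""
    A = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] += 1.0
        A[v][u] += 1.0
    k = [sum(row) for row in A]  # node degrees
    two_m = sum(k)               # sum of degrees = 2m
    return [[A[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]

def leading_eigenvector(B, iterations=500):
    """Approximate the eigenvector of B with the most positive eigenvalue.

    Plain power iteration converges to the eigenvalue of largest magnitude,
    which may be negative. Shifting B by c*I, with c at least as large as every
    |eigenvalue|, keeps the eigenvectors and makes the wanted one dominant.
    """
    n = len(B)
    c = max(sum(abs(x) for x in row) for row in B)  # bound on eigenvalue magnitudes
    # Arbitrary start vector, assumed not orthogonal to the leading eigenvector.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(B[i][j] * v[j] for j in range(n)) + c * v[i] for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def spectral_bisection(n, edges):
    """Split nodes 0..n-1 into two modules; return (s, Q) with Q = s^T B s / (4m)."""
    B = modularity_matrix(n, edges)
    u1 = leading_eigenvector(B)
    s = [1 if x > 0 else -1 for x in u1]
    m = len(edges)
    q = sum(B[i][j] * s[i] * s[j] for i in range(n) for j in range(n)) / (4 * m)
    return s, q

# Two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
s, q = spectral_bisection(6, edges)
print(s, q)
```

On this small graph the two triangles land in different modules and Q comes out to 5/14. Which triangle receives +1 depends on the arbitrary sign of the eigenvector, so only the grouping is meaningful.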
This note was uploaded on 01/13/2012 for the course CMSC 423 taught by Professor Staff during the Fall '07 term at Maryland.