2009/11/19
Out-of-Core Coherent Closed Quasi-Clique
Mining from Large Dense Graph Databases
Jianyong Wang
Department of Computer Science and Technology
Tsinghua University, Beijing, P.R. China
Email: [email protected]
—
Joint work with Zhiping Zeng, Lizhu Zhou, and George Karypis
Case Studies
Case Study 1: an initial solution
  Frequent Closed Clique Mining
  The CLAN algorithm
  CLAN stands for frequent closed CLique pAtterN mining
Case Study 2: a more complex solution
  Frequent Coherent Closed Quasi-Clique Mining
  The Cocain algorithm
  Cocain stands for COherent Closed quAsi-clIque miNing
Case Study 3: an out-of-core solution
  Out-of-Core Coherent Closed Quasi-Clique Mining
  The Cocain* algorithm
  Frequent Coherent Closed Subgraph Mining
Part 1: Closed Clique Mining
CLAN: An Algorithm for Mining Closed
Cliques from Large Dense Graph Databases
Jianyong Wang¹, Zhiping Zeng², Lizhu Zhou³
{¹jianyong, ³dcszlz}@tsinghua.edu.cn
²[email protected]
Department of Computer Science and Technology
Tsinghua University, Beijing, P.R. China
—
Proc. 2006 IEEE Int. Conf. on Data Engineering (ICDE'06)
Outline
Problem definition and motivation
  Problem definition
  Motivation
The CLAN solution
  Canonical form of a clique
  Efficient clique enumeration
    Low-degree vertex pruning
    Structural redundancy pruning
  Closed clique discovery
    Clique closure checking scheme
    Non-closed prefix pruning
  Integrated algorithm
Empirical results
Conclusions
Problem Formulation
Preliminaries
Input database D: a set of undirected labeled input graphs.
Undirected labeled input graph G: $G = \{V, E, L_V, F_V\}$, where
  $V$: the set of vertices
  $E \subseteq V \times V$: the set of edges
  $L_V$: the set of vertex labels
  $F_V: V \rightarrow L_V$: the labeling function that assigns a label to each vertex
Cardinality of graph G: $|G| = |V|$
Note: in this work, we do not consider edge labels.
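One way to make the definition $G = \{V, E, L_V, F_V\}$ concrete is the small Python sketch below. The class name `LabeledGraph`, the use of string vertex IDs, and taking $L_V$ to be the set of labels actually assigned by $F_V$ are illustrative assumptions, not part of the original formulation.

```python
from dataclasses import dataclass


@dataclass
class LabeledGraph:
    """Hypothetical container for an undirected vertex-labeled graph G = {V, E, L_V, F_V}.

    Edge labels are not modeled, matching the note that they are not considered.
    """
    vertices: set[str]             # V
    edges: set[frozenset[str]]     # E: each edge is an unordered pair {u, v} of distinct vertices
    labels: dict[str, str]         # F_V: maps every vertex in V to its label

    def __post_init__(self) -> None:
        # Enforce E ⊆ V × V (no self-loops) and that F_V is defined on all of V.
        for e in self.edges:
            if len(e) != 2 or not e <= self.vertices:
                raise ValueError(f"invalid edge: {sorted(e)}")
        if set(self.labels) != self.vertices:
            raise ValueError("F_V must assign a label to every vertex")

    @property
    def label_set(self) -> set[str]:
        # L_V, assumed here to be the set of labels actually assigned by F_V.
        return set(self.labels.values())

    def cardinality(self) -> int:
        # |G| = |V|
        return len(self.vertices)

    def adjacent(self, u: str, v: str) -> bool:
        return frozenset((u, v)) in self.edges


g = LabeledGraph(
    vertices={"v1", "v2", "v3"},
    edges={frozenset(("v1", "v2")), frozenset(("v2", "v3"))},
    labels={"v1": "a", "v2": "b", "v3": "a"},
)
print(g.cardinality())       # 3
print(sorted(g.label_set))   # ['a', 'b']
```

Storing edges as frozensets makes the undirectedness explicit: (u, v) and (v, u) are the same edge.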
Preliminaries
Induced subgraph: an induced subgraph is a subset of the vertices of a graph together with all edges whose endpoints are both in this subset.
Examples:
  $V(G_2) = \{v_1, v_2, v_3, v_4, v_5, v_6\}$
  $E(G_2) = \{(v_1, v_2), \ldots\}$
  $L_V(G_2) = \{a, b, c, d, e\}$
  $\mathrm{Card}(G_2) = 6$
[Figure: induced subgraph of $G_1$ on vertices $u_3, u_4, u_5$, with labels b, c, d]
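The definition translates directly into code: keep the chosen vertices and every edge whose two endpoints both lie in the subset. The helper `induced_subgraph` below is hypothetical, and the five-vertex edge set is made up for illustration; it is not the graph $G_1$ from the figure.

```python
def induced_subgraph(
    vertices: set[str],
    edges: set[frozenset[str]],
    subset: set[str],
) -> tuple[set[str], set[frozenset[str]]]:
    """Return (V', E'): V' is the subset, E' keeps every edge with both endpoints in V'."""
    if not subset <= vertices:
        raise ValueError("subset must be drawn from the graph's vertices")
    kept = {e for e in edges if e <= subset}   # an edge survives iff both endpoints are kept
    return set(subset), kept


V = {"u1", "u2", "u3", "u4", "u5"}
E = {frozenset(p) for p in [("u1", "u2"), ("u2", "u3"), ("u3", "u4"),
                            ("u3", "u5"), ("u4", "u5")]}
sub_v, sub_e = induced_subgraph(V, E, {"u3", "u4", "u5"})
print(sorted(tuple(sorted(e)) for e in sub_e))
# [('u3', 'u4'), ('u3', 'u5'), ('u4', 'u5')]
```

Unlike an arbitrary subgraph, an induced subgraph cannot drop an edge between two retained vertices, so it is fully determined by the vertex subset.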
Preliminaries
Clique: a clique C is a fully connected subgraph.
Clique isomorphism: a clique $C_1 = \{V_1, L_1, F_1\}$ is isomorphic to another clique $C_2 = \{V_2, L_2, F_2\}$ iff $|V_1| = |V_2|$ and there exists a bijection $f: V_1 \rightarrow V_2$ such that $\forall v \in V_1,\ F_1(v) = F_2(f(v))$.
Subclique and superclique: if a clique C is isomorphic to a subgraph of another clique C', C is called a subclique of C', while C' is called a superclique of C. We use $C \subseteq C'$ or $C \subset C'$ ($C \subseteq C'$ but $C \neq C'$) to denote the subclique or proper subclique relationship, respectively.
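Because every pair of vertices in a clique is adjacent and edge labels are ignored, any bijection between two cliques of equal size preserves adjacency, so the isomorphism condition above only constrains labels: two cliques are isomorphic iff their multisets of vertex labels are equal, and $C \subseteq C'$ holds iff the label multiset of C is contained in that of C'. The helpers below (`is_clique`, `cliques_isomorphic`, `is_subclique`, `is_proper_subclique`) are hypothetical names sketching this observation; representing a clique by its list of labels is an assumption of the sketch, not a claim about CLAN's canonical form.

```python
from collections import Counter
from itertools import combinations


def is_clique(vertices: list[str], edges: set[frozenset[str]]) -> bool:
    """True if every pair of the given vertices is joined by an edge."""
    return all(frozenset((u, v)) in edges for u, v in combinations(vertices, 2))


def cliques_isomorphic(labels1: list[str], labels2: list[str]) -> bool:
    """C1 is isomorphic to C2: |V1| = |V2| and a label-preserving bijection exists.

    For cliques, such a bijection exists exactly when the label multisets match.
    """
    return len(labels1) == len(labels2) and Counter(labels1) == Counter(labels2)


def is_subclique(labels_c: list[str], labels_c_prime: list[str]) -> bool:
    """C ⊆ C': every label occurs in C' at least as often as in C."""
    have = Counter(labels_c_prime)
    return all(have[lab] >= n for lab, n in Counter(labels_c).items())


def is_proper_subclique(labels_c: list[str], labels_c_prime: list[str]) -> bool:
    """C ⊂ C': C ⊆ C' but C is not isomorphic to C'."""
    return is_subclique(labels_c, labels_c_prime) and not cliques_isomorphic(labels_c, labels_c_prime)


E = {frozenset(p) for p in [("v1", "v2"), ("v1", "v3"), ("v2", "v3"), ("v3", "v4")]}
print(is_clique(["v1", "v2", "v3"], E))                      # True
print(is_clique(["v2", "v3", "v4"], E))                      # False: edge (v2, v4) is missing
print(cliques_isomorphic(["a", "b", "a"], ["b", "a", "a"]))  # True
print(is_proper_subclique(["a", "b"], ["a", "b", "c"]))      # True
```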
Preliminaries
Embedding: if a fully connected subgraph h of a graph G is isomorphic to a clique C, we call h an embedding of C in G.
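For small graphs, the embeddings of a clique can be found by brute force: try every vertex subset of G of size |C| and keep those whose labels match C and whose vertices are pairwise adjacent. The function `clique_embeddings` and the toy graph are hypothetical. An embedding is identified here by its vertex set, since a fully connected subgraph on a given vertex set is unique. The enumeration is exponential and only illustrates the definition; it is not CLAN's enumeration strategy.

```python
from collections import Counter
from itertools import combinations


def clique_embeddings(
    labels: dict[str, str],
    edges: set[frozenset[str]],
    clique_labels: list[str],
) -> list[frozenset[str]]:
    """Vertex sets of all fully connected subgraphs of G isomorphic to clique C.

    labels is F_V of G; clique_labels lists the vertex labels of C.
    Brute force over all |C|-subsets of V, so exponential in |C|.
    """
    target = Counter(clique_labels)
    found = []
    for subset in combinations(sorted(labels), len(clique_labels)):
        # Cheap label filter first, then the full-connectivity test.
        if Counter(labels[v] for v in subset) != target:
            continue
        if all(frozenset((u, v)) in edges for u, v in combinations(subset, 2)):
            found.append(frozenset(subset))
    return found


labels_g = {"v1": "a", "v2": "b", "v3": "a", "v4": "b"}
edges_g = {frozenset(p) for p in [("v1", "v2"), ("v1", "v3"), ("v2", "v3"),
                                  ("v1", "v4"), ("v3", "v4")]}
emb = clique_embeddings(labels_g, edges_g, ["a", "a", "b"])
print(sorted(sorted(s) for s in emb))  # [['v1', 'v2', 'v3'], ['v1', 'v3', 'v4']]
```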