Lecture07-2-Out-of-Core-Cocain: Out-of-Core Coherent Closed Quasi-Clique Mining from Large Dense Graph Databases (Jianyong Wang, Department of Computer Science and Technology)

1 2009/11/19 Out-of-Core Coherent Closed Quasi-Clique Mining from Large Dense Graph Databases
Jianyong Wang
Department of Computer Science and Technology, Tsinghua University, Beijing, P.R. China
Email: [email protected]
Joint work with Zhiping Zeng, Lizhu Zhou, and George Karypis
2 Frequent Coherent Closed Subgraph Mining: Case Studies
- Case Study 1 (an initial solution): Frequent Closed Clique Mining
  - The CLAN algorithm (CLAN stands for frequent closed CLique pAtterN mining)
- Case Study 2 (a more complex solution): Frequent Coherent Closed Quasi-Clique Mining
  - The Cocain algorithm (Cocain stands for COherent Closed quAsi-clIque miNing)
- Case Study 3 (an out-of-core solution): Out-of-Core Coherent Closed Quasi-Clique Mining
  - The Cocain* algorithm
3 Part 1: Closed Clique Mining
4 CLAN: An Algorithm for Mining Closed Cliques from Large Dense Graph Databases
Jianyong Wang¹, Zhiping Zeng², Lizhu Zhou³
{¹jianyong, ³dcszlz}@tsinghua.edu.cn, ²[email protected]
Department of Computer Science and Technology, Tsinghua University, Beijing, P.R. China
Proc. 2006 IEEE Int. Conf. on Data Engineering (ICDE'06)
5 Outline
- Problem definition and motivation
  - Problem definition
  - Motivation
- The CLAN solution
  - Canonical form of a clique
  - Efficient clique enumeration
    - Low-degree vertex pruning
    - Structural redundancy pruning
  - Closed clique discovery
    - Clique closure checking scheme
    - Non-closed prefix pruning
  - Integrated algorithm
- Empirical results
- Conclusions
6 Problem Formulation
7 Preliminaries
- Input database $D$: a set of undirected labeled input graphs.
- Undirected labeled input graph $G = \{V, E, L_V, F_V\}$, where
  - $V$: the set of vertices
  - $E \subseteq V \times V$: the set of edges
  - $L_V$: the set of vertex labels
  - $F_V: V \rightarrow L_V$: the labeling function that assigns a label to each vertex
- Cardinality of graph $G$: $|G| = |V|$
- Note: in this work, we do not consider edge labels.
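As a minimal sketch (in Python, and not the authors' implementation), the tuple G = {V, E, L_V, F_V} can be held as a dictionary for the labeling function F_V plus a set of unordered vertex pairs for E; V, L_V, and |G| are then derived from them. The class name `LabeledGraph` and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledGraph:
    """Hypothetical container for an undirected vertex-labeled graph G = {V, E, L_V, F_V}.

    Edge labels are deliberately not modeled, matching the slide's assumption.
    """

    labels: dict  # F_V: maps each vertex id in V to its label in L_V
    edges: set = field(default_factory=set)  # E: unordered pairs stored as frozensets

    def add_edge(self, u, v):
        # Undirected edge between two distinct, already-labeled vertices.
        if u == v or u not in self.labels or v not in self.labels:
            raise ValueError("edge endpoints must be two distinct existing vertices")
        self.edges.add(frozenset((u, v)))

    def vertices(self):
        return set(self.labels)  # V

    def label_set(self):
        return set(self.labels.values())  # L_V

    def cardinality(self):
        return len(self.labels)  # |G| = |V|

    def adjacent(self, u, v):
        return frozenset((u, v)) in self.edges


# Usage: a path on three vertices labeled a, b, a.
g = LabeledGraph({1: "a", 2: "b", 3: "a"})
g.add_edge(1, 2)
g.add_edge(2, 3)
print(g.cardinality(), sorted(g.label_set()))  # 3 ['a', 'b']
```

Storing each edge as a `frozenset` makes (u, v) and (v, u) the same edge, which is what "undirected" requires. Note that |L_V| can be smaller than |V| because several vertices may share a label.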
8 Preliminaries
- Induced subgraph: an induced subgraph is a subset of the vertices of a graph together with all edges whose endpoints are both in this subset.
- Examples:
  - $V(G_2) = \{v_1, v_2, v_3, v_4, v_5, v_6\}$
  - $E(G_2) = \{(v_1, v_2), \ldots\}$
  - $L_V(G_2) = \{a, b, c, d, e\}$
  - $Card(G_2) = 6$
[Figure: an induced subgraph of G1 over vertices u3, u4, u5 (vertex labels b, c, d)]
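The induced-subgraph definition translates directly into a filter over the edge set. The sketch below assumes a hypothetical representation (a dict from vertex to label and a set of `frozenset` edges), and its toy data is made up rather than taken from the slide's G1 and G2.

```python
def induced_subgraph(labels, edges, subset):
    """Return (labels, edges) of the subgraph induced by `subset`.

    labels: dict vertex -> label; edges: set of frozenset({u, v}).
    Keeps exactly the chosen vertices and every edge whose endpoints are both chosen.
    """
    chosen = set(subset)
    sub_labels = {v: labels[v] for v in chosen}
    sub_edges = {e for e in edges if e <= chosen}  # both endpoints in the subset
    return sub_labels, sub_edges


# Toy graph: triangle 1-2-3 plus a pendant edge 3-4.
labels = {1: "a", 2: "b", 3: "c", 4: "d"}
edges = {frozenset(p) for p in [(1, 2), (2, 3), (1, 3), (3, 4)]}
sub_labels, sub_edges = induced_subgraph(labels, edges, {1, 2, 3})
print(len(sub_labels), len(sub_edges))  # 3 3  (edge 3-4 is dropped)
```

Unlike an arbitrary subgraph, an induced subgraph leaves no choice over its edges: once the vertex subset is fixed, the edge set is fully determined.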
9 Preliminaries
- Clique: a clique $C$ is a fully connected subgraph, i.e., every pair of its vertices is joined by an edge.
- Clique isomorphism: a clique $C_1 = \{V_1, L_1, F_1\}$ is isomorphic to another clique $C_2 = \{V_2, L_2, F_2\}$ iff $|V_1| = |V_2|$ and there exists a bijection $f: V_1 \rightarrow V_2$ such that $\forall v \in V_1,\ F_1(v) = F_2(f(v))$. Because all vertex pairs are adjacent in both cliques, the edges impose no further constraint; only the vertex labels have to match.
- Subclique and superclique: if a clique $C$ is isomorphic to a subgraph of another clique $C'$, $C$ is called a subclique of $C'$, while $C'$ is called a superclique of $C$. We use $C \subseteq C'$ or $C \subset C'$ ($C \subseteq C'$ but $C \neq C'$) to denote the subclique or proper subclique relationship.
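Because every vertex pair in a clique is adjacent, clique isomorphism and the subclique relation can be decided from vertex labels alone: two cliques are isomorphic exactly when their label multisets are equal, and C ⊆ C' exactly when the label multiset of C is contained in that of C'. The sketch below illustrates that observation; the function names are hypothetical and the code is not taken from the CLAN paper.

```python
from collections import Counter
from itertools import combinations


def is_clique(vertices, edges):
    """True if every pair of distinct vertices is joined by an edge (edges: set of frozensets)."""
    return all(frozenset(p) in edges for p in combinations(vertices, 2))


def cliques_isomorphic(labels1, labels2):
    """Isomorphism test for two cliques given as lists of vertex labels.

    A label-preserving bijection f: V1 -> V2 exists iff the label multisets are equal
    (which also forces |V1| = |V2|); adjacency is automatic in a clique.
    """
    return Counter(labels1) == Counter(labels2)


def is_subclique(labels_c, labels_c_prime, proper=False):
    """C is a subclique of C_prime iff each label occurs in C_prime at least as often as in C.

    With proper=True, additionally require that C and C_prime are not isomorphic.
    """
    c, c_prime = Counter(labels_c), Counter(labels_c_prime)
    contained = all(c_prime[label] >= n for label, n in c.items())
    return contained and (not proper or c != c_prime)


print(cliques_isomorphic(["a", "b", "b"], ["b", "a", "b"]))   # True
print(is_subclique(["a", "b"], ["a", "b", "c"], proper=True))  # True
print(is_subclique(["a", "a"], ["a", "b", "c"]))               # False
```

The multiset view only works for cliques; for general graphs, equal label multisets do not imply isomorphism, since the edge structure must also be preserved.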
10 Preliminaries
- Embedding: if a fully connected subgraph $h$ of a graph $G$ is isomorphic to a clique $C$, we call $h$ an embedding of $C$ in $G$.
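To make the notion of an embedding concrete, the brute-force sketch below lists every vertex set h of G that is fully connected and carries the same label multiset as the clique C. Since all pairs in such a set are adjacent, matching the label multiset is enough for h to be isomorphic to C. The search is exponential in |C| and is only an illustration, not the enumeration used by CLAN; the function name and graph representation are hypothetical. Embeddings are reported as distinct vertex sets, so different label-preserving bijections onto the same set are counted once.

```python
from collections import Counter
from itertools import combinations


def clique_embeddings(labels, edges, clique_labels):
    """Brute-force sketch: all embeddings of clique C in graph G.

    labels: dict vertex -> label (F_V of G); edges: set of frozenset pairs (E of G);
    clique_labels: list of C's vertex labels. Each embedding is returned as a frozenset
    of G's vertices that is fully connected and whose labels match C as a multiset.
    """
    target = Counter(clique_labels)
    k = len(clique_labels)
    found = []
    for h in combinations(sorted(labels), k):  # every k-vertex subset of G
        if Counter(labels[v] for v in h) != target:
            continue  # no label-preserving bijection onto C exists
        if all(frozenset(p) in edges for p in combinations(h, 2)):
            found.append(frozenset(h))  # fully connected and label-compatible
    return found


labels = {1: "a", 2: "b", 3: "a", 4: "b"}
edges = {frozenset(p) for p in [(1, 2), (2, 3), (3, 4), (1, 3)]}
print([sorted(h) for h in clique_embeddings(labels, edges, ["a", "b"])])
# [[1, 2], [2, 3], [3, 4]]
print([sorted(h) for h in clique_embeddings(labels, edges, ["a", "a", "b"])])
# [[1, 2, 3]]
```

In this toy graph, the pair {1, 4} has the right labels but is not an embedding of the clique with labels a, b, because vertices 1 and 4 are not adjacent.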