Enabling Rapid Development and Execution of Advanced Graph Analysis Algorithms on Very Large Graphs

Aydin Buluc, LBL ([email protected])
John Gilbert and Adam Lugowski, UCSB ({gilbert,alugowski}@cs.ucsb.edu)
Steve Reinhardt, Microsoft ([email protected])
With ideas from Dave Wecker and Zheng Zhang, Microsoft Research; Jim Harrell, Cray, Inc.; Viral Shah, formerly UCSB

Knowledge Discovery Toolbox (KDT) embodies two key innovations:
• Technically, non-graph-expert subject-matter experts analyze terascale graphs with multiple advanced algorithms with leading performance
• Architecturally, graph-algorithm users, graph-algorithm developers, and graph-infrastructure developers each use complementary interfaces to advance the field

Agenda
• APIs for different audiences
• Semantic and hyper-graphs
• Implementation / performance

KNOWLEDGE DISCOVERY WORKFLOW
1. Cull relevant data
2. Build input graph
3. Analyze input graph
4. Visualize result graph
[Figure: workflow diagram. Data sources: Gene, Email, Twitter, Facebook, Video, Sensor, Web, … Other labels: memory; DryadLINQ, StreamInsight; KDT; ??]

KDT APIs enable disparate groups' work to reinforce each other
• Fosters earlier use and learning about how algorithms work at scale
• Graph-algorithm users develop applications based on a set of complex graph algorithms implemented by experts
• Graph-algorithm developers develop algorithms for a growing set of users through an evolving set of interfaces, based on powerful infrastructure
• Graph-infrastructure developers develop new implementations of the KDT interfaces for different hardware or software platforms
[Figure: KDT software stack. Labels: centrality('exactBC'), centrality('approxBC'), Graph500, pageRank, cluster('Markov'), cluster('spectral'); DiGraph, HyGraph; SpParMat, (Sp)ParVec; CombBLAS]

KDT APIs enable disparate groups' work to reinforce each other
• Graph-algorithm users
• Graph-algorithm developers
• Graph-infrastructure developers

    # Graph500.py
    deg3verts = (G.degree() > 2).findInds()
    deg3verts.randPerm()
    starts = deg3verts[kdt.ParVec.range(nstarts)]
    G.toBool()
    [origI, ign, ign2] = G.toParVec()
    for start in starts:
        parents = G.bfsTree(start, sym=True)
        # -1 marks vertices the search did not reach
        nedges = len((parents[origI] != -1).find())
        if not k2Validate(G, start, parents):
            verifyResult = "FAILED"

KDT APIs enable disparate groups' work to reinforce each other

    […]
    L = G.toSpParMat()
    d = L.sum(kdt.SpParMat.Column)
    L = -L
    L.setDiag(d)
    M = kdt.SpParMat.eye(G.nvert()) - mu*L
    pos = kdt.ParVec.rand(G.nvert())
    for i in range(nsteps):
        pos = M.SpMV(pos)
[Figure: KDT software stack, with cluster('barycent') among the algorithm labels]

KDT APIs enable disparate groups' work to reinforce each other

    # community detection due to Botherel and Bouklit
    import kdtxmt
    […]
    Q = kdt.ParVec.zeros(G.nedge())
    for i in range(G.nedge()):
        bc = kdtxmt.centrality(G, 'approxBC', 'edge')
        G.delete_edge(bc.maxndx()[1])
        p = G.cluster()
        Q[i] = G.modularity(p)
    best = Q.max()

    // SWIG headers for kdtxmt.py
    […]
    INCLUDE "pyCentrality.h"
[Figure: KDT software stack, with MTGL/XMT shown next to CombBLAS]

KDT's Graph API (v0.1)
• Targeted at non-graph-expert domain experts
• Exposed via Python
• Real applications: Community Detection, Network Vulnerability Analysis
• Applets: centrality('exactBC'), centrality('approxBC'), Graph500, pageRank
• Building blocks:
  – DiGraph: bfsTree, isBfsTree, plus utility (e.g., DiGraph, nvert, toParVec, degree, load, UFget, +, *, sum, subgraph, reverseEdges)
  – (Sp)ParVec (e.g., +, *, &, >, ==, abs, max, sum, range, norm, randPerm, scale, topK)
• CombBLAS: SpMV_SemiRing, SpMM_SemiRing

KDT's Graph API (v0.2)
• Real applications: Community Detection, Network Vulnerability Analysis
• Applets: centrality('exactBC'), centrality('approxBC'), Graph500, pageRank, cluster('Markov'), cluster('spectral')
• Building blocks:
  – DiGraph: bfsTree, isBfsTree, plus utility (e.g., DiGraph, nvert, toParVec, degree, load, UFget, +, *, sum, subgraph, reverseEdges)
  – HyGraph: bfsTree, isBfsTree, plus utility (e.g., HyGraph, nvert, toParVec, degree, load, UFget)
  – SpParMat (e.g., +, *, SpMM, SpMV, SpMM_SemiRing, …)
  – (Sp)ParVec (e.g., +, *, &, >, ==, abs, max, sum, range, norm, randPerm, scale, topK)
• CombBLAS: SpMV_SemiRing, SpMM_SemiRing

Semantic-graph API: Multiple Criteria
[Figure: comparison chart. Axes: Customizability, Level of abstraction, Performance. Points: CombBLAS, PBGL, KDT v0.2 goal. Annotations: "- Atypical abstractions, + Sustainably scalable performance"; "- Abstractions low-level for domain experts, + Scalable performance"]

Semantic Graph Use Case
• Vertex types: Person, SmartPhone, Camera
• Edge types: PhoneCall, TextMessage, PhysicalPresence
• Edge StartTime, EndTime
• Calculate betweenness centrality just for PhoneCalls and TextMessages between People occurring between times sTime and eTime

Approach 1: Known Good Performance

    def vfilter(self, wantedVTypes):
        return kdt.in(wantedVTypes, self.type)
    def efilter(self, wantedETypes, sTime, eTime):
        return kdt.and(kdt.in(wantedETypes, self.type),
                       kdt.and(kdt.gt(sTime, self.sTime),
                               kdt.lt(eTime, self.eTime)))
    wantedVTypes = (People,)   # trailing comma makes this a one-element tuple
    wantedETypes = (PhoneCall, TextMessage)
    bc = G.centrality('approxBC', filter=(vfilter, efilter))

Approach 2: Highly Flexible, Currently Bad Performance

    def vfilter(self, wantedVTypes):
        # any Python constructs permitted
        return self.type in wantedVTypes
    def efilter(self, wantedETypes, sTime, eTime):
        return ((self.type in wantedETypes)
                and (sTime > self.sTime)
                and (eTime < self.eTime))
    wantedVTypes = (People,)
    wantedETypes = (PhoneCall, TextMessage)
    bc = G.centrality('approxBC', filter=(vfilter, efilter))

Approach 3: Likely Good Performance, but Potentially Memory-Expensive

    def vfilter(self, wantedVTypes):
        return self.type in wantedVTypes
    def efilter(self, wantedETypes, sTime, eTime):
        return ((self.type in wantedETypes)
                and (sTime > self.sTime)
                and (eTime < self.eTime))
    wantedVTypes = (People,)
    wantedETypes = (PhoneCall, TextMessage)
    Gtmp = G.subgraph(filter=(vfilter, efilter))
    bc = Gtmp.centrality('approxBC')

Hypergraph Support
• The underlying sparse matrix is interpreted as an incidence matrix; vertices are in columns, edges in rows
• (Subset of) same methods implemented
• Graph500 Kernel 2 looks identical except validation
• Performance not yet measured for big cases, but expected to take twice as long as same DiGraph method
  – Two SpMVs in the core loop instead of one
  – TEPS rating the same

bfsTree: DiGraph and HyGraph

DiGraph:

    def bfsTree(self, root, sym=False):
        if not sym:
            self._T()
        parents = pcb.pyDenseParVec(self.nvert(), -1)
        fringe = pcb.pySpParVec(self.nvert())
        parents[root] = root
        fringe[root] = root
        while fringe.getnee() > 0:
            fringe.setNumToInd()
            self._spm.SpMV_SelMax_inplace(fringe)
            pcb.EWiseMult_inplacefirst(fringe, parents, True, -1)
            parents[fringe] = 0
            parents += fringe
        if not sym:
            self._T()
        return ParVec.toParVec(parents)

HyGraph:

    def bfsTree(self, root):
        parents = pcb.pyDenseParVec(self.nvert(), -1)
        fringeV = pcb.pySpParVec(self.nvert())
        parents[root] = root
        fringeV[root] = root
        while fringeV.getnee() > 0:
            fringeV.setNumToInd()
            fringeE = self._spm.SpMV_SelMax(fringeV)
            fringeV = self._spmT.SpMV_SelMax(fringeE)
            pcb.EWiseMult_inplacefirst(fringeV, parents, True, -1)
            parents[fringeV] = 0
            parents += fringeV
        return ParVec.toParVec(parents)

Questions about Hypergraph Support
• We have defined a BFS tree of a hypergraph as a set of simple edges, each contained in a hyperedge (which permits cycles of hyperedges). Is this the most useful definition?
• Are hypergraphs in the KDT style useful? What use cases should we target? What methods should we provide?
[Figure label: Root]
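The two-step fringe expansion in the HyGraph bfsTree can be illustrated without KDT. The sketch below is plain Python, not KDT code: the hypergraph is a dict from hyperedge id to its vertex list (a stand-in for the incidence matrix), and `hyper_bfs_parents` is a hypothetical helper name. Each level expands vertices to hyperedges and then hyperedges back to vertices, keeping the largest candidate id at each step, which mirrors the two SelMax multiplications. The `seen_edges` set is an added shortcut that the slide code does not have.

```python
def hyper_bfs_parents(hyperedges, nvert, root):
    """BFS over a hypergraph given as {edge_id: [vertex, ...]}.

    Returns a list where parents[v] is the vertex from which v was first
    reached, parents[root] == root, and -1 marks unreached vertices.
    """
    # Vertex -> incident hyperedges (the transposed incidence structure).
    incident = {v: [] for v in range(nvert)}
    for e, verts in hyperedges.items():
        for v in verts:
            incident[v].append(e)

    parents = [-1] * nvert
    parents[root] = root
    fringe_v = {root}
    seen_edges = set()  # assumption: skip hyperedges already expanded
    while fringe_v:
        # Step 1: vertices -> hyperedges, keeping the max source vertex id.
        fringe_e = {}
        for v in fringe_v:
            for e in incident[v]:
                if e not in seen_edges:
                    fringe_e[e] = max(fringe_e.get(e, -1), v)
        seen_edges.update(fringe_e)
        # Step 2: hyperedges -> unvisited vertices, again keeping the max.
        candidates = {}
        for e, p in fringe_e.items():
            for v in hyperedges[e]:
                if parents[v] == -1:
                    candidates[v] = max(candidates.get(v, -1), p)
        for v, p in candidates.items():
            parents[v] = p
        fringe_v = set(candidates)
    return parents


example = {0: [0, 1, 2], 1: [2, 3], 2: [4]}
result = hyper_bfs_parents(example, 5, 0)
```

In this example vertex 3 is reached through hyperedge 1 from vertex 2, and vertex 4 sits in a hyperedge that is never reached, so it keeps parent -1.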
Key DiGraph Methods in KDT v0.1/v0.2

    def pageRank(self, epsilon=0.1, dampingFactor=0.85):
    def centrality(self, alg, **kwargs):
        'exactBC', normalize=True
        'approxBC', sample=0.05, normalize=True
    def cluster(self, alg, **kwargs):
        'Markov'
        'spectral'

    class Graph:  # base class only
    class DiGraph:
    class ParVec:
    class SpParVec:
    class SpParMat:

    def bfsTree(self, root, sym=False):
    def isBfsTree(self, root, parents, sym=False):
    def neighbors(self, source, nhop=1, sym=False):
    def pathsHop(self, source, sym=False):
    def degree(self, dir=gr.Out):
    def genGraph500Edges(self, scale):
    def load(fname):
    def UFget(fname):
    def max(self, dir):
    def reverseEdges(self):
    def scale(self, other, dir=gr.Out):
    def sum(self, dir):
    def DiGraph(sourceV, destV, weight, nvert):
    def toParVec(self):
    def toBool(self):
    def normalizeEdgeWeights(self):

    def sendFeedback():  # may want to disable this

Key HyGraph Methods in KDT v0.2

    def pageRank(self, epsilon=0.1, dampingFactor=0.85):
    def centrality(self, alg, **kwargs):
        'exactBC', normalize=True
        'approxBC', sample=0.05, normalize=True
    def cluster(self, alg, **kwargs):
    def bfsTree(self, root, sym=False):
    def isBfsTree(self, root, parents):
    def neighbors(self, source, nhop=1):
    def pathsHop(self, source):
    def degree(self, dir=gr.Out):
    def genGraph500Edges(self, scale):
    def load(fname):
    def UFget(fname):
    def max(self, dir):
    def invertEdgesVertices(self):
    def scale(self, other, dir=gr.Out):
    def sum(self, dir):
    def HyGraph(edgeNumV, incidentVertexV, weightV, nvert):
    def toParVec(self):
    def toBool(self):
    def toDiGraph(self):
    def normalizeEdgeWeights(self):

Graph500 Performance [Aydin Buluc]
• LBL/NERSC's Hopper Cray XE6
• Scale 29 ("mini") has 8B directed edges
• Performance measured from Python
• Excellent scaling up to 2,500 cores, good to 5K cores
• On-node thread parallelism starts to show benefit at 10K cores and above
[Figure: Graph500 performance in GTEPS (y-axis, 0 to 10) versus number of cores (1225, 2500, 5041). Series: scale 28, scale 29, scale 30, perfect]

KDT development and licensing
• KDT is a collaboration among UCSB (John Gilbert et al), LBL (Aydin Buluc), and Microsoft Technical Computing
• The resulting software is released under the New BSD license
• v0.1 was released on March 17
• Tested on Linux x86 and Cray XT configurations
• v0.2 release targeted for early June
• The project homepage is kdt.sourceforge.net
• Downloads, User Guide, FAQ and bug reporting

Planned KDT v0.2 Content
• Windows HPC Server version
• Semantic graphs
• Hypergraphs
• Clustering: Markov and spectral
• Out-of-core (Dryad-based) version (likely v0.3)
• Cray XMT version
  – Discussing with Cray et al.
• Version based on other graph infrastructures
  – E.g., Parallel Boost Graph Library, SNAP, MultiThreaded Graph Library

Knowledge Discovery Toolbox (KDT) embodies two key innovations:
• Technically, non-graph-expert subject-matter experts analyze terascale graphs with multiple advanced algorithms with leading performance
• Architecturally, graph-algorithm users, graph-algorithm developers, and graph-infrastructure developers each use complementary interfaces to advance the field

Backup

Graphs-on-Disk Use Case
Does graph analysis make sense on data that won't all fit in memory?
• The sparse-matrix-linear-algebra approach structures communication, so raw pointer-chasing performance is not so important
• People are building sparse-matrix packages on top of MapReduce/Hadoop
• We will shortly map the KDT APIs onto a sparse-matrix package based on Dryad*
• Interface perhaps:

    import kdtooc
    […]
    G = kdtooc.load('mydata')
    G.bfsTree(…)

[Figure labels: memory, KDT]
* http://research.microsoft.com/en-us/projects/Dryad/

Questions about KDT-on-disk Support
• Assuming that in-memory processing is much faster than on-disk (10X?), what type of graph ops would be practical for on-disk data? Just simple ops? Would something as compute-intensive as BC ever make sense out-of-core?
• Is semantic graph's filtering capability essential for on-disk processing?

KDT Implementation on Combinatorial BLAS
• Combinatorial BLAS
  – Built for combinatorial (sparse-matrix) problems
  – Not limited to simple directed graphs
  – Powers the functionality and performance of KDT
  – Scales well to 2K-4K cores
• Real applications: Network Vulnerability Analysis, Community Detection
• Applets: centrality('exactBC'), centrality('approxBC'), Graph500, pageRank
• Building blocks:
  – bfsTree, isBfsTree, neighbors, pathsHop
  – DiGraph utility (e.g., DiGraph (from edges), nverts, degrees, +, *, toParVec, subgraph, reverseEdges, load)
  – ParVec/SpParVec utility (e.g., +, -, *, &, >, ==, abs, range, max, sum, norm, randPerm, topK)
• SpMV_SemiRing, SpMM_SemiRing
• Sparse-matrix classes/ops/types (e.g., Apply, EWiseApply, Reduce)

Example Implementation: bfsTree
[Figure: five frames of a breadth-first search on a graph with vertices 1 to 7. Each frame shows the matrix A^T with "from" and "to" axes; later frames add the fringe vector X and the product A^T X]

bfsTree Implementation in KDT, for DiGraphs (Kernel 2 of Graph500)

    def bfsTree(self, root, sym=False):
        if not sym:
            self.T()  # synonym for reverseEdges
        parents = dg.ParVec(self.nvert(), -1)
        fringe = dg.SpParVec(self.nvert())
        parents[root] = root
        fringe[root] = root
        while fringe.nnn() > 0:
            fringe.spRange()
            self._spm.SpMV_SelMax_inplace(fringe._spv)
            pcb.EWiseMult_inplacefirst(fringe._spv,
                                       parents._dpv, True, -1)
            parents[fringe] = fringe
        if not sym:
            self.T()
        return parents

• SpMV and EWiseMult are CombBLAS ops that do not yet have good graph abstractions
  – pathsHop is an attempt for one flavor of SpMV
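To make the loop concrete without KDT or CombBLAS, here is a minimal pure-Python sketch of the same idea: breadth-first search as a repeated sparse matrix-vector product in which "multiply" passes along the source vertex id and "add" takes the maximum. The names `spmv_select_max` and `bfs_parents` and the dict-of-lists graph are assumptions made for illustration; they are not part of KDT.

```python
def spmv_select_max(adj, fringe):
    """One fringe expansion over a (select, max) semiring.

    adj maps a source vertex to its list of destinations; fringe maps a
    vertex to the value it carries. Each destination receives the largest
    value among the fringe vertices that point to it.
    """
    out = {}
    for src, value in fringe.items():
        for dst in adj.get(src, []):
            if dst not in out or value > out[dst]:
                out[dst] = value
    return out


def bfs_parents(adj, nvert, root):
    """Return parents[v] for a BFS from root; -1 marks unreached vertices."""
    parents = [-1] * nvert
    parents[root] = root
    fringe = {root: root}
    while fringe:
        # Each fringe vertex carries its own id, as spRange() arranges.
        fringe = {v: v for v in fringe}
        reached = spmv_select_max(adj, fringe)
        # Keep only vertices without a parent yet, the role EWiseMult plays.
        fringe = {v: p for v, p in reached.items() if parents[v] == -1}
        for v, p in fringe.items():
            parents[v] = p
    return parents


diamond = {0: [1, 2], 1: [3], 2: [3], 3: []}
tree = bfs_parents(diamond, 4, 0)
```

Vertex 3 is reachable from both 1 and 2 at the same level; taking the maximum picks 2 as its parent, so the choice is deterministic.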
pageRank Implementation in KDT (p. 1 of 2)

    def pageRank(self, epsilon = 0.1, dampingFactor = 0.85):
        # We don't want to modify the user's graph.
        G = self.copy()
        nvert = G.nvert()
        G._spm.removeSelfLoops()
        # Handle sink nodes (nodes with no outgoing edges)
        # connecting them to all other nodes.
        degout = G.degree(gr.Out)
        nonSinkNodes = degout.findInds()
        nSinkNodes = nvert - len(nonSinkNodes)
        iInd = ParVec(nSinkNodes*(nvert))
        jInd = ParVec(nSinkNodes*(nvert))
        wInd = ParVec(nSinkNodes*(nvert), 1)
        sinkSuppInd = 0
        for ind in range(nvert):
            if degout[ind] == 0:
                # Connect to all nodes.
                for sInd in range(nvert):
                    iInd[sinkSuppInd] = sInd
                    jInd[sinkSuppInd] = ind
                    sinkSuppInd = sinkSuppInd + 1
        sinkMat = pcb.pySpParMat(nvert, nvert,
                                 iInd._dpv, jInd._dpv,
                                 wInd._dpv)
        sinkG = DiGraph()
        sinkG._spm = sinkMat

• This portion looks more like graph operations by …

pageRank Implementation in KDT (p. 2 of 2) (main loop)

        G.normalizeEdgeWeights()
        sinkG.normalizeEdgeWeights()
        # PageRank loop
        delta = 1
        dv1 = ParVec(nvert, 1./nvert)
        v1 = dv1.toSpParVec()
        prevV = SpParVec(nvert)
        dampingVec = SpParVec.ones(nvert) * ((1 - dampingFactor)/nvert)
        while delta > epsilon:
            prevV = v1.copy()
            v2 = G._spm.SpMV_PlusTimes(v1._spv) + \
                 sinkG._spm.SpMV_PlusTimes(v1._spv)
            v1._spv = v2
            v1 = v1*dampingFactor + dampingVec
            # L1 norm of the change between iterations
            delta = (v1 - prevV)._spv.Reduce(pcb.plus(),
                                             pcb.abs())
        return v1

• This portion looks much more like matrix algebra
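The same iteration can be written in a few lines of plain Python, which may help when reading the KDT version. This is a sketch under stated assumptions, not the KDT implementation: the graph is a dict from vertex to its list of out-neighbors with no self-loops, `pagerank` is a hypothetical name, and the default tolerance is tighter than the slide's 0.1. As in the code above, a sink vertex is treated as linking to every vertex, and the loop stops when the L1 change falls to the tolerance.

```python
def pagerank(out_edges, nvert, epsilon=1e-6, damping=0.85):
    """Power iteration with uniform redistribution of sink-vertex rank."""
    rank = [1.0 / nvert] * nvert
    delta = 1.0
    while delta > epsilon:
        # Rank held by sinks is spread evenly over all vertices.
        sink_mass = sum(rank[v] for v in range(nvert) if not out_edges.get(v))
        new = [sink_mass / nvert] * nvert
        # Every other vertex splits its rank evenly among its out-neighbors.
        for src, dsts in out_edges.items():
            if dsts:
                share = rank[src] / len(dsts)
                for dst in dsts:
                    new[dst] += share
        # Damping step: v = d * (M v) + (1 - d) / n.
        new = [damping * x + (1.0 - damping) / nvert for x in new]
        delta = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
    return rank


cycle_ranks = pagerank({0: [1], 1: [2], 2: [0]}, 3)
sink_ranks = pagerank({0: [1], 1: []}, 2)
```

On the three-vertex cycle every vertex ends with the same rank; in the second graph vertex 1 is a sink that also receives vertex 0's link, so it ends with the larger share while the ranks still sum to one.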
Graph500 Implementation in KDT (p. 1 of 2)

    import time

    scale = 15
    nstarts = 640
    GRAPH500 = 1
    if GRAPH500 == 1:
        G = dg.DiGraph()
        K1elapsed = G.genGraph500Edges(scale)
        if nstarts > G.nvert():
            nstarts = G.nvert()
        deg3verts = (G.degree() > 2).findInds()
        deg3verts.randPerm()
        starts = deg3verts[dg.ParVec.range(nstarts)]
    G.toBool()
    K2elapsed = 1e-12
    K2edges = 0
    for start in starts:
        start = int(start)
        if start==0:
            #HACK: avoid root==0 bugs for now
            continue
        before = time.time()
        parents = G.bfsTree(start, sym=True)
        K2elapsed += time.time() - before
        if not k2Validate(G, start, parents):
            print "Invalid BFS tree generated by bfsTree"
            print G, parents
            break
        [origI, origJ, ign] = G.toParVec()
        K2edges += len((parents[origI] != -1).find())

Graph500 Implementation in KDT (p. 2 of 2)

    def k2Validate(G, start, parents):
        ret = True
        bfsRet = G.isBfsTree(start, parents)
        # isBfsTree returns a tuple on success; test its result, not ret
        if type(bfsRet) != tuple:
            if dg.master():
                print "isBfsTree detected failure of Graph500 test %d" % abs(bfsRet)
            return False
        (valid, levels) = bfsRet
        # Spec test #3:
        [origI, origJ, ign] = G.toParVec()
        li = levels[origI]
        lj = levels[origJ]
        if not ((abs(li-lj) <= 1) | ((li==-1) & (lj==-1))).all():
            if dg.master():
                print "At least one graph edge has endpoints whose levels differ by more than one and is in the BFS tree"
                print li, lj
            ret = False
        # Spec test #4:
        neither_in = (li == -1) & (lj == -1)
        both_in = (li > -1) & (lj > -1)
        out2root = (li == -1) & (origJ == start)
        if not (neither_in | both_in | out2root).all():
            if dg.master():
                print "The tree does not span the connected component exactly, root=%d" % start
            ret = False
        # Spec test #5:
        respects = abs(li-lj) <= 1
        if not (neither_in | respects).all():
            if dg.master():
                print "At least one vertex and its parent are not joined by an original edge"
            ret = False
        return ret
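The element-wise checks in k2Validate can also be read one edge at a time. The sketch below restates spec tests 3 to 5 in plain Python over ordinary lists. It is illustrative only: `edges_respect_levels` is a hypothetical name, `levels` is assumed to hold -1 for unreached vertices as in the code above, and the isBfsTree call and the per-process printing are left out.

```python
def edges_respect_levels(edges, levels, root):
    """Return True if every edge passes the level checks.

    edges  -- list of (i, j) vertex pairs from the original graph
    levels -- levels[v] is the BFS depth of v, or -1 if v was not reached
    root   -- the BFS start vertex
    """
    ok = True
    for i, j in edges:
        li, lj = levels[i], levels[j]
        neither_in = li == -1 and lj == -1
        both_in = li > -1 and lj > -1
        out2root = li == -1 and j == root
        # Tests 3 and 5: endpoint levels differ by at most one,
        # unless neither endpoint is in the tree.
        if not (neither_in or abs(li - lj) <= 1):
            ok = False
        # Test 4: both endpoints are in the tree, or neither is
        # (an unreached source pointing at the root is also allowed).
        if not (neither_in or both_in or out2root):
            ok = False
    return ok


triangle = [(0, 1), (1, 2), (0, 2)]
good = edges_respect_levels(triangle, [0, 1, 1], 0)
bad = edges_respect_levels([(0, 1), (1, 2)], [0, 1, 3], 0)
```

The first call passes because every edge joins vertices at most one level apart; the second fails because edge (1, 2) jumps from level 1 to level 3.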