Reinhardt_KDT_11apr26



Enabling Rapid Development and Execution of Advanced Graph-Analysis Algorithms on Very Large Graphs

Aydin Buluc, LBL ([email protected])
John Gilbert and Adam Lugowski, UCSB ({gilbert,[email protected])
Steve Reinhardt, Microsoft ([email protected])

With ideas from
Dave Wecker and Zheng Zhang, Microsoft Research
Jim Harrell, Cray, Inc.
Viral Shah, formerly UCSB

Technically / Architecturally

Knowledge Discovery Toolbox (KDT) embodies two key innovations:
- Technically, non-graph-expert subject-matter experts analyze terascale graphs with multiple advanced algorithms with leading performance
- Architecturally, graph algorithm users, graph algorithm developers, and graph infrastructure developers each use complementary interfaces to advance the field

Agenda
•  APIs for different audiences
•  Semantic and hyper-graphs
•  Implementation / performance

KNOWLEDGE DISCOVERY WORKFLOW
1. Cull relevant data
2. Build input graph
3. Analyze input graph
4. Visualize result graph

[Diagram labels: memory; Gene, Email, Twitter, Facebook, Video, Sensor, Web, …; DryadLINQ, StreamInsight; KDT; ??]
Agenda
•  APIs for different audiences
•  Semantic and hyper-graphs
•  Implementation / performance

KDT APIs enable disparate groups’ work to reinforce each other

•  Fosters earlier use and learning about how algorithms work at scale
•  Graph-algorithm users develop applications based on a set of complex graph algorithms implemented by experts
•  Graph-algorithm developers develop algorithms for a growing set of users through an evolving set of interfaces, based on powerful infrastructure
•  Graph-infrastructure developers develop new implementations of the KDT interfaces for different hardware or software platforms

[Diagram labels: centrality('exactBC'), centrality('approxBC'), Graph500, pageRank, cluster('Markov'), cluster('spectral'); DiGraph, HyGraph, SpParMat, (Sp)ParVec; CombBLAS]

KDT APIs enable disparate groups’ work to reinforce each other

    # Graph500.py
    deg3verts = (G.degree() > 2).findInds()
    deg3verts.randPerm()
    starts = deg3verts[kdt.ParVec.range(nstarts)]
    G.toBool()
    [origI, ign, ign2] = G.toParVec()
    for start in starts:
        parents = G.bfsTree(start, sym=True)
        nedges = len((parents[origI] != -1).find())
        if not k2Validate(G, start, parents):
            verifyResult = "FAILED"

KDT APIs enable disparate groups’ work to reinforce each other

[Diagram adds the label cluster('barycent')]

    […]
    L = G.toSpParMat()
    d = L.sum(kdt.SpParMat.Column)
    L = -L
    L.setDiag(d)
    M = kdt.SpParMat.eye(G.nvert()) - mu*L
    pos = kdt.ParVec.rand(G.nvert())
    for i in range(nsteps):
        pos = M.SpMV(pos)

KDT APIs enable disparate
groups’ work to reinforce each other

[Diagram adds the label MTGL/XMT]

    # community detection due to Botherel and Bouklit
    import kdtxmt
    […]
    Q = kdt.ParVec.zeros(G.nedge())
    for i in range(G.nedge()):
        bc = kdtxmt.centrality(G, 'approxBC', 'edge')
        G.delete_edge(bc.maxndx()[1])
        p = G.cluster()
        Q[i] = G.modularity(p)
    best = Q.max()

    // SWIG headers for kdtxmt.py
    […]
    INCLUDE "pyCentrality.h"

KDT’s Graph API (v0.1)

•  Targeted at non-graph-expert domain experts
•  Exposed via Python

Real applications: Community Detection, Network Vulnerability Analysis
Applets: centrality('exactBC'), centrality('approxBC'), Graph500, pageRank
Building blocks:
–  DiGraph: bfsTree, isBfsTree plus utility (e.g., DiGraph, nvert, toParVec, degree, load, UFget, +, *, sum, subgraph, reverseEdges)
–  (Sp)ParVec (e.g., +, *, |, &, >, ==,, abs, max, sum, range, norm, randPerm, scale, topK)
–  CombBLAS: SpMV_SemiRing, SpMM_SemiRing

KDT’s Graph API (v0.2)

Real applications: Community Detection, Network Vulnerability Analysis
Applets: centrality('exactBC'), centrality('approxBC'), Graph500, pageRank, cluster('Markov'), cluster('spectral')
Building blocks:
–  DiGraph: bfsTree, isBfsTree plus utility (e.g., DiGraph, nvert, toParVec, degree, load, UFget, +, *, sum, subgraph, reverseEdges)
–  HyGraph: bfsTree, isBfsTree plus utility (e.g., HyGraph, nvert, toParVec, degree, load, UFget)
–  SpParMat (e.g., +, *, SpMM, SpMV, SpMM_SemiRing, …)
–  (Sp)ParVec (e.g., +, *, |, &, >, ==,, abs, max, sum, range, norm, randPerm, scale, topK)
–  CombBLAS: SpMV_SemiRing, SpMM_SemiRing

Agenda
•  APIs for different audiences
•  Semantic and hyper-graphs
•  Implementation / performance

Semantic-graph API: Multiple Criteria

Criteria: Customizability, Level of abstraction,
Performance

[Diagram labels: CombBLAS; PBGL; KDT v0.2 goal. Annotations: "- Atypical abstractions, + Sustainably scalable performance"; "- Abstractions low-level for domain experts, + Scalable performance"]

Semantic Graph Use Case

•  Vertex types: Person, SmartPhone, Camera
•  Edge types: PhoneCall, TextMessage, PhysicalPresence
•  Edge attributes: StartTime, EndTime
•  Calculate betweenness centrality just for PhoneCalls and TextMessages between People occurring between times sTime and eTime

Approach 1: Known Good Performance

    def vfilter(self, wantedVTypes):
        return kdt.in(wantedVTypes, self.type)

    def efilter(self, wantedETypes, sTime, eTime):
        return kdt.and(kdt.in(wantedETypes, self.type),
                       kdt.and(kdt.gt(sTime, self.sTime),
                               kdt.lt(eTime, self.eTime)))

    wantedVTypes = (People,)
    wantedETypes = (PhoneCall, TextMessage)
    bc = G.centrality('approxBC', filter=(vfilter, efilter))

Approach 2: Highly Flexible, Currently Bad Performance

    def vfilter(self, wantedVTypes):
        # any Python constructs permitted
        return self.type in wantedVTypes

    def efilter(self, wantedETypes, sTime, eTime):
        return (self.type in wantedETypes) and (sTime > self.sTime) and (eTime < self.eTime)

    wantedVTypes = (People,)
    wantedETypes = (PhoneCall, TextMessage)
    bc = G.centrality('approxBC', filter=(vfilter, efilter))

Approach 3: Likely Good Performance, but Potentially Memory-Expensive

    def vfilter(self, wantedVTypes):
        return self.type in wantedVTypes

    def efilter(self, wantedETypes, sTime, eTime):
        return (self.type in wantedETypes) and (sTime > self.sTime) and (eTime < self.eTime)

    wantedVTypes = (People,)
    wantedETypes = (PhoneCall, TextMessage)
    Gtmp = G.subgraph(filter=(vfilter, efilter))
    bc = Gtmp.centrality('approxBC')

Hypergraph Support

•  The underlying sparse matrix is interpreted as an incidence matrix; vertices are in columns, edges in rows
•  (Subset of) same methods implemented
•  Graph500 Kernel 2 looks identical except
validation
•  Performance not yet measured for big cases, but expected to take twice as long as same DiGraph method
   –  Two SpMVs in the core loop instead of one
   –  TEPS rating the same

bfsTree: DiGraph and HyGraph

DiGraph:

    def bfsTree(self, root, sym=False):
        if not sym:
            self._T()
        parents = pcb.pyDenseParVec(self.nvert(), -1)
        fringe = pcb.pySpParVec(self.nvert())
        parents[root] = root
        fringe[root] = root
        while fringe.getnee() > 0:
            fringe.setNumToInd()
            self._spm.SpMV_SelMax_inplace(fringe)
            pcb.EWiseMult_inplacefirst(fringe, parents, True, -1)
            parents[fringe] = 0
            parents += fringe
        if not sym:
            self._T()
        return ParVec.toParVec(parents)

HyGraph:

    def bfsTree(self, root):
        parents = pcb.pyDenseParVec(self.nvert(), -1)
        fringeV = pcb.pySpParVec(self.nvert())
        parents[root] = root
        fringeV[root] = root
        while fringeV.getnee() > 0:
            fringeV.setNumToInd()
            fringeE = self._spm.SpMV_SelMax(fringeV)
            fringeV = self._spmT.SpMV_SelMax(fringeE)
            pcb.EWiseMult_inplacefirst(fringeV, parents, True, -1)
            parents[fringeV] = 0
            parents += fringeV
        return ParVec.toParVec(parents)

Questions about Hypergraph Support

•  We have defined a BFS tree of a hypergraph as a set of simple edges, each contained in a hyperedge (which permits cycles of hyperedges). Is this the most useful definition?
•  Are hypergraphs in the KDT style useful? What use cases should we target? What methods should we provide?
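
The two-SpMV core loop of the HyGraph bfsTree can be sketched without KDT or CombBLAS. The pure-Python version below is only an illustration under stated assumptions: the function name hypergraph_bfs_tree, the list-of-lists hyperedge input, and the dictionaries standing in for sparse vectors are all hypothetical and are not part of KDT. It mimics the select-max behaviour suggested by SpMV_SelMax: each hyperedge picks the largest frontier vertex it contains, then each vertex picks the largest candidate offered by any hyperedge containing it.

```python
def hypergraph_bfs_tree(hyperedges, nvert, root):
    """Return a parents list; parents[v] == -1 means v is unreachable.

    hyperedges: list of lists of vertex ids (rows of an incidence matrix).
    This is a hypothetical stand-in, not KDT's implementation.
    """
    parents = [-1] * nvert
    parents[root] = root
    fringe_v = {root: root}  # sparse vector: frontier vertex -> its own index
    while fringe_v:
        # Step 1 (vertices -> hyperedges): each hyperedge keeps the largest
        # frontier vertex it contains, like a select-max semiring SpMV.
        fringe_e = {}
        for e, members in enumerate(hyperedges):
            hits = [v for v in members if v in fringe_v]
            if hits:
                fringe_e[e] = max(hits)
        # Step 2 (hyperedges -> vertices): each vertex keeps the largest
        # candidate parent offered by any hyperedge that contains it.
        candidates = {}
        for e, parent in fringe_e.items():
            for v in hyperedges[e]:
                if candidates.get(v, -1) < parent:
                    candidates[v] = parent
        # Keep only undiscovered vertices and record their parents.
        fringe_v = {}
        for v, parent in candidates.items():
            if parents[v] == -1:
                parents[v] = parent
                fringe_v[v] = v
    return parents


if __name__ == "__main__":
    # Three hyperedges over six vertices; vertices 4 and 5 are disconnected.
    print(hypergraph_bfs_tree([[0, 1, 2], [2, 3], [4, 5]], 6, 0))
```

Each pass of the while loop does two sweeps (vertices to hyperedges, then hyperedges to vertices), which is the reason the slide expects roughly twice the DiGraph cost per BFS level.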
Agenda
•  APIs for different audiences
•  Semantic and hyper-graphs
•  Implementation / performance

Key DiGraph Methods in KDT v0.1/v0.2

    class Graph:    # base class only
    class DiGraph:
    class ParVec:
    class SpParVec:
    class SpParMat:

    def pageRank(self, epsilon=0.1, dampingFactor=0.85):
    def centrality(self, alg, **kwargs):
        'exactBC', normalize=True
        'approxBC', sample=0.05, normalize=True
    def cluster(self, alg, **kwargs):
        'Markov'
        'spectral'

    def bfsTree(self, root, sym=False):
    def isBfsTree(self, root, parents, sym=False):
    def neighbors(self, source, nhop=1, sym=False):
    def pathsHop(self, source, sym=False):

    def degree(self, dir=gr.Out):
    def genGraph500Edges(self, scale):
    def load(fname):
    def UFget(fname):
    def max(self, dir):
    def reverseEdges(self):
    def scale(self, other, dir=gr.Out):
    def sum(self, dir):
    def DiGraph(sourceV, destV, weight, nvert):
    def toParVec(self):
    def toBool(self):
    def normalizeEdgeWeights(self):
    def sendFeedback():    # may want to disable this

Key HyGraph Methods in KDT v0.2

    def pageRank(self, epsilon=0.1, dampingFactor=0.85):
    def centrality(self, alg, **kwargs):
        'exactBC', normalize=True
        'approxBC', sample=0.05, normalize=True
    def cluster(self, alg, **kwargs):

    def bfsTree(self, root, sym=False):
    def isBfsTree(self, root, parents):
    def neighbors(self, source, nhop=1):
    def pathsHop(self, source):

    def degree(self, dir=gr.Out):
    def genGraph500Edges(self, scale):
    def load(fname):
    def UFget(fname):
    def max(self, dir):
    def invertEdgesVertices(self):
    def scale(self, other, dir=gr.Out):
    def sum(self, dir):
    def HyGraph(edgeNumV, incidentVertexV, weightV, nvert):
    def toParVec(self):
    def toBool(self):
    def toDiGraph(self):
    def normalizeEdgeWeights(self):

Graph500 Performance [Aydin Buluc]

•  LBL/NERSC’s Hopper Cray XE6
•  Scale 29 (“mini”) has 8B directed edges
•  Performance measured from Python
•  Excellent scaling up to 2,500 cores, good to 5K cores
•  On-node thread parallelism starts to show benefit at 10K cores and above

[Figure: GTEPS (y-axis, 0 to 10) versus number of cores (1225, 2500, 5041), with series for scale 28, scale 29, scale 30, and perfect scaling]

KDT development and licensing

•  KDT is a collaboration among UCSB (John Gilbert et al.), LBL (Aydin Buluc), and Microsoft Technical Computing
•  The resulting software is released under the New BSD license
•  v0.1 was released on March 17
   –  Tested on Linux x86 and Cray XT configurations
•  v0.2 release targeted for early June
•  The project homepage is kdt.sourceforge.net
   –  Downloads, User Guide, FAQ and bug reporting

Planned KDT v0.2 Content

•  Windows HPC Server version
•  Semantic graphs
•  Hypergraphs
•  Clustering - Markov and spectral
•  Out-of-core (Dryad-based) version (likely v0.3)
•  Cray XMT version
   –  Discussing with Cray et al.
•  Version based on other graph infrastructures
   –  E.g., Parallel Boost Graph Library, SNAP, MultiThreaded Graph Library

Knowledge Discovery Toolbox (KDT) embodies two key innovations:
- Technically, non-graph-expert subject-matter experts analyze terascale graphs with multiple advanced algorithms with leading performance
- Architecturally, graph algorithm users, graph algorithm developers, and graph infrastructure developers each use complementary interfaces to advance the field

Backup

Graphs-on-Disk Use Case

[Diagram labels: memory; KDT]

Does graph analysis make sense on data that won’t all fit in memory?
•  The sparse-matrix-linear-algebra approach structures communication, so raw pointer-chasing performance is not so important
•  People are building sparse-matrix packages on top of MapReduce/Hadoop
•  We will shortly map the KDT APIs onto a sparse-matrix package based on Dryad*
•  Interface perhaps

    import kdtooc
    […]
    G = kdtooc.load('mydata')
    G.bfsTree(…)

*http://research.microsoft.com/en-us/projects/Dryad/

Questions about KDT-on-disk Support

•  Assuming that in-memory processing is much faster than on-disk (10X?), what type of graph ops would be practical for on-disk data? Just simple ops? Would something as compute-intensive as BC ever make sense out-of-core?
•  Is semantic graph’s filtering capability essential for on-disk processing?

KDT Implementation on Combinatorial BLAS

•  Combinatorial BLAS
   –  Built for combinatorial (sparse-matrix) problems
   –  Not limited to simple directed graphs
   –  Powers the functionality and performance of KDT
   –  Scales well to 2K-4K cores

Real applications: Network Vulnerability Analysis, Community Detection
Applets: centrality('exactBC'), centrality('approxBC'), Graph500, pageRank
Building blocks:
–  bfsTree, isBfsTree, neighbors, pathsHop
–  DiGraph utility (e.g., DiGraph (from edges), nverts, degrees, +, *, toParVec, subgraph, reverseEdges, load)
–  ParVec/SpParVec utility (e.g., +, -, *, |, &, >, ==,, abs, range, max, sum, norm, randPerm, topK)
–  SpMV_SemiRing, SpMM_SemiRing
–  Sparse-matrix classes/ops/types (e.g., Apply, EWiseApply, Reduce)

Example Implementation: bfsTree

[Figure, five slides: an example graph on vertices 1 to 7; the transposed adjacency matrix AT with axes "from" and "to" (1 to 7); the vector X and the product ATX]

bfsTree Implementation in KDT, for DiGraphs (Kernel 2 of Graph500)
    def bfsTree(self, root, sym=False):
        if not sym:
            self.T()    # synonym for reverseEdges
        parents = dg.ParVec(self.nvert(), -1)
        fringe = dg.SpParVec(self.nvert())
        parents[root] = root
        fringe[root] = root
        while fringe.nnn() > 0:
            fringe.spRange()
            self._spm.SpMV_SelMax_inplace(fringe._spv)
            pcb.EWiseMult_inplacefirst(fringe._spv, parents._dpv, True, -1)
            parents[fringe] = fringe
        if not sym:
            self.T()
        return parents

•  SpMV and EWiseMult are CombBLAS ops that do not yet have good graph abstractions
   –  pathsHop is an attempt for one flavor of SpMV

pageRank Implementation in KDT (p. 1 of 2)

    def pageRank(self, epsilon = 0.1, dampingFactor = 0.85):
        # We don't want to modify the user's graph.
        G = self.copy()
        nvert = G.nvert()
        G._spm.removeSelfLoops()
        # Handle sink nodes (nodes with no outgoing edges)
        # connecting them to all other nodes.
        degout = G.degree(gr.Out)
        nonSinkNodes = degout.findInds()
        nSinkNodes = nvert - len(nonSinkNodes)
        iInd = ParVec(nSinkNodes*(nvert))
        jInd = ParVec(nSinkNodes*(nvert))
        wInd = ParVec(nSinkNodes*(nvert), 1)
        sinkSuppInd = 0
        for ind in range(nvert):
            if degout[ind] == 0:
                # Connect to all nodes.
                for sInd in range(nvert):
                    iInd[sinkSuppInd] = sInd
                    jInd[sinkSuppInd] = ind
                    sinkSuppInd = sinkSuppInd + 1
        sinkMat = pcb.pySpParMat(nvert, nvert, iInd._dpv, jInd._dpv, wInd._dpv)
        sinkG = DiGraph()
        sinkG._spm = sinkMat

•  This portion looks more like graph operations by

pageRank Implementation in KDT (p.
2 of 2) (main loop)

        G.normalizeEdgeWeights()
        sinkG.normalizeEdgeWeights()
        # PageRank loop
        delta = 1
        dv1 = ParVec(nvert, 1./nvert)
        v1 = dv1.toSpParVec()
        prevV = SpParVec(nvert)
        dampingVec = SpParVec.ones(nvert) * ((1 - dampingFactor)/nvert)
        while delta > epsilon:
            prevV = v1.copy()
            v2 = G._spm.SpMV_PlusTimes(v1._spv) + \
                 sinkG._spm.SpMV_PlusTimes(v1._spv)
            v1._spv = v2
            v1 = v1*dampingFactor + dampingVec
            delta = (v1 - prevV)._spv.Reduce(pcb.plus(), pcb.abs())
        return v1

•  This portion looks much more like matrix algebra

Graph500 Implementation in KDT (p. 1 of 2)

    scale = 15
    nstarts = 640
    GRAPH500 = 1
    if GRAPH500 == 1:
        G = dg.DiGraph()
        K1elapsed = G.genGraph500Edges(scale)
        if nstarts > G.nvert():
            nstarts = G.nvert()
        deg3verts = (G.degree() > 2).findInds()
        deg3verts.randPerm()
        starts = deg3verts[dg.ParVec.range(nstarts)]
    G.toBool()
    K2elapsed = 1e-12
    K2edges = 0
    for start in starts:
        start = int(start)
        if start==0:    #HACK: avoid root==0 bugs for now
            continue
        before = time.time()
        parents = G.bfsTree(start, sym=True)
        K2elapsed += time.time() - before
        if not k2Validate(G, start, parents):
            print "Invalid BFS tree generated by bfsTree"
            print G, parents
            break
        [origI, origJ, ign] = G.toParVec()
        K2edges += len((parents[origI] != -1).find())

Graph500 Implementation in KDT (p.
2 of 2)

    def k2Validate(G, start, parents):
        ret = True
        bfsRet = G.isBfsTree(start, parents)
        # isBfsTree returns a (valid, levels) tuple on success and a
        # negative test number on failure.
        if type(bfsRet) != tuple:
            if dg.master():
                print "isBfsTree detected failure of Graph500 test %d" % abs(bfsRet)
            return False
        (valid, levels) = bfsRet
        # Spec test #3:
        [origI, origJ, ign] = G.toParVec()
        li = levels[origI]
        lj = levels[origJ]
        if not ((abs(li-lj) <= 1) | ((li==-1) & (lj==-1))).all():
            if dg.master():
                print "At least one graph edge has endpoints whose levels differ by more than one and is in the BFS tree"
                print li, lj
            ret = False
        # Spec test #4:
        neither_in = (li == -1) & (lj == -1)
        both_in = (li > -1) & (lj > -1)
        out2root = (li == -1) & (origJ == start)
        if not (neither_in | both_in | out2root).all():
            if dg.master():
                print "The tree does not span the connected component exactly, root=%d" % start
            ret = False
        # Spec test #5:
        respects = abs(li-lj) <= 1
        if not (neither_in | respects).all():
            if dg.master():
                print "At least one vertex and its parent are not joined by an original edge"
            ret = False
        return ret

- #1 and #2: implemented in isBfsTree
- #3: every input edge has vertices whose levels differ by no more than 1. Note: don't actually have input edges, will use the edges in the resulting graph as a proxy
- #4: the BFS tree spans a connected component's vertices (== all edges either have both endpoints in the tree or not in the tree, or source is not in tree and destination is the root)
- #5: a vertex and its parent are joined by an edge of the original graph

isBfsTree implementation KDT (p.
1 of 2)

    def isBfsTree(self, root, parents, sym=False):
        ret = 1    # assume valid
        nvertG = self.nvert()
        # calculate level in the tree for each vertex; root is at level 0
        if not sym:
            self.reverseEdges()
        parents2 = ParVec.zeros(nvertG) - 1
        parents2[root] = root
        fringe = SpParVec(nvertG)
        fringe[root] = root
        levels = ParVec.zeros(nvertG) - 1
        levels[root] = 0
        level = 1
        while fringe.nnn() > 0:
            fringe.spRange()
            #ToDo: create PCB graph-level op
            self._spm.SpMV_SelMax_inplace(fringe._spv)
            #ToDo: create PCB graph-level op
            pcb.EWiseMult_inplacefirst(fringe._spv, parents2._dpv, True, -1)
            parents2[fringe] = fringe
            levels[fringe] = level
            level += 1
        if not sym:
            self.reverseEdges()

isBfsTree implementation KDT (p. 2 of 2)

        # build a new graph from just tree edges
        tmp2 = parents != ParVec.range(nvertG)
        treeEdges = (parents != -1) & tmp2
        treeI = parents[treeEdges.findInds()]
        treeJ = ParVec.range(nvertG)[treeEdges.findInds()]
        if (treeJ == root).any():
            return -1
        # note treeJ/TreeI reversed, so builtGT is transpose, as needed by SpMV
        builtGT = DiGraph(treeJ, treeI, 1, nvertG)
        visited = ParVec.zeros(nvertG)
        visited[root] = 1
        fringe = SpParVec(nvertG)
        fringe[root] = root
        cycle = False; multiparents = False
        while fringe.nnn() > 0 and not cycle and not multiparents:
            fringe.spOnes()
            newfringe = SpParVec.toSpParVec(
                builtGT._spm.SpMV_PlusTimes(fringe._spv))
            if visited[newfringe.toParVec().findInds()].any():
                cycle = True
                break
            if (newfringe > 1).any():
                multiparents = True
            fringe = newfringe
            visited[fringe] = 1
        if cycle or multiparents:
            return -1
        # spec test #2
        if (levels[treeI]-levels[treeJ] != -1).any():
            return -2
        return (ret, levels)

- #1: validate that the tree is a tree and has no cycles:
  - a) no edge has the root as a destination
  - b) no cycle exists
  - c) no vertex has more than 1 parent
- #2: tree edges should be between vertices whose levels differ by 1

...
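
The validation rules listed above can also be illustrated on plain Python lists. The sketch below is not KDT code: the name is_bfs_tree, the undirected edge-pair input (corresponding to the sym=True case) and the (ok, info) return shape are assumptions made for the example. It recomputes levels with an ordinary queue-based BFS, then checks that each tree edge joins consecutive levels and is an original edge, and that the tree covers exactly the root's component. Because every parent must sit exactly one level above its child, a separate cycle check is not needed in this simplified form.

```python
from collections import deque


def is_bfs_tree(edges, nvert, root, parents):
    """Return (True, levels) if parents encodes a BFS tree rooted at root,
    otherwise (False, reason). Hypothetical helper, not part of KDT."""
    adj = [[] for _ in range(nvert)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Level of each vertex in the graph; -1 marks unreachable vertices.
    levels = [-1] * nvert
    levels[root] = 0
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if levels[v] == -1:
                levels[v] = levels[u] + 1
                queue.append(v)
    if parents[root] != root:
        return (False, "root must be its own parent")
    for child in range(nvert):
        parent = parents[child]
        if child == root or parent == -1:
            continue
        # Tree edges must join consecutive levels (this also rules out cycles).
        if levels[child] == -1 or levels[parent] != levels[child] - 1:
            return (False, "tree edge does not join consecutive levels")
        # A vertex and its parent must be joined by an original edge.
        if parent not in adj[child]:
            return (False, "tree edge is not an edge of the graph")
    # The tree must span exactly the connected component of the root.
    for v in range(nvert):
        if (levels[v] == -1) != (parents[v] == -1):
            return (False, "tree does not span the root's component")
    return (True, levels)


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (0, 3)]
    print(is_bfs_tree(edges, 5, 0, [0, 0, 1, 0, -1]))
```

Unlike the KDT listing, which expresses each check as whole-vector operations so that it can run on distributed data, this version loops over vertices one at a time and is only meant to make the rules concrete.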

This note was uploaded on 12/27/2011 for the course CMPSC 240A taught by Professor Gilbert during the Fall '09 term at UCSB.
