lec5-pagerank - Distributed Computing Seminar Lecture 5:...

Distributed Computing Seminar
Lecture 5: Graph Algorithms & PageRank
Summer 2007

Except as otherwise noted, the content of this presentation is © 2007 Google Inc. and licensed under the Creative Commons Attribution 2.5 License.

Outline
Motivation
Graph Representations
Breadth-First Search & Shortest-Path Finding
PageRank

Motivating Concepts
Performing computation on a graph data structure requires processing at each node.
Each node contains node-specific data as well as links (edges) to other nodes.
Computation must traverse the graph and perform the computation step at each node it visits.
How do we traverse a graph in MapReduce? How do we represent the graph for this?

Breadth-First Search
Breadth-First Search is an iterated algorithm over graphs: the frontier advances from the origin by one level with each pass.
[Figure: example graph whose nodes are labeled 1 through 4 by the pass in which the frontier reaches them]
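The level-by-level advance of the frontier can be sketched sequentially in Java. This is a minimal single-machine sketch, not a MapReduce job; the class name LevelBfs, the method levels, and the map-of-lists adjacency input are illustrative assumptions. Following the slide's numbering, the origin is level 1, and each pass labels the newly reached nodes with the next number.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: level-synchronous BFS, one frontier level per pass. */
class LevelBfs {
    /** Returns, for every reachable node, the pass (origin = 1) in which the frontier reaches it. */
    static Map<Integer, Integer> levels(Map<Integer, List<Integer>> adj, int origin) {
        Map<Integer, Integer> level = new HashMap<>();
        List<Integer> frontier = new ArrayList<>(List.of(origin));
        level.put(origin, 1);
        int pass = 1;
        while (!frontier.isEmpty()) {
            pass++;
            List<Integer> next = new ArrayList<>();
            for (int u : frontier) {
                for (int v : adj.getOrDefault(u, List.of())) {
                    // The first time a node is seen is its shortest hop count from the origin.
                    if (!level.containsKey(v)) {
                        level.put(v, pass);
                        next.add(v);
                    }
                }
            }
            frontier = next; // the new frontier is exactly the nodes discovered this pass
        }
        return level;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> adj = Map.of(
                0, List.of(1, 2),
                1, List.of(3),
                2, List.of(3, 4),
                3, List.of(5),
                4, List.of());
        System.out.println(levels(adj, 0));
    }
}
```

Each iteration of the outer loop corresponds to one "pass" on the slide, which is the unit the next slide maps onto a MapReduce job.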
Problem: BFS doesn't “fit” into a single MapReduce pass, because one pass can advance the frontier by only one level.
Solution: Iterated passes through MapReduce. Map some nodes; the result includes additional nodes, which are fed into successive MapReduce passes.
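The iterated-pass idea can be simulated in plain Java. This is a hypothetical sketch under stated assumptions, not the Hadoop API and not necessarily the exact scheme the lecture goes on to present: each record holds a node's current distance plus its out-edges, map re-emits the record and offers distance + 1 to each neighbor, an in-memory TreeMap stands in for the shuffle, reduce keeps the minimum distance, and a driver loop feeds each pass's output into the next until no distance changes. The names IteratedBfs, Val, pass, and run are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java simulation (not the Hadoop API) of BFS as iterated MapReduce passes. */
class IteratedBfs {
    static final int INF = Integer.MAX_VALUE; // "not reached yet"

    /** Value emitted by map: a distance, plus out-edges if it is the node's own record. */
    static final class Val {
        final int dist;
        final List<Integer> edges; // null for "candidate distance" messages

        Val(int dist, List<Integer> edges) {
            this.dist = dist;
            this.edges = edges;
        }
    }

    /** One pass: map every record, group values by key (the shuffle), reduce each group. */
    static Map<Integer, Val> pass(Map<Integer, Val> graph) {
        Map<Integer, List<Val>> shuffled = new TreeMap<>();
        for (Map.Entry<Integer, Val> e : graph.entrySet()) {
            Val v = e.getValue();
            // map: re-emit the node's own record so its edge list survives the pass ...
            shuffled.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(v);
            // ... and, if the node is reached, offer dist + 1 to every neighbor.
            if (v.dist != INF) {
                for (int n : v.edges) {
                    shuffled.computeIfAbsent(n, k -> new ArrayList<>()).add(new Val(v.dist + 1, null));
                }
            }
        }
        Map<Integer, Val> out = new TreeMap<>();
        for (Map.Entry<Integer, List<Val>> e : shuffled.entrySet()) {
            // reduce: keep the smallest distance offered and the node's own edge list.
            int best = INF;
            List<Integer> edges = List.of();
            for (Val v : e.getValue()) {
                best = Math.min(best, v.dist);
                if (v.edges != null) edges = v.edges;
            }
            out.put(e.getKey(), new Val(best, edges));
        }
        return out;
    }

    /** Driver: feed each pass's output into the next pass until no distance changes. */
    static Map<Integer, Integer> run(Map<Integer, List<Integer>> adj, int origin) {
        Map<Integer, Val> init = new TreeMap<>();
        adj.forEach((id, edges) -> init.put(id, new Val(id == origin ? 0 : INF, edges)));
        Map<Integer, Val> graph = init;
        boolean changed = true;
        while (changed) {
            Map<Integer, Val> next = pass(graph);
            changed = false;
            for (Map.Entry<Integer, Val> e : next.entrySet()) {
                Val old = graph.get(e.getKey());
                if (old == null || old.dist != e.getValue().dist) changed = true;
            }
            graph = next;
        }
        Map<Integer, Integer> dist = new TreeMap<>();
        graph.forEach((id, v) -> dist.put(id, v.dist));
        return dist;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> adj = Map.of(
                0, List.of(1, 2), 1, List.of(3), 2, List.of(3, 4),
                3, List.of(5), 4, List.of(), 6, List.of(0));
        System.out.println(run(adj, 0)); // node 6 is unreachable and keeps INF
    }
}
```

Because every record carries its own out-edges, a pass never needs the whole graph in one place; the cost is one full pass per BFS level, plus a final pass that only confirms nothing changed.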

Breadth-First Search & MapReduce
Problem: Sending the entire graph to a map task (or to hundreds or thousands of map tasks) involves an enormous amount of memory.
Solution: Carefully consider how we represent graphs.

Graph Representations
The most straightforward representation of graphs uses references from each node to its neighbors.
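
Direct References
Structure is inherent to the object.
Iteration requires a linked list “threaded through” the graph.
Requires a common view of shared memory (synchronization!).
Not easily serializable.

    import java.util.Vector;

    class GraphNode {
        Object data;                  // node-specific data
        Vector<GraphNode> out_edges;  // direct references to neighbor nodes
        GraphNode iter_next;          // next node in the list threaded through the graph
    }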
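A small usage sketch of the direct-reference representation follows. It repeats the slide's GraphNode fields, but initializing out_edges and adding a constructor are assumptions, and DirectRefsDemo and countEdges are hypothetical names. It shows iteration by walking the iter_next thread rather than the edges. The A -> C -> A cycle also hints at one plausible reason the slide calls this representation not easily serializable: a naive serializer that recursively followed out_edges would never terminate.

```java
import java.util.Vector;

/** The slide's node class, with a constructor and an initialized edge list added. */
class GraphNode {
    Object data;                                   // node-specific data
    Vector<GraphNode> out_edges = new Vector<>();  // direct references to neighbors
    GraphNode iter_next;                           // next node in the threaded list

    GraphNode(Object data) {
        this.data = data;
    }
}

class DirectRefsDemo {
    /** Visits every node by following the iter_next thread, not the edges. */
    static int countEdges(GraphNode head) {
        int total = 0;
        for (GraphNode n = head; n != null; n = n.iter_next) {
            total += n.out_edges.size();
        }
        return total;
    }

    public static void main(String[] args) {
        GraphNode a = new GraphNode("A");
        GraphNode b = new GraphNode("B");
        GraphNode c = new GraphNode("C");
        a.out_edges.add(b);
        a.out_edges.add(c);
        b.out_edges.add(c);
        c.out_edges.add(a);   // creates the cycle A -> C -> A
        a.iter_next = b;      // thread for iteration: A -> B -> C
        b.iter_next = c;
        System.out.println(countEdges(a));
    }
}
```

Every node here lives in one process's heap, which is why the slide notes that this layout assumes a common view of shared memory.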

Adjacency Matrices
Another classic graph representation: M[i][j] = '1' implies a link from node i to node j.
Naturally encapsulates iteration over nodes.
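The matrix representation can be sketched with a plain int[][] in Java; the class and method names (AdjacencyMatrix, build, outLinks, inLinks) are hypothetical. Out-links of node i are a scan of row i, and in-links of node j are a scan of column j.

```java
import java.util.ArrayList;
import java.util.List;

class AdjacencyMatrix {
    /** Builds M with M[i][j] == 1 iff there is a link from node i to node j. */
    static int[][] build(int n, int[][] links) {
        int[][] m = new int[n][n]; // all entries start at 0 (no link)
        for (int[] link : links) {
            m[link[0]][link[1]] = 1;
        }
        return m;
    }

    /** Out-links of node i: scan row i. */
    static List<Integer> outLinks(int[][] m, int i) {
        List<Integer> out = new ArrayList<>();
        for (int j = 0; j < m.length; j++) {
            if (m[i][j] == 1) out.add(j);
        }
        return out;
    }

    /** In-links of node j: scan column j. */
    static List<Integer> inLinks(int[][] m, int j) {
        List<Integer> incoming = new ArrayList<>();
        for (int i = 0; i < m.length; i++) {
            if (m[i][j] == 1) incoming.add(i);
        }
        return incoming;
    }

    public static void main(String[] args) {
        int[][] m = build(3, new int[][] {{0, 1}, {0, 2}, {1, 2}, {2, 0}});
        // Iterating over nodes is just iterating over row indices.
        for (int i = 0; i < m.length; i++) {
            System.out.println(i + " -> " + outLinks(m, i));
        }
    }
}
```

One trade-off worth keeping in mind: the matrix always stores n × n entries, even when the graph has very few links.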

This note was uploaded on 08/06/2008 for the course CSE 450 taught by Professor Davison during the Spring '08 term at Lehigh University.
