Distributed Computing Seminar
Lecture 5: Graph Algorithms & PageRank
Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet
Summer 2007
Except as otherwise noted, the content of this presentation is © 2007 Google Inc. and licensed under the Creative Commons Attribution 2.5 License.

Outline
- Motivation
- Graph Representations
- Breadth-First Search & Shortest-Path Finding
- PageRank

Motivating Concepts
- Performing computation on a graph data structure requires processing at each node
- Each node contains node-specific data as well as links (edges) to other nodes
- Computation must traverse the graph and perform the computation step
- How do we traverse a graph in MapReduce? How do we represent the graph for this?

Breadth-First Search
- Breadth-First Search is an iterated algorithm over graphs
- Frontier advances from origin by one level with each pass
[Figure: example graph with nodes labeled 1 through 4, apparently by the pass in which the frontier reaches them]

Breadth-First Search & MapReduce
- Problem: This doesn't “fit” into MapReduce
- Solution: Iterated passes through MapReduce. Map some nodes; the result includes additional nodes, which are fed into successive MapReduce passes
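
The iterated-pass idea above can be sketched in plain Java. This is only an in-memory simulation under stated assumptions, not the lecture's implementation and not the Hadoop or Google MapReduce API: the class name `BfsPasses`, the integer node ids, and the `UNREACHED` sentinel are all hypothetical. Each call to `onePass` plays the role of one MapReduce job: the "map" step lets every reached node propose `distance + 1` to its neighbors, and the "reduce" step keeps the smallest proposal per node. The origin is given distance 0 here, whereas the slide's figure appears to label it 1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BfsPasses {
    // Hypothetical sentinel for "the frontier has not reached this node yet".
    static final int UNREACHED = Integer.MAX_VALUE;

    /** One simulated MapReduce pass over (node, distance) records. */
    static Map<Integer, Integer> onePass(Map<Integer, List<Integer>> adjacency,
                                         Map<Integer, Integer> distance) {
        // "Map": every node re-emits its own distance; a reached node also
        // emits distance + 1 for each node it links to.
        Map<Integer, List<Integer>> emitted = new HashMap<>();
        for (Map.Entry<Integer, Integer> entry : distance.entrySet()) {
            int node = entry.getKey();
            int d = entry.getValue();
            emitted.computeIfAbsent(node, k -> new ArrayList<>()).add(d);
            if (d != UNREACHED) {
                for (int neighbor : adjacency.getOrDefault(node, List.of())) {
                    emitted.computeIfAbsent(neighbor, k -> new ArrayList<>()).add(d + 1);
                }
            }
        }
        // "Reduce": for each node, keep the minimum of all emitted distances.
        Map<Integer, Integer> next = new HashMap<>();
        for (Map.Entry<Integer, List<Integer>> entry : emitted.entrySet()) {
            int best = UNREACHED;
            for (int candidate : entry.getValue()) {
                best = Math.min(best, candidate);
            }
            next.put(entry.getKey(), best);
        }
        return next;
    }

    /** Repeats passes until no distance changes, i.e. the frontier stops advancing. */
    static Map<Integer, Integer> bfs(Map<Integer, List<Integer>> adjacency, int origin) {
        Map<Integer, Integer> distance = new HashMap<>();
        for (int node : adjacency.keySet()) {
            distance.put(node, UNREACHED);
        }
        distance.put(origin, 0);
        while (true) {
            Map<Integer, Integer> next = onePass(adjacency, distance);
            if (next.equals(distance)) {
                return next;
            }
            distance = next;
        }
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> graph = new HashMap<>();
        graph.put(1, List.of(2, 3));
        graph.put(2, List.of(4));
        graph.put(3, List.of(4));
        graph.put(4, List.of());
        System.out.println(bfs(graph, 1));
    }
}
```

Note that each pass only needs a node's own record (its distance and its outgoing links), never the whole graph, which is what makes the per-node map step possible.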

Breadth-First Search & MapReduce
- Problem: Sending the entire graph to a map task (or hundreds/thousands of map tasks) involves an enormous amount of memory
- Solution: Carefully consider how we represent graphs

Graph Representations
- The most straightforward representation of graphs uses references from each node to its neighbors

Direct References
- Structure is inherent to object
- Iteration requires linked list “threaded through” graph
- Requires common view of shared memory (synchronization!)
- Not easily serializable

    import java.util.Vector;

    class GraphNode {
        Object data;                  // node-specific payload
        Vector<GraphNode> out_edges;  // direct references to neighboring nodes
        GraphNode iter_next;          // next node in the list threaded through the graph
    }

Adjacency Matrices
- Another classic graph representation
- M[i][j] = 1 implies a link from node i to node j
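
As a minimal sketch of the adjacency-matrix idea, assuming nodes are numbered from 0 and the node count is known up front, the class below stores M as an `int[][]`. The class name `AdjacencyMatrix` and its methods are hypothetical names chosen for illustration; they do not come from the lecture.

```java
import java.util.ArrayList;
import java.util.List;

class AdjacencyMatrix {
    // m[i][j] == 1 means there is a link from node i to node j.
    private final int[][] m;

    AdjacencyMatrix(int nodeCount) {
        m = new int[nodeCount][nodeCount];
    }

    void addLink(int from, int to) {
        m[from][to] = 1;
    }

    boolean hasLink(int from, int to) {
        return m[from][to] == 1;
    }

    /** Outgoing links of a node: the nonzero columns of its row. */
    List<Integer> outLinks(int from) {
        List<Integer> targets = new ArrayList<>();
        for (int j = 0; j < m.length; j++) {
            if (m[from][j] == 1) {
                targets.add(j);
            }
        }
        return targets;
    }

    /** Incoming links of a node: the nonzero rows of its column. */
    List<Integer> inLinks(int to) {
        List<Integer> sources = new ArrayList<>();
        for (int i = 0; i < m.length; i++) {
            if (m[i][to] == 1) {
                sources.add(i);
            }
        }
        return sources;
    }

    public static void main(String[] args) {
        AdjacencyMatrix graph = new AdjacencyMatrix(3);
        graph.addLink(0, 1);
        graph.addLink(0, 2);
        graph.addLink(2, 0);
        System.out.println("out-links of 0: " + graph.outLinks(0));
        System.out.println("in-links of 0: " + graph.inLinks(0));
    }
}
```

Unlike the direct-reference `GraphNode`, this form holds no object pointers, so a single row can be written out as plain numbers. The cost is that the matrix has one cell for every pair of nodes, whether or not a link exists.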
