map-reduce2

Generalizing Map-Reduce

The Computational Model
Map-Reduce-Like Algorithms
Computing Joins

Overview
◆ There is a new computing environment available:
  ◗ Massive files, many compute nodes.
◆ Map-reduce allows us to exploit this environment easily.
◆ But not everything is map-reduce.
◆ What else can we do in the same environment?

Files
◆ Stored in dedicated file system.
◆ Treated like relations.
  ◗ Order of elements does not matter.
◆ Massive chunks (e.g., 64MB).
◆ Chunks are replicated.
◆ Parallel read/write of chunks is possible.

Processes
◆ Each process operates at one node.
◆ “Infinite” supply of nodes.
◆ Communication among processes can be via the file system or special communication channels.
  ◗ Example: Master controller assembling output of Map processes and passing them to Reduce processes.

Algorithms
◆ An algorithm is described by an acyclic graph.
  1. A collection of processes (nodes).
  2. Arcs from node a to node b, indicating that (part of) the output of a goes to the input of b.

Example: A Map-Reduce Graph
[Figure: graph with nodes labeled map, map, map, reduce, reduce, reduce, . . .]

Algorithm Design
◆ Goal: Algorithms should exploit as much parallelism as possible.
◆ To encourage parallelism, we put a limit s on the amount of input or output that any one process can have.
  ◗ s could be:
    • What fits in main memory.
    • What fits on local disk.
    • No more than a process can handle before cosmic rays are likely to cause an error.

Cost Measures for Algorithms
1. Communication cost = total I/O of all processes.
...
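The model on the Algorithms, Algorithm Design, and Cost Measures slides (processes as nodes of an acyclic graph, a limit s on each process's input and output, and communication cost as total I/O) can be illustrated with a small simulation. The following is only a sketch under stated assumptions: word count is chosen as the example job, sizes are measured in items rather than bytes, "total I/O" is read as input plus output summed over every process, and the names `run_map_reduce`, `num_reducers`, and `io_sizes` are hypothetical, not taken from the slides.

```python
from collections import defaultdict


def run_map_reduce(chunks, num_reducers, s):
    """Simulate word count as a two-layer acyclic graph of processes.

    chunks       -- list of lists of words; one Map process per chunk
    num_reducers -- number of Reduce processes (hypothetical parameter)
    s            -- limit on the input or output size of any one process
    Returns (counts, communication_cost).
    """
    io_sizes = []  # one (process name, input size, output size) per node

    # Map layer: each process reads one chunk and emits (word, 1) pairs.
    # An arc map i -> reduce j carries the pairs whose key is routed to j.
    reducer_inputs = defaultdict(list)
    for i, chunk in enumerate(chunks):
        pairs = [(word, 1) for word in chunk]
        io_sizes.append((f"map{i}", len(chunk), len(pairs)))
        for word, one in pairs:
            # Deterministic routing (assumption): sum of code points mod r.
            j = sum(map(ord, word)) % num_reducers
            reducer_inputs[j].append((word, one))

    # Reduce layer: each process sums the counts for the keys it receives.
    counts = {}
    for j in range(num_reducers):
        received = reducer_inputs[j]
        local = defaultdict(int)
        for word, one in received:
            local[word] += one
        io_sizes.append((f"reduce{j}", len(received), len(local)))
        counts.update(local)

    # Enforce the limit s on the input and output of every process.
    for name, n_in, n_out in io_sizes:
        if n_in > s or n_out > s:
            raise ValueError(
                f"{name} exceeds limit s={s}: in={n_in}, out={n_out}"
            )

    # Communication cost, read here as input plus output over all processes.
    cost = sum(n_in + n_out for _, n_in, n_out in io_sizes)
    return counts, cost


chunks = [["a", "b", "a"], ["b", "c"], ["a"]]
counts, cost = run_map_reduce(chunks, num_reducers=2, s=4)
```

With these three chunks and two Reduce processes, the Map layer accounts for 12 units of I/O and the Reduce layer for 9, so `cost` is 21. Lowering s to 3 makes the call raise `ValueError`, because one Reduce process receives four pairs. Under this reading, a pair sent along an arc is counted twice, once as output of the sender and once as input of the receiver.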
This document was uploaded on 03/04/2012.

