CS 3110 Lecture 5: Mapping and Folding and the Map-Reduce Paradigm

The Map-Reduce Paradigm

Map-reduce is a programming model that has its roots in functional programming. In addition to often producing short, elegant code for problems involving lists or collections, this model has proven very useful for large-scale, highly parallel data processing. Here we will think of map and reduce as operating on lists for concreteness, but they are appropriate to any collection (sets, etc.). Map operates on a list of values to produce a new list of values, by applying the same computation to each value. Reduce operates on a list of values to collapse or combine those values into a single value (or, more generally, some number of values), again by applying the same computation to each value. (A minimal OCaml sketch of both operations appears below.)

For large datasets, it has proven particularly valuable to think about processing in terms of the same operation being applied independently to each item in the dataset, either to produce a new dataset or to produce a summary result for the entire dataset. This way of thinking often works well on parallel hardware, where each of many processors can handle one or more data items at the same time. The Connection Machine, a massively parallel computer developed in the late 1980s, made heavy use of this programming paradigm. More recently, Google, with its large server farms, has made very effective use of it. Two Google Fellows, Dean and Ghemawat, reported some of these uses in a 2004 paper at OSDI (the Symposium on Operating Systems Design and Implementation) and a 2008 paper in CACM (Communications of the ACM, the monthly magazine of the main computer science professional society).

Much of the focus in those papers is on separating the fault-tolerance and distributed-processing issues of operating on large clusters of machines from the programming language abstractions. At Google they have found map-reduce to be a highly useful way of separating the conceptual model of large-scale computations on big collections of data, using mapping and reducing operations, from the issue of how that computation is implemented reliably on a large cluster of machines. In the introduction of their 2008 paper, Dean and Ghemawat write: "Our abstraction is inspired by the map and reduce primitives present in Lisp and many other functional languages. We realized that most of our computations involved applying a map operation to each logical record in our input in order to compute a set of intermediate key/value pairs, and then applying a reduce operation to all the values that shared the same key in order to combine the derived data appropriately." In the paper they discuss a number of applications that are simplified by this functional programming view of massive parallelism. To give a sense of the scale ...
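Since CS 3110 uses OCaml, here is a minimal sketch of both operations in terms of the standard library's List.map and List.fold_left. The list of integers, the squaring function, and addition as the combining operation are illustrative choices, not part of the lecture.

(* Map: apply the same computation to each element, producing a new list.
   Here we square every element of a list of ints. *)
let squares (xs : int list) : int list =
  List.map (fun x -> x * x) xs

(* Reduce: collapse a list into a single value. OCaml's fold_left plays
   the role of reduce; here the combining computation is addition. *)
let sum (xs : int list) : int =
  List.fold_left ( + ) 0 xs

let () =
  (* Map then reduce: the sum of squares of [1; 2; 3; 4] is 30. *)
  Printf.printf "%d\n" (sum (squares [1; 2; 3; 4]))

Note that fold_left threads an accumulator through the list from the left; folding is the list incarnation of reduce.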
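The quotation above describes a computation as a map that turns each input record into intermediate key/value pairs, followed by a reduce that combines all values sharing the same key. Below is a word-count sketch of that structure in the same OCaml setting; the helper names map_phase and reduce_phase, and the use of an in-memory association list in place of Google's distributed machinery, are assumptions made for illustration.

(* Map phase: each input record (a line of text) yields a list of
   intermediate key/value pairs, one ("word", 1) pair per occurrence. *)
let map_phase (line : string) : (string * int) list =
  String.split_on_char ' ' line
  |> List.filter (fun w -> w <> "")
  |> List.map (fun w -> (w, 1))

(* Reduce phase: combine all values that share the same key by summing
   them, yielding one (word, count) pair per distinct word. *)
let reduce_phase (pairs : (string * int) list) : (string * int) list =
  List.fold_left
    (fun acc (w, n) ->
       let old = try List.assoc w acc with Not_found -> 0 in
       (w, old + n) :: List.remove_assoc w acc)
    [] pairs

let () =
  ["the quick fox"; "the lazy dog"; "the fox"]
  |> List.concat_map map_phase   (* map applied independently to each record *)
  |> reduce_phase                (* values grouped and combined by key *)
  |> List.iter (fun (w, n) -> Printf.printf "%s: %d\n" w n)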
