COMP10052-AAAF-09-2010-for-viewing

… f(c1), …, f(cn)]. It applies the function passed as argument to each element of the collection independently.

Here is an example in Python:

    >>> A = [1,2,3,4,5]
    >>> def incr(x): return x+1
    ...
    >>> B = map(incr, A); B
    [2, 3, 4, 5, 6]
    >>>

Note that the first argument to map is a function. Note that map is parallelizable.

The Map-Reduce Model (3)

In its simplest form, reduce takes a binary, associative function ⊚ (optionally an initial value i) and a collection [c1, …, cn] and returns the value i ⊚ c1 ⊚ … ⊚ cn. In other words, it applies the function passed as argument iteratively (like an accumulator) to the elements in the collection.

Here is an example in Python:

    >>> B = map(incr, A); B
    [2, 3, 4, 5, 6]
    >>> def prodDup(x,y):
    ...     return (x*2)*(y*2)
    ...
    >>> reduce(prodDup, B)
    184320
    >>> reduce(prodDup, B[0:3]+B[3:]) == reduce(prodDup, B[3:]+B[0:3])
    True
    >>>

Note that the first argument to reduce is a function. Note that reduce is parallelizable.

Google MapReduce Engine

Google built bespoke components (including a distributed file system and a scheduler/load-balancer) to support computations expressed in terms of map and reduce. It makes a useful class of distributed computations easy to code for parallel execution. Apache Hadoop is a Yahoo-sponsored open-source map-reduce engine.

The components take responsibility for splitting, spawning and merging. Because of the absence of shared state, there is no need for locks: parallelization reaches massive levels (thousands of commodity PCs) over terabyte-scale collections. A barrier is used to synchronize the map phase with the reduce phase.

MapReduce Programming Model

Users implement the following interfaces (of two functions, called the mapper and the reducer):

    mapper (in_key, in_value) -> (out_key, intermediate_value) list
    reducer (out_key, intermediate_value list) -> out_value list

The infrastructure takes care of splitting, spawning and merging. It also handles fault-tolerance and load balancing.

Example: Counting Word Occurrences (1)

    mapper(String input_key, String input_value):
      // input_key: document name
      // input_value: document contents
      for each word w in input_value:
        EmitIntermediate(w, "1")

    reducer(String output_key, Iterator intermediate_values):
      // output_key: a wo...
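The word-count example can be tried out on a single machine with a minimal Python 3 sketch. Several things here are assumptions rather than part of the slides: the reducer body (the preview cuts off before it) is assumed to sum the counts, the mapper emits the integer 1 instead of the string "1", and run_mapreduce is a hypothetical toy driver standing in for the splitting, spawning and merging that the real infrastructure performs. In Python 3, reduce is imported from functools, and map returns an iterator rather than a list, unlike in the interpreter sessions shown earlier.

```python
from collections import defaultdict
from functools import reduce  # in Python 3, reduce lives in functools


def mapper(input_key, input_value):
    """Emit a (word, 1) pair for every word in the document contents."""
    # input_key: document name (unused here); input_value: document contents
    return [(word, 1) for word in input_value.split()]


def reducer(output_key, intermediate_values):
    """Assumed reducer body: sum the counts emitted for one word."""
    return [reduce(lambda x, y: x + y, intermediate_values, 0)]


def run_mapreduce(documents):
    """Hypothetical single-process driver standing in for the engine."""
    # Map phase: each document is processed independently of the others.
    intermediate = []
    for name, contents in documents.items():
        intermediate.extend(mapper(name, contents))

    # Barrier: all map output is grouped by key before any reduce starts.
    grouped = defaultdict(list)
    for key, value in intermediate:
        grouped[key].append(value)

    # Reduce phase: each key is reduced independently of the others.
    return {key: reducer(key, values) for key, values in grouped.items()}


docs = {"a.txt": "the cat sat", "b.txt": "the dog sat down"}
counts = run_mapreduce(docs)
print(counts["the"], counts["sat"], counts["down"])
```

Because neither phase shares state between documents or between keys, the two loops over documents and over keys are the places where a real engine would distribute the work across machines.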