
# Key-value pairs with specified data types: WordCount


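## WordCount: Reducer

- The data types of the Reducer's input key-value pairs should be the same as the output data types of the Mapper.
- The Reducer also declares the data types of its output key-value pairs.
- For each key, the Reducer receives a list of values.
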
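## WordCount: setting up the job

- The job can take multiple directories as input.
- Set the output key and value types with `setOutputKeyClass` and `setOutputValueClass`; these apply to both map and reduce tasks.
- If the Mapper's output types differ from the Reducer's, also use `setMapOutputKeyClass` and `setMapOutputValueClass`.
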
## Example

Consider `input/helloworld.txt`, i.e., `helloworld.txt` is the only file under the `input` directory. Its contents:

    hello world
    hello this world
    hello hello world

## Checking map input

    map input: key=0, value=hello world
    map input: key=12, value=hello this world
    map input: key=29, value=hello hello world

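
The keys 0, 12, and 29 above are consistent with each key being the byte offset at which its line starts in the file. A minimal sketch in plain Java (no Hadoop dependency) that reproduces them; the class `MapInputOffsets` and method `records` are hypothetical names for illustration, and the sketch assumes `\n` line endings:

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: pairs each line with the byte offset where it starts.
class MapInputOffsets {
    static Map<Long, String> records(String fileContents) {
        Map<Long, String> out = new LinkedHashMap<>();
        long offset = 0;
        for (String line : fileContents.split("\n")) {
            out.put(offset, line);
            // +1 accounts for the newline that terminates the line (assumed "\n").
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        String file = "hello world\nhello this world\nhello hello world\n";
        records(file).forEach((key, value) ->
            System.out.println("map input: key=" + key + ", value=" + value));
    }
}
```

"hello world" is 11 bytes plus a newline, so the second line starts at offset 12; "hello this world" adds 17 more bytes, giving 29.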
## Checking reduce input

    reduce input: key=hello, values=1 1 1 1
    reduce input: key=this, values=1
    reduce input: key=world, values=1 1 1

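
To see where these value lists come from, here is a rough plain-Java simulation of the map and shuffle steps for the three example lines. It is only a sketch: `ReduceInputDemo` and `shuffle` are hypothetical names, and a sorted in-memory map stands in for the framework's grouping of values by key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for map + shuffle: not the Hadoop API.
class ReduceInputDemo {
    // Emits (word, 1) for every token, then groups the 1s by word.
    // TreeMap keeps the keys sorted, so reduce sees them in key order.
    static Map<String, List<Integer>> shuffle(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) continue; // skip blank lines
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello world", "hello this world", "hello hello world");
        shuffle(lines).forEach((word, ones) ->
            System.out.println("reduce input: key=" + word + ", values=" + ones));
    }
}
```

"hello" occurs four times across the three lines, so its list holds four 1s; "this" occurs once and "world" three times.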
## Combiner

- Runs on the node running the Mapper.
- Performs a local (or mini-) reduction: it combines the Mapper's results before they are sent to the Reducers.
- Reduces communication costs.
- E.g., WordCount may use a combiner: (cat, 1), (cat, 1), (cat, 1) => (cat, 3), i.e., one key-value pair per unique word.

## Without combiner

- Mapper 1 outputs: (cat, 1), (cat, 1), (cat, 1), (dog, 1)
- Mapper 2 outputs: (dog, 1), (dog, 1), (cat, 1)
- Suppose there is only one Reducer. It will receive: (cat, [1, 1, 1, 1]), (dog, [1, 1, 1])

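
For comparison, the same two mappers can be simulated with a combiner applied to each mapper's output. This is a plain-Java sketch rather than Hadoop code; `CombinerTrafficDemo`, `combine`, and `reduce` are hypothetical names, and counting pairs is used as a rough proxy for communication cost.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in: not the Hadoop API.
class CombinerTrafficDemo {
    // Local reduction on one mapper's output: one (word, partialCount) pair per unique word.
    static Map<String, Integer> combine(List<String> mapperWords) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String word : mapperWords) {
            partial.merge(word, 1, Integer::sum);
        }
        return partial;
    }

    // Reducer: sums the partial counts received from all mappers.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> total.merge(word, count, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> m1 = Arrays.asList("cat", "cat", "cat", "dog");
        List<String> m2 = Arrays.asList("dog", "dog", "cat");
        Map<String, Integer> c1 = combine(m1);
        Map<String, Integer> c2 = combine(m2);
        System.out.println("pairs sent without combiner: " + (m1.size() + m2.size())); // 7
        System.out.println("pairs sent with combiner: " + (c1.size() + c2.size()));    // 4
        System.out.println("totals: " + reduce(Arrays.asList(c1, c2)));                // {cat=4, dog=3}
    }
}
```

With the combiner, Mapper 1 sends (cat, 3), (dog, 1) and Mapper 2 sends (cat, 1), (dog, 2), so the Reducer receives four pairs instead of seven and still arrives at cat = 4, dog = 3.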
## Implementing combiner

- The reduce function may be used directly as the combiner if it is commutative and associative, meaning its operations can be grouped and performed in any order.
- An operation 'op' is commutative if A op B = B op A.
- 'op' is associative if A op (B op C) = (A op B) op C.

## Example: without combiner

- Consider two map tasks:
  - M1 => 1, 2, 3 for some key x
  - M2 => 4, 5 for the same key
- The Reducer adds all values for x.
- Result = (((1 + 2) + 3) + 4) + 5

## Example: with combiner

- M1 => 1, 2, 3 => combiner: (1 + 2) + 3 => 6
- M2 => 4, 5 => combiner: 4 + 5 => 9
- The Reducer now computes 6 + 9, i.e., ((1 + 2) + 3) + (4 + 5).
- Question: is it the same as (((1 + 2) + 3) + 4) + 5? Yes, since '+' is associative.

## Example: with combiner (continued)

- M1 => 1, 2, 3 => combiner: (1 + 2) + 3 => 6
- M2 => 4, 5 => combiner: 4 + 5 => 9
- The Reducer may also compute 9 + 6, i.e., (4 + 5) + ((1 + 2) + 3), since values may arrive at the Reducer in any order.
- Question: is it the same as (((1 + 2) + 3) + 4) + 5? Yes, since '+' is also commutative.

## General requirements

- To use a reduce function $f$ as a combiner, consider a set of values $S$ and subsets $S_1, \ldots, S_k$ that together partition $S$.
- It must be that $f(S) = f(f(S_1), \ldots, f(S_k))$.
- E.g., in WordCount, $f$ = sum and $S$ = a list of integers.

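## Commutative and associative

- Examples: sum, max, min.
- Non-examples: count, average, median. Used unchanged as a combiner, each can give a wrong final result. E.g., the average of 1, 2, 3, 4, 5 is 3, but averaging the partial averages 2 (of 1, 2, 3) and 4.5 (of 4, 5) gives 3.25.
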
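
The condition f(S) = f(f(S1), …, f(Sk)) can be spot-checked on the running example, M1 => 1, 2, 3 and M2 => 4, 5. The sketch below is plain Java with hypothetical names (`CombinableCheck`, `holds`). Passing on one partition does not prove the property in general, but a single failure is enough to rule a function out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper: tests f(S) == f(f(S1), ..., f(Sk)) for ONE given partition of S.
class CombinableCheck {
    static boolean holds(Function<List<Double>, Double> f, List<List<Double>> parts) {
        List<Double> all = parts.stream().flatMap(List::stream).collect(Collectors.toList());
        List<Double> partials = parts.stream().map(f).collect(Collectors.toList());
        // Exact comparison is acceptable here only because the demo uses small exact values.
        return f.apply(all).equals(f.apply(partials));
    }

    static double sum(List<Double> v) { return v.stream().mapToDouble(Double::doubleValue).sum(); }
    static double max(List<Double> v) { return v.stream().mapToDouble(Double::doubleValue).max().getAsDouble(); }
    static double avg(List<Double> v) { return sum(v) / v.size(); }
    static double count(List<Double> v) { return v.size(); }

    public static void main(String[] args) {
        List<List<Double>> parts = Arrays.asList(
            Arrays.asList(1.0, 2.0, 3.0),   // values from M1
            Arrays.asList(4.0, 5.0));       // values from M2
        System.out.println("sum:   " + holds(CombinableCheck::sum, parts));   // true:  15 vs 6 + 9
        System.out.println("max:   " + holds(CombinableCheck::max, parts));   // true:  5 vs max(3, 5)
        System.out.println("avg:   " + holds(CombinableCheck::avg, parts));   // false: 3 vs avg(2, 4.5) = 3.25
        System.out.println("count: " + holds(CombinableCheck::count, parts)); // false: 5 vs count(3, 2) = 2
    }
}
```

Count fails because applying it again to the partial counts 3 and 2 counts the two partial results instead of adding them.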
## Custom combiner

- The key and value data types of both the combiner's input and its output should be the same as those of the Mapper's output (which are also the same as the Reducer's input types).
- So if the Mapper outputs (Text, Text), then: `public static class MyCombiner extends Reducer<Text, Text, Text, Text> { }`

## Enabling combiner

- `job.setCombinerClass(IntSumReducer.class)`
- This uses the reduce function (here `IntSumReducer`) as the combiner.

## Two-split example

- The input directory now has two files.
- Two splits (hence two map tasks) are generated, one for each file.


• Fall '14
• Hadoop, key-value pairs, run MapReduce programs