Key-value pairs with specified data types
WordCount: Reducer
  • Data types of input key-value: should be the same as the output data types of the mapper
  • Data types of output key-value
  • Input values: a list of values for each key

WordCount: setting up job
  • Can take multiple directories as input
  • Set output key and value types for both map and reduce tasks
  • If the Mapper's output types differ from the Reducer's, use setMapOutputKeyClass and setMapOutputValueClass

Example
  • Consider input/helloworld.txt, i.e., only helloworld.txt under the "input" dir
  • helloworld.txt:
      hello world
      hello this world
      hello hello world

Checking map input
  map input: key=0, value=hello world
  map input: key=12, value=hello this world
  map input: key=29, value=hello hello world

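The keys 0, 12 and 29 above are consistent with the starting byte offset of each line in helloworld.txt, which is what Hadoop's default text input format uses as the map input key. The following is a minimal sketch in plain Java (no Hadoop dependency) that reproduces those keys, assuming the file uses single-byte "\n" line endings. The class and method names are hypothetical and not from the slides.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class MapInputOffsets {
    // Returns the starting byte offset of each line, assuming "\n" line endings.
    static List<Long> lineOffsets(String fileContents) {
        List<Long> offsets = new ArrayList<>();
        long offset = 0;
        for (String line : fileContents.split("\n")) {
            offsets.add(offset);
            // +1 accounts for the newline that terminates this line.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Contents of helloworld.txt as given in the example.
        String contents = "hello world\nhello this world\nhello hello world\n";
        String[] lines = contents.split("\n");
        List<Long> offsets = lineOffsets(contents);
        for (int i = 0; i < lines.length; i++) {
            System.out.println("map input: key=" + offsets.get(i) + ", value=" + lines[i]);
        }
    }
}
```

With "\r\n" line endings each line would be one byte longer, so the second and third keys would shift accordingly.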
Checking reduce input
  reduce input: key=hello, values=1 1 1 1
  reduce input: key=this, values=1
  reduce input: key=world, values=1 1 1

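The reduce input above can be reproduced without Hadoop by emitting (word, 1) for every token and grouping the 1s by word in sorted key order. This is only a sketch of the shuffle-and-sort step under that assumption, not the framework's implementation. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReduceInputDemo {
    // Emits (word, 1) for every whitespace-separated token, then groups the
    // 1s by word. TreeMap keeps the keys sorted, as the reducer would see them.
    static Map<String, List<Integer>> groupByWord(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // skip blank lines
                }
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("hello world", "hello this world", "hello hello world");
        groupByWord(lines).forEach((word, ones) -> {
            StringBuilder values = new StringBuilder();
            for (int one : ones) {
                values.append(one).append(' ');
            }
            System.out.println("reduce input: key=" + word + ", values=" + values.toString().trim());
        });
    }
}
```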
Combiner
  • Runs on the node running the Mapper
  • Performs local (or mini-) reduction
  • Combines Mapper results before they are sent to the Reducers
  • Reduces communication costs
  • E.g., may use a combiner in WordCount: (cat, 1), (cat, 1), (cat, 1) => (cat, 3)
  • One key-value pair per unique word in each Mapper's output

Without combiner
  • Mapper 1 outputs: (cat, 1), (cat, 1), (cat, 1), (dog, 1)
  • Mapper 2 outputs: (dog, 1), (dog, 1), (cat, 1)
  • Suppose there is only one Reducer
  • It will receive: (cat, [1, 1, 1, 1]), (dog, [1, 1, 1])

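To make the communication saving concrete, the sketch below simulates the two mappers above in plain Java and counts how many key-value pairs would reach the single Reducer with and without a local sum. It is an illustration under simplifying assumptions (in-memory lists stand in for the shuffle), and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CombinerShuffleDemo {
    // Builds a mapper's output: one (word, 1) pair per word.
    static List<Map.Entry<String, Integer>> ones(String... words) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : words) {
            pairs.add(Map.entry(word, 1));
        }
        return pairs;
    }

    // Local reduction: sums the counts per word within one mapper's output.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> mapperOutput) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : mapperOutput) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return new ArrayList<>(sums.entrySet());
    }

    // Single reducer: sums every count it receives, per word.
    static Map<String, Integer> reduce(List<List<Map.Entry<String, Integer>>> shuffled) {
        Map<String, Integer> totals = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> part : shuffled) {
            for (Map.Entry<String, Integer> pair : part) {
                totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> m1 = ones("cat", "cat", "cat", "dog");
        List<Map.Entry<String, Integer>> m2 = ones("dog", "dog", "cat");

        System.out.println("pairs sent without combiner: " + (m1.size() + m2.size()));
        System.out.println("result without combiner: " + reduce(List.of(m1, m2)));

        List<Map.Entry<String, Integer>> c1 = combine(m1); // (cat, 3), (dog, 1)
        List<Map.Entry<String, Integer>> c2 = combine(m2); // (dog, 2), (cat, 1)
        System.out.println("pairs sent with combiner: " + (c1.size() + c2.size()));
        System.out.println("result with combiner: " + reduce(List.of(c1, c2)));
    }
}
```

In this example the reducer's totals are the same in both cases, while the number of pairs sent drops from seven to four.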
Implementing combiner
  • May directly use the reduce function if it is commutative and associative
  • Meaning operations can be grouped & performed in any order
  • Operation 'op' is commutative if A op B = B op A
  • 'op' is associative if A op (B op C) = (A op B) op C

Example: without combiner
  • Consider two map tasks
  • M1 => 1, 2, 3 for some key x
  • M2 => 4, 5 for the same key
  • Reducer adds all values for x
  • Result = (((1 + 2) + 3) + 4) + 5

Example: with combiner
  • M1 => 1, 2, 3 => combiner: (1 + 2) + 3 => 6
  • M2 => 4, 5 => combiner: 4 + 5 => 9
  • Reducer now computes 6 + 9, i.e., ((1 + 2) + 3) + (4 + 5)
  • Question: is it the same as (((1 + 2) + 3) + 4) + 5?
  • Yes, since '+' is associative

Example: with combiner
  • M1 => 1, 2, 3 => combiner: (1 + 2) + 3 => 6
  • M2 => 4, 5 => combiner: 4 + 5 => 9
  • Reducer may also compute 9 + 6, i.e., (4 + 5) + ((1 + 2) + 3)
  • Since values may arrive at the reducer in any order
  • Question: is it the same as (((1 + 2) + 3) + 4) + 5?
  • Yes, since '+' is also commutative

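The three evaluation orders in the examples above can be compared directly. The sketch below evaluates them for '+' and, as a contrast that is not from the slides, for subtraction, which is neither associative nor commutative. The class and method names are hypothetical.

```java
import java.util.function.IntBinaryOperator;

class GroupingOrderDemo {
    // No combiner: the reducer folds all five values, (((1 op 2) op 3) op 4) op 5.
    static int withoutCombiner(IntBinaryOperator op) {
        return op.applyAsInt(op.applyAsInt(op.applyAsInt(op.applyAsInt(1, 2), 3), 4), 5);
    }

    // Combiner per map task; the reducer sees M1's partial result first.
    static int withCombiner(IntBinaryOperator op) {
        int m1 = op.applyAsInt(op.applyAsInt(1, 2), 3);
        int m2 = op.applyAsInt(4, 5);
        return op.applyAsInt(m1, m2);
    }

    // Same partial results, arriving at the reducer in the opposite order.
    static int withCombinerReversed(IntBinaryOperator op) {
        int m1 = op.applyAsInt(op.applyAsInt(1, 2), 3);
        int m2 = op.applyAsInt(4, 5);
        return op.applyAsInt(m2, m1);
    }

    static void report(String name, IntBinaryOperator op) {
        System.out.println(name + ": " + withoutCombiner(op) + ", "
                + withCombiner(op) + ", " + withCombinerReversed(op));
    }

    public static void main(String[] args) {
        report("sum", Integer::sum);              // all three agree
        report("difference", (a, b) -> a - b);    // all three differ
    }
}
```

For subtraction the three orders give three different answers, which is why such a reduce function could not be reused as a combiner.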
General requirements
  • To use reduce function 'f' for a combiner
  • Consider a set of values S and its subsets S_1, …, S_k (disjoint, together covering S)
  • It must be that: f(S) = f(f(S_1), …, f(S_k))
  • E.g., in WordCount: f = sum, S = a list of integers

Commutative and associative
  • Examples: Sum, Max, Min
  • Non-examples: Count, Average, Median

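The requirement that applying f to the whole set gives the same result as applying f to the per-subset results can be tested on the earlier split M1 = 1, 2, 3 and M2 = 4, 5. The sketch below does so for the six functions listed. A failing split shows that a function cannot be reused directly as a combiner, while a passing split is only consistent with the requirement and does not prove it. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

class CombinerPropertyCheck {
    static final Function<List<Double>, Double> SUM =
            s -> s.stream().mapToDouble(Double::doubleValue).sum();
    static final Function<List<Double>, Double> MAX = s -> Collections.max(s);
    static final Function<List<Double>, Double> MIN = s -> Collections.min(s);
    static final Function<List<Double>, Double> COUNT = s -> (double) s.size();
    static final Function<List<Double>, Double> AVERAGE = s -> SUM.apply(s) / s.size();
    static final Function<List<Double>, Double> MEDIAN = CombinerPropertyCheck::median;

    // Median of a non-empty list; averages the two middle values for even sizes.
    static double median(List<Double> values) {
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        return n % 2 == 1 ? sorted.get(n / 2) : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2;
    }

    // True when f applied to all values equals f applied to the per-part results.
    static boolean reusableAsCombiner(Function<List<Double>, Double> f, List<List<Double>> parts) {
        List<Double> all = new ArrayList<>();
        List<Double> partials = new ArrayList<>();
        for (List<Double> part : parts) {
            all.addAll(part);
            partials.add(f.apply(part));
        }
        return f.apply(all).equals(f.apply(partials));
    }

    public static void main(String[] args) {
        List<List<Double>> parts = List.of(List.of(1.0, 2.0, 3.0), List.of(4.0, 5.0));
        System.out.println("sum:     " + reusableAsCombiner(SUM, parts));
        System.out.println("max:     " + reusableAsCombiner(MAX, parts));
        System.out.println("min:     " + reusableAsCombiner(MIN, parts));
        System.out.println("count:   " + reusableAsCombiner(COUNT, parts));
        System.out.println("average: " + reusableAsCombiner(AVERAGE, parts));
        System.out.println("median:  " + reusableAsCombiner(MEDIAN, parts));
    }
}
```

For example, the average of all five values is 3, but the average of the two partial averages (2 and 4.5) is 3.25, so the check fails for average on this split.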
Custom combiner
  • Key & value data types of both input & output should be the same as those of the Mapper's output
  • (Also the same as the input of the Reducer)
  • So if the Mapper outputs (Text, Text), then:
      public static class MyCombiner extends Reducer<Text, Text, Text, Text> { }

Enabling combiner
  • job.setCombinerClass(IntSumReducer.class)
  • This reuses the reduce function (here IntSumReducer) as the combiner

Two-split example
  • Now the input directory has two files
  • => Two splits (hence two map tasks) are generated, one for each file
