5-MapReduceAlgorithms

MapReduce Algorithms © 2009 Cloudera, Inc.

Algorithms for MapReduce
• Sorting
• Searching
• Indexing
• Classification
• Joining
• TF-IDF

MapReduce Jobs
• Tend to be very short, code-wise
  – IdentityReducer is very common
• “Utility” jobs can be composed
• Represent a data flow, more so than a procedure
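The points about short jobs, the identity reducer, and composition can be illustrated with a minimal in-memory sketch. It is not Hadoop code: `run_job`, `tokenize`, `count`, and `swap` are hypothetical names chosen for this example, and the shuffle is simulated with an ordinary sort.

```python
from itertools import groupby
from operator import itemgetter


def identity_reducer(key, values):
    """Emit every (key, value) pair unchanged (stand-in for IdentityReducer)."""
    for value in values:
        yield key, value


def run_job(records, mapper, reducer=identity_reducer):
    """Run one map -> shuffle/sort -> reduce pass over (key, value) records."""
    mapped = [pair for key, value in records for pair in mapper(key, value)]
    mapped.sort(key=itemgetter(0))  # simulated shuffle: order pairs by key
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        output.extend(reducer(key, [value for _, value in group]))
    return output


def tokenize(key, line):
    """Mapper: emit (word, 1) for every word on the line."""
    for word in line.split():
        yield word, 1


def count(key, values):
    """Reducer: sum the counts collected for one word."""
    yield key, sum(values)


def swap(key, value):
    """Utility mapper: exchange key and value."""
    yield value, key


lines = [(("a.txt", 1), "to be or not to be")]
word_counts = run_job(lines, tokenize, count)  # job 1: word count
by_count = run_job(word_counts, swap)          # job 2: re-key by count
print(by_count)
```

The second job has no reducer logic of its own; feeding the first job's output into it is what makes the pair read as a data flow rather than a procedure.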

Sort: Inputs
• A set of files, one value per line
• Mapper key is (file name, line number)
• Mapper value is the contents of the line

Sort Algorithm
• Takes advantage of reducer properties: (key, value) pairs are processed in order by key; reducers are themselves ordered
• Mapper: Identity function for value: (k, v) -> (v, _)
• Reducer: Identity function: (k’, _) -> (k’, “”)
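The mapper and reducer above can be sketched in a few lines of Python. This is a single-reducer simulation under stated assumptions: `run_sort` is a hypothetical driver, and the framework's sort-by-key step is imitated with `list.sort`.

```python
from itertools import groupby
from operator import itemgetter


def sort_mapper(key, value):
    """(k, v) -> (v, _): promote the line contents to the key."""
    yield value, None


def sort_reducer(key, values):
    """(k', _) -> (k', ""): emit one record per incoming value."""
    for _ in values:
        yield key, ""


def run_sort(records):
    """Simulate one reducer, which receives its pairs in key order."""
    mapped = [pair for k, v in records for pair in sort_mapper(k, v)]
    mapped.sort(key=itemgetter(0))  # stands in for the framework's key sort
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        output.extend(sort_reducer(key, [v for _, v in group]))
    return output


records = [
    (("a.txt", 1), "pear"),
    (("a.txt", 2), "apple"),
    (("b.txt", 1), "fig"),
    (("b.txt", 2), "apple"),
]
sorted_records = run_sort(records)
print(sorted_records)
```

Neither function does any comparison itself; the ordering comes entirely from the key sort between the map and reduce phases. The reducer emits one record per value so that duplicate lines are preserved.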

Sort: The Trick
• (key, value) pairs from mappers are sent to a particular reducer based on hash(key)
• Must pick the hash function for your data such that k1 < k2 => hash(k1) < hash(k2)

Final Thoughts on Sort
• Used as a test of Hadoop’s raw speed
• Essentially an “IO drag race”
• Highlights utility of GFS
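The order-preserving assignment described under "Sort: The Trick" can be sketched as a range partitioner. This is an illustration, not Hadoop's own partitioner: `make_range_partitioner`, `distributed_sort`, and the split points are hypothetical. Because many keys share one reducer, the sketch relies on the weaker non-decreasing form of the condition (k1 < k2 implies partition(k1) <= partition(k2)).

```python
from bisect import bisect_right


def make_range_partitioner(split_points):
    """Build an order-preserving partition function.

    split_points must be sorted. Reducer i receives the keys k with
    split_points[i - 1] <= k < split_points[i].
    """
    def partition(key):
        return bisect_right(split_points, key)
    return partition


def distributed_sort(values, split_points):
    """Route each value to a reducer, then sort within each reducer."""
    partition = make_range_partitioner(split_points)
    reducers = [[] for _ in range(len(split_points) + 1)]
    for value in values:
        reducers[partition(value)].append(value)
    # Each reducer processes its own keys in sorted order.
    return [sorted(bucket) for bucket in reducers]


parts = distributed_sort(
    ["pear", "apple", "fig", "kiwi", "banana", "plum"],
    split_points=["c", "m"],
)
# Reading the reducer outputs in reducer order gives a global sort.
merged = [value for part in parts for value in part]
print(parts)
print(merged)
```

With an ordinary hash, each reducer's output would be sorted but the outputs could not simply be concatenated; the range partitioner is what makes reducer order match key order.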

Search: Inputs
• A set of files containing lines of text
• A search pattern to find
• Mapper key is (file name, line number)
• Mapper value is the contents of the line
• Search pattern sent as special parameter

Search Algorithm
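One way the search inputs listed above might be used is sketched below. What the mapper emits is an assumption of this example, not something the slides state: it outputs (file name, (line number, line)) for each matching line. `make_search_mapper` is a hypothetical helper, and passing the pattern through a closure stands in for the "special parameter".

```python
import re


def make_search_mapper(pattern):
    """Bind the search pattern to a mapper (the 'special parameter')."""
    regex = re.compile(pattern)

    def mapper(key, line):
        filename, line_number = key
        # Assumed output shape: one record per line containing a match.
        if regex.search(line):
            yield filename, (line_number, line)

    return mapper


records = [
    (("a.txt", 1), "the quick brown fox"),
    (("a.txt", 2), "jumps over"),
    (("b.txt", 1), "a lazy fox sleeps"),
]
mapper = make_search_mapper(r"\bfox\b")
matches = sorted(pair for key, line in records for pair in mapper(key, line))
print(matches)
```

Under this assumption the reduce side needs no logic of its own, so an identity reducer would be enough to write the matches out.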