CPS216: Advanced Database Systems (Data-intensive Computing Systems)
How MapReduce Works (in Hadoop)
Shivnath Babu
Lifecycle of a MapReduce Job
- Map function
- Reduce function
- Run this program as a MapReduce job
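The slide above names the two user-supplied pieces of a job, a map function and a reduce function. As a rough illustration only, the sketch below runs a word count in a single Python process. It does not use the Hadoop API: `map_fn`, `reduce_fn`, and `run_job` are hypothetical names, and the in-memory grouping is merely a stand-in for the shuffle and sort that a real cluster performs between the two phases.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map function: emit (word, 1) for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def reduce_fn(word, counts):
    """Reduce function: sum all counts emitted for one word."""
    yield word, sum(counts)


def run_job(records, mapper, reducer):
    """Hypothetical single-process stand-in for running a MapReduce job."""
    groups = defaultdict(list)
    # Map phase: apply the mapper to every (key, value) input record.
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    # Grouping above plus sorting keys here simulates shuffle and sort.
    output = []
    for out_key in sorted(groups):
        # Reduce phase: one reducer call per distinct key.
        output.extend(reducer(out_key, groups[out_key]))
    return output


lines = ["the quick brown fox", "the lazy dog", "The fox"]
# The input key is the line number, which this mapper ignores.
result = run_job(enumerate(lines), map_fn, reduce_fn)
print(result)
```

The mapper and reducer are written as generators because each call may emit zero, one, or many key-value pairs.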
Lifecycle of a MapReduce Job
[Figure: timeline with labels Input Splits, Map Wave 1, Map Wave 2, Reduce Wave 1, Reduce Wave 2; axis: Time]
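The timeline above shows map and reduce tasks running in waves. One way to see why waves arise: if a job has more tasks than the cluster can run at once, the tasks execute in successive batches. The sketch below is a simplified model, assuming a fixed number of task slots and tasks of equal length; `num_tasks`, `num_slots`, and both function names are hypothetical and not taken from the slides.

```python
import math


def num_waves(num_tasks, num_slots):
    """Number of waves needed when at most num_slots tasks run concurrently."""
    if num_slots <= 0:
        raise ValueError("num_slots must be positive")
    return math.ceil(num_tasks / num_slots)


def assign_waves(num_tasks, num_slots):
    """Group task ids 0..num_tasks-1 into waves, assuming equal-length tasks."""
    if num_slots <= 0:
        raise ValueError("num_slots must be positive")
    return [
        list(range(start, min(start + num_slots, num_tasks)))
        for start in range(0, num_tasks, num_slots)
    ]


# Example: 8 input splits (one map task each) on a cluster with 4 map slots.
print(num_waves(8, 4))
print(assign_waves(8, 4))
# Example: 5 tasks on 4 slots leaves the last wave mostly idle.
print(assign_waves(5, 4))
```

Real tasks rarely finish in lockstep, so actual waves overlap; the model is only meant to show how the task count and slot count together determine the number of waves.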
Components in a Hadoop MR Workflow
Next few slides are from: http://www.slideshare.net/hadoop/practical-problem-solving-with-apache-hadoop-pig
Job Submission
Initialization
Scheduling
Execution
Map Task
Sort Buffer
Reduce Tasks
Quick Overview of Other Topics (Will Revisit Them Later in the Course)
- Dealing with failures
- Hadoop Distributed FileSystem (HDFS)
- Optimizing a MapReduce job
Dealing with Failures and Slow Tasks
- What to do when a task fails?
  - Try again (retries possible because of idempotence)
  - Try again somewhere else
  - Report failure
- What about slow tasks: stragglers
  - Run another version of the same task in parallel. Take
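The failure and straggler handling listed above can be sketched in a few lines. This is an illustrative model using Python threads, not Hadoop code: `run_with_retries`, `run_speculatively`, `TaskFailed`, and the node names are all hypothetical. The slide text is cut off after "Take", so the choice here to use whichever copy finishes first is an assumption based on how speculative execution is commonly described, not something the slide confirms.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


class TaskFailed(Exception):
    """Raised when every attempt at a task has failed."""


def run_with_retries(task, nodes, max_attempts=4):
    """Retry an idempotent task, moving to another node after each failure."""
    errors = []
    for attempt in range(max_attempts):
        node = nodes[attempt % len(nodes)]  # try again somewhere else
        try:
            return task(node)
        except Exception as exc:  # any error counts as a failed attempt
            errors.append((node, exc))
    raise TaskFailed(errors)  # report failure


def run_speculatively(task, nodes):
    """Run one copy of the task per node; return the first result to finish.

    Assumption: the first copy to complete wins. This sketch still waits for
    the slower copies when the pool shuts down instead of killing them.
    """
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = [pool.submit(task, node) for node in nodes]
        done, _pending = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()


def flaky_task(node):
    """Hypothetical task that always fails on node-a."""
    if node == "node-a":
        raise RuntimeError("disk error")
    return f"output from {node}"


def slow_on_b(node):
    """Hypothetical task that straggles on node-b."""
    time.sleep(0.5 if node == "node-b" else 0.01)
    return node


first = run_with_retries(flaky_task, ["node-a", "node-b"])
winner = run_speculatively(slow_on_b, ["node-b", "node-c"])
print(first, winner)
```

Both helpers rely on the property the slide highlights: because a task is idempotent, running it again, or running two copies at once, does not change the final result.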