starfish-cps216-sep

Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu
Duke University

Analysis in the Big Data Era
  Massive Data -> Data Analysis -> Insight
  Key to Success = Timely and Cost-Effective Analysis
  9/26/2011, Starfish
Hadoop MapReduce Ecosystem
  Popular solution to Big Data Analytics
  Diagram labels: Java / C++ / R / Python, Oozie, Hive, Pig, Elastic MapReduce, Jaql, HBase; Hadoop (MapReduce Execution Engine, Distributed File System)

Practitioners of Big Data Analytics
  - Who are the users?
      - Data analysts, statisticians, computational scientists…
      - Researchers, developers, testers…
      - You!
  - Who performs setup and tuning?
      - The users!
      - Usually lack expertise to tune the system
Tuning Challenges
  - Heavy use of programming languages for MapReduce programs (e.g., Java/Python)
  - Data loaded/accessed as opaque files
  - Large space of tuning choices
  - Elasticity is wonderful, but hard to achieve (Hadoop has many useful mechanisms, but policies are lacking)
  - Terabyte-scale data cycles

Our goal: Provide good performance automatically
  Starfish: Self-tuning System
  Diagram labels: Java / C++ / R / Python, Oozie, Hive, Pig, Elastic MapReduce, Jaql, HBase; Hadoop (MapReduce Execution Engine, Distributed File System); Starfish; Analytics System
What are the Tuning Problems?
  - Job-level MapReduce configuration
  - Workload management
  - Data layout tuning
  - Cluster sizing
  - Workflow optimization
  Diagram labels: J1, J2, J3, J4

Starfish’s Core Approach to Tuning
  1) if Δ(conf. parameters) then what …?
  2) if Δ(data properties) then what …?
  3) if Δ(cluster properties) then what …?
  - Profiler: Collects concise summaries of execution
  - What-if Engine: Estimates impact of hypothetical changes on execution
  - Optimizers: Search through space of tuning choices (Job, Workflow, Workload, Data layout, Cluster)
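The three what-if questions can be read as calls to a function that takes a job profile plus a hypothetical change and returns an estimated cost, which an optimizer then minimizes. The sketch below is illustrative only: the `Profile` fields, the wave-based cost formula in `estimate_runtime`, and the candidate values are assumptions invented for this example, not Starfish's actual models or API.

```python
from dataclasses import dataclass, replace
from itertools import product
from math import ceil


@dataclass(frozen=True)
class Profile:
    """Hypothetical summary of one profiled job run (all fields are assumptions)."""
    input_mb: float           # data property
    map_slots: int            # cluster property
    split_mb: float           # configuration parameter
    secs_per_mb: float        # measured map cost per MB
    task_startup_secs: float  # fixed overhead per map task


def estimate_runtime(p: Profile) -> float:
    """Toy cost model: map tasks run in waves; each wave costs one task's time."""
    tasks = ceil(p.input_mb / p.split_mb)
    waves = ceil(tasks / p.map_slots)
    return waves * (p.task_startup_secs + p.split_mb * p.secs_per_mb)


def what_if(p: Profile, **changes) -> float:
    """Estimate runtime under hypothetical changes, without running the job."""
    return estimate_runtime(replace(p, **changes))


def best_config(p: Profile, split_sizes, slot_counts):
    """Toy optimizer: exhaustive search over the candidate tuning choices."""
    return min(product(split_sizes, slot_counts),
               key=lambda c: what_if(p, split_mb=c[0], map_slots=c[1]))


base = Profile(input_mb=1024, map_slots=4, split_mb=64,
               secs_per_mb=0.5, task_startup_secs=2)

print(what_if(base, split_mb=128))   # 1) change a configuration parameter
print(what_if(base, input_mb=2048))  # 2) change a data property
print(what_if(base, map_slots=8))    # 3) change a cluster property
print(best_config(base, [64, 128, 256], [4, 8]))
```

Exhaustive search is used here only because the toy space has six points; a real tuning space is far larger and would need a smarter search strategy.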
Starfish Architecture
  - Profiler, What-if Engine
  - Workflow Optimizer, Workload Optimizer, Elastisizer, Job Optimizer
  - Data Manager: Metadata Mgr., Intermediate Data Mgr., Data Layout & Storage Mgr.

MapReduce Job Execution
  Diagram: splits 0-3 each feed a map task (Two Map Waves); two reduce tasks produce out 0 and out 1 (One Reduce Wave)
  job j = <program p, data d, resources r, configuration c>
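The wave counts in the diagram follow from dividing the number of tasks by the number of task slots that can run at once and rounding up. A minimal sketch, assuming two map slots and two reduce slots (the slide does not state slot counts; these values are chosen to reproduce its two map waves and one reduce wave), with `Job` and `waves` as hypothetical names:

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Job:
    """j = <p, d, r, c> from the slide; the field contents are illustrative assumptions."""
    program: str         # p
    data_splits: int     # d, reduced here to a count of input splits
    resources: dict      # r, here just the concurrent task slots
    configuration: dict  # c, here just the number of reduce tasks


def waves(num_tasks: int, slots: int) -> int:
    """Rounds of execution needed when at most `slots` tasks run at once."""
    return ceil(num_tasks / slots)


j = Job(program="example", data_splits=4,
        resources={"map_slots": 2, "reduce_slots": 2},  # assumed slot counts
        configuration={"reduce_tasks": 2})

map_waves = waves(j.data_splits, j.resources["map_slots"])
reduce_waves = waves(j.configuration["reduce_tasks"], j.resources["reduce_slots"])
print(map_waves, reduce_waves)  # prints "2 1"
```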
What Controls MR Job Execution?

This document was uploaded on 01/17/2012.