

... each type) to concurrently run its tasks.
... for a task. A task is chosen by the scheduler and assigned to this slave.

- Rank the tasks based on their estimated finish time.
- Schedule the tasks on fast nodes.

9/17/13

Hadoop's Scheduler

- The tasks considered for assignment are ordered as follows:
  1) Failed tasks (highest priority)
  2) Non-running tasks
     - For map tasks -> choose first the tasks whose data is local at the requesting slave.
  3) Speculative tasks
     - They are all equal, but data locality is also considered here for assignment.

Speculative Tasks Identification

- Every task has a progress score in the range [0,1].
- Map task:
  - Progress score = the fraction of input data read
- Reduce task:
  - The task is divided into 3 phases:
    - Copy phase: map outputs are copied to local storage.
    - Sort phase: grouping by key.
    - Reduce phase: executing the reduce function.
  - Each phase accounts for 1/3 of the progress score, and each phase's score is the fraction of data processed.
- Example:
  - If the reduce task has performed half of the copy phase:
    Progress score = (1/2)(1/3) = 1/6
  - If the reduce task is in the middle of the reduce phase:
    Progress score = (1/3) + (1/3) + (1/2)(1/3) = 5/6
- The average progress score for each task type is computed.
- Threshold = average progress score - 0.2
- A task is marked as a straggler and considered for speculative execution if:
  - It has been running for more than 1 minute.
  - Its progress score is less than the threshold.

Hadoop's Scheduler Assumptions

1. The tasks' progress rates at each slave are roughly the same.
2. A task's progress rate is constant.
3. Performing speculative tasks in idle slots of slaves does not incur additional cost.
4. Each phase in the reduce task represents 1/3 of its total time.
5. A straggler's likelihood depends on its progress score.

Assumptions Invalidated

- These assumptions don't hold in a virtualized data center -> heterogeneous environment.
- Not all slaves are equal:
  - The difference could be in the physical machines or the competing co-allocated VMs.
- This invalidates the first two assumptions: progress rates are not equal.
- Fixed threshold -> too many speculative tasks.
- All speculative tasks are equal -> incorrect; they vary in progress achieved and progress rate.
- The other 3 assumptions are invalid in both homogeneous and heterogeneous clusters.
- The assumption that each phase in the reduce task represents 1/3 of its total time:
  - The copy phase is the slowest.
  - After the copy phase, the task's progress score changes ...
- The assumption of no additional cost for performing speculative tasks on idle slots of slaves does not hold when resources are shared:...
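
The progress-score and straggler rules listed under Speculative Tasks Identification can be illustrated with a small Python sketch. This is not Hadoop's code: the names `reduce_progress`, `Task`, and `find_stragglers` are hypothetical, and the only values taken from the slides are the three equally weighted reduce phases, the 0.2 margin below the average, and the 1-minute minimum running time.

```python
from dataclasses import dataclass
from statistics import mean

# Reduce phases in execution order; each contributes 1/3 of the score.
REDUCE_PHASES = ("copy", "sort", "reduce")


def map_progress(fraction_read: float) -> float:
    """Map task score: the fraction of input data read."""
    return fraction_read


def reduce_progress(phase: str, fraction_done: float) -> float:
    """Reduce task score: finished phases count 1/3 each, plus 1/3 of
    the fraction of data processed in the current phase."""
    completed_phases = REDUCE_PHASES.index(phase)
    return (completed_phases + fraction_done) / 3


@dataclass
class Task:
    task_id: str            # hypothetical identifier, for illustration
    progress: float         # progress score in [0, 1]
    running_seconds: float  # how long the task has been running


def find_stragglers(tasks, margin=0.2, min_runtime=60.0):
    """Return ids of tasks marked as stragglers.

    `tasks` is assumed to hold tasks of a single type (all map or all
    reduce), because the average is computed per task type.
    """
    if not tasks:
        return []
    threshold = mean(t.progress for t in tasks) - margin
    return [
        t.task_id
        for t in tasks
        if t.running_seconds > min_runtime and t.progress < threshold
    ]
```

With these definitions, `reduce_progress("copy", 0.5)` evaluates to 1/6 and `reduce_progress("reduce", 0.5)` to 5/6 (up to floating-point rounding), matching the two worked examples. Because the threshold is a fixed offset from the average, the sketch also shows why a heterogeneous cluster, where progress scores spread widely, can mark many tasks as stragglers at once.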