...from 1/3 to 1 quickly → increases the average progress score.
- E.g., network and disk I/O.
- Needless speculation can reduce throughput.
- Also not true in "pay-as-you-go" settings.

Assumptions Invalidated
- Example: 30% of reducers finish, while the others are still in the copy phase.
- Average progress score = (0.3)(1) + (0.7)(1/3) ≈ 53%.
- Before any reducer finishes, the average is about 33%, so the threshold is 33 - 20 = 13%. Afterwards it jumps to 53 - 20 = 33% → all the remaining 70% of the reducers are considered for speculation → task slots are filled from an arbitrary selection.
- The average progress score is at most 100% → the threshold is at most 80% → tasks with a progress score above 80% can never be considered for speculation.
- True stragglers may not be chosen.
- The network is overloaded with unnecessary copying.

Assumptions Invalidated
- Invalid assumption: a task's likelihood of being a straggler depends on its progress score.
  - New, fast tasks can be chosen over old, slow tasks.
  - The scheduler should also consider the progress rate.

LATE Scheduler
- Goal: reduce the MapReduce (MR) job response time.
- Approach:
  1. Prioritize speculative tasks by their estimated finish time.
  2. Schedule the speculative tasks on fast nodes (avoid stragglers).
- Estimating a task's finish time:
  - Progress rate = progressScore / T, where T is how long the task has been running.
  - Remaining time = (1 - progressScore) / progressRate.
  - Assumption: the task's progress rate is constant (this may not hold).

LATE Scheduler
- Determining which nodes are slow:
  - A node's total performed work = sum of the progress scores of both its completed and in-progress tasks.
  - If the total < slowNodeThreshold → the node is a straggler.
- To reduce resource cost:
  - speculativeCap: a cap on the number of speculative tasks running at a time.
  - Use slowTaskThreshold as an additional condition for running speculative tasks.
  - If only fast tasks are running → there is no need to speculate them.
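The arithmetic on the "Assumptions Invalidated" slides and the LATE finish-time estimate can be checked with a short Python sketch. It assumes 10 reducers (the slides give only percentages), credits copy-phase reducers with exactly 1/3 as the slide's formula does, and applies the rule implied by the slides: a task is a candidate when its score is more than 20 points below the average. The function names and the two task timings are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Rule implied by the slides: candidate if score < average progress - 20%.
GAP = Fraction(1, 5)


def average_progress(scores):
    """Mean progress score, kept exact with Fractions."""
    return sum(scores, Fraction(0)) / len(scores)


def hadoop_threshold(scores):
    """Scores below this value are considered for speculation."""
    return average_progress(scores) - GAP


def late_remaining_time(progress_score, elapsed):
    """LATE estimate; assumes the progress rate stays constant."""
    rate = progress_score / elapsed        # progressRate = progressScore / T
    return (1 - progress_score) / rate     # (1 - progressScore) / progressRate


# Slide example (hypothetical 10 reducers): 30% finished, 70% credited with 1/3.
scores = [Fraction(1)] * 3 + [Fraction(1, 3)] * 7
avg = average_progress(scores)
print(f"average = {float(avg):.1%}, threshold = {float(hadoop_threshold(scores)):.1%}")
# average = 53.3%, threshold = 33.3%

# The average can never exceed 1, so the threshold can never exceed 80%.
print(hadoop_threshold([Fraction(1)] * 10))  # 4/5

# Hypothetical old, slow task vs. new, fast task: (progress score, seconds running).
tasks = {"old_slow": (Fraction(1, 2), 500), "new_fast": (Fraction(1, 5), 20)}
by_score = min(tasks, key=lambda k: tasks[k][0])                    # lowest progress score
by_late = max(tasks, key=lambda k: late_remaining_time(*tasks[k]))  # longest time left
print(by_score, by_late)  # new_fast old_slow
```

A reducer that is still copying sits just below 1/3, so every such reducer falls under the new threshold of about 33.3%. The last lines show the invalidated assumption: ranking by progress score alone prefers the new, fast task (500 s vs. 80 s estimated remaining), whereas ranking by LATE's estimated remaining time picks the old, slow one.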
9/17/13

LATE Scheduler
- Algorithm: when a node asks for a task,
  - if the number of running speculative tasks < speculativeCap, and
  - if the node's total progress > slowNodeThreshold:
    - Rank the running tasks that are not currently being speculated and have run for more than 1 minute by estimated remaining time.
    - Choose the highest-ranked task with progress rate < slowTaskThreshold, and send a speculative copy of it to the asking node.
- Note: unlike Hadoop's scheduler, it does not consider data locality.

LATE Scheduler
- Setting parameters (chosen in practice):
  - speculativeCap = 10% of available task slots.
  - slowNodeThreshold = 25th percentile of node progress.
  - slowTaskThreshold = 25th percentile of task progress rates.

LATE Scheduler
- Some incorrect estimated times:
  - The estimated remaining time for a task was based on the progress rate: (1 - progressScore) / progressRate.
  - This assumed a constant progress rate per task, whic...

Incorrect Estimated Times - Example
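The algorithm and parameter settings on the LATE Scheduler slides can be sketched in Python as below. This is a simplified model, not the LATE implementation: the `Task` class, `pick_speculative_task`, `node_totals`, and the cluster state are hypothetical; the 25th percentile uses `statistics.quantiles(..., method="inclusive")`, which is only one of several percentile conventions; and, as the slides note, data locality is ignored.

```python
import statistics
from dataclasses import dataclass

SPECULATIVE_CAP_FRACTION = 0.10  # speculativeCap = 10% of available task slots
MIN_RUNTIME_S = 60               # only rank tasks that have run for more than 1 minute


@dataclass
class Task:
    task_id: str
    progress_score: float  # in [0, 1]
    elapsed_s: float       # T: seconds the task has been running
    speculated: bool = False

    @property
    def progress_rate(self) -> float:
        return self.progress_score / self.elapsed_s

    @property
    def remaining_s(self) -> float:
        # Assumes a constant progress rate, which the slides warn may not hold.
        return (1 - self.progress_score) / self.progress_rate


def p25(values) -> float:
    """25th percentile by linear interpolation (an assumed convention)."""
    return statistics.quantiles(values, n=4, method="inclusive")[0]


def node_totals(scores_by_node):
    """Work per node: progress scores of completed (1.0) and running tasks."""
    return {node: sum(scores) for node, scores in scores_by_node.items()}


def pick_speculative_task(node, totals, running, n_speculative, total_slots):
    """Return the task to speculate on `node`, or None if LATE would decline."""
    # 1. Respect speculativeCap.
    if n_speculative >= SPECULATIVE_CAP_FRACTION * total_slots:
        return None
    # 2. Only nodes above slowNodeThreshold receive speculative work.
    if totals[node] <= p25(list(totals.values())):
        return None
    # 3. Rank eligible tasks by estimated remaining time, longest first.
    slow_task_threshold = p25([t.progress_rate for t in running])
    ranked = sorted(
        (t for t in running if not t.speculated and t.elapsed_s > MIN_RUNTIME_S),
        key=lambda t: t.remaining_s,
        reverse=True,
    )
    # 4. Highest-ranked task whose progress rate is below slowTaskThreshold.
    for task in ranked:
        if task.progress_rate < slow_task_threshold:
            return task
    return None


# Hypothetical cluster state: 4 nodes, 20 task slots, no speculative tasks yet.
totals = node_totals({
    "n1": [1.0] * 5,
    "n2": [1.0] * 4,
    "n3": [1.0, 1.0, 0.5, 0.5],
    "n4": [0.5, 0.5],
})
running = [
    Task("t1", 0.9, 100),
    Task("t2", 0.5, 500),
    Task("t3", 0.2, 400),
    Task("t4", 0.6, 120),
    Task("t5", 0.05, 30),  # younger than 1 minute: never ranked
]
chosen = pick_speculative_task("n1", totals, running, n_speculative=0, total_slots=20)
print(chosen.task_id if chosen else None)  # t3
```

Here t3 has the longest estimated remaining time (about 1600 s) and a progress rate below the 25th percentile, so it is chosen. Node n4 (total work 1.0) is at or below the slowNodeThreshold of 2.5 and would receive no speculative task, and t5 is skipped because it has run for less than a minute.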
- Fall '13