…oup.
• The VMs within a group can exist in different priority regions of the run queue.
  → The scheduler must check all regions for VMs within the group.

9/17/13

Three Approaches to Improve the Performance
[Figure: an MR job is split into tasks that run on nodes; the nodes are virtualized machines hosted on physical machines. Three approaches target these layers: LATE (speculative task scheduler), P2P-MR (architecture change), and MRG (VM scheduler).]

Third Approach: P2P-MapReduce, by Marozzo et al.
• A peer-to-peer architecture of MR nodes.
• Inefficiency addressed: master failure.
• Goal: minimize the computing time lost to master failure.

System Model in General
• The system is composed of a set of peer nodes: some of them are assigned the master role, and the rest are assigned the slave role.
• The coordinator can change a node's role to maintain the master/slave ratio, but only while that node is idle.
• Each job has a primary master, which keeps the job state (e.g., the task assignments to slaves and the status of each task), and a set of backup masters, which are updated with a replica of this job state.
• When the primary master fails, one of its backup masters takes its place.

System Architecture (more details and data structures in the paper)
[Figure: components Node, Master, Slave, Primary Master, Backup Master, Coordinator, User, Job, and Tasks.]

Example
[Figure: a Discovery Service; Nodes 1–3 are masters, each with Primary, Backup, and Coordinator role slots; Nodes 4–11 are slaves running tasks; Users 1 and 2 submit Jobs 1–3, with tasks T1–T5.]

Job Submission: Master Choice
(1) User 1 asks the Discovery Service for masters.
(2) The Discovery Service returns the available masters.
(3) The user chooses the one with the lowest load.
(4) The job is assigned to the chosen master.

Backup and Task Assignment
(1)–(2) The Discovery Service returns the available masters and slaves.
(3) Choose B backup nodes and S slaves (the ones with the lowest workload).
(4) Assign the backup job to the backup nodes and the tasks to the slaves.
(5) Send update messages.
[Figure: Job 1's tasks T1–T5 are spread over slaves Node4, Node7, Node9, and Node11; Node12 and Node13 are masters.]

Master Failure Recovery
(1)–(2) N2 and N3 detect N1's failure → they start an election for the new primary.
(4) Choose another backup node and assign it the backup job.
• In the event of a backup master failure or a slave failure → the primary master looks for a replacement.

New Node Election Procedure
• For a new primary (among the backup nodes) or a new coordinator (among all nodes):
  … (2) wait … Node i: …
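The master-choice and backup/task-assignment steps both come down to "ask the Discovery Service for candidates, then take the least-loaded ones." The Python sketch below illustrates only that selection rule. `DiscoveryService`, `NodeInfo`, the integer `load` metric, and the tie-break by name are hypothetical stand-ins, not the data structures of P2P-MapReduce.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NodeInfo:
    """Hypothetical registry entry: node name, current role, and load."""
    name: str
    role: str  # "master" or "slave"
    load: int  # assumed load metric, e.g. number of jobs or tasks handled


class DiscoveryService:
    """Toy stand-in for the discovery service: a registry of live nodes."""

    def __init__(self, nodes: List[NodeInfo]) -> None:
        self.nodes = list(nodes)

    def available(self, role: str) -> List[NodeInfo]:
        return [n for n in self.nodes if n.role == role]


def lowest_load(candidates: List[NodeInfo], k: int) -> List[NodeInfo]:
    """Return the k least-loaded candidates (ties broken by name)."""
    if len(candidates) < k:
        raise ValueError(f"need {k} nodes, only {len(candidates)} available")
    return sorted(candidates, key=lambda n: (n.load, n.name))[:k]


def choose_master(ds: DiscoveryService) -> NodeInfo:
    # Job submission: ask for the available masters, pick the least loaded.
    return lowest_load(ds.available("master"), 1)[0]


def choose_backups_and_slaves(
    ds: DiscoveryService, primary: NodeInfo, b: int, s: int
) -> Tuple[List[NodeInfo], List[NodeInfo]]:
    # Backup and task assignment: B backups from the other masters and
    # S slaves, each group chosen by lowest workload.
    other_masters = [m for m in ds.available("master") if m.name != primary.name]
    return lowest_load(other_masters, b), lowest_load(ds.available("slave"), s)


if __name__ == "__main__":
    ds = DiscoveryService([
        NodeInfo("Node1", "master", 3),
        NodeInfo("Node2", "master", 1),
        NodeInfo("Node3", "master", 2),
        NodeInfo("Node4", "slave", 0),
        NodeInfo("Node5", "slave", 5),
        NodeInfo("Node6", "slave", 1),
    ])
    primary = choose_master(ds)
    backups, slaves = choose_backups_and_slaves(ds, primary, b=1, s=2)
    print(primary.name)               # Node2
    print([n.name for n in backups])  # ['Node3']
    print([n.name for n in slaves])   # ['Node4', 'Node6']
```

Sorting the whole candidate list is fine at this scale; a real system with many nodes might use a heap or keep the registry ordered by load.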
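The primary/backup scheme and master failure recovery can be sketched as follows. This assumes a simplified model in which the primary pushes a full copy of the job state to every live backup after each change; the slides say only that backups are "updated with a replica." Because the election procedure is cut off above, the promotion rule used here (lowest `node_id` among live backups) is an assumption. `JobState`, `MasterNode`, `Job`, and `fail_over` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobState:
    """Hypothetical job state: which slave runs each task, and its status."""
    job_id: str
    task_assignment: Dict[str, str] = field(default_factory=dict)  # task -> slave
    task_status: Dict[str, str] = field(default_factory=dict)      # task -> status

    def copy(self) -> "JobState":
        return JobState(self.job_id, dict(self.task_assignment), dict(self.task_status))


@dataclass(eq=False)  # compare masters by identity
class MasterNode:
    node_id: int
    alive: bool = True
    replicas: Dict[str, JobState] = field(default_factory=dict)  # job_id -> replica


class Job:
    def __init__(self, job_id: str, primary: MasterNode, backups: List[MasterNode]) -> None:
        self.state = JobState(job_id)
        self.primary = primary
        self.backups = list(backups)
        self._replicate()

    def _replicate(self) -> None:
        # Simplification: push a full copy of the state to every live backup
        # after each change (not necessarily how P2P-MapReduce sends updates).
        for backup in self.backups:
            if backup.alive:
                backup.replicas[self.state.job_id] = self.state.copy()

    def assign_task(self, task: str, slave: str) -> None:
        self.state.task_assignment[task] = slave
        self.state.task_status[task] = "running"
        self._replicate()

    def complete_task(self, task: str) -> None:
        self.state.task_status[task] = "done"
        self._replicate()

    def fail_over(self) -> MasterNode:
        # Primary has failed: promote a live backup that holds a replica.
        # Lowest node_id wins, an assumed rule standing in for the election.
        job_id = self.state.job_id
        live = [b for b in self.backups if b.alive and job_id in b.replicas]
        if not live:
            raise RuntimeError("no live backup master for job " + job_id)
        new_primary = min(live, key=lambda b: b.node_id)
        self.backups.remove(new_primary)
        # Resume from the replica the promoted backup already holds.
        self.state = new_primary.replicas[job_id].copy()
        self.primary = new_primary
        return new_primary


if __name__ == "__main__":
    n1, n2, n3 = MasterNode(1), MasterNode(2), MasterNode(3)
    job = Job("job1", primary=n1, backups=[n2, n3])
    job.assign_task("T1", "Node4")
    job.complete_task("T1")
    job.assign_task("T2", "Node7")
    n1.alive = False                # N1 fails
    print(job.fail_over().node_id)  # 2
    print(job.state.task_status)    # {'T1': 'done', 'T2': 'running'}
```

After promotion, a real system would also choose a replacement backup, as in step (4) of the failure-recovery slide; the sketch leaves that step out.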
- Fall '13