Moreover, each master node can act as a backup node for other master nodes. A user node can submit a job to one of the master nodes, which will manage it as usual in MapReduce. That master dynamically replicates the entire job state (i.e., the assignments of tasks to nodes, the locations of intermediate results, etc.) on its backup nodes. If those backup nodes detect the failure of the master, they elect a new master among themselves, which continues managing the job computation using its local replica of the job state. The remainder of this section describes the architecture of the P2P-MapReduce framework, its current implementation, and a preliminary evaluation of its performance.

7.3.1 Architecture

The P2P-MapReduce architecture includes three basic roles, shown in Fig. 7.2: user (U), master (M), and slave (S). Master nodes and slave nodes form two logical P2P networks called M-net and S-net, respectively. As mentioned earlier, computing nodes are dynamically assigned the master or slave role, and hence M-net and
S-net change their composition over time. The mechanisms used for maintaining this infrastructure are discussed in Section 3.2. In the following, we describe, through an example, how a master failure is handled in the P2P-MapReduce architecture. We assume the initial configuration represented in Fig. 7.2, where U is the user node that submits a MapReduce job, nodes M are the masters, and nodes S are the slaves. The following steps are performed to submit the job and recover from a master failure (see Fig. 7.3):

1. U queries M-net to get the list of available masters, each characterized by a workload index that measures how busy the node is. U orders the list by ascending workload index and takes the first element as the primary master. In this example, the chosen primary master is M1; thus, U submits the MapReduce job to M1.
2. M1 chooses k masters for the backup role. In this example, assuming that k = 2, M1 chooses M2 and M3 for this role. Thus, M1 notifies M2 and M3 that they will act as backup nodes for the current job (in Fig. 7.3, the superscript "B" on nodes M2 and M3 indicates the backup function). This implies that whenever the job state changes, M1 backs it up on M2 and M3, which in turn periodically check whether M1 is alive.
3. M1 queries S-net to get the list of available slaves, choosing (part of) them to execute a map or a reduce task. As for the masters, the slave nodes are chosen on the basis of a workload index. In this example, nodes S1, S3, and S4 are selected as slaves. The tasks are started on the slave nodes and managed as usual in MapReduce.
4. The primary master M1 fails. Backup masters M2 and M3 detect the failure of M1 and start a distributed procedure to elect a new primary master among themselves.
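Step 1 above amounts to ranking the available masters by their workload index and taking the least busy one. The following Python sketch illustrates this selection; the `MasterInfo` type, the field names, and the example workload values are illustrative assumptions, not part of the framework's actual API.

```python
from dataclasses import dataclass

@dataclass
class MasterInfo:
    node_id: str          # hypothetical field name, for illustration only
    workload_index: float # higher value = busier node

def choose_primary_master(masters):
    """Order candidate masters by ascending workload index and
    return the least busy one, as in step 1 of the example."""
    ranked = sorted(masters, key=lambda m: m.workload_index)
    return ranked[0]

# Example: M1 is the least loaded master, so it becomes the primary.
masters = [MasterInfo("M1", 0.2), MasterInfo("M2", 0.7), MasterInfo("M3", 0.5)]
primary = choose_primary_master(masters)
print(primary.node_id)  # → M1
```

The same ranking is reused in step 3 to pick slave nodes from S-net, so in practice a single selection routine can serve both roles.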
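Steps 2 and 4 can be sketched as follows. The class below is a minimal illustration of a backup master that keeps a replica of the job state, checks the primary's liveness against a heartbeat timeout, and runs a deterministic election among the surviving backups. All names are hypothetical, and the election rule (the backup with the smallest node id wins) is an assumption for illustration; the text only states that the backups run a distributed election among themselves.

```python
import time

class BackupMaster:
    """Illustrative sketch of a backup master node (not the actual
    P2P-MapReduce implementation)."""

    def __init__(self, node_id, peers):
        self.node_id = node_id
        self.peers = set(peers)  # ids of the other backup masters
        self.job_state = {}      # local replica of the primary's job state

    def on_state_update(self, state):
        # Called whenever the primary replicates a job-state change
        # (task assignments, intermediate-result locations, etc.).
        self.job_state = dict(state)

    def primary_alive(self, last_heartbeat, timeout=5.0, now=None):
        # The primary is considered failed if no heartbeat arrived
        # within the timeout window.
        now = time.time() if now is None else now
        return (now - last_heartbeat) <= timeout

    def elect_new_primary(self):
        # Deterministic election among the surviving backups: every
        # backup computes the same winner, the smallest node id.
        return min(self.peers | {self.node_id})

# Both backups independently agree that M2 becomes the new primary.
b2 = BackupMaster("M2", peers={"M3"})
b3 = BackupMaster("M3", peers={"M2"})
print(b2.elect_new_primary())  # → M2
```

Because each backup already holds a replica of the full job state, the elected node can resume managing the job without recontacting the slaves for lost bookkeeping.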