Efficient Allgather for Regular SMP-Clusters

Jesper Larsson Träff
C&C Research Laboratories, NEC Europe Ltd.
Rathausallee 10, D-53757 Sankt Augustin, Germany
[email protected]

Abstract. We show how to adapt and extend a well-known allgather (all-to-all broadcast) algorithm to parallel systems with a hierarchical communication system, such as clusters of SMP nodes. For small problem sizes, the new algorithm requires a logarithmic number of communication rounds in the number of SMP nodes, and gracefully degrades towards a linear algorithm as the problem size increases. The algorithm has been used to implement the MPI Allgather collective operation in the MPI/SX library. Performance measurements on a 72-node SX-8 system show that graceful degradation provides a smooth transition from logarithmic to linear behavior, and that the new algorithm significantly outperforms a standard, linear algorithm. The performance of the latter is furthermore highly sensitive to the distribution of MPI processes over the physical processors.

1 Introduction

An important and well-studied collective communication primitive for message-passing systems is the allgather, or all-to-all broadcast, operation, in which each processor has data that must be distributed (i.e. broadcast) to all other processors. This primitive has been studied extensively in a variety of settings and, correspondingly, is also known as (for instance) total exchange [5,4], catenation, and gossip. We use the term allgather here.

The allgather primitive is incorporated as a collective communication operation in the Message-Passing Interface (MPI) standard in two flavors. The MPI Allgather collective is regular in the sense that the size of the data to be broadcast by each MPI process must be the same for all processes. The more general, irregular MPI Allgatherv collective does not have this restriction: each process may contribute data of a different size.
A peculiarity of both MPI primitives, however, is that the size of the data contributed by each process is known by all processes in advance.

There has recently been much interest in improving the collective operations in various MPI libraries; see for instance [1,10,12] (and the references therein). Various allgather algorithms for MPI have been discussed and evaluated. However, the collective operations in many MPI libraries are not adapted to systems with hybrid, hierarchical communication systems such as clusters of SMP nodes (see [7,8,10] for exceptions).

B. Mohr et al. (Eds.): PVM/MPI 2006, LNCS 4192, pp. 58–65, 2006. (c) Springer-Verlag Berlin Heidelberg 2006

In this paper we give an improvement to a well-known allgather algorithm that makes it suitable for the SMP case. In this context an SMP cluster is simply a collection of multi-processor nodes interconnected by a communication network. Communication between processors on the same node (typically via shared memory) is assumed to be faster (lower latency, higher bandwidth) than...