Performance Evaluation of Allgather Algorithms on Terascale Linux Cluster with Fast Ethernet

Jing Chen (2, 1), jingchen@mail.rdcps.ac.cn
Yunquan Zhang (3, 2), zyq@mail.rdcps.ac.cn
Linbo Zhang (4), zlb@lsec.cc.ac.cn
Wei Yuan (3, 2), yw@mail.rdcps.ac.cn

(1) Department of Computer Science, University of Science and Technology of China
(2) Lab. of Parallel Computing, Institute of Software, CAS
(3) State Key Laboratory of Computer Science, China
(4) Academy of Mathematics and Systems Sciences, CAS

This work was supported in part by the National 863 Plan of China under contract No. 2004AA104020, National Natural Science Foundation of China No. 60303020, and the 973 Program, G1999032805 and 2005CB321702.

Abstract

We report our work on evaluating the performance of several MPI Allgather algorithms on Fast Ethernet. These algorithms are ring, recursive doubling, Bruck, and neighbor exchange. The first three algorithms are widely used today. The neighbor exchange algorithm, which was recently proposed by the authors, incorporates pair-wise exchange and is expected to perform better with certain configurations, mainly when using TCP/IP over Ethernet. We tested the four algorithms on the terascale Linux clusters DeepComp 6800 and DAWNING 4000A using TCP/IP over Fast Ethernet. Results show that our neighbor exchange algorithm performs best for long messages, the ring algorithm performs best for medium-size messages, and the recursive doubling algorithm performs best for short messages.

1. Introduction

High performance computing has undergone rapid change in the last decades. Today, Linux clusters are becoming more and more popular, and thousands of processors can be incorporated into one parallel machine. These processors may be connected by expensive high-speed interconnects such as Myrinet or QsNET to achieve high performance, or by cheap, slow commodity Fast Ethernet or Gigabit Ethernet to lower the cost of a cluster. For practical purposes, most clusters use both kinds of interconnection.
On such large-scale clusters, collective communications involving a large number of processors can become a performance bottleneck, so optimizing their performance is worthwhile. Through experiments, we found that different algorithms can behave very differently on different networks; for example, an algorithm that is optimal for Myrinet may be inferior for Fast Ethernet. In this study, we carried out extensive experiments to find the best allgather algorithm for Fast Ethernet.

MPI_Allgather is one of the most frequently used collective communications in the MPI library. In this function, each process sends a portion of data to all other processes. Three algorithms are used for Allgather in the latest versions of MPICH (1.2.6): the ring, the recursive doubling, and the Bruck algorithms. In the interest of minimizing TCP traffic over Fast Ethernet, we developed a new algorithm called the neighbor exchange allgather algorithm [1]. We tested these four algorithms on DeepComp 6800 and DAWNING 4000A using Fast Ethernet, and this paper reports the performance results.
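To make the allgather operation concrete, the following is a minimal sketch that simulates two of the algorithms named above, ring and recursive doubling, in plain Python with no MPI library. It is an illustration under stated assumptions, not the MPICH implementation or the authors' test code: the function names and the list-of-lists representation of process buffers are hypothetical, and the recursive doubling sketch assumes the process count is a power of two. The neighbor exchange and Bruck algorithms are not shown because this section does not describe them in detail.

```python
def _initial_buffers(blocks):
    """Each simulated rank r starts with only its own block in slot r."""
    p = len(blocks)
    return [[blocks[r] if i == r else None for i in range(p)] for r in range(p)]


def ring_allgather(blocks):
    """Simulate a ring allgather (illustrative; names are hypothetical).

    In step s, rank r forwards block (r - s) mod p to rank (r + 1) mod p,
    so every rank holds all p blocks after p - 1 steps.
    """
    p = len(blocks)
    gathered = _initial_buffers(blocks)
    steps = 0
    for step in range(p - 1):
        # Collect all sends first so the step behaves as a simultaneous exchange.
        sends = []
        for r in range(p):
            idx = (r - step) % p
            sends.append(((r + 1) % p, idx, gathered[r][idx]))
        for dest, idx, data in sends:
            gathered[dest][idx] = data
        steps += 1
    return gathered, steps


def recursive_doubling_allgather(blocks):
    """Simulate a recursive doubling allgather (assumes p is a power of two).

    In each step, rank r exchanges everything it holds with rank r XOR distance,
    and the distance doubles, so all blocks arrive after log2(p) steps.
    """
    p = len(blocks)
    if p < 1 or p & (p - 1) != 0:
        raise ValueError("this sketch assumes a power-of-two process count")
    gathered = _initial_buffers(blocks)
    steps = 0
    distance = 1
    while distance < p:
        # Snapshot the buffers so both partners send their pre-step contents.
        snapshot = [row[:] for row in gathered]
        for r in range(p):
            partner = r ^ distance
            for i, data in enumerate(snapshot[partner]):
                if data is not None:
                    gathered[r][i] = data
        distance *= 2
        steps += 1
    return gathered, steps


if __name__ == "__main__":
    data = [f"block{r}" for r in range(8)]
    _, ring_steps = ring_allgather(data)
    _, rd_steps = recursive_doubling_allgather(data)
    print("ring steps:", ring_steps)
    print("recursive doubling steps:", rd_steps)
```

The step counts returned by the two functions show the usual trade-off: the ring needs p - 1 steps that each move one block, while recursive doubling needs only log2(p) steps but moves more data in each one.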