Performance Analysis of MPI Collective Operations

Jelena Pješivac-Grbović, Thara Angskun, George Bosilca, Graham E. Fagg, Edgar Gabriel, Jack J. Dongarra

Innovative Computing Laboratory, Computer Science Department, University of Tennessee, 1122 Volunteer Blvd., Suite 413, Knoxville, TN 37996-3450, USA
{pjesa, angskun, bosilca, fagg, egabriel, dongarra}@cs.utk.edu

* This material is based upon work supported by the Department of Energy under Contract No. DE-FG02-02ER25536.

Abstract

Previous studies of application usage show that the performance of collective communications is critical to high-performance computing, yet it is often overlooked compared to point-to-point performance. In this paper we attempt to analyze and improve collective communication in the context of the widely deployed MPI programming paradigm by extending accepted models of point-to-point communication, such as Hockney, LogP/LogGP, and PLogP. The predictions from the models were compared to the experimentally gathered data, and our findings were used to optimize the implementation of collective operations in the FT-MPI library.

1 Introduction

Previous studies of application usage show that the performance of collective communications is critical to high-performance computing (HPC). A profiling study [12] showed that applications spend more than eighty percent of their transfer time in collective operations. Given this fact, it is essential for MPI implementations to provide high-performance collective operations. However, collective operation performance is often overlooked compared to point-to-point performance.

Collective operations (collectives) encompass a wide range of possible algorithms, topologies, and methods. The optimal implementation of a collective for a given system depends on many factors, including, for example, the physical topology of the system, the number of processes involved, the message size, and the location of the root node (where applicable). Furthermore, many algorithms allow explicit segmentation of the message being transmitted, in which case the performance of the algorithm also depends on the segment size used. Some collective operations involve local computation (e.g., reduction operations), in which case we must also consider the local characteristics of each node, as they could affect the decision on how to overlap communication with computation.

A simple yet time-consuming way to find even a semi-optimal implementation of an individual collective operation is to run an extensive set of tests over the collective's parameter space on a dedicated system. However, running such detailed tests, even on relatively small clusters (32-64 nodes), can take a substantial amount of time [15]. If one were to analyze all of the MPI collectives in a similar manner, the tuning process could take days. Still, many current MPI implementations use "extensive" testing to determine switching points between the algorithms.
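To make the cost of such exhaustive testing concrete, the following micro-benchmark sketch times MPI_Bcast over a grid of message sizes. It is only an illustration under simplifying assumptions (a single fixed algorithm, no segment-size sweep, an arbitrary size grid and repetition count), not the test harness used in the paper; a full tuning run repeats a loop like this for every algorithm, segment size, and communicator size of interest, which is exactly why the process can take hours or days.

/* Illustrative MPI_Bcast timing sweep (a sketch, not the paper's harness).
   Compile with an MPI C compiler, e.g.: mpicc bcast_sweep.c -o bcast_sweep */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int reps = 50;                        /* repetitions per data point */
    for (long msg = 8; msg <= 1 << 20; msg *= 2) {
        char *buf = malloc((size_t)msg);
        MPI_Barrier(MPI_COMM_WORLD);            /* align the start of timing */
        double t0 = MPI_Wtime();
        for (int r = 0; r < reps; r++)
            MPI_Bcast(buf, (int)msg, MPI_CHAR, 0, MPI_COMM_WORLD);
        double t = (MPI_Wtime() - t0) / reps;   /* mean time on this rank */
        double tmax;                            /* slowest rank is what matters */
        MPI_Reduce(&t, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("%8ld bytes: %9.2f us\n", msg, tmax * 1e6);
        free(buf);
    }
    MPI_Finalize();
    return 0;
}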
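The model-based alternative pursued in the paper starts from point-to-point models such as Hockney, which charges alpha + beta*m seconds to send an m-byte message (alpha: latency, beta: per-byte transfer time). Standard one-port approximations built on it give roughly ceil(log2 P) * (alpha + beta*m) for a binomial-tree broadcast among P processes, and (ns + P - 2) * (alpha + beta*ms) for a pipelined chain broadcast of ns segments of ms bytes each. The sketch below evaluates these textbook formulas with made-up alpha/beta values; it is a generic approximation, not the paper's fitted model.

/* Hockney-model cost estimates for two broadcast algorithms (sketch).
   Textbook one-port approximations; alpha/beta values below are made up.
   Compile: cc hockney_bcast.c -o hockney_bcast -lm */
#include <math.h>
#include <stdio.h>

/* Hockney model: sending m bytes costs alpha + beta*m seconds */
static double send_time(double m, double alpha, double beta)
{
    return alpha + beta * m;
}

/* binomial tree: ceil(log2 P) rounds, whole message per round */
static double bcast_binomial(int P, double m, double alpha, double beta)
{
    return ceil(log2((double)P)) * send_time(m, alpha, beta);
}

/* pipelined chain: ns segments of ms bytes, ns + P - 2 hop times total */
static double bcast_pipeline(int P, double m, double ms,
                             double alpha, double beta)
{
    double ns = ceil(m / ms);
    return (ns + P - 2) * send_time(ms, alpha, beta);
}

int main(void)
{
    const double alpha = 5e-6;  /* 5 us latency -- made-up example value */
    const double beta  = 1e-9;  /* ~1 GB/s      -- made-up example value */
    const int    P     = 32;
    const double ms    = 8192;  /* pipeline segment size in bytes */

    for (double m = 8192; m <= 1 << 24; m *= 4)
        printf("%10.0f B  binomial %10.1f us  pipeline %10.1f us\n",
               m,
               1e6 * bcast_binomial(P, m, alpha, beta),
               1e6 * bcast_pipeline(P, m, ms, alpha, beta));
    return 0;
}

Under these made-up parameters the binomial tree wins for small messages and the segmented pipeline for large ones; the crossover the output exposes is precisely the kind of switching point that exhaustive testing tries to locate, and that the paper instead predicts from measured model parameters.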