Course: CS 554, Spring 2008
School: University of Illinois,...

Parallel Numerical Algorithms
Chapter 3: Parallel Programming

Prof. Michael T. Heath
Department of Computer Science
University of Illinois at Urbana-Champaign
CSE 512 / CS 554

Outline
1. Parallel Programming Paradigms
2. MPI: Message-Passing Interface
   - MPI Basics
   - Communication and Communicators
   - Virtual Process Topologies
   - Performance Monitoring and Visualization
3. OpenMP: Portable Shared Memory Programming

Parallel Programming Paradigms
- Functional languages
- Parallelizing compilers
- Object parallel
- Data parallel
- Shared memory
- Remote memory access
- Message passing

Functional Languages
- Express what to compute (i.e., mathematical relationships to be satisfied), but not how to compute it or order in which computations are to be performed
- Avoid artificial serialization imposed by imperative programming languages
- Avoid storage references, side effects, and aliasing that make parallelization difficult
- Permit full exploitation of any parallelism inherent in computation

Functional Languages (continued)
- Often implemented using dataflow, in which operations fire whenever their inputs are available, and results then become available as inputs for other operations
- Tend to require substantial extra overhead in work and storage, so have proven difficult to implement efficiently
- Have not been used widely in practice, though numerous experimental functional languages and dataflow systems have been developed

Parallelizing Compilers
- Automatically parallelize programs written in conventional sequential programming languages
- Difficult to do for arbitrary serial code
- Compiler can analyze serial loops for potential parallel execution, based on careful dependence analysis of variables occurring in loop
- User may provide hints (directives) to help compiler determine when loops can be parallelized and how
- OpenMP is standard for compiler directives

Parallelizing Compilers (continued)
- Automatic or semi-automatic, loop-based approach has been most successful in exploiting modest levels of concurrency on shared-memory systems
- Many challenges remain before effective automatic parallelization of arbitrary serial code can be routinely realized in practice, especially for massively parallel, distributed-memory systems
- Parallelizing compilers can produce efficient node code for hybrid architectures with SMP nodes, thereby freeing programmer to focus on exploiting parallelism across nodes

Object Parallel
- Parallelism encapsulated within distributed objects that bind together data and functions operating on data
- Parallel programs built by composing component objects that communicate via well-defined interfaces and protocols
- Implemented using object-oriented programming languages such as C++ or Java
- Often based on standards and supporting environments such as CORBA, DCE, CCA
- Examples include Charm++ and Legion

Data Parallel
- Simultaneous operations on elements of data arrays, typified by vector addition
- Low-level programming languages, such as Fortran 77 and C, express array operations element by element in some specified serial order
- Array-based languages, such as APL, Fortran 90, and MATLAB, treat arrays as higher-level objects and thus facilitate full exploitation of array parallelism

Data Parallel (continued)
- Data parallel languages provide facilities for expressing array operations for parallel execution, and some allow user to specify data decomposition and mapping to processors
- High Performance Fortran (HPF) is most visible attempt to standardize data parallel approach to programming
- Though naturally associated with SIMD architectures, data parallel languages have also been implemented successfully on general MIMD architectures
- Data parallel approach can be effective for highly regular problems, but tends to be too inflexible to be effective for irregular or dynamically changing problems

Shared Memory
- Classic shared-memory paradigm, originally developed for multitasking operating systems, focuses on control parallelism rather than data parallelism
- Multiple processes share common address space accessible to all, though not necessarily with uniform access time
- Because shared data can be changed by more than one process, access must be protected from corruption, typically by some mechanism to enforce mutual exclusion
- Shared memory supports common pool of tasks from which processes obtain new work as they complete previous tasks

Lightweight Threads
- Most popular modern implementation of explicit shared-memory programming, typified by pthreads (POSIX threads)
- Reduce overhead for context-switching by providing multiple program counters and execution stacks so that extensive program state information need not be saved and restored when switching control quickly among threads
- Provide detailed, low-level control of shared-memory systems, but tend to be tedious and error prone
- More suitable for implementing underlying systems software (such as OpenMP and run-time support for parallelizing compilers) than for user-level applications

Shared Memory (continued)
- Most naturally and efficiently implemented on true shared-memory architectures, such as SMPs
- Can also be implemented with reasonable efficiency on NUMA (nonuniform memory access) shared-memory or even distributed-memory architectures, given sufficient hardware or software support
- With nonuniform access or distributed shared memory, efficiency usually depends critically on maintaining locality in referencing data, so design methodology and programming style often closely resemble techniques for exploiting locality in distributed-memory systems
- Partitioned Global Address Space (PGAS) languages, such as UPC, CAF, and Titanium, address these issues

Remote Memory Access
- One-sided, put and get communication to store data in or fetch data from memory of another process
- Does not require explicit cooperation between processes
- Must be used carefully to avoid corruption of shared data
- Included in MPI-2, to be discussed later

Message Passing
- Two-sided, send and receive communication between processes
- Most natural and efficient paradigm for distributed-memory systems
- Can also be implemented efficiently in shared-memory or almost any other parallel architecture, so it is most portable paradigm for parallel programming
- "Assembly language of parallel computing" because of its universality and detailed, low-level control of parallelism
- Fits well with our design philosophy and offers great flexibility in exploiting data locality, tolerating latency, and other performance enhancement techniques

Message Passing (continued)
- Provides natural synchronization among processes (through blocking receives, for example), so explicit synchronization of memory access is unnecessary
- Facilitates debugging because accidental overwriting of memory is less likely and much easier to detect than with shared memory
- Sometimes deemed tedious and low-level, but thinking about locality tends to result in programs with good performance, scalability, and portability
- Dominant paradigm for developing portable and scalable applications for massively parallel systems

MPI: Message-Passing Interface
- Provides communication among multiple concurrent processes
- Includes several varieties of point-to-point communication, as well as collective communication among groups of processes
- Implemented as library of routines callable from conventional programming languages such as Fortran, C, and C++
- Has been universally adopted by developers and users of parallel systems that rely on message passing

MPI: Message-Passing Interface (continued)
- Closely matches computational model underlying our design methodology for developing parallel algorithms and provides natural framework for implementing them
- Although motivated by distributed-memory systems, works effectively on almost any type of parallel system
- Often outperforms other paradigms because it enables and encourages attention to data locality

MPI: Message-Passing Interface (continued)
- MPI was developed in two major stages, MPI-1 and MPI-2
- Includes more than 125 functions, with many different options and protocols
- Small subset suffices for most practical purposes
- We will cover just enough to implement algorithms we will consider
- In some cases, performance can be enhanced by using features that we will not cover in detail

MPI-1
Features of MPI-1 include
- point-to-point communication
- collective communication
- process groups and communication domains
- virtual process topologies
- environmental management and inquiry
- profiling interface
- bindings for Fortran and C

MPI-2
Additional features of MPI-2 include
- dynamic process management
- input/output
- one-sided operations for remote memory access
- bindings for C++

We will cover very little of MPI-2, which is not essential for algorithms we will consider and parts of which are not necessarily supported well on some parallel systems

Language Bindings
- MPI includes bindings for Fortran, C, and C++
- We will emphasize C bindings; Fortran usage is similar
- C versions of most MPI routines return error code as function value, whereas Fortran versions have additional integer argument, IERROR, for this purpose

Building and Running MPI Programs
- Executable module must first be built by compiling user program and linking with MPI library
- One or more header files, such as mpi.h, may be required to provide necessary definitions and declarations
- MPI is generally used in SPMD mode, so only one executable must be built, multiple instances of which are executed concurrently
- Most implementations provide command, typically named mpirun, for spawning MPI processes
- MPI-2 specifies mpiexec for portability
- User selects number of processes and on which processors they will run

Availability of MPI
- Custom versions of MPI supplied by vendors of almost all current parallel computer systems
- Freeware versions available for clusters and similar environments include
  - MPICH: www.mcs.anl.gov/mpi/mpich
  - Open MPI: www.open-mpi.org
- Both websites also provide tutorials on learning and using MPI
- MPI Forum (www.mpi-forum.org) has free versions of MPI standard (both MPI-1 and MPI-2)

Groups and Communicators
- Every MPI process belongs to one or more groups
- Each process is identified by its rank within given group
- Rank is integer from zero to one less than size of group (MPI_PROC_NULL is rank of no process)
- Initially, all processes belong to MPI_COMM_WORLD
- Additional groups can be created by user
- Same process can belong to more than one group
- Viewed as communication domain or context, group of processes is called communicator

Specifying Messages
Information necessary to specify message and identify its source or destination in MPI includes
- msg: location in memory where message data begins
- count: number of data items contained in message
- datatype: type of data in message
- source or dest: rank of sending or receiving process in communicator
- tag: identifier for specific message or kind of message
- comm: communicator

MPI Data Types
- Available MPI data types include
  - C: char, int, float, double (named MPI_CHAR, MPI_INT, MPI_FLOAT, MPI_DOUBLE in calls)
  - Fortran: integer, real, double precision
- Use of MPI data types facilitates heterogeneous environments in which native data types may vary from machine to machine
- Also supports user-defined data types for contiguous or noncontiguous data

Minimal MPI
Minimal set of six MPI functions we will need

    int MPI_Init(int *argc, char ***argv);

  Called before any other MPI function

    int MPI_Finalize(void);

  Called to conclude use of MPI

    int MPI_Comm_size(MPI_Comm comm, int *size);

  On return, size contains number of processes in communicator comm

    int MPI_Comm_rank(MPI_Comm comm, int *rank);

  On return, rank contains rank of calling process in communicator comm, with 0 ≤ rank ≤ size-1

    int MPI_Send(void *msg, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm);

  On return, msg can be reused immediately

    int MPI_Recv(void *msg, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status);

  On return, msg contains requested message

Example: MPI Program for 1-D Laplace Example

    #include <mpi.h>

    int main(int argc, char **argv) {
        int k, p, me, left, right, count = 1, tag = 1, nit = 10;
        float ul, ur, u = 1.0, alpha = 1.0, beta = 2.0;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &p);
        MPI_Comm_rank(MPI_COMM_WORLD, &me);
        left = me-1; right = me+1;
        if (me == 0) ul = alpha;       /* left boundary value */
        if (me == p-1) ur = beta;      /* right boundary value */
        for (k = 1; k <= nit; k++) {
            if (me % 2 == 0) {         /* even ranks send first, then receive */
                if (me > 0) MPI_Send(&u, count, MPI_FLOAT, left, tag, MPI_COMM_WORLD);
                if (me < p-1) MPI_Send(&u, count, MPI_FLOAT, right, tag, MPI_COMM_WORLD);
                if (me < p-1) MPI_Recv(&ur, count, MPI_FLOAT, right, tag, MPI_COMM_WORLD, &status);
                if (me > 0) MPI_Recv(&ul, count, MPI_FLOAT, left, tag, MPI_COMM_WORLD, &status);
            }
            else {                     /* odd ranks receive first, then send */
                if (me < p-1) MPI_Recv(&ur, count, MPI_FLOAT, right, tag, MPI_COMM_WORLD, &status);
                MPI_Recv(&ul, count, MPI_FLOAT, left, tag, MPI_COMM_WORLD, &status);
                MPI_Send(&u, count, MPI_FLOAT, left, tag, MPI_COMM_WORLD);
                if (me < p-1) MPI_Send(&u, count, MPI_FLOAT, right, tag, MPI_COMM_WORLD);
            }
            u = (ul+ur)/2.0;           /* average of the two neighboring values */
        }
        MPI_Finalize();
        return 0;
    }

Standard Send and Receive Functions
- Standard send and receive functions are blocking, meaning they do not return until resources specified in argument list can safely be reused
- In particular, MPI_Recv returns only after receive buffer contains requested message
- MPI_Send may be initiated before or after matching MPI_Recv is initiated
- Depending on specific implementation of MPI, MPI_Send may return before or after matching MPI_Recv is initiated
- For same source, tag, and comm, messages are received in order in which they were sent
- Wild card values MPI_ANY_SOURCE and MPI_ANY_TAG can be used for source and tag, respectively, in receiving message
- Actual source and tag can be determined from MPI_SOURCE and MPI_TAG fields of status structure (entries of status array in Fortran, indexed by parameters of same names) returned by MPI_Recv

Other MPI Functions
- MPI functions covered thus far suffice to implement almost any parallel algorithm with reasonable efficiency
- Dozens of other MPI functions provide additional convenience, flexibility, modularity, robustness, and potentially improved performance
- But they also introduce substantial complexity that may be difficult to manage
- For example, some facilitate overlapping of communication and computation, but place burden of synchronization on user
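
To see what the 1-D Laplace program in this section computes without running MPI, the following plain C sketch replays the same update serially. Each array entry stands in for the single value u held by one process, and each pass of the outer loop corresponds to one exchange-and-average step. The function names laplace_1d_jacobi and laplace_1d_value and the bound MAX_P are assumptions made for this illustration; they are not part of the course material or of MPI.

```c
#include <assert.h>

#define MAX_P 1024  /* assumed upper bound on the number of simulated processes */

/* Serial stand-in for the MPI program: u[me] plays the role of the value u
   owned by process me. All entries are updated from the previous iterate,
   which is what the MPI version does, since every process sends its old u
   before computing its new one. */
void laplace_1d_jacobi(float u[], int p, float alpha, float beta, int nit) {
    float next[MAX_P];
    assert(p >= 1 && p <= MAX_P);
    for (int k = 1; k <= nit; k++) {
        for (int me = 0; me < p; me++) {
            float ul = (me == 0)     ? alpha : u[me - 1];  /* left neighbor or boundary */
            float ur = (me == p - 1) ? beta  : u[me + 1];  /* right neighbor or boundary */
            next[me] = (ul + ur) / 2.0f;
        }
        for (int me = 0; me < p; me++) u[me] = next[me];
    }
}

/* Convenience wrapper: start every u at 1.0, as in the MPI example,
   and return the value that "process" i holds after nit iterations. */
float laplace_1d_value(int p, int i, float alpha, float beta, int nit) {
    float u[MAX_P];
    assert(p >= 1 && p <= MAX_P && i >= 0 && i < p);
    for (int me = 0; me < p; me++) u[me] = 1.0f;
    laplace_1d_jacobi(u, p, alpha, beta, nit);
    return u[i];
}
```

With p = 3, alpha = 1, and beta = 2, repeated iteration drives the three values toward 1.25, 1.5, and 1.75, the straight line between the two boundary values. The nit = 10 used in the MPI example is only a fixed iteration count, not a convergence test.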

Communication Modes
Communication modes for sending messages
- buffered mode: send can be initiated whether or not matching receive has been initiated, and send may complete before matching receive is initiated
- synchronous mode: send can be initiated whether or not matching receive has been initiated, but send will complete only after matching receive has been initiated
- ready mode: send can be initiated only if matching receive has already been initiated
- standard mode: may behave like either buffered mode or synchronous mode, depending on specific implementation of MPI and availability of memory for buffer space

Communication Functions for Various Modes

    Mode          Blocking     Nonblocking
    standard      MPI_Send     MPI_Isend
    buffered      MPI_Bsend    MPI_Ibsend
    synchronous   MPI_Ssend    MPI_Issend
    ready         MPI_Rsend    MPI_Irsend

- MPI_Recv and MPI_Irecv are blocking and nonblocking functions for receiving messages, regardless of mode
- MPI_Buffer_attach used to provide buffer space for buffered mode

Communication Modes (continued)
- Nonblocking functions include request argument used subsequently to determine whether requested operation has completed
- Nonblocking is different from asynchronous
- MPI_Wait and MPI_Test wait or test for completion of nonblocking communication
- MPI_Probe and MPI_Iprobe probe for incoming message without actually receiving it
- Information about message determined by probing can be used to decide how to receive it
- MPI_Cancel cancels outstanding message request, useful for cleanup at end of program or after major phase of computation

Standard Mode
- Standard mode does not specify whether messages are buffered
- Buffering allows more flexible programming, but requires additional time and memory for copying messages to and from buffers
- Given implementation of MPI may or may not use buffering for standard mode, or may use buffering only for messages within certain range of sizes
- To avoid potential deadlock when using standard mode, portability demands conservative assumptions concerning order in which sends and receives are initiated
- User can exert explicit control by using buffered or synchronous mode
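
The advice about conservative ordering can be checked with a small model that needs no MPI at all. The plain C sketch below assumes the worst case permitted for standard mode, namely that a blocking send behaves like synchronous mode and can complete only once the destination process has reached the matching receive. It then replays one iteration of two orderings: the odd-even ordering used in the 1-D Laplace example, and a hypothetical "everyone sends first" ordering. All type and function names here (Op, Proc, odd_even_is_safe, send_first_is_safe, and so on) are invented for this illustration.

```c
#include <assert.h>

#define MAX_P   64  /* assumed upper bound on simulated processes */
#define MAX_OPS 4   /* at most four blocking calls per process per iteration */

typedef enum { SEND, RECV } Kind;
typedef struct { Kind kind; int peer; } Op;
typedef struct { Op ops[MAX_OPS]; int n; int next; } Proc;

static void post(Proc *q, Kind kind, int peer) {
    assert(q->n < MAX_OPS);
    q->ops[q->n].kind = kind;
    q->ops[q->n].peer = peer;
    q->n++;
}

/* Call sequence of each rank; odd_even selects the Laplace example's ordering,
   otherwise every rank uses the send-first sequence. */
static void build(Proc *procs, int p, int odd_even) {
    for (int me = 0; me < p; me++) {
        Proc *q = &procs[me];
        q->n = q->next = 0;
        if (!odd_even || me % 2 == 0) {            /* send left, send right, then receive */
            if (me > 0)     post(q, SEND, me - 1);
            if (me < p - 1) post(q, SEND, me + 1);
            if (me < p - 1) post(q, RECV, me + 1);
            if (me > 0)     post(q, RECV, me - 1);
        } else {                                   /* odd rank: receive first, then send */
            if (me < p - 1) post(q, RECV, me + 1);
            post(q, RECV, me - 1);
            post(q, SEND, me - 1);
            if (me < p - 1) post(q, SEND, me + 1);
        }
    }
}

/* Rendezvous model: a send completes only when the destination's current
   call is the matching receive. Returns 1 if all calls complete, 0 on deadlock. */
static int completes(Proc *procs, int p) {
    int progress = 1;
    while (progress) {
        progress = 0;
        for (int i = 0; i < p; i++) {
            Proc *s = &procs[i];
            if (s->next == s->n || s->ops[s->next].kind != SEND) continue;
            Proc *r = &procs[s->ops[s->next].peer];
            if (r->next < r->n && r->ops[r->next].kind == RECV &&
                r->ops[r->next].peer == i) {
                s->next++; r->next++; progress = 1;  /* matched pair advances */
            }
        }
    }
    for (int i = 0; i < p; i++)
        if (procs[i].next != procs[i].n) return 0;
    return 1;
}

int odd_even_is_safe(int p) {
    Proc procs[MAX_P];
    assert(p >= 1 && p <= MAX_P);
    build(procs, p, 1);
    return completes(procs, p);
}

int send_first_is_safe(int p) {
    Proc procs[MAX_P];
    assert(p >= 1 && p <= MAX_P);
    build(procs, p, 0);
    return completes(procs, p);
}
```

In this model the odd-even ordering always runs to completion, while the send-first ordering stalls as soon as there are two or more processes, because every process is blocked in a send and none has reached a receive. A real MPI implementation that buffers small messages may hide the second problem, which is why it is a portability hazard rather than a guaranteed failure.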

Persistent Communication
- Communication operations that are executed repeatedly with same argument list can be streamlined
- Persistent communication binds argument list to request, and then request can be used repeatedly to initiate and complete message transmissions without repeating argument list each time
- Once argument list has been bound using MPI_Send_init or MPI_Recv_init (or similarly for other modes), then request can subsequently be initiated repeatedly using MPI_Start

Collective Communication
- MPI_Bcast
- MPI_Reduce
- MPI_Allreduce
- MPI_Alltoall
- MPI_Allgather
- MPI_Scatter
- MPI_Gather
- MPI_Scan
- MPI_Barrier

Manipulating Communicators
- MPI_Comm_create
- MPI_Comm_dup
- MPI_Comm_split
- MPI_Comm_compare
- MPI_Comm_free

Virtual Process Topologies
- MPI provides virtual process topologies corresponding to regular Cartesian grids or more general graphs
- Topology is optional additional attribute that can be given to communicator
- Virtual process topology can facilitate some applications, such as 2-D grid topology for matrices or 2-D or 3-D grid topology for discretized PDEs
- Virtual process topology may also help MPI achieve more efficient mapping of processes to given physical network

Virtual Process Topologies (continued)
- MPI_Graph_create creates general graph topology, based on number of nodes, node degrees, and graph edges specified by user
- MPI_Cart_create creates Cartesian topology based on number of dimensions, number of processes in each dimension, and whether each dimension is periodic (i.e., wraps around), as specified by user
- Hypercubes are included, since k-dimensional hypercube is simply k-dimensional torus with two processes per dimension
- MPI_Cart_shift provides shift of given displacement along any given dimension of Cartesian topology

Virtual Process Topologies (continued)
- MPI_Topo_test determines what type of topology, if any, has been defined for given communicator
- Inquiry functions provide information about existing graph topology, such as number of nodes and edges, lists of degrees and edges, number of neighbors of given node, or list of neighbors of given node
- Inquiry functions obtain information about existing Cartesian topology, such as number of dimensions, number of processes for each dimension, and whether each dimension is periodic
- Functions also available to obtain coordinates of given process, or rank of process with given coordinates
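
The relationship between ranks, coordinates, and shifts in a Cartesian topology can be illustrated without calling MPI. The plain C sketch below assumes the row-major numbering that the MPI standard prescribes for Cartesian topologies and treats every dimension as periodic. The functions grid_rank, grid_coords, grid_shift_periodic, and hypercube_neighbor are hypothetical helpers written for this illustration; they are not MPI routines and only mimic the arithmetic behind the coordinate, rank, and shift inquiries described above.

```c
#include <assert.h>

#define MAX_DIMS 16  /* assumed upper bound on the number of grid dimensions */

/* Row-major rank of the process at coords[] in a grid with dims[d]
   processes along dimension d (last dimension varies fastest). */
int grid_rank(int ndims, const int dims[], const int coords[]) {
    int rank = 0;
    for (int d = 0; d < ndims; d++) {
        assert(coords[d] >= 0 && coords[d] < dims[d]);
        rank = rank * dims[d] + coords[d];
    }
    return rank;
}

/* Inverse mapping: coordinates of the process with the given rank. */
void grid_coords(int ndims, const int dims[], int rank, int coords[]) {
    for (int d = ndims - 1; d >= 0; d--) {
        coords[d] = rank % dims[d];
        rank /= dims[d];
    }
}

/* Rank reached from `rank` by moving `disp` steps along dimension `dim`,
   wrapping around at the ends (periodic dimension). */
int grid_shift_periodic(int ndims, const int dims[], int rank, int dim, int disp) {
    int coords[MAX_DIMS];
    assert(ndims >= 1 && ndims <= MAX_DIMS && dim >= 0 && dim < ndims);
    grid_coords(ndims, dims, rank, coords);
    /* The double modulo keeps the result nonnegative for negative disp. */
    coords[dim] = ((coords[dim] + disp) % dims[dim] + dims[dim]) % dims[dim];
    return grid_rank(ndims, dims, coords);
}

/* k-dimensional hypercube viewed as a k-dimensional torus with two
   processes per dimension: returns the neighbor of `rank` along `dim`. */
int hypercube_neighbor(int k, int rank, int dim) {
    int dims[MAX_DIMS];
    assert(k >= 1 && k <= MAX_DIMS);
    for (int d = 0; d < k; d++) dims[d] = 2;
    return grid_shift_periodic(k, dims, rank, dim, 1);
}
```

With two processes per dimension, shifting by one along dimension dim flips a single bit of the rank, which is the usual definition of hypercube adjacency. In this numbering dimension 0 corresponds to the most significant bit.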

Timing MPI Programs
- MPI_Wtime returns elapsed wall-clock time in seconds since some arbitrary point in past
- Elapsed time for program segment given by difference between MPI_Wtime values at beginning and end
- Process clocks are not necessarily synchronized, so clock values are not necessarily comparable across processes, and care must be taken in determining overall running time for parallel program (see MPI_WTIME_IS_GLOBAL)
- MPI_Wtick returns resolution of MPI_Wtime clock

MPI Profiling Interface
- MPI provides profiling interface via name shift for each function
- For each MPI function with prefix MPI_, alternative function with prefix PMPI_ provides identical functionality
- Profiling library can intercept calls to each standard MPI function name, record appropriate data, then call equivalent alternate function to perform requested action
- MPI_Pcontrol provides profiling control if profiling is in use, but otherwise does nothing, so same code can run with or without profiling, depending on whether executable module is linked with profiling or nonprofiling MPI libraries
- Data collected by profiling can be used for performance analysis or visualization

MPICL Tracing Facility
- MPICL is instrumentation package for MPI based on PICL tracing facility and MPI profiling interface
  www.epm.ornl.gov/picl
- In addition to tracing MPI events, MPICL also provides tracing of user-defined events
- Resulting trace file contains one event record per line, with each event record containing numerical data specifying event type, timestamp, process rank, message length, etc.
- Trace data output format is suitable for input to ParaGraph visualization tool

MPICL Tracing Commands

    tracefiles(char *tempfile, char *permfile, int verbose);
    tracelevel(int mpicl, int user, int trace);
    tracenode(int tracesize, int flush, int sync);
    traceevent(char *eventtype, int eventid, int nparams, int *params);
Example: MPICL Tracing

    #include <mpi.h>

    int main(int argc, char **argv) {
        int k, myid, nprocs, ntasks, work;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);

        /* set up tracing: trace file name, tracing levels, trace buffer */
        if (myid == 0) tracefiles("", "tracefile", 0);
        tracelevel(1, 1, 0);
        tracenode(100000, 0, 1);

        /* ntasks must be assigned before this loop */
        for (k = 0; k < ntasks; k++) {
            traceevent("entry", k, 0);
            /* code for task k, including MPI calls */
            /* set work to amount of work done in task k */
            traceevent("exit", k, 1, &work);
        }

        MPI_Finalize();
        return 0;
    }

MPICL Pcontrol Interface

All MPICL tracing facilities can also be accessed through standard MPI_Pcontrol function
Value of level argument to MPI_Pcontrol determines which MPICL tracing command is invoked, as specified in header file pcontrol.h
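The idea of selecting a tracing command through the level argument can be sketched with a variadic function that has the same calling shape as MPI_Pcontrol(const int level, ...). Everything named demo_ or DEMO_ below is hypothetical: the level codes are not the values defined in pcontrol.h, and the stored state is only a stand-in for whatever a tracing library would actually do. The sketch shows the dispatch mechanism only, not MPICL's implementation.

```c
#include <stdarg.h>

/* Hypothetical level codes; the real ones are defined in MPICL's pcontrol.h. */
enum { DEMO_TRACELEVEL = 1, DEMO_TRACENODE = 2 };

/* Illustrative state recording the arguments of the last command of each kind. */
static int demo_levels[3];  /* mpicl, user, trace */
static int demo_node[3];    /* tracesize, flush, sync */

/* Same calling shape as MPI_Pcontrol: the level selects the command, and the
 * remaining arguments are interpreted according to that command.
 * Returns 1 if the level was recognized, 0 otherwise. */
int demo_pcontrol(const int level, ...) {
    va_list ap;
    int handled = 1;

    va_start(ap, level);
    switch (level) {
    case DEMO_TRACELEVEL:
        for (int i = 0; i < 3; i++) demo_levels[i] = va_arg(ap, int);
        break;
    case DEMO_TRACENODE:
        for (int i = 0; i < 3; i++) demo_node[i] = va_arg(ap, int);
        break;
    default:
        handled = 0;  /* unrecognized level: read no arguments, do nothing */
        break;
    }
    va_end(ap);
    return handled;
}
```

Under these assumptions, demo_pcontrol(DEMO_TRACELEVEL, 1, 1, 0) plays the role of tracelevel(1, 1, 0) in the example above, and demo_pcontrol(DEMO_TRACENODE, 100000, 0, 1) the role of tracenode(100000, 0, 1).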
