lecture17

Course: ECE 451, Fall 2008
School: Rutgers
Chapter 8 Programming with Shared Memory

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers, 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.

Shared memory multiprocessor system
Any memory location can be accessed by any of the processors. A single address space exists, meaning that each memory location is given a unique address within a single range of addresses. Generally, shared memory programming is more convenient, although it does require access to shared data to be controlled by the programmer (using critical sections etc.).

[Figure: Shared memory multiprocessor using a single bus]

[Figure: Shared memory multiprocessor using a crossbar switch]

Alternatives for Programming Shared Memory Multiprocessors
- Using heavyweight processes.
- Using threads. Example: Pthreads.
- Using a completely new programming language for parallel programming (not popular). Example: Ada.
- Using library routines with an existing sequential programming language.
- Modifying the syntax of an existing sequential programming language to create a parallel programming language. Example: UPC.
- Using an existing sequential programming language supplemented with compiler directives for specifying parallelism. Example: OpenMP.

Using Heavyweight Processes
Operating systems are often based upon the notion of a process. Processor time is shared between processes, switching from one process to another. Switching might occur at regular intervals or when an active process becomes delayed. This offers the opportunity to deschedule processes blocked from proceeding for some reason, e.g. waiting for an I/O operation to complete. The concept could be used for parallel programming. It is not much used because of the overhead, but fork/join concepts are used elsewhere.

[Figure: FORK-JOIN construct]

UNIX System Calls
SPMD model with different code for master process and forked slave process.

[Figure: Differences between a process and threads]

Pthreads
IEEE Portable Operating System Interface, POSIX, sec. 1003.1 standard.

Detached Threads
It may be that a thread is not concerned with when a thread it creates terminates, in which case a join is not needed. Threads that are not joined are called detached threads. When detached threads terminate, they are destroyed and their resources released.

[Figure: Pthreads Detached Threads]

Statement Execution Order
Single processor: Processes/threads are typically executed until blocked.
Multiprocessor: Instructions of processes/threads are interleaved in time.

Example

    Process 1          Process 2
    Instruction 1.1    Instruction 2.1
    Instruction 1.2    Instruction 2.2
    Instruction 1.3    Instruction 2.3

Several orderings are possible, including

    Instruction 1.1
    Instruction 1.2
    Instruction 2.1
    Instruction 1.3
    Instruction 2.2
    Instruction 2.3

assuming instructions cannot be divided into smaller steps.

If two processes were to print messages, for example, the messages could appear in different orders depending upon the scheduling of processes calling the print routine. Worse, the individual characters of each message could be interleaved if the machine instructions of instances of the print routine could be interleaved.

Compiler/Processor Optimizations
Compiler and processor reorder instructions for optimization.
Example: The statements

    a = b + 5;
    x = y + 4;

could be compiled to execute in reverse order:

    x = y + 4;
    a = b + 5;

and still be logically correct. It may be advantageous to delay the statement a = b + 5 because a previous instruction currently being executed in the processor needs more time to produce the value for b. It is very common for processors to execute machine instructions out of order for increased speed.

Thread-Safe Routines
Routines are thread safe if they can be called from multiple threads simultaneously and always produce correct results. Standard I/O is thread safe (it prints messages without interleaving the characters). System routines that return the time may not be thread safe. Routines that access shared data may require special care to be made thread safe.

Accessing Shared Data
Accessing shared data needs careful control. Consider two processes, each of which is to add one to a shared data item, x. The contents of the location x must be read, x + 1 computed, and the result written back to the location.

[Figure: Conflict in accessing shared variable]
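To make the conflict concrete, here is a minimal single-threaded C sketch that replays the three steps (read x, compute x + 1, write the result back) by hand for two simulated processes. The names process_t, interleaved_increment and serialized_increment are hypothetical, and the interleaving is chosen for illustration; a real scheduler may or may not produce it.

```c
/* Each simulated process keeps the value it read in a private register. */
typedef struct {
    int reg;
} process_t;

static void read_x(process_t *p, const int *x)  { p->reg = *x; }
static void compute(process_t *p)               { p->reg = p->reg + 1; }
static void write_x(const process_t *p, int *x) { *x = p->reg; }

/* Both processes read x before either one writes it back,
   so the second write overwrites the first and one increment is lost. */
int interleaved_increment(void)
{
    int x = 0;
    process_t p1, p2;

    read_x(&p1, &x);
    read_x(&p2, &x);
    compute(&p1);
    compute(&p2);
    write_x(&p1, &x);
    write_x(&p2, &x);
    return x;
}

/* Process 1 completes all three steps before process 2 starts. */
int serialized_increment(void)
{
    int x = 0;
    process_t p1, p2;

    read_x(&p1, &x);
    compute(&p1);
    write_x(&p1, &x);

    read_x(&p2, &x);
    compute(&p2);
    write_x(&p2, &x);
    return x;
}
```

Starting from x = 0, interleaved_increment() returns 1 while serialized_increment() returns 2, which is why the three steps have to be kept together.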
Critical Section
A mechanism for ensuring that only one process accesses a particular resource at a time is to establish sections of code involving the resource as so-called critical sections and arrange that only one such critical section is executed at a time. This mechanism is known as mutual exclusion. The concept also appears in operating systems.

Locks
The simplest mechanism for ensuring mutual exclusion of critical sections. A lock is a 1-bit variable that is 1 to indicate that a process has entered the critical section and 0 to indicate that no process is in the critical section. It operates much like a door lock: a process coming to the "door" of a critical section and finding it open may enter the critical section, locking the door behind it to prevent other processes from entering. Once the process has finished the critical section, it unlocks the door and leaves.

[Figure: Control of critical sections through busy waiting]

Pthread Lock Routines
Locks are implemented in Pthreads with mutually exclusive lock variables, or "mutex" variables:

    .
    pthread_mutex_lock(&mutex1);
        critical section
    pthread_mutex_unlock(&mutex1);
    .

If a thread reaches a mutex lock and finds it locked, it will wait for the lock to open. If more than one thread is waiting for the lock to open when it opens, the system will select one thread to be allowed to proceed.
Only the thread that locks a mutex can unlock it.

Deadlock
Can occur with two processes when one requires a resource held by the other, and this process requires a resource held by the first process.

Deadlock (deadly embrace)
Deadlock can also occur in a circular fashion, with several processes each having a resource wanted by another.

Pthreads
Offers one routine that can test whether a lock is actually closed without blocking the thread:

    pthread_mutex_trylock()

It will lock an unlocked mutex and return 0, or will return EBUSY if the mutex is already locked. It might find a use in overcoming deadlock.

Semaphores
A non-negative integer (that is, a positive integer or zero) operated upon by two operations:
P operation on semaphore s: waits until s is greater than zero, then decrements s by one and allows the process to continue.
V operation on semaphore s: increments s by one and releases one of the waiting processes (if any).

P and V operations are performed indivisibly.
The mechanism for activating waiting processes is also implicit in the P and V operations. Though the exact algorithm is not specified, the algorithm is expected to be fair. Processes delayed by P(s) are kept in abeyance until released by a V(s) on the same semaphore.

Devised by Dijkstra in 1968. (The letter P is from the Dutch word passeren, meaning "to pass," and the letter V is from the Dutch word vrijgeven, meaning "to release.")

Mutual exclusion of critical sections can be achieved with one semaphore having the value 0 or 1 (a binary semaphore), which acts as a lock variable, but the P and V operations include a process scheduling mechanism:

    Process 1              Process 2              Process 3
    Noncritical section    Noncritical section    Noncritical section
    ...                    ...                    ...
    P(s)                   P(s)                   P(s)
    Critical section       Critical section       Critical section
    V(s)                   V(s)                   V(s)
    ...                    ...                    ...
    Noncritical section    Noncritical section    Noncritical section

General semaphore (or counting semaphore)
Can take on positive values other than zero and one. Provides, for example, a means of recording the number of "resource units" available or used, and can be used to solve producer/consumer problems (more on that in operating system courses).

Semaphore routines exist for UNIX processes. They do not exist in Pthreads as such, though they can be written. They do exist in the real-time extension to Pthreads.

Monitor
A suite of procedures that provides the only way to access a shared resource.
Only one process can use a monitor procedure at any instant. A monitor could be implemented using a semaphore or lock to protect entry, i.e.,

    monitor_proc1() {
        lock(x);
        .
        monitor body
        .
        unlock(x);
        return;
    }

Condition Variables
Often, a critical section is to be executed if a specific global condition exists; for example, if a certain value of a variable has been reached. With locks, the global variable would need to be examined at frequent intervals ("polled") within a critical section, a very time-consuming and unproductive exercise. This can be overcome by introducing so-called condition variables.

Pthread Condition Variables
[Figure: Pthreads arrangement for signal and wait]
Signals are not remembered: threads must already be waiting for a signal to receive it.

Language Constructs for Parallelism
Shared Data
Shared memory variables might be declared as shared with, say,

    shared int x;

par Construct
For specifying concurrent statements:

    par {
        S1;
        S2;
        .
        .
        Sn;
    }
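Returning to the lock and condition-variable mechanisms described above: a lock protecting entry plus a condition variable to avoid polling is enough to write P and V operations in Pthreads. The following is a minimal sketch under stated assumptions, not the POSIX sem_t interface: the type my_semaphore and the functions my_sem_init, my_sem_P, my_sem_V and my_sem_value are hypothetical names, and default mutex and condition attributes are assumed.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical counting semaphore built from a mutex and a condition variable. */
typedef struct {
    pthread_mutex_t lock;     /* protects value, like a monitor entry lock */
    pthread_cond_t  nonzero;  /* signalled when value is incremented */
    unsigned        value;    /* never negative */
} my_semaphore;

/* Returns 0 on success, or the Pthreads error code. */
int my_sem_init(my_semaphore *s, unsigned initial)
{
    int rc;

    s->value = initial;
    rc = pthread_mutex_init(&s->lock, NULL);
    if (rc != 0)
        return rc;
    return pthread_cond_init(&s->nonzero, NULL);
}

/* P: wait until value > 0, then decrement it. */
void my_sem_P(my_semaphore *s)
{
    pthread_mutex_lock(&s->lock);
    /* Signals are not remembered, so the state itself is re-tested
       in a loop rather than relying on having seen a signal. */
    while (s->value == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

/* V: increment value and release one waiting thread, if any. */
void my_sem_V(my_semaphore *s)
{
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->lock);
}

/* Read the current value under the lock (for inspection only). */
unsigned my_sem_value(my_semaphore *s)
{
    unsigned v;

    pthread_mutex_lock(&s->lock);
    v = s->value;
    pthread_mutex_unlock(&s->lock);
    return v;
}
```

Initialized to 1, such a semaphore behaves as the binary semaphore used for mutual exclusion; larger initial values count available resource units. Because the counter is stored in the structure, a V issued before any thread waits is not lost, even though the condition signal itself is not remembered.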
forall Construct
To start multiple similar processes together:

    forall (i = 0; i < n; i++) {
        S1;
        S2;
        .
        .
        Sm;
    }

which generates n processes, each consisting of the statements forming the body of the for loop, S1, S2, ..., Sm. Each process uses a different value of i.

Example

    forall (i = 0; i < 5; i++)
        a[i] = 0;

clears a[0], a[1], a[2], a[3], and a[4] to zero concurrently.

Dependency Analysis
To identify which processes could be executed together.

Example: One can see immediately in the code

    forall (i = 0; i < 5; i++)
        a[i] = 0;

that every instance of the body is independent of the other instances and all instances can be executed simultaneously. However, it may not be that obvious. An algorithmic way of recognizing dependencies is needed for a parallelizing compiler.

OpenMP
An accepted standard developed in the late 1990s by a group of industry specialists.
Consists of a small set of compiler directives, augmented with a small set of library routines and environment variables, using the base languages Fortran and C/C++. The compiler directives can specify such things as the par and forall operations described previously. Several OpenMP compilers are available.

For C/C++, the OpenMP directives are contained in #pragma statements. The OpenMP #pragma statements have the format:

    #pragma omp directive_name ...

where omp is an OpenMP keyword. There may be additional parameters (clauses) after the directive name for different options. Some directives require code to be specified in a structured block (a statement or statements) that follows the directive; the directive and structured block then form a "construct".

OpenMP uses a "fork-join" model, but thread-based. Initially, the program is executed by a single master thread. Parallel regions (sections of code) can be executed by multiple threads (a team of threads). The parallel directive creates a team of threads, with a specified block of code executed by the multiple threads in parallel. The exact number of threads in the team is determined in one of several ways. Other directives are used within a parallel construct to specify parallel for loops and different blocks of code for threads.
Parallel Directive

    #pragma omp parallel
        structured_block

creates multiple threads, each one executing the specified structured_block, which is either a single statement or a compound statement created with { ... } with a single entry point and a single exit point. There is an implicit barrier at the end of the construct. The directive corresponds to the forall construct.

Number of threads in a team
Established by either:
1. a num_threads clause after the parallel directive, or
2. the omp_set_num_threads() library routine being previously called, or
3. the environment variable OMP_NUM_THREADS being defined,
in the order given, or is system dependent if none of the above applies. The number of threads available can also be altered automatically to achieve best use of system resources by a "dynamic adjustment" mechanism.

Work-Sharing
Three constructs in this classification:
- sections
- for
- single
In all cases, there is an implicit barrier at the end of the construct unless a nowait clause is included. Note that these constructs do not start a new team of threads. That is done by an enclosing parallel construct.
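As an illustration of the parallel directive with a num_threads clause, the following C sketch has every thread of the team record its own thread number. The function names thread_id and mark_team are hypothetical. The code is written so that it still compiles, and runs with a team of one, when OpenMP support is not enabled; the runtime may also supply fewer threads than requested.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Thread number within the team; 0 when compiled without OpenMP. */
static int thread_id(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Requests a team of `requested` threads (must be positive).
   Each thread sets seen[id] for its own id; the return value is
   the number of distinct threads that took part. */
int mark_team(int requested, int *seen, int capacity)
{
    int i, count = 0;

    for (i = 0; i < capacity; i++)
        seen[i] = 0;

    #pragma omp parallel num_threads(requested)
    {
        int id = thread_id();   /* private: declared inside the region */
        if (id < capacity)
            seen[id] = 1;       /* each thread writes only its own element */
    }                           /* implicit barrier at the end of the construct */

    for (i = 0; i < capacity; i++)
        count += seen[i];
    return count;
}
```

With a typical compiler this would be built with OpenMP enabled (for example the -fopenmp option of GCC, named here as an assumption about the toolchain). The master thread is always thread 0, so seen[0] is set in every case.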
Sections
The construct

    #pragma omp sections
    {
        #pragma omp section
            structured_block
        #pragma omp section
            structured_block
        .
        .
        .
    }

causes the structured blocks to be shared among the threads in the team. #pragma omp sections precedes the set of structured blocks. #pragma omp section prefixes each structured block. The first section directive is optional.

For Loop

    #pragma omp for
        for_loop

causes the for loop to be divided into parts and the parts shared among the threads in the team. The for loop must be of a simple form. The way that the for loop is divided can be specified by an additional "schedule" clause. Example: the clause schedule(static, chunk_size) causes the for loop to be divided into sizes specified by chunk_size and allocated to threads in a round robin fashion.

Single
The directive

    #pragma omp single
        structured_block

causes the structured block to be executed by one thread only.

Combined Parallel Work-sharing Constructs
If a parallel directive is followed by a single for directive, the two can be combined into:

    #pragma omp parallel for
        for_loop

with similar effect, i.e. a team of threads is created and the iterations of the for loop are divided among them.
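The combined construct and the schedule clause can be sketched with the array-clearing loop used earlier for forall. This is a minimal illustration; the function name clear_array is hypothetical, and chunk_size is assumed to be positive.

```c
/* Sets a[0..n-1] to zero. With OpenMP enabled, iterations are handed
   out to the team in blocks of chunk_size, round robin; without it,
   the pragma is ignored and the loop runs sequentially. */
void clear_array(int *a, int n, int chunk_size)
{
    int i;

    #pragma omp parallel for schedule(static, chunk_size)
    for (i = 0; i < n; i++)
        a[i] = 0;   /* iterations are independent of one another */
}
```

For example, with n = 10, chunk_size = 2 and a team of two threads, a static schedule would give iterations 0-1, 4-5 and 8-9 to one thread and 2-3 and 6-7 to the other. The loop is safe to divide this way because each iteration touches a different element.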
Master Directive
The master directive:

    #pragma omp master
        structured_block

causes the master thread to execute the structured block. It differs from the directives in the work-sharing group in that there is no implied barrier at the end of the construct (nor at the beginning). Other threads encountering this directive will ignore it and the associated structured block, and will move on.

Synchronization Constructs
Critical
The critical directive will only allow one thread at a time to execute the associated structured block. When one or more threads reach the critical directive:

    #pragma omp critical (name)
        structured_block

they will wait until no other thread is executing the same critical section (one with the same name), and then one thread will proceed to execute the structured block. name is optional. All critical sections with no name map to one undefined name.

Barrier
When a thread reaches the barrier

    #pragma omp barrier

it waits until all threads have reached the barrier, and then they all proceed together. There are restrictions on the placement of the barrier directive in a program. In particular, all threads must be able to reach the barrier.
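The three directives above can be seen together in the following C sketch. The function names sum_critical and broadcast_from_master and the critical-section names total_update and mismatch_update are hypothetical. Both functions give the same results when the pragmas are ignored by a compiler without OpenMP support.

```c
/* Adds up v[0..n-1]; the named critical section lets only one
   thread at a time update the shared total. */
long sum_critical(const int *v, int n)
{
    long total = 0;
    int i;

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        #pragma omp critical (total_update)
        total += v[i];
    }
    return total;
}

/* The master thread stores a value; since master has no implied
   barrier, an explicit barrier makes the other threads wait for it.
   Returns the number of threads that saw a wrong value (expected 0). */
int broadcast_from_master(int value)
{
    int shared_value = value - 1;   /* deliberately different at first */
    int mismatches = 0;

    #pragma omp parallel
    {
        #pragma omp master
        shared_value = value;

        #pragma omp barrier

        if (shared_value != value) {
            #pragma omp critical (mismatch_update)
            mismatches++;
        }
    }
    return mismatches;
}
```

Entering a critical section on every iteration, as sum_critical does, is correct but slow, because the updates are executed one after the other.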
Atomic
The atomic directive

    #pragma omp atomic
        expression_statement

implements a critical section efficiently when the critical section simply updates a variable (adds one, subtracts one, or does some other simple arithmetic operation as defined by expression_statement).

Flush
A synchronization point which causes the thread to have a "consistent" view of certain or all shared variables in memory. All current read and write operations on the variables are allowed to complete and values are written back to memory, but any memory operations in the code after the flush are not started, thereby creating a "memory fence". Format:

    #pragma omp flush (variable_list)

Only applied to the thread executing the flush, not to all threads in the team. A flush occurs automatically at the entry and exit of parallel and critical directives (and combined parallel for and parallel sections directives), and at the exit of for, sections, and single (if a nowait clause is not present).

Ordered
Used in conjunction with for and parallel for directives to cause an iteration to be executed in the order that it would have occurred if written as a sequential loop. See Appendix C of the textbook for further details.

Shared Memory Programming Performance Issues

Shared Data in Systems with Caches
All modern computer systems have cache memory, high-speed memory closely attached to each processor for holding recently referenced data and code.

Cache coherence protocols
Update policy: copies of data in all caches are updated at the time one copy is altered.
Invalidate policy: when one copy of data is altered, the same data in any other cache is invalidated (by resetting a valid bit in the cache). These copies are only updated when the associated processor makes a reference to the data.

False Sharing
Different parts of a block are required by different processors, but not the same bytes. If one processor writes to one part of the block, copies of the complete block in other caches must be updated or invalidated even though the actual data is not shared.

Solution for False Sharing
The compiler alters the layout of the data stored in the main memory, separating data only altered by one processor into different blocks.

Critical Sections Serializing Code
High performance programs should have as few critical sections as possible, as their use can serialize the code. Suppose all processes happen to come to their critical sections together. They will execute their critical sections one after the other. In that situation, the execution time becomes almost that of a single processor.
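One common way to keep critical sections few is for each thread to accumulate into a private variable and update the shared variable only once. The following C sketch shows this using the atomic directive; the function name sum_local_then_atomic is hypothetical, and this is one possible arrangement rather than the only one.

```c
/* Adds up v[0..n-1]. Each thread sums its share of the iterations into
   a private variable, then performs a single atomic update, so the
   serialized part is one addition per thread instead of one per element. */
long sum_local_then_atomic(const int *v, int n)
{
    long total = 0;

    #pragma omp parallel
    {
        long local = 0;   /* private: declared inside the parallel region */
        int i;

        #pragma omp for nowait
        for (i = 0; i < n; i++)
            local += v[i];

        #pragma omp atomic
        total += local;
    }   /* implicit barrier: all updates are complete before returning */

    return total;
}
```

The nowait clause removes the barrier at the end of the for construct, which is safe here because each thread only needs its own partial sum before the atomic update.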
Illustration
[figure]

Sequential Consistency
Formally defined by Lamport (1979): A multiprocessor is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor occur in this sequence in the order specified by its program.
That is, the overall effect of a parallel program is not changed by any arbitrary interleaving of instruction execution in time.

Sequential Consistency
[figure]

Writing a parallel program for a system that is known to be sequentially consistent enables us to reason about the result of the program.

Example
Process 1
    .
    data = new;
    flag = TRUE;
    .
Process 2
    .
    while (flag != TRUE) { };
    data_copy = data;
    .

We expect data_copy to be set to new because we expect the statement data = new to be executed before flag = TRUE, and the statement while (flag != TRUE) { } to be executed before data_copy = data. This ensures that process 2 reads the new data from process 1. Process 2 will simply wait for the new data to be produced.

Program Order
Sequential consistency refers to "operations of each individual processor .. occur in the order specified in its program", or program order. In the previous figure, this order is that of the stored machine instructions to be executed.

Compiler Optimizations
The order is not necessarily the same as the order of the corresponding high-level statements in the source program, as a compiler may reorder statements for improved performance. In this case, the term program order will depend upon context: either the order in the source program or the order in the compiled machine instructions.

High Performance Processors
Modern processors usually reorder machine instructions internally during execution for increased performance. This does not prevent a multiprocessor from being sequentially consistent if the processor only produces the final results in program order (that is, retires values to registers in program order which most proc...
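
The flag/data handoff from the sequential consistency example, together with the caveat that compilers and processors may reorder operations, can be sketched in C as follows. This is an illustration under assumptions, not code from the slides: it uses C11 atomics for `flag` (whose `atomic_store` and `atomic_load` default to sequentially consistent ordering) so that the write to `data` cannot be moved after the flag is raised. The names `process1`, `process2`, and `run_handoff`, and the value `NEW_VALUE`, are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NEW_VALUE 42       /* hypothetical stand-in for "new" in the slide */

static int data = 0;       /* ordinary shared variable */
static atomic_bool flag;   /* zero-initialized (false); accessed atomically */
static int data_copy = 0;

/* Process 1: write the data first, then raise the flag. */
static void *process1(void *arg)
{
    (void)arg;
    data = NEW_VALUE;
    atomic_store(&flag, true);    /* sequentially consistent store */
    return NULL;
}

/* Process 2: wait until the flag is raised, then read the data. */
static void *process2(void *arg)
{
    (void)arg;
    while (!atomic_load(&flag)) { /* sequentially consistent load; spin */ }
    data_copy = data;
    return NULL;
}

/* Runs both processes as threads and returns what process 2 read. */
int run_handoff(void)
{
    pthread_t p1, p2;
    data = 0;
    data_copy = 0;
    atomic_store(&flag, false);

    int rc = pthread_create(&p2, NULL, process2, NULL);
    assert(rc == 0);              /* sketch-level error handling only */
    rc = pthread_create(&p1, NULL, process1, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_join(p1, NULL);
    pthread_join(p2, NULL);
    return data_copy;
}
```

With a plain `int flag` in place of the atomic, a C compiler would be free to reorder the two stores in `process1` or to hoist the load out of the loop in `process2`, which is the program-order ambiguity the slides describe. The atomic operations restore the ordering that the reasoning in the example relies on.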
