Supplemental Notes for Lecture 12

Art of Parallel Programming

CSE 160: Notes for Lecture 12 (10/31/06)

For reference: J. Laudon and D. Lenoski, "The SGI Origin: A ccNUMA Highly Scalable Server," Proceedings of the 24th International Symposium on Computer Architecture (ISCA '97), pp. 241-251, Denver, Colorado, June 1997.

Shared Memory

In contrast to "shared nothing" architectures, under shared memory, memory is globally accessible. Communication is anonymous: there is no explicit recipient of a shared memory access, as there is in message passing, and processors may communicate without necessarily being aware of one another. Shared memory provides two services:

- Direct access to another processor's local memory.
- Automatic mapping of a (virtual) memory address onto a (processor, local memory address) pair.

Shared memory designs fall into two major categories, depending on whether the access time to shared memory is uniform or non-uniform. These machines are referred to as UMA (Uniform Memory Access) and NUMA (Non-Uniform Memory Access), respectively.

With UMA, the cost of accessing main memory is the same for all memory addresses in the absence of contention. UMA designs are often called flat shared memory, and the machines built on top of such memory are called Symmetric Multiprocessors. (To learn more about SMPs, see the web page on Commercial Symmetric Multiprocessors.) Processors inevitably contend for memory, i.e., they access the same memory module or even the same location simultaneously. When severe, contention effectively serializes memory accesses, which no longer execute in parallel but one at a time. Though high-end servers scale to many tens of processors, e.g., the IBM 'p' series to 64 cores and the Sun Fire E servers to 72 cores, the interconnect of such higher-end designs would be too costly today. An alternative is to employ physically distributed memory, i.e., a NUMA architecture. In a NUMA architecture, memory access times are non-uniform.
A processor sees different access times to

memory, depending on whether the access is local or not, and if not, on the distance to the target memory. Access to remote memory owned by another processor is more expensive. Complex hierarchies are possible, and memory access times can be highly non-uniform. Large-scale designs with up to 1024 processors are commercially available today. It is clear that the term "distributed memory" must be properly qualified, since it arises in both shared memory and message passing architectures. In addition to the above varieties of shared memory, we also have a `get/put' model, which is supported by the Cray T3E. This model can be viewed as a hybrid of message passing and shared memory. Each processor can directly access remote (non-local) memory, though it must explicitly designate which processor's memory it will access (with true shared memory, no such designation is needed).
