Lecture 12 - Multiprocessors


ECE565: Computer Architecture
Instructor: Vijay S. Pai
Fall 2011

This Unit: Shared Memory Multiprocessors

• Three issues
  • Cache coherence
  • Synchronization
  • Memory consistency
• Two cache coherence approaches
  • "Snooping" (SMPs): < 16 processors
  • "Directory"/scalable: lots of processors

[Figure: system layer stack — Application, OS, Compiler, Firmware, CPU, I/O, Memory, Digital Circuits, Gates & Transistors]

Thread-Level Parallelism

• Thread-level parallelism (TLP)
  • Collection of asynchronous tasks: not started and stopped together
  • Data shared loosely, dynamically
  • Example: database/web server (each query is a thread)
• accts is shared; it can't be register allocated even if it were scalar
• id and amt are private variables, register allocated to r1, r2
• Focus on this code (a runnable sketch of the race it hides appears after these notes):

    struct acct_t { int bal; };
    shared struct acct_t accts[MAX_ACCT];
    int id, amt;
    if (accts[id].bal >= amt) {
      accts[id].bal -= amt;
      spew_cash();
    }

    0: addi r1,accts,r3
    1: ld   0(r3),r4
    2: blt  r4,r2,6
    3: sub  r4,r2,r4
    4: st   r4,0(r3)
    5: call spew_cash

Shared Memory

• Shared memory
  • Multiple execution contexts sharing a single address space
  • Multiple programs (MIMD)
  • Or more frequently: multiple copies of one program (SPMD)
  • Implicit (automatic) communication via loads and stores (see the producer/consumer sketch after these notes)
+ Simple software
  • No need for messages; communication happens naturally
    – Maybe too naturally
  • Supports irregular, dynamic communication patterns
  • Both DLP and TLP
– Complex hardware
  • Must create a uniform view of memory
  • Several aspects to this, as we will see

Shared-Memory Multiprocessors

• Provide a shared-memory abstraction
• Familiar and efficient for programmers

[Figure: four processors P1–P4 sharing one memory system]

[Figure: processors P1–P4, each with a private cache, a local memory M1–M4, and a network interface, connected by an interconnection network]

Paired vs. Separate Processor/Memory?

• Separate processor/memory
  • Uniform memory access (UMA): equal latency to all memory
  + Simple software: doesn't matter where you put data
  – Lower peak performance
  • Bus-based UMAs common: symmetric multiprocessors (SMPs)
• Paired processor/memory
  • Non-uniform memory access (NUMA): faster to local memory
  – More complex software: where you put data matters (see the NUMA placement sketch after these notes)
  + Higher peak performance, assuming proper data placement

[Figure: UMA — CPUs with caches sharing memories over a common interconnect; NUMA — CPU($)/Mem pairs connected through routers (R)]

Shared vs. Point-to-Point Networks

• Shared network: e.g., bus (left) or crossbar (not shown)
  + Low latency
  – Low bandwidth: expensive to scale beyond ~16 processors
  + Shared property simplifies cache coherence protocols (later)
• Point-to-point network: e.g., mesh or ring (right)
  – Longer latency: may need multiple "hops" to communicate...
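The account example is the classic illustration of why synchronization is one of the three issues: two threads can both execute the load at instruction 1 before either performs the store at instruction 4, so both see a sufficient balance and both dispense cash. Below is a minimal POSIX-threads sketch of the race and its fix; the mutex, the two-thread driver, and the spew_cash stub are illustrative additions, not part of the lecture.

    #include <pthread.h>
    #include <stdio.h>

    #define MAX_ACCT 1

    struct acct_t { int bal; };
    struct acct_t accts[MAX_ACCT] = { { 100 } };      /* shared */
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; /* illustrative fix */

    void spew_cash(void) { printf("cash dispensed\n"); }

    /* Each thread tries to debit the same account. Without the lock,
     * both threads can pass the balance check (the ld at 1) before
     * either performs the store at 4, overdrawing the account. */
    void *withdraw(void *arg) {
        int id = 0, amt = 60;
        pthread_mutex_lock(&lock);   /* remove this pair to expose the race */
        if (accts[id].bal >= amt) {
            accts[id].bal -= amt;
            spew_cash();
        }
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, withdraw, NULL);
        pthread_create(&t2, NULL, withdraw, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("final balance: %d\n", accts[0].bal); /* 40 with the lock */
        return 0;
    }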
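"Implicit communication via loads and stores" means the producer sends data simply by storing to a shared location and the consumer receives it by loading; no message-passing API is involved. A minimal producer/consumer sketch, assuming C11 atomics to make the ordering explicit (the flag/data names are illustrative); the release/acquire pair previews the memory-consistency issue listed at the start of this unit:

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    /* Without release/acquire ordering on the flag, the consumer could
     * observe flag == 1 before the producer's store to data is visible. */
    int data;                 /* payload, written before the flag is set */
    atomic_int flag = 0;      /* 0 = empty, 1 = full */

    void *producer(void *arg) {
        data = 42;
        atomic_store_explicit(&flag, 1, memory_order_release);
        return NULL;
    }

    void *consumer(void *arg) {
        while (atomic_load_explicit(&flag, memory_order_acquire) == 0)
            ;                 /* spin until the producer's store is visible */
        printf("received %d\n", data);
        return NULL;
    }

    int main(void) {
        pthread_t p, c;
        pthread_create(&c, NULL, consumer, NULL);
        pthread_create(&p, NULL, producer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }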
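On a NUMA machine, "where you put data matters": placing a worker's data on the memory node nearest that worker keeps its loads and stores local. A sketch of explicit placement, assuming Linux with libnuma installed (link with -lnuma); the buffer size and node choice are illustrative:

    #include <numa.h>
    #include <stdio.h>

    int main(void) {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA not supported on this system\n");
            return 1;
        }
        size_t bytes = 1 << 20;
        int node = 0;                        /* place on memory node 0 */
        int *buf = numa_alloc_onnode(bytes, node);
        if (!buf) return 1;
        /* ... run the worker thread near `node` and operate on buf ... */
        numa_free(buf, bytes);
        return 0;
    }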