cs240a-multicore



Slide 1 — CS 240A: Shared Memory & Multicore Programming with Cilk++
∙ Multicore and NUMA architectures
∙ Multithreaded programming
∙ Cilk++ as a concurrency platform
∙ Work and span
Thanks to Charles E. Leiserson for some of these slides.

Slide 2 — Multicore Architecture
A chip multiprocessor (CMP): several cores, each with a private cache ($), connected by an on-chip network to memory and I/O.

Slide 3 — cc-NUMA Architectures
AMD 8-way Opteron server ([email protected]): each processor (CMP) has 2/4 cores, a memory bank local to it, and a point-to-point interconnect to the other processors.

Slide 4 — cc-NUMA Architectures
∙ No front-side bus
∙ Integrated memory controller
∙ On-die interconnect among CMPs
∙ Main memory is physically distributed among CMPs (i.e., each piece of memory has an affinity to a CMP)
∙ NUMA: non-uniform memory access. For multi-socket servers only — your desktop is safe (well, for now at least), and Triton nodes are not NUMA either.

Slide 5 — Desktop Multicores Today
This is your AMD Barcelona or Intel Core i7: on-die interconnect and private caches, so cache coherence is required.

Slide 6 — Multithreaded Programming
∙ POSIX Threads (Pthreads) is a set of threading interfaces developed by the IEEE.
∙ It is the "assembly language" of shared-memory programming.
∙ The programmer has to manually:
  – create and terminate threads,
  – wait for threads to complete,
  – manage interaction between threads using mutexes, condition variables, etc.

Slide 7 — Concurrency Platforms
• Programming directly on Pthreads is painful and error-prone.
• With Pthreads, you either sacrifice memory usage or load balance among processors.
• A concurrency platform provides linguistic support and handles load balancing.
• Examples: Threading Building Blocks (TBB), OpenMP, Cilk++.

Slide 8 — Cilk vs. Pthreads
How will the following code execute in Pthreads? In Cilk?

    for (i = 1; i < 1000000000; i++) {
        spawn-or-fork foo(i);
    }
    sync-or-join;

What if foo contains code that waits (e.g., spins) on a variable being set by another instance of foo?
The two have different liveness properties:
∙ Cilk threads are spawned lazily: "may" parallelism.
∙ Pthreads are spawned eagerly: "must" parallelism.

Slide 9 — Cilk vs. OpenMP
∙ Cilk++ guarantees space bounds: on P processors, Cilk++ uses no more than P times the stack space of a serial execution.
∙ Cilk++ has a solution for global variables (called "reducers" or "hyperobjects").
∙ Cilk++ has nested parallelism that works and provides guaranteed speedup; indeed, the Cilk++ scheduler is provably optimal.
∙ Cilk++ has a race detector (Cilkscreen) for debugging and software release.
∙ Keep in mind that platform comparisons are (and always will be) subject to debate.

Slide 10 — Work and Span
T_P = execution time on P processors
T_1 = work
T_∞ = span*

* Also called critical-path length or computational depth.

This note was uploaded on 12/27/2011 for the course CMPSC 240A taught by Professor Gilbert during the Fall '09 term at UCSB.
