
CS 240A: Shared Memory & Multicore Programming with Cilk++

- Multicore and NUMA architectures
- Multithreaded programming
- Cilk++ as a concurrency platform
- Work and span

Thanks to Charles E. Leiserson for some of these slides.

Multicore Architecture

A chip multiprocessor (CMP) places several cores on one chip. Each core has a private cache ($), and the cores share memory and I/O through an on-chip network.

cc-NUMA Architectures

Example: an 8-way AMD Opteron server (neumann@cs.ucsb.edu). Each processor is a CMP with 2 or 4 cores and a memory bank local to it; the processors communicate over a point-to-point interconnect.

- No front-side bus
- Integrated memory controller
- On-die interconnect among CMPs
- Main memory is physically distributed among the CMPs (i.e., each piece of memory has an affinity to a CMP)

NUMA stands for non-uniform memory access. It applies to multi-socket servers only: your desktop is safe (well, for now at least), and the Triton nodes are not NUMA either.

Desktop Multicores Today

This is your AMD Barcelona or Intel Core i7: an on-die interconnect with private caches, so cache coherence is required.

Multithreaded Programming

POSIX Threads (Pthreads) is a set of threading interfaces developed by the IEEE. It is the assembly language of shared-memory programming: the programmer has to manually

- create and terminate threads,
- wait for threads to complete, and
- manage interaction between threads using mutexes, condition variables, etc.

Concurrency Platforms

Programming directly on Pthreads is painful and error-prone: you sacrifice either memory usage or load balance among processors. A concurrency platform provides linguistic support and handles load balancing. Examples: Threading Building Blocks (TBB), OpenMP, Cilk++.

Cilk vs. Pthreads

How will the following code execute in Pthreads? In Cilk?

    for (i = 1; i < 1000000000; i++) {
        spawn-or-fork foo(i);
    }
    sync-or-join;

What if foo contains code that waits (e.g., spins) on a variable being set by another instance of foo? The two have different liveness properties: Cilk threads are spawned lazily ("may" parallelism), whereas Pthreads are spawned eagerly ("must" parallelism).

Cilk vs. OpenMP

- Cilk++ guarantees space bounds: on P processors, Cilk++ uses no more than P times the stack space of a serial execution.
- Cilk++ has a solution for global variables, called reducers (hyperobjects).
- Cilk++ has nested parallelism that works and provides guaranteed speedup; indeed, the Cilk scheduler is provably optimal.
- Cilk++ has a race detector (cilkscreen) for debugging and software release.

Keep in mind that platform comparisons are (and always will be) subject to debate.

Work and Span

- T_P = execution time on P processors
- T_1 = work
- T_∞ = span (also called critical-path length or computational depth)