cs240a-multicore

1 CS 240A: Shared Memory & Multicore Programming with Cilk++
Multicore and NUMA architectures
Multithreaded programming
Cilk++ as a concurrency platform
Work and span
Thanks to Charles E. Leiserson for some of these slides.

2 Multicore Architecture
[Figure: a Chip Multiprocessor (CMP): multiple cores, each with a private cache ($), connected by an on-chip network to shared memory and I/O]
3 cc-NUMA Architectures
[Figure: an AMD 8-way Opteron server]
A processor (CMP) with 2/4 cores
Memory bank local to a processor
Point-to-point interconnect

4 cc-NUMA Architectures
No front-side bus
Integrated memory controller
On-die interconnect among CMPs
Main memory is physically distributed among CMPs (i.e., each piece of memory has an affinity to a CMP)
NUMA: Non-Uniform Memory Access
§ For multi-socket servers only
§ Your desktop is safe (well, for now at least)
5 Desktop Multicores Today
This is your AMD Barcelona or Intel Core i7!
On-die interconnect
Private caches: cache coherence is required

6 Multithreaded Programming
POSIX Threads (Pthreads) is a set of threading interfaces standardized by the IEEE
The "assembly language" of shared-memory programming
The programmer has to manually:
§ Create and terminate threads
§ Wait for threads to complete
§ Manage interaction between threads using mutexes, condition variables, etc.
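To make the "manual" steps concrete, here is a minimal sketch (not from the slides) of that pattern in raw Pthreads; the worker function foo and the thread count are illustrative. Compile with -pthread.

    #include <pthread.h>
    #include <cstdio>

    // Hypothetical per-thread worker; its argument arrives as a void*.
    static void* foo(void* arg) {
        long i = reinterpret_cast<long>(arg);
        std::printf("worker %ld running\n", i);
        return nullptr;
    }

    int main() {
        const long N = 4;                    // illustrative thread count
        pthread_t threads[N];

        // Manual creation: one call per thread, argument passed through void*.
        for (long i = 0; i < N; ++i)
            pthread_create(&threads[i], nullptr, foo, reinterpret_cast<void*>(i));

        // Manual completion handling: join every thread explicitly.
        for (long i = 0; i < N; ++i)
            pthread_join(threads[i], nullptr);

        return 0;
    }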
7 Concurrency Platforms
Programming directly on Pthreads is painful and error-prone.
With Pthreads, you either sacrifice memory usage or load balance among processors.
A concurrency platform provides linguistic support and handles load balancing.
Examples:
§ Threading Building Blocks (TBB)
§ OpenMP
§ Cilk++
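As one illustration of what a concurrency platform buys you (a sketch, not from the slides), the OpenMP version of a parallel loop is a single annotation: the runtime creates the threads, partitions the iterations, and combines the partial results. work() and the bounds are placeholders. Compile with -fopenmp.

    #include <cstdio>

    // Placeholder for the per-iteration work.
    static double work(int i) { return i * 0.5; }

    int main() {
        const int n = 1000000;
        double sum = 0.0;

        // The OpenMP runtime handles thread creation, load balancing,
        // and the reduction; there is no explicit create/join code.
        #pragma omp parallel for reduction(+ : sum)
        for (int i = 0; i < n; ++i)
            sum += work(i);

        std::printf("sum = %f\n", sum);
        return 0;
    }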

8 Cilk vs. Pthreads
How will the following code execute in Pthreads? In Cilk?

    for (i = 1; i < 1000000000; i++) {
        spawn-or-fork foo(i);
    }
    sync-or-join;

What if foo contains code that waits (e.g., spins) on a variable being set by another instance of foo?
They have different liveness properties:
§ Cilk threads are spawned lazily ("may" parallelism)
§ Pthreads are spawned eagerly ("must" parallelism)
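In concrete Cilk++/Cilk Plus syntax, the "may" version of the loop above looks roughly like the sketch below (foo is still a placeholder). A cilk_spawn only records that the call is allowed to run in parallel; idle workers steal work lazily, so a billion spawns never means a billion OS threads, whereas a pthread_create per iteration would try to make them all.

    #include <cilk/cilk.h>

    void foo(long i);            // placeholder for the per-iteration work

    void run_all() {
        for (long i = 1; i < 1000000000; ++i) {
            cilk_spawn foo(i);   // "may" parallelism: a worker may steal this call
        }
        cilk_sync;               // wait for every spawned foo(i) to finish
    }

In practice this loop would usually be written with cilk_for, which recursively divides the iteration range instead of spawning a long linear chain.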
9 Cilk vs. OpenMP
Cilk++ guarantees space bounds
§ On P processors, Cilk++ uses no more than P times the stack space of a serial execution.
Cilk++ has a solution for global variables (called "reducers" / "hyperobjects")
Cilk++ has nested parallelism that works and provides guaranteed speed-up.
§ Indeed, the Cilk scheduler is provably optimal.
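For reference, a reducer (hyperobject) replaces a shared global accumulator so that parallel strands never race on it. The sketch below uses the reducer_opadd class from the Intel Cilk Plus reducer library, assuming that header and API are available; the loop body is illustrative.

    #include <cilk/cilk.h>
    #include <cilk/reducer_opadd.h>
    #include <cstdio>

    int main() {
        // Each worker updates a private view of the sum; the views are
        // combined with + when strands join, so there is no data race
        // and no lock around the "global" accumulator.
        cilk::reducer_opadd<long long> sum(0);

        cilk_for (long i = 0; i < 1000000; ++i) {
            sum += i;                          // updates the worker-local view
        }

        std::printf("sum = %lld\n", sum.get_value());
        return 0;
    }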