cs140-multicoreintro

cs140-multicoreintro - CS 140 : Jan 31 Feb 7, 2011...

1 CS 140 : Jan 31 – Feb 7, 2011
Multicore (and Shared Memory) Programming with Cilk++
• Multicore and NUMA architectures
• Multithreaded programming
• Cilk++ as a concurrency platform
• Divide and conquer paradigm for Cilk++
Thanks to Charles E. Leiserson for some of these slides
2 Multicore Architecture
[Figure: Chip Multiprocessor (CMP). Several cores, each with its own cache ($), connected through a network to memory and I/O.]
3 cc-NUMA Architectures
[Figure: AMD 8-way Opteron Server (neumann@cs.ucsb.edu). Labels: CMP with 4 cores; memory bank local to a CMP.]
4 cc-NUMA Architectures
• No Front Side Bus
• Integrated memory controller
• On-die interconnect among CMPs
• Main memory is physically distributed among CMPs; each piece of memory has an affinity to one CMP
• NUMA: Non-uniform memory access
  • For multi-socket servers only
  • Your laptop is safe (well, for now at least)
  • Triton nodes are also NUMA!
5 Desktop Multicores Today
This is your AMD Shanghai or Intel Core i7 (Nehalem)!
[Figure labels: on-chip interconnect; private caches, so cache coherence is required.]
6 Multithreaded Programming
• A thread of execution results from forking a computer program into two or more concurrently running tasks.
• POSIX Threads (Pthreads) is a set of threading interfaces developed by the IEEE.
• Pthreads is the "assembly language" of shared-memory programming.
• The programmer has to manually:
  • Create and terminate threads
  • Wait for threads to complete
  • Manage the interaction between threads using mutexes, condition variables, etc.
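The manual steps listed above can be sketched in C with Pthreads. This is a minimal illustration, not code from the lecture: the names NUM_THREADS, struct slice, sum_slice, and parallel_sum are hypothetical, and the fixed thread count is an assumption made to keep the example short. It shows thread creation, waiting for completion with pthread_join, and a mutex guarding a shared total.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4  /* hypothetical thread count, chosen for illustration */

/* Shared state: a running total protected by a mutex. */
static long total = 0;
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical per-thread work description: half-open range [begin, end). */
struct slice {
    const int *data;
    size_t begin;
    size_t end;
};

/* Thread body: sum one slice locally, then add it to the shared total. */
static void *sum_slice(void *arg)
{
    const struct slice *s = arg;
    long local = 0;
    for (size_t i = s->begin; i < s->end; i++)
        local += s->data[i];

    pthread_mutex_lock(&total_lock);   /* manage interaction between threads */
    total += local;
    pthread_mutex_unlock(&total_lock);
    return NULL;
}

/* Not reentrant: it uses the global total, so call it from one thread at a time. */
long parallel_sum(const int *data, size_t n)
{
    pthread_t threads[NUM_THREADS];
    struct slice slices[NUM_THREADS];

    total = 0;
    for (size_t t = 0; t < NUM_THREADS; t++) {
        slices[t].data = data;
        slices[t].begin = n * t / NUM_THREADS;
        slices[t].end = n * (t + 1) / NUM_THREADS;
        /* Create the thread by hand. */
        int rc = pthread_create(&threads[t], NULL, sum_slice, &slices[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (size_t t = 0; t < NUM_THREADS; t++) {
        /* Wait for each thread to complete. */
        int rc = pthread_join(threads[t], NULL);
        assert(rc == 0);
        (void)rc;
    }
    return total;
}
```

Calling parallel_sum on an array splits it into NUM_THREADS contiguous slices. Note that the partitioning is static: if some slices took longer than others, nothing here would rebalance the work. On most systems this is compiled with the -pthread flag.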
7 Concurrency Platforms
• Programming directly on PThreads is painful and error-prone.
• With PThreads, you either sacrifice memory usage or load-balance among processors.
• A concurrency platform provides linguistic support and handles load balancing.
• Examples:
  • Threading Building Blocks (TBB)
  • OpenMP
  • Cilk++
8 Cilk vs. PThreads
How will the following code execute in PThreads? In Cilk?

    for (i=1; i<1000000000; i++) {
        spawn-or-fork foo(i);
    }
    sync-or-join;

What if foo contains code that waits (e.g., spins) on a variable being set by another instance of foo?
This difference is a liveness property:
• Cilk threads are spawned lazily, “may” parallelism
• PThreads are spawned eagerly, “must” parallelism
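The spinning scenario can be made concrete with a small Pthreads sketch. It is an illustration under stated assumptions, not code from the lecture: waiter, setter, and run_spin_pair are hypothetical names, and only two instances are used instead of the loop above. Because Pthreads creates both threads eagerly and the operating system schedules them preemptively, the waiter is expected to see the flag eventually.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Flag that one instance sets and the other spins on. */
static atomic_int flag;

/* Hypothetical "foo" instance that waits on a variable set by another instance. */
static void *waiter(void *arg)
{
    (void)arg;
    while (atomic_load(&flag) == 0)
        ;  /* spin until the other thread sets the flag */
    return NULL;
}

/* Hypothetical "foo" instance that sets the variable. */
static void *setter(void *arg)
{
    (void)arg;
    atomic_store(&flag, 1);
    return NULL;
}

/* Starts the waiter first, then the setter, and joins both. */
int run_spin_pair(void)
{
    pthread_t w, s;
    int rc;

    atomic_store(&flag, 0);
    rc = pthread_create(&w, NULL, waiter, NULL);  /* eagerly created */
    assert(rc == 0);
    rc = pthread_create(&s, NULL, setter, NULL);  /* eagerly created */
    assert(rc == 0);
    rc = pthread_join(w, NULL);
    assert(rc == 0);
    rc = pthread_join(s, NULL);
    assert(rc == 0);
    (void)rc;
    return atomic_load(&flag);
}
```

With lazy ("may") parallelism, a spawn only permits parallel execution. If a single worker ran the waiter first, the setter might never be scheduled and the spin loop would not terminate, which is why code written this way relies on "must" parallelism.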
9 Cilk vs. OpenMP
• Cilk++ guarantees space bounds. On P processors, Cilk++ uses no more