… John Markoff, New York Times, May 17, 2004
CMU 15-418/618, Spring 2016
ILP tapped out + end of frequency scaling
▪ No further benefit from ILP
▪ Processor clock rate stops increasing
(Figure: transistor density, clock frequency, instruction-level parallelism (ILP), and power plotted over time. Image credit: "The Free Lunch Is Over" by Herb Sutter, Dr. Dobb's Journal, 2005)
CMU 15-418/618, Fall 2019
Programmer's Perspective on Performance
Question: How do you make your program run faster?
Answer before 2004:
- Just wait 6 months, and buy a new machine!
- (Or if you're really obsessed, you can learn about parallelism.)
Answer after 2004:
- You need to write parallel software.
Parallel Machines Today
Examples from Apple's product line (images from apple.com):
- Mac Pro: 28 Intel Xeon W cores
- iMac Pro: 18 Intel Xeon W cores
- MacBook Pro Retina 15": 8 Intel Core i9 cores
- iPad Pro: 8 A12X cores (4 fast + 4 low-power)
- iPhone XS: 6 A12 cores (2 fast + 4 low-power)
Intel Coffee Lake-S Core i9 (2019)
8-core CPU + multi-core GPU integrated on one chip
Intel Xeon Phi 7120A "coprocessor"
▪ 61 "simple" x86 cores (1.3 GHz, derived from Pentium)
▪ Targeted as an accelerator for supercomputing applications
NVIDIA GeForce GTX 1660 Ti GPU (2019)
24 major processing blocks (but much, much more parallelism available... details coming next class)
Mobile parallel processing
Power constraints heavily influence the design of mobile systems
Apple A12 (in iPhone XS): 6-core CPU + GPU + image processor and more on one chip
Supercomputing
▪ Today: clusters of multi-core CPUs + GPUs
▪ Oak Ridge Lab's Summit (fastest supercomputer in the world)
- 4,608 nodes, each containing two 22-core IBM Power9 CPUs + 6 NVIDIA Volta V100 GPUs
What is a parallel computer?
One common definition
A parallel computer is a collection of processing elements that cooperate to solve problems quickly
We're going to use multiple processors to achieve this:
- We care about performance *
- We care about efficiency *
* Note: different motivation from "concurrent programming" using pthreads in 15-213
DEMO 1 (this semester's first parallel program)
Speedup
One major motivation for using parallel processing: achieve a speedup
For a given problem:
speedup(using P processors) = execution time (using 1 processor) / execution time (using P processors)
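The definition above can be made concrete by timing a serial and a parallel version of the same computation. The sketch below (not from the slides; all names are illustrative) uses Python's multiprocessing.Pool to sum a range of integers with P worker processes, then reports T(1)/T(P):

```python
# Illustrative sketch: measure speedup = T(1 processor) / T(P processors)
# on a simple sum. Function names are made up for this example.
import time
from multiprocessing import Pool

def partial_sum(bounds):
    lo, hi = bounds
    return sum(range(lo, hi))

def serial(n):
    return sum(range(n))

def parallel(n, p):
    # Split [0, n) into p contiguous chunks, one per worker process.
    chunk = n // p
    bounds = [(i * chunk, (i + 1) * chunk if i < p - 1 else n)
              for i in range(p)]
    with Pool(p) as pool:
        # Combining the partial sums is the "communication" step.
        return sum(pool.map(partial_sum, bounds))

def timed(f, *args):
    start = time.perf_counter()
    result = f(*args)
    return result, time.perf_counter() - start

if __name__ == "__main__":
    n, p = 20_000_000, 4
    r1, t1 = timed(serial, n)
    rp, tp = timed(parallel, n, p)
    assert r1 == rp
    print(f"speedup({p}) = {t1 / tp:.2f}x")
```

Note that the measured speedup is usually below P: process startup, chunking, and the final combine all add overhead that the serial version does not pay.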
Class observations from demo 1
▪ Communication limited the maximum speedup achieved
- In the demo, the communication was telling each other the partial sums
▪ Minimizing the cost of communication improves speedup
- Moving students ("processors") closer together (or letting them shout)
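One way to reduce the cost of combining partial sums, beyond making each message cheaper, is to restructure the communication itself: instead of one collector receiving P-1 sums one after another, pairs of "processors" combine in parallel rounds, finishing in about log2(P) steps. A minimal sketch of that tree reduction (illustrative only, not the demo's actual procedure):

```python
# Illustrative sketch: sequential combining of P partial sums takes P-1
# steps at one collector; a tree reduction needs only ceil(log2(P)) rounds,
# because disjoint pairs can combine at the same time.
import math

def sequential_combine_steps(p):
    return p - 1

def tree_combine_rounds(p):
    return math.ceil(math.log2(p))

def tree_reduce(values):
    # Each round, neighboring pairs add their sums; an odd element waits.
    vals = list(values)
    while len(vals) > 1:
        paired = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            paired.append(vals[-1])
        vals = paired
    return vals[0]

print(sequential_combine_steps(8))  # 7 sequential steps
print(tree_combine_rounds(8))       # 3 parallel rounds
```

For 8 partial sums this cuts the combine phase from 7 sequential steps to 3 rounds, which is exactly why "moving students closer together" (cheaper steps) and restructuring who talks to whom both help.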
DEMO 2 (scaling up to four "processors")
Class observations from demo 2
▪ Imbalance in work assignment limited speedup
- Some students ("processors") ran out of work to do (went idle), while others were still working on their assigned task
▪ Improving the distribution of work improved speedup
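The imbalance effect is easy to quantify: in a parallel phase, the finish time is set by the most heavily loaded processor, so a static block assignment loses badly when per-item costs are uneven. The sketch below (illustrative, not the demo's actual assignment) compares a block split against a cyclic (interleaved) split of the same work:

```python
# Illustrative sketch: parallel finish time = max total cost over processors.
# With uneven per-item costs, a block assignment piles the expensive items
# onto one processor; a cyclic assignment spreads them out.
def finish_time(assignment):
    return max(sum(chunk) for chunk in assignment)

def block_assign(costs, p):
    chunk = (len(costs) + p - 1) // p
    return [costs[i * chunk:(i + 1) * chunk] for i in range(p)]

def cyclic_assign(costs, p):
    return [costs[i::p] for i in range(p)]

# Second half of the items is 10x more expensive than the first half.
costs = [1] * 8 + [10] * 8
print(finish_time(block_assign(costs, 2)))   # 80: one processor gets all the 10s
print(finish_time(cyclic_assign(costs, 2)))  # 44: expensive items spread evenly
```

With two processors, the block split finishes in 80 cost units while the cyclic split finishes in 44, so the first processor in the block split sits idle for nearly half the run, just as some students did in the demo.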
DEMO 3 (massively parallel execution)
Class observations from demo 3