
John Markoff, New York Times, May 17, 2004
CMU 15-418/618, Spring 2016

ILP tapped out + end of frequency scaling
- No further benefit from ILP
- Processor clock rate stops increasing
[Chart legend: transistor density, clock frequency, instruction-level parallelism (ILP), power]
Image credit: "The Free Lunch Is Over" by Herb Sutter, Dr. Dobb's, 2005
CMU 15-418/618, Fall 2019

Programmer's Perspective on Performance
Question: How do you make your program run faster?
Answer before 2004:
- Just wait 6 months, and buy a new machine!
- (Or if you're really obsessed, you can learn about parallelism.)
Answer after 2004:
- You need to write parallel software.
Parallel Machines Today
Examples from Apple's product line (images from apple.com):
- Mac Pro: 28 Intel Xeon W cores
- iMac Pro: 18 Intel Xeon W cores
- MacBook Pro Retina 15": 8 Intel Core i9 cores
- iPad Pro: 8 A12X cores (4 fast + 4 low-power)
- iPhone XS: 6 A12 cores (2 fast + 4 low-power)
Intel Coffee Lake-S Core i9 (2019)
8-core CPU + multi-core GPU integrated on one chip
Intel Xeon Phi 7120A
A "coprocessor": 61 "simple" x86 cores (1.3 GHz, derived from Pentium)
Targeted as an accelerator for supercomputing applications
NVIDIA GeForce GTX 1660 Ti GPU (2019)
24 major processing blocks (but much, much more parallelism available... details coming next class)
Mobile parallel processing
Power constraints heavily influence the design of mobile systems.
Apple A12 (in iPhone XS): 6-core CPU + GPU + image processor and more on one chip
Supercomputing
Today: clusters of multi-core CPUs + GPUs
Oak Ridge Lab's Summit (fastest supercomputer in the world):
- 4,608 nodes, each containing:
  - two 22-core IBM Power9 CPUs + 6 NVIDIA Volta V100 GPUs
What is a parallel computer?
One common definition
A parallel computer is a collection of processing elements that cooperate to solve problems quickly.
We're going to use multiple processors to get that speed:
- We care about performance*
- We care about efficiency
*Note: different motivation from "concurrent programming" using pthreads in 15-213
DEMO 1 (this semester's first parallel program)
Speedup
One major motivation for using parallel processing: achieving a speedup.
For a given problem:

    speedup(using P processors) = execution time (using 1 processor) / execution time (using P processors)
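The definition above is just a ratio of measured times, so it can be written as a tiny helper. This is a sketch with my own names (`speedup`, `efficiency`); the efficiency formula, speedup divided by processor count, is the standard companion to the slide's definition, not something the slide spells out:

```cpp
// speedup(P) = T(1) / T(P); efficiency(P) = speedup(P) / P.
// Any times plugged in are hypothetical measurements, for illustration only.
double speedup(double time_one_proc, double time_p_procs) {
    return time_one_proc / time_p_procs;
}

double efficiency(double time_one_proc, double time_p_procs, int p) {
    return speedup(time_one_proc, time_p_procs) / p;
}
```

For example, a program taking 10 s on one processor and 2.5 s on eight has a speedup of 4x but an efficiency of only 0.5.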
Class observations from demo 1
Communication limited the maximum speedup achieved.
- In the demo, the communication was telling each other the partial sums.
Minimizing the cost of communication improves speedup.
- Moving students ("processors") closer together (or letting them shout).
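The demo's partial-sum strategy can be sketched in C++ (the slides give no code, so this structure and the name `parallel_sum` are assumptions): each thread sums its own chunk with no communication, and the only communication is the final combine of the partial sums, the step the slide identifies as the speedup limiter.

```cpp
#include <algorithm>
#include <numeric>
#include <thread>
#include <vector>

// Each "processor" (thread) sums a contiguous chunk into its own slot.
// The accumulate at the end is the communication step from the demo.
long parallel_sum(const std::vector<long>& data, int nthreads) {
    std::vector<long> partial(nthreads, 0);
    std::vector<std::thread> workers;
    long n = static_cast<long>(data.size());
    long chunk = (n + nthreads - 1) / nthreads;
    for (int t = 0; t < nthreads; t++) {
        workers.emplace_back([&, t] {
            long begin = t * chunk;
            long end = std::min(begin + chunk, n);
            for (long i = begin; i < end; i++)
                partial[t] += data[i];  // purely local work: no communication
        });
    }
    for (auto& w : workers) w.join();
    // Communication: gather the partial sums into the final answer.
    return std::accumulate(partial.begin(), partial.end(), 0L);
}
```

Each thread writes only its own `partial[t]` slot, so no locking is needed until the sequential combine at the end.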
DEMO 2 (scaling up to four "processors")
Class observations from demo 2
Imbalance in work assignment limited speedup.
- Some students ("processors") ran out of work to do (went idle), while others were still working on their assigned task.
Improving the distribution of work improved speedup.
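One common fix for the imbalance observed above is dynamic work assignment. This is a hypothetical sketch, not code from the lecture: instead of statically handing each thread a fixed slice of tasks (so threads with cheap tasks go idle), threads repeatedly claim the next task index from a shared atomic counter, so fast threads keep pulling work until everything is done.

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Dynamic assignment: each thread grabs the next unclaimed task index.
// `work` is called concurrently, so it must be safe to run from many threads.
void run_tasks_dynamic(int ntasks, int nthreads,
                       const std::function<void(int)>& work) {
    std::atomic<int> next{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; t++) {
        workers.emplace_back([&] {
            int i;
            while ((i = next.fetch_add(1)) < ntasks)
                work(i);  // fetch_add ensures each task is claimed exactly once
        });
    }
    for (auto& w : workers) w.join();
}
```

The trade-off is one atomic operation per task, so this pays off when tasks are uneven or large relative to that overhead.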
DEMO 3 (massively parallel execution)
Class observations from demo 3