# Carnegie Mellon Parallel Computing Notes on Lecture 4

Parallel Computer Architecture and Programming
CMU 15-418/15-618, Spring 2014
Lecture 4: Parallel Programming Basics

## Tunes

YACHT, “Tripped and Fell in Love” (Shangri-La)

“I was so blown away by the experience of speeding my programs up by more than a factor of eight, I ran right to the keyboard and just had to compose a song.”
- Claire Evans, on her tribute to writing her first parallel program
## Quiz

```c
export void sinx(
    uniform int N,
    uniform int terms,
    uniform float* x,
    uniform float* result)
{
    // assume N % programCount == 0
    for (uniform int i = 0; i < N; i += programCount) {
        int idx = i + programIndex;
        float value = x[idx];
        float numer = x[idx] * x[idx] * x[idx];
        uniform int denom = 6;  // 3!
        uniform int sign = -1;

        for (uniform int j = 1; j <= terms; j++) {
            value += sign * numer / denom;
            numer *= x[idx] * x[idx];
            denom *= (2 * j + 2) * (2 * j + 3);
            sign *= -1;
        }
        result[idx] = value;
    }
}
```

This is an ISPC function. It contains a loop nest. Which iterations of the loop(s) are parallelized by ISPC? Which are not?
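One way to reason about the quiz is to model ISPC's gang execution in plain C. The sketch below is a sequential simulation (assuming a hypothetical gang width of 8), not how ISPC actually executes: the `pi` loop stands in for the gang of program instances, each of which gets a different `idx` per outer iteration, while the inner `j` loop is uniform and runs identically in every instance.

```c
#include <assert.h>
#include <math.h>

/* Sequential C sketch of ISPC gang semantics for the sinx quiz.
   Assumption: gang width (programCount) of 8, chosen for illustration. */
#define PROGRAM_COUNT 8

static void sinx_gang_sketch(int N, int terms, const float* x, float* result) {
    /* Outer loop strides by the gang width (uniform in ISPC). */
    for (int i = 0; i < N; i += PROGRAM_COUNT) {
        /* This loop models the gang: ISPC would run these PROGRAM_COUNT
           instances in parallel (e.g., as SIMD lanes), one per programIndex. */
        for (int pi = 0; pi < PROGRAM_COUNT; pi++) {
            int idx = i + pi;
            float value = x[idx];
            float numer = x[idx] * x[idx] * x[idx];
            int denom = 6;   /* 3! */
            int sign = -1;
            /* Inner loop is uniform: every instance runs the same j sequence. */
            for (int j = 1; j <= terms; j++) {
                value += sign * numer / denom;
                numer *= x[idx] * x[idx];
                denom *= (2 * j + 2) * (2 * j + 3);
                sign *= -1;
            }
            result[idx] = value;
        }
    }
}
```

Within one outer iteration, all `idx` values `i .. i + PROGRAM_COUNT - 1` are processed concurrently in real ISPC; successive outer iterations are still sequential.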

## Creating a parallel program

Thought process:
1. Identify work that can be performed in parallel
2. Partition work (and also data associated with the work)
3. Manage data access, communication, and synchronization

Recall one of our main goals is speedup.* For a fixed computation:

    Speedup(P processors) = Time(1 processor) / Time(P processors)

* Other goals include high efficiency (cost, area, power, etc.) or working on bigger problems than can fit on one machine.
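The speedup definition above, plus the efficiency goal it mentions (speedup per processor), can be sketched as a small calculation; the wall-clock times used in the check are hypothetical, for illustration only:

```c
#include <assert.h>

/* Speedup for a fixed computation: 1-processor time over P-processor time. */
static double speedup(double time_1proc, double time_pproc) {
    return time_1proc / time_pproc;
}

/* Parallel efficiency: speedup divided by processor count (1.0 = ideal). */
static double efficiency(double time_1proc, double time_pproc, int p) {
    return speedup(time_1proc, time_pproc) / p;
}
```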
## Steps in creating a parallel program

Problem to solve
  --(Decomposition)--> Subproblems (a.k.a. “tasks”, “work to do”)
  --(Assignment)--> Parallel threads** (“workers”)
  --(Orchestration)--> Parallel program (communicating threads)
  --(Mapping)--> Execution on parallel machine

These steps are performed by the programmer, by the system (compiler, runtime, hardware), or by both!

** I had to pick a term

## Decomposition

Break up problem into tasks that can be carried out in parallel.
- Decomposition need not happen statically
- New tasks can be identified as program executes

Main idea: create at least enough tasks to keep all execution units on a machine busy.

Key aspect of decomposition: identifying dependencies (or... a lack of dependencies).
## Amdahl’s Law: dependencies limit maximum speedup due to parallelism

You run your favorite sequential program...

Let S = the fraction of total execution that is inherently sequential (dependencies prevent parallel execution).

Then the maximum speedup due to parallel execution is 1/S.
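The 1/S bound follows from the standard form of Amdahl's Law: with P processors, the parallel fraction (1 - S) runs P times faster while the sequential fraction S does not shrink. A minimal sketch, with S and P values made up for illustration:

```c
#include <assert.h>
#include <math.h>

/* Amdahl's Law: if a fraction s of execution is inherently sequential and
   the remaining (1 - s) is sped up perfectly across p processors, then
   speedup(s, p) = 1 / (s + (1 - s) / p), which approaches 1/s as p grows. */
static double amdahl_speedup(double s, int p) {
    return 1.0 / (s + (1.0 - s) / p);
}
```

For example, with s = 0.1, even a million processors cannot push speedup past 1/0.1 = 10.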

## A simple example

Consider a two-step computation on an N-by-N image:
- Step 1: double brightness of all pixels (independent computation on each grid element)
- Step 2: compute average of all pixel values

Sequential implementation of program: both steps take ~N^2 time, so total time is ~2N^2.

[Figure: execution time vs. parallelism; each step is an N^2-long phase at parallelism 1]
## First attempt at parallelism (P processors)

Strategy:
- Step 1: execute in parallel (time for phase 1: N^2/P)
- Step 2: execute serially (time for phase 2: N^2)

Overall performance: Speedup ≤ 2

[Figure: execution time; phase 1 shrinks to N^2/P, phase 2 remains N^2]
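The bound can be checked numerically: sequential time is 2N^2, parallel time is N^2/P + N^2, and the N^2 factors cancel, giving speedup = 2P/(P + 1), which never reaches 2. A sketch:

```c
#include <assert.h>
#include <math.h>

/* Speedup of the two-step image computation when only step 1 (N^2 work) is
   parallelized across p processors and step 2 (N^2 work) stays serial:
   speedup = 2N^2 / (N^2/p + N^2) = 2 / (1/p + 1) = 2p / (p + 1). */
static double two_step_speedup(int p) {
    return 2.0 / (1.0 / p + 1.0);
}
```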

