Carnegie Mellon Parallel Computing: Notes on Lecture 4

# Parallelizing a Simple Two-Phase Program



*Notes from CMU 15-418, Spring 2014.*

## A simple two-phase program

Consider a program that operates on an N×N grid of pixels in two steps:

- Step 1: perform an independent computation on each grid element
- Step 2: compute the average of all pixel values

Sequential implementation: both steps take ~N² time, so the total time is ~2N².

*(Figure: execution profile of the sequential program — two phases of ~N² time each.)*

## First attempt at parallelism (P processors)

Strategy:

- Step 1: execute in parallel — time for phase 1: N²/P
- Step 2: execute serially — time for phase 2: N²

Overall performance:

Speedup = 2N² / (N²/P + N²), so Speedup ≤ 2

*(Figure: execution profiles of the sequential program vs. the parallel program — parallelism P during phase 1, then 1 during the serial phase 2.)*

## Parallelizing step 2

Strategy:

- Step 1: execute in parallel — time for phase 1: N²/P
- Step 2: compute partial sums in parallel, combine results serially — time for phase 2: N²/P + P

Overall performance:

Speedup = 2N² / (2N²/P + P)

Overhead: combining the partial sums. Note: speedup → P when N >> P.

*(Figure: execution profile of the parallel program — both phases run at parallelism P, followed by a short serial combine of length P.)*

## Amdahl's law

Let S = the fraction of the program's sequential execution that is inherently sequential. The maximum speedup on P processors is:

Speedup ≤ 1 / (S + (1 − S)/P)

*(Figure: maximum speedup vs. number of processors for S = 0.01, 0.05, and 0.1.)*

## Decomposition

- Who is responsible for performing decomposition? In many cases: the programmer.
- Automatic decomposition of sequential programs continues to be a challenging research problem (very difficult in the general case):
  - The compiler must analyze the program and identify dependencies.
  - What if dependencies are data-dependent (not known at compile time)?
  - Researchers have had modest success with simple loops and loop nests.
  - The "magic parallelizing compiler" for complex, general-purpose code has not yet been achieved.

## Assignment

Pipeline:

Problem to solve → (Decomposition) → Subproblems (a.k.a. "tasks", "work to do") → (Assignment) → Parallel threads** ("workers") → (Orchestration) → Parallel program (communicating threads) → (Mapping) → Execution on parallel machine

** I had to pick a term.

- Assignment means assigning tasks to threads; think of the threads as "workers".
- Goals: balance workload, reduce communication costs.
- Assignment can be performed statically, or dynamically during execution.
- While the programmer is often responsible for decomposition, many languages/runtimes take responsibility for assignment.

## Assignment example in ISPC

```c
export void sinx(
    uniform int N,
    uniform int terms,
    uniform float* x,
    uniform float* result)
{
    // assumes N % programCount == 0
    for (uniform int i = 0; i < N; i += programCount)
    {
        int idx = i + programIndex;
        float value = x[idx];
        float numer = x[idx] * x[idx] * x[idx];
        // ... (remainder of the example truncated in the source)
```
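The speedup arithmetic for the two parallelization strategies above can be checked numerically. This is a minimal sketch; the function names `speedup_step1_only` and `speedup_both_steps` are mine, not from the lecture:

```python
def speedup_step1_only(N, P):
    """Speedup when only step 1 is parallelized:
    sequential time 2*N^2 vs. parallel time N^2/P + N^2."""
    return (2 * N * N) / (N * N / P + N * N)

def speedup_both_steps(N, P):
    """Speedup when both steps are parallelized and the P partial
    sums of step 2 are combined serially:
    sequential time 2*N^2 vs. parallel time 2*N^2/P + P."""
    return (2 * N * N) / (2 * N * N / P + P)

# With only step 1 parallel, the serial step 2 caps speedup below 2.
print(speedup_step1_only(1000, 64))   # ~1.97, never reaches 2
# With both steps parallel and N >> P, speedup approaches P.
print(speedup_both_steps(1000, 64))   # ~63.9
```

Note how `speedup_step1_only` is independent of N: the bound of 2 comes purely from half the work being serial, which is exactly the situation Amdahl's law generalizes.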
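Amdahl's law is easy to tabulate for the S values shown in the lecture's plot. A small sketch, assuming the standard form of the law (the name `amdahl_max_speedup` is mine):

```python
def amdahl_max_speedup(S, P):
    """Upper bound on speedup: sequential fraction S, P processors."""
    return 1.0 / (S + (1.0 - S) / P)

# Even with 1000 processors, a 1% sequential fraction caps speedup
# below the asymptotic limit of 1/S = 100.
for S in (0.01, 0.05, 0.1):
    print(S, round(amdahl_max_speedup(S, 1000), 1))
```

As P grows without bound the (1 − S)/P term vanishes, so the speedup can never exceed 1/S regardless of processor count.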
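The partial-sum strategy for step 2 can be sketched with ordinary threads. Python's `threading` module stands in here for the P processors; all names are mine, and Python's global interpreter lock means this illustrates the decomposition and serial combine rather than a real speedup:

```python
import threading

def parallel_average(pixels, P):
    """Average of all pixel values: each of P workers computes a
    partial sum over its slice, then the partials are combined serially."""
    n = len(pixels)
    partials = [0.0] * P

    def worker(w):
        # Each worker sums a contiguous chunk of ~n/P elements.
        lo = w * n // P
        hi = (w + 1) * n // P
        partials[w] = sum(pixels[lo:hi])

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(P)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Serial combine of P partial sums: the "+ P" term in the phase-2 time.
    return sum(partials) / n

print(parallel_average(list(range(16)), 4))  # 7.5
```

Each worker writes only its own slot of `partials`, so no lock is needed; the serial combine is the O(P) overhead that the lecture's phase-2 cost model charges.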

This document was uploaded on 03/19/2014 for the course 15-418 at Carnegie Mellon.
