Carnegie Mellon Parallel Computing Notes on Lecture 4



Decomposition by loop iteration

Version 1 (programmer-managed assignment): static assignment, with iterations assigned to program instances in interleaved fashion.

```c
export void sinx(
    uniform int N,
    uniform int terms,
    uniform float* x,
    uniform float* result)
{
    // assumes N % programCount == 0
    for (uniform int i = 0; i < N; i += programCount)
    {
        int idx = i + programIndex;
        float value = x[idx];
        float numer = x[idx] * x[idx] * x[idx];
        uniform int denom = 6;  // 3!
        uniform int sign = -1;
        for (uniform int j = 1; j <= terms; j++)
        {
            value += sign * numer / denom;
            numer *= x[idx] * x[idx];
            denom *= (2*j+2) * (2*j+3);
            sign *= -1;
        }
        result[idx] = value;
    }
}
```

Version 2 (system-managed assignment): the foreach construct exposes independent work to the system; the system manages the assignment of iterations (work) to instances.

```c
export void sinx(
    uniform int N,
    uniform int terms,
    uniform float* x,
    uniform float* result)
{
    foreach (i = 0 ... N)
    {
        float value = x[i];
        float numer = x[i] * x[i] * x[i];
        uniform int denom = 6;  // 3!
        uniform int sign = -1;
        for (uniform int j = 1; j <= terms; j++)
        {
            value += sign * numer / denom;
            numer *= x[i] * x[i];
            denom *= (2*j+2) * (2*j+3);
            sign *= -1;
        }
        result[i] = value;
    }
}
```

CMU 15-418, Spring 2014

Problem to solve → Decomposition → Subproblems (a.k.a. “tasks”, “work to do”) → Assignment → Parallel threads** (“workers”) → Orchestration → Parallel program (communicating threads) → Mapping → Execution on parallel machine
(** I had to pick a term)

Orchestration

▪ Involves:
- Structuring communication
- Adding synchronization to preserve dependencies
- Organizing data structures in memory
- Scheduling tasks
▪ Goals: reduce costs of communication/sync, preserve locality of data reference, reduce overhead, etc.
▪ Machine details impact many of these decisions
- If synchronization is expensive, might use it more sparingly

Mapping

▪ Mapping “threads” to hardware execution units
▪ Traditionally, a responsibility of the OS
- e.g., map a kernel thread to a CPU core execution context
▪ Counter-example 1: mapping ISPC program instances to vector instruction lanes
▪ Counter-example 2: mapping CUDA thread blocks to GPU cores (future lecture)
▪ Some interesting mapping decisions...
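As a sanity check on the interleaved static assignment (not part of the lecture), here is a sequential Python sketch that emulates it: each "instance" handles iterations programIndex, programIndex + programCount, programIndex + 2*programCount, ..., computing the same truncated Taylor series for sin(x). The names `sinx_instance`, `program_count`, and `program_index` are illustrative stand-ins, not ISPC API; in real ISPC the lanes run concurrently rather than in a Python loop.

```python
import math

def sinx_instance(program_index, program_count, N, terms, x, result):
    # Instance p handles iterations p, p + programCount, p + 2*programCount, ...
    # (interleaved static assignment; assumes N % program_count == 0)
    for idx in range(program_index, N, program_count):
        value = x[idx]
        numer = x[idx] ** 3
        denom = 6          # 3!
        sign = -1
        for j in range(1, terms + 1):
            value += sign * numer / denom       # add next Taylor term
            numer *= x[idx] * x[idx]            # x^3 -> x^5 -> x^7 ...
            denom *= (2 * j + 2) * (2 * j + 3)  # 3! -> 5! -> 7! ...
            sign *= -1                          # alternate term signs
        result[idx] = value

program_count = 4
N = 8
xs = [k * math.pi / 8 for k in range(N)]
result = [0.0] * N
for p in range(program_count):  # in ISPC these lanes run concurrently
    sinx_instance(p, program_count, N, 10, xs, result)
```

With 10 terms the truncated series agrees with math.sin to well below 1e-6 on [0, π), which also confirms the denom/sign update logic in the ISPC listings.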

This document was uploaded on 03/19/2014 for the course CMP 15-418 at Carnegie Mellon.
