• ...ete per second on this array?
• How long will it take to do a single large read on this array?
• How long will it take to do a single large write on this array?

Problem 5: Parallelizing code (10 points)
Consider the following code. Each function in this code takes a long time to run.
A = f1(B, C, D)   // Line 1
E = f2(A, B)      // Line 2
F = f3(C, D)      // Line 3
G = f4(E, A)      // Line 4

Part A: 4 points
Draw a dependency graph with one node for each line.

Part B: 3 points
How many threads can this code take advantage of?

Part C: 3 points
If Line 1 takes t1 units of time to run, Line 2 takes t2 units, etc., then what is the speedup of a fully parallelized version of this code over a sequential version?

Problem 6: Parallel performance metrics (15 points)
The graph below shows the speedup vs. number of threads for three parallel applications: blackscholes (the line that goes highest), facesim (the squares), and cholesky (the triangles).
Please ask for help if anything on the graph is too small to read.
[Figure: speedup (y-axis, 0 to 16) vs. number of threads (x-axis: 1, 2, 4, 8, 16 threads) for blackscholes, facesim, and cholesky.]
Figure 1. Speedup as a function of the num... From Eyerman et al., “Speedup Stacks: Identifying Scaling Bottlenecks in M...
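As a reminder of how a curve like the one in the figure is produced: speedup at n threads is the single-thread run time divided by the n-thread run time, and parallel efficiency is that speedup divided by n. The sketch below is a minimal illustration in Python. The function names and all timing values are hypothetical; they are not read off the figure or taken from the paper.

```python
def speedup(t_single, t_parallel):
    """Speedup of a parallel run over the single-thread run."""
    return t_single / t_parallel


def efficiency(t_single, t_parallel, n_threads):
    """Speedup per thread; 1.0 would mean perfect linear scaling."""
    return speedup(t_single, t_parallel) / n_threads


# Hypothetical run times in seconds, keyed by thread count.
# These numbers are made up for illustration only.
timings = {1: 80.0, 2: 42.0, 4: 23.0, 8: 14.0, 16: 10.0}

t_single = timings[1]
for n_threads, t_parallel in timings.items():
    s = speedup(t_single, t_parallel)
    e = efficiency(t_single, t_parallel, n_threads)
    print(f"{n_threads:2d} threads: speedup {s:5.2f}, efficiency {e:4.2f}")
```

With timings like these, speedup keeps rising with thread count while efficiency falls, which is the sublinear scaling pattern a speedup plot makes visible.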
This note was uploaded on 02/08/2014 for the course CS 351 (Computer Architecture) taught by Dr. Suzanne Rivoire during the Fall '13 term at Sonoma.