Exam 4 - 2013

...ete per second on this array?
• How long will it take to do a single large read on this array?
• How long will it take to do a single large write on this array?

Problem 5: Parallelizing code (10 points)

Consider the following code. Each function in this code takes a long time to run.

    A = f1(B, C, D)   // Line 1
    E = f2(A, B)      // Line 2
    F = f3(C, D)      // Line 3
    G = f4(E, A)      // Line 4

Part A: 4 points
Draw a dependency graph with one node for each line.

Part B: 3 points
How many threads can this code take advantage of?

Part C: 3 points
If Line 1 takes t1 units of time to run, Line 2 takes t2 units, etc., then what is the speedup of a fully parallelized version of this code over a sequential version?

Problem 6: Parallel performance metrics (15 points)

The graph below shows the speedup vs. number of threads for three parallel applications: blackscholes (the line that goes highest), facesim (the squares), and cholesky (the triangles). Please ask for help if anything on the graph is too small to read.

[Graph: speedup (y-axis, 0 to 16) vs. number of threads (x-axis: 1, 2, 4, 8, 16 threads) for blackscholes, facesim, and cholesky]
Figure 1. Speedup as a function of the num...
From Eyerman et al., “Speedup Stacks: Identifying Scaling Bottlenecks in M...
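For Problem 5 above, the dependency graph (Part A) and the speedup (Part C) can be worked out mechanically from what each line reads and writes. The Python sketch below is one way to do it, not a prescribed method. It assumes that f1 through f4 have no side effects, that each line reads only its arguments and writes only its left-hand variable, and that a "fully parallelized" version has as many threads as it can use. The names LINES, dependencies, and speedup are hypothetical helpers introduced for illustration.

```python
from functools import lru_cache

# Each line: (variable written, variables read), transcribed from Problem 5.
# Assumption: f1..f4 have no side effects beyond producing their result.
LINES = {
    1: ("A", ("B", "C", "D")),
    2: ("E", ("A", "B")),
    3: ("F", ("C", "D")),
    4: ("G", ("E", "A")),
}


def dependencies(lines):
    """Map each line to the set of earlier lines it must wait for.

    Line j depends on an earlier line i if j reads what i writes (RAW),
    if i reads what j writes (WAR), or if both write the same variable (WAW).
    """
    deps = {n: set() for n in lines}
    for j, (out_j, ins_j) in lines.items():
        for i, (out_i, ins_i) in lines.items():
            if i >= j:
                continue  # only earlier lines can be prerequisites
            if out_i in ins_j or out_j in ins_i or out_i == out_j:
                deps[j].add(i)
    return deps


def speedup(lines, times):
    """Sequential time divided by the critical-path length of the graph.

    times maps a line number to its running time (t1, t2, ...).
    Assumes unlimited threads, so each line starts as soon as all of
    its prerequisites have finished.
    """
    deps = dependencies(lines)

    @lru_cache(maxsize=None)
    def finish(n):
        # Earliest finish time of line n: its own time plus the latest
        # finish time among the lines it depends on.
        return times[n] + max((finish(d) for d in deps[n]), default=0)

    sequential = sum(times[n] for n in lines)
    critical_path = max(finish(n) for n in lines)
    return sequential / critical_path


print(dependencies(LINES))  # {1: set(), 2: {1}, 3: set(), 4: {1, 2}}
print(speedup(LINES, {1: 1, 2: 1, 3: 1, 4: 1}))  # 1.3333333333333333
```

For these four lines the sketch finds edges from Line 1 to Lines 2 and 4 and from Line 2 to Line 4, with Line 3 independent, so its result reduces to (t1 + t2 + t3 + t4) / max(t1 + t2 + t4, t3). Adding the WAR and WAW checks changes nothing here, since no line overwrites a variable another line uses, but they keep the sketch correct for code where that happens.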

This note was uploaded on 02/08/2014 for the course CS 351 taught by Professor Suzanne Rivoire during the Fall '13 term at Sonoma.