G factorial should speed up better than negation csep


CSEP 524: Parallel Computation, Winter 2013: Chamberlain

sity per Element
[Figure: axes labeled "computational intensity" and "array elements"]

Performance Gotcha #2: Load Balance
Negation + Ramp: Computational Intensity per Element
–  Block distribution: green and purple have ~the same work
[Figure: axes labeled "computational intensity" and "array elements"]

Performance Gotcha #2: Load Balance
Factorial + Ramp: Computational Intensity per Element
–  Block distribution: purple has ~3x as much work as green
–  Cyclic distribution: purple only has numItems/2 more work
[Figure: axes labeled "computational intensity" and "array elements"]

Performance Gotcha #2: Load Balance
Factorial + Random:
–  Block distribution: green has ~1.5x the work of purple
   •  (for the data set shown)
[Figure: axes labeled "computational intensity" and "array elements"]

Load Balance Implications for Assignment #1
•  Block + factorial + ramp exhibits bad load balance
   –  some tasks had significantly more work than others
   –  cyclic/random input sets may result in better load balance
•  Keep in mind that many algorithms must be written without knowing their input sets
   –  i.e., can’t think “aha, my input will be a ramp so …”

Assignment #1 Debrief
•  Who saw execution time behaviors similar to what I just described?
   –  what kinds of things did you “do right” to get this result?
   –  what kinds of issues did others do differently to not see it?
   –  or perhaps, rather, what did you stumble across then fix?
      •  measuring aggregate performance of all threads, not wallclock time

Assignment #1 Summary: Distributions
Block & Cyclic:
+ give each task a similar number of work items
+ reasonably easy to compute
Block:
+ results in good spatial locality (touches adjacent elements)
–  can expose sensitivities to work distribution
   •  as in ramp+factorial
Cyclic:
+ less likely to be sensitive to work distribution
–  can result in false sharing issues

Time for a Break/Something Different?

Alternatives to Block and Cyclic
•  Other distributions can help address the drawbacks of block and cyclic:
   –  Block-Cyclic distribution
   –  Dynamic distributions
   –...
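
The block-versus-cyclic load-balance figures quoted in the slides can be checked with a small sketch. It is only an illustration, not the assignment's code: it assumes two tasks, a ramp input whose values run from 1 to numItems, a factorial whose cost is proportional to its argument, and a negation that costs one unit per element. The function names (`block_indices`, `cyclic_indices`, `work_per_task`) are hypothetical.

```python
def block_indices(num_items, num_tasks, task):
    """Indices owned by `task` under a block distribution (contiguous chunks)."""
    lo = task * num_items // num_tasks
    hi = (task + 1) * num_items // num_tasks
    return range(lo, hi)


def cyclic_indices(num_items, num_tasks, task):
    """Indices owned by `task` under a cyclic distribution (round-robin)."""
    return range(task, num_items, num_tasks)


def work_per_task(costs, num_tasks, indices_for):
    """Total cost assigned to each task for a given distribution function."""
    return [
        sum(costs[i] for i in indices_for(len(costs), num_tasks, task))
        for task in range(num_tasks)
    ]


num_items = 1000
num_tasks = 2

# Assumed cost models: factorial(n) costs about n multiplications on a ramp
# input 1..num_items; negation costs one unit regardless of the value.
factorial_ramp_cost = list(range(1, num_items + 1))
negation_ramp_cost = [1] * num_items

negation_block = work_per_task(negation_ramp_cost, num_tasks, block_indices)
factorial_block = work_per_task(factorial_ramp_cost, num_tasks, block_indices)
factorial_cyclic = work_per_task(factorial_ramp_cost, num_tasks, cyclic_indices)

print("negation, block:  ", negation_block)
print("factorial, block: ", factorial_block,
      "ratio:", factorial_block[1] / factorial_block[0])
print("factorial, cyclic:", factorial_cyclic,
      "difference:", factorial_cyclic[1] - factorial_cyclic[0])
```

Under these assumptions the block distribution gives the second task roughly three times the work of the first, while the cyclic distribution leaves a difference of only numItems/2 units, which matches the slides' figures for factorial + ramp.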

This document was uploaded on 04/04/2014.
