[Figure: roofline plot, x-axis in Gflops/byte, with a "double peak performance" label]


...] depends on M[i-1,w] and M[i-1,w-wi]
We can parallelize the inner loop.
What would be the problems? Cache, many spawns.

Tiling Knapsack parallelization
M[i,w] depends on M[i-1,w] and M[i-1,w-wi]
We can tile (as in prime sieve).
What would be the dependences between tiles? Tile(I,J) depends on the tiles to the left of it and above it.

Knapsack wavefront block parallelization

Roofline Model
- Architectural model*, based on the intuition that off-chip memory bandwidth is the constraining resource.
  *: David Patterson et al., Parallel Computing Lab, Berkeley
- Operational Intensity: flops per byte of memory traffic, i.e. bytes exchanged between cache(s) and memory.
- Roofline plots Gflops/sec as a function of operational intensity (flops/byte) on a log-log scale.
- Polynomials become straight lin...
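The roofline bound described above can be sketched in a few lines. This is a minimal illustration, not code from the course: the function names and the machine numbers (16 Gflops/sec peak, 4 GB/sec memory bandwidth) are hypothetical. Attainable performance is taken as the smaller of the compute peak and bandwidth times operational intensity; in the bandwidth-bound region, log(performance) = log(bandwidth) + log(intensity), which is why that part of the roof appears as a straight line on a log-log plot.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_per_s, intensity_flops_per_byte):
    """Roofline bound: min(compute roof, memory roof).

    All three parameter names are illustrative assumptions.
    """
    memory_roof = bandwidth_gb_per_s * intensity_flops_per_byte
    return min(peak_gflops, memory_roof)


def ridge_point(peak_gflops, bandwidth_gb_per_s):
    """Operational intensity at which the two roofs meet."""
    return peak_gflops / bandwidth_gb_per_s


if __name__ == "__main__":
    PEAK = 16.0       # hypothetical peak, Gflops/sec
    BANDWIDTH = 4.0   # hypothetical off-chip bandwidth, GB/sec
    for oi in (0.25, 0.5, 1, 2, 4, 8, 16):
        print(oi, attainable_gflops(PEAK, BANDWIDTH, oi))
    print("ridge point:", ridge_point(PEAK, BANDWIDTH))
```

A kernel whose operational intensity lies left of the ridge point is limited by memory bandwidth on this hypothetical machine; one to the right is limited by peak compute.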
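The tile dependences in the knapsack notes above can be illustrated with a small sketch. It assumes a 0/1 knapsack table M with items as rows and capacities as columns; the function names and tile sizes are hypothetical. Tiles are visited one anti-diagonal (wavefront) at a time. The loop here runs sequentially, but tiles on the same anti-diagonal depend only on tiles to the left and above, so they could be handed to parallel workers.

```python
def knapsack_rowwise(weights, values, capacity):
    """Reference version: fill the table row by row."""
    n = len(weights)
    M = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        wi, vi = weights[i - 1], values[i - 1]
        for w in range(capacity + 1):
            M[i][w] = M[i - 1][w]
            if w >= wi:
                M[i][w] = max(M[i][w], M[i - 1][w - wi] + vi)
    return M[n][capacity]


def knapsack_wavefront(weights, values, capacity, tile_rows=2, tile_cols=4):
    """Same table, filled tile by tile in wavefront (anti-diagonal) order."""
    n = len(weights)
    M = [[0] * (capacity + 1) for _ in range(n + 1)]
    n_tile_rows = (n + tile_rows - 1) // tile_rows            # tiles over items 1..n
    n_tile_cols = (capacity + tile_cols) // tile_cols         # tiles over w = 0..capacity

    def compute_tile(I, J):
        row_lo = 1 + I * tile_rows
        row_hi = min(n, (I + 1) * tile_rows)
        col_lo = J * tile_cols
        col_hi = min(capacity, (J + 1) * tile_cols - 1)
        for i in range(row_lo, row_hi + 1):
            wi, vi = weights[i - 1], values[i - 1]
            for w in range(col_lo, col_hi + 1):
                # M[i-1][w] and M[i-1][w-wi] lie in this tile or in a tile
                # to the left or above, all finished on earlier wavefronts.
                M[i][w] = M[i - 1][w]
                if w >= wi:
                    M[i][w] = max(M[i][w], M[i - 1][w - wi] + vi)

    for d in range(n_tile_rows + n_tile_cols - 1):
        # All tiles with I + J == d are mutually independent.
        for I in range(max(0, d - n_tile_cols + 1), min(n_tile_rows, d + 1)):
            compute_tile(I, d - I)
    return M[n][capacity]


if __name__ == "__main__":
    print(knapsack_wavefront([1, 3, 4, 5], [1, 4, 5, 7], 7))
```

Larger tiles reduce the number of spawned tasks and improve cache reuse within a tile, at the cost of fewer independent tiles per wavefront.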

This note was uploaded on 02/12/2014 for the course CS 475 taught by Professor Staff during the Fall '08 term at Colorado State.
