If b is accessed for i 1 to n for j 1 to


... blocks are mapped to the same set or block frame; also called collision misses or interference misses.

Improving cache performance: hardware
- Increased cache capacity
- Higher associativity
- Hardware prefetching of instructions and data
- Equidistant locality
- Second-level / third-level cache (L2, L3)
- L3 often shared by multiple cores
- Out-of-order instruction execution
- Branch prediction

All this makes modern CPUs highly complex.

Improving cache performance: software
- Merging Arrays: Improve spatial locality by using a single array of structs instead of parallel arrays (Fortran).
- Loop Interchange: Change the nesting of loops to access data in the order in which it is stored in memory.
- Loop Fusion: Combine two or more independent loops that iterate over the same range and share some variables.
- Blocking or "tiling": Improve temporal locality by accessing "blocks" of data repeatedly instead of going down whole columns or rows (example: prime sieve).

Matrix Multiply
Data or loop reordering for improved cache...
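Three of the software techniques listed above (loop interchange, loop fusion, and blocking) can be illustrated with a minimal C sketch. It is not taken from the original slides: the function names, the flat row-major layout, and the `tile` parameter are assumptions chosen for illustration. Each pair of functions computes the same result; only the memory access order differs.

```c
#include <stddef.h>

/* All matrices are n x n, stored row-major in a flat array:
   element (i, j) lives at index i * n + j. */

/* Loop interchange, before: the inner loop walks down a column,
   so consecutive accesses are n doubles apart in memory. */
double sum_column_order(size_t n, const double *a) {
    double s = 0.0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            s += a[i * n + j];
    return s;
}

/* Loop interchange, after: the inner loop walks along a row,
   so consecutive accesses touch adjacent memory locations. */
double sum_row_order(size_t n, const double *a) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            s += a[i * n + j];
    return s;
}

/* Loop fusion, before: two passes over x; for large n the start of x
   may already have left the cache when the second loop begins. */
double scale_and_sum_separate(size_t n, double *x, double f) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++) x[i] *= f;
    for (size_t i = 0; i < n; i++) total += x[i];
    return total;
}

/* Loop fusion, after: each element is scaled and summed in one pass. */
double scale_and_sum_fused(size_t n, double *x, double f) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        x[i] *= f;
        total += x[i];
    }
    return total;
}

/* Reference matrix multiply c = a * b with the plain i-j-k nesting. */
void matmul_naive(size_t n, const double *a, const double *b, double *c) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            for (size_t k = 0; k < n; k++)
                s += a[i * n + k] * b[k * n + j];
            c[i * n + j] = s;
        }
}

static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* Blocking (tiling): work on tile x tile sub-blocks so the data of a
   block is reused while it is likely still cached.
   tile must be greater than 0; it need not divide n. */
void matmul_blocked(size_t n, size_t tile,
                    const double *a, const double *b, double *c) {
    for (size_t i = 0; i < n * n; i++) c[i] = 0.0;
    for (size_t ii = 0; ii < n; ii += tile)
        for (size_t kk = 0; kk < n; kk += tile)
            for (size_t jj = 0; jj < n; jj += tile) {
                size_t i_end = min_size(ii + tile, n);
                size_t k_end = min_size(kk + tile, n);
                size_t j_end = min_size(jj + tile, n);
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}
```

Whether the reordered versions are actually faster depends on the matrix size, the cache sizes, and the compiler, which may already perform some of these transformations itself; timing both variants on the target machine is the only reliable check. A suitable `tile` value is likewise machine dependent and would need to be found by experiment.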