Advanced Topics
Optimization for parallel machines and memory hierarchies

Last Time
• Dependence analysis

Today
• Loop transformations
• An example: McKinley, Carr, Tseng loop transformations to improve cache performance

CS 380C Lecture 24: Locality Analysis


Which Loops are Parallel? (review)

      do I = 1, N
        do J = 1, N
S1        A(I,J) = A(I,J-1) + 1

      do I = 1, N
        do J = 1, N
S2        A(I,J) = A(I-1,J-1) + 1

      do I = 1, N
        do J = 1, N
S3        B(I,J) = B(I-1,J+1) + 1

[Figure: iteration-space diagram, axes I and J]

• A dependence $D = (d_1, \ldots, d_k)$ is carried at level $i$ if $d_i$ is the first nonzero element of the distance/direction vector.
• A loop $l_i$ is parallel if there is no dependence $D_j$ carried at level $i$. That is, for every dependence $D_j$, either

                  distance vector                     direction vector
          $(d_1, \ldots, d_{i-1}) > 0$        $(d_1, \ldots, d_{i-1}) =$ "<"
    OR    $(d_1, \ldots, d_i) = 0$            $(d_1, \ldots, d_i) =$ "="

  In the first case (a lexicographically positive prefix) the dependence is already carried by an outer loop; in the second case it is not carried by any loop up to and including level $i$.


Loop Transformations Taxonomy
• Loop unrolling
• Loop interchange
• Loop fusion
• Loop distribution (a.k.a. fission)
• Loop skewing
• Strip mine and interchange (a.k.a. tiling & blocking)
• Unroll-and-jam (a variety of tiling)
• Loop reversal


Loop Interchange

      do I = 1, N
        do J = 1, N
S1        A(I,J) = A(I-1,J) + 1
        enddo
      enddo

      do I = 1, N
        do J = 1, N
S2        B(I,J) = B(I-1,J+1) + 1
        enddo
      enddo

[Figure: iteration-space diagrams for S1 and S2, axes I and J]

Loop interchange is safe iff
• it does not reverse the execution order of the source and sink of any dependence in the nest. In other words, it is unsafe if some distance vector would become lexicographically negative once its entries are permuted.
◦ Enables parallelization of outer and/or inner loops
◦ Changes execution order of the statements
◦ Can improve reuse


Loop Fusion

      do i = 2, n
s1      a(i) = b(i)
      do i = 2, n
s2      c(i) = b(i) * a(i-1)

            ⇒ loop fusion ⇒
            ⇐ loop distribution ⇐

      do i = 2, n
s1      a(i) = b(i)
s2      c(i) = b(i) * a(i-1)

Loop fusion is safe iff
• no forward dependence between nests becomes a backward loop-carried dependence.
⇒ Would fusion be safe if s2 referenced a(i+1)?
• Benefits
◦ Reuse
◦ Eliminates synchronization between parallel loops
◦ Reduced loop overhead


Loop Distribution

      do i = 2, n
s1      a(i) = b(i)
s2      c(i) = b(i) * a(i+1)

            ⇒ loop distribution ⇒

      do i = 2, n
s2      c(i) = b(i) * a(i+1)
      do i = 2, n
s1      a(i) = b(i)

Loop distribution is safe iff
• statements involved in a cycle of dependences (recurrence) remain in the same loop, and
• if there exists a dependence between two statements placed in different loops, it must be forward....
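
The fusion and distribution conditions above can be checked numerically on the slide's own loops. The following is a minimal sketch, not a compiler pass: the function names, the `offset` parameter (-1 for a(i-1), +1 for a(i+1)), and the sample data are all assumptions made for illustration. It simply runs each loop ordering and compares the resulting c arrays.

```python
# Arrays are 1-based as in the slides; slot 0 is unused padding.
# offset = -1 models s2 reading a(i-1); offset = +1 models a(i+1).

def run_separate(a, b, offset):
    """Two loops: all of s1 first, then all of s2."""
    a, n = list(a), len(b) - 1
    c = [0] * (n + 1)
    for i in range(2, n + 1):          # do i = 2, n
        a[i] = b[i]                    # s1
    for i in range(2, n + 1):          # do i = 2, n
        c[i] = b[i] * a[i + offset]    # s2
    return c

def run_fused(a, b, offset):
    """One loop: s1 then s2 in each iteration."""
    a, n = list(a), len(b) - 1
    c = [0] * (n + 1)
    for i in range(2, n + 1):
        a[i] = b[i]                    # s1
        c[i] = b[i] * a[i + offset]    # s2
    return c

def run_distributed_s2_first(a, b, offset):
    """Two loops: all of s2 first, then all of s1."""
    a, n = list(a), len(b) - 1
    c = [0] * (n + 1)
    for i in range(2, n + 1):
        c[i] = b[i] * a[i + offset]    # s2
    for i in range(2, n + 1):
        a[i] = b[i]                    # s1
    return c

# Hypothetical sample data: b(1..5) and a(1..6), so a(i+1) stays in bounds.
b = [0, 1, 2, 3, 4, 5]
a = [0, 10, 20, 30, 40, 50, 60]

# a(i-1): fusing the two loops preserves the result.
print(run_fused(a, b, -1) == run_separate(a, b, -1))             # True
# a(i+1): fusing changes the result, so that fusion is not safe.
print(run_fused(a, b, +1) == run_separate(a, b, +1))             # False
# a(i+1): distributing the fused loop with s2's loop first is safe.
print(run_fused(a, b, +1) == run_distributed_s2_first(a, b, +1)) # True
```

With a(i+1), the separate loops make s2 read values that s1 has already written, while the fused loop reads them before they are written, so the forward dependence has become a backward loop-carried one. In the distribution direction the same reasoning explains why s2's loop must come first. A test on one data set can only refute equivalence, not prove it; the actual safety argument is the dependence condition on the slides.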
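
The definitions of "carried at level i" and "parallel loop" from the review slide, together with the interchange safety test, can be sketched directly on distance vectors. This is an illustrative sketch only: the helper names are hypothetical, and the distance vectors are derived by hand from the slide's loops (sink iteration minus source iteration) rather than computed by a real dependence analyzer.

```python
from typing import Iterable, List, Optional, Sequence

def carried_level(dist: Sequence[int]) -> Optional[int]:
    """1-based level of the first nonzero entry, or None if all entries are zero."""
    for level, d in enumerate(dist, start=1):
        if d != 0:
            return level
    return None

def parallel_loops(deps: Iterable[Sequence[int]], depth: int) -> List[int]:
    """Levels (1-based) at which no dependence is carried."""
    carried = {carried_level(d) for d in deps}
    return [level for level in range(1, depth + 1) if level not in carried]

def interchange_is_safe(deps: Iterable[Sequence[int]], perm: Sequence[int]) -> bool:
    """perm lists the old 0-based loop positions in their new order.

    The permutation is safe if no permuted distance vector is
    lexicographically negative (first nonzero entry below zero).
    """
    for d in deps:
        permuted = [d[p] for p in perm]
        level = carried_level(permuted)
        if level is not None and permuted[level - 1] < 0:
            return False
    return True

# Review slide, loop order (I, J); vectors derived by hand:
print(parallel_loops([(0, 1)], 2))    # S1: A(I,J) = A(I,J-1) + 1    -> [1]
print(parallel_loops([(1, 1)], 2))    # S2: A(I,J) = A(I-1,J-1) + 1  -> [2]
print(parallel_loops([(1, -1)], 2))   # S3: B(I,J) = B(I-1,J+1) + 1  -> [2]

# Interchange slide: swap I and J, i.e. perm = (1, 0).
print(interchange_is_safe([(1, 0)], (1, 0)))    # S1: (1,0) -> (0,1)   True
print(interchange_is_safe([(1, -1)], (1, 0)))   # S2: (1,-1) -> (-1,1) False
```

Direction vectors could be handled the same way by mapping "<", "=", ">" to +1, 0, -1, at the cost of being more conservative than exact distances.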
 Fall '08
 shmat
 CPU cache, Locality of reference, Compiler optimizations, CS 380C Lecture, Loop Distribution
