Ziavras: Execution time 9 cycles, 2.5 ops/cycle, VLIW


BNE R1,R2,Loop 56-32
Assuming NO BRANCH DELAY
S. Ziavras
Execution time: 9 cycles; 2.5 ops/cycle

VLIW Discussion
• Original VLIW approach
  – Increased code size
    • Exploiting parallelism requires ambitiously unrolling loops
    • Unused FUs mean wasted bits in instr. encoding
      Clever encodings needed. E.g.:
        Compress instrs. in memory; uncompress when fetched
        Only 1 large immediate field for any FU
  – Limitations of lockstep operation
    • No hazard detection HW (original approach)
      Blocking caches, causing all FUs to stall

VLIW Discussion (2)
• More recent VLIWs
  – FUs operate more independently
  – Compiler tries to avoid hazards at issue time
  – HW supports unsynchronized execution once instrs. are issued
• Binary code compatibility: problem in early approach
  – Depends on #s & types of FUs, & unit latencies
    Solutions: object-code translation or emulation
    Temper the strictness of approach (IA-64)

Extracting (& Enhancing) Loop-Level Parallelism
Example with independence between iterations
    for (i=1000; i>0; i=i-1)
        x[i] = x[i] + s;
• The compiler checks for loop dependences in early stages (before the machine-language code is produced)
• This code doesn’t involve any loop-carried dependences

Extracting Loop-Level Parallelism (2)
    for (i=1; i<=100; i=i+1) {
        A[i+1] = A[i] + C[i];    /* S1 */
        B[i+1] = B[i] + A[i+1];  /* S2 */
    }
Dependences
• Between 2 consecutive iterations of S1 (a loop-carried dependence)
• Between 2 consecutive iterations of S2 (a loop-carried dependence)
• Between S1 & S2 in the same iteration

Extracting Loop-Level Parallelism (3)
    for (i=1; i<=100; i=i+1) {
        A[i] = A[i] + B[i];      /* S1 */
        B[i+1] = C[i] + D[i];    /* S2 */
    }
• Loop-carried dependence: S1 in one iteration reads the B[i] that S2 wrote in the previous iteration
• Doesn’t prevent parallelism
• Parallel loop because of non-circular dependences
Unroll 2 iterations
    S1: A[i] = A[i] + B[i]
    S2: B[i+1] = C[i] + D[i]...
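The claim that the loop-carried dependence in loop (3) doesn’t prevent parallelism can be checked with a small C sketch. The restructured version below uses a common transformation (peel the first S1 and the last S2, then swap the statement order inside the loop). It does not appear in the preview, and the function names, the parameter n, the int element type, and the test data are all assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAXN 100  /* trip count used on the slide (i = 1..100) */

/* Loop (3) as written on the slide: S1 reads the B[i] that S2 produced
   in the previous iteration, so the dependence is loop-carried. */
void loop_original(int n, int *A, int *B, const int *C, const int *D)
{
    for (int i = 1; i <= n; i = i + 1) {
        A[i] = A[i] + B[i];      /* S1 */
        B[i + 1] = C[i] + D[i];  /* S2 */
    }
}

/* Hypothetical restructuring: the S2 -> S1 dependence now stays inside
   one iteration, so no iteration depends on an earlier one. */
void loop_restructured(int n, int *A, int *B, const int *C, const int *D)
{
    assert(n >= 1);                      /* the peeled statements need one iteration */
    A[1] = A[1] + B[1];                  /* peeled S1 of iteration 1 */
    for (int i = 1; i <= n - 1; i = i + 1) {
        B[i + 1] = C[i] + D[i];          /* S2 of iteration i */
        A[i + 1] = A[i + 1] + B[i + 1];  /* S1 of iteration i+1 */
    }
    B[n + 1] = C[n] + D[n];              /* peeled S2 of iteration n */
}

/* Runs both versions on identical (arbitrary) data and returns 1 if the
   resulting A and B arrays match element for element, 0 otherwise. */
int loops_agree(void)
{
    int A1[MAXN + 2], B1[MAXN + 2], A2[MAXN + 2], B2[MAXN + 2];
    int C[MAXN + 2], D[MAXN + 2];

    for (int i = 0; i < MAXN + 2; i++) {
        A1[i] = A2[i] = 3 * i + 1;
        B1[i] = B2[i] = 7 - i;
        C[i] = 2 * i;
        D[i] = i * i;
    }
    loop_original(MAXN, A1, B1, C, D);
    loop_restructured(MAXN, A2, B2, C, D);
    return memcmp(A1, A2, sizeof A1) == 0 && memcmp(B1, B2, sizeof B1) == 0;
}
```

Calling loops_agree() from a test harness compares the two versions. Because the dependences are non-circular (S1 depends on S2, but S2 never depends on S1), the swap is legal; in loop (2), where S1 and S2 each depend on their own previous iteration, no such reordering is available.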

This document was uploaded on 02/09/2014.
