Chapter_Four_Loop_Unrolling

Chapter_Four_Loop_Unrolling - CSC 520 Computer Architecture...

CSC 520 Computer Architecture

Exploiting Instruction-Level Parallelism with Software Approaches

Consider the following code, which adds a scalar to a vector. This code lends itself to parallel execution because the body of each iteration is independent of every other iteration.

    for (i = 1000; i > 0; i = i - 1)
        x[i] = x[i] + s;

Straightforward MIPS code, not scheduled for the pipeline:

    Loop: L.D    F0,0(R1)     ; F0 = array element
          ADD.D  F4,F0,F2     ; add scalar in F2
          S.D    F4,0(R1)     ; store result
          DADDI  R1,R1,#-8    ; decrement pointer by 8 bytes (one double word)
          BNE    R1,R2,Loop   ; branch if R1 != R2
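To make that independence concrete, here is a minimal C sketch of the loop next to a version unrolled by four. The array size (1001 slots, so that indices 1 through 1000 are valid), the function names `add_scalar` and `add_scalar_unrolled4`, and the unroll factor of 4 are illustrative assumptions rather than anything taken from the notes; 4 is chosen only because 1000 divides evenly by it, so no cleanup loop is needed.

```c
#include <assert.h>

/* Hypothetical trip count matching the loop bounds in the text: elements
   x[1000] down to x[1] are updated, so the array needs 1001 slots. */
#define N 1000

/* Original loop: one element per iteration, counting down. */
void add_scalar(double x[N + 1], double s)
{
    for (int i = N; i > 0; i = i - 1)
        x[i] = x[i] + s;
}

/* The same loop unrolled by four (illustrative factor). The four statements
   in the body touch different elements, so they do not depend on each other
   and could be scheduled to overlap. */
void add_scalar_unrolled4(double x[N + 1], double s)
{
    assert(N % 4 == 0);  /* trip count divides evenly: no cleanup loop needed */
    for (int i = N; i > 0; i = i - 4) {
        x[i]     = x[i]     + s;
        x[i - 1] = x[i - 1] + s;
        x[i - 2] = x[i - 2] + s;
        x[i - 3] = x[i - 3] + s;
    }
}
```

Calling both functions on copies of the same array should leave identical contents, since unrolling only regroups iterations that were already independent; x[0] is never touched by either version.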
Basic Compiler Techniques for Exposing ILP

Basic pipelining and loop unrolling

Throughout this chapter, we assume the FP latencies shown in the following table. Further, we assume a standard five-stage integer pipeline, so that branches have a delay of one clock cycle. We also assume that the functional units are replicated, so an operation of any type can be issued on every clock cycle and there are no structural hazards.

    Instruction producing result   Instruction using result   Latency (clock cycles)
    FP ALU op                      Another FP ALU op          3
    FP ALU op                      Store double               2
    Load double                    FP ALU op                  1
    Load double                    Store double               0

The latency is the number of intervening clock cycles needed between the producing and the using instruction to avoid a stall.
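One way to read the table is as a rule for when a dependent instruction may issue: it must wait the listed number of cycles after its producer issues. The following C sketch applies that rule to the unscheduled loop body using a simple in-order, one-instruction-per-cycle model. The names (`op_class`, `latency`, `issue_cycles`, `loop_body`) are hypothetical; the latency of an integer ALU result feeding a branch, which the table does not give, is assumed to be 0 here, and the one-cycle branch delay is not modeled.

```c
#include <assert.h>

/* Instruction classes. INT_ALU and BRANCH are added so the whole loop body
   can be described; the table gives no latency for them (assumed 0 below). */
enum op_class { FP_ALU, LOAD_D, STORE_D, INT_ALU, BRANCH };

/* Cycles that must separate producer and consumer, from the latency table.
   Pairs not covered by the table return 0 (an assumption of this sketch). */
int latency(enum op_class producer, enum op_class consumer)
{
    if (producer == FP_ALU && consumer == FP_ALU)  return 3;
    if (producer == FP_ALU && consumer == STORE_D) return 2;
    if (producer == LOAD_D && consumer == FP_ALU)  return 1;
    if (producer == LOAD_D && consumer == STORE_D) return 0;
    return 0;
}

struct instr {
    enum op_class cls;
    int producer;  /* index of the instruction whose result this one uses, or -1 */
};

/* In-order, single-issue model: each instruction issues one cycle after its
   predecessor unless it must wait for its producer's latency to elapse.
   Writes the 1-based issue cycle of each instruction into cycle[]. */
void issue_cycles(const struct instr *code, int n, int cycle[])
{
    for (int k = 0; k < n; k++) {
        int c = (k == 0) ? 1 : cycle[k - 1] + 1;
        int p = code[k].producer;
        if (p >= 0) {
            assert(p < k);  /* a producer must appear earlier in program order */
            int ready = cycle[p] + latency(code[p].cls, code[k].cls) + 1;
            if (ready > c)
                c = ready;
        }
        cycle[k] = c;
    }
}

/* The unscheduled loop body from the notes, with its true data dependences. */
static const struct instr loop_body[] = {
    { LOAD_D,  -1 },  /* L.D   F0,0(R1)   */
    { FP_ALU,   0 },  /* ADD.D F4,F0,F2   uses F0 from L.D   */
    { STORE_D,  1 },  /* S.D   F4,0(R1)   uses F4 from ADD.D */
    { INT_ALU, -1 },  /* DADDI R1,R1,#-8  */
    { BRANCH,   3 },  /* BNE   R1,R2,Loop uses R1 from DADDI */
};
```

Running `issue_cycles(loop_body, 5, cycle)` should give issue cycles 1, 3, 6, 7 and 8: cycle 2 is the one-cycle stall between L.D and ADD.D, and cycles 4 and 5 are the two-cycle stall between ADD.D and S.D. Because integer-to-branch latency and the branch delay are left out, the total may understate the real per-iteration cost.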

This note was uploaded on 02/10/2012 for the course CSC 520 taught by Professor Simmons during the Fall '07 term at S. Alabama.
