Computer Architecture CSC 520, Chapter Three: Instruction Level Parallelism
All processors since about 1985 have used pipelining to overlap the execution of instructions and improve performance. This potential overlap among instructions is called instruction-level parallelism (ILP). There are two approaches to exploiting ILP: Chapter Three discusses those that are largely dynamic and rely on hardware, while Chapter Four discusses those that are static and rely on software. There is some overlap between the two.
The dynamic, hardware-intensive approaches dominate the desktop and server markets:
- Pentium III and 4
- Athlon
- MIPS R10000/R12000
- Sun UltraSPARC III
- PowerPC 603, G3, and G4
- Alpha 21264

The static, software-intensive approach is more common in the embedded market, but Intel's IA-64 (Itanium) also uses this approach.
Recall the CPI for a pipelined processor:

Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls

If we can minimize the terms on the right-hand side of this equation, we minimize the CPI and maximize the IPC (instructions per clock).

What is ILP? The amount of parallelism available within a basic block (a straight-line code sequence with no branches in except at the entry and no branches out except at the exit) is quite small. For a typical MIPS program, the average dynamic branch frequency is between 15% and 25%, meaning that, on average, between four and seven instructions execute between a pair of branches (roughly 1/0.25 = 4 to 1/0.15 ≈ 7). These instructions are usually interdependent, allowing little overlap of instructions within the basic block. ILP must therefore be exploited across multiple basic blocks.
Figure 3.1: Effect of the major techniques examined in Appendix A, Chapter 3, and Chapter 4.

Technique: what it reduces
- Forwarding and bypassing: potential data hazard stalls
- Delayed branches and simple branch scheduling: control hazard stalls
- Basic dynamic scheduling: data hazard stalls from true dependencies
- Dynamic scheduling and renaming: data hazard stalls and stalls from antidependencies and output dependencies
- Dynamic branch prediction: control stalls
- Issuing multiple instructions per cycle: ideal CPI
- Speculation: data hazard and control hazard stalls
- Dynamic memory disambiguation: data hazard stalls involving memory
- Loop unrolling: control hazard stalls
- Basic compiler pipeline scheduling: data hazard stalls
- Compiler dependence analysis: ideal CPI, data hazard stalls
- Software pipelining, trace scheduling: ideal CPI, data hazard stalls
- Compiler speculation: ideal CPI, data hazard stalls, control stalls
The simplest and most common way to increase the parallelism available among instructions is to exploit parallelism among the iterations of a loop. This is called loop-level parallelism.