LECTURE 11: Instruction Level Parallelism

Instruction Level Parallelism
- Pipelining achieves Instruction Level Parallelism (ILP): multiple instructions execute in parallel
- But pipeline hazards cause stalls: CPI = Ideal CPI + stalls/instruction
- Stalls = structural + data (RAW/WAW/WAR) + control
- How do we reduce stalls? That is, how do we increase ILP?
Techniques for Improving ILP
- Loop unrolling
- Basic pipeline scheduling
- Dynamic scheduling, scoreboarding, register renaming
- Dynamic memory disambiguation
- Dynamic branch prediction
- Multiple instruction issue per cycle
- Both software and hardware techniques

Loop-Level Parallelism
- Basic block: straight-line code without branches
- With a branch fraction of about 0.15, the average basic block is only 6-7 instructions, and even those may be dependent, so ILP within a basic block is limited
- Hence, look for parallelism beyond a basic block
- Loop-level parallelism is a simple example of this
Loop-Level Parallelism: An Example
Consider the loop:

for (int i = 1000; i >= 1; i = i - 1) {
    x[i] = x[i] + C;   // FP add
}

- Each iteration of the loop is independent of the other iterations: loop-level parallelism
- To convert it into ILP: loop unrolling (static or dynamic), or vector instructions

The Loop, in DLX
In DLX, the loop looks like:

Loop: LD    F0, 0(R1)     ; F0 = array element x[i]
      ADDD  F4, F0, F2    ; F2 holds the scalar C
      SD    0(R1), F4     ; store the result back to x[i]
      SUBI  R1, R1, #8    ; advance to the next element (8-byte doubles)
      BNEZ  R1, Loop      ; repeat until R1 reaches 0