CSE 502, Lectures 8-10: Instruction-Level Parallelism and Dynamic Scheduling (Spring 2010)

CSE 502 Graduate Computer Architecture, Lec 8-10: Instruction Level Parallelism
Larry Wittie, Computer Science, Stony Brook University
http://www.cs.sunysb.edu/~cse502 and ~lw
Slides adapted from David Patterson, UC-Berkeley cs252-s06
Lecture dates: 2/24, 3/1, 3/8/2010 (quiz was 2/22; quiz answers returned 3/3)

Outline
- ILP: Instruction Level Parallelism
- Compiler techniques to increase ILP: loop unrolling
- Static branch prediction
- Dynamic branch prediction
- Overcoming data hazards with dynamic scheduling (start)
- Tomasulo algorithm
- Conclusion
- Reading assignment: Chapter 2 today (quiz back next class), Chapter 3 next

Recall from Pipelining Review
- Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls
- Ideal pipeline CPI: a measure of the maximum performance attainable by the implementation
- Structural hazards: the hardware cannot support this combination of instructions
- Data hazards: an instruction depends on the result of a prior instruction still in the pipeline
- Control hazards: caused by the delay between fetching instructions and deciding on changes in control flow (branches and jumps; MIPS: j jump, jal call, jr return)
- For example, an ideal CPI of 1.0 plus 0.2 data-hazard stall cycles and 0.1 control stall cycles per instruction (and no structural stalls) gives an effective pipeline CPI of 1.3

Instruction Level Parallelism
- Instruction-Level Parallelism (ILP): overlap the execution of instructions to run programs faster (improve performance)
- Two approaches to exploit ILP:
  1) Rely on hardware to discover and exploit the parallelism dynamically (e.g., Pentium 4, AMD Opteron, IBM Power), and
  2) Rely on software technology to find parallelism statically at compile time (e.g., Itanium 2)
- The next 3 lectures cover ILP

Instruction-Level Parallelism (ILP) and Basic Blocks
- The ILP within a basic block (BB) is quite small
- BB: a straight-line code sequence with no branches in except to the entry and no branches out except at the exit
- Average dynamic branch frequency of 15% to 25% => only 4 to 7 instructions execute between each pair of branches
- Other problem: instructions in a BB are likely to depend on each other
- To obtain substantial performance enhancements, we must exploit ILP across multiple basic blocks (trace scheduling)
- Simplest: loop-level parallelism, exploiting parallelism among the iterations of a loop. E.g.,
    for (j = 0; j < 1000; j = j + 1)
      x[j+1] = x[j+1] + y[j+1];
  unrolled four times becomes
    for (i = 0; i < 1000; i = i + 4) {
      x[i+1] = x[i+1] + y[i+1];
      x[i+2] = x[i+2] + y[i+2];
      x[i+3] = x[i+3] + y[i+3];
      x[i+4] = x[i+4] + y[i+4];
    }
  {Vector instruction hardware can make such loops run much faster.}

Loop-Level Parallelism
- Exploit loop-level parallelism to find run-time parallelism by unrolling loops, either via
  1. dynamic branch prediction by CPU hardware, or
  2. static loop unrolling by a compiler (a fuller C sketch follows below)
- (Other ways: vectors and parallelism, covered later)...
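A minimal C sketch of the static loop unrolling described above, assuming the array layout used on the slide; the function names, the array size N, the initialization in main, and the remainder (cleanup) loop are illustrative additions, not taken from the slides:

    #include <stdio.h>

    #define N 1000

    double x[N + 2], y[N + 2];   /* sized so x[j+1], y[j+1] stay in bounds */

    /* Original rolled loop: one add and one loop-branch test per element. */
    void add_rolled(void) {
        for (int j = 0; j < N; j = j + 1)
            x[j+1] = x[j+1] + y[j+1];
    }

    /* Unrolled by 4: four independent adds per iteration share one branch
       test, so loop overhead is amortized and the adds can be scheduled to
       overlap.  The second loop handles leftover iterations when N is not a
       multiple of 4 (with N = 1000 it does nothing). */
    void add_unrolled(void) {
        int i;
        for (i = 0; i + 4 <= N; i = i + 4) {
            x[i+1] = x[i+1] + y[i+1];
            x[i+2] = x[i+2] + y[i+2];
            x[i+3] = x[i+3] + y[i+3];
            x[i+4] = x[i+4] + y[i+4];
        }
        for (; i < N; i = i + 1)     /* remainder (cleanup) loop */
            x[i+1] = x[i+1] + y[i+1];
    }

    int main(void) {
        for (int j = 0; j <= N; j = j + 1) { x[j] = j; y[j] = 1.0; }
        add_unrolled();
        printf("x[1] = %.1f, x[%d] = %.1f\n", x[1], N, x[N]);
        return 0;
    }

The cleanup loop is the standard way a compiler keeps the unrolled version correct for arbitrary trip counts; when the trip count is a known multiple of the unroll factor, as here, it can be omitted, which is the simplification the slide makes.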