csee4824_f11_lec13

CSEE W4824 – Computer Architecture, Fall 2011
Luca Carloni
Department of Computer Science, Columbia University in the City of New York
http://www.cs.columbia.edu/~cs4824/

Lecture 13
Compiler Techniques for Increasing ILP: Loop Unrolling, Software Pipelining & VLIW Processors

CSEE 4824 – Fall 2011 - Lecture 13 Page 2 Luca Carloni – Columbia University

Announcements
• Midterm: Monday, October 31
  – In class, 75 minutes
  – OPEN BOOK and OPEN NOTES
  – NO ELECTRONIC EQUIPMENT MAY BE USED

Review: Tomasulo’s Algorithm
• Distributed control based on reservation station tag fields
• Register file bypassed in broadcasting to reservation stations via Common Data Bus
• Precise exception
  – stall issuing if there is a pending branch in the pipeline
  • speculation will remove this restriction
• IBM 360 had pipelined functional units
  – we describe it with multiple functional units

Review: Tomasulo’s Algorithm: Loop Example
• Once the system reaches steady state in the execution of this loop, two copies of the loop can be sustained with CPI close to 1.0
  – provided that there are enough resources and that the multiplication can complete in 4 cycles
• With multiple-instruction issue, Tomasulo’s algorithm can sustain more than one instruction per clock
[Timing table for the loop (includes MUL.D F4, F0, F2 and an Execute column); cell values not recoverable from this extract]
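
The tag-based broadcast mentioned in the Tomasulo review above can be illustrated with a small Python sketch. It is a simplified model, not the lecture's own code: the class name ReservationStation, the function broadcast, the field names (vj, vk, qj, qk), the tags "Load1" and "Mult1", and the operand values are all assumptions chosen for illustration. It shows only how a result placed on the Common Data Bus reaches every waiting station directly, without going through the register file.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReservationStation:
    """One reservation station entry (hypothetical, simplified layout)."""
    name: str
    op: str
    vj: Optional[float] = None  # first operand value, once known
    vk: Optional[float] = None  # second operand value, once known
    qj: Optional[str] = None    # tag of the station that will produce vj
    qk: Optional[str] = None    # tag of the station that will produce vk

    def ready(self) -> bool:
        # An entry can start executing when it waits on no tag.
        return self.qj is None and self.qk is None


def broadcast(stations: List[ReservationStation], tag: str, value: float) -> None:
    """Model one Common Data Bus broadcast.

    Every station waiting on `tag` captures `value` and clears the tag.
    """
    for rs in stations:
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None


# Illustrative setup: MUL.D waits on a load, ADD.D waits on the multiply.
mult1 = ReservationStation("Mult1", "MUL.D", vk=2.0, qj="Load1")
add1 = ReservationStation("Add1", "ADD.D", vk=1.0, qj="Mult1")
stations = [mult1, add1]

broadcast(stations, "Load1", 3.0)          # the load result arrives on the CDB
product = mult1.vj * mult1.vk              # MUL.D is now ready and executes
broadcast(stations, "Mult1", product)      # its result wakes up ADD.D
```

After the second broadcast, add1 holds the product as its first operand and is ready to execute, even though no register was written in between.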

Tomasulo’s Algorithm: Summary
• Main advantages
  – distribution of control & hazard detection logic
  – elimination of stalls for WAR and WAW hazards via register renaming
  – CDB-based bypassing via broadcasting from functional units to reservation stations
• Issues
  – implementation issues
    • design complexity (fast associative buffers)
    • CDB bottleneck (many CDBs must interact with all reservation stations)
  – interrupt model is not precise (due to out-of-order committing)
    • in the first example, SUB.D and ADD.D could both complete before MUL.D raises an exception
  – hardware support for speculation will solve the problem (stay tuned…)

Clock Frequency vs. Throughput vs. Latency
• Pipelining decreases CPI (or decreases CCT) but cannot reduce latency
• Multiple-issue processors use specialized HW to decrease CPI
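
The throughput-versus-latency point above can be checked with a little arithmetic. The sketch below assumes an ideal pipeline: perfectly balanced stages, no latch overhead, and no stalls. The function names and the numbers (1000 instructions, 10 ns per instruction, 5 stages) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def unpipelined_time(n_instr: int, instr_latency_ns: float) -> float:
    """Total time when each instruction runs start to finish before the next begins."""
    return n_instr * instr_latency_ns


def pipelined_time(n_instr: int, instr_latency_ns: float, stages: int) -> float:
    """Total time on an ideal pipeline (assumed: balanced stages, no overhead, no stalls).

    The clock cycle shrinks to latency / stages; the first instruction needs
    `stages` cycles, and each later instruction finishes one cycle after
    the previous one.
    """
    cycle_ns = instr_latency_ns / stages
    return (stages + n_instr - 1) * cycle_ns


n, latency, k = 1000, 10.0, 5  # illustrative values only

total_unpipelined = unpipelined_time(n, latency)
total_pipelined = pipelined_time(n, latency, k)
single_instr_pipelined = pipelined_time(1, latency, k)

print(total_unpipelined, total_pipelined, single_instr_pipelined)
```

Under these assumptions the 1000 instructions finish almost five times sooner on the pipeline, yet a single instruction still takes the full 10 ns from start to finish. A real pipeline adds latch overhead per stage, so per-instruction latency typically gets slightly worse rather than staying equal.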

Compiler Technology to Improve ILP
• Software Pipelining
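
Since the rest of this slide is not available in this extract, here is a minimal sketch of what software pipelining does to a loop, under stated assumptions. The example loop (add a scalar s to every element of an array) and the function names are hypothetical, not taken from the lecture. Python is used only to show the restructuring. Any benefit would come on hardware where the three operations in the kernel, which belong to three different original iterations, are independent and can be scheduled without waiting on one another.

```python
def add_scalar_plain(x, s):
    """Original loop: each iteration does load -> add -> store, a dependent chain."""
    out = list(x)
    for i in range(len(out)):
        v = out[i]      # load
        v = v + s       # add (depends on the load)
        out[i] = v      # store (depends on the add)
    return out


def add_scalar_swp(x, s):
    """Software-pipelined form: the kernel mixes work from three original iterations."""
    n = len(x)
    if n < 2:
        return add_scalar_plain(x, s)  # too short to pipeline
    out = list(x)

    # Prologue: start iterations 0 and 1.
    loaded = out[0]          # load for iteration 0
    summed = loaded + s      # add for iteration 0
    loaded = out[1]          # load for iteration 1

    # Kernel: store for iteration i, add for i+1, load for i+2.
    for i in range(n - 2):
        out[i] = summed
        summed = loaded + s
        loaded = out[i + 2]

    # Epilogue: finish the last two iterations.
    out[n - 2] = summed      # store for iteration n-2
    summed = loaded + s      # add for iteration n-1
    out[n - 1] = summed      # store for iteration n-1
    return out


values = [1.0, 2.0, 3.0, 4.0]
pipelined_result = add_scalar_swp(values, 10.0)
```

Both functions compute the same result. The difference is that no operation inside the kernel depends on another operation from the same pass through the kernel, at the cost of extra prologue and epilogue code.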