lect.11.Memory.4up

EE 108b Lecture 11, C. Kozyrakis, slides 1–4

Slide 1: Lecture 11: Memory Design
Christos Kozyrakis, Stanford University
http://eeclass.stanford.edu/ee108b

Slide 2: Announcements
• TBD

Slide 3: Review: Reducing Branch Control Hazard
[Pipeline diagram: the sequence beq r1, r2, L / sub r4, r1, r3 / L: and r6, r2, r7 / or r8, r7, r9 / add r1, r2, r1 shown over time (clock cycles) through the IF, ID/RF, EX, MEM, WB stages, with the branch comparison (=) moved into the ID stage.]

Slide 4: Branch CPI
Assumptions: branches are 15% of instructions (the 0.15 factor below), 65% of branches are taken, and 50% of single delay slots are filled usefully.

  Branch strategy     Control point            Branch CPI
  Stall               MEM                      3 × 0.15 = 0.45
  Stall               ID                       1 × 0.15 = 0.15
  Predict taken       ID                       1 × 0.15 = 0.15
  Predict taken       ID (target) / MEM (=)    (1 × 0.15) × 0.65 + (3 × 0.15) × 0.35 = 0.255
  Predict not taken   ID                       (1 × 0.15) × 0.65 + 0 × 0.35 = 0.098
  Delayed branch      ID                       (1 × 0.15) × 0.50 = 0.075
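The branch CPI figures on slide 4 can be reproduced with a short calculation. A minimal sketch; the constants come from the slide's assumptions, while the function and variable names are mine:

```python
# Branch CPI contribution = branch frequency x expected stall cycles per branch.
# Figures from the slide: branches are 15% of instructions, 65% are taken,
# and 50% of delay slots do useful work.
BRANCH_FREQ, P_TAKEN, FILL_RATE = 0.15, 0.65, 0.50

def cpi_contribution(penalty_taken, penalty_not_taken):
    """Expected stall cycles per instruction for one branch strategy."""
    expected_penalty = P_TAKEN * penalty_taken + (1 - P_TAKEN) * penalty_not_taken
    return BRANCH_FREQ * expected_penalty

stall_mem       = cpi_contribution(3, 3)  # resolve in MEM: always 3 stalls -> 0.45
stall_id        = cpi_contribution(1, 1)  # resolve in ID: always 1 stall   -> 0.15
predict_t_mixed = cpi_contribution(1, 3)  # target in ID, compare in MEM    -> 0.255
predict_nt_id   = cpi_contribution(1, 0)  # 1-cycle penalty only when taken -> 0.0975 = 0.098
delayed_branch  = BRANCH_FREQ * (1 - FILL_RATE) * 1  # unfilled slot wastes 1 cycle -> 0.075
```

The delayed-branch row differs from the others: its cost depends on how often the compiler fills the delay slot, not on whether the branch is taken.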
EE 108b Lecture 11, C. Kozyrakis, slides 5–8

Slide 5: Review: Advanced Pipelining
• Where have all the transistors gone?
  – MIPS R3000: 120 thousand transistors
  – Intel Pentium 4: 160 million transistors
  – Many transistors are in the cache
• Fancy techniques to decrease CPI and increase clock frequency
  – Superscalar (multiple instruction execution)
  – Deep pipelining
  – Dynamic scheduling (out-of-order execution)
  – Dynamic branch prediction
  – Register renaming
• All pipelining techniques exploit instruction-level parallelism (ILP)

Slide 6: Review: What is ILP?
• Independence among instructions
• Example with ILP (all four instructions read only $t1 and $t2):
    add $t0, $t1, $t2
    or  $t3, $t1, $t2
    sub $t4, $t1, $t2
    and $t5, $t1, $t2
• Example with no ILP (each instruction reads the previous result):
    add $t0, $t1, $t2
    or  $t3, $t0, $t2
    sub $t4, $t3, $t2
    and $t5, $t4, $t2
• ILP in real programs is limited

Slide 7: Limits of Advanced Pipelining
• Limited ILP in real programs
• Pipeline overhead
  – Branch and load delays exacerbated
  – Clock-cycle timing limits
• Limited branch prediction accuracy (85%–98%)
  – Even a few percent of mispredictions really hurts with long/wide pipes!
• Memory inefficiency
  – Load delays + number of loads per cycle
• Complexity of implementation
  – Too expensive to design
  – Requires too much power management
• We have reached the limits today!
[Figure: one clock cycle spanning FF, mux, logic, FF, mux, logic, illustrating fixed per-stage overhead.]

Slide 8: Five Components
[Figure: a computer as processor (control + datapath), memory, and input/output devices.]
• Datapath
• Control
• Memory
• Input
• Output
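One way to see the difference between the two sequences on slide 6 is to compute each instruction's dependence depth: the length of the chain of earlier results it must wait for. A minimal sketch, not MIPS tooling; the tuple encoding and the `chain_depths` helper are illustrative:

```python
# Each instruction is (dest, src1, src2). An instruction's depth is 1 plus the
# depth of the deepest earlier instruction producing one of its sources
# (0 if none), so instructions at the same depth could issue together.
def chain_depths(instrs):
    depth = {}      # register -> depth of the instruction that last wrote it
    result = []
    for dest, s1, s2 in instrs:
        d = 1 + max(depth.get(s1, 0), depth.get(s2, 0))
        depth[dest] = d
        result.append(d)
    return result

# Slide 6's sequence with ILP: every op reads only $t1/$t2.
with_ilp = [("t0", "t1", "t2"), ("t3", "t1", "t2"),
            ("t4", "t1", "t2"), ("t5", "t1", "t2")]
# Slide 6's sequence with no ILP: every op reads the previous result.
no_ilp = [("t0", "t1", "t2"), ("t3", "t0", "t2"),
          ("t4", "t3", "t2"), ("t5", "t4", "t2")]

print(chain_depths(with_ilp))  # [1, 1, 1, 1] -> all four independent
print(chain_depths(no_ilp))    # [1, 2, 3, 4] -> a serial chain of length 4
```

The first sequence could complete in one issue step on a wide-enough superscalar machine; the second is forced to take four steps no matter how much hardware is available, which is the sense in which limited ILP caps the benefit of the slide 5 techniques.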
EE 108b Lecture 11, C. Kozyrakis, slides 9–10

Slide 9: Memory Systems Outline
• Memory system basics
  – A little history
  – Memory technologies: SRAM, DRAM, disk
• Principle of locality
  – Program characteristics
• Memory hierarchies
  – Exploiting locality

Slide 10: In The Beginning
• In the earliest electronic computers, memory was very hard to build
  – Memory devices used dots on a CRT screen, and stuff that was even worse!
  – In the late 40s, core memory was invented
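The "principle of locality" in the slide 9 outline refers to a program characteristic the memory hierarchy exploits: accesses cluster in time and in address space. A tiny illustrative sketch, not from the slides; the 16-byte line size and the `misses` helper are my assumptions:

```python
# Spatial locality: a sequential scan of 4-byte words means one fetch of a
# 16-byte line brings in data for four consecutive accesses.
LINE_BYTES = 16

def misses(addresses):
    """Count cold misses at line granularity for an address trace."""
    seen_lines, count = set(), 0
    for a in addresses:
        line = a // LINE_BYTES
        if line not in seen_lines:
            seen_lines.add(line)
            count += 1
    return count

trace = [4 * i for i in range(16)]      # a[0..15], 4-byte words, in order
print(misses(trace), "misses for", len(trace), "accesses")  # 4 misses for 16 accesses
```

With no locality (e.g., accesses scattered across distinct lines) every access would miss; it is this gap that makes a small, fast memory in front of a large, slow one worthwhile, which is where the outline is headed.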

This note was uploaded on 03/08/2011 for the course EE 108B at Stanford.
