lect.11.Memory.4up

EE 108b Lecture 11, C. Kozyrakis

Lecture 11: Memory Design
Christos Kozyrakis
Stanford University
http://eeclass.stanford.edu/ee108b

Announcements
- TBD

Review: Reducing Branch Control Hazard
[Figure: pipeline timing diagram, time in clock cycles, through the stages IF, ID/RF, EX, MEM, WB (Im, Reg, ALU, Dm, Reg) for the sequence
    beq r1, r2, L
    sub r4, r1, r3
 L: and r6, r2, r7
    or  r8, r7, r9
    add r1, r2, r1
An equality comparison (=) is drawn alongside the register file.]

Branch CPI
Assumptions: 65% of branches are taken; 50% of single delay slots are filled usefully. The factor 0.15 in each formula is the fraction of instructions that are branches.

Branch strategy     Control point            Branch CPI
stall               MEM                      3 × 0.15 = 0.45
stall               ID                       1 × 0.15 = 0.15
Predict taken       ID                       1 × 0.15 = 0.15
Predict taken       ID (target) / MEM (=)    (1 × 0.15) × 0.65 + (3 × 0.15) × 0.35 = 0.255
Predict not taken   ID                       (1 × 0.15) × 0.65 + 0 × 0.35 = 0.098
Delayed branch      ID                       (1 × 0.15) × 0.50 = 0.075
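
The Branch CPI table follows one rule: branch CPI = branch frequency × average stall cycles per branch. Below is a minimal Python sketch of that arithmetic. It assumes, as the formulas imply, a branch frequency of 0.15; the helper `branch_cpi` is hypothetical, and the per-outcome penalties (cycles lost when taken / not taken) passed to it are read off each row's formula rather than derived from a pipeline model.

```python
# Sketch: recompute the "Branch CPI" table as
#   branch CPI = branch frequency * average stall cycles per branch.
# Assumption: BRANCH_FREQ = 0.15 is inferred from the slide's formulas; the
# per-outcome penalties below are copied from each row, not derived.
BRANCH_FREQ = 0.15    # fraction of instructions that are branches
P_TAKEN = 0.65        # 65% of branches are taken
P_SLOT_FILLED = 0.50  # 50% of single delay slots are filled usefully


def branch_cpi(penalty_taken: float, penalty_not_taken: float) -> float:
    """Extra CPI from branches, given stall cycles on each branch outcome."""
    avg_penalty = P_TAKEN * penalty_taken + (1 - P_TAKEN) * penalty_not_taken
    return BRANCH_FREQ * avg_penalty


strategies = {
    "stall (MEM)": branch_cpi(3, 3),
    "stall (ID)": branch_cpi(1, 1),
    "predict taken (ID)": branch_cpi(1, 1),
    "predict taken (ID target / MEM compare)": branch_cpi(1, 3),
    "predict not taken (ID)": branch_cpi(1, 0),
    # A delay slot that is not usefully filled wastes one cycle.
    "delayed branch (ID)": BRANCH_FREQ * 1 * (1 - P_SLOT_FILLED),
}

for name, cpi in strategies.items():
    print(f"{name:42s} {cpi:.4f}")
```

The printed values agree with the table; predict not taken comes out as 0.0975, which the slide rounds to 0.098. Writing it this way makes explicit that the strategies differ only in how many cycles they lose on each branch outcome.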

Review: Advanced Pipelining
Where have all the transistors gone?
- MIPS R3000: 120 thousand transistors
- Intel Pentium 4: 160 million transistors
Many of the transistors are in the cache.
Fancy techniques to decrease CPI and increase clock frequency:
- Superscalar (multiple instruction execution)
- Deep pipelining
- Dynamic scheduling (out-of-order execution)
- Dynamic branch prediction
- Register renaming
All of these pipelining techniques exploit instruction-level parallelism (ILP).

Review: What is ILP?
Independence among instructions.
Example with ILP:
    add $t0, $t1, $t2
    or  $t3, $t1, $t2
    sub $t4, $t1, $t2
    and $t5, $t1, $t2
Example with no ILP:
    add $t0, $t1, $t2
    or  $t3, $t0, $t2
    sub $t4, $t3, $t2
    and $t5, $t4, $t2
ILP in real programs is limited.

Limits of Advanced Pipelining
- Limited ILP in real programs
- Pipeline overhead
  - Branch and load delays exacerbated
  - Clock cycle timing limits
- Limited branch prediction accuracy (85%-98%)
  - Even a few percent really hurts with long/wide pipes!
- Memory inefficiency
  - Load delays + # of loads/cycle
- Complexity of implementation
  - Too expensive to design
  - Requires too much power management
We have reached the limits today!
[Figure: a clock cycle broken into flip-flop (FF), mux, and logic delays; the FF, Mux, Logic sequence repeats for the next stage.]

Five Components
[Figure: a computer comprises a processor (control and datapath), memory, and devices (input and output).]
- Datapath
- Control
- Memory
- Input
- Output

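The two code sequences on the "What is ILP?" slide can be compared by measuring the longest chain of read-after-write (RAW) dependences: on an idealized machine, that chain sets the minimum execution time. The following Python sketch is illustrative only. The helpers `parse` and `ilp` are hypothetical, and the model assumes one-cycle latency for every instruction, unlimited issue width, and register renaming, so only true (RAW) dependences delay an instruction.

```python
# Sketch: measure ILP of a straight-line MIPS snippet as
# (number of instructions) / (length of the longest RAW dependence chain).
# Assumptions (hypothetical, not from the lecture): one-cycle latency for every
# instruction, unlimited issue width, and register renaming, so only
# read-after-write dependences delay an instruction.


def parse(line: str) -> tuple[str, list[str]]:
    """Split 'op dst, src1, src2' into (dst, [src1, src2])."""
    _, operands = line.split(maxsplit=1)
    regs = [r.strip() for r in operands.split(",")]
    return regs[0], regs[1:]


def ilp(program: list[str]) -> float:
    """Instructions per cycle achievable on the idealized machine."""
    ready: dict[str, int] = {}  # register -> cycle its newest value is produced
    depth = 0
    for line in program:
        dst, srcs = parse(line)
        # Execute one cycle after the latest-produced source operand.
        cycle = 1 + max((ready.get(r, 0) for r in srcs), default=0)
        ready[dst] = cycle
        depth = max(depth, cycle)
    return len(program) / depth


with_ilp = ["add $t0, $t1, $t2", "or $t3, $t1, $t2",
            "sub $t4, $t1, $t2", "and $t5, $t1, $t2"]
no_ilp = ["add $t0, $t1, $t2", "or $t3, $t0, $t2",
          "sub $t4, $t3, $t2", "and $t5, $t4, $t2"]

print(ilp(with_ilp))  # 4.0
print(ilp(no_ilp))    # 1.0
```

The independent sequence reaches an ILP of 4 (all four instructions could issue in the same cycle), while the dependent chain is stuck at 1, so a wide superscalar pipeline gains nothing on code shaped like the second example.
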
Memory Systems Outline
- Memory system basics
  - A little history
  - Memory technologies: SRAM, DRAM, disk
- Principle of locality
  - Program characteristics
- Memory hierarchies
  - Exploiting locality

In The Beginning
- In the earliest electronic computers, memory was very hard to build.
  - Memory devices used dots on a CRT screen, and stuff that was even worse!
- In the late 1940s, core memory was invented.
  - This was the dominant memory until the mid-1970s.
  - Threaded by hand!
