07_superscalar

CIS 501 (Martin): Superscalar

CIS 501 Computer Architecture
Unit 7: Superscalar

Slides originally developed by Amir Roth with contributions by Milo Martin at University of Pennsylvania with sources that included University of Wisconsin slides by Mark Hill, Guri Sohi, Jim Smith, and David Wood.

Remainder of CIS501: Parallelism
• Last unit: pipeline-level parallelism
  • Work on execute of one instruction in parallel with decode of next
• Next: instruction-level parallelism (ILP)
  • Execute multiple independent instructions fully in parallel
  • Today: multiple issue
  • After that: dynamic scheduling
    • Extract much more ILP via out-of-order processing
• Data-level parallelism (DLP)
  • Single-instruction, multiple data
  • Example: one instruction, four 16-bit adds (using 64-bit registers)
• Thread-level parallelism (TLP)
  • Multiple software threads running on multiple cores

This Unit: Superscalar Execution
• Superscalar scaling issues
  • Multiple fetch and branch prediction
  • Dependence-checks & stall logic
  • Wide bypassing
  • Register file & cache bandwidth
• Multiple-issue designs
  • Superscalar
  • VLIW and EPIC (Itanium)
[Figure: system layers: App, System software, CPU, Mem, I/O]

Readings
• Textbook (MA:FSPTCM)
  • Sections 3.1, 3.2 (but not “Sidebar” in 3.2), 3.5.1
  • Sections 4.2, 4.3, 5.3.3
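The data-level parallelism example above (one instruction, four 16-bit adds using a 64-bit register) can be imitated in software. The following is a minimal sketch, not something taken from the slides: the names pack, unpack, and packed_add16 are hypothetical, and each 16-bit lane is assumed to wrap around independently on overflow.

```python
# Sketch of a "packed add": four independent 16-bit adds carried out on
# one 64-bit word. All names here are illustrative assumptions.

LANE_BITS = 16
LANES = 4
LANE_MASK = (1 << LANE_BITS) - 1
MASK64 = (1 << 64) - 1
HIGH_BITS = 0x8000_8000_8000_8000  # top bit of every 16-bit lane


def pack(values):
    """Pack four 16-bit values into one 64-bit word (lane 0 is lowest)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def packed_add16(a, b):
    """Add lanes pairwise; a carry never crosses a lane boundary."""
    # Add the low 15 bits of every lane. The cleared top bit of each lane
    # absorbs any carry, so nothing spills into the neighbouring lane.
    low = (a & ~HIGH_BITS) + (b & ~HIGH_BITS)
    # Fold the top bits back in with XOR (an add without carry-out).
    return (low ^ ((a ^ b) & HIGH_BITS)) & MASK64


a = pack([1, 2, 3, 0xFFFF])
b = pack([10, 20, 30, 1])
print(unpack(packed_add16(a, b)))  # prints [11, 22, 33, 0]
```

The last lane overflows (0xFFFF + 1) and wraps to 0 without disturbing its neighbours, which is the property a hardware packed-add instruction provides in a single operation.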
Scalar Pipeline and the Flynn Bottleneck
• So far we have looked at scalar pipelines
  • One instruction per stage
  • With control speculation, bypassing, etc.
• Performance limit (aka “Flynn Bottleneck”) is CPI = IPC = 1
  • Limit is never even achieved (hazards)
  • Diminishing returns from “super-pipelining” (hazards + overhead)
[Figure: pipeline datapath with I$, BP, regfile, D$]

Multiple-Issue Pipeline
• Overcome this limit using multiple issue
  • Also called superscalar
  • Two instructions per stage at once, or three, or four, or eight…
  • “Instruction-Level Parallelism (ILP)” [Fisher, IEEE TC’81]
• Today, typically “4-wide” (Intel Core i7, AMD Opteron)
  • Some more (Power5 is 5-issue; Itanium is 6-issue)
  • Some less (dual-issue is common for simple cores)
[Figure: pipeline datapath with I$, BP, regfile, D$]

Superscalar Pipeline Diagrams - Ideal

scalar            1  2  3  4  5  6  7  8  9  10 11 12
lw 0(r1) → r2     F  D  X  M  W
lw 4(r1) → r3        F  D  X  M  W
lw 8(r1) → r4           F  D  X  M  W
add r14,r15 → r6           F  D  X  M  W
add r12,r13 → r7              F  D  X  M  W
add r17,r16 → r8                 F  D  X  M  W
lw 0(r18) → r9                      F  D  X  M  W

2-way superscalar 1  2  3  4  5  6  7  8  9  10 11 12
lw 0(r1) → r2     F  D  X  M  W
lw 4(r1) → r3     F  D  X  M  W
lw 8(r1) → r4        F  D  X  M  W
add r14,r15 → r6     F  D  X  M  W
add r12,r13 → r7        F  D  X  M  W
add r17,r16 → r8        F  D  X  M  W
lw 0(r18) → r9             F  D  X  M  W

Superscalar Pipeline Diagrams - Realistic

scalar            1  2  3  4  5  6  7  8  9  10 11 12
lw 0(r1) → r2     F  D  X  M  W
lw 4(r1) → r3        F  D  X  M  W
lw 8(r1) → r4           F  D  X  M  W
add r4,r5 → r6             F  d* D  X  M  W
add r2,r3 → r7                   F  D  X  M  W
add r7,r6 → r8                      F  D  X  M  W
lw 0(r8) → r9                          F  D  X  M  W

2-way superscalar 1  2  3  4  5  6  7  8  9  10 11 12
lw 0(r1) → r2     F  D  X  M  W
lw 4(r1) → r3     F  D  X  M  W
lw 8(r1) → r4        F  D  X  M  W
add r4,r5 → r6       F  d* d* D  X  M  W
add r2,r3 → r7          F  d* D  X
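The stall counts in the realistic diagrams can be estimated with a small timing model. The sketch below is an illustration under stated assumptions, not the slides' own method: an in-order five-stage pipeline (F D X M W), full bypassing, a one-cycle load-use penalty, instructions fetched `width` per cycle, and at most `width` instructions in X per cycle. The names Instr and execute_cycles are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    text: str
    dest: str
    srcs: tuple
    is_load: bool = False


def execute_cycles(program, width):
    """Return the cycle in which each instruction occupies the X stage."""
    x_cycle = []
    producer = {}  # register name -> index of its most recent writer
    for i, ins in enumerate(program):
        fetch = 1 + i // width      # 'width' instructions fetched per cycle
        earliest = fetch + 2        # F, then D, then X if nothing stalls
        if i > 0:
            # In-order issue: never execute before the previous instruction.
            earliest = max(earliest, x_cycle[i - 1])
        for src in ins.srcs:
            j = producer.get(src)
            if j is not None:
                # Assumed bypass timing: an ALU result is usable one cycle
                # after the producer's X; a load result one cycle later.
                delay = 2 if program[j].is_load else 1
                earliest = max(earliest, x_cycle[j] + delay)
        # Structural limit: at most 'width' instructions in X per cycle.
        while x_cycle.count(earliest) >= width:
            earliest += 1
        x_cycle.append(earliest)
        producer[ins.dest] = i
    return x_cycle


# The instruction sequence from the "Realistic" diagrams.
PROGRAM = [
    Instr("lw 0(r1) -> r2", "r2", ("r1",), True),
    Instr("lw 4(r1) -> r3", "r3", ("r1",), True),
    Instr("lw 8(r1) -> r4", "r4", ("r1",), True),
    Instr("add r4,r5 -> r6", "r6", ("r4", "r5")),
    Instr("add r2,r3 -> r7", "r7", ("r2", "r3")),
    Instr("add r7,r6 -> r8", "r8", ("r7", "r6")),
    Instr("lw 0(r8) -> r9", "r9", ("r8",), True),
]

print(execute_cycles(PROGRAM, 1))  # prints [3, 4, 5, 7, 8, 9, 10]
print(execute_cycles(PROGRAM, 2))  # prints [3, 3, 4, 6, 6, 7, 8]
```

Under these assumptions, add r4,r5 reaches X one cycle late in the scalar case and two cycles late in the 2-way case, which agrees with the d* entries visible in the diagrams. The dependence chain at the end of the sequence limits the 2-way pipeline to finishing two cycles earlier than the scalar one, well short of a 2x speedup.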

This note was uploaded on 10/19/2011 for the course CIS 501 taught by Professor Martin during the Fall '10 term at UPenn.
