07_superscalar

CIS 501 (Martin): Superscalar

CIS 501 Computer Architecture
Unit 7: Superscalar
Slides originally developed by Amir Roth with contributions by Milo Martin at University of Pennsylvania, with sources that included University of Wisconsin slides by Mark Hill, Guri Sohi, Jim Smith, and David Wood.

Remainder of CIS501: Parallelism
• Last unit: pipeline-level parallelism
  • Work on execute of one instruction in parallel with decode of next
• Next: instruction-level parallelism (ILP)
  • Execute multiple independent instructions fully in parallel
  • Today: multiple issue
  • After that: dynamic scheduling
    • Extract much more ILP via out-of-order processing
• Data-level parallelism (DLP)
  • Single-instruction, multiple data
  • Example: one instruction, four 16-bit adds (using 64-bit registers)
• Thread-level parallelism (TLP)
  • Multiple software threads running on multiple cores

This Unit: Superscalar Execution
• Superscalar scaling issues
  • Multiple fetch and branch prediction
  • Dependence-checks & stall logic
  • Wide bypassing
  • Register file & cache bandwidth
• Multiple-issue designs
  • Superscalar
  • VLIW and EPIC (Itanium)
[Figure: system stack; labels: App, App, App, System software, CPU, Mem, I/O]

Readings
• Textbook (MA:FSPTCM)
  • Sections 3.1, 3.2 (but not “Sidebar” in 3.2), 3.5.1
  • Sections 4.2, 4.3, 5.3.3
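
The data-level parallelism example on the "Remainder of CIS501: Parallelism" slide (one instruction, four 16-bit adds using a 64-bit register) can be emulated in software. The sketch below is only an illustration, not something taken from the slides: the helper names `pack`, `unpack`, and `packed_add16` are hypothetical, and real hardware would perform the whole operation with a single SIMD instruction rather than the masking shown here.

```python
LANES = 4                      # four values per 64-bit word
LANE_BITS = 16                 # each value is 16 bits wide
LANE_MASK = (1 << LANE_BITS) - 1

HIGH_BITS = 0x8000_8000_8000_8000   # top bit of every 16-bit lane
LOW_BITS = 0x7FFF_7FFF_7FFF_7FFF    # lower 15 bits of every lane


def pack(values):
    """Pack four 16-bit values into one 64-bit word (lane 0 is lowest)."""
    word = 0
    for lane, value in enumerate(values):
        word |= (value & LANE_MASK) << (lane * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word >> (lane * LANE_BITS)) & LANE_MASK for lane in range(LANES)]


def packed_add16(a, b):
    """Add four 16-bit lanes at once, wrapping within each lane.

    Adding only the lower 15 bits of each lane guarantees that no carry
    crosses a lane boundary; the top bit of each lane is then fixed up
    with XOR, which is addition without carry-out.
    """
    low_sum = (a & LOW_BITS) + (b & LOW_BITS)
    return low_sum ^ ((a ^ b) & HIGH_BITS)


a = pack([1, 0xFFFF, 0x8000, 1234])
b = pack([2, 1, 0x8000, 4321])
print(unpack(packed_add16(a, b)))
```

Each lane wraps around independently, so the overflow in one lane (0xFFFF + 1) does not disturb its neighbor. That independence is what lets one instruction do the work of four.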
Scalar Pipeline and the Flynn Bottleneck
• So far we have looked at scalar pipelines
  • One instruction per stage
  • With control speculation, bypassing, etc.
  • Performance limit (aka “Flynn Bottleneck”) is CPI = IPC = 1
  • Limit is never even achieved (hazards)
  • Diminishing returns from “super-pipelining” (hazards + overhead)
[Figure: pipeline datapath; labels: regfile, D$, I$, BP]

Multiple-Issue Pipeline
• Overcome this limit using multiple issue
  • Also called superscalar
  • Two instructions per stage at once, or three, or four, or eight…
  • “Instruction-Level Parallelism (ILP)” [Fisher, IEEE TC’81]
• Today, typically “4-wide” (Intel Core i7, AMD Opteron)
  • Some more (Power5 is 5-issue; Itanium is 6-issue)
  • Some less (dual-issue is common for simple cores)
[Figure: pipeline datapath; labels: regfile, D$, I$, BP]

Superscalar Pipeline Diagrams - Ideal

scalar             1  2  3  4  5  6  7  8  9  10 11 12
lw 0(r1) → r2      F  D  X  M  W
lw 4(r1) → r3         F  D  X  M  W
lw 8(r1) → r4            F  D  X  M  W
add r14,r15 → r6            F  D  X  M  W
add r12,r13 → r7               F  D  X  M  W
add r17,r16 → r8                  F  D  X  M  W
lw 0(r18) → r9                       F  D  X  M  W

2-way superscalar  1  2  3  4  5  6  7  8  9  10 11 12
lw 0(r1) → r2      F  D  X  M  W
lw 4(r1) → r3      F  D  X  M  W
lw 8(r1) → r4         F  D  X  M  W
add r14,r15 → r6      F  D  X  M  W
add r12,r13 → r7         F  D  X  M  W
add r17,r16 → r8         F  D  X  M  W
lw 0(r18) → r9              F  D  X  M  W

Superscalar Pipeline Diagrams - Realistic

scalar             1  2  3  4  5  6  7  8  9  10 11 12
lw 0(r1) → r2      F  D  X  M  W
lw 4(r1) → r3         F  D  X  M  W
lw 8(r1) → r4            F  D  X  M  W
add r4,r5 → r6              F  d* D  X  M  W
add r2,r3 → r7                    F  D  X  M  W
add r7,r6 → r8                       F  D  X  M  W
lw 0(r8) → r9                           F  D  X  M  W

2-way superscalar  1  2  3  4  5  6  7  8  9  10 11 12
lw 0(r1) → r2      F  D  X  M  W
lw 4(r1) → r3      F  D  X  M  W
lw 8(r1) → r4         F  D  X  M  W
add r4,r5 → r6        F  d* d* D  X  M  W
add r2,r3 → r7           F  d* D  X
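
The stall pattern in the pipeline diagrams can be approximated with a small in-order issue model. The sketch below is an illustration under stated assumptions, not the course's own simulator: it assumes a five-stage F/D/X/M/W pipeline with full bypassing, a one-cycle load-use penalty, no branch or cache hazards, and at most `width` instructions entering X per cycle. The tuple encoding `(op, sources, destination)` and the function names `execute_cycles` and `total_cycles` are hypothetical.

```python
def execute_cycles(program, width=1):
    """Return the cycle in which each instruction occupies the X stage."""
    ready = {}       # register -> earliest cycle a consumer may be in X
    x_cycles = []
    for i, (op, srcs, dst) in enumerate(program):
        # With no stalls: fetched in cycle i // width + 1, then D, then X.
        x = (i // width + 1) + 2
        # Data hazards: wait until every source value can be bypassed.
        for reg in srcs:
            x = max(x, ready.get(reg, 0))
        # In-order issue: never execute before an older instruction.
        if x_cycles:
            x = max(x, x_cycles[-1])
        # Issue width: at most `width` instructions in X per cycle.
        if len(x_cycles) >= width and x_cycles[-width] == x:
            x += 1
        x_cycles.append(x)
        # ALU results bypass to the next cycle; loads need one more (M).
        ready[dst] = x + (2 if op == "lw" else 1)
    return x_cycles


def total_cycles(program, width=1):
    """Cycle in which the last instruction finishes W (X, then M, then W)."""
    return execute_cycles(program, width)[-1] + 2


# The two seven-instruction sequences from the diagrams.
IDEAL = [
    ("lw", ["r1"], "r2"), ("lw", ["r1"], "r3"), ("lw", ["r1"], "r4"),
    ("add", ["r14", "r15"], "r6"), ("add", ["r12", "r13"], "r7"),
    ("add", ["r17", "r16"], "r8"), ("lw", ["r18"], "r9"),
]
REALISTIC = [
    ("lw", ["r1"], "r2"), ("lw", ["r1"], "r3"), ("lw", ["r1"], "r4"),
    ("add", ["r4", "r5"], "r6"), ("add", ["r2", "r3"], "r7"),
    ("add", ["r7", "r6"], "r8"), ("lw", ["r8"], "r9"),
]

for name, program in (("ideal", IDEAL), ("realistic", REALISTIC)):
    for width in (1, 2):
        cycles = total_cycles(program, width)
        print(f"{name:9} width={width}: {cycles} cycles, "
              f"IPC = {len(program) / cycles:.2f}")
```

Under these assumptions the independent sequence finishes in 11 cycles on the scalar pipeline and 8 cycles on the 2-way pipeline, while the dependent sequence takes 12 and 10 cycles. The stalls the model produces for `add r4,r5 → r6` (one cycle scalar, two cycles 2-way) are consistent with the `d*` entries in the diagrams, and IPC stays below the issue width in every case because of pipeline fill and hazards.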