



• [Coarse-grained multithreading switches] threads only on costly stalls, such as L2 cache misses
• Advantages
  – Relieves need to have very fast thread-switching
  – Doesn’t slow down thread, since instructions from other threads are issued only when the thread encounters a costly stall
• Disadvantage: hard to overcome throughput losses from shorter stalls, due to pipeline start-up costs
  – Since CPU issues instructions from 1 thread, when a stall occurs, the pipeline must be emptied or frozen
  – New thread must fill pipeline before instructions can complete
• Because of this start-up overhead, coarse-grained multithreading is better for reducing penalty of high-cost stalls, where pipeline refill << stall time
• Used in IBM AS/400 (Power III processor)

For most apps, most execution units lie idle
[Figure: 8-way superscalar]
From: Tullsen, Eggers, and Levy, “Simultaneous Multithreading: Maximizing On-chip Parallelism,” ISCA 1995.

Do both ILP and TLP?
• TLP and ILP exploit two different kinds of parallel structure in a program
• Could a processor oriented at ILP exploit TLP?
  – Functional units are often idle in a data path designed for ILP because of either stalls or dependences in the code
• Could the TLP be used as a source of independent instructions that might keep the processor busy during stalls?
• Could TLP be used to employ the functional units that would otherwise lie idle when insufficient ILP exists?

Simultaneous Multi-threading ...
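The question of whether a second thread can supply independent instructions for idle functional units can be illustrated with a toy model. This is only a sketch under stated assumptions, not the methodology of the cited paper: the function name `utilization`, the parameter `max_ilp`, and the uniform random model of ready instructions are all hypothetical, and the 8 units are treated as interchangeable rather than split into M, FX, FP, BR and CC types.

```python
import random

UNITS = 8  # issue slots per cycle, matching the 8-unit machine in the slides


def utilization(num_threads, cycles=10_000, max_ilp=4, seed=0):
    """Fraction of issue slots filled when num_threads share the units.

    Assumption (hypothetical model): each cycle, every thread has between
    0 and max_ilp independent instructions ready, drawn uniformly at random.
    """
    rng = random.Random(seed)
    used = 0
    for _ in range(cycles):
        free = UNITS
        for _ in range(num_threads):
            ready = rng.randint(0, max_ilp)  # ILP available in this thread
            issued = min(free, ready)        # cannot issue more than free slots
            free -= issued
        used += UNITS - free
    return used / (UNITS * cycles)


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(f"{n} thread(s): {utilization(n):.2f} of issue slots used")
```

In this model a single thread leaves most slots empty because its own ILP is limited, and each added thread fills some of the slots the others could not use. A real SMT design also has to respect unit types and shared resources such as caches, which the sketch ignores.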
[Figure: issue slots by cycle (cycles 1 to 9) for two cases, “One thread, 8 units” and “Two threads, 8 units”; unit columns: M M FX FX FP FP BR CC]
M = Load/Store, FX = Fixed Point, FP = Floating Point, BR = Branch, CC = Condition Codes

Simultaneous Multithreading (SMT)
• Based on the insight that a dynamically scheduled processor already has many HW mechanisms to support multithreading
  – Large set of virtual registers that can be used to hold the register sets of independent threads
  – Register renaming provides unique register identifiers, so instructions from multiple threads can be mixed in datapath without confusing sources and destinations across threads
  – Out-of-order completion allows the threads to execute out of order, and get better utilization of the HW
• Per thread:
  – renaming table
  – PC
  – reorder buffer (for independent commitment)
Source: Microprocessor Report, December 6, 1999, “Compaq Chooses SMT for Alpha”

Multithreaded Categories
[Figure: issue slots over time (processor cycle) for Superscalar, Fine-Grained, Coarse-Grained, Multiprocessing, and Simultaneous Multithreading; legend: Thread 1, Thread 2, Thread 3, Thread 4, Thread 5, Idle slot]

Design Challenges in SMT
• Since SM...