Lecture 11 - Multithreading

ECE565: Computer Architecture
Instructor: Vijay S. Pai
Fall 2011

This Unit: Multithreading (MT)
• Why multithreading (MT)?
  • Utilization vs. performance
• Three implementations
  • Coarse-grained MT
  • Fine-grained MT
  • Simultaneous MT (SMT)
• MT for reliability
  • Redundant multithreading
• Multithreading for performance
  • Speculative multithreading
[Figure: system stack (Application, OS, Compiler, Firmware, CPU, I/O, Memory, Digital Circuits)]

Performance And Utilization
• Performance (IPC) important
• Utilization (actual IPC / peak IPC) important too (a small worked example follows this slide)
  – Even moderate superscalars (e.g., 4-way) not fully utilized
  – Average sustained IPC: 1.5–2 → <50% utilization
    • Mis-predicted branches
    • Cache misses, especially L2
    • Data dependences
• Multi-threading (MT)
  + Improve utilization by multiplexing multiple threads on a single CPU
  + One thread cannot fully utilize the CPU? Maybe 2, 4 (or 100) can
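A small worked version of the utilization arithmetic above. The 4-way issue width is the slide's; the 1.8 sustained IPC is an assumed point inside the slide's 1.5–2 range, picked only for illustration.

#include <stdio.h>

int main(void) {
    double peak_ipc      = 4.0;   /* 4-way superscalar issue width (from the slide) */
    double sustained_ipc = 1.8;   /* assumed value inside the slide's 1.5-2 range */
    double utilization   = sustained_ipc / peak_ipc;

    /* prints "utilization = 45% of peak", i.e. under 50% as the slide notes */
    printf("utilization = %.0f%% of peak\n", utilization * 100.0);
    return 0;
}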
Latency vs Throughput
• MT trades (single-thread) latency for throughput
  – Sharing the processor degrades latency of individual threads
  + But improves aggregate latency of both threads
  + Improves utilization
• Example (worked out in the short sketch after these slides)
  • Thread A: individual latency = 10s, latency with thread B = 15s
  • Thread B: individual latency = 20s, latency with thread A = 25s
  • Sequential latency (first A then B, or vice versa): 30s
  • Parallel latency (A and B simultaneously): 25s
    – MT slows each thread by 5s
    + But improves total latency by 5s
• Different workloads have different parallelism
  • SpecFP has lots of ILP (can use an 8-wide machine)
  • Server workloads have TLP (can use multiple threads)

MT Implementations: Similarities
• How do multiple threads share a single processor?
• Different sharing mechanisms for different kinds of structures
  • Depends on what kind of state a structure stores
• No state: ALUs
  • Dynamically shared
• Persistent hard state (aka "context"): PC, registers (see the toy struct sketch below)
  • Replicated
• Persistent soft state: caches, bpred
  • Dynamically partitioned (like on a multi-programmed uniprocessor)
  • TLBs need ASIDs, caches/bpred tables don't
  • Exception: ordered "soft" state (BHR, RAS) is replicated
• Transient state: pipeline latches, ROB, RS
  • Partitioned … somehow
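The latency-vs-throughput example can be turned into a minimal arithmetic sketch. All latencies are the slide's; the only assumption is that "parallel latency" means both threads start together, so the slower co-scheduled thread determines completion.

#include <stdio.h>

int main(void) {
    double a_alone = 10.0, a_with_b = 15.0;  /* thread A: alone vs. co-scheduled with B */
    double b_alone = 20.0, b_with_a = 25.0;  /* thread B: alone vs. co-scheduled with A */

    double sequential = a_alone + b_alone;                            /* 30s: run A, then B */
    double parallel   = (a_with_b > b_with_a) ? a_with_b : b_with_a;  /* 25s: slower thread finishes last */

    printf("sequential = %.0fs, parallel = %.0fs, total latency saved = %.0fs\n",
           sequential, parallel, sequential - parallel);
    return 0;
}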
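The replicated-versus-shared split on the last slide can also be pictured as a toy C struct. This is purely a hypothetical sketch for intuition (the field names and 2-thread width are made up); real SMT hardware does not store its state this way.

#include <stdint.h>
#include <stdio.h>

/* Persistent hard state ("context"): replicated once per hardware thread. */
typedef struct {
    uint64_t pc;
    uint64_t regs[32];
    uint16_t asid;        /* ASID tag lets threads share a single TLB safely */
} hw_context;

/* One core with two hardware threads (a 2-way SMT sketch). Caches, branch
   predictor, ALUs, ROB, and RS are shared or dynamically partitioned and
   are deliberately not modeled here. */
typedef struct {
    hw_context ctx[2];
} smt_core;

int main(void) {
    smt_core core = {0};
    core.ctx[0].asid = 1;   /* two threads, two address-space IDs */
    core.ctx[1].asid = 2;
    printf("per-thread replicated context: %zu bytes\n", sizeof(hw_context));
    return 0;
}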
