Lecture 24 - TLP (2010-04-08)

CDA 5106 Advanced Computer Architecture I
Thread-Level Parallelism
School of Electrical Engineering and Computer Science, University of Central Florida

Motivation
[Figure: IPC (instructions/cycle) vs. instruction window size. Assuming very good branch prediction, IPC keeps growing as the window widens; with current state-of-the-art branch prediction, IPC levels off as data dependences, cache misses, etc. limit it.]
Resource Utilization (small instruction window)
[Figure: execution-resource occupancy over time, marking a cache miss and the point where the miss is repaired; with a small window, resources sit largely idle during the miss latency.]

Resource Utilization (large instruction window)
[Figure: the same timeline with a large window; independent instructions fill more of the slots between the cache miss and its repair.]
Resource Utilization (cont.)
• Limited instruction-level parallelism from a single thread
• How about multiple threads? => multithreading

Multithreading
• Run multiple independent programs at the same time
• Hardware support for multithreading
  – Replicate some or all resources
    • Fetch from multiple programs
  – Replicate state
    • Each program needs its own context
    • Context = register state + memory state
  – Mechanism to switch contexts (?)
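The utilization argument can be made concrete with a toy timing model (a sketch, not from the slides: the miss rate, miss latency, and switch-on-miss policy are assumed parameters). A single thread leaves the core idle for every cache miss, while a second hardware context lets the core keep executing while a miss is outstanding:

```python
# Toy model (illustrative only): each thread executes 1 instruction per cycle,
# but every 10th instruction misses in the cache. With a second context, the
# core switches threads on a miss instead of sitting idle.

MISS_LATENCY = 20   # assumed miss penalty, in cycles
INSTRS = 100        # instructions per thread
MISS_EVERY = 10     # every 10th instruction misses

def single_thread_cycles():
    cycles = 0
    for i in range(1, INSTRS + 1):
        cycles += 1                      # execute the instruction
        if i % MISS_EVERY == 0:
            cycles += MISS_LATENCY       # core idles during the miss
    return cycles

def two_thread_cycles():
    # Switch to the other context on a miss; the miss is serviced in the
    # background while the other thread runs. Simplified: switches are free.
    remaining = [INSTRS, INSTRS]
    executed = [0, 0]
    miss_ready = [0, 0]   # cycle at which each thread's outstanding miss resolves
    cycles = 0
    current = 0
    while remaining[0] > 0 or remaining[1] > 0:
        cycles += 1
        # pick a runnable thread, preferring the current one
        for t in (current, 1 - current):
            if remaining[t] > 0 and miss_ready[t] <= cycles:
                current = t
                break
        else:
            continue  # both threads stalled: a genuinely idle cycle
        remaining[current] -= 1
        executed[current] += 1
        if executed[current] % MISS_EVERY == 0:
            miss_ready[current] = cycles + MISS_LATENCY  # stall this thread only
    return cycles

if __name__ == "__main__":
    st = single_thread_cycles()
    mt = two_thread_cycles()
    print(f"1 thread : {INSTRS} instrs in {st} cycles, IPC = {INSTRS/st:.2f}")
    print(f"2 threads: {2*INSTRS} instrs in {mt} cycles, IPC = {2*INSTRS/mt:.2f}")
```

Under these made-up parameters the second context roughly doubles IPC; the actual gain depends entirely on miss latency and on how often the two threads' misses overlap.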
Multithreading on a Chip
• Two major alternatives
  – Simultaneous multithreading (SMT)
  – Chip multiprocessor (CMP)

SMT Motivation
• Wide superscalar is the way to go
  – Must continue to improve single-program performance
• Multiple programs share fetch & issue bandwidth
  – Thread-level parallelism (TLP) improves utilization of a wide superscalar
  – Can still exploit high ILP in a single program
CMP Motivation
• Wide superscalar is not the way to go
  – Trades a fast clock for a minor IPC gain
  – Design and verification complexity
• Multiple simple processors (e.g., 2-way or 4-way superscalar)
  – Exploit TLP
  – Moderate ILP plus a fast clock
  – Use multiple cores to exploit single-thread ILP

[Figure: issue-slot diagrams contrasting single-threaded, SMT, and CMP execution, with slots filled by thread 1, thread 2, and thread 3.]
More Multithreading Architectures
• Static partitioning of execution resources
  – Spatial partition: (a) CMP (e.g., IBM POWER4)
  – Temporal partition
    • Per cycle: (b) fine-grain multithreading (FGMT) (e.g., CDC 6600)
    • (c) coarse-grain multithreading (CGMT) (e.g., Northstar/Pulsar PowerPC)
• Dynamic partitioning of execution resources
  – Per functional unit: (d) SMT (e.g., Intel Pentium 4)

ISCA-23 SMT Paper
• D. Tullsen, S. Eggers, J. Emer, et al., "Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor," Proc. ISCA-23, 1996.
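The static temporal-partitioning schemes differ mainly in when they switch threads, which a short sketch can make explicit (illustrative only: the thread names, the 'i'/'M' slot encoding, and the switch-on-miss rule are assumptions, and the model ignores switch overhead and miss latency):

```python
# Illustrative only: how the temporal-partitioning schemes interleave two
# threads A and B. FGMT rotates threads every cycle; CGMT stays on one thread
# until a long-latency event (here, a slot marked 'M' for miss) forces a switch.

def fgmt_schedule(n_cycles, threads):
    """Fine-grain: round-robin, a different thread each cycle."""
    return [threads[c % len(threads)] for c in range(n_cycles)]

def cgmt_schedule(streams):
    """Coarse-grain: run one thread until it hits an 'M' (miss), then switch.
    streams maps thread name -> list of slots, 'i' = instruction, 'M' = miss."""
    names = list(streams)
    pos = {t: 0 for t in names}
    order = []
    current = 0
    while any(pos[t] < len(streams[t]) for t in names):
        t = names[current]
        if pos[t] >= len(streams[t]):          # this thread finished: move on
            current = (current + 1) % len(names)
            continue
        slot = streams[t][pos[t]]
        pos[t] += 1
        order.append(t)
        if slot == "M":                        # long-latency event: switch thread
            current = (current + 1) % len(names)
    return order

print("FGMT:", "".join(fgmt_schedule(8, ["A", "B"])))   # ABABABAB
print("CGMT:", "".join(cgmt_schedule({"A": list("iiMii"),
                                      "B": list("iMiii")})))  # AAABBAABBB
```

FGMT switches every cycle regardless of thread behavior, while CGMT amortizes a more expensive switch by staying on one thread until a long-latency event; SMT, by contrast, partitions dynamically and can issue from several threads in the same cycle.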

This note was uploaded on 08/22/2010 for the course CDA 5106 taught by Professor Staff during the Spring '08 term at University of Central Florida.
