CSE502_lec12+13-SMTS10

CSE 502 Graduate Computer Architecture
Lec 12-13: Threading & Simultaneous Multithreading
Larry Wittie, Computer Science, Stony Brook University
http://www.cs.sunysb.edu/~cse502 and ~lw
Slides adapted from David Patterson, UC-Berkeley cs252-s06
3/15-17/10, CSE502-F09, Lec 11+12 - Threads & SMT

Outline
- Thread Level Parallelism (from H&P Chapter 3)
- Multithreading
- Simultaneous Multithreading
- Power 4 vs. Power 5
- Head to Head: VLIW vs. Superscalar vs. SMT
- Commentary
- Conclusion
- Read Chapter 3 of the text; next reading assignment: Vector Appendix F

Performance beyond single-thread ILP
- ILP for arbitrary code is now limited to 3 to 6 issues/cycle, but some applications (e.g., database or scientific codes) have much higher natural parallelism.
- Explicit (compiler-specified) Thread Level Parallelism or Data Level Parallelism.
- Thread: a process with its own instructions and data (or, much harder on the compiler: carefully selected, rarely interacting code segments in the same process).
  - A thread may be one process that is part of a parallel program of multiple processes, or it may be an independent program.
  - Each thread has all the state (instructions, data, PC, register state, and so on) necessary to allow it to execute.
- Data Level Parallelism: perform identical (lock-step) operations on data, and have lots of data.

Thread Level Parallelism (TLP)
- ILP (last lectures) exploits implicitly parallel operations within a loop or straight-line code segment.
- TLP is explicitly represented by the use of multiple threads of execution that are inherently parallel.
- Goal: use multiple instruction streams to improve
  1. throughput of computers that run many programs, and
  2. execution time of multi-threaded programs.
- TLP could be more cost-effective to exploit than ILP for many applications.
New Approach: Multithreaded Execution
- Multithreading: multiple threads share the functional units of one processor via overlapped execution.
  - The processor must duplicate the independent state of each thread: a separate copy of the register file, a separate PC, and, if the threads run as independent programs, a separate page table.
  - Memory is shared through the virtual-memory mechanisms, which already support multiple processes.
  - HW for a fast thread switch (0.1 to 10 clocks) is much faster than a full process switch (100s to 1000s of clocks), which copies state (state = registers and memory).
- When to switch among threads?
  - Alternate instructions from different threads each cycle (fine grain).
  - When a thread is stalled, perhaps on a cache miss, execute another thread (coarse grain).
  - In cache-less multiprocessors, at the start of each memory access.

For most applications, the processing units stall 80% or more of the time during execution...