EE504 Lecture 9: Simultaneous Multithreading
Derived from David Patterson's CS252
Dr. Yifong Shih, Northwestern Polytechnic University, Spring 2008
Performance beyond single-thread ILP
- There can be much higher natural parallelism in some applications (e.g., database or scientific codes).
- Explicit Thread Level Parallelism or Data Level Parallelism.
- Thread: a process with its own instructions and data. A thread may be one process of a parallel program of multiple processes, or it may be an independent program. Each thread has all the state (instructions, data, PC, register state, and so on) necessary to allow it to execute.
- Data Level Parallelism: perform identical operations on data, and lots of data.
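As a rough software-level sketch (my illustration, not part of the slides) of what "all the state necessary to execute" means: the hypothetical ThreadContext below carries its own program counter and register values, while code and data live in memory shared across threads.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical per-thread architectural state: each thread carries its own
// PC and register values, so it can execute independently of the others.
struct ThreadContext {
    uint64_t pc = 0;                  // program counter
    std::array<uint64_t, 32> regs{};  // general-purpose register file
};

int main() {
    // Two threads of a parallel program: same code and data image in shared
    // memory, but separate PC and register state per thread.
    std::vector<ThreadContext> threads(2);
    threads[0].pc = 0x400000;  // thread 0 starts at one entry point (made-up address)
    threads[1].pc = 0x400080;  // thread 1 starts at another
    return 0;
}
```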
Data Level Parallelism
- One instruction operates on a multitude of data.
- Generic names for DLP are streaming instructions or Single Instruction Multiple Data (SIMD).
- Intel's earlier effort is MMX; the latest incarnation is SSE (Streaming SIMD Extensions), comprising 70 SIMD instructions and 8 128-bit registers, each of which may pack up to four 32-bit integers.
- Application examples: DSP, MPEG, 3D graphics.
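To make the "one instruction, many data" idea concrete, here is a small sketch of my own (using the SSE2 integer form of the instruction set for clarity, not code from the lecture): a single SIMD add operates on four packed 32-bit integers held in one 128-bit register.

```cpp
#include <cstdint>
#include <cstdio>
#include <emmintrin.h>  // SSE2 intrinsics for 128-bit integer operations

int main() {
    // Pack four 32-bit integers into each 128-bit XMM register.
    __m128i a = _mm_set_epi32(4, 3, 2, 1);
    __m128i b = _mm_set_epi32(40, 30, 20, 10);

    // One SIMD instruction adds all four lanes at once.
    __m128i sum = _mm_add_epi32(a, b);

    alignas(16) int32_t out[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(out), sum);
    std::printf("%d %d %d %d\n", out[0], out[1], out[2], out[3]);  // 11 22 33 44
    return 0;
}
```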
Thread Level Parallelism (TLP)
- ILP exploits implicit parallel operations within a loop or straight-line code segment.
- TLP is explicitly represented by the use of multiple threads of execution that are inherently parallel.
- Goal: use multiple instruction streams to improve
  1. Throughput of computers that run many programs
  2. Execution time of multi-threaded programs
- TLP could be more cost-effective to exploit than ILP.
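A minimal software-level sketch of explicit TLP (my example, not from the lecture): two std::thread instruction streams sum independent halves of an array in parallel, which can reduce the execution time of this one multi-threaded program; on a machine running many programs, the same mechanism improves overall throughput.

```cpp
#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    std::vector<int> data(1000000, 1);
    long long lo = 0, hi = 0;

    // Two explicit threads of execution: each is its own instruction stream
    // working on an independent half of the data.
    std::thread t1([&] {
        lo = std::accumulate(data.begin(), data.begin() + data.size() / 2, 0LL);
    });
    std::thread t2([&] {
        hi = std::accumulate(data.begin() + data.size() / 2, data.end(), 0LL);
    });

    t1.join();
    t2.join();
    std::printf("sum = %lld\n", lo + hi);  // 1000000
    return 0;
}
```

(Compile with a C++11 compiler and thread support, e.g. `-std=c++11 -pthread`.)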
New Approach: Multithreaded Execution
- Multithreading: multiple threads share the functional units of one processor via overlapping.
- The processor must duplicate the independent state of each thread, e.g., a separate copy of the register file, a separate PC, and, for running independent programs, a separate page table.
- Memory is shared through the virtual memory mechanisms, which already support multiple processes.
- Hardware support for a fast thread switch; much faster than a full process switch, which takes 100s to 1000s of clocks.
- When to switch?
  - Alternate instructions per thread (fine grain)
  - When a thread is stalled, perhaps for a cache miss, another thread can be executed (coarse grain)
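To make the coarse-grained policy concrete, here is a toy simulation of my own (not hardware and not the lecture's material): the core stays with one thread until that thread stalls, say on a cache miss, and only then switches to another ready thread.

```cpp
#include <cstdio>
#include <vector>

// Toy model of coarse-grained multithreading: each simulated thread records
// whether its next instruction would stall (e.g., on a cache miss).
struct SimThread {
    int id;
    int issued;
    bool stalls_next;
};

int main() {
    std::vector<SimThread> threads = {{0, 0, false}, {1, 0, false}};
    int current = 0;

    for (int cycle = 0; cycle < 8; ++cycle) {
        SimThread& t = threads[current];
        if (t.stalls_next) {
            // Coarse grain: switch only when the running thread stalls.
            t.stalls_next = false;
            current = (current + 1) % static_cast<int>(threads.size());
            std::printf("cycle %d: thread %d stalled, switch to thread %d\n",
                        cycle, t.id, threads[current].id);
        } else {
            ++t.issued;
            std::printf("cycle %d: thread %d issues instruction %d\n",
                        cycle, t.id, t.issued);
            if (t.issued % 3 == 0) t.stalls_next = true;  // pretend every 3rd access misses
        }
    }
    return 0;
}
```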
Fine-Grained Multithreading
- Switches between threads on each instruction, causing the execution of multiple threads to be interleaved.
- Usually done in a round-robin fashion, skipping any threads that are stalled at that time.
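A companion sketch (again my own simplification, not the lecture's code) of the fine-grained policy: every cycle the core advances to the next thread in round-robin order, skipping any thread that is currently stalled.

```cpp
#include <cstdio>
#include <vector>

int main() {
    // stalled[i] marks whether simulated thread i is waiting (e.g., on memory).
    std::vector<bool> stalled = {false, true, false, false};
    int current = 0;
    const int n = static_cast<int>(stalled.size());

    for (int cycle = 0; cycle < 8; ++cycle) {
        // Fine grain: pick a new thread every cycle, skipping stalled ones.
        // (Toy model: assumes at least one thread is ready each cycle.)
        for (int tries = 0; tries < n && stalled[current]; ++tries)
            current = (current + 1) % n;

        std::printf("cycle %d: issue from thread %d\n", cycle, current);
        current = (current + 1) % n;  // next cycle starts with the next thread
    }
    return 0;
}
```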