Tullsen95SMT - Simultaneous Multithreading Maximizing On-Chip Parallelism Dean M Tullsen Susan J Eggers and Henry M Levy Department of Computer

Simultaneous Multithreading: Maximizing On-Chip Parallelism

Dean M. Tullsen, Susan J. Eggers, and Henry M. Levy
Department of Computer Science and Engineering
University of Washington
Seattle, WA 98195

Abstract

This paper examines simultaneous multithreading, a technique permitting several independent threads to issue instructions to a superscalar's multiple functional units in a single cycle. We present several models of simultaneous multithreading and compare them with alternative organizations: a wide superscalar, a fine-grain multithreaded processor, and single-chip, multiple-issue multiprocessing architectures. Our results show that both (single-threaded) superscalar and fine-grain multithreaded architectures are limited in their ability to utilize the resources of a wide-issue processor. Simultaneous multithreading has the potential to achieve 4 times the throughput of a superscalar, and double that of fine-grain multithreading. We evaluate several cache configurations made possible by this type of organization and evaluate tradeoffs between them. We also show that simultaneous multithreading is an attractive alternative to single-chip multiprocessors; simultaneous multithreaded processors with a variety of organizations outperform corresponding conventional multiprocessors with similar execution resources.

While simultaneous multithreading has excellent potential to increase processor utilization, it can add substantial complexity to the design. We examine many of these complexities and evaluate alternative organizations in the design space.

1 Introduction

This paper examines simultaneous multithreading (SM), a technique that permits several independent threads to issue to multiple functional units each cycle. In the most general case, the binding between thread and functional unit is completely dynamic.
The objective of SM is to substantially increase processor utilization in the face of both long memory latencies and limited available parallelism per thread. Simultaneous multithreading combines the multiple-issue-per-instruction features of modern superscalar processors with the latency-hiding ability of multithreaded architectures. It also inherits numerous design challenges from these architectures, e.g., achieving high register file bandwidth, supporting high memory access demands, meeting large forwarding requirements, and scheduling instructions onto functional units. In this paper, we (1) introduce several SM models, most of which limit key aspects of the complex-

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association of Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
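
The difference between the three issue disciplines named in the abstract can be illustrated with a toy issue-slot model. This is only a sketch under stated assumptions, not the simulator or the models used in the paper: the issue width, the READY table of per-cycle ready-instruction counts, and all function names are hypothetical, and the model ignores dependences, caches, and carry-over of unissued instructions.

```python
from typing import List

# Hypothetical issue width for this toy model (an assumption, not a paper parameter).
ISSUE_WIDTH = 8

# READY[c][t] = number of instructions thread t could issue in cycle c.
# These values are invented for illustration only.
READY: List[List[int]] = [
    [2, 1, 3, 0],
    [0, 4, 1, 2],
    [1, 0, 0, 5],
    [3, 2, 2, 1],
]


def superscalar_issued(ready: List[List[int]], width: int = ISSUE_WIDTH) -> int:
    """Single-threaded superscalar: only thread 0 may issue in any cycle."""
    return sum(min(cycle[0], width) for cycle in ready)


def fine_grain_issued(ready: List[List[int]], width: int = ISSUE_WIDTH) -> int:
    """Fine-grain multithreading: one thread per cycle, chosen round-robin."""
    total = 0
    for c, cycle in enumerate(ready):
        thread = c % len(cycle)
        total += min(cycle[thread], width)
    return total


def simultaneous_issued(ready: List[List[int]], width: int = ISSUE_WIDTH) -> int:
    """Simultaneous multithreading: every thread may fill free slots in a cycle."""
    total = 0
    for cycle in ready:
        free = width
        for n in cycle:
            issued = min(n, free)  # a thread takes whatever slots are still free
            free -= issued
            total += issued
    return total


def utilization(issued: int, cycles: int, width: int = ISSUE_WIDTH) -> float:
    """Fraction of all issue slots that were filled."""
    return issued / (cycles * width)


if __name__ == "__main__":
    for name, fn in [
        ("superscalar", superscalar_issued),
        ("fine-grain", fine_grain_issued),
        ("simultaneous", simultaneous_issued),
    ]:
        issued = fn(READY)
        print(f"{name:>12}: {issued} issued, utilization {utilization(issued, len(READY)):.2f}")
```

In this sketch the superscalar and fine-grain cases are each limited by the parallelism of a single thread in a given cycle, while the simultaneous case can fill slots from any thread that has ready instructions. The numbers it prints depend entirely on the invented READY table and say nothing about the paper's measured results.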

This note was uploaded on 11/12/2011 for the course CSEE 4824 taught by Professor Carloni during the Fall '11 term at Columbia.
