Niagara: A 32-Way Multithreaded Sparc Processor

Poonacha Kongetira, Kathirgamar Aingaran, Kunle Olukotun
Sun Microsystems

The Niagara processor implements a thread-rich architecture designed to provide a high-performance solution for commercial server applications. The hardware supports 32 threads with a memory subsystem consisting of an on-board crossbar, level-2 cache, and memory controllers for a highly integrated design that exploits the thread-level parallelism inherent to server applications, while targeting low levels of power consumption.

0272-1732/05/$20.00 © 2005 IEEE. Published by the IEEE Computer Society.

Over the past two decades, microprocessor designers have focused on improving the performance of a single thread in a desktop processing environment by increasing frequencies and exploiting instruction-level parallelism (ILP) using techniques such as multiple instruction issue, out-of-order issue, and aggressive branch prediction. The emphasis on single-thread performance has shown diminishing returns because of the limitations in terms of latency to main memory and the inherently low ILP of applications. This has led to an explosion in microprocessor design complexity and made power dissipation a major concern.

For these reasons, Sun Microsystems' Niagara processor takes a radically different approach to microprocessor design. Instead of focusing on the performance of single or dual threads, Sun optimized Niagara for multithreaded performance in a commercial server environment. This approach increases application performance by improving throughput, the total amount of work done across multiple threads of execution. This is especially effective in commercial server applications such as databases¹ and Web services,² which tend to have workloads with large amounts of thread-level parallelism (TLP).

In this article, we present the Niagara processor's architecture. This is an entirely new implementation of the Sparc V9 architectural specification, which exploits large amounts of on-chip parallelism to provide high throughput. Niagara supports 32 hardware threads by combining ideas from chip multiprocessors³ and fine-grained multithreading.⁴ Other studies⁵ have also indicated the significant performance gains possible using this approach on multithreaded workloads. The parallel execution of many threads effectively hides memory latency. However, having 32 threads places a heavy demand on the memory system to support high bandwidth. To provide this bandwidth, a crossbar interconnect scheme routes memory references to a banked on-chip level-2 cache that all threads share. Four independent on-chip memory controllers provide in excess of 20 Gbytes/s of bandwidth to memory.

Exploiting TLP also lets us improve performance significantly without pushing the envelope on CPU clock frequency. This and the sharing of CPU pipelines among multiple threads enable an area- and power-efficient design. Designers expect Niagara to dissipate about 60 W of power, making it
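
The claim that running many threads in parallel hides memory latency can be illustrated with a toy model. The sketch below is not Niagara's actual thread-selection logic; it assumes a single-issue pipeline that picks the next ready thread in round-robin order, and the function name `pipeline_utilization`, the 5-cycle run length, and the 15-cycle stall are all hypothetical values chosen only for illustration.

```python
def pipeline_utilization(num_threads, run_cycles, stall_cycles, total_cycles=10_000):
    """Toy model of fine-grained multithreading on one single-issue pipeline.

    Assumption (not from the article): every thread issues `run_cycles`
    instructions, then stalls for `stall_cycles` cycles, standing in for a
    memory access. Each cycle the pipeline issues one instruction from the
    next ready thread in round-robin order, or idles if none is ready.
    Returns the fraction of cycles in which an instruction was issued.
    """
    remaining = [run_cycles] * num_threads  # instructions left before the next stall
    ready_at = [0] * num_threads            # first cycle at which each thread may issue
    next_thread = 0                         # round-robin starting point
    issued = 0

    for cycle in range(total_cycles):
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:
                issued += 1
                remaining[t] -= 1
                if remaining[t] == 0:
                    # The thread now waits on memory; other threads fill the gap.
                    ready_at[t] = cycle + 1 + stall_cycles
                    remaining[t] = run_cycles
                next_thread = (t + 1) % num_threads
                break

    return issued / total_cycles


if __name__ == "__main__":
    # Illustrative parameters only: 5 busy cycles followed by a 15-cycle stall.
    for n in (1, 4, 8, 16):
        print(n, "threads:", pipeline_utilization(n, run_cycles=5, stall_cycles=15))
```

With these illustrative parameters a single thread keeps the pipeline busy only a quarter of the time, and utilization rises as more threads share the pipeline. The same model also hints at the cost noted in the text: more threads issuing means more memory requests in flight, which is why the memory system must supply high bandwidth.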