Compilation for Explicitly Managed Memory Hierarchies

Timothy J. Knight, Ji Young Park, Manman Ren, Mike Houston, Mattan Erez, Kayvon Fatahalian, Alex Aiken, William J. Dally, Pat Hanrahan
Stanford University

Abstract

We present a compiler for machines with an explicitly managed memory hierarchy and suggest that a primary role of any compiler for such architectures is to manipulate and schedule a hierarchy of bulk operations at varying scales of the application and of the machine. We evaluate the performance of our compiler using several benchmarks running on a Cell processor.

Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors - Compilers, Optimization; C.1.4 [Processor Architectures]: Parallel Architectures - Distributed architectures

General Terms: Performance, Design, Experimentation

Keywords: Software-managed memory hierarchy, bulk operations

1. Introduction

The advances in semiconductor technology that have dramatically increased the performance possible on a single chip have also undermined the classical random-access model of memory: the idea that a processor can access every memory address in a mostly uniform, and tolerable, amount of time. Instead, there is a large and still growing gap between the processing capacity of functional units and the available global on-chip and off-chip memory bandwidth needed to supply those functional units with data. Additionally, the latency to access off-chip memory and large on-chip memory structures is growing when compared with arithmetic throughput. For decades the standard solution to this problem has been to bridge the gap with hardware-managed caches.
An emerging class of high-performance architectures, including the Sony/Toshiba/IBM Cell Broadband Engine Processor (Cell), the ClearSpeed CSX600, and academic projects such as Stanford's Imagine and Merrimac [12, 24], seek to achieve much higher performance and efficiency by exposing a hierarchy of distinct memories managed explicitly in software. Machines with an explicitly managed memory hierarchy are distinguished from conventional cache architectures by three key characteristics. First, processing is highly parallel: multiple high-peak-performance processing elements (PEs) execute in isolation entirely out of a local level of the memory hierarchy. Second, individual PE local memories are not virtualized by hardware address translation; no PE ...

This work is supported in part by the Department of Energy under the ASCI Alliances program (Contract LLNL-B523583), in part by National Science Foundation Graduate Research Fellowships, in part by an Intel Foundation Graduate Fellowship, and by donations from the IBM Corporation.
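The first characteristic above (PEs that compute entirely out of a small local memory, with data moved by explicit bulk transfers rather than loaded through a cache) can be sketched in a few lines of C. This is an illustrative simulation only, not the paper's compiler output or the Cell SDK: `dma_get` is a hypothetical stand-in for an asynchronous DMA transfer, implemented here as a plain copy so the sketch runs on any host.

```c
#include <stddef.h>
#include <string.h>

#define TILE 4  /* elements per bulk transfer; tiny for illustration */

/* Hypothetical stand-in for a DMA engine. On a machine with an
   explicitly managed hierarchy (e.g. a Cell SPE), this would be an
   asynchronous bulk transfer from global memory into the PE's local
   store; here we simulate it with a synchronous copy. */
static void dma_get(double *local, const double *global, size_t n) {
    memcpy(local, global, n * sizeof *local);
}

/* Sum an array by staging fixed-size tiles through a small local
   buffer, the way a PE must when it cannot address global memory
   directly: bulk-gather a tile, then compute only on local data. */
double tiled_sum(const double *global, size_t n) {
    double local[TILE];  /* software-managed "local store" */
    double acc = 0.0;
    for (size_t i = 0; i < n; i += TILE) {
        size_t chunk = (n - i < TILE) ? (n - i) : TILE;
        dma_get(local, global + i, chunk);  /* explicit bulk transfer */
        for (size_t j = 0; j < chunk; j++) /* compute out of local memory */
            acc += local[j];
    }
    return acc;
}
```

In a real implementation the transfer would be asynchronous, and a compiler such as the one described here could double-buffer tiles to overlap the bulk transfers with computation; the sketch keeps the transfer synchronous to stay minimal.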
This note was uploaded on 09/28/2009 for the course CS 525 taught by Professor Rjyosy during the Winter '09 term at Central Mich..