A Traversal Cache Framework for FPGA Acceleration of Pointer Data Structures: A Case Study on Barnes-Hut N-body Simulation

James Coole, John Wernsing, Greg Stitt
Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL
{jcoole, wernsing, gstitt}@ufl.edu

Abstract—Numerous studies have shown that field-programmable gate arrays (FPGAs) often achieve large speedups compared to microprocessors. However, one significant limitation of FPGAs that has prevented their use on important applications is the requirement for regular memory access patterns. Traversal caches were previously introduced to improve the performance of FPGA implementations of algorithms with irregular memory access patterns, especially those traversing pointer-based data structures. However, a significant limitation of previous traversal caches is that speedup was limited to traversals repeated frequently over time, thus preventing speedup for algorithms without repetition, even if the similarity between traversals was large. This paper presents a new framework that extends traversal caches to enable performance improvements in such cases and provides additional improvements through reduced memory accesses and parallel processing of multiple traversals. Most importantly, we show that, for algorithms with highly similar traversals, the traversal cache framework achieves approximately linear kernel speedup with additional area, thus eliminating the memory bandwidth bottleneck commonly associated with FPGAs. We evaluate the framework using a Barnes-Hut n-body simulation case study, showing application speedups ranging from 12x to 13.5x on a Virtex4 LX100, with projected speedups as high as 40x on today's largest FPGAs.

Keywords—FPGA, traversal cache, pointers, speedup

I. INTRODUCTION

Much previous work has shown that field-programmable gate arrays (FPGAs) can achieve order-of-magnitude speedups compared to microprocessors for many important embedded and scientific computing applications [3][4][9]. However, one limitation of FPGAs that has prevented widespread usage is the requirement for regular memory access patterns (i.e., sequential streaming of data from memory) due to the heavily pipelined circuits common in FPGA implementations. Applications with irregular memory access patterns, such as pointer-based data structure traversals, achieve much lower memory bandwidth due to increased row-address-strobe (RAS) operations and memory indirection caused by pointer accesses. This lower memory bandwidth causes many pipeline stalls, which waste large amounts of parallelism and often result in limited or even no FPGA speedup.

Previous work [12] partially addressed the inefficiency of irregular memory access patterns on FPGAs by using a traversal cache framework that identified repeated traversals of pointer-based data structures (which had irregular access patterns), reordered the repeated traversals into a sequential stream of data, and then stored the reordered data into a traversal cache.
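To make the idea of traversal reordering concrete, the following is a minimal C sketch, not the authors' framework code: the node_t layout, the flatten() helper, and the buffer naming are illustrative assumptions. It shows how a host-side pass might linearize a pointer-based octree walk (as in Barnes-Hut) into a contiguous buffer that an FPGA kernel could then consume as a regular, sequential stream.

    /* Hypothetical sketch of traversal reordering: walk a pointer-based
     * octree once on the host and write the visited payloads into a
     * contiguous buffer, which an FPGA kernel can then read with a
     * regular, streamable access pattern (the traversal cache contents).
     * Data layout and names are illustrative, not the paper's API. */
    #include <stddef.h>

    typedef struct node {
        float mass, x, y, z;       /* payload the kernel operates on     */
        struct node *child[8];     /* octree children; NULL where absent */
    } node_t;

    /* Depth-first flattening: appends each visited node's payload to buf
     * (caller must size buf for the whole traversal) and returns the
     * number of floats written. */
    static size_t flatten(const node_t *n, float *buf, size_t idx)
    {
        if (n == NULL)
            return idx;
        buf[idx++] = n->mass;
        buf[idx++] = n->x;
        buf[idx++] = n->y;
        buf[idx++] = n->z;
        for (int c = 0; c < 8; c++)
            idx = flatten(n->child[c], buf, idx);
        return idx;
    }

Because the FPGA then reads the buffer with unit-stride accesses, its pipeline no longer stalls on pointer indirection; the cost is the one-time host-side walk and the buffer storage, which a traversal cache amortizes across repeated or similar traversals.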