codes166-stitt - Traversal Caches: A First Step towards...

Traversal Caches: A First Step towards FPGA Acceleration of Pointer-Based Data Structures

Greg Stitt, Gaurav Chaudhari, James Coole
University of Florida
Department of Electrical and Computer Engineering
{gstitt, guchaudhari, jcoole}@ufl.edu

ABSTRACT
Field-programmable gate arrays (FPGAs) often achieve order of magnitude speedups compared to microprocessors, but typically have been unable to improve the performance of applications with irregular memory access patterns, such as traversals of pointer-based data structures. Due to the common use of these data structures, the applicability and widespread success of FPGAs has been limited. In this paper, we introduce the traversal cache framework – a first step towards improving the performance of FPGA applications that utilize pointer-based data structures. The traversal cache is a local FPGA memory that stores repeated traversals of pointer-based data structures, allowing for these traversals to be efficiently streamed into the FPGA. Although the cache is generally limited to improving applications that exhibit repeated traversals, we show that many applications in fact have this characteristic. Furthermore, we show that few repetitions are needed to achieve performance improvements. We present experimental results showing that FPGA implementations using the traversal cache framework achieve speedups ranging from 7x to 29x compared to pointer-based software on a 3.2 GHz Xeon.

Categories and Subject Descriptors
C.3 [Special-Purpose and Application-Based Systems]: Real-time and embedded systems.

General Terms
Performance, Design.

Keywords
Traversal cache, FPGA, pointers, synthesis, CAD, hardware/software partitioning.

1. INTRODUCTION
Field-programmable gate arrays (FPGAs) and other reconfigurable computing devices have been shown to achieve 10x to 100x speedups compared to state-of-the-art microprocessors for many applications [5][11].
FPGAs achieve such speedup by exploiting tremendous amounts of parallelism, ranging from the bit level up to the task level. FPGA designs often use heavily pipelined implementations to improve throughput, which greatly increases memory bandwidth requirements [14]. If data cannot be delivered at a sufficient rate, these pipelines frequently stall, often resulting in significant slowdown compared to microprocessors [12]. Due to the need for efficient data transfer, FPGAs have typically been unable to effectively implement code with irregular access patterns [6]. We define an irregular access pattern as any access that does not fetch data sequentially from memory, or patterns that cannot be buffered based on compile time analysis [10]. Although there are many types of problematic irregular access patterns, in this paper we focus on pointer-based patterns, which cause several performance problems for FPGA implementations. First, traversals of pointer-based structures, such as lists, require
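As a rough illustration of the idea stated in the abstract (store a repeated traversal once, then stream it sequentially), the following is a minimal software sketch only. It is not the authors' hardware framework: the `Node`, `build_list`, and `TraversalCache` names, the `pointer_chases` counter, and the `invalidate` method are all hypothetical and introduced here for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One element of a singly linked list (hypothetical example type)."""
    value: int
    next: Optional["Node"] = None


def build_list(values: List[int]) -> Optional[Node]:
    """Build a linked list holding `values` in order and return its head."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


class TraversalCache:
    """Software model (an assumption, not the paper's design): keeps the
    values seen during one traversal in a contiguous buffer so that later
    traversals can read them sequentially instead of following pointers."""

    def __init__(self) -> None:
        self.buffer: List[int] = []
        self.valid = False
        self.pointer_chases = 0  # counts irregular (pointer-following) accesses

    def traverse(self, head: Optional[Node]) -> List[int]:
        if not self.valid:
            # First traversal: follow pointers and record the visit order.
            self.buffer = []
            node = head
            while node is not None:
                self.buffer.append(node.value)
                self.pointer_chases += 1
                node = node.next
            self.valid = True
        # Repeated traversal: the buffer can be streamed sequentially.
        return self.buffer

    def invalidate(self) -> None:
        """Assumed policy: the caller invalidates after modifying the list."""
        self.valid = False


head = build_list([3, 1, 4, 1, 5])
cache = TraversalCache()
totals = [sum(cache.traverse(head)) for _ in range(4)]
print(totals, cache.pointer_chases)
```

In this sketch, four traversals of a five-element list follow pointers only during the first pass; the remaining three read the contiguous buffer, which is the kind of sequential access pattern the introduction says FPGA pipelines need.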

This note was uploaded on 07/25/2011 for the course EEL 4930 taught by Professor Staff during the Fall '08 term at University of Florida.
