Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU

Victor W Lee, Changkyu Kim, Jatin Chhugani, Michael Deisher, Daehyun Kim, Anthony D. Nguyen, Nadathur Satish, Mikhail Smelyanskiy, Srinivas Chennupaty, Per Hammarlund, Ronak Singhal and Pradeep Dubey
email@example.com
Throughput Computing Lab, Intel Corporation
Intel Architecture Group, Intel Corporation

ABSTRACT
Recent advances in computing have led to an explosion in the amount of data being generated. Processing the ever-growing data in a timely manner has made throughput computing an important aspect for emerging applications. Our analysis of a set of important throughput computing kernels shows that there is an ample amount of parallelism in these kernels which makes them suitable for today's multi-core CPUs and GPUs. In the past few years there have been many studies claiming GPUs deliver substantial speedups (between 10X and 1000X) over multi-core CPUs on these kernels. To understand where such large performance difference comes from, we perform a rigorous performance analysis and find that after applying optimizations appropriate for both CPUs and GPUs the performance gap between an Nvidia GTX280 processor and the Intel Core i7 960 processor narrows to only 2.5x on average. In this paper, we discuss optimization techniques for both CPU and GPU, analyze what architecture features contributed to performance differences between the two architectures, and recommend a set of architectural features which provide significant improvement in architectural efficiency for throughput kernels.

Categories and Subject Descriptors
C.1.4 [Processor Architecture]: Parallel architectures; C.4 [Performance of Systems]: Design studies; D.3.4 [Software]: Processors Optimization

General Terms
Design, Measurement, Performance

Keywords
CPU architecture, GPU architecture, Performance analysis, Performance measurement, Software optimization, Throughput Computing

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ISCA'10, June 19-23, 2010, Saint-Malo, France.
Copyright 2010 ACM 978-1-4503-0053-7/10/06 ...$10.00.

1. INTRODUCTION
The past decade has seen a huge increase in digital content as more documents are being created in digital form than ever before. Moreover, the web has become the medium of choice for storing and delivering information such as stock market data, personal records, and news. Soon, the amount of digital data will exceed exabytes (10^18). The massive amount of data makes storing, cataloging, processing, and retrieving information challenging. ...
This note was uploaded on 11/28/2011 for the course COMP 790 taught by Professor Staff during the Fall '08 term at UNC.