MapReduceDBMS-stonebraker - Doi:10.1145 1629175.1629197...

64 COMMUNICATIONS OF THE ACM | JANUARY 2010 | VOL. 53 | NO. 1

DOI:10.1145/1629175.1629197

MapReduce and Parallel DBMSs: Friends or Foes?

MapReduce complements DBMSs since databases are not designed for extract-transform-load tasks, a MapReduce specialty.

BY MICHAEL STONEBRAKER, DANIEL ABADI, DAVID J. DEWITT, SAM MADDEN, ERIK PAULSON, ANDREW PAVLO, AND ALEXANDER RASIN

The MapReduce [7] (MR) paradigm has been hailed as a revolutionary new platform for large-scale, massively parallel data access. [16] Some proponents claim the extreme scalability of MR will relegate relational database management systems (DBMS) to the status of legacy technology. At least one enterprise, Facebook, has implemented a large data warehouse system using MR technology rather than a DBMS. [14] Here, we argue that using MR systems to perform tasks that are best suited for DBMSs yields less than satisfactory results, [17] concluding that MR is more like an extract-transform-load (ETL) system than a DBMS, as it quickly loads and processes large amounts of data in an ad hoc manner. As such, it complements DBMS technology rather than competes with it. We also discuss the differences in the architectural decisions of MR systems and database systems and provide insight into how the systems should complement one another.

The technology press has been focusing on the revolution of “cloud computing,” a paradigm that entails the harnessing of large numbers of processors working in parallel to solve computing problems. In effect, this suggests constructing a data center by lining up a large number of low-end servers, rather than deploying a smaller set of high-end servers. Along with this interest in clusters has come a proliferation of tools for programming them. MR is one such tool, an attractive option to many because it provides a simple model through which users are able to express relatively sophisticated distributed programs.

Given the interest in the MR model both commercially and academically, it is natural to ask whether MR systems should replace parallel database systems. Parallel DBMSs were first available commercially nearly two decades ago, and, today, systems (from about a dozen vendors) are available. As robust, high-performance computing platforms, they provide a high-level programming environment that is inherently parallelizable.

Although it might seem that MR and parallel DBMSs are different, it is possible to write almost any parallel-processing task as either a set of database queries or a set of MR jobs. Our discussions with MR users lead us to conclude that the most common use case for MR is more like an ETL system. As such, it is complementary to DBMSs, not a competing technology, since databases are not designed to be good at ETL tasks. Here, we describe what we believe is the ideal use of MR technology and highlight the different MR and parallel DBMS markets.
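
The claim that the same parallel-processing task can be written either as database queries or as MR jobs can be illustrated with a small sketch. It is not taken from the article: the log records, the table layout, and every function name below are hypothetical, and both versions run in a single process instead of on a cluster. The task, summing bytes served per URL, is expressed once as a SQL aggregate (using Python's built-in sqlite3 module as a stand-in for a DBMS) and once as a map function and a reduce function.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (url, bytes_served) log records.
RECORDS = [
    ("/home", 120),
    ("/about", 80),
    ("/home", 200),
    ("/docs", 50),
    ("/about", 20),
]


def total_bytes_sql(records):
    """Express the task as a database query: one GROUP BY aggregate."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE log (url TEXT, bytes INTEGER)")
        conn.executemany("INSERT INTO log VALUES (?, ?)", records)
        rows = conn.execute(
            "SELECT url, SUM(bytes) FROM log GROUP BY url"
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)


def map_fn(record):
    """Map: emit one (key, value) pair per input record."""
    url, nbytes = record
    yield url, nbytes


def reduce_fn(key, values):
    """Reduce: combine all values that share a key."""
    return key, sum(values)


def total_bytes_mr(records):
    """Express the same task as a single-process map/shuffle/reduce job."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)  # shuffle step: group values by key
    return dict(reduce_fn(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(total_bytes_sql(RECORDS))
    print(total_bytes_mr(RECORDS))
```

Both functions return the same per-URL totals. In the SQL version the grouping is declared and left to the engine, while in the MR version the programmer supplies the per-record and per-key logic and the framework (here, a dictionary) handles the grouping.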