DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language

Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson¹, Pradeep Kumar Gunda, Jon Currey
Microsoft Research Silicon Valley
¹ Joint affiliation, Reykjavík University, Iceland

Abstract

DryadLINQ is a system and a set of language extensions that enable a new programming model for large-scale distributed computing. It generalizes previous execution environments such as SQL, MapReduce, and Dryad in two ways: by adopting an expressive data model of strongly typed .NET objects, and by supporting general-purpose imperative and declarative operations on datasets within a traditional high-level programming language.

A DryadLINQ program is a sequential program composed of LINQ expressions performing arbitrary side-effect-free transformations on datasets, and it can be written and debugged using standard .NET development tools. The DryadLINQ system automatically and transparently translates the data-parallel portions of the program into a distributed execution plan, which is passed to the Dryad execution platform. Dryad, which has been in continuous operation for several years on production clusters made up of thousands of computers, ensures efficient, reliable execution of this plan.

We describe the implementation of the DryadLINQ compiler and runtime. We evaluate DryadLINQ on a varied set of programs drawn from domains such as web-graph analysis, large-scale log mining, and machine learning. We show that excellent absolute performance can be attained (a general-purpose sort of 10^12 bytes of data executes in 319 seconds on a 240-computer, 960-disk cluster), and we demonstrate near-linear scaling of execution time on representative applications as we vary the number of computers used for a job.

1 Introduction

The DryadLINQ system is designed to make it easy for a wide variety of developers to compute effectively on large amounts of data.
DryadLINQ programs are written as imperative or declarative operations on datasets within a traditional high-level programming language, using an expressive data model of strongly typed .NET objects. The main contribution of this paper is a set of language extensions and a corresponding system that can automatically and transparently compile imperative programs in a general-purpose language into distributed computations that execute efficiently on large computing clusters.

Our goal is to give the programmer the illusion of writing for a single computer and to have the system deal with the complexities that arise from scheduling, distribution, and fault-tolerance. Achieving this goal requires a wide variety of components to interact, including cluster-management software, distributed-execution middleware, language constructs, and development tools. Traditional parallel databases (which we survey in Section 6.1) as well as more recent data-processing systems such as MapReduce [15] and Dryad [26] demonstrate that it is possible to implement
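The central idea above is that a program written as a sequence of side-effect-free dataset transformations can be run on one computer or compiled into a distributed plan without changing its result. The Python sketch below illustrates that property under stated assumptions; it is not DryadLINQ's API (which consists of .NET/LINQ extensions) and not its actual execution plan. A word-count query is run once as an ordinary sequential program and once through a hypothetical plan that partitions the input, applies the per-record stages and a local count on each simulated worker, shuffles partial counts by key, and merges them. The function names (`keep`, `words`, `owner`, `distributed_word_count`), the round-robin input partitioning, and the use of `zlib.crc32` as the partitioning hash are all illustrative choices, not details taken from the paper.

```python
# Hypothetical illustration in Python; this is NOT DryadLINQ's API (which is a
# set of .NET/LINQ extensions) and not its actual execution plan.
import zlib
from collections import Counter
from typing import Dict, Iterable, List


def keep(line: str) -> bool:
    """Side-effect-free filter stage (in the spirit of LINQ's Where)."""
    return not line.startswith("#")


def words(line: str) -> List[str]:
    """Side-effect-free one-to-many stage (in the spirit of LINQ's SelectMany)."""
    return line.lower().split()


def sequential_word_count(lines: Iterable[str]) -> Dict[str, int]:
    """The query written as an ordinary single-machine program."""
    return dict(Counter(w for line in lines if keep(line) for w in words(line)))


def owner(key: str, workers: int) -> int:
    """Deterministic hash partitioning (built-in hash() of str is salted per process)."""
    return zlib.crc32(key.encode("utf-8")) % workers


def distributed_word_count(lines: List[str], workers: int) -> Dict[str, int]:
    """Run the same query through a hypothetical partitioned plan."""
    # 1. Split the input into round-robin partitions, one per simulated worker.
    partitions = [lines[i::workers] for i in range(workers)]
    # 2. Each worker applies the per-record stages and counts locally.
    partials = [
        Counter(w for line in part if keep(line) for w in words(line))
        for part in partitions
    ]
    # 3. Shuffle: send every (word, partial count) to the worker that owns the word.
    buckets: List[Counter] = [Counter() for _ in range(workers)]
    for partial in partials:
        for word, n in partial.items():
            buckets[owner(word, workers)][word] += n
    # 4. Buckets hold disjoint keys, so merging them is a plain union.
    result: Dict[str, int] = {}
    for bucket in buckets:
        result.update(bucket)
    return result


if __name__ == "__main__":
    log = ["GET /index", "# comment line", "GET /about", "POST /index"]
    expected = sequential_word_count(log)
    for n in (1, 2, 3):
        # Same answer regardless of how many simulated workers are used.
        assert distributed_word_count(log, n) == expected
    print(sorted(expected.items()))
    # prints [('/about', 1), ('/index', 2), ('get', 2), ('post', 1)]
```

Counting locally before the shuffle reduces the data moved between machines. It is safe here only because `keep` and `words` have no side effects (so they give the same per-record results on any partition, in any order) and because adding counts is associative and commutative.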