clement-sosp09 - UpRight Cluster Services Allen Clement,...

UpRight Cluster Services

Allen Clement, Manos Kapritsos, Sangmin Lee, Yang Wang, Lorenzo Alvisi, Mike Dahlin, Taylor Riche
Department of Computer Sciences
The University of Texas at Austin
Austin, Texas, USA
{aclement, manos, sangmin, yangwang, lorenzo, dahlin, riche}@cs.utexas.edu

ABSTRACT

The UpRight library seeks to make Byzantine fault tolerance (BFT) a simple and viable alternative to crash fault tolerance for a range of cluster services. We demonstrate UpRight by producing BFT versions of the Zookeeper lock service and the Hadoop Distributed File System (HDFS). Our design choices in UpRight favor simplifying adoption by existing applications; performance is a secondary concern. Despite these priorities, our BFT Zookeeper and BFT HDFS implementations have performance comparable with the originals while providing additional robustness.

Categories and Subject Descriptors

C.2.4 [Computer Systems Organization]: Distributed Systems - Client/server; D.4.5 [Operating Systems]: Reliability - Fault-tolerance

General Terms

Design, Reliability

Keywords

Byzantine fault tolerance, Cluster services, Reliability

1. INTRODUCTION

Our objective is to make Byzantine fault tolerance (BFT) something that practitioners can easily adopt both to safeguard availability (keeping systems up) and to safeguard correctness (keeping systems right). To that end, we construct UpRight, a new library for fault tolerant replication, and we use it to build BFT versions of two widely-deployed open-source crash fault tolerant (CFT) systems, the Zookeeper coordination service [35] and the Hadoop Distributed File System (HDFS) [16].

Practitioners routinely pay non-trivial costs to tolerate crash failures (e.g., off-line backup, on-line redundancy [10, 15], Paxos [6,20,31]).
However, although non-crash failures occur with some regularity and can have significant consequences [2,7,30] and although the research community has done a great deal of work to improve BFT technologies [1,8,11,12,18,19,32–34], deployment of BFT replication remains rare.

We believe that for practitioners to see BFT as a viable option they must be able to use it to build and deploy systems of interest at low incremental cost compared to the CFT systems they build and deploy now: BFT systems must be competitive with CFT systems not just in terms of performance, hardware overheads, and availability, but also in terms of engineering effort....

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SOSP'09, October 11–14, 2009, Big Sky, Montana, USA.
Copyright 2009 ACM 978-1-60558-752-3/09/10 ...$10.00.