zz-tr - Extended Technical Report for EuroSys 2011 Paper 1...


Extended Technical Report for EuroSys 2011 Paper 1

ZZ and the Art of Practical BFT Execution

Timothy Wood, Rahul Singh, Arun Venkataramani, Prashant Shenoy, and Emmanuel Cecchet
Department of Computer Science, University of Massachusetts Amherst

Abstract

The high replication cost of Byzantine fault-tolerance (BFT) methods has been a major barrier to their widespread adoption in commercial distributed applications. We present ZZ, a new approach that reduces the replication cost of BFT services from 2f + 1 to practically f + 1. The key insight in ZZ is to use f + 1 execution replicas in the normal case and to activate additional replicas only upon failures. In data centers where multiple applications share a physical server, ZZ reduces the aggregate number of execution replicas running in the data center, improving throughput and response times. ZZ relies on virtualization, a technology already employed in modern data centers, for fast replica activation upon failures, and enables newly activated replicas to immediately begin processing requests by fetching state on-demand. A prototype implementation of ZZ using the BASE library and Xen shows that, when compared to a system with 2f + 1 replicas, our approach yields lower response times and up to 33% higher throughput in a prototype data center with four BFT web applications. We also show that ZZ can handle simultaneous failures and achieve sub-second recovery.

1 Introduction

Today's enterprises rely on data centers to run their critical business applications. As users have become increasingly dependent on online services, malfunctions have become highly problematic, resulting in financial losses, negative publicity, or frustrated users. Consequently, maintaining high availability of critical services is a pressing need as well as a challenge in modern data centers.
Byzantine fault tolerance (BFT) is a powerful replication approach for constructing highly-available services that can tolerate arbitrary (Byzantine) faults. This approach requires replicas to agree upon the order of incoming requests and process them in the agreed-upon order. Despite numerous efforts to improve the performance or fault scalability of BFT systems [3, 6, 15, 25, 1, 13], existing approaches remain expensive, requiring at least 2f + 1 replicas to execute each request in order to tolerate f faults [15, 27]. This high replication cost has been a significant barrier to their adoption: to the best of our knowledge, no commercial data center application uses BFT techniques today, despite the wealth of research in this area.

Many recent efforts have focused on optimizing the agreement protocol used by BFT replicas [6, 15]; consequently, today's state-of-the-art protocols can scale to a throughput of 80,000 requests/s and incur overheads of less than 10 µs per request for reaching agreement [15]. In contrast, request execution overheads for typical applications such as web servers and databases [25] can be on the order of milliseconds or tens...
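
The replica counts quoted above can be made concrete with a small calculation. The following Python sketch is illustrative only and is not taken from the report: the function names `execution_replicas` and `aggregate_replicas` are hypothetical, and the only facts it encodes are the two figures stated in the text, namely 2f + 1 execution replicas for existing approaches and f + 1 active execution replicas for ZZ in the normal (failure-free) case. It deliberately does not model how many extra replicas ZZ activates after a fault.

```python
from typing import Dict, Iterable, Tuple


def execution_replicas(f: int) -> Dict[str, int]:
    """Return execution replica counts needed to tolerate f Byzantine faults.

    "traditional" is the 2f + 1 figure the text gives for existing BFT systems.
    "zz_normal_case" is the f + 1 figure the text gives for ZZ when no
    failure has been detected. Both keys are names chosen for this sketch.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    return {"traditional": 2 * f + 1, "zz_normal_case": f + 1}


def aggregate_replicas(fault_thresholds: Iterable[int]) -> Tuple[int, int]:
    """Sum replica counts over several applications sharing a data center.

    Each entry in fault_thresholds is the f value of one application.
    Returns (traditional_total, zz_normal_case_total).
    """
    traditional_total = 0
    zz_total = 0
    for f in fault_thresholds:
        counts = execution_replicas(f)
        traditional_total += counts["traditional"]
        zz_total += counts["zz_normal_case"]
    return traditional_total, zz_total


if __name__ == "__main__":
    # Assumed scenario: four applications, each tolerating a single fault.
    print(aggregate_replicas([1, 1, 1, 1]))
```

For four applications that each tolerate one fault (an assumed configuration chosen to match the four-application prototype mentioned in the abstract), the sketch counts 12 running execution replicas under 2f + 1 and 8 under f + 1 in the failure-free case.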

This note was uploaded on 02/08/2012 for the course ECE 428 taught by Professor Hu during the Spring '08 term at University of Illinois, Urbana Champaign.

