nsdi2010 - Volley: Automated Data Placement for...

Volley: Automated Data Placement for Geo-Distributed Cloud Services

Sharad Agarwal, John Dunagan, Navendu Jain, Stefan Saroiu, Alec Wolman
Microsoft Research, {sagarwal, jdunagan, navendu, ssaroiu, alecw}@microsoft.com

Harbinder Bhogan
University of Toronto, hbhogan@cs.toronto.edu

Abstract: As cloud services grow to span more and more globally distributed datacenters, there is an increasingly urgent need for automated mechanisms to place application data across these datacenters. This placement must deal with business constraints such as WAN bandwidth costs and datacenter capacity limits, while also minimizing user-perceived latency. The task of placement is further complicated by the issues of shared data, data inter-dependencies, application changes and user mobility. We document these challenges by analyzing month-long traces from Microsoft's Live Messenger and Live Mesh, two large-scale commercial cloud services.

We present Volley, a system that addresses these challenges. Cloud services make use of Volley by submitting logs of datacenter requests. Volley analyzes the logs using an iterative optimization algorithm based on data access patterns and client locations, and outputs migration recommendations back to the cloud service.

To scale to the data volumes of cloud service logs, Volley is designed to work in SCOPE [5], a scalable MapReduce-style platform; this allows Volley to perform over 400 machine-hours worth of computation in less than a day. We evaluate Volley on the month-long Live Mesh trace, and we find that, compared to a state-of-the-art heuristic that places data closest to the primary IP address that accesses it, Volley simultaneously reduces datacenter capacity skew by over 2×, reduces inter-datacenter traffic by over 1.8× and reduces 75th percentile user-latency by over 30%.

1 Introduction

Cloud services continue to grow rapidly, with ever more functionality and ever more users around the globe.
Because of this growth, major cloud service providers now use tens of geographically dispersed datacenters, and they continue to build more [10]. A major unmet challenge in leveraging these datacenters is automatically placing user data and other dynamic application data, so that a single cloud application can serve each of its users from the best datacenter for that user.

At first glance, the problem may sound simple: determine the user's location, and migrate user data to the closest datacenter. However, this simple heuristic ignores two major sources of cost to datacenter operators: WAN bandwidth between datacenters, and over-provisioning datacenter capacity to tolerate highly skewed datacenter utilization. In this paper, we show that a more sophisticated approach can both dramatically reduce these costs and still further reduce user latency. The more sophisticated approach is motivated by the following trends in modern cloud services:

Shared Data: Communication and collaboration are increasingly important to modern applications....
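
As an illustration of the simple heuristic described in the introduction (determine the user's location and place the data in the closest datacenter), the following is a minimal Python sketch. It is not Volley's algorithm and is not taken from the paper: the datacenter names and coordinates, the function names, and the use of great-circle distance as a stand-in for "closest" are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical datacenter coordinates (latitude, longitude in degrees).
# These names and locations are illustrative only, not from the paper.
DATACENTERS = {
    "us-west": (47.6, -122.3),
    "us-east": (38.9, -77.0),
    "europe": (53.3, -6.3),
    "asia": (1.35, 103.8),
}

EARTH_RADIUS_KM = 6371.0


def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def closest_datacenter(user_location, datacenters=DATACENTERS):
    """Return the name of the datacenter geographically nearest to the user."""
    return min(datacenters,
               key=lambda name: great_circle_km(user_location, datacenters[name]))


def placement_counts(user_locations, datacenters=DATACENTERS):
    """Count how many users the heuristic assigns to each datacenter."""
    return Counter(closest_datacenter(loc, datacenters) for loc in user_locations)
```

Calling `placement_counts` on a user population concentrated in one region shows the limitation the text points out: the heuristic looks only at distance, so it can load one datacenter far more heavily than the others, and it takes no account of traffic between datacenters when users share data.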

This note was uploaded on 11/12/2011 for the course CE 726 taught by Professor Staf during the Spring '11 term at SUNY Buffalo.
