Moara: Flexible and Scalable Group-Based Querying System

Steven Y. Ko¹, Praveen Yalagandula², Indranil Gupta¹, Vanish Talwar², Dejan Milojicic², Subu Iyer²
¹ University of Illinois at Urbana-Champaign
² HP Labs, Palo Alto

Abstract. Users and administrators of large-scale infrastructures (e.g., datacenters and PlanetLab) frequently need to monitor groups of machines in the infrastructure. Though several distributed querying systems exist for this monitoring purpose, they are not group-based; they mostly focus on querying the entire system. In this paper, we present Moara, a new querying system that makes two novel contributions. First, Moara builds aggregation trees for different groups and adaptively maintains the trees to optimize the total message cost. Second, Moara supports a query language that allows groups to be specified implicitly via predicates consisting of arbitrarily nested unions and intersections. Our evaluations on Emulab, on PlanetLab, and with large-scale simulations demonstrate Moara's ability to answer complex queries within a fraction of a second, to deal with high levels of dynamism in groups, and to incur a low bandwidth overhead per host per query in comparison to existing centralized and distributed aggregation systems.

1 Introduction

Large-scale distributed infrastructures have become increasingly common in various domains. Today's enterprise data centers are equipped with thousands of machines and run thousands of different applications and services. Federated computing infrastructures such as PlanetLab, the proposed GENI infrastructure, and computational grids consist of thousands of hosts providing resources for a number of projects. A frequent need of the users and the administrators of such infrastructures is monitoring and querying the status of groups of machines in the infrastructure, as well as the infrastructure as a whole.
These groups may be static or dynamic, e.g., the PlanetLab slices, the machines running a particular service in a datacenter, or the machines with CPU utilization above 50%. Further, users typically want to express complex criteria for selecting the host groups to be queried. For example, "find top-3 loaded hosts where (ServiceX = true) and (Apache = true)" is a query that targets two groups: hosts that run service X and hosts that run Apache. With dynamic groups, the size and composition of a group vary across different queries as well as over time.

In general, users and administrators want to monitor the performance of these groups, to troubleshoot any failures or performance degradations, and to track usage of allocated resources. These requirements point to the need for a group-based querying system that can provide instantaneous answers to queries over in-situ data targeting one or more groups. In fact, several existing distributed aggregation systems can be considered as a special case of group-based querying systems, as they target querying...
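To make the example query from the introduction concrete ("find top-3 loaded hosts where (ServiceX = true) and (Apache = true)"), the following is a minimal centralized sketch of what such a query computes. It is not Moara's implementation, which the abstract describes as tree-based and distributed; the `Host` record, the `top_k_loaded` helper, and the sample host data are all hypothetical names and values introduced here for illustration only.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical host record; not part of Moara."""
    name: str
    load: float
    attrs: frozenset  # names of attributes that are true on this host


def top_k_loaded(hosts, required_attrs, k=3):
    """Return the k most loaded hosts that have every attribute in required_attrs.

    Requiring all attributes corresponds to intersecting the groups
    (ServiceX = true) and (Apache = true).
    """
    required = frozenset(required_attrs)
    group = [h for h in hosts if required <= h.attrs]
    return heapq.nlargest(k, group, key=lambda h: h.load)


# Made-up sample data, for illustration only.
hosts = [
    Host("h1", 0.90, frozenset({"ServiceX", "Apache"})),
    Host("h2", 0.75, frozenset({"ServiceX"})),
    Host("h3", 0.60, frozenset({"ServiceX", "Apache"})),
    Host("h4", 0.95, frozenset({"Apache"})),
    Host("h5", 0.40, frozenset({"ServiceX", "Apache"})),
    Host("h6", 0.85, frozenset({"ServiceX", "Apache"})),
]

result = top_k_loaded(hosts, {"ServiceX", "Apache"})
print([h.name for h in result])  # ['h1', 'h6', 'h3']
```

Note that h4 is the most loaded host overall but is excluded because it does not run service X. The sketch scans every host, which is the whole-system behavior the paper contrasts with group-based querying.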