2009.dfs.hotpower - On the Energy (In)efficiency of Hadoop...

On the Energy (In)efficiency of Hadoop Clusters

Jacob Leverich, Christos Kozyrakis
Computer Systems Laboratory
Stanford University
{leverich, kozyraki}@stanford.edu

ABSTRACT

Distributed processing frameworks, such as Yahoo!'s Hadoop and Google's MapReduce, have been successful at harnessing expansive datacenter resources for large-scale data analysis. However, their effect on datacenter energy efficiency has not been scrutinized. Moreover, the filesystem component of these frameworks effectively precludes scale-down of clusters deploying these frameworks (i.e., operating at reduced capacity). This paper presents our early work on modifying Hadoop to allow scale-down of operational clusters. We find that running Hadoop clusters in fractional configurations can save between 9% and 50% of energy consumption, and that there is a trade-off between performance and energy consumption. We also outline further research into the energy efficiency of these frameworks.

1. INTRODUCTION

Energy consumption and cooling are now large components of the operational cost of datacenters and pose significant limitations in terms of scalability and reliability [3]. A growing segment of datacenter workloads is managed with MapReduce-style frameworks, whether by privately managed instances of Yahoo!'s Hadoop [2], by Amazon's Elastic MapReduce [12], or ubiquitously at Google by their archetypal implementation [5]. Therefore, it is important to understand the energy efficiency of this emerging workload.

The energy efficiency of a cluster can be improved in two ways: by matching the number of active nodes to the current needs of the workload, placing the remaining nodes in low-power standby modes; and by engineering the compute and storage features of each node to match its workload and avoid energy waste on oversized components. Unfortunately, MapReduce frameworks have many characteristics that complicate both options.

First, MapReduce frameworks implement a distributed data-store comprised of the disks in each node, which enables affordable storage for multi-petabyte datasets with good performance and reliability. Associating each node with such a large amount of state renders state-of-the-art techniques that manage the number of active nodes, such as VMware's VMotion [13], impractical. Even idle nodes remain powered on to ensure

HotPower '09, Copyright 2009 ACM

[Figure (a): System Inactivity Distribution, Multi-job Mix (32GB Scans and Sorts). x-axis: Inactivity Duration (s); y-axis: Fraction of Runtime.]

(a) Distribution of the lengths of system inactivity periods across a cluster during a multi-job batch workload, comprised of several scans and sorts of 32GB of data. A value of 0.38 at x = 40 seconds means that 38% of the time, a node was idle for 40 seconds or longer....
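
The inactivity distribution described in the caption of figure (a) can be illustrated with a minimal Python sketch. It assumes one plausible reading of the caption: the value at a given duration is the share of total runtime that falls inside idle periods lasting at least that long. The function name `inactivity_fraction` and the sample trace are hypothetical and are not taken from the paper or from Hadoop.

```python
from typing import Sequence


def inactivity_fraction(
    idle_periods: Sequence[float], total_runtime: float, threshold: float
) -> float:
    """Return the fraction of total_runtime spent inside idle periods
    that each last at least `threshold` seconds (assumed interpretation)."""
    if total_runtime <= 0:
        raise ValueError("total_runtime must be positive")
    # Sum only the idle periods that are long enough to count.
    long_idle_time = sum(d for d in idle_periods if d >= threshold)
    return long_idle_time / total_runtime


# Hypothetical trace for a single node: idle period lengths in seconds
# observed during a 200-second run. These numbers are made up.
idle_periods = [5.0, 12.0, 45.0, 60.0, 8.0]
total_runtime = 200.0

# Evaluate the distribution at a few candidate thresholds.
for threshold in (10, 40, 80):
    fraction = inactivity_fraction(idle_periods, total_runtime, threshold)
    print(f"idle >= {threshold:3d} s: {fraction:.3f} of runtime")
```

Under this reading the curve can only fall as the threshold grows, since a longer threshold excludes more idle periods from the sum.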