Beaver - Finding a needle in Haystack Facebook’s photo...

Info iconThis preview shows pages 1–2. Sign up to view the full content.

View Full Document Right Arrow Icon

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: Finding a needle in Haystack: Facebook’s photo storage Doug Beaver, Sanjeev Kumar, Harry C. Li, Jason Sobel, Peter Vajgel, Facebook Inc. { doug, skumar, hcli, jsobel, pv } Abstract: This paper describes Haystack, an object stor- age system optimized for Facebook’s Photos applica- tion. Facebook currently stores over 260 billion images, which translates to over 20 petabytes of data. Users up- load one billion new photos ( ∼ 60 terabytes) each week and Facebook serves over one million images per sec- ond at peak. Haystack provides a less expensive and higher performing solution than our previous approach, which leveraged network attached storage appliances over NFS. Our key observation is that this traditional design incurs an excessive number of disk operations because of metadata lookups. We carefully reduce this per photo metadata so that Haystack storage machines can perform all metadata lookups in main memory. This choice conserves disk operations for reading actual data and thus increases overall throughput. 1 Introduction Sharing photos is one of Facebook’s most popular fea- tures. To date, users have uploaded over 65 billion pho- tos making Facebook the biggest photo sharing website in the world. For each uploaded photo, Facebook gen- erates and stores four images of different sizes, which translates to over 260 billion images and more than 20 petabytes of data. Users upload one billion new photos ( ∼ 60 terabytes) each week and Facebook serves over one million images per second at peak. As we expect these numbers to increase in the future, photo storage poses a significant challenge for Facebook’s infrastruc- ture. This paper presents the design and implementation of Haystack, Facebook’s photo storage system that has been in production for the past 24 months. Haystack is an object store [7, 10, 12, 13, 25, 26] that we designed for sharing photos on Facebook where data is written once, read often, never modified, and rarely deleted. We engineered our own storage system for photos because traditional filesystems perform poorly under our work- load. In our experience, we find that the disadvantages of a traditional POSIX [21] based filesystem are directo- ries and per file metadata. For the Photos application most of this metadata, such as permissions, is unused and thereby wastes storage capacity. Yet the more sig- nificant cost is that the file’s metadata must be read from disk into memory in order to find the file itself. While insignificant on a small scale, multiplied over billions of photos and petabytes of data, accessing metadata is the throughput bottleneck. We found this to be our key problem in using a network attached storage (NAS) ap- pliance mounted over NFS. Several disk operations were necessary to read a single photo: one (or typically more) to translate the filename to an inode number, another to read the inode from disk, and a final one to read the file itself. In short, using disk IOs for metadata was thefile itself....
View Full Document

This note was uploaded on 12/08/2011 for the course CS 525 taught by Professor Gupta during the Spring '08 term at University of Illinois, Urbana Champaign.

Page1 / 14

Beaver - Finding a needle in Haystack Facebook’s photo...

This preview shows document pages 1 - 2. Sign up to view the full document.

View Full Document Right Arrow Icon
Ask a homework question - tutors are online