slides.0308 - BigTable A System for Distributed Structured...

BigTable: A System for Distributed Structured Storage
Yanen Li
Department of Computer Science, University of Illinois at Urbana-Champaign
yanenli2@illinois.edu
03/08/2011
cs525 course presentation; partial materials are from the internet
Motivation: What is BigTable?
- BigTable is a distributed database
- A more application-friendly storage service
- Data in Google
  - URLs: contents, crawl metadata, links, anchors, PageRank, …
  - Per-user data: user preference settings, recent queries/search results, …
  - Geographic locations: physical entities (shops, restaurants, etc.), roads, satellite image data, user annotations, …
Question: Why not just use a commercial DB?
- Candidates: TeraData, Oracle, MySQL with sharding
- Data is large
  - Billions of URLs, many versions/page (~20K/version)
  - Hundreds of millions of users, thousands of q/sec
  - 100TB+ of satellite image data
- Many incoming requests
- Must scale to thousands of machines
  - 450,000 machines (NYTimes estimate, June 14th 2006)
- No commercial system is big enough
  - Cost is too high
  - Might not have made appropriate design choices
Goals of BigTable
Need to support:
- Data that is highly available at any time
- Very high read/write rates
- Efficient scans over all or interesting subsets of data
- Asynchronous and continuous updates
- High scalability
Data model: a big map
- BigTable is a distributed multi-dimensional sparse map
  - (row, column, timestamp) → cell contents
- Provides lookup, insert, and delete API
- Row keys and column keys are strings
- Arbitrary "columns"
  - Column family:qualifier
  - Column-oriented physical store
- Does not support a relational model
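The map described on this slide can be sketched as a small in-memory structure. This is only an illustration of the (row, column, timestamp) → cell contents idea with lookup, insert, and delete. The class name `SparseTable`, its method signatures, and the example row and column keys are assumptions made for this sketch, not BigTable's actual API.

```python
from typing import Dict, Optional, Tuple


class SparseTable:
    """Toy in-memory stand-in for the (row, column, timestamp) -> contents map.

    Hypothetical illustration only; not the real BigTable client API.
    """

    def __init__(self) -> None:
        # Sparse: only cells that were actually written occupy space.
        # Key is (row, "family:qualifier"); value maps timestamp -> contents.
        self._cells: Dict[Tuple[str, str], Dict[int, str]] = {}

    def insert(self, row: str, column: str, timestamp: int, value: str) -> None:
        """Store one version of a cell."""
        self._cells.setdefault((row, column), {})[timestamp] = value

    def lookup(self, row: str, column: str,
               timestamp: Optional[int] = None) -> Optional[str]:
        """Return the requested version, or the newest one if no timestamp is given."""
        versions = self._cells.get((row, column))
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)  # newest version wins
        return versions.get(timestamp)

    def delete(self, row: str, column: str) -> None:
        """Drop every version of a cell; a missing cell is ignored."""
        self._cells.pop((row, column), None)


# Row and column keys are plain strings; columns follow "family:qualifier".
table = SparseTable()
table.insert("com.example.www", "contents:", 1, "<html>v1</html>")
table.insert("com.example.www", "contents:", 2, "<html>v2</html>")
table.insert("com.example.www", "anchor:example.org", 1, "Example")
```

A lookup without a timestamp returns the newest version of the cell, and a cell that was never written simply does not exist in the map, which is what makes the map sparse.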
Building Blocks
- Scheduler (Google WorkQueue)
- Google Filesystem
  - SSTable file format
- Chubby
  - {lock/file/name} service
  - Coarse-grained locks
  - Discovers tablet servers
  - Stores metadata of tablets
Tablet Serving Structure
- Bigtable client, using the Bigtable client library: Open()
- Bigtable cell
  - Bigtable master: performs metadata ops, load balancing
  - Bigtable tablet servers (three shown): serve data; read/write, split tablet
- Cluster scheduling master: handles failover, monitoring
- GFS: holds tablet data, logs
- Lock service: holds metadata, handles master election
- Multiple masters: only one elected master is active at any given time; the others wait to acquire the master lock
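The master-election note on this slide (one active master, the others waiting on the master lock) can be sketched with a single exclusive lock. This is a simplified, single-process illustration. `LockService`, `elect`, and the master names are hypothetical and do not reflect Chubby's real interface, which also deals with sessions, leases, and failure detection.

```python
from typing import List, Optional


class LockService:
    """Hypothetical stand-in for a coarse-grained lock service (one master lock)."""

    def __init__(self) -> None:
        self._holder: Optional[str] = None

    def try_acquire(self, candidate: str) -> bool:
        """Grant the lock if it is free; report whether the candidate holds it."""
        if self._holder is None:
            self._holder = candidate
        return self._holder == candidate

    def release(self, candidate: str) -> None:
        """Free the lock, but only if the caller is the current holder."""
        if self._holder == candidate:
            self._holder = None

    def holder(self) -> Optional[str]:
        return self._holder


def elect(lock: LockService, candidates: List[str]) -> Optional[str]:
    """Every candidate tries the lock; whoever holds it afterwards is active."""
    for name in candidates:
        lock.try_acquire(name)
    return lock.holder()


lock = LockService()
masters = ["master-a", "master-b", "master-c"]

active = elect(lock, masters)          # first candidate to get the lock wins
lock.release(active)                   # simulate the active master going away
standbys = [m for m in masters if m != active]
new_active = elect(lock, standbys)     # a waiting master takes over
```

The point of the design is that exclusivity lives in the lock service, not in the masters themselves: a standby never has to decide on its own whether the active master has failed, it only has to keep trying to acquire the lock.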
Where data is stored
- SSTable ("Static and Sorted Table")
  - Immutable, sorted file of key-value string pairs
  - Chunks of data plus an index
    - Index is of block ranges, not values
  - Triplicated across three machines in GFS
- [Figure: an SSTable drawn as three 64K blocks plus an index]
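The SSTable layout above (sorted immutable pairs, split into blocks, with an index over block ranges rather than individual values) can be sketched as follows. This is a minimal in-memory illustration, not the real on-disk format. The class name `SSTableSketch` is hypothetical, blocks are sized by entry count instead of 64K bytes, and GFS replication is not modeled.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class SSTableSketch:
    """Toy immutable, sorted key-value table with a per-block index.

    Hypothetical illustration; the real SSTable is a file format.
    """

    def __init__(self, pairs: Dict[str, str], block_size: int = 2) -> None:
        ordered: List[Tuple[str, str]] = sorted(pairs.items())
        # Split the sorted pairs into fixed-size blocks (tuples, so they stay immutable).
        self._blocks: List[Tuple[Tuple[str, str], ...]] = [
            tuple(ordered[i:i + block_size])
            for i in range(0, len(ordered), block_size)
        ]
        # The index records only the first key of each block (a block range),
        # not every key or value.
        self._index: List[str] = [block[0][0] for block in self._blocks]

    def get(self, key: str) -> Optional[str]:
        """Binary-search the index for the one candidate block, then scan it."""
        pos = bisect.bisect_right(self._index, key) - 1
        if pos < 0:
            return None  # key sorts before the first block
        for k, v in self._blocks[pos]:
            if k == key:
                return v
        return None


sst = SSTableSketch({"b": "2", "a": "1", "d": "4", "c": "3", "e": "5"}, block_size=2)
value = sst.get("d")  # the index narrows the search to the block starting at "c"
```

Because the index holds one entry per block, it stays small enough to keep in memory, and a lookup touches at most one block of data.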

This note was uploaded on 12/08/2011 for the course CS 525 taught by Professor Gupta during the Spring '08 term at University of Illinois, Urbana Champaign.
