cis6930fa11_Dremel

Dremel: Interactive Analysis of Web-Scale Datasets
Aravinth Bheemaraj, [email protected]
Paper by Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, and Theo Vassilakis

What is the paper about?
- Overview of the Dremel system
- Columnar storage format for nested data
- Dremel's query language and execution
- Execution trees, as used in web search systems
- Experimental results

What is Dremel?
- A system for interactive analysis of large datasets
- Queries data in place, on the different storage systems where it already resides
- Nested, semi-structured data model (Protocol Buffers), stored in a columnar format
- Offers an SQL-like query language
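In the paper, nested records are stored column by column ("column-striped"), and each value in a column carries a repetition level (r: at which repeated field in the path the value repeated) and a definition level (d: how many optional or repeated fields in the path are actually present). Those two numbers are what let Dremel rebuild the nesting without storing whole records. The Python below is a minimal sketch under stated assumptions, not the paper's algorithm (which stripes every column in a single pass): it computes the (value, r, d) entries for one column path of one record. The helper name `stripe_column`, the dict-based record encoding, and the "path as a list of (field, kind) pairs" schema representation are hypothetical; the sample record is modeled on record r1 of the paper's Document example.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical schema encoding: each path step is (field_name, kind),
# with kind in {"required", "optional", "repeated"}.
PathStep = Tuple[str, str]
Entry = Tuple[Optional[Any], int, int]  # (value, repetition level r, definition level d)


def stripe_column(record: Dict[str, Any], path: List[PathStep]) -> List[Entry]:
    """Emit the (value, r, d) entries of one column for one nested record."""
    out: List[Entry] = []

    def walk(node: Dict[str, Any], i: int, r: int, d: int, rep_depth: int) -> None:
        name, kind = path[i]
        value = node.get(name)
        if kind == "repeated":
            items = list(value or [])
        else:
            items = [] if value is None else [value]
        if not items:
            # Field missing: store a NULL that remembers how deep the path was defined.
            out.append((None, r, d))
            return
        # Only optional and repeated fields raise the definition level.
        child_d = d if kind == "required" else d + 1
        # Only repeated fields raise the repetition depth.
        child_rep = rep_depth + 1 if kind == "repeated" else rep_depth
        for k, item in enumerate(items):
            # The first occurrence inherits r; later ones repeat at this field's depth.
            item_r = r if k == 0 else child_rep
            if i == len(path) - 1:
                out.append((item, item_r, child_d))
            else:
                walk(item, i + 1, item_r, child_d, child_rep)

    walk(record, 0, 0, 0, 0)
    return out


# Sample record modeled on r1 from the paper's Document example.
r1 = {
    "DocId": 10,
    "Links": {"Forward": [20, 40, 60]},
    "Name": [
        {"Language": [{"Code": "en-us", "Country": "us"}, {"Code": "en"}], "Url": "http://A"},
        {"Url": "http://B"},
        {"Language": [{"Code": "en-gb", "Country": "gb"}]},
    ],
}

code_path = [("Name", "repeated"), ("Language", "repeated"), ("Code", "required")]
print(stripe_column(r1, code_path))
```

For this record the sketch yields en-us (r=0, d=2), en (r=2, d=2), a NULL (r=1, d=1) for the second Name, which has no Language, and en-gb (r=1, d=2), matching the Name.Language.Code entries the paper lists for r1.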

Example: Data Exploration
- Runs a MapReduce to extract billions of signals from web pages
- Runs ad hoc SQL against Dremel:
    DEFINE TABLE t AS /path/to/data/*
    SELECT TOP(signal, 100), COUNT(*) FROM t
- Runs more MR-based processing on the data (FlumeJava, Sawzall)
- Can register the new dataset in a project

Dremel system
- Trillion-record, multi-terabyte datasets at interactive speed
  – Scales to thousands of nodes
  – Fault-tolerant execution
- Nested data model
  – Complex datasets; normalization is prohibitive
  – Columnar storage and processing
- Tree architecture (as in web search)
- Interoperates with Google's data management tools
  – In situ data access (e.g., GFS, Bigtable)
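The tree architecture can be illustrated with the TOP/COUNT query from the data-exploration example. In the paper's serving tree, a root server receives and rewrites the query, intermediate servers combine partial results, and leaf servers scan the data. The Python below is a minimal, hypothetical sketch of that flow, not Dremel's implementation: each leaf counts signal values in its own tablet, inner nodes sum their children's counts, and the root applies the top-k cut once at the end. The function names, the in-memory "tablets", and the fixed `fanout` are assumptions; because complete partial counts travel up the tree, this sketch is exact, whereas a production TOP may trade exactness for speed.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical serving-tree sketch for:  SELECT TOP(signal, k), COUNT(*) FROM t
# Leaves scan one tablet each; inner nodes sum their children's partial counts;
# the root applies the top-k cut only once, after all counts are merged.


def leaf_scan(tablet: Sequence[str]) -> Counter:
    """Leaf server: count each signal value in its own tablet."""
    return Counter(tablet)


def merge_partials(partials: Sequence[Counter]) -> Counter:
    """Intermediate (or root) server: add up the children's partial counts."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts instead of replacing them
    return total


def tree_top(tablets: List[Sequence[str]], k: int, fanout: int = 2) -> List[Tuple[str, int]]:
    """Evaluate the query bottom-up through a tree with the given fan-out."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = [leaf_scan(t) for t in tablets] or [Counter()]
    # Combine groups of `fanout` nodes per level until a single root result remains.
    while len(level) > 1:
        level = [merge_partials(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0].most_common(k)


if __name__ == "__main__":
    tablets = [["a", "b"], ["a"], ["b", "b", "c"], ["a"]]
    print(tree_top(tablets, k=2))
```

Running it prints the two most frequent signals; a and b each appear three times across the four tablets. Cutting to the top k only at the root keeps the answer exact; truncating partial lists at the leaves would cut traffic but could drop values that are frequent overall.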

Widely used inside Google
- Analysis of crawled web documents
- Tracking install data for applications on Android Market
- Crash reporting for Google products
- OCR results from Google Books
- Spam analysis
- Debugging of map tiles on Google Maps
- Tablet migrations in managed Bigtable instances
- Results of tests run on Google's distributed build system

This note was uploaded on 11/09/2011 for the course CIS 6930 taught by Professor Staff during the Fall '08 term at University of Florida.
