cis6930fa11_Dremel - Dremel Interactive Analysis of...

Dremel: Interactive Analysis of Web-Scale Datasets
Aravinth Bheemaraj
Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, and Theo Vassilakis
What is the paper about?
- Overview of Dremel system
- Columnar storage format for nested data
- Dremel’s query language and execution
- Execution trees used in web search systems
- Experimental results
What is Dremel?
- System for interactive analysis of data.
- Works on data where it sits, across different storage systems.
- Data is modeled in a columnar, semi-structured (Protocol Buffers) format.
- Offers a SQL-like query language.
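As a rough illustration of what "columnar" means here, the sketch below transposes a few flat records into one list per field, so a query that touches a single field reads only that list. The record fields (`url`, `lang`, `size`) and the helper `to_columns` are hypothetical, and the sketch deliberately ignores nesting and repeated fields, which Dremel's actual format has to handle.

```python
def to_columns(records):
    """Transpose a list of flat dict records into a dict of per-field lists.

    Simplifying assumption: records are flat (no nested or repeated fields).
    A field missing from a record is stored as None in that position.
    """
    names = []
    for record in records:
        for name in record:
            if name not in names:
                names.append(name)
    return {name: [record.get(name) for record in records] for name in names}


# Hypothetical sample data, for illustration only.
rows = [
    {"url": "http://A", "lang": "en", "size": 10},
    {"url": "http://B", "lang": "de", "size": 20},
]

cols = to_columns(rows)

# An aggregate over one field only needs that field's column.
total_size = sum(cols["size"])
```

With a row-oriented layout the same aggregate would have to scan every record in full, which is the cost a columnar layout is meant to avoid.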
Example: Data Exploration
- Run a MapReduce to extract billions of signals from web pages.
- Run ad hoc SQL against Dremel:
    DEFINE TABLE t AS /path/to/data/*
    SELECT TOP(signal, 100), COUNT(*) FROM t
- Do more MR-based processing on the data (FlumeJava, Sawzall).
- The new dataset can be registered in a project.
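To make the intent of the query concrete, here is a minimal in-memory sketch of the result it asks for: the most frequent `signal` values together with how often each occurs. It assumes that `TOP(signal, 100)` combined with `COUNT(*)` means "the 100 most frequent values and their counts"; the helper `top_with_counts` and the sample records are hypothetical, and the sketch computes exact counts on a small list rather than modeling how Dremel executes the query at scale.

```python
from collections import Counter


def top_with_counts(records, field, n):
    """Return the n most frequent values of `field` with their counts.

    Records that lack the field are skipped.
    """
    counts = Counter(record[field] for record in records if field in record)
    return counts.most_common(n)


# Hypothetical sample data, for illustration only.
records = [
    {"signal": "a"}, {"signal": "b"}, {"signal": "a"},
    {"signal": "c"}, {"signal": "a"}, {"signal": "b"},
]

# Analogue of: SELECT TOP(signal, 2), COUNT(*) FROM t
result = top_with_counts(records, "signal", 2)
```

Here `result` is a list of `(value, count)` pairs ordered from most to least frequent.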