cis6930fa11_MADLib

MAD Skills: New Analysis Practices for Big Data
Presented by: Christan Grant (cgrant@cise.uf.edu)
Thursday, August 25, 2011

Authors of “MAD Skills: New Analysis Practices for Big Data”:
Jeff Cohen (Greenplum)
Brian Dolan (Fox Audience Network)
Mark Dunlap (Evergreen Technologies)
Joseph M. Hellerstein (UC Berkeley)
Caleb Welton (Greenplum)
Presented at the Very Large Data Bases (VLDB) conference, 2009, in Lyon, France.

“If you are looking for a career where your services will be in high demand, you should find something where you provide a scarce, complementary service to something that is getting ubiquitous and cheap. So what’s getting ubiquitous and cheap? Data. And what is complementary to data? Analysis.”
Prof. Hal Varian, Chief Economist at Google

Storage is Cheap
The world’s largest data warehouse of ~15 years ago can be stored on disks for about $2000.

Traditionally, a lot of time is spent building well-structured data warehouses.
A data warehouse contains summaries of data, is the main location for analytics, and is jealously guarded by IT.

Data analysis is common culture
The culture is to collect and analyze data in different business units.
“In God we trust; all others must bring data”
Analytics are crucial: analysts need more access and more permissions.
Hence M.A.D.

Magnetic: attract data and practitioners. The database must be painless and efficient.
Agile: analysts need to ingest and analyze on the fly.
Deep: sophisticated analytics at scale. Don’t force analysts to work on samples (where the long tail is lost).
Lib: library of tools.

Outline:
Background
FOX Audience Network
Being MAD
In-Database Operations
MAD DBMS
Conclusions
Questions

Background: OLAP and Data Cubes
A data cube is a data structure that provides descriptive statistics: simple summaries (sum, average, std).
We want inferential/inductive statistics: predictions, causality analysis, distributional comparison.
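To make the distinction between descriptive and inferential statistics concrete, here is a minimal Python sketch. The two campaign samples are hypothetical numbers invented for illustration, and Welch's t statistic is only one possible choice of comparison, picked here for brevity; it is not a method taken from the slides. The `describe` function returns the kind of summary a cube cell would hold, while `welch_t` asks whether two groups plausibly differ.

```python
import math
import statistics


def describe(values):
    """Descriptive summary of the kind a data cube cell stores."""
    return {
        "sum": sum(values),
        "average": statistics.mean(values),
        "std": statistics.stdev(values),
    }


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    var_a = statistics.variance(a)
    var_b = statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Hypothetical click-through rates (percent) for two ad campaigns.
campaign_a = [2.1, 2.4, 1.9, 2.6, 2.3, 2.2]
campaign_b = [2.9, 3.1, 2.7, 3.3, 3.0, 2.8]

summary_a = describe(campaign_a)
summary_b = describe(campaign_b)
t_stat = welch_t(campaign_a, campaign_b)

print("descriptive A:", summary_a)
print("descriptive B:", summary_b)
print("inferential (Welch t):", t_stat)
```

The summaries only describe each group in isolation. The t statistic compares the groups and is the starting point for a statement about whether the difference is likely to be real.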

Background: Databases and Statistical Packages
Many analysts download data to use in Excel, SAS, Matlab, R, or their favorite programming language (FORTRAN??).
These tools use matrix/vector operations.
Most of these statistical packages require the data to fit in RAM.
Taking samples from the full data so that it fits in RAM results in a loss of precision.
External toolkits may also lack parallelism.
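One way to see what sampling costs is to count distinct keys in a heavy-tailed dataset before and after sampling. The Python sketch below is illustrative only: the synthetic event stream, the function names, and the 70-row sample size are all assumptions made for this example, not details from the talk. It uses reservoir sampling (Algorithm R) to draw a fixed-size uniform sample in one pass, the way one might shrink a dataset to fit in RAM.

```python
import random


def make_events(num_keys=1000):
    """Yield a synthetic heavy-tailed stream: key i appears max(1, num_keys // i) times."""
    for key in range(1, num_keys + 1):
        for _ in range(max(1, num_keys // key)):
            yield key


def reservoir_sample(events, k, seed=0):
    """Draw a uniform random sample of k items in a single pass (Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for i, event in enumerate(events):
        if i < k:
            reservoir.append(event)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = event
    return reservoir


full_distinct = len(set(make_events()))
sample = reservoir_sample(make_events(), k=70)
sample_distinct = len(set(sample))

print("distinct keys in full data:", full_distinct)
print("distinct keys in sample:   ", sample_distinct)
```

The sample holds at most 70 distinct keys out of 1000, so most of the rare keys (the long tail) are gone before any analysis starts.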
Background: MapReduce and Parallel Programming
Strengths of MR are scalable batch