09-dwdm-notes

09-dwdm-notes - Data Warehousing and Data Mining CPS 116...

Data Warehousing and Data Mining
CPS 116 Introduction to Database Systems

Announcements (Thu. Sep. 29)
- Homework #2 due next Tuesday
  - Sample solution available next Wednesday
- Midterm exam next Thursday in class
  - Open book, open notes
  - Sample midterm solution (from 2009) available today
    - Sample midterm (2009) was handed out on Tuesday
- Part of the lecture next Tuesday will be reserved for midterm review
  - Feel free to bring your questions

Data integration
- Data resides in many distributed, heterogeneous OLTP (On-Line Transaction Processing) sources
  - Sales, inventory, customer, …
  - NC branch, NY branch, CA branch, …
- Need to support OLAP (On-Line Analytical Processing) over an integrated view of the data
- Possible approaches to integration
  - Eager: integrate in advance and store the integrated data at a central repository called the data warehouse
  - Lazy: integrate on demand; process queries over the distributed sources (mediated or federated systems)
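The difference between the eager and lazy approaches can be sketched with a toy example. This is a minimal illustration under stated assumptions, not how a real warehouse or mediator is built: the branch sources, the `Source`, `Warehouse`, and `Mediator` classes, and the `total_by_item` query are all hypothetical names invented here. The warehouse copies the data in advance and answers from its copy; the mediator leaves the data at the sources and reads them at query time.

```python
from collections import defaultdict

class Source:
    """Hypothetical toy OLTP source: sales rows (item, amount) for one branch."""
    def __init__(self, name, rows):
        self.name = name
        self.rows = list(rows)

    def scan(self):
        # Hand out a copy so callers cannot modify the source's rows.
        return list(self.rows)

def total_by_item(rows):
    """The 'analytical query': total amount per item across all rows."""
    totals = defaultdict(int)
    for item, amount in rows:
        totals[item] += amount
    return dict(totals)

class Warehouse:
    """Eager: copy source data into a central store before any query."""
    def __init__(self, sources):
        self.sources = sources
        self.data = []
        self.refresh()

    def refresh(self):
        # Full reload (e.g. run nightly); queries only ever read self.data.
        self.data = [row for s in self.sources for row in s.scan()]

    def query(self):
        return total_by_item(self.data)

class Mediator:
    """Lazy: leave data at the sources and fetch it at query time."""
    def __init__(self, sources):
        self.sources = sources

    def query(self):
        return total_by_item(row for s in self.sources for row in s.scan())

nc = Source("NC", [("widget", 3), ("gadget", 1)])
ny = Source("NY", [("widget", 2)])
warehouse, mediator = Warehouse([nc, ny]), Mediator([nc, ny])

ny.rows.append(("gadget", 4))   # a new sale arrives at the NY branch
print(warehouse.query())        # {'widget': 5, 'gadget': 1}  (stale copy)
print(mediator.query())         # {'widget': 5, 'gadget': 5}
warehouse.refresh()
print(warehouse.query())        # {'widget': 5, 'gadget': 5}
```

The trade-off shows up directly: the eager warehouse answers without touching the sources but can be stale until the next refresh, while the lazy mediator is always current but pays the cost of contacting every source on each query.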

OLTP versus OLAP
OLTP
- Mostly updates
- Short, simple transactions
- Clerical users
- Goal: transaction throughput
OLAP
- Mostly reads
- Long, complex queries
- Analysts, decision makers
- Goal: fast queries
Implications for database design and optimization?

Eager versus lazy integration
Eager (warehousing)
- In advance: before queries
- Copy data from sources
Lazy
- On demand: at query time
- Leave data at sources

Maintaining a data warehouse
- The "ETL" process
  - Extraction: extract relevant data and/or changes from sources
  - Transformation: transform data to match the warehouse schema
  - Loading: integrate data/changes into the warehouse
- Approaches
  - Recomputation
    - Easy to implement; just take periodic dumps of the sources, say, every night
  - Incremental maintenance
    - Compute and apply only incremental changes
    - Fast if changes are small
    - Not easy to do for complicated transformations
    - Need to detect incremental changes at the sources
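The two maintenance approaches can be compared on a simple warehouse view. The sketch below assumes a hypothetical view holding SUM and COUNT of sales amounts per store, and assumes the sources can report the inserted and deleted rows since the last load (the "detect incremental changes" requirement above). `recompute` and `apply_delta` are illustrative names, not part of any particular system.

```python
def recompute(rows):
    """Recomputation: rebuild the view (SUM and COUNT per store) from a full dump."""
    view = {}
    for store, amount in rows:
        total, count = view.get(store, (0, 0))
        view[store] = (total + amount, count + 1)
    return view

def apply_delta(view, inserted, deleted):
    """Incremental maintenance: update the view from the changed rows only.

    COUNT is kept next to SUM so that a group can be dropped exactly when its
    last row is deleted (a SUM of 0 does not by itself mean the group is empty).
    Assumes every deleted row was present in the source.
    """
    view = dict(view)  # leave the caller's view untouched
    for store, amount in inserted:
        total, count = view.get(store, (0, 0))
        view[store] = (total + amount, count + 1)
    for store, amount in deleted:
        total, count = view[store]
        if count == 1:
            del view[store]
        else:
            view[store] = (total - amount, count - 1)
    return view

# Yesterday's dump, and the changes detected at the source since then.
old_rows = [("NC", 10), ("NY", 5), ("NY", 7)]
inserted = [("CA", 4), ("NC", 6)]
deleted = [("NY", 5)]
new_rows = [("NC", 10), ("NY", 7), ("CA", 4), ("NC", 6)]

view = recompute(old_rows)
print(apply_delta(view, inserted, deleted) == recompute(new_rows))  # True
```

Recomputation touches every row on every load; the incremental version touches only the three changed rows. SUM and COUNT are easy to maintain this way, whereas an aggregate such as MAX cannot always be updated from a deletion alone, which is one reason incremental maintenance gets harder for more complicated transformations.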
"Star" schema of a data warehouse
[Figure: a central fact table surrounded by dimension tables]
Fact table (center of the star)
- Big
- Constantly growing
- Stores measures (often aggregated in queries)
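A star schema and a typical OLAP query over it can be sketched with Python's built-in `sqlite3` module. The schema is a hypothetical example: a `sale` fact table holding foreign keys plus the measures `qty` and `price`, and two dimension tables, `store` and `product`. The table names, column names, and data are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: small, descriptive attributes.
CREATE TABLE store   (sid INTEGER PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE product (pid INTEGER PRIMARY KEY, name TEXT, category TEXT);
-- Fact table: one row per sale; a foreign key to each dimension plus measures.
CREATE TABLE sale (
    sid   INTEGER REFERENCES store(sid),
    pid   INTEGER REFERENCES product(pid),
    qty   INTEGER,
    price REAL
);
INSERT INTO store VALUES (1, 'Durham', 'NC'), (2, 'Raleigh', 'NC'), (3, 'New York', 'NY');
INSERT INTO product VALUES (10, 'pen', 'office'), (11, 'stapler', 'office'), (12, 'mug', 'kitchen');
INSERT INTO sale VALUES (1, 10, 5, 1.0), (2, 10, 3, 1.0), (2, 12, 1, 8.0),
                        (3, 11, 2, 12.0), (3, 12, 4, 8.0);
""")

# Typical OLAP query: join the fact table to its dimensions, then aggregate a measure.
rows = conn.execute("""
    SELECT st.state, p.category, SUM(s.qty * s.price) AS revenue
    FROM sale s
    JOIN store st  ON s.sid = st.sid
    JOIN product p ON s.pid = p.pid
    GROUP BY st.state, p.category
    ORDER BY st.state, p.category
""").fetchall()

for row in rows:
    print(row)
# ('NC', 'kitchen', 8.0)
# ('NC', 'office', 8.0)
# ('NY', 'kitchen', 32.0)
# ('NY', 'office', 24.0)
```

The query shape (fact table joined to each dimension, grouped by dimension attributes, aggregating a measure) is the pattern the star layout is designed for: new sales only append rows to the fact table, while the dimension tables stay small.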

This document was uploaded on 01/17/2012.