Data Warehousing and Data Mining
CPS 116 Introduction to Database Systems

Announcements (Thu. Sep. 29)
• Homework #2 due next Tuesday
  • Sample solution available next Wednesday
• Midterm exam next Thursday in class
  • Open book, open notes
  • Sample midterm solution (from 2009) available today
    • Sample midterm (2009) was handed out on Tuesday
• Part of the lecture next Tuesday will be reserved for midterm review
  • Feel free to bring your questions

Data integration
• Data resides in many distributed, heterogeneous OLTP (On-Line Transaction Processing) sources
  • Sales, inventory, customer, …
  • NC branch, NY branch, CA branch, …
• Need to support OLAP (On-Line Analytical Processing) over an integrated view of the data
• Possible approaches to integration
  • Eager: integrate in advance and store the integrated data at a central repository called the data warehouse
  • Lazy: integrate on demand; process queries over distributed sources (mediated or federated systems)

OLTP versus OLAP
OLTP                            | OLAP
Mostly updates                  | Mostly reads
Short, simple transactions      | Long, complex queries
Clerical users                  | Analysts, decision makers
Goal: transaction throughput    | Goal: fast queries
• Implications for database design and optimization?
• OLAP databases do not care much about redundancy
  • “Denormalize” tables
  • Many, many indexes
  • Precomputed query results

Eager versus lazy integration
Eager (warehousing)
• In advance: before queries
• Copy data from sources
⇒ Answer could be stale
⇒ Need to maintain consistency
⇒ Query processing is local to the warehouse
  • Faster
  • Can operate when sources are unavailable
Lazy
• On demand: at query time
• Leave data at sources
⇒ Answer is more up-to-date
⇒ No need to maintain consistency
⇒ Sources participate in query processing
  • Slower
  • Interferes with local processing

Maintaining a data warehouse
• The “ETL” process
  • Extraction: extract relevant data and/or changes from sources
  • Transformation: transform data to match the warehouse schema
  • Loading: integrate data/changes into the warehouse
• Approaches
  • Recomputation
    • Easy to implement; just take periodic dumps of the sources, say, every night
    • What if there is no “night,” e.g., a global organization?
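The ETL process with recomputation can be sketched as a small Python pipeline. This is a minimal illustration under stated assumptions, not the course's implementation: in-memory SQLite databases stand in for real OLTP sources, and the branch data, the source tables `customer`/`sale`, the warehouse tables `sales_fact`/`branch_totals`, and the helper names (`make_branch`, `extract`, `transform`, `load_by_recomputation`, `nightly_refresh`) are all hypothetical. The transformation step denormalizes (the customer name is copied into every sale row), and loading also builds a precomputed summary table and an extra index, echoing the OLAP design points on the OLTP versus OLAP slide.

```python
import sqlite3


def make_branch(customers, sales):
    """Build a toy OLTP source (hypothetical schema) with normalized tables."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT)")
    db.execute("CREATE TABLE sale (cust_id INTEGER, item TEXT, amount REAL)")
    db.executemany("INSERT INTO customer VALUES (?, ?)", customers)
    db.executemany("INSERT INTO sale VALUES (?, ?, ?)", sales)
    db.commit()
    return db


def extract(source):
    """Extraction: take a full dump of the relevant tables from one source."""
    customers = dict(source.execute("SELECT cust_id, name FROM customer"))
    sales = source.execute("SELECT cust_id, item, amount FROM sale").fetchall()
    return customers, sales


def transform(branch, customers, sales):
    """Transformation: map source rows onto the (assumed) warehouse schema.

    Rows are denormalized: the customer name is copied into every sale row,
    and the branch is recorded explicitly because customer ids are only
    unique within a single source.
    """
    return [(branch, customers[cid], item, amount) for cid, item, amount in sales]


def load_by_recomputation(warehouse, rows):
    """Loading by recomputation: discard the old contents and reload everything."""
    warehouse.execute("DROP TABLE IF EXISTS sales_fact")
    warehouse.execute(
        "CREATE TABLE sales_fact (branch TEXT, customer TEXT, item TEXT, amount REAL)"
    )
    warehouse.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)
    # Precomputed query result: total sales per branch.
    warehouse.execute("DROP TABLE IF EXISTS branch_totals")
    warehouse.execute(
        "CREATE TABLE branch_totals AS "
        "SELECT branch, SUM(amount) AS total FROM sales_fact GROUP BY branch"
    )
    # Extra index for analytical queries by item (dropped with the table above).
    warehouse.execute("CREATE INDEX idx_item ON sales_fact(item)")
    warehouse.commit()


def nightly_refresh(warehouse, sources):
    """Run the whole ETL pipeline over every source, e.g., once per night."""
    rows = []
    for branch, source in sources.items():
        customers, sales = extract(source)
        rows.extend(transform(branch, customers, sales))
    load_by_recomputation(warehouse, rows)


if __name__ == "__main__":
    # Hypothetical sample data: both branches reuse customer id 1.
    sources = {
        "NC": make_branch([(1, "Alice")], [(1, "book", 20.0), (1, "pen", 2.5)]),
        "NY": make_branch([(1, "Bob")], [(1, "book", 18.0)]),
    }
    warehouse = sqlite3.connect(":memory:")
    nightly_refresh(warehouse, sources)
    print(warehouse.execute("SELECT * FROM branch_totals ORDER BY branch").fetchall())
    # prints [('NC', 22.5), ('NY', 18.0)]
```

Because every run drops and rebuilds the warehouse tables, rerunning the refresh never duplicates rows, but each run costs time proportional to the full size of the sources and needs a window in which the dumps can be taken. That is the limitation the slide's "no night" question raises.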