C-Store: A Column-oriented DBMS
Mike Stonebraker, Daniel J. Abadi, Adam Batkin+, Xuedong Chen, Mitch Cherniack+, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran+, Stan Zdonik
MIT CSAIL Cambridge, MA
Data Organization - B-trees
Data organization and retrieval
File organization can improve data retrieval time
Slides by Joe Hellerstein, UCB, with some material from Jim Gray,
Microsoft Research. See also:
Why Parallel Access To
At 10 MB/s
1.2 days to scan
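The line above is cut off before it states how much data is being scanned, but the two figures given imply a volume. The following is a quick check, assuming decimal megabytes (10^6 bytes) and a 24-hour day. The `n_disks` parameter and the 1,000-disk case are hypothetical, added only to illustrate why parallel access helps; they are not taken from the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MB = 10**6  # assumption: decimal megabytes


def bytes_scanned(rate_mb_per_s, days):
    """Bytes read by one sequential scan running at a fixed rate."""
    return rate_mb_per_s * MB * days * SECONDS_PER_DAY


def scan_seconds(total_bytes, rate_mb_per_s, n_disks=1):
    """Scan time, assuming the data is split evenly across n_disks
    that are all read concurrently at the same rate (hypothetical)."""
    return total_bytes / (rate_mb_per_s * MB * n_disks)


volume = bytes_scanned(10, 1.2)
print(f"implied volume: {volume / 10**12:.2f} TB")
print(f"1000 disks: {scan_seconds(volume, 10, n_disks=1000):.1f} s")
```

Under these assumptions the implied volume is roughly one terabyte, and an evenly partitioned scan shrinks in proportion to the number of disks.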
CS505 Lab 2: Operators
Operators: Join, aggregate, filter, etc.
Add and delete tuples
Buffer pool eviction
SELECT * FROM table1, table2
WHERE table1.field1 =
AND table1.id > 5
Operators = Iterator
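The "operators = iterators" idea can be sketched with Python generators: each operator pulls tuples from its child one at a time. This is an illustrative sketch, not the lab's actual API. The table contents, the column layout, and the join condition are assumptions, since the query's join predicate is cut off in the text above.

```python
from typing import Callable, Iterable, Iterator, Tuple

Row = Tuple  # a row is a plain tuple of field values


def seq_scan(table: Iterable[Row]) -> Iterator[Row]:
    """Leaf operator: yield each stored tuple in turn."""
    for row in table:
        yield row


def filter_op(child: Iterator[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    """Pass through only the child tuples that satisfy pred."""
    for row in child:
        if pred(row):
            yield row


def nested_loop_join(outer: Iterator[Row], inner_table: Iterable[Row],
                     pred: Callable[[Row, Row], bool]) -> Iterator[Row]:
    """For each outer tuple, rescan the inner table and emit matches."""
    for left in outer:
        for right in seq_scan(inner_table):
            if pred(left, right):
                yield left + right


# Hypothetical data: table1(id, field1), table2(field2, name)
table1 = [(3, "a"), (6, "b"), (9, "c")]
table2 = [("b", "x"), ("c", "y"), ("c", "z")]

plan = nested_loop_join(
    filter_op(seq_scan(table1), lambda r: r[0] > 5),  # table1.id > 5
    table2,
    lambda l, r: l[1] == r[0],  # assumed join condition
)
result = list(plan)
print(result)
```

Nothing is computed until `list(plan)` starts pulling tuples, which is the point of the iterator model: operators compose without materializing intermediate results.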
CS 505 Fall 2009 Homework #4
Due Monday Nov. 9, in Class
Chapter 15 and 16
CS 505 Fall 2009 Homework #2
Due Sep. 30, in Class
Note: For this problem, you need to read the pseudo-code in the textbook for B+ tree operations.
Problem 1 (14.11): Suppose that a B+-tree index on (branch-name, branch-city) is available on
relation branch. What would be the best way to handle the following selection?
Answer: Using t
17.9. Assume that immediate modification is used in a system. Show, by an example, how an
inconsistent database state could result if log records for a transaction are not output to
stable storage prior to data updated by the transaction being
CS 505 Fall 2009 Homework #5
Due Monday Nov. 30, in Class
Suppose there is a log-based recovery database that crashed during the execution. When it comes
back online, you find the log and it looks as follows.
CS505 Intermediate Topics in Database Systems
Meeting Time and Place: Mon, Wed, and Fri 9:00-9:50 in Anderson Tower Rm.263
Instructor: Tingjian Ge
Instructor Email: email@example.com
Course Web Page: http://protocols.netlab.uky.edu/~ge/teaching.html
Data Analysis (a.k.a.
data warehousing) and
Column Oriented DBMS
Modified and Extended the slides from Silberschatz et al and from New England Database
Decision Support Systems
business decisions, often based on
data collected by on-line tran
MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat
MapReduce is a programming model and an associated implementation for processing and generating large data sets
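The programming model can be illustrated with a word count. This is a single-process sketch, not the distributed implementation the paper describes. The names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """Map: emit an intermediate (word, 1) pair for every word."""
    for word in text.split():
        yield word.lower(), 1


def reduce_fn(word, counts):
    """Reduce: merge all values that share one intermediate key."""
    return word, sum(counts)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Run map, group values by intermediate key, then run reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for k2, v2 in map_fn(key, value):
            groups[k2].append(v2)  # stands in for the shuffle phase
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))


docs = [("d1", "the quick fox"), ("d2", "the lazy dog the end")]
counts = run_mapreduce(docs, map_fn, reduce_fn)
print(counts)
```

The user supplies only the two small functions; grouping by key is left to the framework, which is what allows a real system to distribute the work.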
Storage and Disks
Now Something Different
CS 405G: Application Oriented
CS 505: Systems Oriented
What is Systems?
A: Not Programming
Not programming big things.
Systems = Efficient and safe use of limited resources (e.g., disks)
A Comparison of Approaches to Large-Scale Data Analysis
University of Wisconsin
firstname.lastname@example.org Daniel J. Abadi
Yale University Microsoft Inc.
email@example.com Samuel Madde
B+-tree is perfect, but...
To answer a selection query (ssn=10), the search must traverse a full root-to-leaf path.
In practice, 3-4 block accesses (depending on the height of the tree,
Any better approach?
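The "3-4 block accesses" figure follows from how slowly tree height grows with the number of keys. Below is a rough estimate, assuming every node is full and holds `fanout` entries. Both the fanout of 100 and the key counts are hypothetical values picked for illustration.

```python
def btree_levels(num_keys: int, fanout: int) -> int:
    """Smallest number of levels such that fanout**levels >= num_keys.

    Simplification: every node, leaf or internal, holds exactly
    `fanout` entries. Real trees are only partly full, so actual
    heights can be somewhat larger.
    """
    if num_keys < 1 or fanout < 2:
        raise ValueError("need num_keys >= 1 and fanout >= 2")
    levels, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


for n in (10**6, 10**8):
    print(n, "keys ->", btree_levels(n, fanout=100), "levels")
```

One block is read per level on the way down, so a lookup costs about as many block accesses as the tree has levels.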
File Structures
Query Processing and Optimization
Data Retrieval at the physical level:
Indices: data structures to help with some query evaluation:
Intermediate Topics in Database Systems
Prof. Tingjian Ge
with thanks to Prof. Stan Zdonik, Brown University
Prof. Sam Madden, MIT
Prof. Avi Silberschatz, Yale University
What is a Database System?
A very large collection of related
Concurrency Control
Ensures that interleaving of operations among
concurrent transactions results in serializable
transaction operations interleaved following a
Database System Co
What Does a DBMS Manage?
1. Data organization
2. Data Retrieval
3. Data Integrity
Updates in SQL
Storage and File Organization
General Overview - rel. model
Relational model - SQL
Formal & commercial query languages
Database System Concepts
DBMSs store data on d
Review: The ACID properties
Atomicity: All actions in the Xaction happen, or none happen.
Consistency: If each Xaction is consistent, and the DB starts
consistent, it ends up consistent.
Isolation: Execution of one
Complexity of Running a Data
is an example.
The Internet
Simplified Data Processing on
Problem 1: Suppose we have tables:
T1 (c11, c12, c13, c14)
T2 (c21, c22)
T3 (c31, c32, c33)
Draw a logical query plan (tree) for query:
SELECT c14 FROM T1, T2 WHERE c12 = c21
SELECT c32 FROM T2, T3 WHERE c33 = c21 AND c31 = 22
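A logical plan is a tree of relational operators with table scans at the leaves. The sketch below shows one possible plan for each of the two queries; other equivalent trees exist, and placing the selection on c31 below the join is a choice made here, not something the problem statement dictates. The `Node` class and the operator labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    op: str
    children: List["Node"] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, root first."""
    lines = ["  " * depth + node.op]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# SELECT c14 FROM T1, T2 WHERE c12 = c21
plan1 = Node("project[c14]", [
    Node("join[c12 = c21]", [Node("scan[T1]"), Node("scan[T2]")]),
])

# SELECT c32 FROM T2, T3 WHERE c33 = c21 AND c31 = 22
plan2 = Node("project[c32]", [
    Node("join[c33 = c21]", [
        Node("scan[T2]"),
        Node("select[c31 = 22]", [Node("scan[T3]")]),
    ]),
])

print("\n".join(render(plan1)))
print("\n".join(render(plan2)))
```

Reading each tree bottom-up gives the evaluation order: scan the base tables, apply any selection, join, then project the requested column.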