CSC-5120-09
CSC-5120 Assignment 2 (Fall 2009) Due Date and Time: 3:00 p.m. 12th Nov, 2009
1. In a classification problem we are given a relational table with a set of attributes $a_1, \ldots, a_m$, where $a_1$ is a special attribute for classes. The problem is to buil
CSC-5120 Assignment 1 (Fall 2009) Due Date and Time: 3:00 p.m. 20th Oct, 2009
1.
(a) Suppose we modify the 2-Phase Commit protocol as follows. When a site $s_i$ votes NO, it sends the message NO to every site instead of only to the coordinator.
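To make the modification concrete, here is a minimal sketch of what a single participant can decide from its own vote and the messages it has seen. The message names "NO", "COMMIT" and "ABORT" and the function `local_decision` are hypothetical, and the model deliberately ignores timeouts and recovery. It shows only the mechanics of the modified voting phase, not the model answer.

```python
from typing import Iterable


def local_decision(my_vote: str, received: Iterable[str]) -> str:
    """What site s_i can decide from its own vote and the messages received so far.

    Hypothetical message names: "NO" (a vote that any site broadcasts under the
    modified protocol), "COMMIT" and "ABORT" (decisions sent by the coordinator).
    """
    msgs = set(received)
    if my_vote == "NO":
        return "abort"       # a site that votes NO may abort unilaterally
    if "NO" in msgs or "ABORT" in msgs:
        return "abort"       # a broadcast NO from any site is enough to abort
    if "COMMIT" in msgs:
        return "commit"      # only the coordinator's decision permits commit
    return "uncertain"       # voted YES and has seen no decision yet: keep waiting


print(local_decision("YES", ["NO"]))   # abort
print(local_decision("YES", []))       # uncertain
```

The "uncertain" branch is the situation the blocking question is about: a site that voted YES and has received neither a NO nor a decision has nothing local to decide on.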
Model Answer for Assignment 2
1.
Step 1: Find all the confident rules for a given minimum confidence. (Given a minimum confidence minconf, a rule is confident if $\mathrm{conf}(x \rightarrow c) \geq minconf$, where $\mathrm{conf}(x \rightarrow c)$ is the confidence of rule $x \rightarrow c$.)
Step 2: From association rule
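Step 1 can be sketched as a brute-force scan, assuming each record is given as a set of "attribute=value" items plus its class value $c$, and that $\mathrm{conf}(x \rightarrow c)$ is the usual ratio of the number of records containing $x$ with class $c$ to the number of records containing $x$. The function name, the `max_len` cap on the size of $x$ and the sample records are illustrative assumptions; a real solution would prune candidates (e.g. by minimum support) rather than enumerate them all.

```python
from collections import Counter
from itertools import combinations


def confident_rules(records, minconf, max_len=2):
    """Step 1 sketch: find every rule x -> c with conf(x -> c) >= minconf.

    records: list of (items, c) pairs, where items is a set of "attribute=value"
    strings and c is the class value (attribute a1 in the assignment).
    conf(x -> c) = (#records containing x with class c) / (#records containing x).
    """
    x_count = Counter()   # support count of each itemset x
    xc_count = Counter()  # support count of each (x, c) pair
    for items, c in records:
        for k in range(1, max_len + 1):
            for x in combinations(sorted(items), k):
                x_count[x] += 1
                xc_count[(x, c)] += 1
    return {
        (x, c): n / x_count[x]
        for (x, c), n in xc_count.items()
        if n / x_count[x] >= minconf
    }


# Hypothetical records: attributes a2, a3 and class values "yes"/"no".
records = [
    ({"a2=y", "a3=p"}, "yes"),
    ({"a2=y", "a3=q"}, "yes"),
    ({"a2=n", "a3=p"}, "no"),
]
for (x, c), conf in sorted(confident_rules(records, 1.0).items()):
    print(x, "->", c, conf)
```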
Model Answer for Assignment 1 1(a)
[Figure: state-transition diagrams for the coordinator (states q0, w0, a0, c0) and for site i (states qi, wi, ai, ci), with transitions labelled by Ready?_0i (1 ≤ i ≤ m), Yes_i0, No_ij (0 ≤ j ≤ m), Commit_0i, and Timeout]
No, the protocol isn't non-blocking. Because
CSE-5120-Fall-2009
Data Warehousing
Decision support systems (DSS) in business, also called On-Line Analytical Processing (OLAP) (vs. OLTP: On-Line Transaction Processing). Many corporations use data warehouses for their analysis. Decision su
Querying and Mining Data Streams
Stream projects:
Amazon/Cougar (Cornell) - sensors
Aurora (Brown/MIT) - sensor monitoring, dataflow
Hancock (AT&T) - telecom streams
Data Streams: You Only Get One Look
(adopted from tutorials, VLDB 2002,
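"You only get one look" means each stream element can be read once and then discarded. As one illustration of that constraint, here is reservoir sampling (Vitter's Algorithm R), which keeps a uniform random sample of k items in O(k) memory during a single pass. This technique is swapped in for illustration and is not taken from the tutorial cited above; the function name is hypothetical.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Keep a uniform random sample of k items while reading the stream once."""
    sample: List[T] = []
    for n, item in enumerate(stream):   # n = number of items seen before this one
        if n < k:
            sample.append(item)         # fill the reservoir first
        else:
            j = rng.randint(0, n)       # uniform over 0..n inclusive
            if j < k:
                sample[j] = item        # keep the new item with probability k / (n + 1)
    return sample


print(reservoir_sample(range(1_000_000), 5, random.Random(42)))
```

Each element is touched exactly once and never revisited, so the input can be a generator over an unbounded feed.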
To reduce the number of dimensions: Eigenvalues and Eigenvectors; Karhunen-Loève Expansion
A number $\lambda$ is called an eigenvalue (or characteristic value) of an $n \times n$ matrix $A$ if there exists a vector $x \neq 0$ such that $Ax = \lambda x$. The vector $x$ is then c
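A small pure-Python sketch of the definition $Ax = \lambda x$ and its use for dimensionality reduction: power iteration approximates the eigenpair with the largest $|\lambda|$, and projecting centred data onto the dominant eigenvector of the covariance matrix gives the first Karhunen-Loève axis. Power iteration is an assumed choice (production code would use a library eigensolver), and the sketch assumes a single dominant eigenvalue; all function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def mat_vec(a: Matrix, x: Vector) -> Vector:
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def power_iteration(a: Matrix, iters: int = 200) -> Tuple[float, Vector]:
    """Approximate the eigenpair (lambda, x) of a symmetric matrix A with the
    largest |lambda|, i.e. A x = lambda x with x != 0 (x has unit length)."""
    x = [1.0 + i for i in range(len(a))]         # arbitrary non-zero start vector
    s = math.sqrt(sum(v * v for v in x))
    x = [v / s for v in x]
    for _ in range(iters):
        y = mat_vec(a, x)
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:                           # A x = 0, so x is an eigenvector for 0
            return 0.0, x
        x = [v / norm for v in y]
    lam = sum(xi * yi for xi, yi in zip(x, mat_vec(a, x)))   # Rayleigh quotient
    return lam, x


def kl_project_1d(points: List[Vector]) -> List[float]:
    """Reduce points to one dimension: centre them, build the covariance matrix,
    and project onto its dominant eigenvector (the first Karhunen-Loeve axis)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(d)]
    centred = [[p[k] - mean[k] for k in range(d)] for p in points]
    cov = [[sum(c[i] * c[j] for c in centred) / n for j in range(d)] for i in range(d)]
    _, axis = power_iteration(cov)
    return [sum(ci * ai for ci, ai in zip(c, axis)) for c in centred]


print(power_iteration([[2.0, 1.0], [1.0, 2.0]]))
print(kl_project_1d([[1.0, 2.0], [2.0, 4.1], [3.0, 6.0]]))
```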
R-Trees: Index structure for Spatial Searching
Guttman, SIGMOD 1984
[Figure: An R-tree for data points (labels A to N)]
R-tree: a height-balanced tree, similar in some respects to a B-tree, with records in its leaf nodes pointing to data objects.
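The search side of an R-tree can be sketched in a few lines, assuming axis-aligned rectangles stored as (xmin, ymin, xmax, ymax). Internal entries hold a bounding rectangle and a child node, and leaf entries hold a rectangle and an object id. Only subtrees whose bounding rectangle overlaps the query are visited. The `Node` class, the ids "A" to "C" and the tiny example tree are hypothetical, and insertion and node splitting are not shown.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)


def overlaps(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    # Internal node: entries are (bounding rectangle, child Node).
    # Leaf node: entries are (rectangle, object id).
    leaf: bool
    entries: List[tuple] = field(default_factory=list)


def search(node: Node, query: Rect) -> List[str]:
    """Return ids of all leaf records whose rectangles overlap the query."""
    hits: List[str] = []
    for rect, child in node.entries:
        if overlaps(rect, query):
            if node.leaf:
                hits.append(child)
            else:
                hits.extend(search(child, query))  # descend only into overlapping subtrees
    return hits


leaf1 = Node(True, [((0, 0, 1, 1), "A"), ((2, 2, 3, 3), "B")])
leaf2 = Node(True, [((5, 5, 6, 6), "C")])
root = Node(False, [((0, 0, 3, 3), leaf1), ((5, 5, 6, 6), leaf2)])
print(search(root, (0.5, 0.5, 2.5, 2.5)))   # ['A', 'B']
```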
Indexing Multimedia Databases
Sample queries:
primary key: Find the employee record with emp = 123.
secondary key: Find the employee records with salary = 40K.
text: Find the documents containing the words multimedia, indexing.
Problem defin
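For the text query, a common structure is an inverted index that maps each word to the set of documents that contain it. Answering "documents containing multimedia and indexing" is then an intersection of posting sets. This is a minimal sketch assuming whitespace tokenisation and lower-casing; the function names and sample documents are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lower-cased word to the set of document ids containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word.strip(".,;:!?")].add(doc_id)
    return index


def docs_with_all(index: Dict[str, Set[str]], words: List[str]) -> Set[str]:
    """Documents containing every query word: intersect their posting sets."""
    postings = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*postings) if postings else set()


docs = {
    "d1": "Indexing multimedia databases",
    "d2": "Multimedia retrieval",
    "d3": "Indexing text",
}
idx = build_inverted_index(docs)
print(docs_with_all(idx, ["multimedia", "indexing"]))   # {'d1'}
```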
Subspace Clustering
Agrawal, Gehrke, et al., SIGMOD 98
[Figure: units u1 to u5 on a grid over the Age and Income dimensions]
Data: points in a multidimensional space. Each dimension is partitioned into intervals. Unit: intersection of one interval from each attribute (dimensi
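The grid described above can be sketched as follows. Each dimension is split into `xi` equal-width intervals, a unit is the tuple of interval indices (one per dimension), and a unit is kept as dense if it holds more than a fraction `tau` of all points. The parameter names, the equal-width partitioning and the density test are assumptions for illustration. The sketch counts units only in the full space, not bottom-up over subspaces.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def dense_units(points: List[Sequence[float]], lows: Sequence[float],
                highs: Sequence[float], xi: int, tau: float) -> Dict[Tuple[int, ...], int]:
    """Count points per unit and keep units with more than a fraction tau of
    all points. Assumes every value v of dimension d lies in [lows[d], highs[d]]."""
    counts: Counter = Counter()
    for p in points:
        unit = tuple(
            min(int((v - lo) / (hi - lo) * xi), xi - 1)  # v == hi goes to the last interval
            for v, lo, hi in zip(p, lows, highs)
        )
        counts[unit] += 1
    n = len(points)
    return {u: c for u, c in counts.items() if c / n > tau}


# Hypothetical (age, income) points; age in [20, 70], income in [0, 100].
pts = [(25, 30), (26, 32), (27, 31), (60, 90)]
print(dense_units(pts, lows=(20, 0), highs=(70, 100), xi=5, tau=0.5))   # {(0, 1): 3}
```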
Mining Frequent Patterns without Candidate Generation
Jiawei Han, Jian Pei and Yiwen Yin, SIGMOD 2000
Frequent Pattern Tree (FP-tree)
[Figure: two snapshots of an FP-tree growing from the root, with paths f:1, c:1, a:1 and f:2, c:2, a:2]
Motivations for not using Apriori method: It is costly to handle a l
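The growth shown in the figure (f:1, c:1, a:1 becoming f:2, c:2, a:2) comes from inserting transactions whose frequent items are already sorted by descending frequency: a shared prefix reuses existing nodes and increments their counts. This is a minimal sketch of that insertion step only. The header table and node-links are omitted, and the two sample transactions are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FPNode:
    item: Optional[str]                      # None for the root
    count: int = 0
    children: Dict[str, "FPNode"] = field(default_factory=dict)


def insert(root: FPNode, transaction: List[str]) -> None:
    """Insert one transaction (frequent items in descending frequency order);
    a shared prefix reuses existing nodes and increments their counts."""
    node = root
    for item in transaction:
        child = node.children.get(item)
        if child is None:
            child = FPNode(item)
            node.children[item] = child
        child.count += 1
        node = child


def dump(node: FPNode, depth: int = 0) -> List[str]:
    """List nodes as 'item:count' lines, indented by depth."""
    lines: List[str] = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(dump(child, depth + 1))
    return lines


root = FPNode(None)
insert(root, ["f", "c", "a", "m", "p"])      # path f:1, c:1, a:1, ...
insert(root, ["f", "c", "a", "b", "m"])      # shared prefix becomes f:2, c:2, a:2
print("\n".join(dump(root)))
```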
Association rule: purchase(T, bread), purchase(T, butter) $\rightarrow$ purchase(T, milk)
Clustering: group a set of data based on the conceptual clustering principle: maximize the intraclass similarity and minimize the interclass similarity.
Sequences:
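The clustering principle above (high intraclass similarity, low interclass similarity) can be checked numerically for a given grouping. This sketch uses Euclidean distance as an assumed inverse of similarity, so a good clustering shows a small average intra-cluster distance and a large average inter-cluster distance. It assumes at least two clusters and at least one cluster with two or more points; the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Iterable, List, Sequence, Tuple

Point = Sequence[float]


def mean_distance(pairs: Iterable[Tuple[Point, Point]]) -> float:
    pairs = list(pairs)
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def intra_inter(clusters: List[List[Point]]) -> Tuple[float, float]:
    """Average distance within clusters and between clusters."""
    intra = mean_distance(pair for c in clusters for pair in combinations(c, 2))
    inter = mean_distance(
        (p, q) for a, b in combinations(clusters, 2) for p in a for q in b
    )
    return intra, inter


print(intra_inter([[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]))
```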
REPLICATED DATA
To guarantee conflict serializability: 2-Phase Locking
Multiple copies of some data items are stored at multiple sites. One-copy serializability: Multiple copies of an object must appear as a single logical object to the t
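One standard way to combine locking with replicated data is quorum consensus. A read locks r of the n copies and a write locks w copies, and the quorums are chosen so that any two conflicting operations always lock at least one common copy. Read-one/write-all is the special case r = 1, w = n. The notes above do not say which replica-control scheme they use, so treat this as an illustrative assumption. The function names are hypothetical.

```python
from typing import List, Tuple


def valid_quorums(n: int, r: int, w: int) -> bool:
    """Quorum consensus over n copies: every read quorum meets every write
    quorum (r + w > n) and any two write quorums meet (2w > n)."""
    return 1 <= r <= n and 1 <= w <= n and r + w > n and 2 * w > n


def all_valid(n: int) -> List[Tuple[int, int]]:
    """Every (r, w) pair that satisfies the quorum conditions for n copies."""
    return [(r, w) for r in range(1, n + 1) for w in range(1, n + 1)
            if valid_quorums(n, r, w)]


print(valid_quorums(5, 1, 5))   # True: read-one/write-all
print(all_valid(3))
```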
Distributed Databases
Chapter 18(19) of Book: Database System Concepts, 3rd(4th) Ed.
or
Chapter 22 of Book: Database Manageme
accesses an account from a site different from the initiation site, or accesses accounts at several different sites
CSC-5120 Assignment 3 (Fall 2009) Due Date and Time: 3:00 p.m. 3 Dec, 2009
Question 1: Consider an R-tree with maximum node size M = 4 and minimum node size m = 2. Assume that the database contains the following rectangles:
Rectangle 1 2 3 4 5 6 7
Lower Corne