Problem 1: ER Diagram Design
The company you work for wants to digitize their time cards. You have been asked to design the
database for submitting and approving time cards. Draw the database ER diagram with the following
information:
A timecard should ha
Data Mining:
Concepts and
Techniques
Jiawei Han and Micheline Kamber
February 21, 2017
Data Mining: Concepts and
Techniques
1
Chapter 1. Introduction
Motivation: Why data mining?
What is data mining?
Data Mining: On what kind of data?
Data mining function
1/13/2017
Data Mining
CSC 6740 (CSC 4740), Spring 2017
The Department of Computer Science
CSC 6740/4740
Welcome to Class CSC 6740, CSC 4740!
DATA MINING
Who are you?
Which year are you in?
What is your basic idea of Data Mining?
Target of this course?
At
1/27/2017
Chapter 3: Data Preprocessing
Data Preprocessing: An Overview
Data Cleaning
Data Integration
Data Reduction and Transformation
Dimensionality Reduction
Summary
1/19/2017
Chapter 2. Getting to Know Your Data
Data Objects and Attribute Types
Basic Statistical Descriptions of Data
Data Visualization
Measuring Data Similarity and Dis
CSC 4740/6740 Data Mining
Assignment 4
Due Date: 6pm, Monday, November 14, 2016
William Gregory Johnson
Figure 1.
1. (15 points) Please illustrate the k-means algorithm on the dataset.
Initial k-means centroids: (2,7) and (7,3), shown in red.
First Iteration
k-Means, Iteration 1

Data Obj   X   Y   Distance to (2,7)
a          0   7   2.000
b          1   1   6.083
c          1   6   1.414
d          1   8   1.414
e          2   5   2.000
f          2   7   0.000
g          2   8   1.000
h          3   0   7.071
i          3   6   1.414
j          3   7   1.000
k          5   3   5.000
l          6   2   6.403
m          6   4   5.000
n          7   2   7.071
o          7   3   6.403
p          7   5   5.385
q          7   8   5.099
r          8
s          8
t          9
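
The distance column above can be reproduced with a short sketch of one k-means iteration. This is an illustration under stated assumptions, not the answer key: it uses Euclidean distance, it includes only points a to q (the ones whose Y value is recoverable from the table), and the helper names `assign` and `update` are hypothetical.

```python
import math

# Points a..q as recovered from the table; r, s and t are omitted because
# their Y values are missing in the source.
points = {
    "a": (0, 7), "b": (1, 1), "c": (1, 6), "d": (1, 8), "e": (2, 5),
    "f": (2, 7), "g": (2, 8), "h": (3, 0), "i": (3, 6), "j": (3, 7),
    "k": (5, 3), "l": (6, 2), "m": (6, 4), "n": (7, 2), "o": (7, 3),
    "p": (7, 5), "q": (7, 8),
}
centroids = [(2, 7), (7, 3)]  # initial centroids stated in the assignment


def assign(points, centroids):
    """Map each label to (index of nearest centroid, distances to all centroids)."""
    result = {}
    for label, xy in points.items():
        dists = [math.dist(xy, c) for c in centroids]  # Euclidean distance
        result[label] = (dists.index(min(dists)), dists)
    return result


def update(points, assignment, centroids):
    """Recompute each centroid as the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [points[l] for l, (idx, _) in assignment.items() if idx == j]
        if not members:  # keep an empty cluster's centroid where it was
            new_centroids.append(old)
            continue
        mean_x = sum(x for x, _ in members) / len(members)
        mean_y = sum(y for _, y in members) / len(members)
        new_centroids.append((mean_x, mean_y))
    return new_centroids


assignment = assign(points, centroids)
new_centroids = update(points, assignment, centroids)

for label, (cluster, dists) in assignment.items():
    print(label, cluster, [f"{d:.3f}" for d in dists])
```

Repeating `assign` and `update` until the assignment stops changing gives the remaining iterations. Because r, s and t are left out here, the updated centroids will differ from those computed on the full dataset.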
4. (18 points) Please illustrate the DBSCAN algorithm on the dataset in Figure 1.
Iteration 1: Choose point q as (2,6), epsilon=2.0, minpts=3, neighborhood = {(0,7),(1,6),(1,8),(2,5),(2,7),(3,6)}; mark all as visited.
Choose point p as (3,6), it a new co
[Figure: two scatter plots of the labeled data points with their coordinates, both axes running from 0 to 9]
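
The two building blocks the DBSCAN answer relies on, the epsilon-neighborhood query and the core-point test, can be sketched as follows. This is a minimal sketch, assuming Euclidean distance, a neighborhood that counts the query point itself, and the subset of points whose coordinates are recoverable from the k-means table. The function names `region_query` and `is_core` are hypothetical, and the query point (2,7) is chosen only for illustration.

```python
import math

# Assumption: points a..q as recovered from the k-means table.
points = [
    (0, 7), (1, 1), (1, 6), (1, 8), (2, 5), (2, 7), (2, 8), (3, 0), (3, 6),
    (3, 7), (5, 3), (6, 2), (6, 4), (7, 2), (7, 3), (7, 5), (7, 8),
]
EPS = 2.0    # epsilon stated in the answer
MIN_PTS = 3  # minpts stated in the answer


def region_query(points, center, eps):
    """Return every point within distance eps of center (center included)."""
    return [p for p in points if math.dist(p, center) <= eps]


def is_core(points, center, eps, min_pts):
    """A point is a core point if its eps-neighborhood has at least min_pts points."""
    return len(region_query(points, center, eps)) >= min_pts


neighbors = region_query(points, (2, 7), EPS)
print(neighbors)
print(is_core(points, (2, 7), EPS, MIN_PTS))
```

A full DBSCAN run would start a cluster at each unvisited core point and grow it by calling `region_query` again on every core point found in the neighborhood.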
Chapter 2: Data Preprocessing
Why preprocess the data?
Descriptive data summarization
Data cleaning
Data integration and transformation
Data reduction
Discretization and concept hierarchy generation
Summary
Chapter 5: Mining Frequent Patterns,
Association and Correlations
Basic concepts and a road map
Efficient and scalable frequent itemset
mining methods
Constraint-based association mining
Summary
Wha
HW 2
CSC4740 Data Mining
2/20/2016
Ha K Hwang
1. (100 points)
A database has five transactions. Let min_sup = 60% and min_conf = 75%.
TID    Items-bought
T100   M, O, N, N, K, E, Y, Y
T200   D, D, O, N, K, E, Y
T300   M, M, A, K, E, E
T400   M, U, C, C, Y, C, E, O
T
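
As a first step toward the frequent itemsets, the support count of each single item can be sketched as below. This is an illustration under stated assumptions: only the four transactions visible above are counted (the fifth is cut off in the source), repeated items within a transaction count once, and the function name `item_support_counts` is hypothetical.

```python
from collections import Counter

# The four transactions visible in the table; the fifth is missing here.
transactions = {
    "T100": ["M", "O", "N", "N", "K", "E", "Y", "Y"],
    "T200": ["D", "D", "O", "N", "K", "E", "Y"],
    "T300": ["M", "M", "A", "K", "E", "E"],
    "T400": ["M", "U", "C", "C", "Y", "C", "E", "O"],
}
TOTAL_TRANSACTIONS = 5  # stated in the problem
MIN_SUP_PERCENT = 60    # min_sup stated in the problem


def item_support_counts(transactions):
    """Count, for each item, the number of transactions that contain it."""
    counts = Counter()
    for items in transactions.values():
        counts.update(set(items))  # duplicates inside a transaction count once
    return counts


counts = item_support_counts(transactions)

# Minimum number of transactions needed, rounded up with integer arithmetic.
min_count = (MIN_SUP_PERCENT * TOTAL_TRANSACTIONS + 99) // 100

# Items that already reach the threshold on the visible transactions alone.
frequent_so_far = sorted(i for i, c in counts.items() if c >= min_count)
print(min_count, frequent_so_far)
```

Items that fall below the threshold here may still become frequent once the fifth transaction is included, so the result is a lower bound on the frequent 1-itemsets rather than the final answer.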
Homework 2
Note: You have to submit a hardcopy of Homework 2 at the beginning of class on the due
date (Oct. 2).
Specify the following queries in Relational Algebra, Tuple Relational Calculus, and Domain
Relational Calculus, respectively, based on the CO
An Introduction to Data Mining
Why Data Mining
* Credit ratings/targeted marketing:
  * Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
  * Identify likely responders to sales promotions
* Fraud detection
  * Which types of transaction
The MSD databases
The MSD actually consists of two separate databases:
* the archive database is highly normalized, with thousands of
  relationships linking some 400 tables; the deposition database is the
  definitive archive for all structural data at MSD
Classifying Galaxies
Clustering Definition
* Given a set of data points, each having a set of attributes, and a similarity measure
  among them, find clusters such that
  * Data points in one cluster are more similar to one another.
  * Data points in separat
Data Mining
* New buzzword, old idea.
* Inferring new information from already collected data.
* Traditionally the job of data analysts.
* Computers have changed this.
  * Far more efficient to comb through data using a machine than eyeballing statistical data.
Da
Chapter 3: Data Warehousing and
OLAP Technology: An Overview
What is a data warehouse?
Data warehouse architecture
From data warehousing to data mining
What is Data Warehouse?
Defined in many differ