FALL 2015 - CS 145 Homework Assignment #1
Due: Monday, Oct 12 at 12:00
Note: Only hard copy will be accepted.
TOTAL: 40 + 40 + 20 = 100pts
Q1: Level-wise Frequent Itemset Mining using Apriori and DIC (40pts)
1.1 Let I = {i1, i2, ..., in} be a set of n items
CS144 Final, Fall 2013, Section 1 Page: 1
UCLA
Computer Science Department
Fall 2013
Student Name and ID:
CS144 Final: Closed Book, 2 hours
(* IMPORTANT PLEASE READ *):
There are 4 + 1 problems on the exam to be completed in 2 hours. You should look through
CS 145 Project 2: Walmart Recruiting: Trip Type Classification
(1) Project Description
This is a competition on Kaggle. For this project, you are tasked with categorizing shopping trip
types based on the items that customers purchased. To give a few hypot
CS145 Homework Assignment #2 (Fall 2015)
Due: Monday, Oct 26 at noon (the beginning of the class)
Only hardcopy is accepted.
1. Constraint based Pattern Mining (10 pts)
Assume that the price of each item is non-negative, and we are only interested in item
CS144 Final, Fall 2011, Section 1 Page: 1
UCLA
Computer Science Department
Fall 2011
Student Name and ID:
CS144 Final: Closed Book, 2 hours
(* IMPORTANT PLEASE READ *):
There are 4 + 1 problems on the exam to be completed in 2 hours. You should look through
CS144 Final, Fall 2012, Section 1 Page: 1
UCLA
Computer Science Department
Fall 2012
Student Name and ID:
CS144 Final: Closed Book, 2 hours
(* IMPORTANT PLEASE READ *):
There are 4 problems on the exam to be completed in 2 hours. You should look through
Homework Assignment 1 (CS 145)
Due: Tuesday, Oct 21 at 16:00
Only hardcopy is accepted.
1. Apriori Property (10 pts)
1.1 What is the Apriori Rule? (3 pts)
The Apriori Rule states that if an itemset is infrequent, then all of its supersets are also
infrequent.
Al
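The pruning this property enables can be sketched in a few lines. This is an illustrative Python sketch, not part of the assignment; the function name and the sample itemsets are hypothetical.

```python
from itertools import combinations

def apriori_prune(candidates, frequent_prev):
    """Keep only k-itemsets whose every (k-1)-subset is frequent.

    By the Apriori property, if any (k-1)-subset of a candidate is
    infrequent, the candidate itself cannot be frequent.
    """
    pruned = []
    for cand in candidates:
        k = len(cand)
        if all(frozenset(sub) in frequent_prev
               for sub in combinations(cand, k - 1)):
            pruned.append(cand)
    return pruned

# Frequent 2-itemsets found in a previous pass (hypothetical data):
frequent_2 = {frozenset({'a', 'b'}), frozenset({'a', 'c'}), frozenset({'b', 'c'})}
# Candidate 3-itemsets:
candidates_3 = [frozenset({'a', 'b', 'c'}), frozenset({'a', 'b', 'd'})]
# {a,b,c} survives; {a,b,d} is pruned because its subset {a,d} is not frequent
print(apriori_prune(candidates_3, frequent_2))
```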
Homework Assignment 2 (CS 145)
Due: Tuesday, Nov 4 at 4:00pm (the beginning of the class)
Only hardcopy is accepted.
1. Sequential Pattern Mining (20 pts)
Consider the following database:
Customer ID
Data sequence
1
<{a,d}{c,d}>
2
<{a,b,d}{a,c}{
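Deciding whether a data sequence like those above supports a candidate sequential pattern can be sketched as follows. This is an illustrative Python sketch, not part of the assignment; `contains` is a hypothetical helper.

```python
def contains(sequence, pattern):
    """Check whether a data sequence (a list of itemsets) contains a
    sequential pattern: each pattern element must be a subset of a
    strictly later element of the sequence than the previous match."""
    pos = 0
    for elem in pattern:
        # advance until we find an element of the sequence that covers elem
        while pos < len(sequence) and not elem <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # next pattern element must match after this point
    return True

seq1 = [{'a', 'd'}, {'c', 'd'}]          # customer 1: <{a,d}{c,d}>
print(contains(seq1, [{'a'}, {'c'}]))    # True
print(contains(seq1, [{'c'}, {'a'}]))    # False: order matters
```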
Bi-Clustering
CS 145
Fall 2015
The UNIVERSITY of CALIFORNIA at LOS ANGELES
Data Mining: Clustering
K-means clustering minimizes

    \sum_{t=1}^{k} \sum_{i \in C_t} \mathrm{dist}(x_i, c_t)^2

where

    \mathrm{dist}(x_i, c_t)^2 = \sum_{j=1}^{m} (x_{ij} - c_{tj})^2
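The k-means objective can be computed directly from a set of points, centers, and assignments. A minimal Python sketch, not from the slides; all names and the sample data are hypothetical.

```python
def kmeans_objective(points, centers, assignment):
    """Sum of squared Euclidean distances of each point to its
    assigned cluster center: the quantity k-means minimizes."""
    total = 0.0
    for x, t in zip(points, assignment):
        c = centers[t]
        total += sum((xj - cj) ** 2 for xj, cj in zip(x, c))
    return total

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
centers = [(0.0, 1.0), (10.0, 10.0)]
assignment = [0, 0, 1]   # point i belongs to cluster assignment[i]
print(kmeans_objective(points, centers, assignment))  # 2.0
```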
CS 145: Data Mining
The Curse of Dimensionality
Homework Assignment 3 (CS 145)
Due: Monday, Nov 09 at 12:00
Only hardcopies are accepted.
1. Bi-Clustering (30 points)
1.1 If we are asked to do clustering on a dataset with high dimensionality, can we still use
Euclidean distance to measure the distance
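As background for this question, a small experiment (not part of the assignment; all names hypothetical) illustrates why plain Euclidean distance degrades in high dimensions: pairwise distances concentrate, so the gap between the nearest and farthest neighbor shrinks.

```python
import math
import random

def pairwise_contrast(dim, n=100, seed=0):
    """Relative contrast (max - min) / min of pairwise Euclidean
    distances among n random points in [0,1]^dim. As dim grows,
    distances concentrate and the contrast shrinks, which is why
    Euclidean distance becomes less discriminative."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n)]
    dists = [math.dist(p, q) for i, p in enumerate(pts) for q in pts[i + 1:]]
    return (max(dists) - min(dists)) / min(dists)

print(pairwise_contrast(2))    # large contrast in low dimensions
print(pairwise_contrast(500))  # much smaller contrast in high dimensions
```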
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING,
VOL. 17,
NO. 8,
AUGUST 2005
1021
Efficiently Mining Frequent Trees in a
Forest: Algorithms and Applications
Mohammed J. Zaki, Member, IEEE
Abstract: Mining frequent trees is very useful in domains like bi
Coherent Cluster
For a 2x2 matrix containing two objects {x, y} and
two attributes {a, b}:

             Attribute a    Attribute b
Object x     Xa             Xb
Object y     Ya             Yb

D = (Xa - Ya) - (Xb - Yb)   (to measure the shift in the data)
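The shift measure D is zero exactly when one object is a constant offset of the other across the two attributes, i.e. when the 2x2 block is coherent. A tiny Python sketch, not from the slides; the function name is hypothetical.

```python
def shift_residue(xa, xb, ya, yb):
    """D = (Xa - Ya) - (Xb - Yb): zero when object y is a constant
    shift of object x across attributes a and b (a coherent block)."""
    return (xa - ya) - (xb - yb)

# y = x + 3 on both attributes: perfectly coherent, D = 0
print(shift_residue(5.0, 8.0, 2.0, 5.0))   # 0.0
# y shifts differently per attribute: not coherent
print(shift_residue(5.0, 8.0, 2.0, 9.0))   # 4.0
```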
Clustering
CS 145
Fall 2015
Wei Wang
The UNIVERSITY of CALIFORNIA at LOS ANGELES
Outline
What is clustering
Partitioning methods
Hierarchical methods
Density-based methods
Grid-based methods
Model-based clustering methods
Outlier analysis
Association Rule Mining
CS145
Fall 2015
The UNIVERSITY of CALIFORNIA at LOS ANGELES
DHP: Reduce the Number of
Candidates
If a hash bucket's count is < min_sup, then every candidate hashed
into that bucket is infrequent
Candidates: a, b, c, d, e
Hash entries: {ab, ad, ae} {b
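The bucket-counting step of DHP can be sketched as follows: hash every 2-itemset of every transaction into a bucket, and later discard any candidate pair whose bucket count is below min_sup. This is an illustrative Python sketch, not from the slides; the hash function and sample transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_hash(pair):
    # Hypothetical deterministic hash for illustration (Python's
    # built-in string hash varies across runs).
    return ord(pair[0]) * 31 + ord(pair[1])

def dhp_bucket_counts(transactions, hash_fn, num_buckets):
    """Hash every 2-itemset of every transaction into a bucket and
    count occurrences. A candidate pair hashing to a bucket with
    count < min_sup cannot be frequent, so DHP prunes it without
    counting the pair itself."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[hash_fn(pair) % num_buckets] += 1
    return counts

transactions = [{'a', 'b'}, {'a', 'c'}, {'b', 'c'}, {'a', 'b', 'c'}]
counts = dhp_bucket_counts(transactions, pair_hash, 7)
min_sup = 2
# buckets whose candidates may still be frequent
surviving = sorted(b for b, c in counts.items() if c >= min_sup)
print(surviving)
```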
Association Rule Mining
CS145
Fall 2015
The UNIVERSITY of CALIFORNIA at LOS ANGELES
Mining Various Kinds of Rules
or Regularities
Multi-level, quantitative association rules,
correlation and causality, ratio rules,
sequential patterns, emerging patterns,
Bayesian Classification: Why?
Probabilistic learning: calculate explicit probabilities for
hypotheses; among the most practical approaches to certain types
of learning problems
Incremental: Each training example can incrementally
increase/decrease the pro
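Both points above show up directly in a count-based naive Bayes classifier: the counts yield explicit probabilities, and each new training example just increments them. A minimal Python sketch under the naive (conditional independence) assumption; not from the slides, and all names and sample data are hypothetical.

```python
from collections import Counter, defaultdict

def train_nb(examples):
    """Count-based training for a categorical naive Bayes classifier.
    examples: list of (feature_tuple, label). Adding an example only
    increments counts, so training is naturally incremental."""
    label_counts = Counter()
    feat_counts = defaultdict(Counter)  # (label, position) -> value counts
    for feats, label in examples:
        label_counts[label] += 1
        for j, v in enumerate(feats):
            feat_counts[(label, j)][v] += 1
    return label_counts, feat_counts

def predict(feats, label_counts, feat_counts):
    """Pick argmax_c P(c) * prod_j P(x_j | c), with add-one smoothing."""
    total = sum(label_counts.values())
    best, best_p = None, -1.0
    for label, lc in label_counts.items():
        p = lc / total
        for j, v in enumerate(feats):
            cnts = feat_counts[(label, j)]
            p *= (cnts[v] + 1) / (lc + len(cnts) + 1)
        if p > best_p:
            best, best_p = label, p
    return best

examples = [(('sunny', 'hot'), 'no'), (('sunny', 'mild'), 'no'),
            (('rain', 'mild'), 'yes'), (('rain', 'cool'), 'yes')]
model = train_nb(examples)
print(predict(('rain', 'mild'), *model))  # prints yes
```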
CS144 Midterm, Fall 2013 Page: 1
UCLA
Computer Science Department
Fall 2013
Instructor: J. Cho
Student Name and ID:
CS144 Midterm: Closed Book, 90 minutes
(* IMPORTANT PLEASE READ *):
There are 5 problems on the exam for a total of 50 points to be comple
Association Rule Mining
CS145
Fall 2015
The UNIVERSITY of CALIFORNIA at LOS ANGELES
Outline
What is association rule mining?
Methods for association rule mining
Extensions of association rule
What Is Association Rule
Mining?
Frequent
Frequent Subgraph Discovery*
Michihiro Kuramochi and George Karypis
Department of Computer Science/Army HPC Research Center
University of Minnesota
Minneapolis, MN 55455
{kuram, karypis}@cs.umn.edu
Abstract
As data mining techniques are being increasingl
Mining Frequent Subgraphs
CS 145
Fall 2014
The UNIVERSITY of CALIFORNIA at LOS ANGELES
DFS Lexicographic Order
2
12/9/2014
CS145: Data Mining
DFS Lexicographic Order (Case 1)
Mining Frequent Subgraphs
CS 145
Fall 2014
The UNIVERSITY of CALIFORNIA at LOS ANGELES
Overview
Introduction
Finding recurring subgraphs from graph databases.
FSG
gSpan
FFSM
2
12/2/2014
CS145: Data Mining
Labeled Graph
We define a labeled graph G as
Paper-1: Topic Description and Discussion
Distributed Classification Based on Association rules (CBA) algorithm
By Shuanghui Luo
Introduction
Data mining refers to extracting useful information or knowledge from large amounts of
data. In recent years, dat
CS 145
Data Mining
Instructor: Wei Wang
Fall 2014
The UNIVERSITY of CALIFORNIA at LOS ANGELES
Big Data are Everywhere
So are the Challenges
Welcome!
Instructor: Wei
IMB3-Miner: Mining Induced/Embedded Subtrees
by Constraining the Level of Embedding
Henry Tan1, Tharam S. Dillon1, Fedja Hadzic1, Elizabeth Chang2, and Ling Feng3
1 University of Technology Sydney, Faculty of Information Technology, Sydney, Australia
{