CSE 881: Data Mining (Fall 2014) Exam 2 7
Name:
1. [5 points] Answer True or False for each question below.
(a) If the confidence of the association rule {Bread, Milk} -> {Coke} is
70%, then the support of the rule must not exceed 70%.
Answer: True
(b) If {
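The claim in part (a) can be checked numerically. The sketch below is a minimal illustration, assuming a small hypothetical transaction list (not taken from the exam) and the standard definitions support(X -> Y) = count(X ∪ Y) / N and confidence(X -> Y) = count(X ∪ Y) / count(X). Because count(X) can never exceed N, the support of a rule is never larger than its confidence.

```python
def support_and_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_total = len(transactions)
    # Transactions containing every item of the antecedent X
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    # Transactions containing every item of X and Y together
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / n_total
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical transactions, for illustration only
transactions = [
    {"Bread", "Milk", "Coke"},
    {"Bread", "Milk"},
    {"Bread", "Milk", "Coke", "Eggs"},
    {"Milk", "Coke"},
    {"Bread", "Eggs"},
]

support, confidence = support_and_confidence(
    transactions, {"Bread", "Milk"}, {"Coke"}
)
print(f"support = {support:.3f}, confidence = {confidence:.3f}")
```

Here the rule is contained in 2 of 5 transactions, while the antecedent appears in 3 of them, so the support stays below the confidence.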

CSE 881: Data Mining (Fall 2014) Homework 5 Solution
1. State the type of clustering for each method given below. You need to
indicate whether it is: (a) partitional (non-hierarchical) or hierarchical,
(b) exclusive (disjoint), overlapping, or fuzzy, (c)

CSE 881: Data Mining (Fall 2014) Homework 4 Solution
1. Apriori Algorithm
Consider the following set of candidate 3-itemsets:
{a, b, c}, {a, b, d}, {a, b, e}, {a, c, d}, {a, c, e}, {a, c, f}, {a, d, e},
{b, c, d}, {b, c, e}, {b, c, f}
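As a rough illustration of how Apriori would use such a set, the sketch below applies the usual F(k-1) x F(k-1) merge followed by subset pruning. It assumes, purely for the example, that the ten 3-itemsets visible above are all frequent and that the list is complete; the function name `apriori_gen` is a label chosen here, not something from the homework.

```python
from itertools import combinations


def apriori_gen(frequent_itemsets):
    """Generate candidate (k+1)-itemsets from frequent k-itemsets.

    Two itemsets are merged only if their first k-1 items agree, and a
    candidate is kept only if all of its k-item subsets are frequent.
    """
    frequent = {tuple(sorted(s)) for s in frequent_itemsets}
    ordered = sorted(frequent)
    if not ordered:
        return []
    k = len(ordered[0])
    candidates = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            # Merge step: the first k-1 items must be identical
            if first[:-1] != second[:-1]:
                continue
            candidate = first + (second[-1],)
            # Prune step: every k-subset must itself be frequent
            if all(sub in frequent for sub in combinations(candidate, k)):
                candidates.append(candidate)
    return candidates


# The 3-itemsets listed above, assumed frequent for this sketch
three_itemsets = [
    "abc", "abd", "abe", "acd", "ace", "acf", "ade", "bcd", "bce", "bcf",
]
four_candidates = apriori_gen(three_itemsets)
print(four_candidates)
```

Under these assumptions only {a, b, c, d} and {a, b, c, e} survive pruning; the other merged candidates each contain a 3-item subset, such as {b, d, e}, that is missing from the list.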

CSE 881: Data Mining
Lecture 10: Nonlinear Classifiers
Outline
Kernel (Nonlinear) SVM
Multi-layer neural networks
Deep learning
Kernel SVM
What if decision boundary is not linear?
Kernel SVM
Trick: Transform data into higher dimensional
space
Deci

CSE 881: Data Mining
Lecture 11: Classification Miscellaneous
Classification
This lecture discusses miscellaneous issues in
classification and methods to address them
How to deal with imbalanced classes?
How to handle more than two classes?
How to h

CSE 881: Data Mining
Lecture 10: Ensemble Classifiers
Ensemble Classifiers
Given a training set D = {(xi, yi)} (i = 1, 2, ..., N)
For 2-class problems, assume yi ∈ {-1, +1}
Construct a set of classifiers f1(x), f2(x), ..., fk(x)
from the training set
Predict
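The slide text is cut off before it says how the predictions are combined. One common choice for labels in {-1, +1} is an unweighted majority vote, sketched below. The three threshold classifiers and the tie-breaking rule (ties go to +1) are assumptions made for this example only.

```python
def majority_vote(classifiers, x):
    """Combine base classifiers that each return -1 or +1 by majority vote."""
    total = sum(f(x) for f in classifiers)
    # Assumption: a tie (total == 0) is resolved in favour of +1
    return 1 if total >= 0 else -1


# Hypothetical base classifiers f1, f2, f3 on 2-dimensional inputs
classifiers = [
    lambda x: 1 if x[0] > 0 else -1,
    lambda x: 1 if x[1] > 0 else -1,
    lambda x: 1 if x[0] + x[1] > 1 else -1,
]

print(majority_vote(classifiers, (1.0, -2.0)))  # votes: +1, -1, -1
print(majority_vote(classifiers, (0.5, 0.2)))   # votes: +1, +1, -1
```

Weighted voting, as used in boosting, would replace the plain sum with a sum of votes scaled by per-classifier weights.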

CSE 881: Data Mining
Lecture 6: Regression Analysis
Regression
Dependent variable (y)
Regression
Independent variable (x)
Regression attempts to explain the variability in the
dependent (target) variable in terms of the variability in
independent (predi

CSE 881: Data Mining
Lecture 17: Frequent Subgraph Mining
Frequent Subgraph Mining
Extends association analysis to finding frequent
subgraphs
Useful for Web Mining, computational chemistry,
bioinformatics, spatial data sets, etc
Homepage
Research
Artif

CSE 881: Data Mining
Lecture 18: Introduction to Clustering
What is Cluster Analysis?
Finding groups of objects such that the objects in a group
will be similar (or related) to one another and different
from (or unrelated to) the objects in other groups

CSE 881: Data Mining
Lecture 5: Dimensionality Reduction
Curse of Dimensionality
When dimensionality
increases, data becomes
increasingly sparse in the
space that it occupies
The density and distance
between points, which are
critical for clustering and

CSE 881: Data Mining
Lecture 9: Probabilistic Classifiers
Overview
Let x be the set of attributes and y the class label
Given:
A training set, D = {(x1, y1), (x2, y2), ..., (xN, yN)},
where each attribute set xi consists of d attributes
(xi1, xi2, ...,

CSE 881: Data Mining
Lecture 5: Dimensionality Reduction
Outline
Curse of Dimensionality
Dimensionality reduction
Linear method (PCA)
Nonlinear method (Kernel PCA)
Curse of Dimensionality
Vd(r) : Volume of a ball in d dimensions
r: radius of ball
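The slide's own formula is not visible in this excerpt. The sketch below uses the standard closed form for the volume of a d-dimensional ball, V_d(r) = pi^(d/2) r^d / Gamma(d/2 + 1), and compares the unit ball with the cube of side 2 that encloses it. The helper names are chosen here for illustration.

```python
import math


def ball_volume(d, r=1.0):
    """Volume of a d-dimensional ball of radius r (standard closed form)."""
    return math.pi ** (d / 2) * r ** d / math.gamma(d / 2 + 1)


def ball_to_cube_ratio(d):
    """Fraction of the enclosing cube [-1, 1]^d occupied by the unit ball."""
    return ball_volume(d, 1.0) / 2.0 ** d


for d in (1, 2, 3, 5, 10, 20):
    print(f"d = {d:2d}  V_d(1) = {ball_volume(d):.6f}  "
          f"ball/cube = {ball_to_cube_ratio(d):.6e}")
```

The ratio shrinks rapidly as d grows, which is one way to see that points in high dimensions concentrate away from the centre of the space.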

CSE 881: Data Mining
Lecture 9: Linear Classifiers
Linear Classifier
Employs a linear separating hyperplane to
separate instances from different classes
Linear model: f(x) = w^T x + w0
Examples: perceptron, linear SVM, Fisher's linear
discriminant analys
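A minimal sketch of how a linear model of the form f(x) = w^T x + w0 is used for prediction: the sign of f(x) tells which side of the hyperplane an instance falls on. The weights, bias, and the convention that f(x) = 0 maps to +1 are hypothetical values picked for the example, not learned from data.

```python
def linear_score(w, x, w0):
    """Compute f(x) = w^T x + w0 for plain Python sequences."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0


def predict(w, x, w0):
    """Assign +1 or -1 by the side of the hyperplane f(x) = 0."""
    # Assumption: points exactly on the hyperplane are labelled +1
    return 1 if linear_score(w, x, w0) >= 0 else -1


# Hypothetical parameters of a separating hyperplane in 2 dimensions
w, w0 = [2.0, -1.0], -0.5

print(linear_score(w, [1.0, 0.5], w0), predict(w, [1.0, 0.5], w0))
print(linear_score(w, [0.0, 1.0], w0), predict(w, [0.0, 1.0], w0))
```

The methods named in the examples line differ in how they choose w and w0, not in this prediction rule.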

CSE 881: Data Mining
Lecture 3: Overview (Probability, Statistics,
Analysis of Algorithms)
Why Probability Theory?
Real-world data are often noisy and uncertain
Models are often incomplete, unable to fully explain the
observed data or deterministically

CSE 881: Data Mining
Lecture 2: Overview (Linear Algebra)
Outline
This lecture:
Linear algebra (a study of matrices and vectors)
Next lecture:
Probability and statistics
Analysis of algorithms
Why Linear Algebra?
Most data sets can be represented a

CSE 881: Data Mining
Lecture 7: Regression Analysis
Regression
Response variable (y)
Regression
Predictor variable (x)
Regression attempts to explain the variability in the
dependent (target) variable in terms of the variability in
independent (predicto

CSE 881: Data Mining
Lecture 4: Data Quality and Preprocessing
Garbage In, Garbage Out
Quality of data mining output depends on quality of input data
Data Quality Issues
Data mining is often applied to opportunistic
samples (data that have already be

CSE 881: Data Mining
Lecture 8 (Classification - Introduction)
Classification: Definition
Classification is the task of predicting a nominal-valued
attribute (known as class label) based on the values of
other attributes (known as predictor variables)
T

CSE 881: Data Mining
Lecture 1: Introduction
What is Data Mining? Definition 1
A field of study in computer science that focuses on how
to automatically draw interesting insights from data
Lies at the intersection of database systems, artificial
intellig

CSE 881: Data Mining
Lecture 21: Graph-based Clustering
Graph-based Clustering
Let G = (V, E) be a graph
V: set of vertices, E: set of edges
We can transform any data to a graph representation
Vertices are the data points to be clustered
Edges are we

CSE 881: Data Mining
Lecture 3: Probability and Data Structure
(review)
Why Probability Theory?
Real-world data are often noisy and uncertain
Models are often incomplete, unable to fully explain
characteristics of the observed data or deterministically

CSE 881
Lecture 13 (Large-Scale Predictive Modeling
Part 2)
Distributed/Parallel Approach
The previous lecture focused on two strategies for
scaling up data mining algorithms
Sampling-based approach
Online/Incremental learning approach
This lecture focuse

CSE 881: Data Mining
Lecture 11: Ensemble Classifiers
Ensemble Classifiers
Given a training set D = {(xi, yi)} (i = 1, 2, ..., N)
For 2-class problems, assume yi ∈ {-1, +1}
Construct an ensemble of classification models
f1(x), f2(x), ..., fk(x) from the trainin