CSE 881: Data Mining
Lecture 10: Nonlinear Classifiers
Outline
Kernel (Nonlinear) SVM
Multi-layer neural networks
Deep learning
Kernel SVM
What if the decision boundary is not linear?
Kernel SVM
Trick: Transform the data into a higher-dimensional space
Deci
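Here is a minimal sketch of why the trick works. It assumes the degree-2 polynomial kernel $K(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^T \mathbf{z})^2$ on 2-D inputs, which is an illustrative choice and not necessarily the kernel used in the lecture. The kernel value equals an ordinary dot product in an explicit 3-D feature space, so an SVM can operate in the higher-dimensional space without ever constructing it. The helper names `phi`, `dot`, and `poly_kernel` are hypothetical.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for a 2-D point (x1, x2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(u, v):
    """Ordinary dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, z):
    """Degree-2 polynomial kernel computed directly in the 2-D input space."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, 4.0)
print(poly_kernel(x, z))      # kernel value, no mapping needed
print(dot(phi(x), phi(z)))    # same value (up to rounding) via the explicit 3-D map
```

The kernel route needs one 2-D dot product. The explicit route needs the mapping first, and that cost grows quickly with the degree and the input dimension.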

CSE 881: Data Mining
Lecture 10: Ensemble Classifiers
Ensemble Classifiers
Given a training set $D = \{(\mathbf{x}_i, y_i) \mid i = 1, 2, \ldots, N\}$
For 2-class problems, assume $y_i \in \{-1, +1\}$
Construct a set of classifiers $f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_k(\mathbf{x})$ from the training set
Predict
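For the prediction step with $y_i \in \{-1, +1\}$, one common way to combine the $k$ classifiers is an unweighted majority vote, $\hat{y} = \operatorname{sign}\left(\sum_{j=1}^{k} f_j(\mathbf{x})\right)$. The sketch below assumes that rule. The lecture may use a weighted variant, as in boosting. The threshold classifiers are hypothetical. Ties can only occur when $k$ is even, and they are broken toward +1 by assumption.

```python
from typing import Callable, Sequence

# A base classifier maps an input to a label in {-1, +1}.
Classifier = Callable[[float], int]


def majority_vote(classifiers: Sequence[Classifier], x: float) -> int:
    """Unweighted majority vote: sign of the summed {-1, +1} predictions."""
    total = sum(f(x) for f in classifiers)
    # Assumption: a tie (only possible when k is even) is broken toward +1.
    return 1 if total >= 0 else -1


def stump(threshold: float) -> Classifier:
    """Hypothetical base classifier: +1 if x exceeds the threshold, else -1."""
    return lambda x: 1 if x > threshold else -1


ensemble = [stump(0.5), stump(1.5), stump(2.5)]
print(majority_vote(ensemble, 2.0))  # two of the three stumps vote +1
print(majority_vote(ensemble, 1.0))  # two of the three stumps vote -1
```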

CSE 881: Data Mining
Lecture 6: Regression Analysis
Regression
[Figure: dependent variable (y) plotted against independent variable (x)]
Regression attempts to explain the variability in the dependent (target) variable in terms of the variability in the independent (predictor) variables.
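With a single predictor, a standard way to fit $y \approx \beta_0 + \beta_1 x$ is ordinary least squares. It has the closed form $\hat\beta_1 = \sum_i (x_i - \bar x)(y_i - \bar y) / \sum_i (x_i - \bar x)^2$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$. The fraction of the variability in $y$ that the line explains is $R^2 = 1 - SS_{res}/SS_{tot}$. The sketch below assumes this least-squares setting. The names `fit_simple_ols` and `r_squared` are hypothetical helpers, and the data are illustrative only.

```python
from statistics import fmean


def fit_simple_ols(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(xs, ys, intercept, slope):
    """Fraction of the variability in y explained by the fitted line."""
    y_bar = fmean(ys)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / ss_tot


xs, ys = [1, 2, 3, 4], [2, 4, 5, 8]  # illustrative, not from the lecture
b0, b1 = fit_simple_ols(xs, ys)
print(b0, b1, r_squared(xs, ys, b0, b1))
```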

CSE 881: Data Mining
Lecture 17: Frequent Subgraph Mining
Frequent Subgraph Mining
Extends association analysis to finding frequent subgraphs
Useful for Web mining, computational chemistry, bioinformatics, spatial data sets, etc.
[Figure: example Web graph with nodes such as Homepage and Research]
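In analogy with itemset support, the support of a subgraph is usually counted as the number (or fraction) of graphs in the database that contain it. Below is a minimal sketch of the first level of the search. It assumes undirected graphs with labeled vertices, stored as edge lists of label pairs, which is a hypothetical representation. It counts the support of every single-edge subgraph and keeps those that meet a minimum support. Larger subgraphs require subgraph-isomorphism tests, which are omitted here. `frequent_edges` and the example database are hypothetical.

```python
from collections import Counter


def frequent_edges(graphs, minsup):
    """Return {labeled edge: support} for single-edge subgraphs with support >= minsup.

    Each graph is a list of (label_u, label_v) pairs, one per undirected edge.
    Support counts graphs, not edge occurrences, so each graph contributes at most 1.
    """
    support = Counter()
    for edges in graphs:
        # Normalize undirected edges so ("O", "C") and ("C", "O") are the same subgraph.
        distinct = {tuple(sorted(edge)) for edge in edges}
        support.update(distinct)
    return {edge: count for edge, count in support.items() if count >= minsup}


# Hypothetical molecule-like graphs with atom labels on the vertices.
db = [
    [("C", "O"), ("C", "C"), ("C", "O")],
    [("O", "C")],
    [("C", "N")],
]
print(frequent_edges(db, minsup=2))
```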

CSE 881: Data Mining
Lecture 21: Graph-based Clustering
Graph-based Clustering
Let $G = (V, E)$ be a graph, where $V$ is the set of vertices and $E$ is the set of edges
We can transform any data set into a graph representation
Vertices are the data points to be clustered
Edges are we
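Below is a minimal sketch of this transformation, under stated assumptions. Vertices are the data points. An edge joins two points whose Euclidean distance is at most a hypothetical threshold `eps`. Each edge is weighted by a Gaussian similarity $\exp(-d^2 / 2\sigma^2)$, which is one common choice and not necessarily the lecture's. The simplest graph-based clustering then reports each connected component as a cluster. The helpers `build_graph` and `connected_components` are hypothetical.

```python
import math


def build_graph(points, eps, sigma=1.0):
    """Adjacency dict {i: {j: weight}} linking points within distance eps."""
    n = len(points)
    adj = {i: {} for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            if d <= eps:
                # Gaussian similarity: closer points get weights nearer to 1.
                w = math.exp(-d * d / (2 * sigma * sigma))
                adj[i][j] = w
                adj[j][i] = w
    return adj


def connected_components(adj):
    """Each connected component (found by depth-first search) is one cluster."""
    seen, clusters = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        clusters.append(sorted(comp))
    return clusters


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(connected_components(build_graph(points, eps=1.5)))
```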

CSE 881: Data Mining
Lecture 3: Probability and Data Structure
(review)
Why Probability Theory?
Real-world data are often noisy and uncertain
Models are often incomplete, unable to fully explain the characteristics of the observed data or deterministically

CSE 881: Data Mining (Fall 2014) Homework 4 Solution
1. Apriori Algorithm
Consider the following set of candidate 3-itemsets:
{a, b, c}, {a, b, d}, {a, b, e}, {a, c, d}, {a, c, e}, {a, c, f}, {a, d, e},
{b, c, d}, {b, c, e}, {b, c, f}
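Below is a minimal sketch of the support-counting step for these candidates. The homework's actual transactions are not shown in this excerpt, so the sketch assumes a small hypothetical transaction database. Each candidate's count is incremented for every transaction that contains all of its items. A brute-force subset test is used here for clarity. Apriori implementations commonly speed up this step with a hash tree. `count_support` is a hypothetical helper.

```python
# Candidate 3-itemsets listed above.
CANDIDATES = [
    ("a", "b", "c"), ("a", "b", "d"), ("a", "b", "e"), ("a", "c", "d"),
    ("a", "c", "e"), ("a", "c", "f"), ("a", "d", "e"),
    ("b", "c", "d"), ("b", "c", "e"), ("b", "c", "f"),
]


def count_support(candidates, transactions):
    """Count, for each candidate itemset, the transactions that contain all its items."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        items = set(t)
        for c in candidates:
            if items.issuperset(c):
                counts[c] += 1
    return counts


# Hypothetical transactions, for illustration only.
transactions = [
    {"a", "b", "c", "d"},
    {"a", "c", "e"},
    {"b", "c", "f"},
]
for itemset, n in count_support(CANDIDATES, transactions).items():
    print("{" + ", ".join(itemset) + "}:", n)
```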

CSE 881: Data Mining
Lecture 11: Classification Miscellaneous
Classification
This lecture discusses miscellaneous issues in classification and methods to address them:
How to deal with imbalanced classes?
How to handle more than two classes?
How to h
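One widely used answer to the multiclass question is one-vs-rest. You train one binary classifier per class, labeling that class +1 and all others -1. You then predict the class whose classifier gives the highest score. The sketch below shows only the relabeling and the decision rule. The scoring functions are hypothetical stand-ins for trained binary models, and the lecture may also cover other schemes such as one-vs-one.

```python
from typing import Callable, Dict, Hashable, List, Sequence


def one_vs_rest_labels(y: Sequence[Hashable], positive: Hashable) -> List[int]:
    """Relabel a multiclass target: +1 for `positive`, -1 for every other class."""
    return [1 if label == positive else -1 for label in y]


def ovr_predict(scorers: Dict[Hashable, Callable[[float], float]], x: float) -> Hashable:
    """Predict the class whose binary classifier returns the largest score."""
    return max(scorers, key=lambda cls: scorers[cls](x))


y = ["cat", "dog", "bird", "dog"]
print(one_vs_rest_labels(y, "dog"))

# Hypothetical per-class scores (larger means more confident in that class).
scorers = {
    "cat": lambda x: -abs(x - 0.0),
    "dog": lambda x: -abs(x - 5.0),
    "bird": lambda x: -abs(x - 10.0),
}
print(ovr_predict(scorers, 4.2))
```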

CSE 881: Data Mining
Lecture 18: Introduction to Clustering
What is Cluster Analysis?
Finding groups of objects such that the objects in a group
will be similar (or related) to one another and different
from (or unrelated to) the objects in other groups

CSE 881: Data Mining
Lecture 5: Dimensionality Reduction
Outline
Curse of Dimensionality
Dimensionality reduction
Linear method (PCA)
Nonlinear method (Kernel PCA)
Curse of Dimensionality
$V_d(r)$: volume of a ball in $d$ dimensions
$r$: radius of the ball
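The volume of a $d$-dimensional ball of radius $r$ is $V_d(r) = \frac{\pi^{d/2}}{\Gamma(d/2 + 1)} r^d$. The short sketch below evaluates the fraction of the unit hypercube $[0, 1]^d$ occupied by its inscribed ball of radius $1/2$. This fraction shrinks rapidly toward zero as $d$ grows, so most of the cube's volume ends up in its corners. That is one illustration of the curse of dimensionality. The helper name `ball_volume` is hypothetical.

```python
import math


def ball_volume(d: int, r: float) -> float:
    """Volume of a ball of radius r in d dimensions: pi^(d/2) r^d / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) * r ** d / math.gamma(d / 2 + 1)


# Fraction of the unit cube occupied by its inscribed ball (radius 0.5).
for d in (1, 2, 3, 5, 10, 20):
    print(d, round(ball_volume(d, 0.5), 8))
```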

CSE 881: Data Mining
Lecture 9: Linear Classifiers
Linear Classifier
Employs a linear separating hyperplane to
separate instances from different classes
Linear model: $f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + w_0$
Examples: perceptron, linear SVM, Fisher's linear discriminant analysis
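The perceptron is the simplest of these to sketch. The minimal version below makes three assumptions: labels in $\{-1, +1\}$, a learning rate of 1, and the prediction rule $\hat y = +1$ if $f(\mathbf{x}) \ge 0$, else $-1$ (the tie handling is itself an assumption). Whenever a training point is misclassified, it applies the updates $\mathbf{w} \leftarrow \mathbf{w} + y_i \mathbf{x}_i$ and $w_0 \leftarrow w_0 + y_i$. On linearly separable data it stops after finitely many updates. `train_perceptron` and the data are hypothetical.

```python
from typing import List, Sequence, Tuple


def decision(w: Sequence[float], w0: float, x: Sequence[float]) -> float:
    """Linear model f(x) = w^T x + w0."""
    return sum(wj * xj for wj, xj in zip(w, x)) + w0


def train_perceptron(X, y, max_epochs: int = 100) -> Tuple[List[float], float]:
    """Perceptron learning with learning rate 1; labels must be -1 or +1."""
    w = [0.0] * len(X[0])
    w0 = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            # Misclassified (or on the boundary): move the hyperplane toward xi.
            if yi * decision(w, w0, xi) <= 0:
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                w0 += yi
                mistakes += 1
        if mistakes == 0:  # every training point is on its correct side
            break
    return w, w0


def predict(w, w0, x) -> int:
    """Assign +1 when f(x) >= 0, else -1."""
    return 1 if decision(w, w0, x) >= 0 else -1


X = [(2.0, 2.0), (3.0, 3.0), (-1.0, -1.0), (-2.0, -3.0)]
y = [1, 1, -1, -1]
w, w0 = train_perceptron(X, y)
print(w, w0, [predict(w, w0, x) for x in X])
```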

CSE 881: Data Mining
Lecture 3: Overview (Probability, Statistics,
Analysis of Algorithms)
Why Probability Theory?
Real-world data are often noisy and uncertain
Models are often incomplete, unable to fully explain the
observed data or deterministically

CSE 881: Data Mining
Lecture 2: Overview (Linear Algebra)
Outline
This lecture:
Linear algebra (a study of matrices and vectors)
Next lecture:
Probability and statistics
Analysis of algorithms
Why Linear Algebra?
Most data sets can be represented a

CSE 881: Data Mining
Lecture 7: Regression Analysis
Regression
[Figure: response variable (y) plotted against predictor variable (x)]
Regression attempts to explain the variability in the dependent (target) variable in terms of the variability in the independent (predictor) variables.

CSE 881: Data Mining
Lecture 4: Data Quality and Preprocessing
Garbage In, Garbage Out
The quality of data mining output depends on the quality of the input data
Data Quality Issues
Data mining is often applied to opportunistic
samples (data that have already be

CSE 881: Data Mining
Lecture 8 (Classification - Introduction)
Classification: Definition
Classification is the task of predicting a nominal-valued
attribute (known as class label) based on the values of
other attributes (known as predictor variables)
T

CSE 881: Data Mining
Lecture 1: Introduction
What is Data Mining? Definition 1
A field of study in computer science that focuses on how
to automatically draw interesting insights from data
Lies at the intersection of database systems, artificial
intellig

CSE 881: Data Mining
Lecture 9: Probabilistic Classifiers
Overview
Let $\mathbf{x}$ be the set of attributes and $y$ the class label
Given:
A training set $D = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_N, y_N)\}$, where each attribute set $\mathbf{x}_i$ consists of $d$ attributes $(x_{i1}, x_{i2}, \ldots$

CSE 881: Data Mining
Lecture 5: Dimensionality Reduction
Curse of Dimensionality
When dimensionality increases, data becomes increasingly sparse in the space that it occupies
The density and distance between points, which are critical for clustering and

CSE 881: Data Mining
Lecture 4: Data Preprocessing
Data Quality Issues
Data mining is often applied to opportunistic samples (data that have already been collected)
Thus, preventing data quality issues (noise, missing
values, duplicate data, etc) thro

CSE 881: Data Mining
Lecture 23: Large-Scale Clustering
Large Scale Clustering
What is large?
Large number of data points to be clustered
Streaming (high velocity) data
High dimensionality (large number of attributes)
Large number of clusters
Strategies

CSE 881
Lecture 13 (Large-Scale Predictive Modeling
Part 2)
Distributed/Parallel Approach
The previous lecture focused on two strategies for scaling up data mining algorithms:
Sampling-based approach
Online/Incremental learning approach
This lecture focuse

CSE 881: Data Mining
Lecture 11: Ensemble Classifiers
Ensemble Classifiers
Given a training set $D = \{(\mathbf{x}_i, y_i) \mid i = 1, 2, \ldots, N\}$
For 2-class problems, assume $y_i \in \{-1, +1\}$
Construct an ensemble of classification models $f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_k(\mathbf{x})$ from the training set