Overfitting and Cross-Validation
Le Song
Machine Learning I
CSE 6740, Fall 2013
Apartment hunting
Suppose you are to move to Atlanta, and you want to find the most reasonably priced apartment satisfying your needs.
EM Algorithm and
Abnormal Data Detection
Multimodal distributions
What if we know the data consists of a few Gaussians?
What if we want to fit a parametric model?
Mixture of Gaussians
Why do we need density estimation?
Learn the shape of the data cloud
Assess the likelihood of seeing a particular data point
Is this point an outlier?
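The likelihood idea above can be sketched with a single 1-D Gaussian fit by maximum likelihood; the toy data and the 2-standard-deviation log-likelihood threshold below are illustrative assumptions:

```python
import numpy as np

def fit_gaussian(x):
    """Maximum-likelihood estimate of a 1-D Gaussian: mean and variance."""
    return x.mean(), x.var()

def log_likelihood(x, mu, var):
    """Log density of each point under N(mu, var)."""
    return -0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)

# Toy data: points concentrated near 0, plus one abnormal point far away.
x = np.array([0.1, -0.2, 0.0, 0.3, -0.1, 8.0])
mu, var = fit_gaussian(x)
ll = log_likelihood(x, mu, var)
# Flag points whose log-likelihood is unusually low (crude threshold).
outliers = ll < ll.mean() - 2 * ll.std()
```

A point far from the fitted density gets a low log-likelihood and is flagged; the threshold choice is the part that needs tuning in practice.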
Review of Last Lecture
Learning nonlinear decision boundary
Linearly separable vs. nonlinearly separable: the XOR gate, speech recognition
Perceptron: from biological to artificial neurons
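A minimal sketch of the perceptron update rule, assuming labels in {-1, +1}; the OR gate is used as toy data because, unlike the XOR gate mentioned above, it is linearly separable:

```python
import numpy as np

# The OR gate with labels in {-1, +1}; unlike XOR, it is linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, 1])
Xb = np.hstack([X, np.ones((4, 1))])   # append a constant bias feature

# Perceptron: on each mistake, move the weights toward the example.
w = np.zeros(3)
for _ in range(20):                    # a few passes over the data
    for xi, yi in zip(Xb, y):
        if yi * (w @ xi) <= 0:         # misclassified (or on the boundary)
            w += yi * xi

pred = np.sign(Xb @ w)
```

On linearly separable data this loop stops making updates after finitely many mistakes; on XOR it would cycle forever, which is the motivation for nonlinear models.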
Distribution/Density Estimation
What spectral clustering is good for
Key idea: cluster or group data points that are reachable from each other by walking through the data cloud
Clustering Nodes in Graphs
Clustering is a subjective task
What is considered similar/dissimilar? You pick your similarity/dissimilarity measure.
Distance functions
Nonlinear Dimensionality Reduction
An example
[Figure: two correlated features; the data vary more in one direction and less in the orthogonal direction, so the 2 dimensions can be reduced to 1.]
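The 2-to-1 reduction above can be sketched with principal component analysis via the SVD; the correlated toy data below is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two correlated features: the second is roughly twice the first.
x1 = rng.normal(size=200)
X = np.column_stack([x1, 2 * x1 + 0.1 * rng.normal(size=200)])

Xc = X - X.mean(axis=0)                 # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
z = Xc @ Vt[0]                          # 1-D projection onto the top direction

# Fraction of variance captured by the direction the data vary most in.
explained = S[0] ** 2 / (S ** 2).sum()
```

Because the two features are nearly collinear, almost all the variance lies along one direction, so the 1-D projection loses very little information.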
Dimensionality Reduction
Unsupervised learning
Learning from raw (unlabeled, unannotated, etc.) data, as opposed to supervised learning, where a classification label is provided for each data point.
Clustering
Nonlinear dimensionality reduction
A: walking distance over the data cloud
B: nearest-neighbor graph and shortest path
C: two-dimensional reduction
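Steps A and B above can be sketched as follows, assuming a small k-nearest-neighbor graph and Floyd-Warshall for the shortest paths (the arc-shaped toy data is illustrative):

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric k-nearest-neighbor graph with Euclidean edge weights."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    W = np.full((n, n), np.inf)
    np.fill_diagonal(W, 0.0)
    for i in range(n):
        for j in np.argsort(D[i])[1:k + 1]:   # k nearest, excluding self
            W[i, j] = W[j, i] = D[i, j]
    return W

def geodesic(W):
    """All-pairs shortest paths (Floyd-Warshall): walking distances on the graph."""
    G = W.copy()
    for m in range(len(G)):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return G

# Points along a half-circle arc: the walking distance between the two
# endpoints follows the arc, while the straight-line distance cuts across.
t = np.linspace(0, np.pi, 20)
X = np.column_stack([np.cos(t), np.sin(t)])
G = geodesic(knn_graph(X, 2))
```

The endpoints are Euclidean distance 2 apart, but their graph (walking) distance is close to the arc length of about pi, which is exactly the distinction the A/B steps rely on.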
Neural Networks
A decision tree for Tax Fraud
Bayes Decision Rule and
Naïve Bayes Classifier
Gaussian Mixture model
A density model p(x) may be multi-modal: model it as a mixture of uni-modal distributions.
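A minimal sketch of fitting such a mixture with the EM algorithm, assuming two 1-D Gaussian components; the bimodal toy data and initialization are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy bimodal data: two well-separated clusters.
x = np.concatenate([rng.normal(-5, 1, 300), rng.normal(5, 1, 300)])

# EM for a mixture of two 1-D Gaussians.
mu = np.array([-1.0, 1.0])             # crude initialization
var = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])
for _ in range(50):
    # E-step: responsibility r[i, k] = P(component k | x_i)
    d = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    r = pi * d
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, and variances from responsibilities
    nk = r.sum(axis=0)
    pi = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
```

Each iteration soft-assigns points to components (E-step), then re-fits each component to its weighted points (M-step); on this data the means converge near -5 and +5.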
SVM and Decision Tree
Which decision boundary is better?
Suppose the training samples are linearly separable. We can find a decision boundary which gives zero training error.
Markov Random Fields &
Conditional Random Fields
From static to dynamic mixture models
[Figure: a static mixture model, a single latent variable Y1 generating an observation X1, vs. a dynamic mixture model, a chain of latent states Y1, Y2, Y3, ..., YT, each emitting its own observation X1, X2, ..., XT.]
Kernel Methods
Nonlinear regression
Find a nonlinear prediction from input features to outputs
Ridge regression
Given data points, find the θ that minimizes the regularized sum of squared errors
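A sketch of ridge regression in closed form, theta = (X'X + lam*I)^-1 X'y; the toy data and lam = 0.1 below are assumptions:

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form minimizer of ||y - X theta||^2 + lam * ||theta||^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Toy data generated from a known linear model plus small noise.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta + 0.01 * rng.normal(size=100)

theta = ridge(X, y, lam=0.1)   # lam > 0 shrinks theta and stabilizes the solve
```

The lam*I term makes the normal equations well-conditioned even when features are correlated, at the cost of a small shrinkage bias.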
Regression
Rationale: combination of methods
There is no algorithm that is always the most accurate.
We can select simple weak classification or regression methods and combine them.
Kernel Methods and HMM
Nonlinear regression
Find a nonlinear prediction from input features to outputs: implicitly map data to a new nonlinear feature space.
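A sketch of that idea via kernel ridge regression with an RBF kernel, where the nonlinear feature space is never built explicitly; the data, gamma = 1, and lam = 0.01 are assumptions:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """K[i, j] = exp(-gamma * ||a_i - b_j||^2): similarity in an implicit feature space."""
    sq = ((A[:, None] - B[None, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# Kernel ridge regression: alpha = (K + lam*I)^-1 y; f(x) = sum_i alpha_i k(x_i, x).
x = np.linspace(-3, 3, 50)[:, None]
y = np.sin(x).ravel()                        # a nonlinear target function

K = rbf_kernel(x, x)
alpha = np.linalg.solve(K + 0.01 * np.eye(50), y)
y_hat = K @ alpha                            # predictions at the training points
```

All computation goes through the Gram matrix K, so the model fits a nonlinear function of x while only ever solving a linear system.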
Combining Classifiers and Boosting
Rationale: combination of methods
There is no algorithm that is always the most accurate.
We can select simple weak classifiers and combine them.
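A small simulation of why combining helps, assuming three independent weak classifiers that are each right about 70% of the time and a simple majority vote:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
y = rng.choice([-1, 1], size=n)              # ground-truth labels

def weak(y):
    """A 'weak' classifier: correct about 70% of the time, wrong at random otherwise."""
    flip = rng.random(n) < 0.3
    return np.where(flip, -y, y)

preds = np.stack([weak(y) for _ in range(3)])
vote = np.sign(preds.sum(axis=0))            # majority vote of the three

acc_single = (preds[0] == y).mean()          # about 0.70
acc_vote = (vote == y).mean()                # about 0.78 in expectation
```

The vote is right whenever at least two of the three are right (probability 0.7^3 + 3*0.7^2*0.3 = 0.784 under independence); real boosting goes further by reweighting the training data so the weak learners' errors are less correlated.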
Support Vector Machines
Naïve Bayes classifier
Still use the Bayes decision rule for classification:
  y* = argmax_y P(y | x) = argmax_y P(x | y) P(y)
But assume P(x | y = 1) is fully factorized:
  P(x | y = 1) = ∏_i P(x_i | y = 1)
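A minimal sketch of this factorized model for binary features (Bernoulli Naive Bayes); the toy data and Laplace smoothing are assumptions:

```python
import numpy as np

# Toy binary features and labels (hypothetical data).
X = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 1]])
y = np.array([1, 1, 0, 0])

def fit(X, y):
    """Estimate class priors and per-feature P(x_j = 1 | y) with Laplace smoothing."""
    priors, cond = {}, {}
    for c in np.unique(y):
        Xc = X[y == c]
        priors[c] = len(Xc) / len(X)
        cond[c] = (Xc.sum(axis=0) + 1) / (len(Xc) + 2)
    return priors, cond

def predict(x, priors, cond):
    """Pick the class maximizing log P(y) + sum_j log P(x_j | y)."""
    scores = {c: np.log(priors[c])
              + np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
              for c, p in cond.items()}
    return max(scores, key=scores.get)

priors, cond = fit(X, y)
```

The factorization means only one probability per (class, feature) pair is estimated, instead of a full joint over all feature combinations.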
Discriminative Classifiers and Logistic Regression
Classification
Input: data features
A label is provided for each data point, e.g., −1, +1
Classifier
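A sketch of a discriminative classifier, logistic regression trained by batch gradient descent; labels here are in {0, 1}, and the toy data, learning rate, and iteration count are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy data drawn from a known logistic model; labels are 0/1 here.
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 2))
true_w = np.array([4.0, -4.0])
y = (sigmoid(X @ true_w) > rng.random(200)).astype(float)

# Batch gradient descent on the logistic log-loss.
w = np.zeros(2)
for _ in range(2000):
    grad = X.T @ (sigmoid(X @ w) - y) / len(y)
    w -= 0.5 * grad

acc = ((sigmoid(X @ w) > 0.5) == (y > 0.5)).mean()
```

Unlike Naive Bayes, this models P(y | x) directly and never specifies how x itself is distributed.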
Regularization and Kernel Methods
Nonlinear regression
Want to fit a polynomial regression model
  y = θ₀ + θ₁x + θ₂x² + … + θₖxᵏ + ε
Let φ(x) = (1, x, x², …, xᵏ)ᵀ and θ = (θ₀, θ₁, θ₂, …, θₖ)ᵀ
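A sketch of fitting this model by mapping each input through φ(x) and solving ordinary least squares; the toy quadratic data and degree k = 2 are assumptions:

```python
import numpy as np

def poly_features(x, k):
    """phi(x) = (1, x, x^2, ..., x^k) for each scalar input."""
    return np.column_stack([x ** p for p in range(k + 1)])

# Toy data from a known quadratic plus small noise.
rng = np.random.default_rng(5)
x = np.linspace(-1, 1, 50)
y = 1 + 2 * x - 3 * x ** 2 + 0.01 * rng.normal(size=50)

# Linear least squares in the polynomial feature space.
Phi = poly_features(x, 2)
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
```

The model is nonlinear in x but linear in θ, which is why ordinary least squares still applies after the feature map.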