From static to dynamic mixture models
Static mixture
Markov Random Fields
Dynamic mixture
Le Song
Machine Learning
CS 7641,CSE/ISYE 6740, Fall 2015
[Figure: plate diagrams. Static mixture: latent label Y with observation X, inside a plate over N samples. Dynamic mixture: a hidden chain Y1, Y2, Y3, …, YT, each Yt emitting an observation Xt.]
HMM with discrete hidden states
Observation space for the observed symbols
Alp
What is Machine Learning (ML)?
Study of algorithms that can discover patterns in uncertain
data, make predictions about the future, and react to the environment.
Review
The need for machine learning
C
What is the logistic regression model?
Assume that the posterior distribution p(y = 1 | x) takes a
particular form:

p(y = 1 | x, θ) = 1 / (1 + exp(-θᵀx))

Logistic function: σ(z) = 1 / (1 + exp(-z))
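The two formulas above can be evaluated directly; a minimal sketch, where the parameter vector θ and input x below are made-up illustrative values:

```python
import math

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def posterior(theta, x):
    """p(y = 1 | x, theta) under the logistic regression model."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

print(sigmoid(0.0))                        # 0.5: the decision boundary
print(posterior([1.0, -2.0], [3.0, 1.0])) # sigmoid(3 - 2) = sigmoid(1), about 0.731
```

Note the symmetry sigmoid(-z) = 1 - sigmoid(z), which is why the two class posteriors sum to one.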
Ways to design a classifier
Learning parameters in logistic regression
Bayes' rule + an assumption on the form of p(y = 1 | x)
Rationale: Combination of methods
There is no algorithm that is always the most accurate
We can select simple weak classification or regression
methods and combine them into a single strong method
Boosting
Different learners use different
Algorithms, Hype
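The combination idea can be sketched as a weighted vote over hypothetical weak learners; the stumps and weights below are illustrative, not a full AdaBoost implementation:

```python
# Hypothetical weak learners: threshold rules ("decision stumps") on 2-D inputs.
stumps = [
    lambda x: 1 if x[0] > 0.5 else -1,
    lambda x: 1 if x[1] > 0.3 else -1,
    lambda x: 1 if x[0] + x[1] > 1.0 else -1,
]

def combined(x, alphas):
    """Strong classifier: sign of the alpha-weighted vote of the weak learners."""
    score = sum(a * h(x) for a, h in zip(alphas, stumps))
    return 1 if score >= 0 else -1

# With equal weights this is a plain majority vote.
print(combined([0.9, 0.9], [1.0, 1.0, 1.0]))  # 1
print(combined([0.1, 0.1], [1.0, 1.0, 1.0]))  # -1
```

Boosting chooses the weights alphas from each learner's training error; here they are fixed by hand just to show the voting structure.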
From static to dynamic mixture models
Static mixture
Hidden Markov Models
Dynamic mixture
Example: The Dishonest Casino
Hidden state transition diagram
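A minimal generative sketch of the dishonest casino HMM: the transition and emission probabilities below are illustrative stand-ins, not the exact numbers from the slide.

```python
import random

random.seed(0)

# Illustrative parameters: the casino tends to stay with the same die,
# and the loaded die favors six.
STAY = {"fair": 0.95, "loaded": 0.90}
EMIT = {
    "fair":   {f: 1 / 6 for f in range(1, 7)},
    "loaded": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5},
}

def sample(T, state="fair"):
    """Sample T (hidden state, observed roll) pairs from the HMM."""
    seq = []
    for _ in range(T):
        faces, probs = zip(*EMIT[state].items())
        roll = random.choices(faces, probs)[0]
        seq.append((state, roll))
        if random.random() > STAY[state]:  # occasionally switch dice
            state = "loaded" if state == "fair" else "fair"
    return seq

rolls = sample(10)
print(rolls)
```

Only the rolls would be visible to a player; inferring the hidden fair/loaded sequence from them is exactly the HMM inference problem.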
Density Estimation
Le Song
Machine Learning
CS 7641,CSE/ISYE 6740, Fall 2016
Limitation of PCA and SVD
Suitable when variables are linearly correlated
Not suitable when nonlinear structures are present
What's a reasonable distance measure?
long range dist
CS 7641 CSE/ISYE 6740 Mid-term Exam (2015 Fall) Q3 solution
December 5, 2015
1
Expectation Maximization [18 pts]
You have learned the Gaussian Mixture Model in class. For some discrete valued problems, like binary
images, Bernoulli Mixture Model (BMM) is
Machine learning for apartment hunting
Suppose you are to move to Atlanta
And you want to find the most
reasonably priced apartment satisfying
your needs:
Regression
square-ft., # of bedrooms, distance to campus
Le Song
Machine Learning I
CSE 6740, Fall 2
CSE 6140:
NP-completeness
based partially on course slides from Jennifer
Welch, George Bebis, and Kevin Wayne
Basic reduction strategies.
Reduction by simple equivalence.
Reduction from special case to general case.
Reduction by encoding with gadgets.
Ind
The Class NP
NP is the class of problems for which a candidate solution
can be verified in polynomial time
NP=nondeterministic polynomial
P is a
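The defining property of NP, that a candidate solution can be verified in polynomial time, can be illustrated with vertex cover; the graph below is a made-up example:

```python
def is_vertex_cover(edges, cover):
    """Polynomial-time verifier: does `cover` touch every edge?"""
    c = set(cover)
    return all(u in c or v in c for u, v in edges)

edges = [(0, 1), (1, 2), (2, 3)]
print(is_vertex_cover(edges, [1, 2]))  # True
print(is_vertex_cover(edges, [0, 3]))  # False: edge (1, 2) is uncovered
```

Verifying a given cover is a single linear pass over the edges; *finding* a small cover is the part believed to be hard.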
CNF Satisfiability
CNF-SAT is a special case of SAT
F is in Conjunctive Normal Form (CNF)
AND of expressions (i.e., clauses)
Each clause contains o
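A CNF formula and its polynomial-time satisfaction check can be sketched as follows; the clause encoding (variable index, polarity) is a hypothetical choice for illustration:

```python
# A CNF formula as a list of clauses; each literal is (variable, is_positive).
# Hypothetical encoding of (x0 OR NOT x1) AND (x1 OR x2).
formula = [[(0, True), (1, False)], [(1, True), (2, True)]]

def satisfies(formula, assignment):
    """True iff every clause has at least one literal made true by `assignment`."""
    return all(
        any(assignment[var] == positive for var, positive in clause)
        for clause in formula
    )

print(satisfies(formula, {0: True, 1: True, 2: False}))   # True
print(satisfies(formula, {0: False, 1: True, 2: False}))  # False
```

The check is linear in the formula size; finding a satisfying assignment is the NP-complete part.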
CSE 6140/ CX 4140:
Computational Science and Engineering
ALGORITHMS
Instructor: Bistra Dilkina
Assistant Professor, CSE
Minimum Spanning Tree
Minimum spanning tree. Given a connected graph G = (V, E) with real-valued edge weights c_e, an MST is a subset of
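One standard way to compute an MST is Kruskal's algorithm with union-find; a compact sketch on a made-up graph:

```python
def kruskal(n, edges):
    """MST by Kruskal's algorithm; edges are (weight, u, v), vertices 0..n-1."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    mst, total = [], 0.0
    for w, u, v in sorted(edges):          # consider edges by increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                       # adding this edge creates no cycle
            parent[ru] = rv
            mst.append((u, v))
            total += w
    return mst, total

edges = [(1.0, 0, 1), (2.0, 1, 2), (3.0, 0, 2), (4.0, 2, 3)]
tree, weight = kruskal(4, edges)
print(tree, weight)  # three edges, total weight 7.0; edge (0, 2) is skipped
```

The greedy choice is safe by the cut property: the lightest edge crossing any cut belongs to some MST.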
CS 7641 CSE/ISYE 6740 Homework 2 Solutions
October 11, 2016
1
EM for Mixture of Gaussians
Mixture of K Gaussians is represented as

p(x) = Σ_{k=1}^K π_k N(x | μ_k, Σ_k),    (1)

where π_k represents the probability that a data point belongs
to the kth component. As it
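Equation (1) can be evaluated directly in one dimension; a minimal sketch with illustrative mixing weights and component parameters:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, pis, mus, sigmas):
    """p(x) = sum_k pi_k N(x | mu_k, sigma_k^2), as in equation (1)."""
    return sum(p * normal_pdf(x, m, s) for p, m, s in zip(pis, mus, sigmas))

# Two equally weighted components centered at -2 and +2: a bimodal density.
pis, mus, sigmas = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]
print(mixture_pdf(-2.0, pis, mus, sigmas))  # near a mode: high density
print(mixture_pdf(0.0, pis, mus, sigmas))   # in the valley between modes: lower
```

EM fits the π_k, μ_k, Σ_k by alternating soft assignments (E-step) with weighted parameter updates (M-step).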
Mixture of Gaussian &
Feature Selection
Gaussian mixture model
A density model p(x)
may be multi-modal: model it as a
mixture of uni-modal distributions (e.g. Gaussians):

p(x) = π₁ N(x | μ₁, Σ₁) + π₂ N(x | μ₂, Σ₂)

Consider a mixture of
Why recommendation?
Collaborative Filtering
Tapestry: [Goldberg1992]
Examples
Recommendation vs. Advertisement
Product recommendation
Q. Is advertisement recommendation?
A. Yes, in a broad sense.
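A minimal user-based collaborative filtering sketch: the rating matrix and the use of cosine similarity below are illustrative, not Tapestry's actual mechanism.

```python
import math

# Hypothetical user-item rating matrix.
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 5, "m2": 3, "m3": 5},
    "carol": {"m1": 1, "m2": 5, "m3": 2},
}

def cosine(u, v):
    """Cosine similarity over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv)

# Alice's tastes look like Bob's, not Carol's, so a user-based recommender
# would weight Bob's opinions more when predicting ratings for Alice.
print(cosine(ratings["alice"], ratings["bob"]))
print(cosine(ratings["alice"], ratings["carol"]))
```

A full recommender would predict an unseen rating as a similarity-weighted average over the most similar users.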
Mixture of Gaussian
Why do we need density estimation?
Learn more about the shape of the data cloud
Assess the likelihood of seeing a particular data point
Is this a typical data point? (high densi
Nonlinear Dimensionality Reduction
Principal direction of variation in the dataset
[Figure: scatter plot of two correlated features; the data vary more along the principal direction and less along the orthogonal one.]
Principal componen
CSE 6140 Assignment 1
due Sept. 15, 2016 at 11:55pm EDT on T-Square
Please upload 1) a PDF with solutions of Problems 1, 2 and 3; 2) a PDF of
your report for Problem 4; 3) a single zip file of your code, README, results
for Problem 4.
1
Simple Complexity
CSE 6740 Lecture 12
How Do I Treat Temporal Data? II (Hidden Markov Models)
Alexander Gray (Thanks to Nishant Mehta)
agray@cc.gatech.edu
Georgia Institute of Technology
What Are Hidden Markov Models?
Hidden Markov models (HMMs) are discrete Markov process
CSE 6740 Lecture 13
How Do I Make Fancier Models? I (Graphical Models)
Alexander Gray (Thanks to Nishant Mehta)
What are graphical models?
For a set of random variables, a graphical model efficiently exhibi
CSE 6740 Lecture 22
How Do I Evaluate Deeply-Nested Sums? (Graphical Model Inference)
Alexander Gray
Today
1. Graphical Model Computations
2. Exact Inference Algorithms
3. Appr
CSE 6740 Lecture 17
What Loss Function Should I Use? II (Estimation Theory)
Alexander Gray
Today
1. Robustness (How safe/stable is my loss function?)
2. Comparing Estimators (H
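Robustness can be illustrated by comparing the sample mean with the sample median under a single gross outlier; the data below are made up:

```python
data = [2.0, 2.1, 1.9, 2.0, 2.2]
corrupted = data + [100.0]          # one gross outlier

def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    s = sorted(xs)
    n = len(s)
    return s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])

# The mean is dragged far from 2 by a single outlier; the median barely moves.
print(mean(data), median(data))            # 2.04  2.0
print(mean(corrupted), median(corrupted))  # ~18.37  2.05
```

In estimation-theory terms, the median has a much higher breakdown point than the mean, at the price of some efficiency under clean Gaussian data.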
CSE 6740 Lecture 18
How Do I Ensure Generalization? (Model Selection and Combination)
Alexander Gray
Today
1. The bootstrap
2. Model combination methods
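A minimal bootstrap sketch for the mean, on made-up data; the percentile interval used here is the simplest of several bootstrap confidence-interval constructions:

```python
import random

random.seed(1)
data = [3.1, 2.7, 3.5, 2.9, 3.3, 3.0, 2.8, 3.4]

def bootstrap_means(data, B=1000):
    """Resample with replacement B times; return the B resampled means."""
    n = len(data)
    return [sum(random.choices(data, k=n)) / n for _ in range(B)]

means = bootstrap_means(data)
lo, hi = sorted(means)[25], sorted(means)[974]  # rough 95% percentile interval
print(round(lo, 2), round(hi, 2))
```

The spread of the resampled means estimates the sampling variability of the estimator without any distributional assumptions.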
CSE 6740 Lecture 14
How Can I Learn Fancier Models? II (Kernelization)
Alexander Gray
Today
1. How to make more complex models using kernels
2. Theory motivating kernels
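The kernel idea can be checked numerically: a degree-2 polynomial kernel equals an inner product in an explicitly expanded feature space (2-D inputs for brevity; the test vectors are made up):

```python
import math

def poly_kernel(x, z):
    """Degree-2 polynomial kernel k(x, z) = (x . z + 1)^2."""
    return (sum(a * b for a, b in zip(x, z)) + 1.0) ** 2

def phi(x):
    """Explicit degree-2 feature map for 2-D inputs matching poly_kernel."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [x1 * x1, x2 * x2, r2 * x1 * x2, r2 * x1, r2 * x2, 1.0]

x, z = [1.0, 2.0], [3.0, 0.5]
implicit = poly_kernel(x, z)                           # O(d) work
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))  # O(d^2) features
print(implicit, explicit)  # equal: the kernel computes the inner product implicitly
```

The point of kernelization: an algorithm that only touches the data through inner products can work in the expanded space without ever building it.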
CSE 6740 Lecture 9
How Can I Reduce/Relate the Data Points? (Association and Clustering)
Alexander Gray
Today
1. Clustering
2. Associations
Central tasks for data mining.
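A tiny 1-D k-means sketch (Lloyd's algorithm on made-up data), showing the alternating assignment and mean-update steps:

```python
import random

random.seed(2)

def kmeans_1d(xs, k, iters=20):
    """Tiny 1-D k-means (Lloyd's algorithm): alternate assignments and mean updates."""
    centers = random.sample(xs, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: (x - centers[j]) ** 2)
            clusters[nearest].append(x)
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sorted(centers)

# Two well-separated groups around 1 and 10.
data = [1.0, 1.2, 0.8, 9.8, 10.0, 10.2]
centers = kmeans_1d(data, 2)
print(centers)  # two centers, near 1.0 and 10.0
```

Each iteration can only decrease the within-cluster squared distance, so the loop converges, although possibly to a local optimum depending on initialization.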
CSE 6740 Lecture 15
What Error Guarantees Can We Make? (Learning Theory and Generalization)
Alexander Gray
Today
1. Statistical inequalities (How can we bound values that can a
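A statistical inequality of this kind can be checked empirically: Hoeffding's inequality bounds the chance that an empirical mean of bounded i.i.d. draws deviates from its expectation; the parameters below are arbitrary illustrative choices.

```python
import math
import random

random.seed(3)

# Hoeffding: for n i.i.d. draws in [0, 1],
# P(|mean - E[mean]| >= t) <= 2 exp(-2 n t^2).
n, t, trials = 200, 0.1, 2000
bound = 2 * math.exp(-2 * n * t * t)

violations = 0
for _ in range(trials):
    mean = sum(random.random() < 0.5 for _ in range(n)) / n  # fair-coin empirical mean
    if abs(mean - 0.5) >= t:
        violations += 1

print(violations / trials, "<=", bound)
```

The observed deviation frequency sits well below the bound; Hoeffding is distribution-free, so it is loose for any particular distribution such as this fair coin.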