Logistic Regression
Rong Jin
Logistic Regression
Generative models often lead to a linear
decision boundary
Linear discriminative model
Directly model the linear decision boundary
w is the parameter to be determined
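As a concrete illustration (a minimal sketch, not taken from the slides): a logistic model turns the linear score w^T x into a class probability via the sigmoid, so the decision boundary is exactly the set of points with w^T x = 0.

```python
import math

def sigmoid(z):
    """Logistic function: maps a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x):
    """P(y = +1 | x) under a logistic model with weight vector w."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

# Points on the boundary w^T x = 0 get probability sigmoid(0) = 0.5;
# the sign of w^T x decides the predicted class.
w = [2.0, -1.0]
print(predict_proba(w, [1.0, 2.0]))  # w^T x = 0 -> 0.5
```

The weight vector w here is the parameter the slide refers to: learning fixes w, and w alone determines the boundary.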
Midterm Exam
03/04, in class
Project
It is team work
No more than 2 people per team
Define a project of your own
Otherwise, I will assign you a tough project
Important date
03/23: project proposal
04/27 and 04/29: presentation
05/02: final report
Machine Learning
Spring 2013
Rong Jin
CSE847 Machine Learning
Instructor: Rong Jin
Office Hour: Tuesday 4:00pm-5:00pm
TA: Qiaozi Gao, Thursday 4:00pm-5:00pm
Textbooks:
Machine Learning
The Elements of Statistical Learning
Pattern Recognition and Machine Learning
Online Learning
Rong Jin
Batch Learning
Given a collection of training examples D
Learning a classification model from D
What if training examples are received one at a time?
Online Learning
For t = 1, 2, …, T:
Receive an instance
Predict its class label
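The receive-then-predict loop above can be sketched with a perceptron-style update (one standard instantiation of online learning; the lecture may use a different update rule):

```python
def perceptron_online(stream, d):
    """One pass of the online protocol with a perceptron update.

    stream: iterable of (x, y) pairs with y in {-1, +1}; d: input dimension.
    Returns the final weight vector and the number of mistakes made.
    """
    w = [0.0] * d
    mistakes = 0
    for x, y in stream:                    # t = 1, 2, ..., T
        score = sum(wi * xi for wi, xi in zip(w, x))
        y_hat = 1 if score >= 0 else -1    # predict the class label
        if y_hat != y:                     # observe true label, update on error
            mistakes += 1
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return w, mistakes
```

Unlike batch learning over a fixed collection D, the model here is revised immediately after each example arrives.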
Homework II
Due: 01/29/2015 (before class)
January 22, 2015
Problem 1 (10pt): Noise Model
In the class, we assume the following data generative model
$$t = y(x, w) + \epsilon,$$
where $\epsilon \sim \mathcal{N}(\epsilon \mid 0, \beta^{-1})$ is a zero-mean Gaussian noise distribution with precision $\beta$. We now modify the data generative model by assuming t ...
Homework 3
Problem 1. Using the lasso regularization, the weights w for each value of the regularization parameter $\lambda$ are given in
Figure 1 and the test error is given in Figure 2.
Figure 1: Weights for each $\lambda$ using lasso regularization
From Figure 1, we can see that as $\lambda$ increases, the weights become smaller, with more of them driven to zero.
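The shrinkage pattern described here can be illustrated with the soft-thresholding operator, which gives the lasso solution coordinate-wise under an orthonormal design (a simplifying assumption for illustration, not part of the homework):

```python
def soft_threshold(w_ols, lam):
    """Lasso solution per coordinate under an orthonormal design:
    shrink the least-squares weight toward zero by lam, clipping at zero."""
    if w_ols > lam:
        return w_ols - lam
    if w_ols < -lam:
        return w_ols + lam
    return 0.0

# As lambda grows, every weight shrinks and is eventually set exactly
# to 0, matching the trend seen in the weight plot.
for lam in (0.0, 1.0, 3.0):
    print([soft_threshold(w, lam) for w in (2.5, -0.8, 4.0)])
```

This is why lasso produces sparse models, whereas ridge regression only shrinks weights without zeroing them.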
Homework 2
Problem 1. Since
$$t = y(x, w) + \epsilon$$
and
$$p(\epsilon \mid \beta) = \frac{\beta}{2} \exp(-\beta |\epsilon|),$$
then we have
$$p(t \mid x, w, \beta) = \frac{\beta}{2} \exp(-\beta |t - y(x, w)|).$$
Given observed inputs, $X = \{x_1, \ldots, x_N\}$, and targets, $t = [t_1, \ldots, t_N]^T$, we obtain
the likelihood function
$$p(t \mid X, w, \beta) = \prod_{n=1}^{N} \frac{\beta}{2} \exp\left(-\beta \left|t_n - w^T \phi(x_n)\right|\right).$$
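As a sanity check on the derivation, maximizing this Laplace likelihood over w is the same as minimizing the sum of absolute residuals (least absolute deviations); a small numeric sketch:

```python
import math

def laplace_nll(beta, residuals):
    """Negative log of prod_n (beta/2) exp(-beta * |r_n|),
    where r_n = t_n - w^T phi(x_n) are the residuals."""
    n = len(residuals)
    return -n * math.log(beta / 2.0) + beta * sum(abs(r) for r in residuals)

# Holding beta fixed, a fit with smaller absolute residuals has lower
# negative log-likelihood, i.e. higher likelihood.
good = laplace_nll(1.0, [0.1, -0.2, 0.05])
bad = laplace_nll(1.0, [1.0, -2.0, 0.5])
print(good < bad)  # True
```

This contrasts with the Gaussian noise model, where maximizing the likelihood minimizes the sum of squared residuals instead.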
Homework 6
Due: 03/03/2015 (before class)
February 23, 2015
Problem 1 (20 pt) Regularized Logistic Regression
Let $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ be the training examples, where $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$. The negative
log-likelihood function of the regularized logistic regression model ...
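For reference, one common form of this objective for labels in {-1, +1} adds an l2 penalty to the logistic loss (an assumption for illustration; the homework's exact regularizer may differ):

```python
import math

def reg_nll(w, X, y, lam):
    """Regularized negative log-likelihood of logistic regression with
    labels in {-1, +1}: sum_i log(1 + exp(-y_i w^T x_i)) + (lam/2)||w||^2.
    The l2 penalty is a common choice, assumed here for illustration."""
    loss = sum(
        math.log(1.0 + math.exp(-yi * sum(wj * xj for wj, xj in zip(w, xi))))
        for xi, yi in zip(X, y)
    )
    return loss + 0.5 * lam * sum(wj * wj for wj in w)

# At w = 0 every example contributes log 2, regardless of its label.
print(reg_nll([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]], [1, -1], 0.5))
```

Minimizing this convex function (e.g. by gradient descent) yields the regularized maximum-likelihood solution.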
Overview of Clustering
Rong Jin
Outline
K means for clustering
Expectation Maximization algorithm for clustering
Spectral clustering (if time permits)
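A minimal sketch of the first outline item, k-means on 1-D points (deterministic initialization from the first k points is an assumption chosen for simplicity; practical implementations use random restarts):

```python
def kmeans(points, k, iters=20):
    """Plain k-means on 1-D points: assign each point to the nearest
    center, then move each center to the mean of its cluster."""
    centers = points[:k]            # simple deterministic initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: (p - centers[c]) ** 2)
            clusters[j].append(p)
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sorted(centers)

print(kmeans([1.0, 1.2, 0.8, 10.0, 10.2, 9.8], 2))  # centers near 1.0 and 10.0
```

Each iteration can only decrease the within-cluster squared distance, so the algorithm always converges, though possibly to a local optimum.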
Clustering
Find the underlying structure of the given data points
[Figure: scatter plot of data points, axes: $ and age]
Application (I): Search Re
Homework 7
Due 03/31/2015 (before class)
March 23, 2015
Problem 1 (20pt): Train and Test Support Vector Machine
Download the SVM software from the website http://svmlight.joachims.org/. Read the documentation
and the example that is provided on the webpage.
Data Classification
Rong Jin
Classification Problems
Given input:
Predict the output (class label)
Binary classification:
Multi-class classification:
Learn a classification function:
Regression:
Examples of Classification Problem
Text categorization
Boosting
Rong Jin
Inefficiency with Bagging
Bagging
Bootstrap sampling: D → D1, D2, …, Dk
Inefficient bootstrap sampling:
Every example has equal chance to be
sampled
No distinction between easy
examples and difficult examples
Inefficient model combination:
A co…
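The equal-chance sampling criticized above can be made concrete: under bootstrap sampling, any given example is missing from a sample with probability (1 − 1/n)^n ≈ 1/e ≈ 0.368, no matter how hard the example is. A quick simulation:

```python
import math
import random

def bootstrap_sample(data, rng):
    """Draw |data| examples uniformly with replacement: every example
    has the same chance on every draw, easy or difficult alike."""
    return [rng.choice(data) for _ in data]

n = 1000
rng = random.Random(0)
trials = 2000
# Fraction of bootstrap samples that never include example 0.
missed = sum(0 not in set(bootstrap_sample(range(n), rng)) for _ in range(trials))
print(missed / trials, math.exp(-1))  # both near 0.368
```

Boosting addresses this by reweighting: difficult examples get higher sampling probability in later rounds instead of the uniform chance shown here.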
Introduction to Probability
Theory
Rong Jin
Outline
Basic concepts in probability theory
Bayes rule
Random variable and distributions
Definition of Probability
Experiment: toss a coin twice
Sample space: possible outcomes of an experiment
Event: a subset of the sample space
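These three concepts can be enumerated directly for the coin-toss experiment (a small sketch assuming a fair coin, so every outcome is equally likely):

```python
from itertools import product

# Experiment: toss a coin twice.
sample_space = list(product("HT", repeat=2))   # all possible outcomes
event = [o for o in sample_space if "H" in o]  # event: at least one head

# For equally likely outcomes, P(event) = |event| / |sample space|.
p = len(event) / len(sample_space)
print(sample_space)  # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]
print(p)             # 0.75
```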
Expectation Maximization Algorithm
Rong Jin
A Mixture Model Problem
[Figure: histogram of the data over the range 0 to 25, with counts up to 20]
Apparently, the dataset consists of two modes
How can we automatically identify the two modes?
Gaussian Mixture Model (GMM)
Assume that the data ...
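A stripped-down EM sketch for identifying the two modes with a two-component 1-D mixture (unit variances and equal mixing weights are assumptions made to keep the example short; the full GMM also updates variances and mixing weights):

```python
import math

def em_gmm_1d(data, mu, iters=50):
    """EM for a two-component 1-D Gaussian mixture with unit variances
    and equal mixing weights (a deliberately simplified sketch)."""
    mu1, mu2 = mu
    for _ in range(iters):
        # E-step: responsibility of component 1 for each point
        r = []
        for x in data:
            p1 = math.exp(-0.5 * (x - mu1) ** 2)
            p2 = math.exp(-0.5 * (x - mu2) ** 2)
            r.append(p1 / (p1 + p2))
        # M-step: each mean becomes a responsibility-weighted average
        mu1 = sum(ri * x for ri, x in zip(r, data)) / sum(r)
        mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / sum(1 - ri for ri in r)
    return sorted((mu1, mu2))

print(em_gmm_1d([0.0, 0.5, -0.5, 10.0, 10.5, 9.5], (1.0, 9.0)))
```

On well-separated data like the histogram above, the two estimated means converge to the centers of the two modes.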
Homework 3
Due: 02/10/2015 (before class)
February 1, 2015
Problem 1 (20pt): Experiment with Lasso Regularization
Data set: A data set is provided in the file diabetes.mat, which can be downloaded from http://www.cse.msu.edu/~cse847/assignments/diabetes.mat.
Homework 4
Problem 1.
$$p(D \mid M) = \int p(D \mid w)\, p(w)\, dw \approx p(D \mid w_{MAP}) \frac{\Delta w_{posterior}}{\Delta w_{prior}}$$
By
$$p(t \mid x, w) = \mathcal{N}(t \mid w^T \tilde{\phi}(x), \beta^{-1}),$$
we have
$$p(t_i \mid x_i, w) = \mathcal{N}(t_i \mid w^T \tilde{\phi}(x_i), \beta^{-1}).$$
Then,
$$p(t \mid w) = \prod_{i=1}^{N} p(t_i \mid x_i, w) = \prod_{i=1}^{N} \mathcal{N}(t_i \mid w^T \tilde{\phi}(x_i), \beta^{-1}) = \mathcal{N}(t \mid \tilde{\Phi} w, \beta^{-1} I),$$
where $\tilde{\Phi} = [\tilde{\phi}(x_1), \ldots, \tilde{\phi}(x_N)]^T$.
Bayesian Learning
Rong Jin
Outline
MAP learning vs. ML learning
Minimum description length principle
Bayes optimal classifier
Bagging
Maximum Likelihood Learning (ML)
Find the best model by maximizing the log-likelihood of the training data
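A small worked example of ML learning (assuming a Gaussian with known variance, a common textbook case): the log-likelihood of the data is maximized at the sample mean.

```python
import math

def gaussian_loglik(mu, data, sigma=1.0):
    """Log-likelihood of i.i.d. data under N(mu, sigma^2)."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2)
        - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in data
    )

# The ML estimate of the mean is the sample mean: no other value of mu
# achieves a higher log-likelihood on this data.
data = [1.0, 2.0, 3.0, 6.0]
mle = sum(data) / len(data)  # 3.0
print(gaussian_loglik(mle, data) >= gaussian_loglik(2.0, data))  # True
```

MAP learning modifies this objective by adding the log of a prior over the parameters, which is what distinguishes the two approaches in the outline above.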
Homework 9
Due: April 23, 2015 (before class)
April 16, 2015
Problem 1: Hidden Markov Model (20pt)
We denote by $\lambda = \langle N, M, \pi, a, b \rangle$ the Hidden Markov Model, where
$N$: the number of states
$M$: the number of possible observations (or tokens)
$\pi = (\pi_1, \pi_2, \ldots$
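Given these parameters, the probability of an observation sequence is computed by the forward algorithm; a minimal sketch (with pi as a list, and a and b as nested lists, which is an assumed encoding, not the homework's notation):

```python
def forward(pi, a, b, obs):
    """Forward algorithm: P(observation sequence | HMM).
    pi[i]: initial probability of state i; a[i][j]: transition i -> j;
    b[i][k]: probability of emitting token k while in state i."""
    n = len(pi)
    # Initialize with the first observation.
    alpha = [pi[i] * b[i][obs[0]] for i in range(n)]
    # Propagate: sum over predecessor states, then emit.
    for o in obs[1:]:
        alpha = [sum(alpha[i] * a[i][j] for i in range(n)) * b[j][o]
                 for j in range(n)]
    return sum(alpha)
```

The recursion costs O(T N^2) rather than the O(N^T) of enumerating all state paths, which is the whole point of the algorithm.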
Homework 5
Problem 1. The classification accuracy over the test documents that I got is
0.8068 when the smoothing parameter is 0.1. Here is the code:
function [accuracy] = hw5_1()
train = dlmread('data/train.data');
trainLabel = dlmread('data/train.label');
test = dlmread('data/test.data');
Homework 4
Due: 02/17/2015 (before class)
February 9, 2015
Problem 1 (20pt): Bayesian model selection
In this homework, you are asked to compute the result of Bayesian model selection for a linear regression model.
Let M be a family of linear regression models ...
function hw10()
Data = dlmread('mixture_data.txt');
C = zeros(6,1);
for ia = 0:5
    C(ia+1) = sum(Data == ia);
end
% Part A
c = sum(C(1:2));
d = sum(C(3:6));
p = 3*d/(2*(c+d));
disp('Part A. independent, no bias');
disp(' The probability to roll the die is:');
disp(p);
Information Filtering
Rong Jin
Outline
Brief introduction to information filtering
Collaborative filtering
Adaptive filtering
Short vs. Long Term Info. Need
Short-term information need (Ad hoc retrieval)
Temporary need, e.g., info about used cars
Info
Semi-supervised Learning
Rong Jin
Spectrum of Learning Problems
What is Semi-supervised Learning
Learning from a mixture of labeled and unlabeled examples
Labeled Data
Unlabeled Data
$L = \{(x^l_1, y_1), \ldots, (x^l_{n_l}, y_{n_l})\}$
Total number of examples: $N = n_l + n_u$
Homework 8
Due: April 9, 2015 (before class)
March 31, 2015
Problem 1 (15pt) Hedge Algorithm
In class, we discussed the Hedge algorithm, which learns positive weights to combine the predictions from multiple
experts/classifiers. In this problem, you are asked to ...
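A minimal sketch of the multiplicative update behind Hedge (the discount factor beta in (0, 1) is a generic choice here; the homework's exact parameterization may differ):

```python
def hedge(losses, beta=0.5):
    """Hedge algorithm: maintain a positive weight per expert; after each
    round, multiply expert i's weight by beta**loss_i and renormalize.
    losses: list of rounds, each a list of per-expert losses in [0, 1]."""
    n = len(losses[0])
    w = [1.0 / n] * n                 # start with uniform weights
    for round_losses in losses:
        w = [wi * beta ** li for wi, li in zip(w, round_losses)]
        z = sum(w)
        w = [wi / z for wi in w]      # renormalize to a distribution
    return w

# Expert 0 never errs, expert 1 always errs: weight shifts to expert 0.
print(hedge([[0, 1]] * 5))
```

Because weights decay exponentially in cumulative loss, the combined prediction quickly tracks the best expert in hindsight.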
Homework 5
Due: 02/24/2015 (before class)
February 15, 2015
Problem 1 (20pt) Naive Bayes Classifier
In this homework, you are asked to implement a Naive Bayes classifier for text categorization. In particular, given a
document x = (x1 , x2 , . . . , xm ),
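The setup above can be sketched as a multinomial Naive Bayes with add-alpha smoothing (integer word ids stand in for the homework's document format, and the parameter names here are assumptions for illustration):

```python
import math
from collections import Counter

def nb_train(docs, labels, alpha=0.1):
    """Multinomial Naive Bayes with add-alpha smoothing.
    docs: lists of word ids; labels: class ids."""
    classes = sorted(set(labels))
    vocab = {w for d in docs for w in d}
    prior, cond = {}, {}
    for c in classes:
        cdocs = [d for d, l in zip(docs, labels) if l == c]
        prior[c] = math.log(len(cdocs) / len(docs))
        counts = Counter(w for d in cdocs for w in d)
        total = sum(counts.values())
        # Smoothed log P(word | class); alpha avoids zero probabilities.
        cond[c] = {w: math.log((counts[w] + alpha) / (total + alpha * len(vocab)))
                   for w in vocab}
    return prior, cond

def nb_predict(model, doc):
    """Pick the class maximizing log P(c) + sum_j log P(x_j | c)."""
    prior, cond = model
    return max(prior, key=lambda c: prior[c] + sum(cond[c][w] for w in doc
                                                   if w in cond[c]))
```

Working in log space turns the product of per-word probabilities into a sum and avoids numerical underflow on long documents.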