CS195f Homework 3
Mark Johnson and Erik Sudderth
Homework due at 2pm, 5th November 2009
This problem set asks you to investigate exponential or “Maximum Entropy” classifiers.
These involve probability distributions of the form:
    P(y | x) = (1 / Z_x(w)) exp(w · f(y, x)),   where   Z_x(w) = ∑_{y′ ∈ Y} exp(w · f(y′, x)),
where:
• y ∈ Y is the class label we want to predict,
• x ∈ X are the conditioning or predictive variables,
• f(y, x) ∈ ℝ^m is an m-dimensional feature vector for the pair (y, x), and
• w ∈ ℝ^m is an m-dimensional weight vector, where w_j is the weight corresponding to feature f_j(y, x).
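The conditional distribution above is just a softmax over the per-label scores w · f(y, x). As a concrete illustration, here is a minimal Python sketch; the toy feature matrix and weights are made-up assumptions, not part of the assignment's data:

```python
import numpy as np

def maxent_probs(w, feats):
    """feats: (|Y|, m) array whose rows are f(y, x) for each label y.
    Returns P(y | x) for every y in Y."""
    scores = feats @ w              # w . f(y, x) for each label y
    scores -= scores.max()          # subtract max for numerical stability
    unnorm = np.exp(scores)
    return unnorm / unnorm.sum()    # dividing by the sum implements 1 / Z_x(w)

# toy example: 3 labels, m = 2 features (illustrative values only)
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w = np.array([0.5, -0.2])
p = maxent_probs(w, feats)          # a valid distribution over the 3 labels
```

Subtracting the maximum score before exponentiating leaves the probabilities unchanged (it cancels in the normalizer) but avoids overflow for large weights.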
Learning MaxEnt classifiers involves finding the weight vector w given training data D and the vector of feature functions f.
We’ll use a uniform Gaussian prior on the feature weights w, i.e.:

    P(w) ∝ exp(−α w · w),
where α is a user-settable parameter that controls the degree of regularization.
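Taking logs shows how this prior turns into a quadratic penalty on the weights; the following short note sketches the connection (the constant absorbs the prior's normalizer):

```latex
% Log of the Gaussian prior above:
\log P(w) = -\alpha\, (w \cdot w) + \text{const}
% MAP estimation maximizes P(w \mid D) \propto P(D \mid w)\, P(w),
% so negating and dropping the constant adds the penalty term
% \alpha\, (w \cdot w) to the negative log conditional likelihood.
```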
Question 1:
1. Give an expression for the regularized negative log conditional likelihood of a generic data set D = ((x_1, y_1), . . . , (x_n, y_n)), ignoring any terms and factors that do not depend on w.
2. Give an expression for the derivative of the regularized negative log likelihood with respect to a feature weight w_j.
Now we will construct an estimator for the feature weights w. We will use the Nursery data set that was used in previous exercises, which you can find in
/course/cs195f/asgn/naive_bayes/handout/nursery/nursery.mat.