CS229 Practice Midterm Solutions
CS 229, Autumn 2009
Practice Midterm Solutions
Notes:
1. The midterm will have about 5-6 long questions, and about 8-10 short questions. Space
will be provided on the actual midterm for you to write your answers.
2. The midterm is meant to be educational, and as such some questions could be quite
challenging. Use your time wisely to answer as much as you can!
3. For additional practice, please see CS 229 extra problem sets available at
http://see.stanford.edu/see/materials/aimlcs229/assignments.aspx
1. [13 points] Generalized Linear Models
Recall that generalized linear models assume that the response variable $y$ (conditioned on $x$) is distributed according to a member of the exponential family:
$$p(y; \eta) = b(y) \exp\left(\eta T(y) - a(\eta)\right),$$
where $\eta = \theta^T x$. For this problem, we will assume $\eta \in \mathbb{R}$.
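As a concrete illustration of the exponential-family form above, the following minimal sketch evaluates $b(y)\exp(\eta T(y) - a(\eta))$ for a Bernoulli response. The Bernoulli choice ($b(y) = 1$, $T(y) = y$, $a(\eta) = \log(1 + e^{\eta})$) is an assumption made for illustration and is not part of this problem; the function names are hypothetical.

```python
import math

def exp_family_density(y, eta, b, T, a):
    """Evaluate p(y; eta) = b(y) * exp(eta * T(y) - a(eta))."""
    return b(y) * math.exp(eta * T(y) - a(eta))

# Bernoulli pieces (illustrative assumption, not taken from the problem).
def bernoulli_b(y):
    return 1.0

def bernoulli_T(y):
    return float(y)

def bernoulli_a(eta):
    return math.log(1.0 + math.exp(eta))

def sigmoid(eta):
    return 1.0 / (1.0 + math.exp(-eta))

eta = 0.7
phi = sigmoid(eta)  # Bernoulli mean implied by the natural parameter eta
p1 = exp_family_density(1, eta, bernoulli_b, bernoulli_T, bernoulli_a)
p0 = exp_family_density(0, eta, bernoulli_b, bernoulli_T, bernoulli_a)
print(p1, phi)        # the two values should agree
print(p0, 1.0 - phi)  # likewise
```

With these choices the density reduces to $p(1; \eta) = e^{\eta}/(1 + e^{\eta})$, which is the logistic function of $\eta$.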
(a) [10 points] Given a training set $\{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$, the log-likelihood is given by
$$\ell(\theta) = \sum_{i=1}^{m} \log p(y^{(i)} \mid x^{(i)}; \theta).$$
Give a set of conditions on $b(y)$, $T(y)$, and $a(\eta)$ which ensure that the log-likelihood is a concave function of $\theta$ (and thus has a unique maximum). Your conditions must be reasonable, and should be as weak as possible. (E.g., the answer "any $b(y)$, $T(y)$, and $a(\eta)$ so that $\ell(\theta)$ is concave" is not reasonable. Similarly, overly narrow conditions, including ones that apply only to specific GLMs, are also not reasonable.)
Answer:
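The log-likelihood is given by
$$\ell(\theta) = \sum_{k=1}^{m} \left[ \log b(y^{(k)}) + \eta^{(k)} T(y^{(k)}) - a(\eta^{(k)}) \right],$$
where $\eta^{(k)} = \theta^T x^{(k)}$. Find the Hessian by taking the partials with respect to $\theta_i$ and $\theta_j$:
$$\frac{\partial}{\partial \theta_i} \ell(\theta) = \sum_{k=1}^{m} \left[ T(y^{(k)}) x_i^{(k)} - \frac{\partial}{\partial \eta} a(\eta^{(k)}) \, x_i^{(k)} \right],$$
$$\frac{\partial^2}{\partial \theta_i \partial \theta_j} \ell(\theta) = \sum_{k=1}^{m} -\frac{\partial^2}{\partial \eta^2} a(\eta^{(k)}) \, x_i^{(k)} x_j^{(k)} = H_{i,j}.$$
In matrix form,
$$H = -\sum_{k=1}^{m} \frac{\partial^2}{\partial \eta^2} a(\eta^{(k)}) \, x^{(k)} x^{(k)T},$$
and for any vector $z$,
$$z^T H z = -\sum_{k=1}^{m} \frac{\partial^2}{\partial \eta^2} a(\eta^{(k)}) \, (z^T x^{(k)})^2.$$
If $\frac{\partial^2}{\partial \eta^2} a(\eta) \geq 0$ for all $\eta$, then $z^T H z \leq 0$ for every $z$, so $H$ is negative semidefinite and the log-likelihood is a concave function of $\theta$. No conditions on $b(y)$ or $T(y)$ are needed, since neither appears in the Hessian.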
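The sign argument above can be checked numerically. The sketch below builds $H = -\sum_k a''(\eta^{(k)}) x^{(k)} x^{(k)T}$ on random data and evaluates $z^T H z$ for random $z$. The choice $a(\eta) = e^{\eta}$ (so $a''(\eta) = e^{\eta} > 0$), the random data, and the function names are all assumptions made for illustration.

```python
import math
import random

def hessian(theta, xs, a_second):
    """Return H = -sum_k a''(theta^T x_k) * x_k x_k^T as a list of rows."""
    n = len(theta)
    H = [[0.0] * n for _ in range(n)]
    for x in xs:
        eta = sum(t * xi for t, xi in zip(theta, x))
        w = a_second(eta)  # curvature of the log-partition function at eta
        for i in range(n):
            for j in range(n):
                H[i][j] -= w * x[i] * x[j]
    return H

def quad_form(H, z):
    """Return z^T H z."""
    n = len(z)
    return sum(z[i] * H[i][j] * z[j] for i in range(n) for j in range(n))

random.seed(0)
n, m = 3, 20
xs = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
theta = [random.gauss(0, 1) for _ in range(n)]

# Assumed example: a(eta) = exp(eta), whose second derivative is exp(eta).
H = hessian(theta, xs, math.exp)
values = [quad_form(H, [random.gauss(0, 1) for _ in range(n)])
          for _ in range(100)]
max_value = max(values)
print(max_value)  # expected to be non-positive
```

A random check like this cannot prove negative semidefiniteness; it only fails to find a counterexample, which is consistent with the algebraic argument.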
(b) [3 points] When the response variable is distributed according to a Normal distribution (with unit variance), we have $b(y) = \frac{1}{\sqrt{2\pi}} e^{-y^2/2}$, $T(y) = y$, and $a(\eta) = \frac{\eta^2}{2}$. Verify that the condition(s) you gave in part (a) hold for this setting.
Answer:
$$\frac{\partial^2}{\partial \eta^2} a(\eta) = 1 \geq 0.$$
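The Normal case can also be checked numerically. This minimal sketch uses the $b(y)$, $T(y)$, and $a(\eta)$ given in part (b), confirms that the resulting density matches a unit-variance Normal with mean $\eta$, and estimates $a''(\eta)$ by a central finite difference. The helper names, test points, and step size are illustrative assumptions.

```python
import math

def b(y):
    return math.exp(-y * y / 2.0) / math.sqrt(2.0 * math.pi)

def T(y):
    return y

def a(eta):
    return eta * eta / 2.0

def glm_density(y, eta):
    """b(y) * exp(eta * T(y) - a(eta)) with the Normal-case pieces."""
    return b(y) * math.exp(eta * T(y) - a(eta))

def normal_pdf(y, mu):
    """Density of N(mu, 1) at y."""
    return math.exp(-(y - mu) ** 2 / 2.0) / math.sqrt(2.0 * math.pi)

def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

max_density_gap = max(
    abs(glm_density(y, eta) - normal_pdf(y, eta))
    for y in (-1.5, 0.0, 2.0)
    for eta in (-1.0, 0.3, 2.5)
)
curvatures = [second_derivative(a, eta) for eta in (-3.0, 0.0, 4.0)]
print(max_density_gap)  # should be at rounding-error level
print(curvatures)       # each entry should be close to 1
```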
2. [15 points] Bayesian linear regression
Consider Bayesian linear regression using a Gaussian prior on the parameters $\theta \in \mathbb{R}^{n+1}$. Thus, in our prior, $\theta \sim \mathcal{N}(\vec{0}, \tau^2 I_{n+1})$, where $\tau^2 \in \mathbb{R}$, and $I_{n+1}$ is the $(n+1)$-by-$(n+1)$ identity matrix. Also let the conditional distribution of $y^{(i)}$ given $x^{(i)}$ and $\theta$ be $\mathcal{N}(\theta^T x^{(i)}, \sigma^2)$, as in our usual linear least-squares model. Let a set of $m$ IID training examples be given (with $x^{(i)} \in \mathbb{R}^{n+1}$). Recall that the MAP estimate of the parameters $\theta$ is given by:
$$\theta_{\text{MAP}} = \arg\max_{\theta} \left( \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}, \theta) \right) p(\theta)$$
Find, in closed form, the MAP estimate of the parameters $\theta$. For this problem, you should treat $\tau^2$ and $\sigma^2$ as fixed, known constants. [Hint: Your solution should involve deriving something that looks a bit like the Normal equations.]
Answer:
$$\theta_{\text{MAP}} = \arg\max_{\theta} \left( \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}, \theta) \right)$$
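The following sketch is not taken from the answer text. It assumes the standard result that, with the Gaussian likelihood and the prior $p(\theta)$ from the problem statement, setting the gradient of the log posterior to zero gives the ridge-like normal equations $(X^T X + \frac{\sigma^2}{\tau^2} I)\,\theta = X^T \vec{y}$. The code solves these for a two-parameter model (intercept plus one feature) and checks that the log-posterior gradient vanishes there. The data, function names, and the restriction to two parameters are illustrative assumptions.

```python
def map_estimate_2d(xs, ys, sigma2, tau2):
    """Solve (X^T X + (sigma2/tau2) I) theta = X^T y for 2 parameters."""
    lam = sigma2 / tau2
    a11 = sum(x[0] * x[0] for x in xs) + lam
    a12 = sum(x[0] * x[1] for x in xs)
    a22 = sum(x[1] * x[1] for x in xs) + lam
    c1 = sum(x[0] * y for x, y in zip(xs, ys))
    c2 = sum(x[1] * y for x, y in zip(xs, ys))
    det = a11 * a22 - a12 * a12  # positive whenever lam > 0
    return [(a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det]

def log_posterior_gradient(theta, xs, ys, sigma2, tau2):
    """Gradient of sum_i log N(y_i; theta^T x_i, sigma2) + log N(theta; 0, tau2 I)."""
    grad = [-t / tau2 for t in theta]  # contribution of the Gaussian prior
    for x, y in zip(xs, ys):
        residual = y - sum(t * xi for t, xi in zip(theta, x))
        for j in range(len(theta)):
            grad[j] += residual * x[j] / sigma2  # likelihood contribution
    return grad

# Hypothetical data: first coordinate is the intercept term.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys = [0.1, 0.9, 2.1, 2.9]
sigma2, tau2 = 0.5, 4.0

theta_map = map_estimate_2d(xs, ys, sigma2, tau2)
grad = log_posterior_gradient(theta_map, xs, ys, sigma2, tau2)
theta_tight_prior = map_estimate_2d(xs, ys, sigma2, 1e-9)
print(theta_map)
print(grad)  # both entries should be near zero
```

Under this formula, shrinking $\tau^2$ pulls the estimate toward zero, while letting $\tau^2 \to \infty$ recovers the ordinary least-squares normal equations.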
This is the end of the preview.