assume that the noise is distributed normally with zero mean and some known variance $\sigma^2$: $E \sim N(0, \sigma^2)$. This, together with our assumption of independence, allows us to express the likelihood function for the observations $y = \{y_1, \ldots, y_m\}$:
\[
p(y \mid x, \theta) = \prod_{i=1}^{m} p(y_i \mid x_i, \theta)
= \prod_{i=1}^{m} N(y_i;\, \theta^T x_i,\, \sigma^2)
= \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{1}{2\sigma^2}\,(y_i - \theta^T x_i)^2\right),
\]
where the first equality follows from independence. As usual, it is more convenient to work with the log-likelihood:
\[
R(\theta) = \log \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{1}{2\sigma^2}\,(y_i - \theta^T x_i)^2\right)
= -\frac{m}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{m}(y_i - \theta^T x_i)^2.
\]
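As a quick numerical sanity check of the closed form above, one can compare it against the sum of per-observation Gaussian log-densities on simulated data (a minimal numpy sketch; the variable names and simulated data are illustrative, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, sigma = 8, 3, 0.5
X = rng.normal(size=(m, d))                        # rows are the inputs x_i
theta = rng.normal(size=d)
y = X @ theta + rng.normal(scale=sigma, size=m)    # y_i = theta^T x_i + noise

# Sum of log N(y_i; theta^T x_i, sigma^2), computed density by density
log_dens = -np.log(np.sqrt(2 * np.pi * sigma**2)) \
           - (y - X @ theta) ** 2 / (2 * sigma**2)
R_sum = np.sum(log_dens)

# Closed form: -(m/2) log(2*pi*sigma^2) - (1/(2*sigma^2)) * sum of squared residuals
R_closed = -0.5 * m * np.log(2 * np.pi * sigma**2) \
           - np.sum((y - X @ theta) ** 2) / (2 * sigma**2)

print(np.isclose(R_sum, R_closed))  # True: the two expressions agree
```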
The ML estimate is
\begin{align}
\hat{\theta}_{\mathrm{ML}} \in \arg\max_{\theta} R(\theta)
&= \arg\max_{\theta}\; -\frac{1}{2\sigma^2}\sum_{i=1}^{m}(y_i - \theta^T x_i)^2 \tag{10}\\
&= \arg\min_{\theta}\; \sum_{i=1}^{m}(y_i - \theta^T x_i)^2. \tag{11}
\end{align}
The ML estimate is equivalent to ordinary least squares!
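This equivalence can be checked numerically: the ordinary least squares solution should also maximize the Gaussian log-likelihood $R(\theta)$. A minimal numpy sketch on simulated data (all names and data are illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
m, d, sigma = 50, 3, 0.3
X = rng.normal(size=(m, d))
theta_true = rng.normal(size=d)
y = X @ theta_true + rng.normal(scale=sigma, size=m)

def loglik(theta):
    """R(theta): the Gaussian log-likelihood of the observations."""
    r = y - X @ theta
    return -0.5 * m * np.log(2 * np.pi * sigma**2) - r @ r / (2 * sigma**2)

# Ordinary least squares: minimizes the sum of squared residuals
theta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# The OLS solution also maximizes R: random perturbations only lower it
perturbed = [loglik(theta_ols + 0.1 * rng.normal(size=d)) for _ in range(100)]
print(loglik(theta_ols) >= max(perturbed))  # True
```

Note that the value of $\sigma^2$ scales $R(\theta)$ but does not change its maximizer, which is why it drops out between (10) and (11).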
Now let us try the MAP estimate. We saw in our coin toss example that the
MAP estimate acts as a regularized ML estimate. For probabilistic regression,
we will use a multiva...
Spring '12, Cynthia Rudin