...multivariate normal prior (we will see later why this is a good idea) with mean 0. The covariance matrix will be diagonal: it is I (the identity matrix) times σ²/C, which is the known, constant variance of each component of θ. So the prior is:
\[
\theta \sim N(0, I\sigma^2/C), \qquad
p(\theta) = \frac{1}{(2\pi)^{d/2}\,|I\sigma^2/C|^{1/2}}
\exp\!\left(-\frac{1}{2}\,\theta^T (I\sigma^2/C)^{-1}\theta\right).
\]
Here we are saying that our prior belief is that the “slope” of the regression line is near 0. Now,
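As a quick numerical check (not part of the notes; the dimension, σ², C, and θ values below are made up for illustration), the log of this prior density can be evaluated directly from the formula:

```python
import numpy as np

# Illustrative values, assumed for this sketch only.
d = 3                     # dimension of theta
sigma2, C = 1.0, 2.0      # known noise variance and regularization constant
theta = np.array([0.5, -0.2, 0.1])

# log p(theta) for theta ~ N(0, I sigma^2 / C):
#   -d/2 log(2 pi) - 1/2 log|I sigma^2/C| - 1/2 theta^T (I C/sigma^2) theta
log_p = (-0.5 * d * np.log(2 * np.pi)
         - 0.5 * d * np.log(sigma2 / C)
         - 0.5 * (C / sigma2) * theta @ theta)
```

Note how the quadratic term −(C/2σ²)θᵀθ already has the shape of a ridge penalty; this is the term that survives in the MAP derivation below.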
\[
\hat{\theta}_{\mathrm{MAP}} \in \arg\max_\theta \; \log p(y \mid x, \theta) + \log p(\theta)
= \arg\max_\theta \; -\frac{1}{2\sigma^2}\sum_{i=1}^{m}(y_i - \theta^T x_i)^2 + \log p(\theta)
\]
from (10). At this point you can really see how MAP estimation is a regularized form of maximum likelihood. Continuing,
\[
= \arg\max_\theta \; -\frac{1}{2\sigma^2}\sum_{i=1}^{m}(y_i - \theta^T x_i)^2
- \frac{d}{2}\log 2\pi
- \frac{1}{2}\log\left|\frac{\sigma^2}{C} I\right|
- \frac{1}{2}\,\theta^T (IC/\sigma^2)\,\theta
\]
\[
= \arg\max_\theta \; -\frac{1}{2\sigma^2}\sum_{i=1}^{m}(y_i - \theta^T x_i)^2
- \frac{C}{2\sigma^2}\,\theta^T\theta
\]
\[
= \arg\min_\theta \; \sum_{i=1}^{m}(y_i - \theta^T x_i)^2 + C\,\|\theta\|_2^2. \tag{12}
\]
We see that the MAP estimate corresponds exactly to ℓ2-regularized linear regression (ridge regression), and that the ℓ2 reg...
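To make the correspondence concrete, here is a minimal numerical sketch (the data, dimensions, and the value of C are made up for illustration). The minimizer of (12) has the well-known ridge closed form θ = (XᵀX + C I)⁻¹ Xᵀy, and we can verify it is a stationary point of the objective:

```python
import numpy as np

# Synthetic data, assumed for this sketch only.
rng = np.random.default_rng(0)
m, d, C = 50, 3, 2.0
X = rng.normal(size=(m, d))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.1 * rng.normal(size=m)

# Closed-form minimizer of sum_i (y_i - theta^T x_i)^2 + C ||theta||_2^2,
# i.e. the ridge-regression / MAP estimate.
theta_map = np.linalg.solve(X.T @ X + C * np.eye(d), X.T @ y)

# The gradient of the objective, 2 X^T (X theta - y) + 2 C theta,
# should vanish at theta_map.
grad = 2 * X.T @ (X @ theta_map - y) + 2 * C * theta_map
```

The shrinkage effect of the prior shows up directly in the `C * np.eye(d)` term added to XᵀX before solving.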
This note was uploaded on 03/24/2014 for the course MIT 15.097 taught by Professor Cynthia Rudin during the Spring '12 term at MIT.