lecture_03

# Local linear approximation algorithm


…and $X_2^{**} = (I - P_1)X_2^*$.

3. Apply the LARS algorithm to solve
   $$\hat\beta_2^* = \arg\max_{\beta_2^*}\; -\frac{1}{2N}\bigl\|y^* - X_2^{**}\beta_2^*\bigr\|_2^2 - \lambda\bigl\|\beta_2^*\bigr\|_1.$$
4. Compute $\hat\beta^\circ = (X_1^\top X_1)^{-1} X_1^\top\,(y - X_2^*\hat\beta_2^*)$.

We use $I_1$ to index the components of $\hat\beta^\circ$, and $I_2$ to index the components of $\hat\beta^*$. The final estimate of (13) is given by

$$\hat\beta_j^{(1)} =
\begin{cases}
\hat\beta_j^\circ & \text{when } j \in I_1;\\[4pt]
\hat\beta_j^* \cdot \dfrac{\lambda}{p_\lambda'(|\hat\beta_j^{(0)}|)} & \text{when } j \in I_2.
\end{cases}$$

[Figure: two panels of solution paths, with the coefficients on the vertical axis plotted against $\lambda \in [0, 4]$.]

## Penalized Likelihood

If we assume the $\epsilon_i$ in (2) are i.i.d. $N(0, \sigma^2)$, then $-(2\sigma^2)^{-1}\sum_{i=1}^N (y_i - x_i^\top\beta)^2$ is, up to an additive constant, the logarithm of the conditional likelihood of $y$ given $X$, and hence penalized least squares can also be viewed as penalized likelihood. In general, the penalized likelihood function takes the form

$$Q(\beta) = \frac{1}{N}\sum_{i=1}^N \ell_i(\beta) - \sum_{j=1}^p p_\lambda(|\beta_j|),$$

where $\ell_i(\beta) := \ell_i(x_i^\top\beta, y_i, \phi)$ is the log likelihood of the $i$-th training point $(x_i, y_i)$, with $\phi$ being some dispersion parameter.

Let $\ell(\beta) = \sum_{i=1}^N \ell_i(\beta)$. For a given initial value $\hat\beta^{(0)}$ (e.g. the MLE), the log likelihood can be locally approximated by a quadratic function:

$$\ell(\beta) \approx \ell(\hat\beta^{(0)}) + \ell'(\hat\beta^{(0)})^\top(\beta - \hat\beta^{(0)}) + \tfrac{1}{2}(\beta - \hat\beta^{(0)})^\top\,\ell''(\hat\beta^{(0)})\,(\beta - \hat\beta^{(0)}).$$

## Local Linear Approximation

At the MLE $\hat\beta^{(0)}$, the gradient $\ell'(\hat\beta^{(0)}) = 0$, and hence the LLA estimate is

$$\hat\beta^{(1)} = \arg\max_{\beta \in \mathbb{R}^p}\; \left\{\frac{1}{2}(\beta - \hat\beta^{(0)})^\top\,\ell''(\hat\beta^{(0)})\,(\beta - \hat\beta^{(0)}) - \sum_{j=1}^p p_\lambda'(|\hat\beta_j^{(0)}|)\,|\beta_j|\right\}.$$

Write $\mu_i = x_i^\top\beta$ and $\ell_i = \ell_i(\mu_i, y_i)$; then the Hessian matrix can be written as $\ell''(\hat\beta^{(0)}) = X^\top D X$, where $D$ is an $N \times N$ diagonal matrix with

$$D_{ii} = \left.\frac{\partial^2 \ell_i(\mu_i)}{\partial \mu_i^2}\right|_{\mu_i = \hat\mu_i^{(0)}}, \qquad \hat\mu_i^{(0)} = x_i^\top \hat\beta^{(0)}.$$

The LLA estimate can also be obtained using the LARS algorithm.
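A minimal sketch of one LLA step for penalized least squares, tying the pieces above together. The following assumptions are mine, not the lecture's: the penalty is taken to be SCAD with the usual shape parameter $a = 3.7$; the projected response is taken to be $y^* = (I - P_1)y$ (its definition falls outside this preview); and `sklearn.linear_model.LassoLars` stands in for "the LARS algorithm" (its objective $\frac{1}{2N}\|y - Xb\|_2^2 + \lambda\|b\|_1$ matches the display in step 3). All function names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LassoLars

def scad_deriv(t, lam, a=3.7):
    """Derivative p'_lam(|t|) of the SCAD penalty (a = 3.7 is the usual choice)."""
    t = np.abs(t)
    return lam * np.where(t <= lam, 1.0,
                          np.maximum(a * lam - t, 0.0) / ((a - 1.0) * lam))

def lla_step(X, y, beta0, lam, tol=1e-12):
    """One LLA step for penalized least squares via a rescaled lasso.

    I1: components with p'_lam(|beta0_j|) ~ 0 (the unpenalized block X_1).
    I2: penalized components, rescaled so a plain lasso at level lam
        solves the weighted-l1 problem.
    """
    p = X.shape[1]
    w = scad_deriv(beta0, lam)              # p'_lam(|beta_j^(0)|)
    I1, I2 = w <= tol, w > tol
    beta1 = np.zeros(p)

    # Rescale penalized columns: x*_j = x_j * lam / p'_lam(|beta_j^(0)|).
    Xstar2 = X[:, I2] * (lam / w[I2])

    if I1.any():
        # Project out the unpenalized block: P1 = X1 (X1'X1)^{-1} X1'.
        X1 = X[:, I1]
        P1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)
        ystar = y - P1 @ y                  # y* = (I - P1) y   (assumed form)
        Xstar2 = Xstar2 - P1 @ Xstar2       # X2** = (I - P1) X2*
    else:
        ystar = y

    if I2.any():
        # LARS step: min (1/2N)||y* - X2** b||^2 + lam ||b||_1.
        b = LassoLars(alpha=lam, fit_intercept=False).fit(Xstar2, ystar).coef_
        beta1[I2] = b * (lam / w[I2])       # undo the column rescaling

    if I1.any():
        # beta^o = (X1'X1)^{-1} X1'(y - X2* beta2*); note X2* beta2* = X2 beta2.
        beta1[I1] = np.linalg.solve(X1.T @ X1,
                                    X1.T @ (y - X[:, I2] @ beta1[I2]))
    return beta1
```

Feeding `beta1` back in as `beta0` iterates the LLA; starting from the MLE, a single step already yields the one-step estimate $\hat\beta^{(1)}$ above.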
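In the general likelihood case, building the LLA quadratic only requires the diagonal matrix $D$ of second derivatives $\partial^2\ell_i/\partial\mu_i^2$. As one concrete instance (logistic regression is my choice of example; the lecture keeps $\ell_i$ generic):

```python
import numpy as np

def logistic_hessian(X, beta0):
    """Hessian l''(beta0) = X' D X for the logistic log likelihood
    l_i(mu_i, y_i) = y_i * mu_i - log(1 + exp(mu_i)), for which
    d^2 l_i / d mu_i^2 = -pi_i (1 - pi_i) with pi_i = 1/(1 + exp(-mu_i))."""
    mu = X @ beta0                     # mu_i^(0) = x_i' beta^(0)
    pi = 1.0 / (1.0 + np.exp(-mu))     # fitted probabilities at beta^(0)
    d = -pi * (1.0 - pi)               # D_ii (negative: the likelihood is concave)
    return X.T @ (d[:, None] * X)      # X' D X without materializing the N x N matrix
```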