lecture_04



…algorithm:

1. For $j \in I_2$, let $x_j^* = \dfrac{\lambda}{p_\lambda'(|\hat\beta_j^{(0)}|)}\, x_j$, let $y^* = (I - P_1)y$, let $X_2^*$ be the matrix with columns $\{x_j^* : j \in I_2\}$, and let $X_2^{**} = (I - P_1)X_2^*$.

2. Apply the LARS algorithm to solve
$$\hat\beta^* = \arg\max_{\beta^*} \; -\frac{1}{2N}\,\bigl\| y^* - X_2^{**}\beta^* \bigr\|_2^2 - \lambda \|\beta^*\|_1 .$$

3. Compute $\hat\beta^\circ = (X_1^\top X_1)^{-1} X_1^\top (y - X_2^* \hat\beta^*)$.

4. We use $I_1$ to index the components of $\hat\beta^\circ$, and $I_2$ to index the components of $\hat\beta^*$. The final estimate of (14) is given by
$$\hat\beta_j^{(1)} =
\begin{cases}
\hat\beta_j^\circ & \text{when } j \in I_1;\\[4pt]
\hat\beta_j^* \cdot \dfrac{\lambda}{p_\lambda'(|\hat\beta_j^{(0)}|)} & \text{when } j \in I_2.
\end{cases}$$

(A numerical sketch of these steps is given at the end of this section.)

[Figure: two panels of coefficient paths, Coefficients versus $\lambda \in [0, 4]$, for the SCAD LLA and lasso estimates.]

Penalized Likelihood

If we assume the $\epsilon_i$ in (2) are i.i.d. $N(0, \sigma^2)$, then $(2\sigma^2)^{-1} \sum_{i=1}^N (y_i - x_i^\top \beta)^2$ is, up to an additive constant, the negative logarithm of the conditional likelihood of $y$ given $X$, and hence penalized least squares can also be viewed as penalized likelihood (see the Gaussian sketch at the end of this section).

In general, the penalized likelihood function takes the form
$$Q(\beta) = \frac{1}{N}\sum_{i=1}^N \ell_i(\beta) - \sum_{j=1}^p p_\lambda(|\beta_j|),$$
where $\ell_i(\beta) := \ell_i(x_i^\top \beta, y_i, \phi)$ is the log likelihood of the $i$-th training point $(x_i, y_i)$, with $\phi$ being some dispersion parameter.

Let $\ell(\beta) = \sum_{i=1}^N \ell_i(\beta)$. For a given initial value $\hat\beta^{(0)}$ (e.g. the MLE), the log likelihood function can be locally approximated by a quadratic function:
$$\ell(\beta) \approx \ell(\hat\beta^{(0)}) + \ell'(\hat\beta^{(0)})^\top (\beta - \hat\beta^{(0)}) + \frac{1}{2}(\beta - \hat\beta^{(0)})^\top \ell''(\hat\beta^{(0)})(\beta - \hat\beta^{(0)}).$$

Local Linear Approximation

At the MLE $\hat\beta^{(0)}$, the gradient $\ell'(\hat\beta^{(0)}) = 0$, and hence the LLA estimate is
$$\hat\beta^{(1)} = \arg\max_{\beta \in \mathbb{R}^p} \left\{ \frac{1}{2}(\beta - \hat\beta^{(0)})^\top \ell''(\hat\beta^{(0)})(\beta - \hat\beta^{(0)}) - \sum_{j=1}^p p_\lambda'(|\hat\beta_j^{(0)}|)\,|\beta_j| \right\}.$$

Write $\mu_i = x_i^\top \beta$ and $\ell_i = \ell_i(\mu_i, y_i)$; then the Hessian matrix can be written as
$$\ell''(\hat\beta^{(0)}) = X^\top D X,$$
where $D$ is an $N \times N$ diagonal matrix with
$$D_{ii} = \left.\frac{\partial^2 \ell_i(\mu_i)}{\partial \mu_i^2}\right|_{\mu_i = \hat\mu_i^{(0)}}, \qquad \hat\mu_i^{(0)} = x_i^\top \hat\beta^{(0)}.$$
(A concrete logistic-regression example appears at the end of this section.)

The LLA estimate can also be obtained using the LARS algorithm.

Outline

1. Penalized Least Squares
2. Principal Component Analysis

Supervised Learning

Predictions are based on the training sample $(x_1, y_1), \ldots, (x_N, y_N)$.

The student presents an answer $\hat y_i$ for each $x_i$ in the training sample.

The supervisor or "teacher" provides either the correct answer and/or an error associated with the student's answer, usually given by some loss function $L(y, \hat y)$.

If one supposes that $(X, Y)$ are random variables represented by some joint probability density $\Pr(X, Y)$, then supervised learning can be formally characterized as a density estimation problem where one is concerned with determining proper…
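Numerical sketch of Steps 1–4. The Python sketch below mimics the column-rescaling trick with an off-the-shelf LARS-based lasso solver. It is a simplified illustration, not the estimator of (14) itself: it assumes every coefficient is penalized (so $I_1$ is empty and the projection $I - P_1$ and the $\hat\beta^\circ$ block drop out), and `penalty_deriv` is a hypothetical placeholder for $p_\lambda'(\cdot)$, whose exact form (e.g. SCAD) is not reproduced in this excerpt.

```python
import numpy as np
from sklearn.linear_model import LassoLars

def one_step_lla_sketch(X, y, beta0, lam, penalty_deriv):
    """Simplified sketch of Steps 1-4 via LARS.

    Assumes every coefficient is penalized (I_1 empty), so the projection
    (I - P_1) and the beta-circ block are omitted.  `penalty_deriv(b, lam)`
    is a placeholder for p'_lambda(|b|); coefficients whose derivative is
    zero would belong to I_1 and are handled separately in the full algorithm.
    """
    # Step 1: rescale each column, x*_j = lambda / p'_lambda(|beta_j^(0)|) * x_j.
    w = lam / penalty_deriv(np.abs(beta0), lam)   # weights, shape (p,)
    X_star = X * w                                # broadcasts over columns
    # Step 2: LassoLars minimizes (1/(2N)) * ||y - X* b||^2 + lambda * ||b||_1.
    lars = LassoLars(alpha=lam, fit_intercept=False)
    beta_star = lars.fit(X_star, y).coef_
    # Step 4: scale back, beta_j^(1) = beta*_j * lambda / p'_lambda(|beta_j^(0)|).
    return w * beta_star
```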
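Gaussian sketch. As a check on the claim that penalized least squares is the Gaussian special case of penalized likelihood, the sketch below evaluates $Q(\beta)$ with $\ell_i$ taken as the $N(0, \sigma^2)$ log likelihood; `penalty` is a hypothetical placeholder for $p_\lambda(\cdot)$, and $\sigma^2$ is treated as known.

```python
import numpy as np

def penalized_gaussian_loglik(beta, X, y, lam, penalty, sigma2=1.0):
    """Q(beta) = (1/N) * sum_i l_i(beta) - sum_j p_lambda(|beta_j|)
    for i.i.d. N(0, sigma2) errors; `penalty(b, lam)` stands in for
    p_lambda(|b|).  Maximizing Q is then, up to constants, the same as
    minimizing (2*N*sigma2)^(-1) * ||y - X beta||^2 plus the penalty term.
    """
    resid = y - X @ beta
    # Gaussian log likelihood of each training point, constant included.
    loglik_i = -0.5 * np.log(2 * np.pi * sigma2) - resid**2 / (2.0 * sigma2)
    return loglik_i.mean() - penalty(np.abs(beta), lam).sum()
```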
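Logistic-regression example of the Hessian. The excerpt leaves $\ell_i(\mu_i, y_i)$ generic, so the sketch below picks one concrete case, the Bernoulli (logistic) likelihood, purely for illustration, and builds $\ell''(\hat\beta^{(0)}) = X^\top D X$ with $D_{ii} = \partial^2 \ell_i / \partial \mu_i^2$ evaluated at $\hat\mu_i^{(0)} = x_i^\top \hat\beta^{(0)}$.

```python
import numpy as np

def hessian_as_xtdx_logistic(X, beta0):
    """Hessian of the Bernoulli log likelihood at beta0, in the form X' D X.

    For l_i(mu_i, y_i) = y_i * mu_i - log(1 + exp(mu_i)),
    d^2 l_i / d mu_i^2 = -p_i * (1 - p_i) with p_i = sigmoid(mu_i).
    The Bernoulli choice is an illustrative assumption; the lecture keeps
    l_i generic.
    """
    mu0 = X @ beta0                      # mu_i^(0) = x_i' beta^(0)
    p = 1.0 / (1.0 + np.exp(-mu0))       # fitted probabilities at beta^(0)
    d = -p * (1.0 - p)                   # diagonal entries D_ii
    return X.T @ (d[:, None] * X)        # l''(beta^(0)) = X' D X
```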

