Stat 315A Homework 2 Solutions

November 13, 2008

1 Problem 1

(a) The classification rule for LDA is to classify to class 2 if $\hat\delta_2(x) > \hat\delta_1(x)$. This gives
\[
x^T\hat\Sigma^{-1}\hat\mu_2 - \tfrac{1}{2}\hat\mu_2^T\hat\Sigma^{-1}\hat\mu_2 + \log\!\left(\tfrac{N_2}{N}\right) > x^T\hat\Sigma^{-1}\hat\mu_1 - \tfrac{1}{2}\hat\mu_1^T\hat\Sigma^{-1}\hat\mu_1 + \log\!\left(\tfrac{N_1}{N}\right),
\]
\[
x^T\hat\Sigma^{-1}(\hat\mu_2 - \hat\mu_1) > \tfrac{1}{2}\hat\mu_2^T\hat\Sigma^{-1}\hat\mu_2 - \tfrac{1}{2}\hat\mu_1^T\hat\Sigma^{-1}\hat\mu_1 - \log\!\left(\tfrac{N_2}{N}\right) + \log\!\left(\tfrac{N_1}{N}\right),
\]
where $\hat\mu_k = \sum_{g_i=k} x_i / N_k$ and $\hat\Sigma = \sum_{k=1}^{K}\sum_{g_i=k}(x_i-\hat\mu_k)(x_i-\hat\mu_k)^T/(N-K)$.

(b) Recall from linear regression that $\hat\beta = (X^TX)^{-1}X^TY$. Note here that both $X$ and $Y$ are centered, with the classes coded as $y_i = -N/N_1$ for class 1 and $y_i = N/N_2$ for class 2 (so that $\bar y = 0$). Then
\[
X^TY = \sum_{i=1}^{N}(x_i-\bar x)\,y_i = \sum_{g_i=1}(x_i-\bar x)\left(-\frac{N}{N_1}\right) + \sum_{g_i=2}(x_i-\bar x)\,\frac{N}{N_2} = N(\hat\mu_2-\hat\mu_1) + N\bar x - N\bar x = N(\hat\mu_2-\hat\mu_1).
\]
For $X^TX$, write $\hat\Sigma_B = (\hat\mu_2-\hat\mu_1)(\hat\mu_2-\hat\mu_1)^T$ and $\bar x = \frac{1}{N}(N_1\hat\mu_1 + N_2\hat\mu_2)$. Since $\hat\mu_1-\bar x = \frac{N_2}{N}(\hat\mu_1-\hat\mu_2)$ and $\hat\mu_2-\bar x = \frac{N_1}{N}(\hat\mu_2-\hat\mu_1)$, with $N_1\left(\frac{N_2}{N}\right)^2 + N_2\left(\frac{N_1}{N}\right)^2 = \frac{N_1N_2}{N}$, we get
\[
\hat\Sigma(N-2) + \frac{N_1N_2}{N}\hat\Sigma_B = \sum_{g_i=1}(x_i-\hat\mu_1)(x_i-\hat\mu_1)^T + \sum_{g_i=2}(x_i-\hat\mu_2)(x_i-\hat\mu_2)^T + \sum_{k=1}^{2} N_k(\hat\mu_k-\bar x)(\hat\mu_k-\bar x)^T
\]
\[
= \sum_{i=1}^{N}(x_i-\bar x)(x_i-\bar x)^T = X^TX,
\]
since $X$ is centered. The normal equations $X^TX\hat\beta = X^TY$ therefore become
\[
\left[\hat\Sigma(N-2) + \frac{N_1N_2}{N}\hat\Sigma_B\right]\hat\beta = N(\hat\mu_2-\hat\mu_1).
\]

(c) Notice that $(\hat\mu_2-\hat\mu_1)^T\hat\beta$ is a scalar, and we denote this by $\lambda$. Then from part (b),
\[
\hat\Sigma(N-2)\hat\beta = N(\hat\mu_2-\hat\mu_1) - \frac{N_1N_2}{N}\hat\Sigma_B\hat\beta = (\hat\mu_2-\hat\mu_1)\left(N - \frac{N_1N_2}{N}\lambda\right).
\]
Thus the regression coefficient is proportional to the LDA coefficient: $\hat\beta \propto \hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1)$.

(d) To show that this holds for any distinct coding of $Y$, we must show that $X^TY \propto (\hat\mu_2-\hat\mu_1)$. WLOG we can consider $Y$ and $X$ to be centered; any distinct centered coding then takes the value $-a/N_1$ on class 1 and $a/N_2$ on class 2 for some $a \neq 0$. Then
\[
X^TY = \sum_{i=1}^{N} x_i y_i = -\frac{a}{N_1}\sum_{g_i=1} x_i + \frac{a}{N_2}\sum_{g_i=2} x_i = a(\hat\mu_2-\hat\mu_1).
\]
Thus $\hat\beta \propto \hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1)$.

(e) From part (d), $\hat\beta = \lambda\hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1)$ for some $\lambda$, and $\hat\beta_0 = -\bar x^T\hat\beta$. Then
\[
\hat f(x) = \hat\beta_0 + \hat\beta^T x = \lambda\left[-\frac{1}{N}(N_1\hat\mu_1+N_2\hat\mu_2)^T\hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1) + (\hat\mu_2-\hat\mu_1)^T\hat\Sigma^{-1}x\right].
\]
Thus we classify to class 2 if $\hat f(x) > 0$, i.e.
\[
(\hat\mu_2-\hat\mu_1)^T\hat\Sigma^{-1}x > \frac{1}{N}(N_1\hat\mu_1+N_2\hat\mu_2)^T\hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1).
\]
But this is not the same as the LDA rule (below) unless $N_1 = N_2$:
\[
(\hat\mu_2-\hat\mu_1)^T\hat\Sigma^{-1}x > \frac{1}{2}\left[\hat\mu_2^T\hat\Sigma^{-1}\hat\mu_2 - \hat\mu_1^T\hat\Sigma^{-1}\hat\mu_1\right] + \log\!\left(\frac{N_1}{N_2}\right).
\]

2 Problem 2

(a) Writing $\beta_j = \beta_j^+ - \beta_j^-$ with $\beta_j^+, \beta_j^- \geq 0$, we have an optimization problem of the form
\[
\underset{\beta^+,\,\beta^-}{\text{minimize}}\ L(\beta) \quad \text{subject to} \quad \sum_j (\beta_j^+ + \beta_j^-) \leq t,\quad \beta_j^+ \geq 0,\ \beta_j^- \geq 0,\ \text{for } j = 1,\dots,p. \tag{1}
\]
Thus, we have $2p+1$ inequality constraints.
The Lagrangian for (1) is
\[
L(\beta) + \lambda\left(\sum_j(\beta_j^+ + \beta_j^-) - t\right) - \sum_j \lambda_j^+\beta_j^+ - \sum_j \lambda_j^-\beta_j^-,
\]
or equivalently, because the term $\lambda t$ does not depend on $\beta$,
\[
L(\beta) + \lambda\sum_j(\beta_j^+ + \beta_j^-) - \sum_j \lambda_j^+\beta_j^+ - \sum_j \lambda_j^-\beta_j^- \tag{2}
\]
where $\beta_j = \beta_j^+ - \beta_j^-$ ...
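The proportionality shown in Problem 1 (b)-(d) is an exact algebraic identity, so it can be checked numerically. The sketch below is not part of the original solutions; it assumes NumPy is available, and the sample sizes, class means, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data (sizes and mean shifts are arbitrary).
N1, N2, p = 40, 60, 3
X1 = rng.normal(size=(N1, p)) + np.array([1.0, 0.0, 0.0])
X2 = rng.normal(size=(N2, p)) + np.array([0.0, 1.0, 0.0])
X = np.vstack([X1, X2])
N = N1 + N2

# Class means and pooled within-class covariance (denominator N - K, K = 2).
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
Sigma = ((X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)) / (N - 2)

# LDA coefficient direction: Sigma^{-1} (mu2 - mu1).
lda_dir = np.linalg.solve(Sigma, mu2 - mu1)

# Least squares on the -N/N1, N/N2 coding (already mean zero), centered X.
y = np.concatenate([np.full(N1, -N / N1), np.full(N2, N / N2)])
Xc = X - X.mean(axis=0)
beta = np.linalg.lstsq(Xc, y, rcond=None)[0]

# The two vectors are parallel up to floating-point error.
cos = beta @ lda_dir / (np.linalg.norm(beta) * np.linalg.norm(lda_dir))
assert abs(cos) > 1 - 1e-10
```

Replacing `y` with any other distinct (centered) two-value coding, as in part (d), leaves the assertion true: only the length of `beta` changes, not its direction.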
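The positive/negative-part formulation in Problem 2 can be illustrated by minimizing the penalized form of the objective directly over $(\beta^+, \beta^-)$ with the $2p$ bound constraints, the $t$-constraint having been absorbed into the multiplier term. This sketch is not from the original solutions: it takes $L(\beta)$ to be squared error, picks an arbitrary multiplier value `lam`, synthetic data, and uses SciPy's L-BFGS-B for the bound-constrained minimization.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Toy regression data (dimensions, coefficients, and lam are arbitrary).
n, p = 50, 5
X = rng.normal(size=(n, p))
true_beta = np.array([2.0, -1.5, 0.0, 0.0, 0.0])
y = X @ true_beta + 0.1 * rng.normal(size=n)
lam = 5.0  # multiplier on sum(beta+ + beta-), i.e. on sum |beta_j|

def objective(z):
    # z stacks (beta_plus, beta_minus); beta = beta_plus - beta_minus.
    bp, bm = z[:p], z[p:]
    r = y - X @ (bp - bm)
    return 0.5 * r @ r + lam * (bp.sum() + bm.sum())

def gradient(z):
    bp, bm = z[:p], z[p:]
    g = -X.T @ (y - X @ (bp - bm))  # gradient of the loss w.r.t. beta
    return np.concatenate([g + lam, -g + lam])

# 2p bound constraints beta_j^+ >= 0, beta_j^- >= 0.
res = minimize(objective, np.zeros(2 * p), jac=gradient,
               bounds=[(0, None)] * (2 * p), method="L-BFGS-B")
beta_hat = res.x[:p] - res.x[p:]
```

The smooth bound-constrained problem in `(beta_plus, beta_minus)` reproduces the lasso behavior: `beta_hat` recovers the two nonzero coefficients (shrunk toward zero) and sets the remaining three to zero.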
Winter '10 · TIBSHIRANI, R · Statistics