note12 - STAT5044 Regression and Anova Inyoung Kim


Outline
1. Variable selection
Added variable plot: a scatter plot with $\mathrm{residual}(Y \mid x_1, \dots, x_{k-1})$, the residuals from regressing $Y$ on $x_1, \dots, x_{k-1}$, on the Y axis, and $\mathrm{residual}(x_k \mid x_1, \dots, x_{k-1})$, the residuals from regressing $x_k$ on $x_1, \dots, x_{k-1}$, on the X axis. If the added variable plot shows a clear linear relationship, $x_k$ should be added to the model.
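As a minimal pure-Python sketch (no plotting), the code below computes the two residual vectors that make up the added variable plot. The helper names `ols_fit` and `added_variable_points` and the toy data are hypothetical. As a check it relies on a standard result (the Frisch–Waugh–Lovell theorem): the least-squares slope through the plotted points equals the coefficient of $x_k$ in the regression of $Y$ on $x_1, \dots, x_k$, so a clear linear trend in the plot corresponds to a non-negligible coefficient for $x_k$.

```python
def ols_fit(columns, y):
    """Least-squares fit of y on an intercept plus `columns` (lists of length n).

    Returns (beta, residuals) with beta[0] the intercept. The normal equations
    are solved by Gauss-Jordan elimination, adequate only for toy examples.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    resid = [y[i] - sum(b * x for b, x in zip(beta, X[i])) for i in range(n)]
    return beta, resid


def added_variable_points(y, others, xk):
    """Coordinates of the added variable plot for xk, given the covariates in `others`."""
    _, e_y = ols_fit(others, y)   # residual(Y | x_1, ..., x_{k-1}): Y axis
    _, e_x = ols_fit(others, xk)  # residual(x_k | x_1, ..., x_{k-1}): X axis
    return e_x, e_y


# Hypothetical toy data: Y depends on both x1 and x2 (here x_k = x2, k = 2).
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [3, 1, 4, 1, 5, 9, 2, 6]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.1]
y = [1 + 3 * a - 2 * b + e for a, b, e in zip(x1, x2, noise)]

e_x, e_y = added_variable_points(y, [x1], x2)
# Both residual vectors have mean zero (an intercept is fitted), so the
# least-squares line through the plotted points passes through the origin.
av_slope = sum(a * b for a, b in zip(e_x, e_y)) / sum(a * a for a in e_x)
full_beta, _ = ols_fit([x1, x2], y)
print(av_slope, full_beta[2])  # the two values agree up to rounding
```

Solving the normal equations directly is numerically weaker than a QR decomposition; it is used here only to keep the sketch dependency-free.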

Variable selection methods
- Sequential variable selection: forward selection, backward elimination, stepwise regression
- All possible subset selection, using criteria such as adjusted $R^2$ and Mallows' $C_p$
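For all possible subset selection, the sketch below scores every subset of the candidate covariates. It assumes the standard definitions for a submodel with $p$ parameters (intercept included) fitted to $n$ observations: $R^2_a = 1 - \frac{n-1}{n-p} \cdot \frac{SSE_p}{SSTO}$ and $C_p = \frac{SSE_p}{MSE(\text{full})} - (n - 2p)$, where $MSE(\text{full})$ comes from the model with all candidates. A larger $R^2_a$ is better; for $C_p$ one looks for small values close to $p$. The function names and toy data are hypothetical, and subsets are reported as tuples of column indices (0 stands for x1).

```python
from itertools import combinations


def ols_sse(columns, y):
    """SSE of the least-squares fit of y on an intercept plus `columns`."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2 for i in range(n))


def all_subsets(candidates, y):
    """Adjusted R^2 and Mallows' C_p for every subset of the candidate covariates."""
    n = len(y)
    ybar = sum(y) / n
    ssto = sum((v - ybar) ** 2 for v in y)
    big_p = len(candidates) + 1                  # parameters in the full model
    mse_full = ols_sse(candidates, y) / (n - big_p)
    results = {}
    for size in range(len(candidates) + 1):
        for subset in combinations(range(len(candidates)), size):
            p = size + 1                         # intercept + chosen covariates
            sse = ols_sse([candidates[j] for j in subset], y)
            adj_r2 = 1 - (n - 1) / (n - p) * sse / ssto
            cp = sse / mse_full - (n - 2 * p)
            results[subset] = (adj_r2, cp)
    return results


# Hypothetical toy data: Y is generated without x3.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [3, 1, 4, 1, 5, 9, 2, 6]
x3 = [2, 7, 1, 8, 2, 8, 1, 8]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.1]
y = [1 + 3 * a - 2 * b + e for a, b, e in zip(x1, x2, noise)]

results = all_subsets([x1, x2, x3], y)
for subset, (adj_r2, cp) in sorted(results.items(), key=lambda kv: kv[1][1]):
    print(subset, round(adj_r2, 3), round(cp, 2))
```

By construction the full model always has $C_p = P$ (the number of parameters in the full model), and the intercept-only model always has $R^2_a = 0$, which makes both useful sanity checks.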
Forward selection: add variables (covariates) to the model one at a time.
Step 1: Put into the model the covariate whose simple regression of $Y$ on it gives the largest $R^2$.
Step 2: Suppose $x_1, \dots, x_{k-1}$ are already in the model. As the next candidate, choose the covariate $x_k$ with the largest squared partial correlation coefficient $r^2_{y, x_k \mid x_1, \dots, x_{k-1}}$.
Step 3: Evaluate
$$F = \frac{SSE(x_1, \dots, x_{k-1}) - SSE(x_1, \dots, x_k)}{SSE(x_1, \dots, x_k)/(n - k - 1)}$$
and add $x_k$ if $F > F_{in}$ (see next page). A "large" $\alpha$ is usually chosen, e.g., $\alpha = 0.$…
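Steps 1 to 3 can be sketched in pure Python as follows. Two assumptions are built in. First, the cutoff `f_in` is passed in as a fixed number (4.0 below is an arbitrary illustrative value, not the $F_{in}$ rule from the notes). Second, the candidate with the largest squared partial correlation is found as the one giving the smallest SSE once added; this is equivalent because $r^2_{y, x_k \mid x_1, \dots, x_{k-1}} = \frac{SSE(x_1, \dots, x_{k-1}) - SSE(x_1, \dots, x_k)}{SSE(x_1, \dots, x_{k-1})}$ and the denominator is the same for every candidate. Function names and data are hypothetical.

```python
def ols_sse(columns, y):
    """SSE of the least-squares fit of y on an intercept plus `columns`."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2 for i in range(n))


def forward_selection(candidates, y, f_in):
    """Return indices of covariates in the order forward selection adds them."""
    n = len(y)
    remaining = list(range(len(candidates)))
    chosen, sse_cur = [], None
    while remaining:
        # Steps 1-2: largest (partial) correlation <=> smallest SSE after adding.
        sse_of = {j: ols_sse([candidates[i] for i in chosen + [j]], y) for j in remaining}
        best = min(remaining, key=sse_of.get)
        if chosen:
            # Step 3: partial F test; the enlarged model has k covariates.
            k = len(chosen) + 1
            f = (sse_cur - sse_of[best]) / (sse_of[best] / (n - k - 1))
            if f <= f_in:
                break
        chosen.append(best)  # Step 1 adds the first covariate without a test
        remaining.remove(best)
        sse_cur = sse_of[best]
    return chosen


# Hypothetical toy data: Y is generated from x1 and x2 only.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [3, 1, 4, 1, 5, 9, 2, 6]
x3 = [2, 7, 1, 8, 2, 8, 1, 8]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.1]
y = [1 + 3 * a - 2 * b + e for a, b, e in zip(x1, x2, noise)]

selected = forward_selection([x1, x2, x3], y, f_in=4.0)
print(selected)  # indices into [x1, x2, x3], in order of entry
```

Following Step 1 as written, the first covariate enters without an F test; some textbook versions test it as well, which would only change the `if chosen:` guard.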

