Chapter 9 - Variable Selection and Model Building (STAT 563, Spring 2007)

Selection of the final equation

There are two opposing views:
- To make the equation useful for prediction, we would like to include as many predictors (original, transformed, etc.) as possible.
- Because of the cost of obtaining information on a large number of variables, and of subsequently monitoring them, we would like the equation to include as few predictors as possible.

There is no unique statistical procedure for resolving this conflict; selecting the "best" regression equation is a compromise between the two views.

Dataset: Supervisor Performance Data
Y: overall rating of the job being done by the supervisor
X1: handles employee complaints
X2: does not allow special privileges
X3: opportunity to learn new things
X4: raises based on performance
X5: too critical of poor performance
X6: rate of advancing to better jobs

Criteria for evaluating equations

Mean square error (MSE). Between two equations, the one with the smaller MSE is usually preferred, especially when the objective is forecasting. Recall that MSE is related to R^2.
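As a concrete illustration of the MSE and R^2 criteria, here is a minimal sketch that fits two candidate least-squares models and computes MSE_p = SSE_p/(n-p) and R^2 = 1 - SSE/SST for each. The data below are synthetic, not the course's Supervisor Performance dataset, and the helper names (fit_ols, mse_and_r2) are illustrative, not from the notes.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns SSE and p (terms incl. intercept)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])        # add intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return float(resid @ resid), Xd.shape[1]

def mse_and_r2(X, y):
    """MSE_p = SSE_p/(n-p) and R^2 = 1 - SSE/SST for one candidate model."""
    n = len(y)
    sse, p = fit_ols(X, y)
    sst = float(((y - y.mean()) ** 2).sum())
    return sse / (n - p), 1.0 - sse / sst

# Illustrative synthetic data: X2 (index 1) is truly irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = 2.0 + X @ np.array([1.5, 0.0, -0.8]) + rng.normal(scale=0.5, size=30)

mse_full, r2_full = mse_and_r2(X, y)            # all three predictors
mse_sub, r2_sub = mse_and_r2(X[:, [0, 2]], y)   # drop the irrelevant predictor
```

Note that R^2 can never decrease when a predictor is added (here r2_full >= r2_sub), which is exactly why the unadjusted R^2 is a poor tool for comparing models of different sizes.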
We can use the original R^2 or the adjusted R^2 to judge the adequacy of a fit. Recall

    MSE_p = \frac{SSE_p}{n - p}, \qquad p = k + 1

(k predictors plus the intercept), and

    R^2_p = 1 - \frac{SSE_p}{SST}

Note that the adjusted R^2 is more appropriate when comparing models with different numbers of predictors, because it adjusts (penalizes) for the number of predictors in the model:

    R^2_{adj,p} = 1 - \frac{n-1}{n-p}\,(1 - R^2_p) = 1 - \frac{MSE_p}{SST/(n-1)}

Mallows' C_p

Recall from the earlier chapter that the mean square error of a fitted value decomposes as

    E[\hat{y}_i - E(y_i)]^2 = \underbrace{[E(\hat{y}_i) - E(y_i)]^2}_{\text{squared bias}} + \underbrace{Var(\hat{y}_i)}_{\text{variance component}}

Define the total squared bias for a p-term equation as

    SS_B(p) = \sum_{i=1}^{n} [E(\hat{y}_i) - E(y_i)]^2

Define the standardized total mean square error as

    \Gamma_p = \frac{1}{\sigma^2} \left\{ \sum_{i=1}^{n} [E(\hat{y}_i) - E(y_i)]^2 + \sum_{i=1}^{n} Var(\hat{y}_i) \right\}
             = \frac{1}{\sigma^2} \left\{ SS_B(p) + \sum_{i=1}^{n} Var(\hat{y}_i) \right\}

Recall that we have shown

    \sum_{i=1}^{n} Var(\hat{y}_i) = p\,\sigma^2, \qquad E[SSE_p] = SS_B(p) + (n - p)\,\sigma^2

Substituting, we get

    \Gamma_p = \frac{1}{\sigma^2} \left\{ E[SSE_p] - (n - p)\,\sigma^2 + p\,\sigma^2 \right\} = \frac{E[SSE_p]}{\sigma^2} - n + 2p

Suppose \hat{\sigma}^2 is a good estimate of \sigma^2. Then replacing E[SSE_p] by the observed value SSE_p produces an estimate of \Gamma_p, say

    C_p = \frac{SSE_p}{\hat{\sigma}^2} - n + 2p

Note: if the p-term model has negligible bias, then SS_B(p) = 0, and as a result E[C_p \mid \text{zero bias}] = p. Plot C_p versus p and look for models with C_p values falling near the line C_p = p; models with little bias fall close to this line, while substantially biased models plot well above it.
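The C_p computation above can be sketched directly: for each non-empty subset of predictors, fit by least squares and evaluate C_p = SSE_p/\hat{\sigma}^2 - n + 2p, estimating \sigma^2 by the full model's MSE (a common choice, implied but not stated in the notes). The data are synthetic, not the course dataset, and mallows_cp is an illustrative helper name. Because \hat{\sigma}^2 comes from the full model, the full model's C_p equals p exactly, so the interesting comparisons are among the smaller subsets.

```python
from itertools import combinations

import numpy as np

def sse_ols(Xd, y):
    """Residual sum of squares from a least-squares fit of y on Xd."""
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    r = y - Xd @ beta
    return float(r @ r)

def mallows_cp(X, y):
    """C_p = SSE_p/sigma2_hat - n + 2p for every non-empty predictor subset.

    sigma2_hat is the full model's MSE. Returns (subset, p, C_p) tuples,
    where p counts the subset's predictors plus the intercept.
    """
    n, k = X.shape
    full = np.column_stack([np.ones(n), X])
    sigma2_hat = sse_ols(full, y) / (n - (k + 1))    # MSE of the full model
    out = []
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            Xd = np.column_stack([np.ones(n), X[:, subset]])
            p = len(subset) + 1
            cp = sse_ols(Xd, y) / sigma2_hat - n + 2 * p
            out.append((subset, p, cp))
    return out

# Illustrative synthetic data: only predictors 0 and 3 matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))
y = 1.0 + X @ np.array([2.0, 0.0, 0.0, -1.0]) + rng.normal(scale=0.7, size=40)

results = mallows_cp(X, y)
```

Scanning `results` for entries with C_p near p reproduces the plot-based rule from the notes: unbiased subsets land near the line C_p = p, while subsets omitting an important predictor have inflated SSE_p and hence C_p far above it.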
This note was uploaded on 03/08/2009 for the course 960 563 taught by Professor Unknown during the Spring '07 term at Rutgers.
