# LEC4 - Model Selection, Assessment and Decision Making


Contents:

- Section 1: Introduction
- Section 2: Loss Functions
  - 2.1 Loss Functions for Quantitative Responses
  - 2.2 Loss Functions for Qualitative Responses
- Section 3: Selection and Assessment
  - 3.1 Data Splitting and Extensions
    - 3.1.1 Cross-Validation
    - 3.1.2 K-Fold Cross-Validation
  - 3.2 Resampling Plans
    - 3.2.1 Jackknife
    - 3.2.2 Bootstrap
- Section 4: AIC and BIC
- Section 5: Evaluation of Two-Class Rules
  - 5.1 Confusion Matrix
  - 5.2 Bayes Decision Principle
  - 5.3 ROC Curve and c-Statistics
- Section 6: Evaluation of Multiple-Class Rules
- Appendix 1: References

## Section 1 Introduction

A model's performance should not be assessed on the data that were used to build it. Instead, it should be assessed by its predictive power, i.e., its performance on data that were not used to develop the model. A loss function is a criterion that can be used to estimate the performance of a model obtained from a given data sample.

A model from a simple family of models has high bias, i.e., a large difference between the true model and the best model that can possibly be obtained within that family. At the same time, its variance is small, since there are very few candidate models available in the family. Conversely, a model from a complex family of models has much higher variance and very low bias. This relationship is depicted in a figure taken from Chapter 7 of The Elements of Statistical Learning (figure omitted in this copy; it plots training and test prediction error against model complexity). From that figure, we can see that:
- the prediction error on the training data set (the lower solid curve) is a monotone decreasing function of model complexity, and it can approach zero as the model becomes very complex;
- the prediction error on the test data set (the upper solid curve) has an optimal model complexity that gives the minimum test error;
- a simple model has high bias and low variance;
- a complex model has low bias and high variance;
- the bias-variance trade-off is the key to selecting a model with the smallest overall prediction error.

## Section 2 Loss Functions

A loss function is a criterion that can be used to measure the quality of an estimate. Since the target variable to be estimated can be quantitative or categorical, the loss functions used to measure the quality of an estimate for these two types of target are very different.

### Section 2.1 Loss Functions for Quantitative Responses

Let $Y$ be a quantitative target variable and $X$ a set of predictors, and suppose that the model fitted on the training sample is $\hat{f}(X)$. A loss function for measuring the error between the target $Y$ and the estimator $\hat{f}(X)$ is written $L(Y, \hat{f}(X))$. The most commonly used loss functions are the $L_2$ loss (squared error) and the $L_1$ loss (absolute error):

$$L(Y, \hat{f}(X)) = (Y - \hat{f}(X))^2 \qquad \text{(squared loss)}$$

$$L(Y, \hat{f}(X)) = |Y - \hat{f}(X)| \qquad \text{(absolute loss)}$$
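The squared loss and the training-versus-test behaviour described in Section 1 can be illustrated with a short simulation. This is a minimal sketch, not from the original notes: the data-generating function, sample sizes, noise level, and polynomial degrees are all illustrative assumptions.

```python
# Illustrative sketch (not from the notes): training MSE falls monotonically
# with model complexity, while test MSE is typically minimized at an
# intermediate complexity -- the bias-variance trade-off.
import numpy as np

def squared_loss(y, y_hat):
    """L2 loss: (Y - f_hat(X))^2, elementwise."""
    return (y - y_hat) ** 2

rng = np.random.default_rng(0)

def true_f(x):
    # Assumed "true model" for this demonstration only.
    return np.sin(2 * np.pi * x)

# Independent training and test samples from the same noisy process.
x_train = rng.uniform(0, 1, 30)
y_train = true_f(x_train) + rng.normal(0, 0.3, size=30)
x_test = rng.uniform(0, 1, 200)
y_test = true_f(x_test) + rng.normal(0, 0.3, size=200)

degrees = list(range(1, 11))
train_mse, test_mse = [], []
for d in degrees:
    coefs = np.polyfit(x_train, y_train, d)  # fit on training data ONLY
    train_mse.append(squared_loss(y_train, np.polyval(coefs, x_train)).mean())
    test_mse.append(squared_loss(y_test, np.polyval(coefs, x_test)).mean())

# Training error keeps shrinking as complexity grows; test error does not.
print("train MSE at degree 1 vs 10:", train_mse[0], train_mse[-1])
print("degree minimizing test MSE:", degrees[int(np.argmin(test_mse))])
```

Note the design point this sketch makes concrete: because the polynomial families are nested, training MSE can only decrease as the degree grows, so training error alone can never identify the best model; selecting the degree that minimizes error on held-out data is what Sections 3.1-3.2 (data splitting, cross-validation, resampling) formalize.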


This note was uploaded on 09/22/2011 for the course STA 6714, taught by Professor Staff, during the Spring '11 term at the University of Central Florida.
