# Empirical Bayes Estimates for Large-Scale Prediction Problems

Bradley Efron\* †

## Abstract

Classical prediction methods such as Fisher's linear discriminant function were designed for small-scale problems, where the number of predictors $N$ is much smaller than the number of observations $n$. Modern scientific devices often reverse this situation. A microarray analysis, for example, might include $n = 100$ subjects measured on $N = 10{,}000$ genes, each of which is a potential predictor. This paper proposes an empirical Bayes approach to large-scale prediction, in which the optimum Bayes prediction rule is estimated using the data from all the predictors. Microarray examples are used to illustrate the method. The results show a close connection with the shrunken centroids algorithm of Tibshirani et al. (2002), a frequentist regularization approach to large-scale prediction, and also with false discovery rate theory.

**Keywords:** microarray prediction, empirical Bayes, shrunken centroids, effect size estimation, correlated predictors, local fdr.

## 1 Introduction

An important class of prediction problems begins with the observation of $n$ independent vectors,

$$(x_j, y_j), \qquad j = 1, 2, \ldots, n. \tag{1.1}$$

Here $x_j$ is an $N$-vector of predictors, while $y_j$ is a real-valued response, taken to be dichotomous in most of what follows. For example, $x_j$ might include age, height, weight, gender, etc. for person $j$, while $y_j$ indicates whether or not that person later developed cancer. Given a newly observed $N$-vector $X$, we would like to predict its corresponding $Y$ value. Our task is to use the "training data" (1.1) to construct an effective prediction rule.

Classic prediction methods, such as Fisher's linear discriminant function, were fashioned for problems where $N$ is much smaller than $n$, that is, where the number of predictors is less than the number of training cases.
Current high-throughput scientific technology tends to produce just the opposite situation, with $N \gg n$: modern equipment may permit thousands of measurements on a single individual, but recruiting new subjects remains as difficult as ever. Microarrays offer the iconic example. Here $x_j$ is a vector of genetic expression measurements on subject $j$, one for each of $N$ genes, where $N$ is typically several thousand. In the prostate cancer data (Singh et al., 2002) that we will use for motivation, there are $N = 6033$ genes measured on each of $n = 102$ men: $n_1 = 50$ healthy controls and $n_2 = 52$ prostate cancer patients. Given a new microarray measuring the same 6033 genes, we would like to predict whether or not that man develops prostate cancer....

\* Department of Statistics, Stanford University

† This work was supported in part by NIH grant 8R01 EB002784 and NSF grant DMS0505673.
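The breakdown of classical methods when $N \gg n$ can be made concrete for Fisher's linear discriminant, whose direction $S^{-1}(\bar{x}_2 - \bar{x}_1)$ requires inverting the pooled within-class covariance $S$. Because $S$ is built from only $n$ centered observations, its rank is at most $n - 2$, so it is singular once $N > n - 2$. The sketch below illustrates this on simulated Gaussian data; the sizes and the helper `pooled_within_class_cov` are hypothetical choices for illustration, not the prostate data or the paper's own method.

```python
import numpy as np


def pooled_within_class_cov(x, y):
    """Pooled within-class covariance, as used by Fisher's linear discriminant.

    x: (n, N) array of predictors, one row per subject.
    y: length-n array of 0/1 class labels.
    """
    n = x.shape[0]
    centered = np.empty_like(x, dtype=float)
    for label in (0, 1):
        rows = y == label
        # Subtract each class's own mean vector from its rows.
        centered[rows] = x[rows] - x[rows].mean(axis=0)
    # Divide by n - 2 because one mean vector was estimated per class.
    return centered.T @ centered / (n - 2)


rng = np.random.default_rng(0)
n1, n2, N = 10, 12, 200  # hypothetical sizes chosen so that N >> n
x = rng.standard_normal((n1 + n2, N))
y = np.array([0] * n1 + [1] * n2)

S = pooled_within_class_cov(x, y)
rank = np.linalg.matrix_rank(S)
print(S.shape, rank)  # rank is at most n1 + n2 - 2 = 20, far below N
```

With only 20 of 200 dimensions of rank, $S^{-1}$ does not exist, so Fisher's rule cannot be applied as written; some form of regularization or pooling of information across predictors is needed.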