BIAS, VARIANCE, AND ARCING CLASSIFIERS

Leo Breiman
[email protected]
Statistics Department
University of California
Berkeley, CA 94720

ABSTRACT

Recent work has shown that combining multiple versions of unstable classifiers such as trees or neural nets results in reduced test set error. To study this, the concepts of bias and variance of a classifier are defined. Unstable classifiers can have universally low bias; their problem is high variance. Combining multiple versions is a variance-reducing device. One of the most effective such devices is bagging (Breiman [1996a]): modified training sets are formed by resampling from the original training set, classifiers are constructed using these training sets, and the results are combined by voting. Freund and Schapire [1995, 1996] propose an algorithm whose basis is to adaptively resample and combine (hence the acronym arcing), so that the resampling weights are increased for the cases most often misclassified and the combining is done by weighted voting. Arcing is more successful than bagging at variance reduction. We explore two arcing algorithms, compare them to each other and to bagging, and try to understand how arcing works.

1. Introduction

Some classification and regression methods are unstable in the sense that small perturbations in their training sets, or in their construction, may result in large changes in the constructed predictor. Subset selection methods in regression, decision trees in regression and classification, and neural nets are unstable (Breiman [1996b]). Unstable methods can have their accuracy improved by perturbing and combining: that is, by generating multiple versions of the predictor by perturbing the training set or construction method, and then combining these versions into a single predictor. For instance, Ali [1995] generates multiple classification trees by choosing randomly from among the best splits at a node and combines the trees using maximum likelihood.
Breiman [1996b] adds noise to the response variable in regression to generate multiple subset regressions and then averages these. We use the generic term P&C (perturb and combine) to designate this group of methods.

One of the most effective P&C methods is bagging (Breiman [1996a]). Bagging perturbs the training set repeatedly to generate multiple predictors and combines these by simple voting (classification) or averaging (regression). Let the training set T consist of N cases (instances) labeled n = 1, 2, ..., N. Put equal probability p(n) = 1/N on each case and, using these probabilities, sample with replacement (bootstrap) N times from T, forming the resampled training set T(B). Some cases in T may not appear in T(B); some may appear more than once. Now use T(B) to construct the predictor, repeat the procedure, and combine. Bagging applied to CART gave dramatic decreases in test set errors.
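The bagging procedure just described can be sketched in a few lines. This is a minimal illustration, not the paper's experimental setup: the base learner here is a hypothetical 1-nearest-neighbour rule on one-dimensional data, standing in for an unstable learner such as CART, and the function name and parameters are our own.

```python
import random
from collections import Counter

def bag_predict(train, test_points, n_rounds=25, seed=0):
    """Bagging: bootstrap the training set T, fit one classifier per
    resample T(B), and combine the classifiers by simple voting.

    train: list of (feature, label) pairs; test_points: list of features.
    """
    rng = random.Random(seed)
    N = len(train)
    votes = [Counter() for _ in test_points]
    for _ in range(n_rounds):
        # Sample with replacement N times with p(n) = 1/N: this is T(B).
        # Some cases of T are absent, some repeated.
        resample = [train[rng.randrange(N)] for _ in range(N)]
        for i, x in enumerate(test_points):
            # The stand-in unstable predictor: 1-NN on this replicate.
            _, label = min(resample, key=lambda pt: abs(pt[0] - x))
            votes[i][label] += 1
    # Combine the predictors by plurality vote.
    return [v.most_common(1)[0][0] for v in votes]
```

For regression, the final step would average the replicate predictions instead of voting; everything upstream of the combine step is identical.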