BIAS, VARIANCE, AND ARCING CLASSIFIERS
Leo Breiman
[email protected]
Statistics Department
University of California
Berkeley, CA 94720
ABSTRACT
Recent work has shown that combining multiple versions of unstable classifiers such as trees or neural nets results in reduced test set error. To study this, the concepts of bias and variance of a classifier are defined. Unstable classifiers can have universally low bias; their problem is high variance. Combining multiple versions is a variance-reducing device. One of the most effective is bagging (Breiman [1996a]). Here, modified training sets are formed by resampling from the original training set, classifiers are constructed using these training sets, and the classifiers are then combined by voting. Freund and Schapire [1995, 1996] propose an algorithm the basis of which is to adaptively resample and combine (hence the acronym arcing), so that the weights in the resampling are increased for those cases most often misclassified and the combining is done by weighted voting. Arcing is more successful than bagging in variance reduction. We explore two arcing algorithms, compare them to each other and to bagging, and try to understand how arcing works.
1. Introduction
Some classification and regression methods are unstable in the sense that small perturbations in their training sets or in construction may result in large changes in the constructed predictor.
Subset
selection methods in regression, decision trees in regression and classification, and neural nets
are unstable (Breiman [1996b]).
Unstable methods can have their accuracy improved by perturbing and combining; that is, by generating multiple versions of the predictor by perturbing the training set or construction method and then combining these versions into a single predictor. For instance, Ali [1995] generates multiple classification trees by choosing randomly from among the best splits at a node and combines the trees using maximum likelihood. Breiman [1996b] adds noise to the response variable in regression to generate multiple subset regressions and then averages these. We use the generic term P&C (perturb and combine) to designate this group of methods.
One of the most effective of the P&C methods is bagging (Breiman [1996a]).
Bagging perturbs
the training set repeatedly to generate multiple predictors and combines these by simple voting
(classification) or averaging (regression).
Let the training set T consist of N cases (instances) labeled by n = 1, 2, ..., N. Put equal probabilities p(n) = 1/N on each case and, using these probabilities, sample with replacement (bootstrap) N times from the training set T, forming the resampled training set T^(B). Some cases in T may not appear in T^(B); some may appear more than once. Now use T^(B) to construct the predictor, then repeat the procedure and combine. Bagging applied to CART gave dramatic decreases in test set errors.
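The resample-construct-vote loop above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a toy one-dimensional 1-nearest-neighbor base classifier standing in for CART, and the names (`bagged_predict`, `nn_classifier`) are invented for the example.

```python
import random
from collections import Counter

def nn_classifier(train):
    """Return a 1-nearest-neighbor predictor built from (x, label) pairs
    (a stand-in for an unstable base learner such as CART)."""
    def predict(x):
        return min(train, key=lambda case: abs(case[0] - x))[1]
    return predict

def bagged_predict(T, x, B=25, seed=0):
    """Draw B bootstrap samples T^(B) from T, construct a predictor on
    each, and combine the B predictions by simple (unweighted) voting."""
    rng = random.Random(seed)
    N = len(T)
    votes = []
    for _ in range(B):
        # Sample N times with replacement: some cases of T repeat in
        # T^(B), others do not appear at all.
        T_B = [T[rng.randrange(N)] for _ in range(N)]
        votes.append(nn_classifier(T_B)(x))
    # Classification combines by majority vote (regression would average).
    return Counter(votes).most_common(1)[0][0]

T = [(0.1, "a"), (0.2, "a"), (0.3, "a"), (0.8, "b"), (0.9, "b")]
print(bagged_predict(T, 0.15))
```

With a stable base learner the B predictors would be nearly identical and voting would change little; the gains Breiman reports come precisely from the instability of the base method.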