Bagging and Boosting
9.520 Class 10, 13 March 2006
Sasha Rakhlin

Plan

Bagging and subsampling methods
Bias-Variance and stability for bagging
Boosting and correlations of machines
Gradient descent view of boosting

Bagging (Bootstrap AGGregatING)

Given a training set $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, sample $T$ sets of $n$ elements from $D$ (with replacement): $D_1, D_2, \ldots, D_T$. These are $T$ quasi-replica training sets; train a machine on each $D_i$, $i = 1, \ldots, T$, and obtain a sequence of $T$ outputs $f_1(x), \ldots, f_T(x)$.

Bagging (cont.)

The final aggregate classifier can be, for regression,

$$\bar{f}(x) = \frac{1}{T} \sum_{i=1}^{T} f_i(x),$$

the average of $f_i$ for $i = 1, \ldots, T$; for classification,

$$\bar{f}(x) = \mathrm{sign}\Big( \sum_{i=1}^{T} f_i(x) \Big)$$

or the majority vote

$$\bar{f}(x) = \mathrm{sign}\Big( \sum_{i=1}^{T} \mathrm{sign}(f_i(x)) \Big).$$

Variation I: Subsampling methods

Standard bagging: each of the $T$ subsamples has size $n$ and is created with replacement.
Subbagging: create $T$ subsamples, each of size less than $n$.
No replacement: same as bagging or subbagging, but using sampling without replacement.
Overlap vs. non-overlap: should the $T$ subsamples overlap? I.e., create $T$ subsamples, each with $n/T$ training data.

Bias-Variance for Regression (Breiman 1996)

Let

$$I[f] = \int (f(x) - y)^2 \, p(x, y) \, dx \, dy$$

be the expected risk and $f_0$ the regression function. With $\bar{f}(x) = E_S f_S(x)$, if we define the bias as...
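The bagging procedure and the three aggregation rules described above can be sketched in a few lines. This is a minimal illustration and not code from the lecture: it assumes NumPy, and the base learner `fit_linear` (a least-squares fit with intercept), the helper names, and the synthetic data are all hypothetical choices made for the example. Passing `size` smaller than n gives subbagging, and `replace=False` gives the no-replacement variant.

```python
import numpy as np

def bootstrap_indices(n, T, rng, size=None, replace=True):
    """Return T index arrays; size=n with replace=True is standard bagging."""
    size = n if size is None else size
    return [rng.choice(n, size=size, replace=replace) for _ in range(T)]

def fit_linear(X, y):
    """Hypothetical base learner (an assumption): least squares with intercept."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xq: np.hstack([Xq, np.ones((Xq.shape[0], 1))]) @ w

def bag(X, y, T, rng, fit=fit_linear, size=None, replace=True):
    """Train one machine per resampled set D_i; return the list f_1, ..., f_T."""
    all_idx = bootstrap_indices(len(y), T, rng, size, replace)
    return [fit(X[idx], y[idx]) for idx in all_idx]

def aggregate_regression(machines, Xq):
    """Average of the T outputs."""
    return np.mean([f(Xq) for f in machines], axis=0)

def aggregate_sign_of_sum(machines, Xq):
    """Sign of the summed real-valued outputs."""
    return np.sign(np.sum([f(Xq) for f in machines], axis=0))

def aggregate_majority_vote(machines, Xq):
    """Sign of the summed individual signs (majority vote)."""
    return np.sign(np.sum([np.sign(f(Xq)) for f in machines], axis=0))

# Synthetic two-class data with labels in {-1, +1} (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=200))

machines = bag(X, y, T=25, rng=rng)          # odd T avoids tied votes
votes = aggregate_majority_vote(machines, X)
print("training accuracy of the majority vote:", np.mean(votes == y))
```

An odd T is used in the demo so that a vote over labels in {-1, +1} cannot tie; with an even T, `np.sign` would return 0 on a tie and a tie-breaking rule would be needed.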
This note was uploaded on 11/11/2011 for the course BIO 9.07 taught by Professor Ruth Rosenholtz during the Spring '04 term at MIT.