mar23 - STA 414/2104 Mar 23, 2010 Notes I Class on...


STA 414/2104 Mar 23, 2010

Notes

- Class on Thursday, Mar 25
- Takehome MT due Mar 25
- Trees and forests; Nearest neighbours and prototypes (Ch. 13)
- Unsupervised Learning: Cluster analysis and Self-Organizing Maps (Ch. 14)
- Netflix Prize: some details on the models and methods
- www.fields.utoronto.ca/programs/scientific/

A Decision Tree (Ripley, 1996)

[Figure: classification tree for the shuttle lander data.]
Split labels: vis=no, error=SS, stability=stab, magn=Light,Medium,Strong, error=MM, stability=stab, sign=pp, magn=Light,Medium, wind=tail, magn=Strong, vis=yes, error=LX,MM,XL, stability=xstab, magn=Out, error=LX,XL, stability=xstab, sign=nn, magn=Out,Strong, wind=head, magn=Out
Node labels: auto 145/108, auto 128/0, noauto 17/108, noauto 12/20, auto 12/4, auto 12/0, noauto 0/4, noauto 0/16, noauto 5/88, noauto 5/24, noauto 5/8, auto 5/3, auto 4/0, noauto 1/3, auto 1/1, auto 1/0, noauto 0/1, noauto 0/2, noauto 0/5, noauto 0/16, noauto 0/64

Shuttle lander decision tree

> library(MASS)
> library(rpart)
> data(shuttle)
> shuttle[1:10,]
   stability error sign wind   magn vis  use
1      xstab    LX   pp head  Light  no auto
2      xstab    LX   pp head Medium  no auto
3      xstab    LX   pp head Strong  no auto
4      xstab    LX   pp tail  Light  no auto
5      xstab    LX   pp tail Medium  no auto
6      xstab    LX   pp tail Strong  no auto
7      xstab    LX   nn head  Light  no auto
8      xstab    LX   nn head Medium  no auto
9      xstab    LX   nn head Strong  no auto
10     xstab    LX   nn tail  Light  no auto
> ?shuttle
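The node labels on the decision tree slide give a predicted class followed by what appear to be auto/noauto counts, so the root label auto 145/108 would cover 253 landings. As a minimal sketch in base R, assuming that reading of the labels and assuming the usual definition of rpart's relative error (misclassified cases divided by the misclassified cases at the root), the effect of the first split on vis can be worked out by hand. The variable and function names below are illustrative only and do not come from the slides.

```r
# Counts read off the tree's node labels (assumed order: auto/noauto).
root    <- c(auto = 145, noauto = 108)  # root node, predicted "auto"
vis_no  <- c(auto = 128, noauto = 0)    # vis=no branch, predicted "auto"
vis_yes <- c(auto = 17,  noauto = 108)  # vis=yes branch, predicted "noauto"

# A node misclassifies every case that is not in its majority class.
node_errors <- function(counts) sum(counts) - max(counts)

n         <- sum(root)                                   # cases at the root
root_err  <- node_errors(root)                           # errors with no split
split_err <- node_errors(vis_no) + node_errors(vis_yes)  # errors after one split
rel_error <- split_err / root_err                        # error relative to the root

print(c(n = n, root_err = root_err, split_err = split_err))
print(rel_error)
```

Under these assumptions the ratio is 17/108, about 0.157, which is the value the rel error column of the summary(shuttle.rp) output reports at nsplit = 1.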
... shuttle lander

> shuttle.rp = rpart(use ~ ., data = shuttle, minbucket = 0,
+     xval = 0, maxsurrogate = 0, cp = 0, subset = 1:253)
> # from the MASS scripts; the default tree is much simpler
> post(shuttle.rp, horizontal = F, height = 10, width = 8,
+     title = "", pointsize = 8, pretty = 0) #finally a nice looki
> summary(shuttle.rp)
Call:
rpart(formula = use ~ ., data = shuttle, subset = 1:253, minbucket = 0,
    xval = 0, maxsurrogate = 0, cp = 0)
  n= 253

          CP nsplit  rel error
1 0.84259259      0 1.00000000
2 0.03703704      1 0.15740741
3 0.00925926      4 0.04629630
4 0.00462963      8 0.00925926
5 0.00000000     10 0.00000000

Reference: Chapter 9 of Venables & Ripley, MASS

Random Forests Ch. 15

- trees are highly interpretable, but also quite variable
- bagging (bootstrap aggregation) resamples from the data to build B trees, then averages
- if $X_1, \dots, X_B$ are independent $(\mu, \sigma^2)$, then $\mathrm{var}(\bar X) = \sigma^2 / B$
- if $\mathrm{corr}(X_i, X_j) = \rho > 0$, then $\mathrm{var}(\bar X) = \rho\sigma^2 + \frac{1-\rho}{B}\sigma^2$
- $\to \rho\sigma^2$ as $B \to \infty$; no further benefit from aggregation
- equivalently, $\mathrm{var}(\bar X) = \frac{\sigma^2}{B}\{1 + \rho(B-1)\}$
- average many trees as in bagging, but reduce correlation using a trick: use only a random sample of m of the p input variables each time a node is split
- $m = O(\sqrt{p})$, for example, or even smaller

... random forests

- email spam example in R
- Figures 15.1, 4, 5

> spam2 = spam
> names(spam2)=c(spam.names,"spam")
> spam.rf = randomForest(x=as.matrix(spam2[spamtest==0,1:57]),
+     y=spam2[spamtest==0,58] , importance=T)...
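The variance expressions on the Random Forests slide follow from expanding the variance of an average. A short derivation is sketched here, assuming $X_1, \dots, X_B$ are identically distributed with common variance $\sigma^2$ and common pairwise correlation $\rho$; the symbols are reconstructed in the notation of the Ch. 15 reference rather than copied from the slide.

```latex
\begin{align*}
\operatorname{var}(\bar X)
  &= \frac{1}{B^2}\Bigl[\sum_{i=1}^{B}\operatorname{var}(X_i)
     + \sum_{i \ne j}\operatorname{cov}(X_i, X_j)\Bigr] \\
  &= \frac{1}{B^2}\bigl[B\sigma^2 + B(B-1)\rho\sigma^2\bigr] \\
  &= \frac{\sigma^2}{B}\{1 + \rho(B-1)\} \\
  &= \rho\sigma^2 + \frac{1-\rho}{B}\sigma^2
  \;\longrightarrow\; \rho\sigma^2 \quad (B \to \infty).
\end{align*}
```

With $\rho = 0$ this reduces to $\sigma^2/B$. With $\rho = 0.5$ and $B = 100$, for example, the variance is $0.505\,\sigma^2$, barely below the limit $0.5\,\sigma^2$, which is why reducing the correlation between trees matters more than adding trees.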

This document was uploaded on 08/12/2010.
