STA 414/2104 Mar 23, 2010

Notes

- Class on Thursday, Mar 25
- Takehome MT due Mar 25
- Trees and forests; Nearest neighbours and prototypes (Ch. 13)
- Unsupervised Learning: Cluster analysis and Self-Organizing Maps (Ch. 14)
- Netflix Prize: some details on the models and methods
- www.fields.utoronto.ca/programs/scientific/
A Decision Tree (Ripley, 1996)

[Figure: classification tree for the shuttle lander data, with leaves labelled auto or noauto and auto/noauto counts at each node. The root node (auto 145/108) splits on vis: vis=no gives a pure node (auto 128/0), while vis=yes (noauto 17/108) is split further on error, stability, magn, sign and wind.]
Shuttle lander decision tree

> library(MASS)
> library(rpart)
> data(shuttle)
> shuttle[1:10,]
   stability error sign wind   magn vis  use
1      xstab    LX   pp head  Light  no auto
2      xstab    LX   pp head Medium  no auto
3      xstab    LX   pp head Strong  no auto
4      xstab    LX   pp tail  Light  no auto
5      xstab    LX   pp tail Medium  no auto
6      xstab    LX   pp tail Strong  no auto
7      xstab    LX   nn head  Light  no auto
8      xstab    LX   nn head Medium  no auto
9      xstab    LX   nn head Strong  no auto
10     xstab    LX   nn tail  Light  no auto
> ?shuttle

- 256 possible combinations of factors: 253 have been
... shuttle lander

> shuttle.rp = rpart(use ~ ., data = shuttle, minbucket = 0,
+     xval = 0, maxsurrogate = 0, cp=0, subset = 1:253)
> # from the MASS scripts; the default tree is much simpler
> post(shuttle.rp,horizontal = F, height = 10, width = 8,
+     title = "", pointsize = 8, pretty = 0) #finally a nice looki
> summary(shuttle.rp)
Call:
rpart(formula = use ~ ., data = shuttle, subset = 1:253, minbucket = 0,
    xval = 0, maxsurrogate = 0, cp = 0)
  n= 253

          CP nsplit  rel error
1 0.84259259      0 1.00000000
2 0.03703704      1 0.15740741
3 0.00925926      4 0.04629630
4 0.00462963      8 0.00925926
5 0.00000000     10 0.00000000

Reference: Chapter 9 of Venables & Ripley, MASS
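The transcript notes that the default tree is much simpler than the fully grown one. A minimal sketch of how the two fits could be compared is given below. It assumes the MASS and rpart packages are installed; the names `shuttle.default`, `shuttle.full` and `n_leaves` are hypothetical and introduced here only for illustration.

```r
library(MASS)   # provides the shuttle data
library(rpart)

# Default control settings: cp = 0.01, minsplit = 20,
# minbucket = round(minsplit / 3), xval = 10, maxsurrogate = 5
shuttle.default <- rpart(use ~ ., data = shuttle, subset = 1:253)

# Fully grown tree, using the same arguments as the call in the transcript
shuttle.full <- rpart(use ~ ., data = shuttle, subset = 1:253,
                      minbucket = 0, xval = 0, maxsurrogate = 0, cp = 0)

# Hypothetical helper: count terminal nodes, which rpart marks as "<leaf>"
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

print(c(default = n_leaves(shuttle.default), full = n_leaves(shuttle.full)))
```

Setting cp = 0 and minbucket = 0 removes the stopping rules that normally keep the tree small, so the second fit should have at least as many leaves as the first.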
Random Forests (Ch. 15)

- trees are highly interpretable, but also quite variable
- bagging (bootstrap aggregation) resamples from the data to build $B$ trees, then averages
- if $X_1, \dots, X_B$ are independent with mean $\mu$ and variance $\sigma^2$, then $\operatorname{var}(\bar{X}) = \sigma^2 / B$
- if $\operatorname{corr}(X_i, X_j) = \rho > 0$, then $\operatorname{var}(\bar{X}) = \rho\sigma^2 + \frac{1-\rho}{B}\sigma^2 = \frac{\sigma^2}{B}\{1 + \rho(B-1)\}$
- $\operatorname{var}(\bar{X}) \to \rho\sigma^2$ as $B \to \infty$; beyond this limit there is no benefit from aggregation
- average many trees as in bagging, but reduce correlation using a trick: use only a random sample of $m$ of the $p$ input variables each time a node is split
- $m = \sqrt{p}$, for example, or even smaller
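The two forms of the variance of the average quoted on this slide follow from expanding the variance of a sum. The short derivation below is a sketch under the slide's assumptions, taking $B$ as the number of averaged quantities, each with variance $\sigma^2$ and common pairwise correlation $\rho$.

```latex
\begin{align*}
\operatorname{var}(\bar{X})
  &= \frac{1}{B^2}\Big\{ \sum_{i=1}^{B} \operatorname{var}(X_i)
     + \sum_{i \neq j} \operatorname{cov}(X_i, X_j) \Big\} \\
  &= \frac{1}{B^2}\big\{ B\sigma^2 + B(B-1)\rho\sigma^2 \big\} \\
  &= \frac{\sigma^2}{B}\{ 1 + \rho(B-1) \} \\
  &= \rho\sigma^2 + \frac{1-\rho}{B}\sigma^2 .
\end{align*}
```

The second term vanishes as $B \to \infty$, leaving $\rho\sigma^2$, which is why lowering the correlation between trees matters more than adding further trees.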
... random forests

- email spam example in R
- Figures 15.1, 15.4, 15.5

> spam2 = spam
> names(spam2)=c(spam.names,"spam")
> spam.rf = randomForest(x=as.matrix(spam2[spamtest==0,1:57]),
+     y=spam2[spamtest==0,58], importance=T)
> varImpPlot(spam.rf)
> table(predict(spam.rf, newdata = as.matrix(spam2[spamtest==1,])),
+     spam2[spamtest==1,58])

        email spam
  email   908   38
  spam     33  557
> .Last.value/sum(spamtest)

           email     spam
  email 0.591146 0.024740
  spam  0.021484 0.362630
> .0247+.02148
[1] 0.04618
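The test error can also be read directly off the confusion table rather than by adding rounded proportions by hand. The sketch below re-enters the four counts from the output above; the names `conf`, `n_test` and `error_rate` are hypothetical, and rows are taken to be predictions because `predict(...)` is the first argument to `table`.

```r
# Counts copied from the table above; matrix() fills column by column
conf <- matrix(c(908, 33, 38, 557), nrow = 2,
               dimnames = list(predicted = c("email", "spam"),
                               actual    = c("email", "spam")))

n_test <- sum(conf)                          # number of test cases
error_rate <- 1 - sum(diag(conf)) / n_test   # share of off-diagonal counts

print(n_test)
print(error_rate)
```

This gives 71 misclassified messages out of 1536, which agrees with the hand calculation on the slide up to rounding.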
... random forests

[Figure: variable importance plot for the spam data. Variables listed on the axis: ; mail font receive email 650 pm internet will money meeting 000 business hpl ( you re our 1999 total george edu your free longest hp average remove $ !]