boost2

REGULARIZATION

M ↑ => training error ↓
Not true for FUTURE prediction error:
M ↑ => prediction error ↓, then ↑ (overfitting)
=> optimal value for M:  M* < ∞.

Estimate M̂ of M* with an independent "test" set.
Early stopping (neural nets).
General theory: bias-variance trade-off.

(Model selection over M is not the only way to regularize gradient boosting F_m(x) = F_{m-1}(x) + a_m T_m(x).)

SHRINKAGE

Update:  F_m(x) = F_{m-1}(x) + ν · T_m(x),   "learning rate" 0 < ν ≤ 1.

[Figure: test error vs. iterations (0-1000) for LS_TreeBoost, LAD_TreeBoost, and L2_TreeBoost; y-axes: absolute error and -2 log-likelihood.]

SHRINKAGE

Empirical fact (all boosting procedures):
ν ↓ => future prediction error ↓ (a lot!), provided M*(ν) ↑ as ν ↓.
Diminishing return for small ν < 0.1.
WHY?

LS-regression on all possible (J-terminal node) trees:

(pop)   {a_m} = arg min_{a_m} E_{x,y} [ y - Σ_m a_m T_m(x) ]²
(data)  {a_m} = arg min_{a_m} Σ_{i=1}^N ( y_i - Σ_m a_m T_m(x_i) )²  +  λ · P({a_m})

P({a_m}) = Σ_m a_m²    "ridge" (SVM)
P({a_m}) = Σ_m |a_m|   "lasso"

"NOOSE" (least-squares):

Initialize:  {a_m = 0},  l = 0
Loop:
  (k*, a*) = arg min_{k,a} Σ_{i=1}^N ( y_i - Σ_m a_m T_m(x_i) - a · T_k(x_i) )²
  a_{k*} <- a_{k*} + ε · sign(a*)
  l <- l + 1

FACT

As ε -> 0, NOOSE ≈ lasso, with l_max ≈ 1/λ.
NOOSE ≡ lasso when Corr(T_m(x_i), T_{m'}(x_i)) = 0, i.e. when the coefficient paths â_m(λ) are monotone.
Therefore: NOOSE (l_max: 1 -> ∞) generates all lasso sol'ns (∞ ≥ λ ≥ 0).
See Efron, Hastie, Johnstone, Tibshirani (2003), "Least Angle Regression".

[Figure 1: coefficient paths, lasso vs. Forward Stagewise; x-axes: sum(abs(coefficients)) and iteration.]

MART (with shrinkage) ≈ NOOSE:
  learning rate ν    ~  increment ε
  number of trees M  ~  l_max ≈ 1/λ

Forward stagewise (boosting) with shrinkage ≈ regression on all possible (J-node) trees with an L1 penalty/constraint on the coefficients.

MART (with shrinkage) ≈ SVM:  both fit F(x) = Σ_m a_m b_m(x) with penalty λ · P({a_m})

        Basis fun's         Loss                            Penalty    Trick
SVM     polyn's, RBF, NN    class: (1 - yF)+                Σ a_m²     kernel
                            reg:   (|y - F| - ε)+
MART    trees               any differentiable (fast LS):   Σ |a_m|    stage/shrink
                            class: -log-like;  reg: Huber

"Right-sized" trees for boosting

"Boosting": build and prune a tree at each step - NOT GOOD: expensive + inaccurate.
"MART": fixed tree size J at each step.
Friedman 1999: J small, but not too small: 4 - 8 (6); results are insensitive within this range.

"Right-sized" trees (L)

ANOVA expansion into main effects, two-variable interactions, three-variable interactions, ...:

  F*(x) = Σ_j f_j(x_j) + Σ_{j,k} f_{jk}(x_j, x_k) + Σ_{j,k,l} f_{jkl}(x_j, x_k, x_l) + ···

Interaction order of MART F̂(x):  ≤ min(L - 1, n).
"Stumps" (L = 2) => additive (main effects) model.
Choose L to match the interaction order of F*(x), i.e. approximate it by the lowest-order interactions.
Usually unknown => "meta"-parameter for model selection - but usually small; empirical: L ≤ 6.
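The "NOOSE" loop above is easy to make runnable. A minimal sketch, assuming the tree predictions are precomputed as a basis matrix T with T[i, m] = T_m(x_i); the function name and the fixed l_max stopping rule are mine:

```python
import numpy as np

def noose(T, y, eps=0.01, l_max=500):
    """epsilon-forward-stagewise LS: returns the coefficient path {a_m} over l steps."""
    N, M = T.shape
    a = np.zeros(M)
    path = [a.copy()]
    for _ in range(l_max):
        r = y - T @ a                  # current residuals
        # For each candidate k, the best single-coefficient LS update is
        # a* = <T_k, r> / ||T_k||^2, which lowers the RSS by <T_k, r>^2 / ||T_k||^2.
        num = T.T @ r
        denom = (T ** 2).sum(axis=0)
        k = int(np.argmax(num ** 2 / denom))
        a[k] += eps * np.sign(num[k])  # tiny step eps, not the full LS amount
        path.append(a.copy())
    return np.array(path)
```

Each pass takes only a tiny step ε on the single best coefficient rather than the full least-squares amount; stopping after l steps then plays the role of the lasso's λ, as in the FACT above.

The three regularization knobs of the section (choose M on a test set, shrink with ν, fix a small tree size J) can also be seen together in one place. A minimal sketch, assuming scikit-learn's GradientBoostingRegressor as a stand-in for MART; the dataset and parameter values are illustrative only:

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_friedman1(n_samples=2000, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# nu = learning_rate (shrinkage), M = n_estimators, J = max_leaf_nodes
model = GradientBoostingRegressor(
    loss="squared_error",   # LS_TreeBoost; "absolute_error" gives LAD_TreeBoost
    learning_rate=0.1,      # nu: smaller => better test error, but larger M* needed
    n_estimators=1000,      # M: deliberately too large; pick M* on the test set
    max_leaf_nodes=6,       # J: interaction order <= J - 1
    random_state=0,
).fit(X_train, y_train)

# Estimate M* with the independent test set (early stopping by inspection):
# staged_predict yields predictions after 1, 2, ..., M trees.
test_err = [mean_squared_error(y_test, y_pred)
            for y_pred in model.staged_predict(X_test)]
m_star = int(np.argmin(test_err)) + 1
print(f"M* = {m_star}, test MSE = {test_err[m_star - 1]:.3f}")
```

Re-running with a smaller learning_rate should reproduce the empirical fact above: a lower test-error floor, reached at a larger M*.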
J controls the interaction order of F̂(x):
  J = 2 ("stumps") => additive (main effects)
  J > 2 permits interactions (up to order J - 1)
Empirical: 4 ≤ J ≤ 8 (6).

DATA MINING

Breiman: "Boosted trees best off-the-shelf classifiers."

MART -> regression. Inherited tree advantages:
(1) naturally handles mixed variable types and missing values
(2) invariant to monotone transformations of x_j
(3) immune to bad x_j distributions
(4) internal variable subset selection
(5) robust to irrelevant inputs.

Tree disadvantage - inaccuracy:
(1) coarse piecewise-constant approx.
(2) instability => high variance
(3) high interaction-order approx.

MART (boosting):
(1) piecewise-constant approx. much finer
(2) small trees + averaging => stable
(3) interaction order controlled by L.
In addition:
(4) LAD_MART immune, and (5) M_MART resistant, to y-outliers.
Boosting => loses single-tree interpretability.

INTERPRETATION

Relative importance of input variables.

Trees (Breiman et al 1983):  J_j²(T) = Σ_{t=1}^{J-1} î_t² · 1(v_t = j)
(sum over the J - 1 internal nodes t: squared split improvement î_t² where the split variable v_t is j)

MART: average over trees:  Ĵ_j² = (1/M) Σ_{m=1}^M J_j²(T_m)

Works better than for a single tree. Justification: Friedman 1999.

Classification: importances averaged over the trees of each class; for K-class problems they can also be computed separately for each class, showing which inputs matter most for discriminating a particular class.

[Handwritten sketch: table of per-class importances, inputs x_1, ..., x_n by classes C_1, ..., C_K.]

MORE INTERPRETATION

Joint dependence of F̂(x) on relevant subsets of {x_j}_1^n.
Visualization: the most powerful tool. BUT limited to ≤ 3 variables (the 3-D world we inhabit).
Plot partial (average) dependence of F̂(x) on selected small subsets of {x_j}_1^n.
Incomplete - but useful if the subsets are chosen carefully.
Works for any black-box model (SVM, neural nets, etc.).

PARTIAL DEPENDENCE FUNCTIONS

Subset z_l = {z_1, ..., z_l} ⊂ x, with complement z_\l:  z_\l ∪ z_l = x.

F(x) = F(z_l, z_\l) = F(z_l | z_\l) = F_{z_\l}(z_l):  a function of z_l with "parameters" z_\l.

"Partial" (average) dependence of F(x) on z_l:

  F̄_l(z_l) = E_{z_\l}[F(x)] = ∫ F(x) p_\l(z_\l) dz_\l

where p_\l(z_\l) is the marginal density of z_\l. F̄_l measures the contribution of z_l to F(x) after accounting for the (average) effects of the other variables.

Describes the complete dependence (diagnostics) if
  F(x) = F_l(z_l) + F_\l(z_\l)   or   F(x) = F_l(z_l) · F_\l(z_\l).

NOT equivalent to the conditional expectation (equal only if z_l is independent of z_\l):

  F̃_l(z_l) = E_x[F(x) | z_l] = ∫ F(x) p(x | z_l) dx

Trees: F̄_l(z_l) by weighted tree traversal. MART: average over trees.
Empirical estimate: average over the training data,  F̄_l(z_l) ≈ (1/N) Σ_{i=1}^N F(z_l, z_{\l,i}).
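The empirical estimate just given is easy to implement for any black-box F̂. A minimal sketch, assuming numpy and any fitted model with a .predict method; the helper name and signature are mine:

```python
import numpy as np

def partial_dependence(model, X, z_cols, grid):
    """Estimate F_bar_l(z_l) = (1/N) * sum_i F(z_l, z_\\l,i).

    model  : fitted regressor with a .predict method
    X      : (N, n) training inputs supplying the z_\\l background
    z_cols : column indices of the subset z_l
    grid   : (G, len(z_cols)) array of z_l values to evaluate
    """
    pd_values = np.empty(len(grid))
    for g, z in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, z_cols] = z          # clamp z_l, keep each row's own z_\l
        pd_values[g] = model.predict(X_mod).mean()
    return pd_values

# Example: one-variable partial dependence on x_0, reusing `model` and
# `X_train` from the earlier MART sketch (hypothetical names):
# grid = np.linspace(X_train[:, 0].min(), X_train[:, 0].max(), 50).reshape(-1, 1)
# pd = partial_dependence(model, X_train, [0], grid)
```

scikit-learn exposes the same computation as sklearn.inspection.partial_dependence, including the weighted tree-traversal ("recursion") shortcut mentioned above for tree ensembles.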