Decision tree: illustration

[Figure: partial decision tree with a single root split on X1, branches X1 >= a and X1 < a]

Turning to the X1 >= a partition, partitioning now on X2 gives us the greatest information gain.

Decision tree: illustration

[Figure: two-level decision tree: the root splits on X1 (X1 >= a vs. X1 < a), and the X1 >= a branch splits on X2 (X2 >= b vs. X2 < b)]

We can build the second node to partition the data in the tree, and stop there because we have no uncertainty in any leaf nodes.
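The slides do not include code for this construction, but a minimal R sketch of the entropy and information-gain computation behind these split choices is given below. The helper functions, the toy data frame `toy`, and the thresholds `a` and `b` are all hypothetical stand-ins, chosen only so that the greedy choices mirror the illustration (split on X1 at the root, then on X2 inside the X1 >= a partition):

```r
# Entropy (in bits) of a vector of class labels
entropy <- function(y) {
  p <- table(y) / length(y)
  -sum(ifelse(p > 0, p * log2(p), 0))
}

# Information gain from splitting labels y according to a logical condition
info_gain <- function(y, condition) {
  n <- length(y)
  entropy(y) -
    (sum(condition)  / n) * entropy(y[condition]) -
    (sum(!condition) / n) * entropy(y[!condition])
}

# Hypothetical toy data in the spirit of the illustration (not from the course)
toy <- data.frame(x1 = c(1, 2, 3, 4, 6, 7, 8, 9),
                  x2 = c(6, 7, 8, 9, 2, 3, 6, 7),
                  y  = c(0, 0, 0, 0, 0, 0, 1, 1))
a <- 5; b <- 5

# At the root, splitting on x1 gives more information gain than splitting on x2
info_gain(toy$y, toy$x1 >= a)   # ~0.31
info_gain(toy$y, toy$x2 >= b)   # ~0.12

# Inside the x1 >= a partition, splitting on x2 leaves both children pure
right <- toy[toy$x1 >= a, ]
info_gain(right$y, right$x2 >= b)   # 1: both leaf entropies are 0
```

With zero entropy in every leaf there is nothing left to gain, which is the stopping condition used on this slide.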
Example: predicting gender from height, siblings

[Scatter plot: Height (inches, roughly 60-85) vs. Siblings (0-6), points colored by as.factor(sex): 0 or 1]

Example: Decision tree

[Plot: decision-tree predictions by sampled height (inches) and siblings, colored by round(sex); accuracy = 0.89]
Example: Decision tree

[Fitted tree: node 1 splits on height at 66 (p < 0.001); for height <= 66, node 2 splits on height at 64 (p = 0.004), giving leaf node 3 (n = 42) and leaf node 4 (n = 32); for height > 66, node 5 splits on height at 69 (p = 0.001), giving leaf node 6 (n = 61) and leaf node 7 (n = 109); each leaf shows the proportion of each class]

Where is siblings?
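The tree in this figure looks like the output of a conditional-inference tree (a p-value is reported at each split); a minimal R sketch of how such a fit could be produced with the partykit package is below. The data frame `survey` and its simulated values are a hypothetical stand-in for the class survey data, not the actual dataset; in this sketch siblings is generated independently of sex, so, as in the figure, it never appears in a split:

```r
library(partykit)   # ctree(): conditional-inference trees

set.seed(1)
n <- 244
# Hypothetical stand-in for the class survey data (sex coded 0/1, height in inches)
survey <- data.frame(sex      = factor(rep(c(0, 1), each = n / 2)),
                     siblings = sample(0:6, n, replace = TRUE))
survey$height <- ifelse(survey$sex == 1,
                        rnorm(n, mean = 70, sd = 3),
                        rnorm(n, mean = 64, sd = 3))

fit <- ctree(sex ~ height + siblings, data = survey)
plot(fit)    # splits on height only, with a p-value shown at each inner node

pred <- predict(fit, survey)
mean(pred == survey$sex)   # training accuracy, in the spirit of the 0.89 above
```

A plausible answer to the question: siblings carries essentially no information about sex once height is known, so no candidate split on it passes the tree's significance test, and it never shows up in the fitted tree.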

Decision trees: summary

For K classes, n training samples, and p features:

                     NB       KNN              SVMs              DT
  Model based?       Y        N                N                 N
  Classifier type?   gen      cluster          discr             discr
  Kernelizeable?     N        Y                Y                 N
  Additive?          Y        non-linear dist  non-linear kern   N
  Parameters?        N        K                c                 N
  Multiclass?        Y        Y                N                 Y
  Interpretable?     Y        N                linear kern       Y
  Missing data?      Y        N                N                 N
  Training?          O(np)    none             O(n^2)            O(np^2)
  Test?              O(Kp)    O(np)            O(|SVs|)          O(log p)
Random forests: Ensemble learning

Random forests build K decision trees; during classification each tree gets one vote on the class label, and the predicted class is the mode of those votes. Each tree is built a bit differently from the single decision tree above (both differences are sketched in code below):
- each tree is fit to a subset of the samples, sampled with replacement (a bootstrap sample);
- each split in a tree is selected from a random subset of the features.

Now every individual tree comes with few guarantees, and the trees all differ. Why is this a good idea?
- Decision trees may overfit the training data.
- Multiple weak hypotheses combine to create a strong hypothesis.
- Linear classifiers may not be appropriate for the data.
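Here is a minimal R sketch of that recipe using the randomForest package. The hypothetical `survey` data frame is the same stand-in used in the decision-tree sketch, regenerated here so the block runs on its own; `ntree` and `mtry` are illustrative values, not settings from the slides:

```r
library(randomForest)

set.seed(2)
n <- 244
# Hypothetical stand-in data, as in the earlier decision-tree sketch
survey <- data.frame(sex      = factor(rep(c(0, 1), each = n / 2)),
                     siblings = sample(0:6, n, replace = TRUE))
survey$height <- ifelse(survey$sex == 1, rnorm(n, 70, 3), rnorm(n, 64, 3))

rf <- randomForest(sex ~ height + siblings, data = survey,
                   ntree = 500,   # K = 500 trees, each fit to a bootstrap sample
                   mtry  = 1)     # candidate features drawn at random for each split

print(rf)                    # includes the out-of-bag (OOB) error estimate
predict(rf, survey[1:5, ])   # predicted label = majority vote (mode) over the trees
```

Because each tree sees a different bootstrap sample and a different random feature subset at each split, the trees' errors are only weakly correlated, and averaging their votes reduces the variance that makes a single deep decision tree overfit.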

Example: predicting gender from height, siblings

[Scatter plot: Height (inches, roughly 60-85) vs. Siblings (0-6), points colored by as.factor(sex): 0 or 1, as on the earlier slide]