
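Intrinsic information of an attribute: the entropy of the distribution of instances into branches, i.e. how much information we need to tell which branch an instance belongs to:
$IntI(S, A) = -\sum_i \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|}$
Example: intrinsic information of the Day attribute (14 branches with one instance each):
$IntI(Day) = 14 \times \left(-\frac{1}{14} \log_2 \frac{1}{14}\right) = 3.807$
Observation: attributes with higher intrinsic information are less useful.
© J. Fürnkranz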
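A minimal Python sketch of the intrinsic information, assuming a split is described only by its branch sizes |S_i| (class labels play no role). The function name `intrinsic_information` is hypothetical. For the Day attribute each of the 14 instances gets its own branch:

```python
import math

def intrinsic_information(branch_sizes):
    """IntI(S, A): entropy of the distribution of instances over the branches of A.

    branch_sizes lists |S_i| for each branch S_i; empty branches are ignored.
    """
    total = sum(branch_sizes)
    return sum(n / total * math.log2(total / n) for n in branch_sizes if n > 0)

# Day attribute: 14 distinct values, hence 14 branches of size 1.
print(round(intrinsic_information([1] * 14), 3))   # 3.807
# A three-way split with branch sizes 5, 4, 5 needs far less information.
print(round(intrinsic_information([5, 4, 5]), 3))  # 1.577
```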

Gain Ratio
A modification of the information gain that reduces its bias towards multi-valued attributes.
It takes the number and size of branches into account when choosing an attribute: it corrects the information gain by taking the intrinsic information of a split into account.
Definition of Gain Ratio: $GR(S, A) = \frac{Gain(S, A)}{IntI(S, A)}$
Example: Gain Ratio of the Day attribute: $GR(Day) = \frac{0.940}{3.807} \approx 0.247$
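The gain ratio can be sketched the same way. This is an illustrative Python version, not code from the lecture; the helper names are hypothetical, and the Day split is modelled under the assumption that the 14 days carry 9 "yes" and 5 "no" labels, the class distribution whose entropy is 0.940:

```python
import math

def entropy(counts):
    """Entropy in bits of a distribution given as a list of counts."""
    total = sum(counts)
    return sum(c / total * math.log2(total / c) for c in counts if c > 0)

def gain_ratio(class_counts, branches):
    """GR(S, A) = Gain(S, A) / IntI(S, A).

    class_counts: class counts of the whole set S, e.g. [9, 5].
    branches: one list of class counts per branch S_i of attribute A.
    """
    n = sum(class_counts)
    sizes = [sum(b) for b in branches]
    info = sum(size / n * entropy(b) for size, b in zip(sizes, branches))
    gain = entropy(class_counts) - info
    intrinsic = entropy(sizes)  # IntI(S, A) is the entropy of the branch sizes
    return gain / intrinsic

# Assumed Day split: 9 single-instance "yes" branches, 5 single-instance "no" branches.
day_branches = [[1, 0]] * 9 + [[0, 1]] * 5
print(round(gain_ratio([9, 5], day_branches), 3))  # 0.247
```

Because every Day branch is pure, Gain(S, Day) equals the full entropy of S, which is why the numerator is 0.940.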
Gain ratios for weather data

Attribute     Info    Gain                   Split info              Gain ratio
Outlook       0.693   0.940-0.693 = 0.247    info([5,4,5]) = 1.577   0.247/1.577 = 0.157
Temperature   0.911   0.940-0.911 = 0.029    info([4,6,4]) = 1.557   0.029/1.557 = 0.019
Humidity      0.788   0.940-0.788 = 0.152    info([7,7])   = 1.000   0.152/1.000 = 0.152
Windy         0.892   0.940-0.892 = 0.048    info([8,6])   = 0.985   0.048/0.985 = 0.049

The Day attribute would still win, since its gain ratio exceeds Outlook's 0.157, so one has to be careful which attributes to add. Nevertheless, gain ratio is more reliable than information gain.
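The table can be recomputed from per-branch class counts. The counts below are an assumption: they are the [yes, no] counts of the standard weather (play-tennis) data, and they reproduce the Info and split-info values listed above. The sketch is illustrative, not the lecture's code:

```python
import math

def entropy(counts):
    """Entropy in bits of a distribution given as a list of counts."""
    total = sum(counts)
    return sum(c / total * math.log2(total / c) for c in counts if c > 0)

# Assumed [yes, no] class counts per branch (standard weather data).
WEATHER = {
    "Outlook": [[2, 3], [4, 0], [3, 2]],      # sunny, overcast, rainy
    "Temperature": [[2, 2], [4, 2], [3, 1]],  # hot, mild, cool
    "Humidity": [[3, 4], [6, 1]],             # high, normal
    "Windy": [[6, 2], [3, 3]],                # false, true
}

def split_stats(branches):
    """Return (info, gain, split info, gain ratio) for one attribute."""
    totals = [sum(col) for col in zip(*branches)]  # class counts of the whole set S
    n = sum(totals)
    sizes = [sum(b) for b in branches]
    info = sum(s / n * entropy(b) for s, b in zip(sizes, branches))
    gain = entropy(totals) - info
    split_info = entropy(sizes)
    return info, gain, split_info, gain / split_info

for name, branches in WEATHER.items():
    info, gain, split_info, ratio = split_stats(branches)
    print(f"{name:12s} info={info:.3f} gain={gain:.3f} "
          f"split={split_info:.3f} ratio={ratio:.3f}")
```

Because the code keeps full precision instead of rounding intermediate values, some printed figures can differ from the table in the third decimal (Outlook's gain ratio, for instance, comes out near 0.156 rather than 0.157).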

Gini Index
There are many alternative measures to information gain. The most popular alternative is the Gini index, used e.g. in CART (Classification And Regression Trees).
Impurity measure (instead of entropy): $Gini(S) = 1 - \sum_i p_i^2$
Average Gini index (instead of average entropy / information): $Gini(S, A) = \sum_i \frac{|S_i|}{|S|} \, Gini(S_i)$
A Gini gain could be defined analogously to information gain, but typically the average Gini index is minimized instead of maximizing the Gini gain.
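A short sketch of both Gini quantities, reusing for illustration the assumed [yes, no] branch counts of the standard weather data. Following the text, the attribute with the smallest average Gini index would be preferred; the function names are hypothetical:

```python
def gini(counts):
    """Gini(S) = 1 - sum_i p_i^2 for a class distribution given as counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def average_gini(branches):
    """Gini(S, A) = sum_i |S_i|/|S| * Gini(S_i), to be minimized over attributes."""
    n = sum(sum(b) for b in branches)
    return sum(sum(b) / n * gini(b) for b in branches)

# Assumed [yes, no] counts per branch (standard weather data).
outlook = [[2, 3], [4, 0], [3, 2]]
humidity = [[3, 4], [6, 1]]

print(round(gini([9, 5]), 3))           # 0.459  (impurity of the whole set)
print(round(average_gini(outlook), 3))  # 0.343
print(round(average_gini(humidity), 3)) # 0.367  (higher, so Outlook is preferred)
```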
Numeric attributes
Standard method: binary splits, e.g. temp < 45.
Unlike nominal attributes, every numeric attribute has many possible split points.
The solution is a straightforward extension: evaluate the info gain (or another measure) for every possible split point of the attribute, and choose the "best" split point. The info gain of the best split point is the info gain of the attribute.
This is computationally more demanding.
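The split-point search can be sketched as follows. This version assumes candidate thresholds at the midpoints between adjacent distinct sorted values (one of the placements the lecture allows) and scores each split by its weighted entropy; minimizing that is equivalent to maximizing information gain, since E(S) is the same for every candidate. The function name `best_binary_split` is hypothetical:

```python
import math

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return sum(labels.count(c) / n * math.log2(n / labels.count(c))
               for c in set(labels))

def best_binary_split(values, labels):
    """Return (threshold, weighted entropy) of the best split 'value < threshold'.

    Candidates are midpoints between adjacent distinct values after sorting.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        threshold = (lo + hi) / 2
        left = [lab for _, lab in pairs[:i]]   # all values < threshold
        right = [lab for _, lab in pairs[i:]]  # all values >= threshold
        info = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if best is None or info < best[1]:
            best = (threshold, info)
    return best

print(best_binary_split([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"]))
# (6.5, 0.0)
```

For clarity this recomputes the class counts of both sides at every candidate, which costs O(n²); a single sweep over the sorted values that updates the counts incrementally reduces the search to the cost of the initial sort.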

Example
Assume a numeric attribute for Temperature.
First step: sort all examples according to the value of this attribute. This could look like this:

Temperature  64   65   68   69   70   71   72   72   75   75   80   81   83   85
Class        Yes  No   Yes  Yes  Yes  No   No   Yes  Yes  Yes  No   Yes  Yes  No

There is one candidate split between each pair of values, e.g.
Temperature < 71.5: yes/4, no/2
Temperature ≥ 71.5: yes/5, no/3
$I(\text{Temperature} @ 71.5) = \frac{6}{14}\,E(\text{Temperature} < 71.5) + \frac{8}{14}\,E(\text{Temperature} \geq 71.5) = 0.939$
Split points can be placed between values or directly at values.
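The value of I(Temperature @ 71.5) can be checked with a few lines of Python. The data are the sorted values and class labels listed above; the helper names are hypothetical:

```python
import math

def entropy(counts):
    """Entropy in bits of a distribution given as a list of counts."""
    total = sum(counts)
    return sum(c / total * math.log2(total / c) for c in counts if c > 0)

temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
play = ["yes", "no", "yes", "yes", "yes", "no", "no",
        "yes", "yes", "yes", "no", "yes", "yes", "no"]

def class_counts(labels):
    """[yes, no] counts of a list of labels."""
    return [labels.count("yes"), labels.count("no")]

def split_info(threshold):
    """Weighted entropy I(Temperature @ threshold) of the binary split at threshold."""
    left = [p for t, p in zip(temps, play) if t < threshold]
    right = [p for t, p in zip(temps, play) if t >= threshold]
    n = len(temps)
    return (len(left) / n * entropy(class_counts(left))
            + len(right) / n * entropy(class_counts(right)))

print(round(split_info(71.5), 3))  # 0.939
```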
Binary vs. Multiway Splits
Splitting (multi-way) on a nominal attribute exhausts all information in that attribute: a nominal attribute is tested (at most) once on any path in the tree.
Not so for binary splits on numeric attributes! A numeric attribute may be tested several times along the same path, each time with a different split point.

  • Fall '17
  • Mark Isbel
