Information Gain (IG)

Now we want to measure how informative an attribute is with respect to the target class: how much gain in information it gives us about the value of the target class. It is the information we gain by splitting the set on all values of a single attribute.
▪ Parent set: the original set of examples
▪ Children sets: the result of splitting on the attribute values
▪ The entropy of each child (c_i) is weighted by the proportion of instances belonging to that child, p(c_i):
  IG(parent, children) = entropy(parent) − [p(c_1) × entropy(c_1) + p(c_2) × entropy(c_2) + …]

Example:
Two-class problem (circle = good credit, star = bad credit). Looking at the figure on the previous slide, the children sets appear purer than the parent set.
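As a minimal sketch of the calculation described above (the function names, and the class counts used in the comparison, are illustrative assumptions rather than the numbers behind the slides' figures):

```python
from math import log2

def entropy(class_counts):
    """Entropy of a set, given the number of instances in each class."""
    total = sum(class_counts)
    probs = [c / total for c in class_counts if c > 0]
    return sum(-p * log2(p) for p in probs)

def information_gain(parent_counts, children_counts):
    """entropy(parent) minus the entropy of each child weighted by p(c_i)."""
    parent_total = sum(parent_counts)
    weighted = sum((sum(child) / parent_total) * entropy(child)
                   for child in children_counts)
    return entropy(parent_counts) - weighted

# Made-up [good credit, bad credit] counts for a parent node and two candidate splits.
parent = [14, 16]
split_like_balance = [[12, 3], [2, 13]]    # fairly pure children
split_like_residence = [[7, 9], [7, 7]]    # children look much like the parent

print(information_gain(parent, split_like_balance))    # ~0.35 (large gain)
print(information_gain(parent, split_like_residence))  # ~0.003 (almost no gain)
```

Purer children mean a smaller weighted child entropy and therefore a larger gain, which is exactly the pattern the Balance-vs.-Residence comparison below illustrates.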
This split (by Balance) reduces entropy substantially: the Balance attribute provides a lot of information about the value of the target class, good credit vs. bad credit.

The results of splitting by the Residence variable: Residence has a positive information gain, but it is lower than that of Balance (which was 0.37). Therefore, the Balance variable is more informative than Residence.

Why trees? Decision trees (DTs), or classification trees, are one of the most popular data mining tools (along with linear and logistic regression). They are:
▪ Easy to understand
▪ Easy to implement
▪ Easy to use
▪ Computationally cheap
Almost all data mining packages include DTs.

Trees as Sets of Rules
▪ The classification tree is equivalent to this rule set
▪ Each rule consists of the attribute tests along the path, connected with AND
▪ IF (Employed = Yes) THEN Class = No Write-off
▪ IF (Employed = No) AND (Balance < 50k) THEN Class = No Write-off
▪ IF (Employed = No) AND (Balance ≥ 50k) AND (Age < 45) THEN Class = No Write-off
▪ IF (Employed = No) AND (Balance ≥ 50k) AND (Age ≥ 45) THEN Class = Write-off
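A minimal sketch of the same rule set written as code (the function name and argument types are my own; the thresholds and classes are those on the slide):

```python
def classify(employed, balance, age):
    """The write-off classification tree above, written out as its rule set."""
    if employed == "Yes":
        return "No Write-off"
    if balance < 50_000:
        return "No Write-off"
    if age < 45:
        return "No Write-off"
    return "Write-off"

# Only the path Employed = No, Balance >= 50k, Age >= 45 ends in a write-off.
print(classify(employed="No", balance=60_000, age=50))  # Write-off
```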
2+ = 2 "yes", 3− = 3 "no". Follow the path whose children are purest (the largest information gain), i.e. the most pure path. Entropy = 0 means the set is pure (all one class), and the closer a child's entropy is to zero, the larger the information gain; entropy close to 1 means the set is still roughly evenly mixed.

Visualizing Segmentations – Probabilities
circle = Write-off, plus = No Write-off
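As a quick numeric check of the 2 "yes" / 3 "no" node above (my own arithmetic, using the entropy helper sketched earlier):

```python
# entropy([2, 3]) = -(2/5)*log2(2/5) - (3/5)*log2(3/5) ≈ 0.971
print(entropy([2, 3]))  # ~0.97: still quite mixed, so little is gained by stopping here
print(entropy([5, 0]))  # 0.0: a pure node; splits that produce nodes like this gain the most
```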
Geometric Interpretation of a Model
Split over income; split over age.
Pattern: IF Balance ≥ 50K AND Age > 45 THEN Default = 'no' ELSE Default = 'yes'
Each decision-tree split is axis-parallel, so the tree carves the instance space into rectangular regions; it cannot directly produce an oblique (diagonal) boundary, which would require a linear model such as logistic regression.
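A minimal sketch (thresholds from the pattern above; the probe points are made up) showing how the two axis-parallel tests carve the Balance–Age plane into rectangular regions:

```python
def default_prediction(balance, age):
    """Only the upper-right rectangle (Balance >= 50k and Age > 45) predicts 'no'."""
    return "no" if balance >= 50_000 and age > 45 else "yes"

# Points on either side of the two axis-parallel boundaries:
print(default_prediction(60_000, 50))  # 'no'  : inside the rectangle
print(default_prediction(60_000, 40))  # 'yes' : below the Age split
print(default_prediction(40_000, 50))  # 'yes' : left of the Balance split
```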
What alternatives are there to partitioning this way? The "true" boundary may not be closely approximated by a linear boundary!

Different Faces of Classification
Classification problem
▪ Most general case: the target takes on discrete values that are NOT ordered
▪ Most common: binary classification, where the target is either 0 or 1
Solutions to classification
▪ Classifier model: the model predicts the same set of discrete values as the data had
▪ Probability estimation: the model predicts a score between 0 and 1 that is meant to be the probability of being in that class

MegaTelCo
Predicting Churn with Tree Induction
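To make the classifier vs. probability-estimation distinction concrete for churn, here is a hedged sketch (the tiny [balance, age] → churn table is made up and scikit-learn is assumed to be available; this is not the slides' MegaTelCo data):

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up training data: [balance, age] -> churn (1) or stay (0).
X = [[20_000, 25], [60_000, 50], [55_000, 30], [10_000, 60], [70_000, 48]]
y = [1, 0, 1, 1, 0]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

new_customer = [[65_000, 52]]
print(tree.predict(new_customer))        # discrete class label, e.g. [0]
print(tree.predict_proba(new_customer))  # class-membership scores, e.g. [[1. 0.]]
```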
Cornell Notes
Topic: Course Class 5
Date: Oct 2, 2019
Essential Question: [check chapter]
Read: B1 - C34
DUE: ASSIGNMENT 1 ON OCTOBER 25
Questions/Cues | Notes
Pro-Tip: Highlight what's important!

Recap: Data Mining vs. Use of the Model
Summary
