Information Gain (IG)
● Now we want to measure how informative an attribute is with respect to the target class
● How much gain in information it gives us about the value of the target class
● The information we gain by splitting the set on all values of a single attribute
● Parent set: the original set of examples
● Children sets: the result of splitting on the attribute values
● The entropy for each child (c_i) is weighted by the proportion of instances belonging to that child, p(c_i)
● IG(parent, children) = entropy(parent) − [ p(c_1) × entropy(c_1) + p(c_2) × entropy(c_2) + … ]

Example:
● Two-class problem (circle = good credit, star = bad credit)
● By looking at the figure in the previous slide, the children sets seem purer than the parent set
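A minimal sketch of this computation in Python, following the IG definition above (the class counts here are hypothetical placeholders, not the actual numbers behind the figure):

from math import log2

def entropy(counts):
    # Entropy of a class distribution given as a list of class counts
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, children_counts):
    # IG = entropy(parent) - sum over children of p(c_i) * entropy(c_i)
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * entropy(child) for child in children_counts)
    return entropy(parent_counts) - weighted

# Hypothetical counts: [good credit, bad credit] for the parent set,
# and for each child set produced by splitting on Balance
parent = [14, 16]
children = [[12, 1], [2, 15]]
print(information_gain(parent, children))   # higher value = more informative split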
● This split (by Balance) reduces entropy substantially
● The Balance attribute provides a lot of information on the value of the target class:
○ good credit vs. bad credit
● The results of splitting by the Residence variable
● The Residence variable has a positive information gain
● However, it is lower than that of Balance (which was 0.37)
● Therefore, the Balance variable is more informative than the Residence variable

Why trees?
● Decision trees (DTs), or classification trees, are one of the most popular data mining tools (along with linear and logistic regression)
● They are:
○ Easy to understand
○ Easy to implement
○ Easy to use
○ Computationally cheap
○ Almost all data mining packages include DTs

Trees as Sets of Rules
● The classification tree is equivalent to this rule set
● Each rule consists of the attribute tests along the path, connected with AND
● IF (Employed = Yes) THEN Class = No Write-off
● IF (Employed = No) AND (Balance < 50k) THEN Class = No Write-off
● IF (Employed = No) AND (Balance ≥ 50k) AND (Age < 45) THEN Class = No Write-off
● IF (Employed = No) AND (Balance ≥ 50k) AND (Age ≥ 45) THEN Class = Write-off
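To make the tree-to-rules equivalence concrete, the rule set above can be written as a plain function; this is only an illustrative sketch (the function and argument names are made up, and Balance is assumed to be in the same units as the 50k threshold):

def classify(employed, balance, age):
    # Each IF below corresponds to one root-to-leaf path of the tree
    if employed == "Yes":
        return "No Write-off"
    if balance < 50_000:        # Employed = No, Balance < 50k
        return "No Write-off"
    if age < 45:                # Employed = No, Balance >= 50k, Age < 45
        return "No Write-off"
    return "Write-off"          # Employed = No, Balance >= 50k, Age >= 45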
● 2+ = 2 “yes”
● 3- = 3 “no”
● For this set, entropy = −(2/5) log₂(2/5) − (3/5) log₂(3/5) ≈ 0.97
● So you should follow the path with the purest children (the largest gain); follow the most pure path
● If entropy = 0, the set is pure (all one class)
● The closer a child’s entropy is to zero, the larger the IG
● If entropy is close to 1, the set is as mixed as possible (for two classes)

Visualizing Segmentations – Probabilities
(Figure: circle = Write-off, plus = No Write-off)
Geometric Interpretation of a Model
● Split over income
● Split over age
● Pattern:
○ IF Balance >= 50K & Age > 45
○ THEN Default = ‘no’
○ ELSE Default = ‘yes’
● A decision tree cannot produce a single oblique (linear) boundary; that kind of boundary would have to come from a linear model such as regression
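One way to see the difference between the two kinds of boundaries is to evaluate both on a grid of points; this is a rough sketch, and the weights in the linear boundary are arbitrary placeholders rather than fitted values:

import numpy as np

# Grid of (balance, age) points covering the plane
balance, age = np.meshgrid(np.linspace(0, 100_000, 200), np.linspace(18, 80, 200))

# The tree-style pattern: two axis-parallel splits carve out a rectangle
tree_region = (balance >= 50_000) & (age > 45)       # Default = 'no' inside, 'yes' outside

# A single linear boundary can only cut the plane with one straight (possibly oblique) line
linear_region = (0.00002 * balance + 0.03 * age) > 2.5

# The two segmentations disagree on part of the plane, so neither can mimic the other exactly
print("fraction of the grid where the regions disagree:",
      (tree_region != linear_region).mean())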
● What alternatives are there to partitioning this way?
● The “true” boundary may not be closely approximated by a linear boundary!

Different Faces of Classification

Classification Problem
● Most general case: the target takes on discrete values that are NOT ordered
● Most common: binary classification, where the target is either 0 or 1

Solutions to Classification
● Classifier model: the model predicts the same set of discrete values as the data had
● Probability estimation: the model predicts a score between 0 and 1 that is meant to be the probability of being in that class

MegaTelCo
Predicting Churn with Tree Induction
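A rough sketch of what tree induction for churn could look like with scikit-learn, using a tiny made-up dataset (the numbers and feature names below are not from the MegaTelCo case); it also shows the two solutions above, a discrete class prediction vs. a probability estimate:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Made-up churn data: columns are [minutes per month, overage charges];
# target is 1 = churned, 0 = stayed
X = np.array([[200, 0], [50, 30], [300, 5], [40, 45], [250, 2], [60, 50]])
y = np.array([0, 1, 0, 1, 0, 1])

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

new_customer = np.array([[80, 25]])
print(tree.predict(new_customer))         # classifier view: a discrete class label
print(tree.predict_proba(new_customer))   # probability-estimation view: scores in [0, 1]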
Cornell Notes
Topic: Course Class 5
Date: Oct 2, 2019
Essential Question:
● [check chapter]
● Read B1 - C34
● DUE: ASSIGNMENT 1 ON OCTOBER 25
Pro-Tip: Highlight what’s important!

Recap: Data Mining vs. Use of the Model