
# Information Gain (IG)

Now we want to measure how informative an attribute is with respect to the target class: how much gain in information it gives us about the value of the target class. Information gain is the information we gain by splitting the set on all values of a single attribute.

- **Parent set:** the original set of examples
- **Children sets:** the result of splitting on the attribute values
- The entropy of each child c_i is weighted by the proportion of instances belonging to that child, p(c_i)

Example: a two-class problem (circle = good credit, star = bad credit). Looking at the figure on the previous slide, the children sets appear purer than the parent set. This split (by Balance) reduces entropy substantially, so the Balance attribute provides a lot of information about the value of the target class: good credit vs. bad credit.

Splitting by the Residence variable also yields a positive information gain, but it is lower than that of Balance (which was 0.37). Therefore, the Balance variable is more informative than Residence.
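The weighted-entropy calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code; the function names and the example labels are made up for demonstration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a collection of class labels."""
    n = len(labels)
    return -sum((count / n) * log2(count / n)
                for count in Counter(labels).values())

def information_gain(parent, children):
    """IG = entropy(parent) minus the entropy of each child c_i,
    weighted by the proportion of instances in that child, p(c_i)."""
    n = len(parent)
    return entropy(parent) - sum((len(child) / n) * entropy(child)
                                 for child in children)

# Hypothetical credit example: a 50/50 parent split into two purer children.
parent = ["good"] * 5 + ["bad"] * 5
children = [["good"] * 4 + ["bad"], ["bad"] * 4 + ["good"]]
print(round(information_gain(parent, children), 3))  # positive gain: purer children
```

A 50/50 parent set has entropy 1.0 (maximally mixed); because both children are 80/20, their weighted entropy is lower, so the gain is positive, mirroring how the Balance split improves on the parent set.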
# Why Trees?

Decision trees (DTs), or classification trees, are one of the most popular data mining tools (along with linear and logistic regression). They are:

- Easy to understand
- Easy to implement
- Easy to use
- Computationally cheap

Almost all data mining packages include DTs.

# Trees as Sets of Rules

The classification tree is equivalent to this rule set. Each rule consists of the attribute tests along a root-to-leaf path, connected with AND:

- IF (Employed = Yes) THEN Class = No Write-off
- IF (Employed = No) AND (Balance < 50k) THEN Class = No Write-off
- IF (Employed = No) AND (Balance ≥ 50k) AND (Age < 45) THEN Class = No Write-off
- IF (Employed = No) AND (Balance ≥ 50k) AND (Age ≥ 45) THEN Class = Write-off

Notation: 2+ means 2 "yes" instances; 3− means 3 "no" instances. When growing the tree, follow the split that yields the purest children (the largest gain). Entropy = 0 means the set is pure (all one class); the closer a child's entropy is to zero, the larger the information gain. Entropy close to 1 means the set is maximally mixed.

# Visualizing Segmentations – Probabilities

circle = Write-off, plus = No Write-off

# Geometric Interpretation of a Model

Split over Income; split over Age. Pattern: IF Balance ≥ 50K AND Age > 45 THEN Default = 'no' ELSE Default = 'yes'. Because each tree split tests a single attribute, a tree's decision boundaries are axis-parallel; it cannot produce a single oblique (diagonal) boundary the way a linear model such as logistic regression can. What alternatives are there to partitioning this way? The "true" boundary may not be closely approximated by a linear boundary!
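The rule set above translates directly into nested conditionals. The sketch below is an illustrative rendering of those four rules as a Python function (the function name and type hints are my own, not from the slides):

```python
def classify(employed: bool, balance: float, age: int) -> str:
    """Classify a customer using the write-off rule set: each return
    corresponds to one root-to-leaf path of the classification tree,
    with the tests along the path ANDed together."""
    if employed:
        return "No Write-off"   # IF (Employed = Yes)
    if balance < 50_000:
        return "No Write-off"   # IF (Employed = No) AND (Balance < 50k)
    if age < 45:
        return "No Write-off"   # ... AND (Balance >= 50k) AND (Age < 45)
    return "Write-off"          # ... AND (Balance >= 50k) AND (Age >= 45)

print(classify(employed=False, balance=60_000, age=50))  # Write-off
```

Note how falling through each `if` implicitly carries the negated test forward, which is exactly why a path through the tree is a conjunction of attribute tests.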
# Different Faces of Classification

**Classification problem.** In the most general case, the target takes on discrete values that are NOT ordered. The most common case is binary classification, where the target is either 0 or 1.

**Solutions to classification:**

- **Classifier model:** the model predicts the same set of discrete values that the data had.
- **Probability estimation:** the model predicts a score between 0 and 1 that is meant to be the probability of being in that class.

# MegaTelCo: Predicting Churn with Tree Induction

Cornell Notes — Topic: Course Class 5. Date: Oct 2, 2019. Essential question: [check chapter]. Read B1–C34. DUE: Assignment 1 on October 25.

Recap: Data Mining vs. Use of the Model. Summary.
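The two solution styles named under "Solutions to Classification" (a hard classifier vs. a probability estimator) can be sketched for a tree-based model: at a leaf, the probability estimate is simply the fraction of positive training instances there, and a hard prediction thresholds that score. The function names and threshold are illustrative assumptions, not from the slides.

```python
def leaf_probability(leaf_labels):
    """Probability estimate at a tree leaf: the fraction of positive
    (label == 1) training instances that landed in this leaf."""
    return sum(leaf_labels) / len(leaf_labels)

def hard_prediction(probability, threshold=0.5):
    """Classifier-style output: collapse the 0-to-1 score back into
    a discrete class by thresholding (0.5 assumed here)."""
    return 1 if probability >= threshold else 0

# A leaf with 3 churners and 1 non-churner in a hypothetical churn tree:
p = leaf_probability([1, 1, 1, 0])
print(p, hard_prediction(p))  # 0.75 1
```

The probability-estimation view keeps more information than the hard classifier, which is why churn problems like MegaTelCo's often rank customers by score rather than just labeling them.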
