Problem L81: Using the golf data from Figure 2.1 of Ch. 2 of Quinlan's book, assume that the outlook values of the last two tuples in the dataset are unknown, viz.

    ?   68   80   false   Play
    ?   70   96   false   Play

a.) Compute gain(outlook) [See Ch. 3]

Solution:

    Outlook    Temp (F)   Humidity (%)   Windy?   Class
    Sunny      75         70             TRUE     Play
    Sunny      80         90             TRUE     Don't Play
    Sunny      85         85             FALSE    Don't Play
    Sunny      72         95             FALSE    Don't Play
    Sunny      69         70             FALSE    Play
    Overcast   72         90             TRUE     Play
    Overcast   83         78             FALSE    Play
    Overcast   64         65             TRUE     Play
    Overcast   81         75             FALSE    Play
    Rain       71         80             TRUE     Don't Play
    Rain       65         70             TRUE     Don't Play
    Rain       75         80             FALSE    Play
    ?          68         80             FALSE    Play
    ?          70         96             FALSE    Play

Pivot table (count of tuples by Outlook and Class, known outlook values only):

    Outlook       Don't Play   Play   Grand Total
    Overcast      0            4      4
    Rain          2            1      3
    Sunny         3            2      5
    Grand Total   5            7      12

When some values of attribute X are unknown, the gain is calculated as:

    gain(X) = F * (info(T) - info_X(T))

where F is the fraction of tuples whose value of X is known, and info(T) and info_X(T) are computed from those tuples only.

Total number of tuples = 14. Tuples with a known outlook value = 12.

    F = 12/14

Classes: C1 = "Don't Play", |C1| = 5; C2 = "Play", |C2| = 7.

    info(T) = -5/12 * log2(5/12) - 7/12 * log2(7/12)
            = 0.9799

Subsets: T1 = "Overcast", |T1| = 4; T2 = "Rain", |T2| = 3; T3 = "Sunny", |T3| = 5. The term 0 * log2(0) is taken to be 0.

    info_Outlook(T) = info_X(T)
            = 4/12 * (-0/4 * log2(0/4) - 4/4 * log2(4/4))
            + 3/12 * (-2/3 * log2(2/3) - 1/3 * log2(1/3))
            + 5/12 * (-3/5 * log2(3/5) - 2/5 * log2(2/5))
            = 0.6341

    gain(Outlook) = F * (info(T) - info_X(T))
                  = 12/14 * (0.9799 - 0.6341)
                  = 0.2963

b.) For the case where outlook = "rain", partition the set further by testing on humidity, where humidity <= 75% creates one branch and humidity > 75% is the second branch. [Note: this is different than the test chosen after rain in Figure 22....
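The arithmetic in part a.) can be checked with a short script. This is only a minimal sketch, assuming the 14 (outlook, class) pairs listed in the solution table, with `None` standing in for the unknown outlook values; the function names `entropy` and `gain_with_unknowns` are illustrative and do not come from Quinlan's book.

```python
from collections import Counter
from math import log2

# (outlook, class) pairs from the solution table; None marks an unknown outlook.
ROWS = (
    [("Sunny", "Play")] * 2 + [("Sunny", "Don't Play")] * 3
    + [("Overcast", "Play")] * 4
    + [("Rain", "Don't Play")] * 2 + [("Rain", "Play")]
    + [(None, "Play")] * 2
)

def entropy(labels):
    """Entropy in bits of a list of class labels (empty classes never appear)."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def gain_with_unknowns(rows):
    """Return (gain, info_T, info_X, F) using only tuples with a known value."""
    known = [(value, cls) for value, cls in rows if value is not None]
    fraction_known = len(known) / len(rows)            # F = 12/14 here
    info_t = entropy([cls for _, cls in known])        # info(T)
    # Group class labels by attribute value, then weight each subset's entropy.
    subsets = {}
    for value, cls in known:
        subsets.setdefault(value, []).append(cls)
    info_x = sum(len(s) / len(known) * entropy(s) for s in subsets.values())
    return fraction_known * (info_t - info_x), info_t, info_x, fraction_known

gain, info_t, info_x, fraction_known = gain_with_unknowns(ROWS)
print(f"F = {fraction_known:.4f}, info(T) = {info_t:.4f}, "
      f"info_X(T) = {info_x:.4f}, gain = {gain:.4f}")
```

Because `Counter` only reports classes that actually occur, the 0 * log2(0) term for the Overcast subset never has to be evaluated; it simply contributes nothing, which matches the convention used in the hand calculation.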
 Fall '09
 Merz