09-Decision_Trees - 10/2/2009 - CS 4700 (Prof. Joachims, Cornell University)

Compacting Instances: Creating Models (Decision Trees)

Training data D for the running restaurant example ("Should we wait?" / BigTip); the number of possible values of each attribute is shown in parentheses:

  #  Food (3)   Chat (2)  Speedy (2)  Price (2)  Bar (2)  BigTip
  1  great      yes       yes         adequate   no       yes
  2  great      no        yes         adequate   no       yes
  3  mediocre   yes       no          high       no       no
  4  great      yes       yes         adequate   yes      yes

Decision Tree Example: BigTip

  Food = great:
  |  Speedy = yes: yes
  |  Speedy = no:
  |  |  Price = adequate: yes
  |  |  Price = high: no
  Food = mediocre: no
  Food = yikes: no

Top-Down Induction of Decision Trees (simplified)

Training data: D = {(x1, y1), ..., (xn, yn)}

  TDIDT(D, c_def)
    IF all examples in D have the same class c
      RETURN a leaf with class c (or class c_def, if D is empty)
    ELSE IF no attributes are left to test
      RETURN a leaf with the majority class c of D
    ELSE
      Pick A as the "best" decision attribute for the next node
      FOR each value v_i of A, create a new descendant of the node:
        D_i = {(x, y) in D : attribute A of x has value v_i}
        Subtree t_i for v_i is TDIDT(D_i, c_def)
      RETURN the tree with A as root and the t_i as subtrees

(A small runnable sketch of this procedure appears below, after the information measures are defined.)

Example: Text Classification

Task: learn a rule that classifies Reuters business news.
Class +: "Corporate Acquisitions"
Class -: all other articles
2000 training instances.
Representation: Boolean attributes indicating the presence of a keyword in the article; 9947 such keywords (more accurately, word "stems").

+  LAROCHE STARTS BID FOR NECO SHARES
   Investor David F. La Roche of North Kingstown, R.I., said he is offering to purchase 170,000 common shares of NECO Enterprises Inc at 26 dlrs each. He said the successful completion of the offer, plus shares he already owns, would give him 50.5 pct of NECO's 962,016 common shares. La Roche said he may buy more, and possibly all, NECO shares. He said the offer and withdrawal rights will expire at 1630 EST/2130 gmt, March 30, 1987.

-  SALANT CORP 1ST QTR FEB 28 NET
   Oper shr profit seven cts vs loss 12 cts. Oper net profit 216,000 vs loss 401,000. Sales 21.4 mln vs 24.9 mln. NOTE: Current year net excludes 142,000 dlr tax credit. Company operating in Chapter 11 bankruptcy.

Example: TDIDT

Running TDIDT on the training data D above: which is the best decision attribute for the root node, A = Food, B = Speedy, or C = Price?

Picking the Best Attribute to Split

Ockham's Razor: all other things being equal, choose the simplest explanation.
Decision tree induction: find the smallest tree that classifies the training data correctly.
Problem: finding the smallest tree is computationally hard.
Approach: use heuristic (greedy) search, e.g. maximum separation or maximum information.

Maximum Information

Information in a set of choices with probabilities p_1, ..., p_k:
  I(p_1, ..., p_k) = - sum_i p_i log2 p_i
E.g. information in a flip of a fair coin: I(1/2, 1/2) = 1 bit.
Information in an unfair (99:1) coin: I(1/100, 99/100) = 0.08 bits.
Information in the full classification of p positive and n negative samples: I(p/(p+n), n/(p+n)).
After classification by attribute A with values v_1, ..., v_k, the remaining information is
  remainder(A) = sum_i (p_i + n_i)/(p + n) * I(p_i/(p_i + n_i), n_i/(p_i + n_i))
Information gain by attribute A:
  Gain(A) = I(p/(p+n), n/(p+n)) - remainder(A)
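To make the TDIDT pseudocode and the information measures above concrete, here is a minimal Python sketch (my own illustration, not code from the lecture; all function and variable names are mine) that induces a tree over the BigTip table, using information gain to pick the "best" attribute:

# Minimal TDIDT sketch over the BigTip table (illustrative only).
from collections import Counter
from math import log2

def entropy(labels):
    """Information in a set of labels: I(p1,...,pk) = -sum_i p_i * log2(p_i)."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Reduction in entropy when the examples are split on attribute `attr`."""
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [y for r, y in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def tdidt(rows, labels, attrs, c_def):
    """Top-down induction: returns a class label (leaf) or a nested dict (inner node)."""
    if not rows:
        return c_def                                   # D is empty -> default class
    if len(set(labels)) == 1:
        return labels[0]                               # all examples have the same class
    if not attrs:
        return Counter(labels).most_common(1)[0][0]    # no attributes left -> majority class
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    majority = Counter(labels).most_common(1)[0][0]
    tree = {best: {}}
    for value in set(r[best] for r in rows):
        sub = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        sub_rows = [r for r, _ in sub]
        sub_labels = [y for _, y in sub]
        tree[best][value] = tdidt(sub_rows, sub_labels,
                                  [a for a in attrs if a != best], majority)
    return tree

# The four BigTip training examples from the table above.
rows = [
    {"Food": "great",    "Chat": "yes", "Speedy": "yes", "Price": "adequate", "Bar": "no"},
    {"Food": "great",    "Chat": "no",  "Speedy": "yes", "Price": "adequate", "Bar": "no"},
    {"Food": "mediocre", "Chat": "yes", "Speedy": "no",  "Price": "high",     "Bar": "no"},
    {"Food": "great",    "Chat": "yes", "Speedy": "yes", "Price": "adequate", "Bar": "yes"},
]
labels = ["yes", "yes", "no", "yes"]
print(tdidt(rows, labels, ["Food", "Chat", "Speedy", "Price", "Bar"], "yes"))

On these four examples the sketch returns a one-level tree rooted at Food (Speedy and Price happen to tie with the same gain on this tiny sample), which matches the root of the example tree above.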
Learning curve: use cross validation.
Continuous variables? Look for the optimal split point.
Spurious attributes? Cross validation.

Which Attribute is "Best"?

Heuristics:
- Pick the split that decreases the training error the most.
- Pick the split that maximizes information (Information Gain).
- Other statistical tests.

Decision Tree for "Corporate Acq."

  vs = 1: -
  vs = 0:
  |  export = 1: -
  |  export = 0:
  |  |  rate = 1:
  |  |  |  stake = 1: +
  |  |  |  stake = 0:
  |  |  |  |  debenture = 1: +
  |  |  |  |  debenture = 0:
  |  |  |  |  |  takeover = 1: +
  |  |  |  |  |  takeover = 0:
  |  |  |  |  |  |  file = 0: -
  |  |  |  |  |  |  file = 1:
  |  |  |  |  |  |  |  share = 1: +
  |  |  |  |  |  |  |  share = 0: -
  ... and many more

Total size of the tree: 299 nodes.
Note: word stems expanded for improved readability.

Information Gain

Idea: measure how much information an attribute conveys.
Entropy: the number of bits needed to transmit one label (a measure of disorder). With p the fraction of positive and n the fraction of negative examples in D:
  Entropy(D) = -p log2 p - n log2 n
Information Gain: the reduction in entropy once the attribute's value is known:
  Gain(D, A) = Entropy(D) - sum over values v of A of (|D_v| / |D|) * Entropy(D_v)

How Expressive are Decision Trees?

What functions h: X -> Y can a decision tree represent?
If X is finite (only a finite number of instances), decision trees can represent any function over the instance space X.
What if X is not finite (e.g. integer-valued attributes)?
What if X is not discrete (e.g. real-valued attributes)?
What if the data contains noise? In the most extreme case, examples can have the same attribute values but different labels.

TDIDT Extensions

Numerical (continuous) attributes: use > and < in attribute tests (e.g. age < 40: young; age >= 40: ancient).

Finite attributes with many values. Example: the target concept is "brakes defect", the instances are all cars in the US, and the attributes are Manufacturer (3 values) and VIN (100,000,000 values). Which attribute will Information Gain select? Remedy: GainRatio (see the sketch after this section).

Numerical (continuous) target attribute (regression): e.g. pick the attribute test so that the target values become more similar, and predict the mean value of the examples in each leaf.

Early stopping and pruning.
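To illustrate the Manufacturer-vs-VIN issue and the GainRatio remedy, here is a small Python sketch (my own illustration with a made-up 16-car sample, not data from the lecture) comparing raw information gain with the C4.5-style gain ratio, which divides the gain by the split information (the entropy of the attribute's own value distribution):

# Information Gain vs. GainRatio on a hypothetical "brakes defect" sample.
from collections import Counter
from math import log2

def entropy(values):
    """I(p1,...,pk) for the empirical distribution of `values`."""
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in Counter(values).values())

def info_gain(attr_values, labels):
    """Entropy reduction when splitting `labels` by the parallel attribute values."""
    remainder = 0.0
    for v in set(attr_values):
        subset = [y for x, y in zip(attr_values, labels) if x == v]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def gain_ratio(attr_values, labels):
    """C4.5-style GainRatio: information gain divided by the split information."""
    split_info = entropy(attr_values)     # entropy of the attribute's value distribution
    return info_gain(attr_values, labels) / split_info if split_info > 0 else 0.0

# 16 hypothetical cars: 6 from manufacturer A (5 defects), 5 from B (0), 5 from C (1).
manufacturer = ["A"] * 6 + ["B"] * 5 + ["C"] * 5
defect = ["yes"] * 5 + ["no"] + ["no"] * 5 + ["yes"] + ["no"] * 4
vin = [f"VIN{i:03d}" for i in range(16)]  # a unique identifier per car

for name, attr in [("Manufacturer", manufacturer), ("VIN", vin)]:
    print(f"{name:12s} gain={info_gain(attr, defect):.3f} gain_ratio={gain_ratio(attr, defect):.3f}")

On this toy sample the unique-ID attribute VIN gets the highest raw gain (roughly 0.95 vs. 0.49 for Manufacturer) because it splits the data into pure single-example subsets, but its gain ratio (roughly 0.24) falls below Manufacturer's (roughly 0.31), so the normalized criterion no longer prefers the many-valued attribute.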