Recitation 3: Decision Trees

Copyright © Andrew W. Moore

Slide 1: Decision Trees and Information Gain
Slides are modified based on Prof. Andrew Moore's tutorial at http://www.cs.cmu.edu/~awm/tutorials

Slide 2: Entropy and Information Gain
- Learning an unpruned decision tree recursively
- Training Set Error
- Test Set Error
- Overfitting
- Information Gain of a real-valued input

Slide 3: Entropy
- The term is used in a lot of fields.
- In information theory, the entropy H(X) of a random variable X is the expected number of bits needed to encode a randomly drawn value of X (under the most efficient code).

Slide 4: Bits
You are watching a set of independent random samples of X. You see that X has four possible values, with
P(X=A) = 1/4, P(X=B) = 1/4, P(X=C) = 1/4, P(X=D) = 1/4.
So you might see: BAACBADCDADDDA…
You transmit data over a binary serial link. You can encode each reading with two bits (e.g. A = 00, B = 01, C = 10, D = 11):
0100001001001110110011111100…
Average number of bits per symbol = 2.
Slide 5: Fewer Bits
Someone tells you that the probabilities are not equal:
P(X=A) = 1/2, P(X=B) = 1/4, P(X=C) = 1/8, P(X=D) = 1/8.
It's possible to invent a coding for your transmission that uses fewer bits on average per symbol, for example:
A = 0, B = 10, C = 110, D = 111
Average bits per symbol = 1·(1/2) + 2·(1/4) + 3·(1/8) + 3·(1/8) = 1.75. (This is just one of several such codes.)

Slide 6: General Case
Suppose X can have one of m values V_1, V_2, …, V_m, with P(X=V_1) = p_1, P(X=V_2) = p_2, …, P(X=V_m) = p_m. What's the smallest possible number of bits, on average, per symbol, needed to transmit a stream of symbols drawn from X's distribution? It's H(X) = the entropy of X:

H(X) = -p_1 log2 p_1 - p_2 log2 p_2 - … - p_m log2 p_m = -Σ_{j=1..m} p_j log2 p_j

"High entropy" means X is from a uniform (boring) distribution; "low entropy" means X is from a varied (peaks and valleys) distribution.
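The entropy formula above is easy to check numerically. Here is a minimal Python sketch (not part of the original slides) that reproduces both the 2.0 bits of the uniform code on Slide 4 and the 1.75 bits of the skewed distribution on Slide 5:

```python
from math import log2

def entropy(probs):
    """H(X) = -sum_j p_j * log2(p_j), in bits.
    Terms with p_j == 0 contribute 0 by convention (lim p->0 of p*log p)."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))     # uniform over 4 values -> 2.0 bits
print(entropy([0.5, 0.25, 0.125, 0.125]))    # Slide 5 distribution -> 1.75 bits
```

Note that the uniform distribution maximizes entropy (the "high entropy" case), while a near-deterministic distribution drives it toward 0.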

Slide 7: Recap: Entropy
- Low entropy means something is almost certainly true.
- High entropy means high uncertainty.
- If X is a binary variable, when is H(X) = 1? When is H(X) = 0?

Slide 8: Specific Conditional Entropy H(Y|X=v)
Suppose I'm trying to predict output Y and I have input X, where X = College Major and Y = Likes "Gladiator":

X (Major)   Y (Likes "Gladiator")
Math        Yes
History     No
CS          Yes
Math        No
Math        No
CS          Yes
History     No
Math        Yes

Let's assume this data reflects the true probabilities. E.g., from this data we estimate:
P(LikeG = Yes) = 0.5
P(Major = Math & LikeG = No) = 0.25
P(Major = Math) = 0.5
P(LikeG = Yes | Major = History) = 0
Note: H(X) = 1.5, H(Y) = 1.
Slide 9: Specific Conditional Entropy H(Y|X=v)
Definition of Specific Conditional Entropy:
H(Y|X=v) = the entropy of Y among only those records in which X has value v.
(Using the same data as Slide 8: X = College Major, Y = Likes "Gladiator".)

Slide 10: Specific Conditional Entropy H(Y|X=v)
Example, from the data above:
H(Y|X=Math) = 1
H(Y|X=History) = 0
H(Y|X=CS) = 0
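Computing H(Y|X=v) is just ordinary entropy restricted to the matching records. A small Python sketch (not from the original slides) using the Slide 8 dataset:

```python
from math import log2

# The toy dataset from Slide 8: (X = college major, Y = likes "Gladiator")
data = [("Math", "Yes"), ("History", "No"), ("CS", "Yes"), ("Math", "No"),
        ("Math", "No"), ("CS", "Yes"), ("History", "No"), ("Math", "Yes")]

def entropy(labels):
    """Entropy (in bits) of the empirical distribution of a list of labels."""
    n = len(labels)
    counts = {}
    for v in labels:
        counts[v] = counts.get(v, 0) + 1
    return -sum((c / n) * log2(c / n) for c in counts.values())

def specific_cond_entropy(data, v):
    """H(Y|X=v): entropy of Y among only those records where X == v."""
    return entropy([y for x, y in data if x == v])

print(specific_cond_entropy(data, "Math"))     # 2 Yes, 2 No -> 1.0 bit
print(specific_cond_entropy(data, "History"))  # all No -> 0 bits
```

This matches Slide 10: the Math rows are an even Yes/No split (entropy 1), while History and CS are pure (entropy 0).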

Slide 11: Conditional Entropy
Definition of Conditional Entropy:
H(Y|X) = the average specific conditional entropy of Y
       = Σ_j P(X=v_j) · H(Y|X=v_j)
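The definition above averages the specific conditional entropies, weighted by how often each value of X occurs. A hedged Python sketch (my illustration, not the original slides' code) on the Slide 8 dataset, also showing the information gain IG(Y|X) = H(Y) - H(Y|X) that the deck is building toward:

```python
from math import log2

# Same toy dataset as Slides 8-10: (X = college major, Y = likes "Gladiator")
data = [("Math", "Yes"), ("History", "No"), ("CS", "Yes"), ("Math", "No"),
        ("Math", "No"), ("CS", "Yes"), ("History", "No"), ("Math", "Yes")]

def entropy(labels):
    """Entropy (in bits) of the empirical distribution of a list of labels."""
    n = len(labels)
    counts = {}
    for v in labels:
        counts[v] = counts.get(v, 0) + 1
    return -sum((c / n) * log2(c / n) for c in counts.values())

def cond_entropy(data):
    """H(Y|X) = sum_v P(X=v) * H(Y|X=v), averaged over the values of X."""
    n = len(data)
    total = 0.0
    for v in {x for x, _ in data}:
        ys = [y for x, y in data if x == v]
        total += (len(ys) / n) * entropy(ys)
    return total

h_y = entropy([y for _, y in data])          # H(Y) = 1.0
h_y_given_x = cond_entropy(data)             # 0.5*1 + 0.25*0 + 0.25*0 = 0.5
print(h_y - h_y_given_x)                     # information gain IG(Y|X) = 0.5
```

A decision-tree learner uses exactly this quantity: at each node it picks the attribute X that maximizes IG(Y|X), i.e. the split that most reduces the entropy of the class label.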