CSE 6740 Lecture 6
How Do I Predict a Discrete Variable? (Classification)

Alexander Gray
agra[email protected]
Georgia Institute of Technology

Today

1. Classification (How can I predict a discrete variable?)
2. Generative classification (What if I reduce classification to density estimation?)
3. Discriminative classification (Can I do classification while avoiding density estimation?)

Classification

How can I predict a discrete variable?

Classification Loss

The most common loss function for classification is zero-one loss:

    L(Y, ĉ(X)) = I(Y ≠ ĉ(X)).    (1)

We can generalize this to specify arbitrary costs for misclassifying one class as another:

    L(Y, ĉ(X)) = C_ab    (2)

where C is a K × K matrix, a = Y, and b = ĉ(X).

Classification Loss

The test error, or expected loss, called the error rate in this case, is

    E = E[L(Y, ĉ(X))]              (3)
      = E_{X,Y}[L(Y, ĉ(X))]        (4)
      = E_X E_{Y|X}[L(Y, ĉ(X))]    (5)

or, for a given x,

    E(x) = E_{Y|X}[L(Y, ĉ(x))].    (6)

Discriminant Function

Suppose Y = {0, 1}. The value f̂*(x) that minimizes E(x) is the regression function, which we now call the discriminant function:

    g(x) = E(Y | X = x)                                      (7)
         = ∫ y f(y | x) dy                                   (8)
         = 1 · P(Y = 1 | X = x) + 0 · P(Y = 0 | X = x)       (9)
         = P(Y = 1 | X = x)                                  (10)
         = f(x | Y = 1) P(Y = 1)
           / [f(x | Y = 1) P(Y = 1) + f(x | Y = 0) P(Y = 0)] (11)
         = π_1 f_1(x) / (π_1 f_1(x) + π_0 f_0(x)).           (12)
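To make the loss definitions in (1) and (2) concrete, here is a minimal Python sketch, not from the lecture, that computes the empirical error rate under zero-one loss and the average cost under a hypothetical K × K cost matrix C:

```python
import numpy as np

# Hypothetical true labels and predictions for a 3-class problem (K = 3).
y_true = np.array([0, 1, 2, 1, 0, 2])
y_pred = np.array([0, 2, 2, 1, 1, 2])

# Zero-one loss: L(Y, c(X)) = I(Y != c(X)); its average is the error rate.
zero_one = (y_true != y_pred).astype(int)
error_rate = zero_one.mean()

# Cost-matrix loss: L(Y, c(X)) = C[a, b] with a = Y and b = c(X).
# C is K x K; the diagonal (correct predictions) costs nothing, and
# confusing class 0 with class 2 is made deliberately expensive.
C = np.array([[0.0, 1.0, 4.0],
              [1.0, 0.0, 2.0],
              [4.0, 2.0, 0.0]])
costs = C[y_true, y_pred]      # per-example losses via integer indexing
expected_cost = costs.mean()   # empirical analogue of E[L(Y, c(X))]

print(error_rate)     # fraction of misclassified points
print(expected_cost)  # average cost under C
```

Note that with C chosen as all-ones off the diagonal, the average cost reduces exactly to the zero-one error rate, which is why (2) is a strict generalization of (1).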
Bayes Classifier

Making this yield a binary prediction gives the Bayes classifier, or Bayes rule:

    c(x) = { 1 if g(x) > 1/2
           { 0 otherwise                              (13)

         = { 1 if P(Y = 1 | X = x) > P(Y = 0 | X = x)
           { 0 otherwise                              (14)

         = { 1 if π_1 f_1(x) > π_0 f_0(x)
           { 0 otherwise.                             (15)

This is easily generalized to any number of classes K.

Optimal Classification

Keep in mind that this is "Bayes" only in the sense of conditional distributions, not in the sense of Bayesian inference. The Bayes classifier is optimal, i.e. if c′(x) is any other classification rule then

    E[L(Y, c′(X))] ≥ E[L(Y, c(X))].

Generative vs. Discriminative

The set

    {x : P(Y = 1 | X = x) = P(Y = 0 | X = x)}    (16)

is called the decision boundary.

In a generative classifier, we'll model the class-conditional densities f_k(x) explicitly. This means we'll be doing two separate density estimates ...
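As a concrete sketch of the Bayes rule in (13)–(15), the following Python snippet compares π_1 f_1(x) against π_0 f_0(x). The priors and the one-dimensional Gaussian class-conditionals below are hypothetical stand-ins for known densities, not something estimated from data as a generative classifier would do:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical priors and class-conditionals:
# class 0: N(0, 1) with pi_0 = 0.5; class 1: N(2, 1) with pi_1 = 0.5.
pi0, pi1 = 0.5, 0.5
f0 = lambda x: gaussian_pdf(x, 0.0, 1.0)
f1 = lambda x: gaussian_pdf(x, 2.0, 1.0)

def g(x):
    """Discriminant function g(x) = P(Y = 1 | X = x), eq. (12)."""
    num = pi1 * f1(x)
    return num / (num + pi0 * f0(x))

def bayes_classify(x):
    """Bayes rule: predict 1 iff pi_1 f_1(x) > pi_0 f_0(x), i.e. g(x) > 1/2."""
    return 1 if pi1 * f1(x) > pi0 * f0(x) else 0

# With equal priors and equal variances, the decision boundary
# {x : g(x) = 1/2} is the midpoint x = 1 between the two means.
print(bayes_classify(-0.5), bayes_classify(0.9),
      bayes_classify(1.1), bayes_classify(3.0))  # prints: 0 0 1 1
```

Unequal priors shift this boundary: raising π_1 above 1/2 moves the threshold left of the midpoint, since π_1 f_1(x) then wins over a wider range of x.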