Lecture 2: Classification. Perceptron. Sigmoid classifiers.
COMP652 Lecture 2, September 10, 2007

• Classification problems. Error functions
• Perceptron
• Sigmoid classifiers

Classification
• Given a data set $D \subset X \times Y$ where $Y$ is a discrete set (usually with a smallish number of values), find a hypothesis $h \in H$ which predicts “well” the existing data
• If $Y$ has two possible values, e.g. $Y = \{-1, +1\}$ or $Y = \{0, 1\}$, this is called binary classification.
• Can we develop methods for classification as we did for regression?
• What does it take to develop a learning algorithm?

Recall: Three decisions
• What should be the error function?
• What should be the hypothesis class?
• How are we going to find the best hypothesis in the class (the one that minimizes the error function)?

Error functions for binary classification
• One worthy goal is to minimize the number of misclassified examples
• Suppose $Y = \{-1, +1\}$ and the hypotheses $h_w \in H$ also output +1 or -1
• An example $\langle x, y \rangle$ is misclassified if $y \, h_w(x)$ is negative.
• So a reasonable error function is just counting the number of misclassified examples:
  $$J(w) = -\sum_{i \in \text{Misclassified}} y_i \, h_w(x_i)$$
  Each misclassified example contributes $y_i h_w(x_i) = -1$, so the negated sum equals the count. This is called 0-1 loss
• This function is not differentiable, so often we will still use the mean-squared error.

Choosing the hypothesis class
• For regression, we used linear hypotheses (simple, nice)
• Is there an analogue for classification?
• What about linear hypotheses?

Example: Wisconsin data
[Figure: plot of non-recurring (0) / recurring (1) against tumor size (mm?)]
What is the meaning of the output in this case?
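The 0-1 loss defined under "Error functions for binary classification" can be sketched in Python as follows. This is a minimal illustration rather than code from the lecture: the function name `zero_one_loss` and the sample labels and predictions are made up for the example, and both labels and hypothesis outputs are assumed to lie in {-1, +1}.

```python
def zero_one_loss(labels, predictions):
    """Count misclassified examples; labels and predictions are in {-1, +1}."""
    # y * h(x) equals -1 exactly when the prediction disagrees with the label,
    # so negating the sum over misclassified examples yields their count.
    return -sum(y * h for y, h in zip(labels, predictions) if y * h < 0)


# Made-up labels and hypothesis outputs, for illustration only.
labels = [+1, -1, +1, -1]
predictions = [+1, +1, -1, -1]

print(zero_one_loss(labels, predictions))  # two disagreements -> 2
```

The loss only changes when a prediction flips sign, which is one way to see why it cannot be minimized by following a gradient.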
Output of a classifier
• Useful predictions could be:
  – The predicted class
  – The probability that the example belongs to a given class
• Just applying linear regression as is gives us neither

Perceptron
[Figure: perceptron unit with inputs $x_1, \dots, x_n$ and constant input $x_0 = 1$, weights $w_0, w_1, \dots, w_n$, computing $\sum_{i=0}^{n} w_i x_i$; output $o = 1$ if $\sum_{i=0}^{n} w_i x_i > 0$, $-1$ otherwise]
• We can take a linear combination and threshold it:
  $$h_w(x) = \mathrm{sgn}(w^T x) = \begin{cases} +1 & \text{if } w^T x > 0 \\ -1 & \text{otherwise} \end{cases}$$
  This is called a perceptron ....
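The thresholded linear combination above can be sketched in Python as follows. This is a hedged illustration, not the lecture's own code: the function name `perceptron_predict` and the weight values are hypothetical, and the input vector is assumed to exclude the constant input $x_0 = 1$, which the function prepends itself.

```python
def perceptron_predict(w, x):
    """Return +1 if w^T x > 0 and -1 otherwise.

    w holds the weights w_0, ..., w_n; x holds the inputs x_1, ..., x_n.
    """
    augmented = [1.0] + list(x)  # prepend the constant input x_0 = 1
    activation = sum(wi * xi for wi, xi in zip(w, augmented))
    return 1 if activation > 0 else -1


# Hypothetical weights: w_0 (bias), w_1, w_2.
w = [-0.5, 1.0, 1.0]

print(perceptron_predict(w, [0.0, 0.0]))  # activation -0.5 -> -1
print(perceptron_predict(w, [1.0, 0.0]))  # activation  0.5 -> 1
```

Note that an activation of exactly zero falls in the "otherwise" branch and is mapped to -1, matching the strict inequality in the definition.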
This note was uploaded on 09/04/2008 for the course COMP 652 taught by Professor Preicup during the Fall '07 term at McGill.
