Lecture 2: Classification. Perceptron. Sigmoid classifiers.
Classification problems. Error functions
Perceptron
Sigmoid classifiers
COMP652 Lecture 2, September 10, 2007

Classification
Given a data set $D \subset X \times Y$ where $Y$ is a discrete set (usually with a smallish number of values), find a hypothesis $h \in H$ which predicts well the existing data.
If $Y$ has two possible values, e.g. $Y = \{-1, 1\}$ or $Y = \{0, 1\}$, this is called binary classification.
Can we develop methods for classification as we did for regression?
What does it take to develop a learning algorithm?

Recall: Three decisions
What should be the error function?
What should be the hypothesis class?
How are we going to find the best hypothesis in the class (the one that minimizes the error function)?

Error functions for binary classification
One worthy goal is to minimize the number of misclassified examples.
Suppose $Y = \{-1, 1\}$ and the hypotheses $h_w \in H$ also output +1 or -1.
An example $(x, y)$ is misclassified if $y \, h_w(x)$ is negative.
So a reasonable error function is just counting the number of misclassified examples:
$J(w) = -\sum_{i:\ \text{misclassified}} y_i \, h_w(x_i)$
Each misclassified example has $y_i \, h_w(x_i) = -1$, so the negated sum equals the number of errors.
This is called 0-1 loss.
This function is not differentiable, so often we will still use the mean-squared error.

Choosing the hypothesis class
For regression, we used linear hypotheses (simple, nice).
Is there an analogue for classification?
What about linear hypotheses?

Example: Wisconsin data
[Figure: non-recurring (0) / recurring (1) plotted against tumor size (mm?)]
What is the meaning of the output in this case?
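The 0-1 loss from the error-function slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `zero_one_loss` and the label and prediction lists are hypothetical, not part of the lecture.

```python
def zero_one_loss(labels, predictions):
    """Count misclassified examples; labels and predictions are in {-1, +1}."""
    # y * h(x) equals -1 exactly when the prediction disagrees with the label,
    # so negating the sum over those examples yields the number of errors.
    return -sum(y * p for y, p in zip(labels, predictions) if y * p < 0)


# Hypothetical data, chosen only to exercise the function.
labels = [+1, -1, +1, -1]
predictions = [+1, +1, -1, -1]
print(zero_one_loss(labels, predictions))  # prints 2
```

The loss only changes when a prediction flips sign, so it is piecewise constant in the weights. That is one way to see why it gives no useful gradient.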

Output of a classifier
Useful predictions could be:
  The predicted class
  The probability that the example belongs to a given class
Just applying linear regression as is gives us neither.

Perceptron
[Figure: perceptron unit with inputs $x_1, \dots, x_n$ and $x_0 = 1$, weights $w_0, \dots, w_n$, and output $o = 1$ if $\sum_{i=0}^{n} w_i x_i > 0$, $-1$ otherwise]
We can take a linear combination and threshold it:
$h_w(x) = \mathrm{sgn}(w^T x) = \begin{cases} +1 & \text{if } w^T x > 0 \\ -1 & \text{otherwise} \end{cases}$
This is called a perceptron ....
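The thresholded linear combination above might be sketched as follows. This is a minimal illustration, not the course's code: the name `perceptron_predict` and the weight values are hypothetical, and the constant input $x_0 = 1$ is prepended to match the sum starting at $i = 0$ in the diagram.

```python
def perceptron_predict(w, x):
    """Return sgn(w^T x) as +1 or -1, with x_0 = 1 prepended for weight w_0."""
    x_aug = [1.0] + list(x)  # assumption: w[0] multiplies the constant input
    activation = sum(wi * xi for wi, xi in zip(w, x_aug))
    # Strict inequality, as on the slide: an activation of exactly 0 maps to -1.
    return 1 if activation > 0 else -1


# Hypothetical weights (w_0, w_1, w_2), for illustration only.
w = [-1.0, 2.0, 0.5]
print(perceptron_predict(w, [1.0, 0.0]))  # activation = 1.0, prints 1
print(perceptron_predict(w, [0.0, 1.0]))  # activation = -0.5, prints -1
```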
 Fall '07
 PREICUP
