2.5 Costs of misclassification

2. An implicit or explicit criterion for separating the classes: we may think of an underlying input/output relation that uses observed attributes to distinguish a random individual from each class.
3. The cost associated with making a wrong classification.

Most techniques implicitly confound these components and, for example, produce a classification rule that is derived conditional on a particular prior distribution and cannot easily be adapted to a change in class frequency. In theory, however, each of these components may be studied individually and the results then formally combined into a classification rule. We shall describe this development below.

2.5.1 Prior probabilities and the Default rule

We need to introduce some notation. Let the classes be denoted $A_i$, $i = 1, \dots, q$, and let the prior probability $\pi_i$ for the class $A_i$ be:

$$\pi_i = P(A_i)$$

It is always possible to use the no-data rule: classify any new observation as class $A_d$, irrespective of the attributes of the example. This no-data, or default, rule may even be adopted in practice if the cost of gathering the data is too high. Thus, banks may give credit to all their established customers for the sake of good customer relations: here the cost of gathering the data is the risk of losing customers. The default rule relies only on knowledge of the prior probabilities, and clearly the decision rule that has the greatest chance of success is to allocate every new observation to the most frequent class, that is, the class with the largest $\pi_i$. However, if some classification errors are more serious than others, we adopt the minimum risk (least expected cost) rule, and the class $A_d$ is the one with the least expected cost (see below).

2.5.2 Separating classes

Suppose we are able to observe data $x$ on an individual, and that we know the probability distribution of $x$ within each class $A_i$ to be $P(x \mid A_i)$. Then for any two classes $A_i$ and $A_j$ the likelihood ratio

$$\frac{P(x \mid A_i)}{P(x \mid A_j)}$$

provides the theoretically optimal form for discriminating between the classes on the basis of the data $x$.
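As a minimal sketch of the likelihood ratio $P(x \mid A_i)/P(x \mid A_j)$, the Python below assumes two classes whose single attribute $x$ is normally distributed within each class. The normal form, the means and standard deviations, and the helper names `normal_pdf` and `likelihood_ratio` are illustrative assumptions, not taken from the text. Comparing the ratio with 1 only asks which class makes the observed $x$ more probable; prior probabilities and misclassification costs are deliberately left out.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of the univariate normal distribution N(mean, sd**2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def likelihood_ratio(x: float, params_i: tuple, params_j: tuple) -> float:
    """P(x | A_i) / P(x | A_j), with each class density assumed normal.

    params_i and params_j are hypothetical (mean, sd) pairs for A_i and A_j.
    """
    return normal_pdf(x, *params_i) / normal_pdf(x, *params_j)


# Hypothetical (mean, sd) for each class; not from the text.
CLASS_I = (0.0, 1.0)
CLASS_J = (2.0, 1.0)

for x in (-1.0, 0.5, 3.0):
    ratio = likelihood_ratio(x, CLASS_I, CLASS_J)
    # ratio > 1: x is more probable under A_i; otherwise more probable under A_j.
    more_likely = "A_i" if ratio > 1.0 else "A_j"
    print(f"x = {x:+.1f}  ratio = {ratio:.4f}  more probable under {more_likely}")
```

With these assumed parameters the two standard deviations are equal, so the ratio simplifies to $e^{2 - 2x}$ and passes through 1 at $x = 1$, midway between the two means.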
The majority of techniques featured in this book can be thought of as implicitly or explicitly deriving an approximate form for this likelihood ratio.

2.5.3 Misclassification costs

Suppose the cost of misclassifying a class $A_i$ object as class $A_j$ is $c(i,j)$. Decisions should be based on the principle that the total cost of misclassifications should be minimised: for a new observation, this means minimising the expected cost of misclassification.

Let us first consider the expected cost of applying the default decision rule: allocate all new observations to the class $A_d$, using the suffix $d$ as the label for the decision class. When decision $A_d$ is made for all new examples, a cost of $c(i,d)$ is incurred for class $A_i$ examples, and these occur with probability $\pi_i$. So the expected cost $C_d$ of making decision $A_d$ is:

$$C_d = \sum_i \pi_i \, c(i,d)$$

The Bayes minimum cost rule chooses the class that has the lowest expected cost. To see the relation between the minimum error and minimum cost rules, suppose the cost of misclassifications to be the same for all errors and zero when a class is correctly identified,...
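The expected cost $C_d = \sum_i \pi_i \, c(i,d)$ of each possible default decision, and the minimum-cost choice among them, can be sketched in Python as follows. The two-class credit example (class 0 a good risk, class 1 a bad risk), its priors and its cost matrices are hypothetical numbers chosen only for illustration; classes are indexed from 0 rather than 1, and `expected_costs` and `default_rule` are hypothetical helper names.

```python
def expected_costs(priors, cost):
    """Return [C_0, ..., C_{q-1}] with C_d = sum_i priors[i] * cost[i][d].

    priors[i]: prior probability pi_i of class A_i
    cost[i][d]: cost c(i, d) of assigning a class-A_i object to class A_d
    """
    q = len(priors)
    return [sum(priors[i] * cost[i][d] for i in range(q)) for d in range(q)]


def default_rule(priors, cost):
    """Index d of the no-data decision with the least expected cost (first on ties)."""
    costs = expected_costs(priors, cost)
    return min(range(len(costs)), key=costs.__getitem__)


# Hypothetical credit example: class 0 = good risk, class 1 = bad risk.
priors = [0.9, 0.1]

# Equal costs: every error costs 1, a correct decision costs 0.
zero_one = [[0, 1],
            [1, 0]]

# Unequal costs: granting credit to a bad risk, c(1, 0), costs 20 times
# as much as refusing a good risk, c(0, 1).
asymmetric = [[0, 1],
              [20, 0]]

print(expected_costs(priors, zero_one), default_rule(priors, zero_one))
print(expected_costs(priors, asymmetric), default_rule(priors, asymmetric))
```

In this example, equal costs give $C_0 = 0.1$ and $C_1 = 0.9$, so the cheapest default is the most frequent class (class 0). The asymmetric costs give $C_0 = 2.0$ and $C_1 = 0.9$, so refusing everyone (class 1) becomes the cheaper default even though bad risks are rare.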