BayesRiskNotes

# BayesRiskNotes - So far we have assumed that the penalty of...

This preview shows pages 1–4.

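So far we have assumed that the penalty of misclassifying a class $\omega_1$ example as class $\omega_2$ is the same as the penalty of the reverse error. In general, this is not the case:

- For example, misclassifying a cancer sufferer as a healthy patient is a much more serious problem than the other way around.

This concept can be formalized in terms of a cost function $C_{ij}$:

- $C_{ij}$ represents the cost of choosing class $\omega_i$ when class $\omega_j$ is the true class.

We define the Bayes Risk as the expected value of the cost:

$$\Re = E[C] = \sum_{i=1}^{2}\sum_{j=1}^{2} C_{ij}\,P[\text{choose } \omega_i \text{ and } x \in \omega_j] = \sum_{i=1}^{2}\sum_{j=1}^{2} C_{ij}\,P[x \in R_i \mid \omega_j]\,P[\omega_j]$$

What is the decision rule that minimizes the Bayes Risk?

- First notice that

$$P[x \in R_i \mid \omega_j] = \int_{R_i} P(x \mid \omega_j)\,dx$$

- We can express the Bayes Risk as

$$\Re = \int_{R_1} \big[ C_{11}\,P[\omega_1]\,P(x \mid \omega_1) + C_{12}\,P[\omega_2]\,P(x \mid \omega_2) \big]\,dx + \int_{R_2} \big[ C_{21}\,P[\omega_1]\,P(x \mid \omega_1) + C_{22}\,P[\omega_2]\,P(x \mid \omega_2) \big]\,dx$$

- Then we note that, for either likelihood, one can write:

$$\int_{R_1} P(x \mid \omega_i)\,dx + \int_{R_2} P(x \mid \omega_i)\,dx = \int_{R_1 \cup R_2} P(x \mid \omega_i)\,dx = 1$$

- Merging the last equation into the Bayes Risk expression (adding and subtracting the same two integrals over $R_1$) yields

$$\begin{aligned}
\Re ={} & C_{11}P[\omega_1]\int_{R_1} P(x \mid \omega_1)\,dx + C_{12}P[\omega_2]\int_{R_1} P(x \mid \omega_2)\,dx \\
& + C_{21}P[\omega_1]\int_{R_2} P(x \mid \omega_1)\,dx + C_{22}P[\omega_2]\int_{R_2} P(x \mid \omega_2)\,dx \\
& + C_{21}P[\omega_1]\int_{R_1} P(x \mid \omega_1)\,dx + C_{22}P[\omega_2]\int_{R_1} P(x \mid \omega_2)\,dx \\
& - C_{21}P[\omega_1]\int_{R_1} P(x \mid \omega_1)\,dx - C_{22}P[\omega_2]\int_{R_1} P(x \mid \omega_2)\,dx
\end{aligned}$$

- Now we cancel out all the integrals over $R_2$:

$$\Re = C_{21}P[\omega_1] + C_{22}P[\omega_2] + (C_{12} - C_{22})\,P[\omega_2]\int_{R_1} P(x \mid \omega_2)\,dx - (C_{21} - C_{11})\,P[\omega_1]\int_{R_1} P(x \mid \omega_1)\,dx$$

- The first two terms are constant as far as our minimization is concerned, since they do not depend on $R_1$, so we will be seeking a decision region $R_1$ that minimizes:

$$R_1 = \arg\min_{R_1} \int_{R_1} \big[ (C_{12} - C_{22})\,P[\omega_2]\,P(x \mid \omega_2) - (C_{21} - C_{11})\,P[\omega_1]\,P(x \mid \omega_1) \big]\,dx = \arg\min_{R_1} \int_{R_1} g(x)\,dx$$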
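Let's forget about the actual expression of $g(x)$ to develop some intuition for what kind of decision region $R_1$ we are looking for.

- Intuitively, we will select for $R_1$ those regions that minimize the integral $\int_{R_1} g(x)\,dx$.
  - In other words, those regions where $g(x) < 0$.

[Figure: $g(x)$, with $R_1 = R_{1A} \cup R_{1B} \cup R_{1C}$ marking the intervals where $g(x) < 0$]

- So we will choose $R_1$ such that

$$(C_{21} - C_{11})\,P[\omega_1]\,P(x \mid \omega_1) > (C_{12} - C_{22})\,P[\omega_2]\,P(x \mid \omega_2)$$

- And rearranging (assuming $C_{21} > C_{11}$, i.e. an error costs more than a correct decision):

$$\frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \frac{(C_{12} - C_{22})\,P[\omega_2]}{(C_{21} - C_{11})\,P[\omega_1]}$$

- Therefore, minimization of the Bayes Risk also leads to a Likelihood Ratio Test (LRT).

Consider a classification problem with two classes defined by the following likelihood functions:

$$P(x \mid \omega_1) = \frac{1}{\sqrt{2\pi}\,\sqrt{3}}\, e^{-\frac{1}{2}\frac{x^2}{3}} \qquad P(x \mid \omega_2) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x-2)^2}$$

- Sketch the two densities.
- What is the likelihood ratio?
- Assume $P[\omega_1] = P[\omega_2] = 0.5$, $C_{11} = C_{22} = 0$, $C_{12} = 1$ and $C_{21} = 3^{1/2}$. Determine a decision rule that minimizes the Bayes Risk.

[Figure: plot of the two densities]

With these priors and costs the LRT threshold is $1/\sqrt{3}$, which cancels the normalization factor of $P(x \mid \omega_1)$:

$$\Lambda(x) = \frac{\frac{1}{\sqrt{2\pi}\sqrt{3}}\, e^{-\frac{1}{2}\frac{x^2}{3}}}{\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x-2)^2}} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \frac{1}{\sqrt{3}}
\;\Rightarrow\;
\frac{e^{-\frac{1}{2}\frac{x^2}{3}}}{e^{-\frac{1}{2}(x-2)^2}} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} 1$$

Taking logarithms and multiplying by 6:

$$-\frac{1}{2}\frac{x^2}{3} + \frac{1}{2}(x-2)^2 \underset{\omega_2}{\overset{\omega_1}{\gtrless}} 0
\;\Rightarrow\;
2x^2 - 12x + 12 \underset{\omega_2}{\overset{\omega_1}{\gtrless}} 0
\;\Rightarrow\;
x = 4.73,\ 1.27$$

That is, we choose $\omega_1$ when $x < 1.27$ or $x > 4.73$, and $\omega_2$ otherwise.

The LRT decision rule that minimizes the Bayes Risk is commonly called the Bayes Criterion:

$$\Lambda(x) = \frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \frac{(C_{12} - C_{22})\,P[\omega_2]}{(C_{21} - C_{11})\,P[\omega_1]} \qquad \text{Bayes Criterion}$$

- Many times we will simply be interested in minimizing the probability of error, which is a special case of the Bayes Criterion that uses the so-called symmetrical or zero-one cost function ($C_{ij} = 0$ for $i = j$ and $C_{ij} = 1$ for $i \neq j$). This version of the LRT decision rule is referred to as the Maximum A Posteriori Criterion, since it seeks to maximize the posterior $P(\omega_i \mid x)$:

$$\Lambda(x) = \frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \frac{P[\omega_2]}{P[\omega_1]}
\;\Leftrightarrow\;
\frac{P(\omega_1 \mid x)}{P(\omega_2 \mid x)} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} 1 \qquad \text{Maximum A Posteriori (MAP) Criterion}$$

- Finally, for the case of equal priors $P[\omega_i] = 1/2$ and the zero-one cost function, the LRT decision rule is called the Maximum Likelihood Criterion, since it will maximize the likelihood $P(x \mid \omega_i)$:

$$\Lambda(x) = \frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} 1 \qquad \text{Maximum Likelihood (ML) Criterion}$$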
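The two-class example above can be checked numerically. The following is a minimal sketch, not part of the original notes, assuming the likelihoods are Gaussians with mean 0, variance 3 and mean 2, variance 1, equal priors, and costs $C_{11} = C_{22} = 0$, $C_{12} = 1$, $C_{21} = \sqrt{3}$. The function names (`gaussian_pdf`, `bayes_threshold`, `decide`) are hypothetical helpers introduced only for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bayes_threshold(c11, c12, c21, c22, p1, p2):
    """LRT threshold of the Bayes Criterion (assumes c21 > c11)."""
    return ((c12 - c22) * p2) / ((c21 - c11) * p1)


def decide(x, threshold):
    """Return 1 or 2: the class chosen by the likelihood ratio test at x."""
    # Assumed likelihoods for the example: N(0, 3) for class 1, N(2, 1) for class 2.
    ratio = gaussian_pdf(x, 0.0, 3.0) / gaussian_pdf(x, 2.0, 1.0)
    return 1 if ratio > threshold else 2


# Equal priors and the costs assumed in the example give a threshold of 1/sqrt(3).
threshold = bayes_threshold(0.0, 1.0, math.sqrt(3.0), 0.0, 0.5, 0.5)

# Decision boundaries: roots of 2x^2 - 12x + 12 = 0, i.e. 3 -/+ sqrt(3).
disc = math.sqrt(12.0 ** 2 - 4.0 * 2.0 * 12.0)
boundaries = ((12.0 - disc) / 4.0, (12.0 + disc) / 4.0)

if __name__ == "__main__":
    print("threshold:", threshold)
    print("boundaries:", boundaries)
    for x in (0.0, 3.0, 5.0):
        print(x, "-> class", decide(x, threshold))
```

Evaluating `decide` on either side of each boundary is a quick way to confirm that class 1 is chosen in the two outer regions and class 2 in between.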
Two more decision rules are commonly cited in the related literature:

- The Neyman-Pearson Criterion, used in Detection and Estimation Theory, which also leads to an LRT decision rule, fixes one of the class error probabilities, say $\epsilon_1 < \alpha$, and seeks to minimize the other.
  - For instance, for the sea-bass/salmon classification problem of Lecture 1, there may be some kind of government regulation that we must not misclassify more than 1% of salmon as sea bass.
  - The Neyman-Pearson Criterion is very attractive since it does not require knowledge of priors and cost function.
- The Minimax Criterion, used in Game Theory, is derived from the Bayes Criterion, and seeks to minimize the maximum Bayes Risk.
  - The Minimax Criterion does not require knowledge of the priors, but it needs a cost function.
- For more information on these methods, the reader is referred to "Detection, Estimation and Modulation Theory", by H.L. van Trees, the classical reference in this field.

The decision rule that minimizes $P[\text{error}]$ generalizes very easily to multi-class problems.

- For clarity in the derivation, the probability of error is better expressed in terms of the probability of making a correct assignment:

$$P[\text{error}] = 1 - P[\text{correct}]$$

- The probability of making a correct assignment is

$$P[\text{correct}] = \sum_{i=1}^{C} P(\omega_i) \int_{R_i} P(x \mid \omega_i)\,dx$$

- The problem of minimizing $P[\text{error}]$ is equivalent to that of maximizing $P[\text{correct}]$. Expressing $P[\text{correct}]$ in terms of the posteriors:

$$P[\text{correct}] = \sum_{i=1}^{C} \int_{R_i} P(x \mid \omega_i)\,P(\omega_i)\,dx = \sum_{i=1}^{C} \underbrace{\int_{R_i} P(\omega_i \mid x)\,P(x)\,dx}_{\Im_i}$$

- In order to maximize $P[\text{correct}]$, we will have to maximize each of the integrals $\Im_i$.
In turn, each integral $\Im_i$ will be maximized by choosing the class $\omega_i$ that yields the maximum $P[\omega_i \mid x]$, so we will define $R_i$ to be the region where $P[\omega_i \mid x]$ is maximum.

[Figure: posterior probabilities and the resulting decision regions; vertical axis: Probability]

- Therefore, the decision rule that minimizes $P[\text{error}]$ is the MAP Criterion.

To determine which decision rule yields the minimum Bayes Risk for the multi-class problem we will use a slightly different formulation.

- We will denote by $\alpha_i$ the decision to choose class $\omega_i$.
- We will denote by $\alpha(x)$ the overall decision rule that maps features $x$ into classes $\omega_i$:

$$\alpha(x) \rightarrow \{\alpha_1, \alpha_2, \ldots, \alpha_C\}$$

- The (conditional) risk $\Re(\alpha_i \mid x)$ of assigning a feature $x$ to class $\omega_i$ is

$$\Re(\alpha(x) \rightarrow \alpha_i) = \Re(\alpha_i \mid x) = \sum_{j=1}^{C} C_{ij}\,P(\omega_j \mid x)$$

- And the Bayes Risk associated with the decision rule $\alpha(x)$ is

$$\Re(\alpha(x)) = \int \Re(\alpha(x) \mid x)\,P(x)\,dx$$

- In order to minimize this expression, we will have to minimize the conditional risk $\Re(\alpha(x) \mid x)$ at each point $x$ in the feature space, which in turn is equivalent to choosing $\omega_i$ such that $\Re(\alpha_i \mid x)$ is minimum.

[Figure: conditional risks; vertical axis: Risk] ...
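The multi-class minimum-risk rule can be illustrated with a short sketch. It is not from the original notes: the function names, the posterior values, and the two cost matrices are hypothetical, chosen only to show that a zero-one cost reproduces the MAP decision while an asymmetric cost can change it. Row `i` of `cost` holds $C_{ij}$ for decision $\alpha_i$, and indices are 0-based.

```python
def conditional_risks(cost, posteriors):
    """Conditional risk of each decision i: sum over j of C[i][j] * P(w_j | x)."""
    return [sum(c_ij * p_j for c_ij, p_j in zip(row, posteriors)) for row in cost]


def min_risk_decision(cost, posteriors):
    """Index of the decision with the smallest conditional risk at this x."""
    risks = conditional_risks(cost, posteriors)
    return min(range(len(risks)), key=risks.__getitem__)


# Hypothetical posteriors P(w_j | x) at a single point x, for three classes.
posteriors = [0.5, 0.3, 0.2]

# Zero-one cost: the minimum-risk decision coincides with the MAP decision.
zero_one = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]

# Hypothetical asymmetric cost: choosing class 0 when class 1 is true is very costly.
asymmetric = [
    [0, 10, 1],
    [1, 0, 1],
    [1, 1, 0],
]

if __name__ == "__main__":
    print("zero-one risks:", conditional_risks(zero_one, posteriors))
    print("zero-one decision:", min_risk_decision(zero_one, posteriors))
    print("asymmetric risks:", conditional_risks(asymmetric, posteriors))
    print("asymmetric decision:", min_risk_decision(asymmetric, posteriors))
```

With the zero-one matrix each conditional risk equals one minus the corresponding posterior, which is why minimizing the risk and maximizing the posterior select the same class.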