• So far we have assumed that the penalty of misclassifying a class ω1 example as class ω2 is the same as the penalty of the reverse error. In general, this is not the case:
o For example, misclassifying a cancer sufferer as a healthy patient is a much more serious problem than the other way around.
• This concept can be formalized in terms of a cost function C_ij
o C_ij represents the cost of choosing class ω_i when class ω_j is the true class
• We define the Bayes Risk as the expected value of the cost

    ℜ = E[C] = Σ_{i=1..2} Σ_{j=1..2} C_ij P[choose ω_i and x ∈ ω_j] = Σ_{i=1..2} Σ_{j=1..2} C_ij P[x ∈ R_i | ω_j] P[ω_j]

• What is the decision rule that minimizes the Bayes Risk?
o First notice that P[x ∈ R_i | ω_j] = ∫_{Ri} P(x|ω_j) dx
o We can then express the Bayes Risk as

    ℜ = ∫_{R1} [C11 P[ω1] P(x|ω1) + C12 P[ω2] P(x|ω2)] dx + ∫_{R2} [C21 P[ω1] P(x|ω1) + C22 P[ω2] P(x|ω2)] dx

o Then we note that, for either likelihood, one can write

    ∫_{R1} P(x|ω_i) dx + ∫_{R2} P(x|ω_i) dx = ∫ P(x|ω_i) dx = 1

o Merging the last equation into the Bayes Risk expression (each integral over R2 becomes one minus the corresponding integral over R1) yields

    ℜ = C11 P[ω1] ∫_{R1} P(x|ω1) dx + C12 P[ω2] ∫_{R1} P(x|ω2) dx + C21 P[ω1] [1 − ∫_{R1} P(x|ω1) dx] + C22 P[ω2] [1 − ∫_{R1} P(x|ω2) dx]

o Now we cancel out all the integrals over R2 and group the remaining integrals over R1:

    ℜ = C21 P[ω1] + C22 P[ω2] + ∫_{R1} [(C12 − C22) P[ω2] P(x|ω2) − (C21 − C11) P[ω1] P(x|ω1)] dx

o The first two terms are constant as far as our minimization is concerned, since they do not depend on R1, so we will be seeking the decision region R1 that minimizes

    R1 = argmin_{R1} ∫_{R1} [(C12 − C22) P[ω2] P(x|ω2) − (C21 − C11) P[ω1] P(x|ω1)] dx = argmin_{R1} ∫_{R1} g(x) dx

• Let's forget about the actual expression of g(x) to develop some intuition for what kind of decision region R1 we are looking for
o Intuitively, we will select for R1 those regions that minimize the integral ∫_{R1} g(x) dx
o In other words, those regions where g(x) < 0; in the sketch, R1 = R1A ∪ R1B ∪ R1C, the union of the intervals where g(x) dips below zero, since including any point with g(x) > 0 could only increase the integral
o So we will choose R1 such that (C21 − C11) P[ω1] P(x|ω1) > (C12 − C22) P[ω2] P(x|ω2)
o And rearranging:

    P(x|ω1) / P(x|ω2) > (C12 − C22) P[ω2] / [(C21 − C11) P[ω1]]  ⇒ choose ω1 (else ω2)

o Therefore, minimization of the Bayes Risk also leads to a Likelihood Ratio Test
• Example: consider a classification problem with two classes ω1, ω2 defined by the following likelihood functions:
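The minimum-risk rule just derived can be sketched in code. A minimal illustration; the pair of Gaussian likelihoods and the asymmetric cost matrix below are assumptions chosen for the sketch, not taken from the text:

```python
import math

def gauss_pdf(x, mu, var):
    """Univariate Gaussian density."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def bayes_lrt(x, p1, p2, prior1, prior2, C):
    """Minimum-risk LRT: return 1 or 2, the chosen class.
    C[i][j] is the cost of choosing class i+1 when class j+1 is true."""
    threshold = ((C[0][1] - C[1][1]) * prior2) / ((C[1][0] - C[0][0]) * prior1)
    lam = p1(x) / p2(x)                     # likelihood ratio Lambda(x)
    return 1 if lam > threshold else 2

# Illustrative setup: two Gaussian classes and an asymmetric cost matrix
p1 = lambda x: gauss_pdf(x, 0.0, 1.0)       # class 1 likelihood (assumed)
p2 = lambda x: gauss_pdf(x, 2.0, 1.0)       # class 2 likelihood (assumed)
C = [[0.0, 1.0],    # C11, C12
     [2.0, 0.0]]    # C21, C22: a class-1 miss is twice as costly

print(bayes_lrt(0.0, p1, p2, 0.5, 0.5, C))  # near mu1 -> class 1
print(bayes_lrt(2.5, p1, p2, 0.5, 0.5, C))  # near mu2 -> class 2
```

Note that only the cost *differences* C12 − C22 and C21 − C11 enter the threshold, exactly as in the rearranged inequality above.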
    P(x|ω1) = 1/(√(2π)·√3) · e^{−x²/6}
    P(x|ω2) = 1/√(2π) · e^{−(x−2)²/2}

o Sketch the two densities
o What is the likelihood ratio?
o Assume P[ω1] = P[ω2] = 0.5, C11 = C22 = 0, C12 = 1 and C21 = √3. Determine a decision rule that minimizes the Bayes Risk.
• Solution: the likelihood ratio is

    Λ(x) = P(x|ω1) / P(x|ω2) = (1/√3) · e^{−x²/6 + (x−2)²/2}

and the Bayes threshold is

    (C12 − C22) P[ω2] / [(C21 − C11) P[ω1]] = (1 · 0.5) / (√3 · 0.5) = 1/√3

The factor 1/√3 appears on both sides and cancels, so the rule "choose ω1 when Λ(x) > 1/√3" reduces to

    −x²/6 + (x−2)²/2 > 0  ⇔  2x² − 12x + 12 > 0  ⇒  boundaries at x = 3 ± √3 ≈ 1.27, 4.73

• Therefore we choose ω1 when x < 1.27 or x > 4.73 (the tails, where the wider class-1 density dominates), and ω2 when 1.27 < x < 4.73.
(Figure: the two densities with the decision regions marked on the x axis.)
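The boundary values can be checked numerically. A quick standard-library verification that the likelihood ratio equals the threshold 1/√3 exactly at x = 3 ± √3, and falls on the correct side of it between and beyond the boundaries:

```python
import math

def p1(x):  # class 1: zero-mean Gaussian with variance 3
    return math.exp(-x**2 / 6) / math.sqrt(2 * math.pi * 3)

def p2(x):  # class 2: Gaussian with mean 2 and variance 1
    return math.exp(-(x - 2)**2 / 2) / math.sqrt(2 * math.pi)

threshold = 1 / math.sqrt(3)   # (C12 * P[w2]) / (C21 * P[w1]) = 0.5 / (sqrt(3) * 0.5)

for x in (3 - math.sqrt(3), 3 + math.sqrt(3)):      # the two boundary points
    assert abs(p1(x) / p2(x) - threshold) < 1e-9    # Lambda(x) = threshold there

print(p1(3.0) / p2(3.0) < threshold)   # True: x = 3 lies between the boundaries -> w2
print(p1(6.0) / p2(6.0) > threshold)   # True: x = 6 lies in the right tail -> w1
```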
• The LRT decision rule that minimizes the Bayes Risk is commonly called the Bayes Criterion:

    Λ(x) = P(x|ω1) / P(x|ω2) > (C12 − C22) P[ω2] / [(C21 − C11) P[ω1]]  ⇒ choose ω1 (else ω2)    (Bayes Criterion)

• Many times we will simply be interested in minimizing the probability of error, which is a special case of the Bayes Criterion that uses the so-called symmetric or zero-one cost function (C_ij = 0 for i = j, C_ij = 1 for i ≠ j). This version of the LRT decision rule is referred to as the Maximum A Posteriori (MAP) Criterion, since it seeks to maximize the posterior P(ω_i|x):

    P(ω1|x) > P(ω2|x)  ⇒ choose ω1 (else ω2)    (MAP Criterion)

• Finally, for the case of equal priors P[ω1] = P[ω2] = 1/2 and the zero-one cost function, the LRT decision rule is called the Maximum Likelihood (ML) Criterion, since it amounts to maximizing the likelihood P(x|ω_i):

    P(x|ω1) > P(x|ω2)  ⇒ choose ω1 (else ω2)    (ML Criterion)

• Two more decision rules are commonly cited in the related literature
o The Neyman-Pearson Criterion, used in Detection and Estimation Theory, also leads to an LRT decision rule; it fixes one of the class error probabilities, say ε1 ≤ α, and seeks to minimize the other
o For instance, for the sea bass/salmon classification problem of Lecture 1, there may be some kind of government regulation stating that we must not misclassify more than 1% of salmon as sea bass
o The Neyman-Pearson Criterion is very attractive since it requires knowledge of neither the priors nor the cost function
o The Minimax Criterion, used in Game Theory, is derived from the Bayes Criterion and seeks to minimize the maximum Bayes Risk over the possible priors; the Minimax Criterion does not require knowledge of the priors, but it does need a cost function
o For more information on these methods, the reader is referred to "Detection, Estimation and Modulation Theory" by H.L. Van Trees, the classical reference in this field
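A sketch of the Neyman-Pearson idea. The N(0,1)-vs-N(2,1) likelihood pair below is a hypothetical example (for Gaussians with equal variances the LRT is monotone in x, so the test reduces to a single cutoff), and the 1% constraint mirrors the salmon regulation above:

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def neyman_pearson_cutoff(alpha, lo=-10.0, hi=10.0):
    """Likelihoods N(0,1) (class 1) vs N(2,1) (class 2): the LRT is
    monotone increasing in x, so it reduces to 'choose class 2 iff x > t'.
    Find t so that P[choose 2 | class 1] = P[x > t | class 1] = alpha."""
    for _ in range(100):                 # bisection on the cutoff t
        mid = (lo + hi) / 2
        if 1 - phi(mid) > alpha:         # class-1 error still above the cap
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t = neyman_pearson_cutoff(0.01)          # cap the class-1 error at 1%
print(round(t, 3))                       # the 99th percentile of N(0,1), ~2.326
```

Note that nothing in the computation uses priors or costs: only one class-conditional error rate is pinned, which is exactly why the criterion needs neither.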
• The decision rule that minimizes P[error] generalizes very easily to multi-class problems
o For clarity in the derivation, the probability of error is better expressed in terms of the probability of making a correct assignment: P[error] = 1 − P[correct]
o The probability of making a correct assignment is

    P[correct] = Σ_{i=1..C} P[ω_i] ∫_{Ri} P(x|ω_i) dx

o The problem of minimizing P[error] is equivalent to that of maximizing P[correct]. Expressing P[correct] in terms of the posteriors:

    P[correct] = Σ_{i=1..C} ∫_{Ri} P(x|ω_i) P[ω_i] dx = Σ_{i=1..C} ∫_{Ri} P(ω_i|x) P(x) dx

o In order to maximize P[correct], we will have to maximize each of the integrals; in turn, each integral is maximized by choosing the class ω_i that yields the maximum P(ω_i|x) ⇒ we define R_i to be the region where P(ω_i|x) is maximum
• Therefore, the decision rule that minimizes P[error] is the MAP Criterion
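The multi-class MAP rule above can be sketched directly; the three Gaussian classes and their priors are illustrative assumptions, and the argmax is taken over P(x|ω_i)·P[ω_i] since the common factor 1/P(x) does not change it:

```python
import math

def gauss_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical 3-class problem: (mean, variance, prior) for each class
classes = [(0.0, 1.0, 0.5), (3.0, 1.0, 0.3), (6.0, 1.0, 0.2)]

def map_decide(x):
    """MAP rule: choose the class maximizing P(x|w_i) * P[w_i]."""
    scores = [gauss_pdf(x, mu, var) * prior for mu, var, prior in classes]
    return scores.index(max(scores)) + 1   # classes numbered 1..C

print(map_decide(-0.5), map_decide(3.1), map_decide(7.0))  # one region each
```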
• To determine which decision rule yields the minimum Bayes Risk for the multi-class problem, we will use a slightly different formulation
o We will denote by α_i the decision to choose class ω_i
o We will denote by α(x) the overall decision rule that maps features x into classes: α(x) ∈ {α_1, α_2, ..., α_C}
• The (conditional) risk ℜ(α_i|x) of assigning a feature vector x to class ω_i is

    ℜ(α_i|x) = E[C_ij | x] = Σ_{j=1..C} C_ij P(ω_j|x)

• And the Bayes Risk associated with the decision rule α(x) is

    ℜ(α(x)) = ∫ ℜ(α(x)|x) P(x) dx

• In order to minimize this expression, we will have to minimize the conditional risk ℜ(α(x)|x) at each point x in the feature space, which in turn is equivalent to choosing the α_i for which ℜ(α_i|x) is minimum
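A minimal sketch of this minimum-conditional-risk rule, assuming a hypothetical two-class setup with an asymmetric cost matrix; the posterior enters only through P(x|ω_j)·P[ω_j], since the common factor 1/P(x) does not affect the argmin:

```python
import math

def gauss_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical 2-class problem with an asymmetric cost matrix C[i][j]
likelihoods = [lambda x: gauss_pdf(x, 0.0, 1.0),
               lambda x: gauss_pdf(x, 2.0, 1.0)]
priors = [0.5, 0.5]
C = [[0.0, 1.0],    # cost of alpha_1 when the true class is w1, w2
     [3.0, 0.0]]    # cost of alpha_2 when the true class is w1, w2

def min_risk_decide(x):
    """Choose alpha_i minimizing R(alpha_i|x) = sum_j C[i][j] * P(w_j|x)."""
    post = [p(x) * pr for p, pr in zip(likelihoods, priors)]  # ∝ P(w_j|x)
    risks = [sum(C[i][j] * post[j] for j in range(2)) for i in range(2)]
    return risks.index(min(risks)) + 1

print(min_risk_decide(0.0), min_risk_decide(2.5))  # class 1, then class 2
```

With the zero-one cost matrix this reduces exactly to the MAP rule of the previous section, as the derivation predicts.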
Fall '10
Bahadir Gunturk
Conditional Probability, Estimation theory, Likelihood function, Admissible decision rule