Computing Probabilities from Data
• Various probabilities you will need to compute for the Naive Bayesian Classifier (using MLE here):
$$\hat{y} = \arg\max_y\, P(\mathbf{X} \mid Y)\, P(Y) = \arg\max_y\, \log\!\big[P(\mathbf{X} \mid Y)\, P(Y)\big]$$
$$= \arg\max_y\, \Big[\log P(X_1, X_2, \ldots, X_m \mid Y) + \log P(Y)\Big] = \arg\max_y\, \Big[\sum_{i=1}^{m} \log P(X_i \mid Y) + \log P(Y)\Big]$$
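The Naive Bayes decision rule (argmax over the log prior plus summed log likelihoods) can be sketched as follows. This is a minimal illustration; the class priors and per-feature conditionals are hypothetical hand-set values, not estimates from any real dataset.

```python
import math

# Hypothetical MLE estimates for a toy problem with m = 3 binary features.
prior = {0: 0.6, 1: 0.4}                      # P(Y = y)
cond = {0: [0.2, 0.7, 0.4],                   # cond[y][i] = P(X_i = 1 | Y = y)
        1: [0.8, 0.3, 0.9]}

def predict(x):
    """Naive Bayes in log space: argmax_y [log P(Y=y) + sum_i log P(x_i | y)]."""
    best_y, best_score = None, -math.inf
    for y in prior:
        score = math.log(prior[y])
        for i, xi in enumerate(x):
            p1 = cond[y][i]                   # P(X_i = 1 | Y = y)
            score += math.log(p1 if xi == 1 else 1 - p1)
        if score > best_score:
            best_y, best_score = y, score
    return best_y

print(predict([1, 0, 1]))  # -> 1 (class 1 has the larger log score here)
```

Working in log space avoids numerical underflow from multiplying many small probabilities, which is why the slide takes the log before the argmax.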
$$\hat{P}(Y = 0) = \frac{\#\,\text{instances in class } 0}{\text{total } \#\,\text{instances}}$$

$$\hat{P}(X_i = 0 \mid Y = 0) = \frac{\hat{P}(X_i = 0,\, Y = 0)}{\hat{P}(Y = 0)}
\qquad
\hat{P}(X_i = 0 \mid Y = 1) = \frac{\hat{P}(X_i = 0,\, Y = 1)}{\hat{P}(Y = 1)}$$

$$\hat{P}(X_i = 0,\, Y = 0) = \frac{\#\,\text{instances where } X_i = 0 \text{ and class } 0}{\text{total } \#\,\text{instances}}$$

$$\hat{P}(X_i = 1 \mid Y = 0) = 1 - \hat{P}(X_i = 0 \mid Y = 0)$$
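These counting estimates can be sketched directly in code. A minimal sketch, assuming a small hypothetical dataset of binary feature vectors `X` with binary labels `y` (the data values are made up for illustration):

```python
# Hypothetical toy dataset: 6 instances, 2 binary features, binary labels.
X = [[1, 0], [1, 1], [0, 1], [0, 0], [1, 0], [0, 1]]
y = [1, 1, 0, 0, 1, 0]
n = len(y)

# P-hat(Y = 0) = (# instances in class 0) / (total # instances)
p_y0 = sum(1 for label in y if label == 0) / n

# P-hat(X_i = v, Y = c) = (# instances where X_i = v and class c) / total
def p_joint(i, v, c):
    return sum(1 for xv, label in zip(X, y)
               if xv[i] == v and label == c) / n

# P-hat(X_i = v | Y = c) = P-hat(X_i = v, Y = c) / P-hat(Y = c)
def p_cond(i, v, c):
    p_y = sum(1 for label in y if label == c) / n
    return p_joint(i, v, c) / p_y

print(p_y0)            # -> 0.5 (3 of 6 instances are class 0)
print(p_cond(0, 0, 0)) # -> 1.0 (every class-0 instance here has X_0 = 0)
```

The complement identity from the slide, $\hat{P}(X_i = 1 \mid Y = 0) = 1 - \hat{P}(X_i = 0 \mid Y = 0)$, holds by construction since each binary feature takes exactly one of its two values in every instance.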
From Naive Bayes to Logistic Regression
• Recall the Naive Bayes classifier:
  Predict $\hat{y} = \arg\max_y\, P(\mathbf{X}, Y) = \arg\max_y\, P(\mathbf{X} \mid Y)\, P(Y)$
  Use the assumption that $P(X_1, X_2, \ldots, X_m \mid Y) = \prod_{i=1}^{m} P(X_i \mid Y)$
  We are really modeling the joint probability $P(\mathbf{X}, Y)$
• But for classification, we really care about $P(Y \mid \mathbf{X})$
  Really want to predict $\hat{y} = \arg\max_y\, P(Y \mid \mathbf{X})$
  Modeling the full joint probability $P(\mathbf{X}, Y)$ is just a proxy for this
• So, how do we model $P(Y \mid \mathbf{X})$ directly?
  Welcome our friend: logistic regression!
Logistic Regression
• Model the conditional likelihood $P(Y \mid \mathbf{X})$ directly
  Model this probability with the logistic function:
  $$P(Y = 1 \mid \mathbf{X}) = \frac{1}{1 + e^{-z}}, \quad \text{where } z = w_0 + \sum_{j=1}^{m} w_j X_j$$
  For simplicity define $X_0 = 1$, so $z = \sum_{j=0}^{m} w_j X_j$
  Since $P(Y = 0 \mid \mathbf{X}) + P(Y = 1 \mid \mathbf{X}) = 1$, we obtain:
  $$P(Y = 0 \mid \mathbf{X}) = \frac{e^{-z}}{1 + e^{-z}}, \quad \text{where } z = \sum_{j=0}^{m} w_j X_j$$
• Note: the log-odds is a linear function of the inputs $X_j$:
  $$\log \frac{P(Y = 1 \mid \mathbf{X})}{P(Y = 0 \mid \mathbf{X})} = \log e^{z} = z = \sum_{j=0}^{m} w_j X_j$$
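The logistic model of $P(Y \mid \mathbf{X})$ can be sketched in a few lines. This is a minimal illustration with a hypothetical hand-picked weight vector (w_0 is the bias term, paired with the constant X_0 = 1); it is not a trained model.

```python
import math

def sigmoid(z):
    """Logistic function: 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights [w_0, w_1, w_2]; w_0 is the bias (X_0 = 1).
w = [0.5, 2.0, -1.0]

def p_y1_given_x(x):
    """P(Y = 1 | X) = sigmoid(z), with z = w_0 + sum_j w_j * x_j."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return sigmoid(z)

p1 = p_y1_given_x([1.0, 0.0])  # z = 0.5 + 2.0*1.0 + (-1.0)*0.0 = 2.5
p0 = 1.0 - p1                  # P(Y = 0 | X) = e^{-z} / (1 + e^{-z})
print(math.log(p1 / p0))       # log-odds recovers z = 2.5, linear in the inputs
```

Recomputing the log-odds from the two probabilities recovers $z$ exactly, which is the linearity property noted above.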
The Logistic Function
$$f(z) = \frac{1}{1 + e^{-z}}$$
Note: inflection point at $z = 0$; $f(0) = 0.5$.
[Figure: plot of the logistic function $f(z)$ vs. $z$ — want to distinguish $y = 1$ (blue) points from $y = 0$ (red) points]