... are not conditionally independent, i.e.

$$P(X \mid Y) \neq P(x_1 \mid Y)\, P(x_2 \mid Y) \cdots P(x_d \mid Y)$$

And yet, one of the most widely used classifiers. Easy to train!
It often performs well even when the assumption is violated.

Domingos, P., & Pazzani, M. (1997). Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier. Machine Learning, 29, 103–130.

[Figure: (left) A small email data set described by count vectors. (right) The same data set described by bit vectors. Source: Peter Flach (University of Bristol), Machine Learning: Making Sense of Data, August 25, 2012, slide 277 / 349.]

What are the parameters of the model?

P(+) = 0.5, P(−) = 0.5
P(a | +) = 0.5, P(a | −) = 0.75
P(b | +) = 0.75, P(b | −) = 0.25
P(c | +) = 0.25, P(c | −) = 0.25

Estimates from n training examples, where $X_{ij}$ is the value of feature j in example i:

$$\hat{P}(y) = \frac{|\{i : y_i = y\}|}{n}$$

$$\hat{P}(x_j \mid y) = \frac{\hat{P}(x_j, y)}{\hat{P}(y)} = \frac{|\{i : X_{ij} = x_j,\ y_i = y\}| / n}{|\{i : y_i = y\}| / n}$$
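The relative-frequency estimates for the class prior and the per-feature conditionals can be sketched in a few lines of Python. This is only an illustration: the helper names `estimate_prior` and `estimate_conditional` are hypothetical, and the bit-vector data below is made up so that the estimates reproduce the parameter values quoted for the email example; it is not the original data set.

```python
from collections import Counter

# Hypothetical bit-vector data for features (a, b, c), constructed so the
# estimates match the quoted parameter values. Not the original data set.
X = [
    (1, 1, 0), (0, 1, 1), (1, 0, 0), (0, 1, 0),   # class "+"
    (1, 0, 0), (1, 1, 0), (1, 0, 1), (0, 0, 0),   # class "-"
]
y = ["+"] * 4 + ["-"] * 4


def estimate_prior(y):
    """P_hat(y) = |{i : y_i = y}| / n."""
    n = len(y)
    return {label: count / n for label, count in Counter(y).items()}


def estimate_conditional(X, y, j, value=1):
    """P_hat(x_j = value | y) = |{i : X_ij = value, y_i = y}| / |{i : y_i = y}|."""
    class_counts = Counter(y)
    joint_counts = Counter(label for row, label in zip(X, y) if row[j] == value)
    return {label: joint_counts[label] / class_counts[label] for label in class_counts}


prior = estimate_prior(y)
cond = {name: estimate_conditional(X, y, j) for j, name in enumerate("abc")}

print(prior)      # {'+': 0.5, '-': 0.5}
print(cond["a"])  # {'+': 0.5, '-': 0.75}
```

Dividing the joint count by the class count is the same as dividing the two relative frequencies, because the factors of 1/n cancel.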
When there are few training examples

What if you never see a training example where x1 = a when y = spam?

P(x | spam) = P(a | spam) P(b | spam) P(c | spam) = 0

What to do?
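To make the problem concrete, the sketch below computes the product once from raw counts and once after adding one virtual example per feature value. Everything here is an assumption for illustration: the counts are invented, the features are treated as binary (present or absent), the helper `conditional` is hypothetical, and a pseudo-count of 1 (add-one, or Laplace, smoothing) is just one common choice.

```python
def conditional(count_xy, count_y, n_values, alpha=0.0):
    """Estimate P(x | y) from counts, adding `alpha` virtual examples per feature value."""
    return (count_xy + alpha) / (count_y + alpha * n_values)


# Hypothetical counts: 10 spam emails, and feature a was never seen in spam.
n_spam = 10
present_in_spam = {"a": 0, "b": 6, "c": 4}


def product(alpha):
    """P(a | spam) * P(b | spam) * P(c | spam) for binary features."""
    p = 1.0
    for count in present_in_spam.values():
        p *= conditional(count, n_spam, n_values=2, alpha=alpha)
    return p


unsmoothed = product(alpha=0.0)  # zero, because P(a | spam) = 0/10
smoothed = product(alpha=1.0)    # (1/12) * (7/12) * (5/12), strictly positive

print(unsmoothed, smoothed)
```

With the pseudo-count, a single unseen feature value no longer forces the whole product to zero, so the other features can still influence the decision.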
Add “virtual” examples for which x1 = a when y = spam.

10/29/13

Naïve Bayes for continuous variables

Need to talk about continuous distributions!

Continuous Probability Distributions

The probability of the random variable assuming a value within some given interval from x1 to x2 is defined to be the area under the graph of the probability density function between x1 and x2.

[Figure: density f(x) plotted against x for the Uniform and the Normal/Gaussian distribution, each with the interval from x1 to x2 marked]

Expectations
Discrete variables
Continuous variables
Conditional expectation (discrete)
Approximate expectation (discrete and continuous)

The Gaussian (normal) distribution

Properties of the Gaussian distribution

[Figure: normal density curve with 68.26% of the area within µ ± 1σ, 95.44% within µ ± 2σ, and 99.72% within µ ± 3σ]

Standard Normal Probability Distribution

A random variable having a normal distribution with a mean of 0 and a standard deviation of 1 is said to have a standard normal probability distribution.

Converting to the Standard Normal Distribution

$$z = \frac{x - \mu}{\sigma}$$

We can think of z as a measure of the number of standard deviations x is from µ.

Gaussian Parameter Estimation

Likelihood function

Maximum (Log) Likelihood Example

Gaussian models

Gaussian Naïve Bayes

Assume we have data that belongs to three classes, and assume
a likelihood that follows a Gaussian distribution.

Likelihood function:

$$P(X_i = x \mid Y = y_k) = \frac{1}{\sigma_{ik}\sqrt{2\pi}} \exp\left(-\frac{(x - \mu_{ik})^2}{2\sigma_{ik}^2}\right)$$

Need to estimate mean and variance for each feature in each class.

Summary

Naïve Bayes classifier:
• What’s the assumption
• Why we make it
• How we learn it

Naïve Bayes for discrete data
Gaussian naïve Bayes
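The Gaussian naïve Bayes recipe (estimate a mean and a variance for each feature in each class, then pick the class with the largest prior times product of Gaussian likelihoods) can be sketched as follows. This is a minimal illustration under stated assumptions, not the course's implementation: the function names `fit`, `log_gaussian`, and `predict`, the three-class toy data, and the small variance floor are all hypothetical choices.

```python
import math
from collections import defaultdict


def fit(X, y):
    """Estimate, per class, the prior and a mean and variance for each feature."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n_k = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n_k for col in columns]
        # Maximum likelihood variance (divide by n_k). The floor is an
        # assumption added here to avoid dividing by zero later.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n_k, 1e-9)
            for col, m in zip(columns, means)
        ]
        model[label] = (n_k / len(y), means, variances)
    return model


def log_gaussian(x, mean, var):
    """Log of the Gaussian density with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(model, x):
    """Return the class maximizing log P(y) + sum_i log P(x_i | y)."""
    def score(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            log_gaussian(xi, m, v) for xi, m, v in zip(x, means, variances)
        )
    return max(model, key=score)


# Hypothetical two-feature data drawn from three well-separated classes.
X = [(1.0, 2.1), (1.2, 1.9), (0.8, 2.0),
     (5.0, 5.2), (5.1, 4.8), (4.9, 5.0),
     (9.0, 1.0), (9.2, 1.1), (8.8, 0.9)]
y = ["A"] * 3 + ["B"] * 3 + ["C"] * 3

model = fit(X, y)
print(predict(model, (1.1, 2.0)))  # A
print(predict(model, (5.0, 5.0)))  # B
print(predict(model, (9.1, 1.0)))  # C
```

Working with log densities turns the product of likelihoods into a sum, which avoids numerical underflow when there are many features.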
Fall '08, Anderson, C, Machine Learning