bayesian_filtering - 1 Introductory Comments First I would...


1 Introductory Comments

First, I would like to point out that I got this material from two sources: the first was a page from Paul Graham’s website at, and the second was a paper by I. Androutsopoulos, J. Koutsias, K. V. Chandrinos, and C. D. Spyropoulos, titled An Experimental Comparison of Naive Bayesian and Keyword-Based Anti-Spam Filtering with Personal E-mail Messages, which appeared in the Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pages 160-167).

The Graham paper is interesting, but it is written for readers with almost no mathematical background, and it doesn’t explain the math behind the algorithm. Graham’s paper does give a link to a page describing the math, but that linked page does not do an adequate job either, since it does not place the result proved and used in its proper Bayesian context. In these notes I will give a more formal treatment, and will be explicit about the “conditional independence” assumptions that one makes.

2 Bayesian Probability

In this section I will prove a few basic results that we will use. Some of these results are proved in your book, but I will prove them here again anyway, to make these notes self-contained. First, we have Bayes’s Theorem:

Theorem (Bayes’s Theorem). Suppose that $S$ is a sample space, and $\Sigma$ is a $\sigma$-algebra on $S$ having probability measure $P$. Further, suppose that we have a partition of $S$ into (disjoint) events $C_1, C_2, \ldots, C_k$; that is,

$$S = \bigcup_{i=1}^{k} C_i, \quad \text{and, for } i \neq j, \quad C_i \cap C_j = \emptyset.$$

Then, for any $i = 1, 2, \ldots, k$, we have

$$P(C_i \mid A) = \frac{P(A \mid C_i)\, P(C_i)}{\sum_{j=1}^{k} P(A \mid C_j)\, P(C_j)}.$$

Proof. The proof is really obvious, once you know what everything means. First, we note that $A$ can be partitioned as follows:

$$A = \bigcup_{j=1}^{k} (A \cap C_j),$$

where we notice that the sets $A \cap C_j$ are all disjoint, since the sets $C_j$ are disjoint. Thus, we have

$$P(A) = \sum_{j=1}^{k} P(A \cap C_j) = \sum_{j=1}^{k} P(A \mid C_j)\, P(C_j). \tag{1}$$

The second equality here just follows from the definition of conditional probability

$$P(C \mid D) = \frac{P(C \cap D)}{P(D)}.$$

...
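As a quick numerical check of Bayes’s Theorem and of equation (1), here is a minimal Python sketch. The function name `posterior` and the two-class numbers (a "spam" class and a "non-spam" class, with A standing for "the message contains some particular word") are hypothetical values chosen only for illustration; they do not come from these notes or from either source.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return [P(C_i | A)] given priors [P(C_i)] and likelihoods [P(A | C_i)].

    Hypothetical helper for illustration: the C_i are assumed to form a
    partition of the sample space, so the priors must sum to 1.
    """
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per class")
    if sum(priors) != 1:
        raise ValueError("priors of a partition must sum to 1")
    # P(A and C_j) = P(A | C_j) P(C_j), by the definition of conditional probability.
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    # Equation (1): P(A) is the sum of P(A | C_j) P(C_j) over the partition.
    p_a = sum(joint)
    if p_a == 0:
        raise ValueError("P(A) is zero, so P(C_i | A) is undefined")
    # Bayes's Theorem: divide each joint term by P(A).
    return [j / p_a for j in joint]


# Assumed example values: P(spam) = 2/5, P(non-spam) = 3/5,
# P(A | spam) = 1/2, P(A | non-spam) = 1/20.
priors = [Fraction(2, 5), Fraction(3, 5)]
likelihoods = [Fraction(1, 2), Fraction(1, 20)]
post = posterior(priors, likelihoods)
print(post)
```

With these assumed numbers, P(A) = 1/5 + 3/100 = 23/100, so the posterior probabilities are 20/23 for the first class and 3/23 for the second. Exact fractions are used instead of floats so that the posteriors sum to exactly 1.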

This note was uploaded on 10/23/2011 for the course MATH 3225 taught by Professor Staff during the Spring '08 term at Georgia Tech.
