bayesian_filtering - 1 Introductory Comments First, I would...


1 Introductory Comments

First, I would like to point out that I got this material from two sources. The first was a page from Paul Graham's website, and the second was a paper by I. Androutsopoulos, J. Koutsias, K. V. Chandrinos, and C. D. Spyropoulos, titled "An Experimental Comparison of Naive Bayesian and Keyword-Based Anti-Spam Filtering with Personal E-mail Messages", which appeared in the Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pages 160-167).

Graham's paper is interesting, but it is written for readers with almost no mathematical background, and it doesn't explain the math behind the algorithm. Graham's paper does give a link to a page describing the math, but that linked page also does not do an adequate job, since it does not place the result it proves and uses in its proper Bayesian context. In these notes I will give a more formal treatment, and will be explicit about the conditional independence assumptions that one makes.

2 Bayesian Probability

In this section I will prove a few basic results that we will use. Some of these results are proved in your book, but I will prove them here again anyway, to make these notes self-contained. First, we have Bayes's Theorem:

Theorem (Bayes's Theorem). Suppose that $S$ is a sample space, and $\Sigma$ is a $\sigma$-algebra on $S$ having probability measure $P$. Further, suppose that we have a partition of $S$ into (disjoint) events $C_1, C_2, \ldots, C_k$; that is,
\[ S = \bigcup_{i=1}^{k} C_i, \quad \text{and, for } i \neq j, \quad C_i \cap C_j = \emptyset. \]
Then, for any event $A$ with $P(A) > 0$ and any $i = 1, 2, \ldots, k$, we have
\[ P(C_i \mid A) = \frac{P(A \mid C_i)\, P(C_i)}{\sum_{j=1}^{k} P(A \mid C_j)\, P(C_j)}. \]

Proof. The proof is really obvious, once you know what everything means. First, we note that $A$ can be partitioned as follows:
\[ A = \bigcup_{j=1}^{k} (A \cap C_j), \]
where we notice that the sets $A \cap C_j$ are all disjoint, since the sets $C_j$ are disjoint.
Thus, we have
\[ P(A) = \sum_{j=1}^{k} P(A \cap C_j) = \sum_{j=1}^{k} P(A \mid C_j)\, P(C_j). \tag{1} \]
The second equality here just follows from the definition of conditional probability
\[ P(C \mid D) = \frac{P(C \cap D)}{P(D)} \]
...
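
To make the formulas concrete, here is a minimal Python sketch of the total-probability sum in (1) and of the posterior $P(C_i \mid A)$ from Bayes's Theorem. The function names, the two-class partition {spam, legitimate}, and all of the numbers are hypothetical and chosen only for illustration; they do not come from either source cited above.

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """Return P(A) = sum_j P(A | C_j) * P(C_j) over a partition C_1, ..., C_k.

    priors[j] is P(C_j) and likelihoods[j] is P(A | C_j).
    """
    return sum(lik * pri for lik, pri in zip(likelihoods, priors))


def posterior(priors, likelihoods, i):
    """Return P(C_i | A) by Bayes's Theorem (i is a 0-based index)."""
    evidence = total_probability(priors, likelihoods)
    if evidence == 0:
        raise ValueError("P(A) must be positive for P(C_i | A) to be defined")
    return likelihoods[i] * priors[i] / evidence


# Hypothetical partition: C_1 = "message is spam", C_2 = "message is legitimate".
# A = "message contains some particular word". All values are made up.
priors = [Fraction(2, 5), Fraction(3, 5)]          # P(C_1), P(C_2)
likelihoods = [Fraction(1, 4), Fraction(1, 100)]   # P(A | C_1), P(A | C_2)

p_a = total_probability(priors, likelihoods)       # equation (1)
p_spam_given_a = posterior(priors, likelihoods, 0)
p_legit_given_a = posterior(priors, likelihoods, 1)

print(p_a, p_spam_given_a, p_legit_given_a)
```

Exact fractions are used instead of floats so that the posteriors over the partition can be checked to sum to exactly 1, which is what the theorem implies.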