1 Introductory Comments

First, I would like to point out that I got this material from two sources. The first was a page from Paul Graham's website at www.paulgraham.com/ffb.html, and the second was a paper by I. Androutsopoulos, J. Koutsias, K. V. Chandrinos, and C. D. Spyropoulos, titled "An Experimental Comparison of Naive Bayesian and Keyword-Based Anti-Spam Filtering with Personal E-mail Messages", which appeared in the Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pages 160-167).

The Graham paper is interesting, but it is written for readers with almost no mathematical background, and it doesn't explain the math behind the algorithm. Graham's paper does give a link to a page describing the math, but that linked page also does not do an adequate job, since it does not place the result it proves and uses in its proper Bayesian context. In these notes I will give a more formal treatment, and will be explicit about the conditional independence assumptions that one makes.

2 Bayesian Probability

In this section I will prove a few basic results that we will use. Some of these results are proved in your book, but I will prove them here again anyway, to make these notes self-contained. First, we have Bayes's Theorem:

Theorem (Bayes's Theorem). Suppose that $S$ is a sample space equipped with a $\sigma$-algebra of events and a probability measure $P$. Further, suppose that we have a partition of $S$ into (disjoint) events $C_1, C_2, \ldots, C_k$; that is,

\[ S = \bigcup_{i=1}^{k} C_i, \quad \text{and, for } i \neq j, \quad C_i \cap C_j = \emptyset. \]

Then, for any event $A$ with $P(A) > 0$ and any $i = 1, 2, \ldots, k$, we have

\[ P(C_i \mid A) = \frac{P(A \mid C_i)\, P(C_i)}{\sum_{j=1}^{k} P(A \mid C_j)\, P(C_j)}. \]

Proof. The proof is straightforward once you know what everything means. First, we note that $A$ can be partitioned as follows:

\[ A = \bigcup_{j=1}^{k} (A \cap C_j), \]

where we notice that the sets $A \cap C_j$ are all disjoint, since the sets $C_j$ are disjoint.
Thus, we have

\[ P(A) = \sum_{j=1}^{k} P(A \cap C_j) = \sum_{j=1}^{k} P(A \mid C_j)\, P(C_j). \tag{1} \]

The second equality here just follows from the definition of conditional probability,

\[ P(C \mid D) = \frac{P(C \cap D)}{P(D)} \]

....
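As a quick numerical check of equation (1) and of the formula in Bayes's Theorem, here is a minimal Python sketch. The two-class partition ("spam" and "legitimate"), the event $A$ ("the message contains some given word"), and all of the probabilities are hypothetical values chosen only for illustration; they do not come from either of the sources cited above. The helper names `total_probability` and `posterior` are likewise just illustrative.

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """Return P(A) = sum_j P(A | C_j) * P(C_j), as in equation (1)."""
    return sum(l * p for l, p in zip(likelihoods, priors))


def posterior(i, priors, likelihoods):
    """Return P(C_i | A) using the formula in Bayes's Theorem."""
    return likelihoods[i] * priors[i] / total_probability(priors, likelihoods)


# Hypothetical partition: C_1 = "spam", C_2 = "legitimate".
# priors[j] plays the role of P(C_j); the priors must sum to 1.
priors = [Fraction(2, 5), Fraction(3, 5)]

# likelihoods[j] plays the role of P(A | C_j), where A is the
# (made-up) event "the message contains some given word".
likelihoods = [Fraction(1, 2), Fraction(1, 20)]

p_a = total_probability(priors, likelihoods)
posteriors = [posterior(i, priors, likelihoods) for i in range(len(priors))]

print("P(A) =", p_a)
print("P(spam | A) =", posteriors[0])
print("P(legitimate | A) =", posteriors[1])
```

Exact fractions are used instead of floating-point numbers so that the posteriors can be seen to sum to exactly 1, which is what the theorem guarantees for a partition.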
 Spring '08
 Staff
 Statistics, Probability
