STAT 200 Chapters 11-13: Probability and Random Variables

Eugenia Yu, UBC Department of Statistics. Not to be copied, used, or revised without explicit written permission from the copyright owner.

Do you ever wonder how email spam filters work? In simple terms, algorithms are developed to screen incoming emails. Each email message is assigned a spam score, based on whether certain spammy words are present and on their positions and occurrences in relation to other words in the message. You can think of the score as a measure of how likely a message is to be spam. A threshold is chosen such that messages whose spam score exceeds the threshold are classified as spam. Otherwise, the messages are classified as non-spam (or “ham”).

Effective spam filters have a low false positive rate (non-spam misclassified as spam) and a low false negative rate (spam misclassified as non-spam). Spam filtering methods are based on probability and statistical theories.
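The thresholding rule described above can be sketched in a few lines of Python. This is only an illustration: the scores, the true labels, and the threshold of 0.5 are made up, and the notes do not say how a real filter computes its scores. The rates are computed with the usual denominators (false positives out of all true non-spam, false negatives out of all true spam), which is an assumption about the intended definition.

```python
# Hypothetical (spam_score, is_truly_spam) pairs; values are illustrative only.
messages = [
    (0.95, True),
    (0.80, True),
    (0.40, True),   # a true spam that received a low score
    (0.70, False),  # a true non-spam that received a high score
    (0.10, False),
    (0.05, False),
]

THRESHOLD = 0.5  # assumed cutoff; a real filter would tune this value


def classify(score, threshold=THRESHOLD):
    """Return True (classified as spam) when the score exceeds the threshold."""
    return score > threshold


# Count the two kinds of misclassification.
false_positives = sum(1 for s, spam in messages if classify(s) and not spam)
false_negatives = sum(1 for s, spam in messages if not classify(s) and spam)

n_spam = sum(1 for _, spam in messages if spam)
n_non_spam = len(messages) - n_spam

false_positive_rate = false_positives / n_non_spam
false_negative_rate = false_negatives / n_spam
print(false_positive_rate, false_negative_rate)
```

Raising the threshold in this sketch would reduce false positives at the cost of more false negatives, which is the trade-off a filter designer has to balance.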

Any incoming email message may be spam or non-spam, but we cannot predict which with certainty until the message arrives. This is an example of a random phenomenon. What is the chance that the next email message is spam? There is a probability associated with each possible outcome (spam or non-spam).
Probability concepts (Chapter 12)

A sample space S is the set of all possible outcomes of a random phenomenon.
e.g., For tossing a coin, the sample space is the set {Head, Tail}. For rolling a die, the sample space is the set {1, 2, 3, 4, 5, 6}.

An event is an outcome, or a set of outcomes, of a random phenomenon. We denote an event by an uppercase letter, e.g., A, B, C.
e.g., Tossing a head is an event. Tossing a tail is another event. Tossing two heads in two tosses is also an event.
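As a small illustration of these definitions, the sketch below represents sample spaces as Python sets and events as subsets of them. The variable names are chosen here for illustration, and event B ("at least one head") is an extra example that does not appear in the notes.

```python
from itertools import product

# Sample spaces for one coin toss and for one roll of a die.
coin = {"Head", "Tail"}
die = {1, 2, 3, 4, 5, 6}

# Sample space for two coin tosses: every ordered pair of outcomes.
two_tosses = set(product(coin, repeat=2))

# An event is a subset of the sample space.
A = {o for o in two_tosses if o == ("Head", "Head")}  # two heads in two tosses
B = {o for o in two_tosses if "Head" in o}            # at least one head (extra example)

print(len(two_tosses), len(A), len(B))
```

Treating events as sets makes it easy to check whether two events overlap, for example with `A & B`.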

The notation P(A) denotes the probability that an event A will occur.

Properties of P(A):

1. 0 ≤ P(A) ≤ 1
   P(A) = 0 implies event A is impossible.
   P(A) = 1 implies event A is certain.
   The larger P(A) is, the more likely it is that event A will occur.
2. The sum of the probabilities of all the non-overlapping events that together make up the sample space is equal to 1.
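The two properties can be checked numerically on a simple case. The sketch below assumes a fair six-sided die, so each face has probability 1/6; that assumption and the names `P` and `even` are introduced here for illustration. Exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# Assumption: a fair six-sided die, so each outcome has probability 1/6.
P = {face: Fraction(1, 6) for face in range(1, 7)}

# Property 1: every probability lies between 0 and 1.
within_bounds = all(0 <= p <= 1 for p in P.values())

# Property 2: the six outcomes do not overlap and cover the sample space,
# so their probabilities sum to 1.
total = sum(P.values())

# The probability of an event made up of several outcomes is the sum of
# the probabilities of those outcomes.
even = {2, 4, 6}
p_even = sum(P[face] for face in even)

print(within_bounds, total, p_even)
```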
Example 1

Consider a spam filter used to screen 1000 incoming email messages:

                          True spam   True non-spam   Total
Classified as spam              570               5     575
Classified as non-spam           30             395     425
Total                           600             400    1000

An email message is randomly chosen from the 1000 messages. What is the probability that

1. it is a spam?
   Answer: (# true spams) / (total # email messages) = 600/1000 = 0.60
2. it is a non-spam?
   Answer:
3. it is misclassified?
   Answer:

4. it is a spam given that it is classified as spam?
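The four questions in Example 1 can be worked directly from the counts in the table. The sketch below is one way to do that in Python; only the answer to question 1 is given in the notes, so the other three values are computed here from the table rather than quoted from the author. The dictionary layout and variable names are illustrative choices.

```python
# Counts from the table in Example 1, keyed by (classification, truth).
counts = {
    ("classified spam", "true spam"): 570,
    ("classified spam", "true non-spam"): 5,
    ("classified non-spam", "true spam"): 30,
    ("classified non-spam", "true non-spam"): 395,
}
total = sum(counts.values())  # 1000 messages

# Column and row totals.
true_spam = sum(n for (_, truth), n in counts.items() if truth == "true spam")
true_non_spam = total - true_spam
classified_spam = sum(n for (label, _), n in counts.items() if label == "classified spam")

# Misclassified messages: classification and truth disagree.
misclassified = (
    counts[("classified spam", "true non-spam")]
    + counts[("classified non-spam", "true spam")]
)

p_spam = true_spam / total              # question 1: 600/1000
p_non_spam = true_non_spam / total      # question 2: 400/1000
p_misclassified = misclassified / total # question 3: 35/1000

# Question 4: restrict attention to the 575 messages classified as spam.
p_spam_given_classified_spam = (
    counts[("classified spam", "true spam")] / classified_spam  # 570/575
)

print(p_spam, p_non_spam, p_misclassified, round(p_spam_given_classified_spam, 4))
```

In question 4 the denominator changes from all 1000 messages to the 575 classified as spam, because the condition narrows the set of messages being considered.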
  • Spring '12
  • no
  • Probability theory, Eugenia Yu, UBC Department of Statistics
