Project_SpamClassification

Project_SpamClassification - MS&E 211 Fall 2011 Linear...

MS&E 211 Fall 2011, Linear and Nonlinear Optimization
Nov 15, 2011, Prof. Yinyu Ye

MS&E211 Project: Spam Classification using SVM

There are $m$ emails, each of which is either spam or non-spam. For each email $i = 1, \ldots, m$, we are given $c_i = 1$ (spam) or $c_i = -1$ (non-spam), and its attributes/features are stored as a column vector $a_i$. Our objective is to find the plane, from the set of separating planes

$\{ (x, x_0) : \forall i,\ a_i^T x + x_0 > 0 \text{ when } c_i = 1,\ a_i^T x + x_0 \le 0 \text{ when } c_i = -1 \}$,

that maximizes the likelihood of the sequence just observed. We assume that, for each email $i$, independent of all others, the probability that it is spam is $g(a_i^T x + x_0)$, where $g(z) = \exp(z) / [1 + \exp(z)]$.

a) Formulate this as an optimization problem with a convex objective function.

In this part we shall implement the SVM model developed above to classify a sample of real-life emails as spam or non-spam. The data contains over 4,500 emails and was obtained from http://archive.ics.uci.edu/ml/datasets/Spambase ...
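
One possible reading of part (a), offered as a sketch rather than the official solution: write $z_i = a_i^T x + x_0$ for the score of email $i$, with $x_0$ the offset term. Since the probability of spam is $g(z_i)$ and the probability of non-spam is $1 - g(z_i) = g(-z_i)$, the likelihood of the observed labels is $\prod_i g(c_i z_i)$. Maximizing it is equivalent to minimizing $\sum_i \log(1 + \exp(-c_i z_i))$, which is convex in $(x, x_0)$. The Python below evaluates that objective and minimizes it by plain gradient descent. Several things here are assumptions not taken from the handout: the function names (`log1p_exp`, `sigmoid`, `neg_log_likelihood`, `fit`), the step size and iteration count, the choice of gradient descent as the solver, and the synthetic two-feature data standing in for Spambase. The separating-plane constraints are left out for simplicity.

```python
import math
import random


def log1p_exp(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    """Numerically stable g(z) = exp(z) / (1 + exp(z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neg_log_likelihood(A, c, x, x0):
    """Sum over emails of log(1 + exp(-c_i * (a_i^T x + x0)))."""
    total = 0.0
    for a_i, c_i in zip(A, c):
        score = sum(a * w for a, w in zip(a_i, x)) + x0
        total += log1p_exp(-c_i * score)
    return total


def fit(A, c, steps=500, lr=0.1):
    """Gradient descent on the average negative log-likelihood (assumed solver)."""
    m, n = len(A), len(A[0])
    x, x0 = [0.0] * n, 0.0
    for _ in range(steps):
        grad_x, grad_x0 = [0.0] * n, 0.0
        for a_i, c_i in zip(A, c):
            score = sum(a * w for a, w in zip(a_i, x)) + x0
            # d/dz log(1 + exp(-c z)) = -c * g(-c z)
            coeff = -c_i * sigmoid(-c_i * score)
            for j in range(n):
                grad_x[j] += coeff * a_i[j]
            grad_x0 += coeff
        x = [w - lr * g / m for w, g in zip(x, grad_x)]
        x0 -= lr * grad_x0 / m
    return x, x0


# Synthetic stand-in data (hypothetical; NOT the Spambase data set).
random.seed(0)
A, c = [], []
for _ in range(100):
    label = random.choice([1, -1])
    A.append([random.gauss(1.0 * label, 1.0), random.gauss(0.5 * label, 1.0)])
    c.append(label)

x, x0 = fit(A, c)
initial_loss = neg_log_likelihood(A, c, [0.0, 0.0], 0.0)
final_loss = neg_log_likelihood(A, c, x, x0)

# Classify with the rule from the handout: spam when a_i^T x + x0 > 0.
predictions = [
    1 if sum(a * w for a, w in zip(a_i, x)) + x0 > 0 else -1 for a_i in A
]
accuracy = sum(p == t for p, t in zip(predictions, c)) / len(c)
print(f"loss {initial_loss:.2f} -> {final_loss:.2f}, accuracy {accuracy:.2f}")
```

At the starting point $x = 0$, $x_0 = 0$ every email contributes $\log 2$ to the objective, which gives a quick sanity check on the implementation. On perfectly separable data the unconstrained objective has no finite minimizer, so a real experiment would likely need a norm bound or regularization term; that choice is outside what the preview text specifies.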

This note was uploaded on 01/16/2012 for the course MS&E 211 at Stanford.
