MS&E 211: Linear and Nonlinear Optimization, Fall 2011
Prof. Yinyu Ye
Nov 15, 2011

MS&E 211 Project: Spam Classification using SVM

There are $m$ emails, each of which is either spam or nonspam. For each email $i = 1, \ldots, m$, we are given $c_i = 1$ (spam) or $c_i = -1$ (nonspam), and its attributes/features are stored as a column vector $a_i$. Our objective is to find the plane, from the set of separating planes

$$\{ (x, x_0) : \forall i,\ a_i^T x + x_0 > 0 \text{ when } c_i = 1,\ a_i^T x + x_0 \le 0 \text{ when } c_i = -1 \},$$

that maximizes the likelihood of the sequence just observed. We assume that, for each email $i$, independent of all others, the probability that it is spam is $g(a_i^T x + x_0)$, where $g(z) = \exp(z) / [1 + \exp(z)]$.

a) Formulate this as an optimization problem with a convex objective function.

In this part we shall implement the SVM model developed above to classify a sample of real-life emails as spam or nonspam. The data set contains over 4,500 emails and was obtained from http://archive.ics.uci.edu/ml/datasets/Spambase ...
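One way to see where a convex objective can come from is to write out the negative log-likelihood of the observed labels under the probability model above. The sketch below is only an illustration, not the handout's intended solution. It assumes labels c_i = +1 for spam and c_i = -1 for nonspam, so that the probability of the observed label of email i is g(c_i (a_i^T x + x_0)), and it leaves out the separating-plane constraints. The function names and the toy data are hypothetical.

```python
import numpy as np


def neg_log_likelihood(A, c, x, x0):
    """Negative log-likelihood of labels c under P(spam) = g(a_i^T x + x0).

    A  : (n_features, m) array whose columns are the feature vectors a_i
    c  : (m,) array of labels, assumed to be +1 (spam) or -1 (nonspam)
    x  : (n_features,) normal vector of the plane
    x0 : scalar offset of the plane
    """
    z = A.T @ x + x0
    # -log g(c_i z_i) = log(1 + exp(-c_i z_i)); logaddexp avoids overflow.
    return float(np.sum(np.logaddexp(0.0, -c * z)))


def neg_log_likelihood_grad(A, c, x, x0):
    """Gradient of neg_log_likelihood with respect to (x, x0)."""
    z = A.T @ x + x0
    # d/dz log(1 + exp(-c z)) = -c / (1 + exp(c z)),
    # and 1 / (1 + exp(u)) = 0.5 * (1 - tanh(u / 2)) is a stable form.
    w = -c * 0.5 * (1.0 - np.tanh(0.5 * c * z))
    return A @ w, float(np.sum(w))


# Hypothetical toy data: one feature, four emails (not from Spambase).
A_toy = np.array([[2.0, 1.0, -1.0, -2.0]])
c_toy = np.array([1.0, 1.0, -1.0, -1.0])

# At x = 0, x0 = 0 every term is log(2), so the value is 4 * log(2).
value_at_origin = neg_log_likelihood(A_toy, c_toy, np.zeros(1), 0.0)
```

Each term log(1 + exp(-c_i (a_i^T x + x_0))) is a convex function of an affine expression in (x, x_0), so the sum is convex. Minimizing it is equivalent to maximizing the likelihood.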
This note was uploaded on 01/16/2012 for the course MS&E 211 at Stanford.
