(IJCSIS) International Journal of Computer Science and Information Security, Vol. 13, No. 1, January 2015

Efficient Support Vector Machines for Spam Detection: A Survey

Zahra S. Torabi, Mohammad H. Nadimi-Shahraki, Akbar Nabiollahi
Department of Computer Engineering, Najafabad branch, Islamic Azad University, Isfahan, Iran

Abstract

The increasing volume of spam has become a nuisance for internet users. Spam is commonly defined as unsolicited email messages, and the goal of spam detection is to distinguish spam from legitimate email messages. Most spam can contain viruses, Trojan horses or other harmful software that may lead to failures in computers and networks; it also consumes network bandwidth and storage space and slows down email servers. In addition, it provides a medium for distributing harmful code and offensive content. Because no complete solution to this problem exists, the need for effective spam filters keeps growing. In recent years, machine learning techniques have proved usable for automatic spam filtering. The Support Vector Machine (SVM) is a powerful, state-of-the-art machine learning algorithm and a good option for separating spam from legitimate email. In this article, we consider the evaluation criteria of SVM for spam detection and filtering.

Keywords- support vector machines (SVM); spam detection; classification; spam filtering; machine learning

I. INTRODUCTION

Email, carried by the global network of the internet, has reduced the time and distance needed for communication. As a result, users prefer email for communicating with others and for sending and receiving information.
In fact, spam filtering is an application of email classification that has a high probability of recognizing spam. Spam is an ongoing problem for which no complete solution exists [1]. According to recent research by Kaspersky Laboratory (2014), almost 65.7% of all emails in January were spam. As a result, a huge amount of bandwidth is wasted and an overload occurs when emails are sent. According to the reported statistics, the United States of America, China and South Korea are among the main sources of this spam, with 21.9%, 16.0% and 12.5% respectively. Fig. 1 shows the spam sources by country [2]. Fig. 2 shows the spam sources by geographical area; Asia and North America are the greatest sources, with 49.1% and 22.7% respectively [2]. Recently, methods for separating legitimate emails from spam have developed considerably. In fact, separating spam from legitimate emails can be considered a kind of text classification, because emails are generally textual and each received message has to be assigned a type.
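
To make the text-classification view concrete, the following is a minimal sketch in Python and not a method taken from the surveyed papers. It assumes a bag-of-words representation and a linear SVM trained by subgradient descent on the regularized hinge loss. The six toy messages, the hyperparameters (lam, lr, epochs) and all function names are hypothetical and chosen only for illustration.

```python
# Illustrative sketch: spam detection as text classification with a linear SVM.
# All data, names and hyperparameters below are assumptions, not from the paper.

def tokenize(text):
    """Lowercase the message and split it on whitespace."""
    return text.lower().split()

def build_vocabulary(messages):
    """Map every distinct word in the training messages to a feature index."""
    vocab = {}
    for text in messages:
        for word in tokenize(text):
            vocab.setdefault(word, len(vocab))
    return vocab

def vectorize(text, vocab):
    """Bag-of-words count vector; words outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for word in tokenize(text):
        if word in vocab:
            vec[vocab[word]] += 1.0
    return vec

def decision(x, w, b):
    """Signed score w.x + b of a feature vector."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on lam/2 * ||w||^2 + hinge loss; labels are +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * decision(xi, w, b) < 1:
                # Margin violated: shrink w and step toward the example.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Margin satisfied: only the regularizer contributes.
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(text, vocab, w, b):
    """Assign a type to a message: 'spam' or 'legitimate'."""
    score = decision(vectorize(text, vocab), w, b)
    return "spam" if score >= 0 else "legitimate"

# Toy training set (hypothetical): +1 = spam, -1 = legitimate.
TRAIN = [
    ("win free money now", 1),
    ("free prize click now", 1),
    ("cheap pills free offer", 1),
    ("meeting agenda for monday", -1),
    ("please review the attached report", -1),
    ("lunch on monday with the team", -1),
]

def fit_toy_model():
    """Train on the toy messages and return the vocabulary and SVM parameters."""
    texts = [t for t, _ in TRAIN]
    labels = [label for _, label in TRAIN]
    vocab = build_vocabulary(texts)
    X = [vectorize(t, vocab) for t in texts]
    w, b = train_linear_svm(X, labels)
    return vocab, w, b

if __name__ == "__main__":
    vocab, w, b = fit_toy_model()
    for message in ["free money now", "review the monday report"]:
        print(message, "->", predict(message, vocab, w, b))
```

A real filter would be trained on a labeled email corpus with a richer feature set and a tuned SVM implementation; the sketch only shows how a message becomes a feature vector and how the sign of the SVM score defines its type.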
