Data Mining Methods for Detection of New Malicious Executables

Matthew G. Schultz and Eleazar Eskin
Department of Computer Science, Columbia University
{mgs,eeskin}@cs.columbia.edu

Erez Zadok
Department of Computer Science, State University of New York at Stony Brook
[email protected]

Salvatore J. Stolfo
Department of Computer Science, Columbia University
[email protected]

Abstract

A serious security threat today is malicious executables, especially new, unseen malicious executables that often arrive as email attachments. Thousands of these new malicious executables are created every year. Current anti-virus systems attempt to detect these new malicious programs with heuristics generated by hand. This approach is costly and oftentimes ineffective. In this paper, we present a data-mining framework that detects new, previously unseen malicious executables accurately and automatically. The data-mining framework automatically found patterns in our data set and used these patterns to detect a set of new malicious binaries. Compared with a traditional signature-based method, our method more than doubles the current detection rates for new malicious executables.

1 Introduction

A malicious executable is defined to be a program that performs a malicious function, such as compromising a system's security, damaging a system, or obtaining sensitive information without the user's permission. Using data mining methods, our goal is to automatically design and build a scanner that accurately detects malicious executables before they have been given a chance to run.

Data mining methods detect patterns in large amounts of data, such as byte code, and use these patterns to detect future instances in similar data. Our framework uses classifiers to detect new malicious executables.
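The introduction describes classifiers that learn patterns from byte code and apply them to binaries they have not seen before. The following is a minimal sketch of that idea, assuming overlapping two-byte sequences as features and a multinomial Naive Bayes model with Laplace smoothing. Both choices are ours for illustration, as are the hypothetical names `byte_ngrams` and `NaiveBayesDetector` and the toy byte strings; this preview does not specify the paper's features or algorithms, so this is not the authors' implementation.

```python
import math
from collections import Counter


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count the overlapping n-byte sequences in a binary, keyed by hex string."""
    return Counter(data[i:i + n].hex() for i in range(len(data) - n + 1))


class NaiveBayesDetector:
    """Multinomial Naive Bayes over byte n-gram counts (illustrative sketch)."""

    LABELS = ("malicious", "benign")

    def __init__(self, n: int = 2):
        self.n = n
        self.gram_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, samples):
        """samples: iterable of (bytes, label) pairs, each label in LABELS."""
        for data, label in samples:
            grams = byte_ngrams(data, self.n)
            self.gram_counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)

    def _log_score(self, grams: Counter, label: str) -> float:
        # Log prior: fraction of training binaries that carry this label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        total = sum(self.gram_counts[label].values())
        vocab_size = len(self.vocab)
        for gram, count in grams.items():
            # Laplace-smoothed log likelihood of the gram, once per occurrence.
            likelihood = (self.gram_counts[label][gram] + 1) / (total + vocab_size)
            score += count * math.log(likelihood)
        return score

    def classify(self, data: bytes) -> str:
        """Return the label with the highest posterior log score."""
        grams = byte_ngrams(data, self.n)
        return max(self.LABELS, key=lambda label: self._log_score(grams, label))


if __name__ == "__main__":
    detector = NaiveBayesDetector(n=2)
    detector.train([
        (b"\xde\xad\xbe\xef\xde\xad", "malicious"),
        (b"\xbe\xef\xde\xad\xbe\xef", "malicious"),
        (b"\x00\x01\x02\x03\x04\x05", "benign"),
        (b"\x01\x02\x03\x04\x05\x06", "benign"),
    ])
    # This byte string occurs in no training sample, but it shares
    # several two-byte sequences with the malicious examples.
    print(detector.classify(b"\xad\xbe\xef\x00\xde\xad"))  # malicious
```

The point of the sketch is generalization: the classifier never stores whole binaries, only sequence statistics, so a binary that matches no training sample exactly can still be labeled by the sequences it shares with earlier examples.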
A classifier is a rule set, or detection model, generated by a data mining algorithm trained over a given set of training data.

One of the primary problems faced by the virus community is to devise methods for detecting new malicious programs that have not yet been analyzed. Eight to ten malicious programs are created every day, and most cannot be accurately detected until signatures have been generated for them. During this time period, systems protected by signature-based algorithms are vulnerable to attacks.

Malicious executables are also used to carry out many types of intrusions. In the DARPA 1999 intrusion detection evaluation, many of the attacks on the Windows platform were caused by malicious programs. Recently, a malicious piece of code created a hole in Microsoft's internal network. That attack was initiated by a malicious executable that opened a back door into Microsoft's internal network, resulting in the theft of Microsoft's source code....
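The contrast drawn here between signature-based detection and a classifier's rule set can be made concrete with a minimal sketch. Everything in it is assumed for illustration: the names `SIGNATURES`, `RULES`, `signature_scan`, and `rule_scan`, the byte strings, and the two hand-written rules, which stand in for the rules a training algorithm would produce. It is not the paper's detection model.

```python
# Signature-based scanning: flag a binary only if it contains a byte string
# already extracted from an analyzed sample (hypothetical database).
SIGNATURES = [b"\xde\xad\xbe\xef\x13\x37"]


def signature_scan(data: bytes) -> bool:
    """True if any known signature occurs verbatim in the binary."""
    return any(sig in data for sig in SIGNATURES)


# Rule-set detection model: each rule is a conjunction of byte-sequence tests.
# These two rules are hypothetical stand-ins for rules a learner would output.
RULES = [
    frozenset({b"\xde\xad", b"\xbe\xef"}),
    frozenset({b"\xca\xfe", b"\xba\xbe"}),
]


def rule_scan(data: bytes, rules=RULES) -> bool:
    """True (malicious) if every test in at least one rule holds; else benign."""
    return any(all(seq in data for seq in rule) for rule in rules)


if __name__ == "__main__":
    analyzed = b"\x00\xde\xad\xbe\xef\x13\x37"  # sample the signature came from
    variant = b"\x90\xde\xad\x00\xbe\xef\x90"   # new, not-yet-analyzed variant
    for name, sample in [("analyzed", analyzed), ("variant", variant)]:
        print(name, signature_scan(sample), rule_scan(sample))
    # analyzed True True
    # variant False True
```

The variant evades the signature because its bytes no longer match the analyzed sample exactly, yet it still satisfies the first rule. A real detection model would also have to be checked against benign programs, since broader rules can raise the false positive rate.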