Improving Digital Forensics Through Data Mining

Chrysoula Tsochataridou, Avi Arampatzis, Vasilios Katos
Department of Electrical and Computer Engineering
Democritus University of Thrace
Xanthi, Greece
{chrytsoc, avi, vkatos}

Abstract: In this paper, we reflect upon the challenges a forensic analyst faces in a complex investigation and develop an approach for handling and analyzing large amounts of data. Because traditional digital forensic analysis tools fail to identify hidden relationships in the complex modus operandi of perpetrators, we employ data mining techniques in the digital forensics domain. As our case study we use the Enron scandal, which is widely regarded as the biggest audit failure in U.S. corporate history. In particular, we focus on the textual analysis of electronic messages sent by Enron employees, using clustering techniques. Our goal is to produce a methodology that other researchers working on projects involving email analysis can apply. Preliminary findings show that clustering techniques can be used to effectively identify malicious collaborative activities.

Keywords: Digital Forensics; Email Analysis; Text Mining; Clustering; Weka; Simple K-means.

I. INTRODUCTION

The incorporation of computer technology into modern life has increased productivity and efficiency in many of its aspects. However, computer technology is not only a helpful tool that enhances traditional methodologies; in unethical hands, it can also be used to commit crimes. In particular, technically skilled criminals exploit its computing power and its access to information in order to perform, hide, or aid unlawful or unethical activities. Nowadays, the number of information security incidents is increasing globally.

Since a large percentage of all information produced today is digital, there is a need to retrieve electronic evidence in a manner that preserves its value and integrity [1]. Collected digital evidence is often in the form of textual data, such as e-mails, chat logs, blogs, web pages and text documents. Because such textual data is unstructured, investigators usually employ search tools and techniques during the analysis stage to identify and extract useful information from the text. Textual information is therefore one of the core data sources that may contain significant information. The amount of textual data, even on a single personal digital device, is usually very large, on the order of thousands of texts or short messages. In this context, the analyst faces practical difficulties in analyzing the data content and in finding important investigative patterns.
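The abstract names text clustering with Simple K-means (in Weka) as the way to cope with this volume of messages. The idea can be illustrated outside Weka with a minimal Python sketch under stated assumptions: each message is represented as an L2-normalized TF-IDF vector with a smoothed IDF, and the vectors are grouped with plain Euclidean k-means using a deterministic farthest-first initialization. The helper names (`tokenize`, `tfidf_vectors`, `sq_dist`, `centroid`, `kmeans`) and the four sample messages are hypothetical; they are not taken from the paper or the Enron corpus, and the sketch does not reproduce the authors' Weka configuration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a message and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse, L2-normalized TF-IDF vector (term -> weight) per document.

    Uses a smoothed IDF, log((1 + n) / (1 + df)) + 1, so no term gets zero weight.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0  # guard empty docs
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in a.keys() | b.keys())


def centroid(members):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = {}
    for v in members:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(members) for t, w in total.items()}


def kmeans(vectors, k, max_iter=100):
    """Cluster sparse vectors into k groups; returns one cluster label per vector."""
    # Farthest-first initialization: start from the first vector, then repeatedly
    # add the vector farthest from all chosen centroids (deterministic, reproducible).
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(dict(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = centroid(members)
    return labels


# Hypothetical messages, not taken from the Enron corpus.
emails = [
    "raptor hedge losses moved off balance sheet",
    "hide raptor losses before auditors review balance sheet",
    "lunch friday noon cafeteria",
    "team lunch friday noon",
]
labels = kmeans(tfidf_vectors(emails), k=2)
print(labels)  # → [0, 0, 1, 1]
```

The two messages that share vocabulary about hidden losses fall into one cluster and the two lunch messages into the other, which is the kind of grouping an analyst would then inspect by hand. Farthest-first initialization is used here only to make the toy example reproducible; on a real mailbox one would typically also remove stop words and experiment with the number of clusters k.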