Outliers_01 - Programming and Computer Software Vol 29 No 4...

Programming and Computer Software, Vol. 29, No. 4, 2003, pp. 228–237. Translated from Programmirovanie, Vol. 29, No. 4, 2003. Original Russian Text Copyright © 2003 by Petrovskiy.
0361-7688/03/2904-0228 $25.00 © 2003 MAIK “Nauka/Interperiodica”

1. INTRODUCTION

With the development of information technologies, the number of databases, as well as their dimension and complexity, grows rapidly, which makes automated analysis of large amounts of heterogeneous structured information necessary. Data mining systems are used for this purpose. The goal of these systems is to reveal hidden dependences in databases [1]. The analysis results are then used for decision making by a human or a program, so the quality of the decision depends directly on the quality of the data mining.

One of the basic problems of data mining (along with classification, prediction, clustering, and association rule mining) is outlier detection [1–3]. Outlier detection is the search for objects in the database that do not obey the laws valid for the major part of the data. An object may be identified as an outlier for various reasons, many of which are of interest in practical applications. For example, an unusual flow of network packets, revealed by analyzing the system log, may be classified as an outlier because it may indicate a virus attack or an intrusion attempt. Another example is automatic systems for preventing fraudulent use of credit cards. These systems detect unusual transactions and may block them at an early stage, thus preventing large losses. The detection of an outlier may also be evidence of new tendencies in the data. For example, a data mining system can detect changes in the market situation earlier than a human expert.

The outlier detection problem is similar to the classification problem. A specific feature of the former, however, is that the great majority of the database objects being analyzed are not outliers. Moreover, in many cases, it is not known a priori which objects are outliers.

In this work, we consider the basic approaches currently used in data mining systems for solving the outlier detection problem. Methods based on kernel functions are considered in more detail, and their basic advantages and disadvantages are discussed. A new algorithm for detecting outliers is suggested, which possesses a number of advantages over the existing methods. It makes use of kernel functions and relies on methods of fuzzy set theory. The performance of the suggested algorithm is examined on the applied problem of anomaly detection, which arises in computer protection systems, the so-called intrusion detection systems [4, 5].

2. STATISTICAL OUTLIER DETECTION METHODS

A traditional approach to solving the outlier detection problem is based on the construction of a probabilistic data model and the use of mathematical methods of applied statistics and probability theory. A probabi-

This note was uploaded on 09/22/2011 for the course STA 6714 taught by Professor Staff during the Spring '11 term at University of Central Florida.
