ng for
health-care management. In a totally different type of application, planetary geologists
sift through remotely sensed images of planets and asteroids, carefully locating and cataloging such geologic objects of interest as impact craters. Be it science, marketing, finance,
health care, retail, or any other field, the classical approach to data analysis relies fundamentally on one or more analysts becoming intimately familiar with the data and serving
as an interface between the data and the users.

Copyright © 1996, American Association for Artificial Intelligence. All rights reserved. 0738-4602-1996 / $2.00 Fall 1996

For these (and many other) applications,
this form of manual probing of a data set is
slow, expensive, and highly subjective. In
fact, as data volumes grow dramatically, this
type of manual data analysis is becoming
completely impractical in many domains.
Databases are increasing in size in two ways:
(1) the number N of records or objects in the
database and (2) the number d of fields or attributes to an object. Databases containing on
the order of N = 10^9 objects are becoming increasingly common, for example, in the astronomical sciences. Similarly, the number of
fields d can easily be on the order of 10^2 or
even 10^3, for example, in medical diagnostic
applications. Who could be expected to digest millions of records, each having tens or
hundreds of fields? We believe that this job is
certainly not one for humans; hence, analysis
work needs to be automated, at least partially.
The need to scale up human analysis capabilities to handle the large number of bytes
that we can collect is both economic and scientific. Businesses use data to gain competitive advantage, increase efficiency, and provide more valuable services to customers.
Data we capture about our environment are
the basic evidence we use to build theories
and models of the universe we live in. Because computers have enabled humans to
gather more data than we can digest, it is only natural to turn to computational techniques to help us unearth meaningful patterns and structures from the massive
volumes of data. Hence, KDD is an attempt to
address a problem that the digital information era made a fact of life for all of us: data
overload.

Data Mining and Knowledge Discovery in the Real World

Much of the current interest in KDD
is the result of the media interest surrounding
successful KDD applications, for example, the
focus articles within the last two years in
Business Week, Newsweek, Byte, PC Week, and
other large-circulation periodicals. Unfortunately, it is not always easy to separate fact
from media hype. Nonetheless, several well-documented examples of successful systems
can rightly be referred to as KDD applications
and have been deployed in operational use
on large-scale r...