…ble information-retrieval tools supporting interactivity
and iteration. In this way, TASA offers pruning,
grouping, and ordering tools to refine the results of a basic brute-force search for rules.
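As a hypothetical illustration of that refinement step, the sketch below prunes, groups, and orders a list of already-discovered rules. The `Rule` record, its `support` and `confidence` fields, and the thresholds are assumptions made for illustration; they are not TASA's actual data model or interface.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule record: "if the antecedent episode occurs,
    # the consequent alarm tends to follow". Not TASA's real format.
    antecedent: tuple[str, ...]
    consequent: str
    support: int        # assumed: occurrences of the rule in the alarm stream
    confidence: float   # assumed: fraction of antecedents followed by consequent


def prune(rules, min_support, min_confidence):
    """Drop rules that are too rare or too unreliable to be worth inspecting."""
    return [r for r in rules
            if r.support >= min_support and r.confidence >= min_confidence]


def group_by_consequent(rules):
    """Group rules that predict the same alarm so they can be browsed together."""
    groups = defaultdict(list)
    for r in rules:
        groups[r.consequent].append(r)
    return dict(groups)


def order(rules):
    """Order rules by confidence, then support, highest first."""
    return sorted(rules, key=lambda r: (r.confidence, r.support), reverse=True)


if __name__ == "__main__":
    rules = [
        Rule(("A", "B"), "C", support=40, confidence=0.90),
        Rule(("A",), "C", support=120, confidence=0.55),
        Rule(("D",), "E", support=3, confidence=1.00),
        Rule(("B",), "E", support=25, confidence=0.80),
    ]
    kept = prune(rules, min_support=10, min_confidence=0.6)
    for consequent, group in group_by_consequent(order(kept)).items():
        print(consequent, [r.antecedent for r in group])
```

Pruning first keeps the later grouping and sorting cheap; with the sample rules, only the two rules that clear both thresholds survive, each listed under the alarm it predicts.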
Data cleaning: The MERGE-PURGE system
was applied to the identification of duplicate
welfare claims (Hernandez and Stolfo 1995).
It was used successfully on data from the Welfare Department of the State of Washington.
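The excerpt does not say how MERGE-PURGE detects duplicates. One well-known technique for this kind of data cleaning is a sorted-neighborhood pass: sort the records by a key so that likely duplicates end up adjacent, then compare each record only with its neighbors inside a small sliding window. The sketch below assumes that approach; the `Claim` fields, the sort key, and the matching rule are hypothetical and are not the system's actual design.

```python
from typing import NamedTuple


class Claim(NamedTuple):
    # Hypothetical welfare-claim record; the real claim schema is not given.
    claim_id: int
    last_name: str
    first_name: str
    birth_year: int


def sort_key(c: Claim) -> str:
    # Hypothetical key: start of last name, birth year, first initial,
    # chosen so that likely duplicates sort next to each other.
    return f"{c.last_name[:3].upper()}{c.birth_year}{c.first_name[:1].upper()}"


def is_match(a: Claim, b: Claim) -> bool:
    # Hypothetical equality rule: same birth year, same last name
    # (case-insensitive), and first names sharing their first letter.
    return (a.birth_year == b.birth_year
            and a.last_name.lower() == b.last_name.lower()
            and a.first_name[:1].lower() == b.first_name[:1].lower())


def sorted_neighborhood(claims, window=3):
    """Return sorted (id, id) pairs of claims that look like duplicates.

    Each record is compared only with the next window - 1 records
    in sort order instead of with every other record.
    """
    ordered = sorted(claims, key=sort_key)
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:i + window]:
            if is_match(a, b):
                pairs.append(tuple(sorted((a.claim_id, b.claim_id))))
    return pairs


if __name__ == "__main__":
    claims = [
        Claim(1, "Smith", "John", 1960),
        Claim(2, "Jones", "Mary", 1972),
        Claim(3, "SMITH", "Jon", 1960),
        Claim(4, "Smyth", "Anne", 1985),
    ]
    print(sorted_neighborhood(claims))  # → [(1, 3)]
```

With a window of w, the pass makes roughly n * (w - 1) comparisons rather than n(n - 1)/2, at the cost of missing duplicates whose keys do not sort near each other.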
In other areas, a well-publicized system is
IBM’s ADVANCED SCOUT, a specialized data-mining system that helps National Basketball Association (NBA) coaches organize and interpret data from NBA games (U.S. News 1995).
ADVANCED SCOUT was used by several of the
NBA teams in 1996, including the Seattle Supersonics, which reached the NBA finals.
Finally, a novel and increasingly important
type of discovery is one based on the use of intelligent agents to navigate through an information-rich environment. Although the idea
of active triggers has long been analyzed in the
database field, really successful applications of
this idea appeared only with the advent of the
Internet. These systems ask the user to specify
a profile of interest and search for related information among a wide variety of public-domain and proprietary sources. For example,
FIREFLY is a personal music-recommendation
agent: It asks a user his/her opinion of several
music pieces and then suggests other music
that the user might like (<http://www.ffly.com/>). CRAYON (<http://crayon.net/>)
allows users to create their own free newspaper
(supported by ads); NEWSHOUND (<http://www.sjmercury.com/hound/>) from the San Jose
Mercury News and FARCAST (<http://www.farcast.com/>) automatically search for information
from a wide variety of sources, including
newspapers and wire services, and e-mail relevant documents directly to the user.
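The profile-driven filtering these agents perform can be sketched as scoring each incoming document against the user's interest profile and forwarding those above a threshold. The keyword-weight profile, the length-normalized scoring function, and the threshold are illustrative assumptions; the text does not say how FIREFLY, CRAYON, NEWSHOUND, or FARCAST actually match documents.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters; a deliberately simple tokenizer.
    return re.findall(r"[a-z]+", text.lower())


def score(document: str, profile: dict[str, float]) -> float:
    """Weighted count of profile terms, divided by document length
    so that long articles are not favored just for being long."""
    tokens = tokenize(document)
    if not tokens:
        return 0.0
    counts = Counter(tokens)  # missing terms count as 0
    raw = sum(weight * counts[term] for term, weight in profile.items())
    return raw / len(tokens)


def relevant(documents, profile, threshold):
    """Return (score, document) pairs at or above the threshold, best first."""
    scored = [(score(d, profile), d) for d in documents]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical user profile: term -> interest weight.
    profile = {"basketball": 1.0, "playoffs": 0.5}
    feed = [
        "Basketball playoffs begin tonight",
        "Stock markets fall on rate fears",
        "A basketball coach retires",
    ]
    for s, doc in relevant(feed, profile, threshold=0.1):
        print(f"{s:.3f}  {doc}")
```

A real agent would also update the profile from user feedback; this sketch keeps the profile fixed to show only the matching step.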
These are just a few of the numerous such
systems that use KDD techniques to automatically produce useful information from large
masses of raw data. See Piatetsky-Shapiro et
al. (1996) for an overview of issues in developing industrial KDD applications.

Data Mining and KDD

Historically, the notion of ﬁnding useful patterns in data has been given a variety of
names, including data mining, knowledge extraction, information discovery, information
harvesting, data archaeology, and data pattern
processing. The term data mining has mostly
been used by statisticians, data analysts, and
the management information systems (MIS)
communities. It has also gained popularity in
the database field. The phrase knowledge discovery in databases was coined at the first KDD
workshop in 1989 (Piatetsky-Shapiro 1991) to
emphasize that knowledge is the end product
of a data-driven discovery. It has been popularized in the AI and machine-learning fields.
In our view, KDD refers to the overall process of discovering useful knowledge from data, and data mining refers to a particular step
in this process. Data mining is the application
of specific algorithms for extracting patterns