…make previously discovered patterns invalid. In addition, the variables measured in a given application database can be modified, deleted, or augmented with new measurements over time. Possible solutions include incremental methods for updating the patterns and treating change as an opportunity for discovery by using it to cue the search for patterns of change only (Matheus, Piatetsky-Shapiro, and McNeill 1996). See also Agrawal and Psaila (1995) and Mannila, Toivonen, and Verkamo (…).
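To make the idea of incremental updating concrete, here is a minimal sketch that assumes the "patterns" are frequent item pairs. It keeps running support counts so that a new batch of records updates them without rescanning old data. It also reports only the pairs whose status changed, which is one simple way of letting change cue the search. The class name `IncrementalPairCounter` and the `min_support` threshold are hypothetical and illustrative. This is not the method of any of the papers cited above.

```python
from collections import Counter
from itertools import combinations


class IncrementalPairCounter:
    """Hypothetical sketch: maintain support counts for item pairs batch by batch."""

    def __init__(self, min_support):
        self.min_support = min_support  # assumed relative support threshold
        self.counts = Counter()
        self.n_transactions = 0

    def frequent(self):
        # A pair is frequent if it occurs in at least min_support of the
        # transactions seen so far.
        if self.n_transactions == 0:
            return set()
        return {
            pair
            for pair, c in self.counts.items()
            if c / self.n_transactions >= self.min_support
        }

    def update(self, batch):
        """Add a batch of transactions; return (newly frequent, no longer frequent)."""
        before = self.frequent()
        for transaction in batch:
            self.n_transactions += 1
            for pair in combinations(sorted(set(transaction)), 2):
                self.counts[pair] += 1
        after = self.frequent()
        return after - before, before - after


counter = IncrementalPairCounter(min_support=0.5)
print(counter.update([{"milk", "bread"}, {"milk", "bread"}, {"beer"}]))
# ({('bread', 'milk')}, set())
print(counter.update([{"beer", "chips"}] * 3))
# ({('beer', 'chips')}, {('bread', 'milk')})
```

Only the counters are touched when data arrive, and the returned differences point the analyst at what changed rather than at the whole pattern set. Tracking arbitrary itemsets instead of pairs would need more bookkeeping, since a newly frequent itemset may never have been counted before.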
Missing and noisy data: This problem is especially acute in business databases. U.S. census data reportedly have error rates as great as 20 percent in some fields. Important attributes can be missing if the database was not designed with discovery in mind. Possible solutions include more sophisticated statistical strategies to identify hidden variables and dependencies (Heckerman 1996; Smyth et al. …).
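As one illustration of a statistical strategy for hidden variables, the sketch below uses expectation-maximization (EM) to fit a two-component, one-dimensional Gaussian mixture. Each value's unobserved component label is the hidden variable. EM is my choice of example here, not a technique the text attributes to the cited authors. The starting variances, starting weight, and function names are assumptions. The same E-step/M-step pattern is also a standard way to estimate parameters when some field values are missing.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_component(xs, means=(0.0, 1.0), iters=100, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture by EM (illustrative sketch).

    Returns (weight of component 0, (mean0, mean1), (var0, var1)).
    """
    m0, m1 = means
    v0 = v1 = 1.0  # assumed starting variances
    w = 0.5        # assumed starting mixing weight of component 0
    n = len(xs)
    for _ in range(iters):
        # E-step: posterior probability that each value came from component 0.
        resp = []
        for x in xs:
            p0 = w * normal_pdf(x, m0, v0)
            p1 = (1 - w) * normal_pdf(x, m1, v1)
            resp.append(p0 / (p0 + p1))
        # M-step: re-estimate weight, means, and variances from soft assignments.
        n0 = sum(resp)
        n1 = n - n0
        m0 = sum(r * x for r, x in zip(resp, xs)) / n0
        m1 = sum((1 - r) * x for r, x in zip(resp, xs)) / n1
        v0 = max(sum(r * (x - m0) ** 2 for r, x in zip(resp, xs)) / n0, min_var)
        v1 = max(sum((1 - r) * (x - m1) ** 2 for r, x in zip(resp, xs)) / n1, min_var)
        w = n0 / n
    return w, (m0, m1), (v0, v1)


data = [0.9, 1.0, 1.1, 0.95, 1.05, 4.9, 5.0, 5.1, 4.95, 5.05]
w, (m0, m1), _ = em_two_component(data, means=(0.0, 6.0))
print(round(m0, 3), round(m1, 3))  # 1.0 5.0
```

EM converges only to a local optimum, so the result depends on the starting means. The `min_var` floor keeps a component from collapsing onto a single point.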
Complex relationships between fields: Hierarchically structured attributes or values, relations between attributes, and more sophisticated means for representing knowledge about the contents of a database will require algorithms that can effectively use such information. Historically, data-mining algorithms have been developed for simple attribute-value records, although new techniques for deriving relations between variables are being developed (Dzeroski 1996; Djoko, Cook, and Holder 1995).
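The gap between attribute-value records and relational data shows up as soon as one entity has many related rows. The sketch below applies one common workaround, aggregation-based flattening (often called propositionalization). It turns a hypothetical one-to-many customer/purchase relation into one flat record per customer. The table layout and attribute names are assumptions for illustration. This is not the relational techniques cited above, which work with relations more directly.

```python
from collections import defaultdict


def propositionalize(customers, purchases):
    """Flatten a one-to-many relation into one attribute-value record per customer.

    customers: dict of customer_id -> dict of customer attributes (hypothetical)
    purchases: list of (customer_id, amount) rows from a related table
    """
    amounts = defaultdict(list)
    for customer_id, amount in purchases:
        amounts[customer_id].append(amount)

    records = {}
    for customer_id, attrs in customers.items():
        rows = amounts.get(customer_id, [])
        record = dict(attrs)
        # Aggregates summarize the related rows as fixed-width attributes.
        record["n_purchases"] = len(rows)
        record["total_spent"] = sum(rows)
        record["max_purchase"] = max(rows, default=0)
        records[customer_id] = record
    return records


customers = {"c1": {"region": "east"}, "c2": {"region": "west"}}
purchases = [("c1", 10.0), ("c1", 25.0)]
print(propositionalize(customers, purchases)["c1"])
# {'region': 'east', 'n_purchases': 2, 'total_spent': 35.0, 'max_purchase': 25.0}
```

Flattening this way discards structure, such as the order of purchases or relationships among them. That loss motivates algorithms that can reason over relations themselves.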
Understandability of patterns: In many applications, it is important to make the discoveries more understandable by humans. Possible solutions include graphic representations (Buntine 1996; Heckerman 1996), rule structuring, natural language generation, and techniques for visualization of data and knowledge. Rule-refinement strategies (for example, Major and Mangano) can be used to address a related problem: The discovered knowledge might be implicitly or explicitly redundant.
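Explicit redundancy is the easier case to detect mechanically. The sketch below assumes one simple criterion. A rule is dropped if another rule has the same consequent, a strictly more general antecedent, and at least the same confidence, since it then adds nothing the general rule does not already say. The `Rule` type and this criterion are illustrative assumptions, not the refinement strategy of Major and Mangano.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # conditions on the left-hand side
    consequent: str        # predicted outcome
    confidence: float


def prune_redundant(rules):
    """Drop rules that a more general rule already covers at least as well.

    Assumed criterion: rule r is redundant if some rule g has the same
    consequent, an antecedent that is a proper subset of r's, and
    confidence >= r's confidence.
    """
    kept = []
    for r in rules:
        redundant = any(
            g.consequent == r.consequent
            and g.antecedent < r.antecedent  # proper subset: g is more general
            and g.confidence >= r.confidence
            for g in rules
        )
        if not redundant:
            kept.append(r)
    return kept


general = Rule(frozenset({"income=high"}), "approve", 0.90)
specific = Rule(frozenset({"income=high", "age>30"}), "approve", 0.85)
better = Rule(frozenset({"income=high", "debt=low"}), "approve", 0.97)
print([r.confidence for r in prune_redundant([general, specific, better])])
# [0.9, 0.97]
```

The more specific rule survives only when its extra condition buys higher confidence. Implicit redundancy, where a rule follows from a combination of other rules, needs more reasoning than this pairwise check.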
50 AI MAGAZINE
User interaction and prior knowledge: Many current KDD methods and tools are not truly interactive and cannot easily incorporate prior knowledge about a problem except in simple ways. The use of domain knowledge is important in all the steps of the KDD process. Bayesian approaches (for example, Cheeseman) use prior probabilities over data and distributions as one form of encoding prior knowledge. Others employ deductive database capabilities to discover knowledge that is then used to guide the data-mining search (for example, Simoudis, Livezey, and Kerber).
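A small example of how a prior probability encodes what is known before the data arrive is the Beta-binomial update sketched below. A Beta(alpha, beta) prior on a proportion acts like alpha earlier successes and beta earlier failures. Observed counts are simply added to it. This generic conjugate-prior example is my own illustration, not Cheeseman's approach. The scenario and numbers are hypothetical.

```python
def beta_binomial_update(prior_alpha, prior_beta, successes, trials):
    """Combine a Beta(prior_alpha, prior_beta) prior with binomial evidence.

    The prior parameters behave like pseudo-counts of earlier successes and
    failures; this is how the prior knowledge enters. Returns the parameters
    of the Beta posterior.
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return prior_alpha + successes, prior_beta + (trials - successes)


def beta_mean(alpha, beta):
    """Expected value of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical scenario: an analyst believes a rule holds about 80 percent
# of the time, with the weight of roughly 10 earlier observations.
posterior = beta_binomial_update(8, 2, successes=3, trials=10)
print(posterior, beta_mean(*posterior))  # (11, 9) 0.55
```

With the data alone, the estimate would be 3/10 = 0.3. The prior pulls it to 0.55, and its influence shrinks as more trials accumulate.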
Integration with other systems: A stand-alone discovery system might not be very useful. Typical integration issues include integration with a database management system (for example, through a query interface), integration with spreadsheets and visualization tools, and accommodation of real-time sensor readings. Examples of integrated KDD systems are described by Simoudis, Livezey, and Kerber (1995) and Stolorz, Nakamura, Mesrobian, Muntz, Shek, Santos, Yi, Ng, Chien, Mechoso, and Farrara (1995).
Concluding Remarks: The Potential Role of AI in KDD
In addition to machine...