From Data Mining to Knowledge Discovery in Databases


... make previously discovered patterns invalid. In addition, the variables measured in a given application database can be modified, deleted, or augmented with new measurements over time. Possible solutions include incremental methods for updating the patterns and treating change as an opportunity for discovery by using it to cue the search for patterns of change only (Matheus, Piatetsky-Shapiro, and McNeill 1996). See also Agrawal and Psaila (1995) and Mannila, Toivonen, and Verkamo (1995).

Missing and noisy data: This problem is especially acute in business databases. U.S. census data reportedly have error rates as great as 20 percent in some fields. Important attributes can be missing if the database was not designed with discovery in mind. Possible solutions include more sophisticated statistical strategies to identify hidden variables and dependencies (Heckerman 1996; Smyth et al. 1996).

Complex relationships between fields: Hierarchically structured attributes or values, relations between attributes, and more sophisticated means for representing knowledge about the contents of a database will require algorithms that can effectively use such information. Historically, data-mining algorithms have been developed for simple attribute-value records, although new techniques for deriving relations between variables are being developed (Dzeroski 1996; Djoko, Cook, and Holder 1995).

Understandability of patterns: In many applications, it is important to make the discoveries more understandable by humans. Possible solutions include graphic representations (Buntine 1996; Heckerman 1996), rule structuring, natural language generation, and techniques for visualization of data and knowledge. Rule-refinement strategies (for example, Major and Mangano [1995]) can be used to address a related problem: The discovered knowledge might be implicitly or explicitly redundant.
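To make the idea of incremental methods for updating patterns concrete, the following minimal Python sketch maintains co-occurrence counts for item pairs as new batches of transactions arrive, so that earlier data never has to be rescanned. It is an illustration only, not the method of any of the works cited above; the class name IncrementalPairCounter, its methods, and the sample transactions are all hypothetical.

```python
from collections import Counter
from itertools import combinations


class IncrementalPairCounter:
    """Running co-occurrence counts for item pairs (hypothetical helper)."""

    def __init__(self):
        self.pair_counts = Counter()
        self.n_transactions = 0

    def update(self, batch):
        """Fold a new batch of transactions into the running counts."""
        for transaction in batch:
            # Sort the distinct items so each pair has one canonical order.
            items = sorted(set(transaction))
            self.pair_counts.update(combinations(items, 2))
            self.n_transactions += 1

    def frequent_pairs(self, min_support):
        """Return pairs whose relative support is at least min_support."""
        if self.n_transactions == 0:
            return {}
        supports = {
            pair: count / self.n_transactions
            for pair, count in self.pair_counts.items()
        }
        return {p: s for p, s in supports.items() if s >= min_support}


# Made-up data: the second batch shifts which pair counts as frequent.
counter = IncrementalPairCounter()
counter.update([["bread", "milk"], ["bread", "milk"]])
before = counter.frequent_pairs(min_support=0.5)

counter.update([["eggs", "milk"], ["eggs", "milk"], ["eggs", "milk"]])
after = counter.frequent_pairs(min_support=0.5)

print(sorted(before), sorted(after))
```

In this toy run, the pair that is frequent after the first batch falls below the threshold once the second batch is folded in, which is the sense in which new data can invalidate a previously discovered pattern. The sketch keeps a count for every pair ever seen, so it trades memory for simplicity; a practical incremental method would need some form of pruning.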
User interaction and prior knowledge: Many current KDD methods and tools are not truly interactive and cannot easily incorporate prior knowledge about a problem except in simple ways. The use of domain knowledge is important in all the steps of the KDD process. Bayesian approaches (for example, Cheeseman [1990]) use prior probabilities over data and distributions as one form of encoding prior knowledge. Others employ deductive database capabilities to discover knowledge that is then used to guide the data-mining search (for example, Simoudis, Livezey, and Kerber [1995]).

Integration with other systems: A standalone discovery system might not be very useful. Typical integration issues include integration with a database management system (for example, through a query interface), integration with spreadsheets and visualization tools, and accommodation of real-time sensor readings. Examples of integrated KDD systems are described by Simoudis, Livezey, and Kerber (1995) and Stolorz, Nakamura, Mesrobian, Muntz, Shek, Santos, Yi, Ng, Chien, Mechoso, and Farrara (1995).

Concluding Remarks: The Potential Role of AI in KDD

In addition to machine...