Overview of the KDD Process
Reference: Fayyad, Piatetsky-Shapiro, Smyth, "From Data Mining to
Knowledge Discovery: An Overview", in Fayyad, Piatetsky-Shapiro, Smyth,
Uthurusamy, Advances in Knowledge
CS831 Project Outline
Deadlines
Exact submission deadlines and any updates to deadlines will be shown on
URCourses:
10% Written proposal deadline:
30% Final written report:
5% Final oral report:
July
Confusion Matrix
A confusion matrix (Kohavi and Provost, 1998) contains information about
actual and predicted classifications done by a classification system.
Performance of such systems is common
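As a minimal sketch of the definition above, the following Python tallies actual versus predicted class labels into a matrix. The function name, the labels, and the sample data are all hypothetical and chosen only for illustration; rows are indexed by actual class and columns by predicted class, which is one common layout and an assumption here.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Return a matrix m where m[i][j] counts the cases whose actual
    class is labels[i] and whose predicted class is labels[j]."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Hypothetical classifier output on six cases.
actual    = ["pos", "pos", "neg", "neg", "neg", "pos"]
predicted = ["pos", "neg", "neg", "pos", "neg", "pos"]

matrix = confusion_matrix(actual, predicted, ["neg", "pos"])
for label, row in zip(["neg", "pos"], matrix):
    print(label, row)
```

The diagonal entries count correct classifications and the off-diagonal entries count errors.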
Rough Sets
References:
Z. Pawlak, "Rough Sets: Theoretical Aspects of Reasoning about Data,"
in Theory and Decision Library: Series D, vol. 9, W. Leinfellner and G.
Eberlein eds, Kluwer Academic Pu
Cumulative Gains and Lift Charts
Lift is a measure of the effectiveness of a predictive model calculated as
the ratio between the results obtained with and without the predictive
model.
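The ratio described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the course's own formulation: "results" are taken to be response rates, and the function name, parameter names, and numbers are hypothetical.

```python
def lift(hits_with_model, selected_with_model, hits_overall, population):
    """Lift = response rate among cases selected by the model
    divided by the response rate obtained without the model."""
    rate_with_model = hits_with_model / selected_with_model
    rate_without_model = hits_overall / population
    return rate_with_model / rate_without_model

# Hypothetical campaign: the model selects 1,000 customers and 500 respond,
# while 1,250 of all 10,000 customers would respond without any targeting.
example_lift = lift(500, 1000, 1250, 10000)
print(example_lift)
```

A lift above 1 means the model selects responders at a higher rate than selection without the model.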
Cumulati
CS831 Research Paper Presentation Outline
Choosing a Paper
The choice of a research paper is to be from a list provided by the
instructor. The topic must be distinct from topics studied in previous
co
Introduction to Itemsets
The existence of large amounts of scan code data collected by many businesses
represents a potential wealth of information given adequate methods of
transforming the data
Data Cubes
Introduction
Users of decision support systems often see data in the form of data cubes. The
cube is used to represent data along some measure of interest. Although called
a "cube", it
Clustering
Introduction
Cluster analysis is the process of grouping objects into subsets that have meaning in the context of a
particular problem. The objects are thereby organized into an efficient
Probability Based Objective Interestingness Measures
Reference:
Geng, Liqiang and Hamilton, H.J. "Interestingness Measures for Data
Mining: A Survey" In ACM Computing Surveys, 38(3).
Information
The Virtuous Cycle of Data Mining
Reference: Berry, M.J.A. and Linoff, G.S., Mastering Data Mining, Wiley:
New York, 2000.
Two Styles of Data Mining
1. Directed Data Mining:
Top-down appr
Machine Learning
Reference:
D. Schuurmans, Machine Learning course notes, University of Waterloo,
1999.
T. Mitchell, Machine Learning, McGraw-Hill, 1997, pp. 20-50.
Machine learning is a process wh