CS831 Project Outline
Exact submission deadlines and any updates to deadlines will be shown on
10% Written proposal deadline: July 16, 2012
30% Final written report: August 20, 2012
5% Final oral report: August 21, 2012
A confusion matrix (Kohavi and Provost, 1998) contains information about
actual and predicted classifications made by a classification system.
The performance of such systems is commonly evaluated using the data in the
matrix. The followin
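The text above breaks off before the matrix layout is given. The following is a minimal sketch for a two-class problem. It assumes rows hold the actual class and columns hold the predicted class, both ordered (negative, positive), and it derives a few widely used measures from the four cells. The function names, the `positive` label, and the choice to report an undefined rate as 0.0 are all illustrative assumptions, not the course page's own definitions.

```python
from collections import Counter


def confusion_matrix(actual, predicted, positive=1):
    """Build a 2x2 confusion matrix: [[TN, FP], [FN, TP]].

    Rows are actual classes, columns are predicted classes,
    both in the order (negative, positive). Layout is an assumption.
    """
    counts = Counter()
    for a, p in zip(actual, predicted):
        counts[(a == positive, p == positive)] += 1
    return [
        [counts[(False, False)], counts[(False, True)]],  # actual negative: TN, FP
        [counts[(True, False)], counts[(True, True)]],    # actual positive: FN, TP
    ]


def _ratio(num, den):
    # Report an undefined rate (empty row or column) as 0.0 instead of raising.
    return num / den if den else 0.0


def summarize(matrix):
    """Derive common performance measures from a 2x2 confusion matrix."""
    (tn, fp), (fn, tp) = matrix
    return {
        "accuracy": _ratio(tp + tn, tn + fp + fn + tp),
        "true_positive_rate": _ratio(tp, fn + tp),   # also called recall
        "false_positive_rate": _ratio(fp, tn + fp),
        "precision": _ratio(tp, fp + tp),
    }


matrix = confusion_matrix([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
print(matrix)  # [[2, 1], [1, 2]]
print(summarize(matrix))
```

Keeping the counting step separate from the measures makes it easy to add further rates (for example, the true negative rate) without re-scanning the data.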
Z. Pawlak, "Rough Sets: Theoretical Aspects of Reasoning about Data,"
in Theory and Decision Library: Series D, vol. 9, W. Leinfellner and G.
Eberlein (eds.), Kluwer Academic Publishers, 1991.
Xiaohua Hu and Nick Cercone, "Learning
Cumulative Gains and Lift Charts
Lift is a measure of the effectiveness of a predictive model, calculated as
the ratio between the results obtained with and without the predictive model.
Cumulative gains and lift charts are visual aids for measuring
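As a hedged illustration of how the points on such charts can be computed, the sketch below ranks cases by model score and, at each depth of the ranked list, reports the cumulative gain (share of all positives captured so far) and the lift. Lift follows the definition above: positives found with the model in the top k cases, divided by the k * P / n positives expected from random selection without the model, where P is the total number of positives and n the number of cases. The decile depths, the function name, and the 0/1 label encoding are assumptions made for this example.

```python
DECILES = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)


def gains_and_lift(scores, labels, fractions=DECILES):
    """Return (fraction, cumulative gain, lift) for each depth of the ranked list.

    labels: 1 for a positive response, 0 otherwise (assumed encoding).
    """
    # Order the labels by descending model score.
    ranked = [label for _, label in
              sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    total_pos = sum(ranked)
    rows = []
    for f in fractions:
        k = max(1, round(f * n))      # number of top-ranked cases selected
        hits = sum(ranked[:k])        # positives found using the model
        gain = hits / total_pos       # share of all positives captured
        # Hits with the model divided by hits expected without it (k * P / n).
        lift = (hits * n) / (k * total_pos)
        rows.append((f, gain, lift))
    return rows


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
for f, gain, lift in gains_and_lift(scores, labels, fractions=(0.2, 0.5, 1.0)):
    print(f"top {f:.0%}: gain={gain:.2f}, lift={lift:.2f}")
```

Plotting gain against the selected fraction gives the cumulative gains chart; plotting lift against the same fraction gives the lift chart, which always ends at 1.0 when the whole population is selected.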
CS831 Research Paper Presentation Outline
Choosing a Paper
The research paper must be chosen from a list provided by the
instructor. Its topic must be distinct from topics studied in previous
courses or theses.
Your project may be related to one of
Introduction to Itemsets
The existence of large amounts of scan code data collected by many businesses
represents a potential wealth of information, given adequate methods for
transforming the data into meaningful information. One class of such data is
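The passage stops before defining itemsets. As a hedged sketch only, assume each transaction (one scanned basket) is a set of item names and that the support of an itemset is the fraction of transactions containing all of its items. The code below counts support and finds every itemset whose support reaches a threshold using a levelwise, Apriori-style search; that algorithm choice, the function names, and the example baskets are illustrative assumptions rather than material from the course page.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Levelwise search: frequent k-itemsets are built from frequent (k-1)-itemsets."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items
               if support([i], transactions) >= min_support]
    result = {}
    k = 1
    while current:
        for s in current:
            result[s] = support(s, transactions)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [c for c in candidates
                      if all(frozenset(sub) in result for sub in combinations(c, k - 1))]
        current = [c for c in candidates if support(c, transactions) >= min_support]
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
for itemset, sup in frequent_itemsets(baskets, min_support=0.6).items():
    print(sorted(itemset), sup)
```

The prune step relies on the fact that any subset of a frequent itemset is also frequent, so candidates with an infrequent subset can be discarded without scanning the transactions.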
Users of decision support systems often see data in the form of data cubes. The
cube is used to represent data along some measure of interest. Although called
a "cube", it can be 2-dimensional, 3-dimensional, or higher-dimensional.
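To make the idea concrete, here is a minimal sketch that aggregates one measure along a chosen set of dimensions and then builds every such aggregation (one per subset of dimensions, 2^n in all for n dimensions). It assumes records are dictionaries; the field names `product`, `region`, `quarter`, and `sales` are hypothetical, and summation is assumed as the aggregate function.

```python
from collections import defaultdict
from itertools import combinations


def aggregate(records, dims, measure="sales"):
    """Sum the measure for each combination of values of the given dimensions."""
    totals = defaultdict(int)
    for r in records:
        key = tuple(r[d] for d in dims)
        totals[key] += r[measure]
    return dict(totals)


def data_cube(records, dims, measure="sales"):
    """Aggregate over every subset of dims, from the full detail down to the grand total."""
    return {
        group: aggregate(records, group, measure)
        for n in range(len(dims) + 1)
        for group in combinations(dims, n)
    }


records = [
    {"product": "milk", "region": "east", "quarter": "Q1", "sales": 10},
    {"product": "milk", "region": "west", "quarter": "Q1", "sales": 5},
    {"product": "bread", "region": "east", "quarter": "Q2", "sales": 7},
]
cube = data_cube(records, ("product", "region", "quarter"))
print(cube[("product",)])  # {('milk',): 15, ('bread',): 7}
print(cube[()])            # {(): 22}
```

Dropping a dimension from the grouping (for example, going from `("product", "region")` to `("product",)`) corresponds to rolling up the cube along that dimension.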
Cluster analysis is the process of grouping objects into subsets that have meaning in the context of a
particular problem. The objects are thereby organized into an efficient representation that characterizes the
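The passage does not name a particular clustering method. As one common example, offered as an assumption rather than the page's own choice, the sketch below implements k-means on numeric points: each point is assigned to its nearest centroid, each centroid is moved to the mean of its assigned points, and the loop stops when the centroids no longer change. Seeding with the first k points is a simplification for reproducibility; random or k-means++ seeding is more typical in practice.

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster points (tuples of numbers) into k groups using Lloyd's k-means."""
    # Naive, deterministic seeding: the first k points become initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments will not change again
        centroids = new_centroids
    return centroids, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, 2)
print(clusters)
```

The resulting clusters, each summarized by its centroid, are one example of the compact representation the passage describes.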
Probability Based Objective Interestingness Measures
Geng, Liqiang and Hamilton, H.J. "Interestingness Measures for Data
Mining: A Survey", ACM Computing Surveys, 38(3), 2006.
Information about other references can be found in the Interesting
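As a hedged illustration of what "probability based" means here, the sketch below estimates P(A), P(B), and P(AB) for a rule A -> B from contingency counts and computes four familiar objective measures: support, confidence, lift, and leverage (also known as Piatetsky-Shapiro's measure). The choice of these four measures and the function name are assumptions; the survey cited above covers many more.

```python
def rule_measures(n, n_a, n_b, n_ab):
    """Objective interestingness measures for the rule A -> B.

    n: total records; n_a, n_b: records containing A, B; n_ab: records containing both.
    """
    p_a, p_b, p_ab = n_a / n, n_b / n, n_ab / n
    return {
        "support": p_ab,                 # P(AB)
        "confidence": p_ab / p_a,        # P(B | A)
        "lift": p_ab / (p_a * p_b),      # P(AB) / (P(A) P(B)); 1.0 under independence
        "leverage": p_ab - p_a * p_b,    # P(AB) - P(A) P(B); 0.0 under independence
    }


print(rule_measures(n=100, n_a=40, n_b=50, n_ab=30))
```

Lift and leverage both compare the observed co-occurrence with what independence of A and B would predict; lift does so as a ratio and leverage as a difference.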
The Virtuous Cycle of Data Mining
Reference: Berry, M.J.A. and Linoff, G.S., Mastering Data Mining, Wiley:
New York, 2000.
Two Styles of Data Mining
1. Directed Data Mining:
Used when we know approximately what we are lookin
D. Schuurmans, Machine Learning course notes, University of Waterloo,
T. Mitchell, Machine Learning, McGraw-Hill, 1997, pp. 20-50.
Machine learning is a process which causes systems to improve with
Overview of the KDD Process
Reference: Fayyad, Piatetsky-Shapiro, Smyth, "From Data Mining to
Knowledge Discovery: An Overview", in Fayyad, Piatetsky-Shapiro, Smyth,
Uthurusamy, Advances in Knowledge Discovery and Data Mining, AAAI Press
/ The MIT Press,