CS831 Project Outline
Deadlines
Exact submission deadlines and any updates to deadlines will be shown on
URCourses:
10% Written proposal deadline: July 16, 2012
30% Final written report: August 20, 2012
5% Final oral report: August 21, 2012
Proposal
Typew
Confusion Matrix
A confusion matrix (Kohavi and Provost, 1998) contains information about
actual and predicted classifications done by a classification system.
Performance of such systems is commonly evaluated using the data in the
matrix. The followin
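
As a minimal sketch of how such a matrix can be tallied, the Python below counts (actual, predicted) pairs for a small two-class example. The function names, the labels "neg" and "pos", and the sample data are hypothetical and do not come from the course notes. Accuracy is included only as one commonly derived measure.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return a matrix whose rows are actual classes and columns are predicted classes."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Count how often each (actual, predicted) pair occurs.
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Proportion of cases on the diagonal, i.e. classified correctly."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("matrix contains no cases")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Hypothetical data, for illustration only.
actual = ["neg", "neg", "pos", "pos", "pos", "neg"]
predicted = ["neg", "pos", "pos", "pos", "neg", "neg"]
matrix = confusion_matrix(actual, predicted, labels=["neg", "pos"])
```

Here each row of `matrix` corresponds to an actual class and each column to a predicted class, so the off-diagonal entries are the misclassified cases.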
Rough Sets
References:
Z. Pawlak, "Rough Sets: Theoretical Aspects of Reasoning about Data,"
in Theory and Decision Library: Series D, vol. 9, W. Leinfellner and G.
Eberlein, eds., Kluwer Academic Publishers, 1991.
Xiaohua Hu and Nick Cercone, "Learning
Cumulative Gains and Lift Charts
Lift is a measure of the effectiveness of a predictive model calculated as
the ratio between the results obtained with and without the predictive
model.
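
The ratio described above can be sketched in Python as follows. This assumes "results" means the rate of positive responses, so that lift compares the response rate among cases selected by the model with the baseline rate over the whole population. The function name, parameter names, and campaign numbers are hypothetical.

```python
def lift(targeted_positives, targeted_total, all_positives, all_total):
    """Response rate with the model divided by the baseline rate without it."""
    if targeted_total <= 0 or all_total <= 0:
        raise ValueError("group sizes must be positive")
    baseline_rate = all_positives / all_total
    if baseline_rate == 0:
        raise ValueError("baseline response rate is zero; lift is undefined")
    model_rate = targeted_positives / targeted_total
    return model_rate / baseline_rate


# Hypothetical campaign: 2,000 of 10,000 customers respond overall (20%),
# while 400 of the 1,000 customers ranked highest by the model respond (40%).
example_lift = lift(targeted_positives=400, targeted_total=1000,
                    all_positives=2000, all_total=10000)
```

A value above 1 means the model selects responders at a higher rate than choosing cases without it. A value of 1 means the model gives no improvement.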
Cumulative gains and lift charts are visual aids for measuring
CS831 Research Paper Presentation Outline
Choosing a Paper
The choice of a research paper is to be from a list provided by the
instructor. The topic must be distinct from topics studied in previous
courses or theses.
Your project may be related to one of
Introduction to Itemsets
The existence of large amounts of scan code data collected by many businesses
represents a potential wealth of information given adequate methods of
transforming the data into meaningful information. One class of such data is
Data Cubes
Introduction
Users of decision support systems often see data in the form of data cubes. The
cube is used to represent data along some measure of interest. Although called
a "cube", it can be 2-dimensional, 3-dimensional, or higher-dimensi
Clustering
Introduction
Cluster analysis is the process of grouping objects into subsets that have meaning in the context of a
particular problem. The objects are thereby organized into an efficient representation that characterizes the
population being
Probability Based Objective Interestingness Measures
Reference:
Geng, Liqiang and Hamilton, H.J. "Interestingness Measures for Data
Mining: A Survey" In ACM Computing Surveys, 38(3), 2006.
Information about other references can be found in the Interesting
The Virtuous Cycle of Data Mining
Reference: Berry, M.J.A. and Linoff, G.S., Mastering Data Mining, Wiley:
New York, 2000.
Two Styles of Data Mining
1. Directed Data Mining:
Top-down approach
Used when we know approximately what we are lookin
Machine Learning
Reference:
D. Schuurmans, Machine Learning course notes, University of Waterloo,
1999.
T. Mitchell, Machine Learning, McGraw-Hill, 1997, pp. 20-50.
Machine learning is a process which causes systems to improve with
experience.
Elements
Overview of the KDD Process
Reference: Fayyad, Piatetsky-Shapiro, Smyth, "From Data Mining to
Knowledge Discovery: An Overview", in Fayyad, Piatetsky-Shapiro, Smyth,
Uthurusamy, Advances in Knowledge Discovery and Data Mining, AAAI Press
/ The MIT Press,