8/21/2014
Differences and Ratios
Difference
Ratios
When the deviation of values from a standard is
more meaningful than the absolute values, the
difference may be more useful.
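A minimal Python sketch of deriving a difference attribute and a ratio attribute from raw values, assuming a single known standard. The standard of 50 and the readings are hypothetical illustrations, not data from the slides:

```python
def difference_from_standard(values, standard):
    """Return each value's signed deviation from the standard."""
    return [v - standard for v in values]


def ratio_to_standard(values, standard):
    """Return each value as a multiple of the standard (standard must be nonzero)."""
    if standard == 0:
        raise ValueError("standard must be nonzero for a ratio")
    return [v / standard for v in values]


# Hypothetical readings compared against a hypothetical standard of 50.
readings = [45, 50, 60, 75]
print(difference_from_standard(readings, 50))  # [-5, 0, 10, 25]
print(ratio_to_standard(readings, 50))         # [0.9, 1.0, 1.2, 1.5]
```

The difference keeps the original units (how far above or below the standard), while the ratio is unitless (how many times the standard).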
Exam
8/21/2014
Data Reduction Strategies
A data store may have terabytes of data
Complex data analysis may take a very long time to run on the
complete data set
Data reduction
Obtain a reduced representation
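The slide stops before naming a technique. As one common example (an assumption, not necessarily the method the slides go on to cover), simple random sampling without replacement yields a much smaller data set that can stand in for the full one during a long-running analysis. A minimal Python sketch; the data and the 1% fraction are illustrative:

```python
import random


def simple_random_sample(records, fraction, seed=None):
    """Return a reduced data set: a random fraction of records, drawn without replacement."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Keep at least one record when any exist, never more than exist.
    k = min(len(records), max(1, round(len(records) * fraction)))
    rng = random.Random(seed)
    return rng.sample(records, k)


data = list(range(100_000))  # stand-in for a large data set
reduced = simple_random_sample(data, 0.01, seed=42)
print(len(reduced))  # 1000
```

Passing a seed makes the reduced set reproducible, which matters if later analysis steps need to be rerun on the same sample.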
8/21/2014
Underfitting and Overfitting
Methods for Performance Evaluation
Obtaining a reliable estimate of performance
Performance of a model may depend on other
factors besides the learning algorithm
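The slide does not name an estimation method here; k-fold cross-validation is one widely used way to get a more reliable estimate than a single train/test split, because every record is used for testing exactly once. A minimal Python sketch, assuming a hypothetical `train_and_score(train, test)` callback that fits a model on `train` and returns its accuracy on `test`. The majority-class "model" and the labeled data are hypothetical stand-ins:

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(records, k, train_and_score, seed=0):
    """Average the score over k rounds, each holding out a different fold for testing."""
    if not 2 <= k <= len(records):
        raise ValueError("k must be between 2 and the number of records")
    scores = []
    for held_out in k_fold_indices(len(records), k, seed):
        held = set(held_out)
        test = [records[i] for i in held_out]
        train = [r for i, r in enumerate(records) if i not in held]
        scores.append(train_and_score(train, test))
    return mean(scores)


# Hypothetical stand-in model: always predict the majority label seen in training.
def majority_class_accuracy(train, test):
    labels = [label for _, label in train]
    majority = max(set(labels), key=labels.count)
    return sum(label == majority for _, label in test) / len(test)


data = [(x, "yes" if x % 3 else "no") for x in range(30)]
result = cross_validate(data, k=5, train_and_score=majority_class_accuracy)
print(result)
```

Swapping in a real learner only requires a different `train_and_score` function; the fold logic stays the same.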
CIS 436 Data Mining
Fall 2014
MWF 12:20-1:10
Hartwell 27
Professor: Dr. Anthony Scime
Office: Brown 219
Course Description: Studies data mining process with the goal of dis
8/21/2014
The Market-Basket Model
Association
A large set of items, e.g., things sold in a
supermarket.
A large set of baskets, each of which is a small
set of the items, e.g., the things one customer
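A minimal Python sketch of the market-basket model: items are strings, each basket is a set of items, and the support of an itemset is the number of baskets containing all of its items. Counting pairs that meet a minimum support is shown as a typical first step in association analysis; the baskets and the threshold of 3 are hypothetical:

```python
from collections import Counter
from itertools import combinations


def support(itemset, baskets):
    """Count the baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket)


def frequent_pairs(baskets, min_support):
    """Count every pair of items co-occurring in a basket; keep pairs meeting min_support."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
print(support({"diapers", "beer"}, baskets))  # 3
print(frequent_pairs(baskets, min_support=3))
```

Sorting each basket before generating pairs ensures that a pair such as ("beer", "diapers") is always counted under the same key.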
This is the WEKA Classification Homework with the correct answers.
To get the answers below you must use the Voting Data on ANGEL and the files already divided
into training and testing sets. Use Voti
8/21/2014
Smoothing
Smoothing
The values of an attribute may range over a
number of distinct values.
However, if the range contains a great many
distinct values,
taken as a whole their value may be limited
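The slide does not say which smoothing method is meant. One common choice (an assumption here) is smoothing by bin means: sort the values, partition them into equal-frequency bins, and replace every value with the mean of its bin, so many distinct values collapse into a few. A minimal Python sketch; the price list is hypothetical:

```python
def smooth_by_bin_means(values, n_bins):
    """Sort values into n_bins equal-frequency bins and replace each value by its bin mean.

    Returns the smoothed values in sorted order. If len(values) is not a
    multiple of n_bins, the earlier bins each get one extra value.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    ordered = sorted(values)
    base, extra = divmod(len(ordered), n_bins)
    smoothed, start = [], 0
    for b in range(n_bins):
        size = base + (1 if b < extra else 0)
        bin_values = ordered[start:start + size]
        if bin_values:
            mean = sum(bin_values) / len(bin_values)
            smoothed.extend([mean] * len(bin_values))
        start += size
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Nine distinct-ish prices reduce to three representative values; smoothing by bin medians or bin boundaries follows the same binning step with a different replacement rule.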
CIS 436 Data Mining Project
Project: A substantial part of this course is completion of a data mining project. The project
should be worked on throughout the course, with presentations discussing progress
This is an example of a Data Mining Log.
July 14, 2014: received data set (Copy of Attacks on Countries(1)); need to make these changes:
Convert ARI-Raw bins where 0-3.3 is Low, 3.3-6.6 is Medium, and 6.6
9/5/2014
Normalization
Decimal Scaling Normalization
Transformation of data to scale data to a
specific range of values
When the distances between data points vary
and where larger distan
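Decimal scaling normalization divides every value by a power of ten, v' = v / 10^j, choosing j so that the largest absolute scaled value is below 1. A minimal Python sketch, assuming j is restricted to non-negative integers (values already inside (-1, 1) are left unscaled); the sample values are illustrative:

```python
import math


def decimal_scaling(values):
    """Scale values by 10**j so that every |v'| < 1.

    j is the smallest non-negative integer with max(|v / 10**j|) < 1.
    Returns the scaled values and j.
    """
    max_abs = max((abs(v) for v in values), default=0)
    if not math.isfinite(max_abs):
        raise ValueError("values must be finite")
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j


scaled, j = decimal_scaling([-986, 917, 45])
print(j)       # 3
print(scaled)  # [-0.986, 0.917, 0.045]
```

The loop is used instead of `math.log10` so that exact powers of ten (for example a maximum of 1000, which needs j = 4) are handled without floating-point rounding surprises.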
8/21/2014
Attribute Relevance Analysis - Why?
Which dimensions should be included?
How high a level of generalization?
Reduce the number of attributes
Easy-to-understand patterns
Attribute Relevance
Attribute
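The preview stops before naming a relevance measure. One standard measure (an assumption here) is information gain: the drop in class entropy after partitioning the rows on an attribute, with higher gain meaning a more relevant attribute. A minimal Python sketch over rows stored as dicts; the small table and its column names are hypothetical:

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Class entropy minus the weighted entropy after partitioning rows on attribute."""
    labels = [row[target] for row in rows]
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(part) / len(rows) * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


rows = [
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "sunny", "windy": True, "play": "no"},
    {"outlook": "overcast", "windy": False, "play": "yes"},
    {"outlook": "rainy", "windy": False, "play": "yes"},
    {"outlook": "rainy", "windy": True, "play": "no"},
    {"outlook": "overcast", "windy": True, "play": "yes"},
]
ranked = sorted(["outlook", "windy"], key=lambda a: information_gain(rows, a, "play"), reverse=True)
print(ranked)  # ['outlook', 'windy']
```

Attributes with gain near zero (here "windy", about 0.08 bits versus about 0.67 for "outlook") are candidates for removal, which reduces the number of attributes in the mined patterns.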