Differences and Ratios
When the variation of values from a standard is
more meaningful than the absolute value, the
difference may be more useful.
90, 100, 110
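As a small sketch of the idea, the snippet below takes the values 90, 100, 110 and expresses each one as a difference from, and a ratio to, a standard. The standard of 100 and the function name differences_and_ratios are assumptions made for illustration; the source does not state them.

```python
def differences_and_ratios(values, standard):
    """Return (difference, ratio) pairs of each value relative to a standard."""
    if standard == 0:
        raise ValueError("standard must be non-zero to form a ratio")
    return [(v - standard, v / standard) for v in values]


# Assumed standard of 100 for the example values 90, 100, 110.
values = [90, 100, 110]
pairs = differences_and_ratios(values, 100)
for value, (diff, ratio) in zip(values, pairs):
    print(f"{value}: difference {diff:+d}, ratio {ratio:.2f}")
```

The difference keeps the original units, while the ratio is unit-free, which is one reason to pick one over the other.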
Data Reduction Strategies
A data store may have terabytes of data
Complex data analysis may take a very long time to run on the
complete data set
Obtain a reduced representation of the data set that is much
smaller in volume but y
Underfitting and Overfitting
Methods for Performance Evaluation
Obtaining a reliable estimate of performance
Performance of a model may depend on other
factors besides the learning algorithm
Cost of misclassification
CIS 436 Data Mining
Professor: Dr. Anthony Scime
Office: Brown 219
Course Description: Studies the data mining process with the goal of discovering nontrivial,
interesting and actionable knowled
The Market-Basket Model
A large set of items, e.g., things sold in a
A large set of baskets, each of which is a small
set of the items, e.g., the things one customer
buys on one day.
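One way to picture the model is as a list of small sets. The sketch below is illustrative only: the baskets and item names are made up, and counting how many baskets contain each item is just one simple thing that can be done with this representation.

```python
from collections import Counter

# Hypothetical baskets: each is the small set of items one customer bought on one day.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

# The item set is the union of everything that appears in any basket.
items = set().union(*baskets)

# For each item, count the number of baskets that contain it.
basket_counts = Counter(item for basket in baskets for item in basket)

for item in sorted(items):
    print(f"{item}: appears in {basket_counts[item]} of {len(baskets)} baskets")
```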
Application - Real market baskets
This is the WEKA Classification Homework with the correct answers.
To get the answers below you must use the Voting Data on ANGEL and the files already divided
into training and testing sets. Use Voting Data Train.arff and Voting Data Test.arff
If you cha
The values of an attribute may range over a
number of distinct values.
However, if the range contains many values,
taken as a whole their value may be limited;
their differences may not be significant.
CIS 436 Data Mining Project
Project: A substantial part of this course is completion of a data mining project. The project
should be worked on throughout the course with presentations discussing progress. It culminates
with a paper. Students will be assig
This is an example of a Data Mining Log.
July 14, 2014: received data set (Copy of Attacks on Countries(1)); need to make these changes:
Convert ARI-Raw to bins where 0-3.3 is Low, 3.3-6.6 is Medium, and 6.6-10 is Large
Convert SRI to bins where 0-3.3 is Low, 3.
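The binning step in the log could be sketched as follows. The cut points 3.3 and 6.6 and the labels come from the log entry; the function name to_bin is hypothetical, and sending a boundary value (exactly 3.3 or 6.6) to the higher bin is an assumption, since the log does not say which bin the boundaries belong to.

```python
def to_bin(score):
    """Map a 0-10 score to Low / Medium / Large using the cut points 3.3 and 6.6."""
    if not 0 <= score <= 10:
        raise ValueError(f"score {score} is outside the 0-10 range")
    if score < 3.3:
        return "Low"
    # Assumption: a score of exactly 3.3 or 6.6 falls into the higher bin.
    if score < 6.6:
        return "Medium"
    return "Large"


# Hypothetical raw scores, used only to show the conversion.
raw_scores = [0.0, 2.1, 3.3, 5.0, 6.6, 9.8]
binned = [to_bin(s) for s in raw_scores]
print(binned)
```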
Decimal Scaling Normalization
Transformation of data to scale data to a
specific range of values
When the distances between data points vary
and where larger distances distort the data,
normalization can mitigate the di
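A minimal sketch of decimal scaling, assuming the usual textbook definition: every value is divided by 10^j, where j is the smallest integer for which the largest absolute scaled value is below 1. The function name decimal_scaling, the restriction to j >= 0, and the sample values are assumptions for illustration.

```python
def decimal_scaling(values):
    """Scale values by 10**j, with j the smallest integer >= 0 so that max(|v|) / 10**j < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j


# Hypothetical attribute values; the largest magnitude is 986, so j = 3.
scaled, j = decimal_scaling([-986, 120, 917])
print(j, scaled)
```

All scaled values land strictly between -1 and 1, and their relative spacing is preserved because every value is divided by the same power of ten.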
Attribute Relevance Analysis - Why?
Which dimensions should be included?
How high a level of generalization?
Reduce # attributes
Easy to understand patterns
Attribute Relevance Analysis - What?
Attribute Relevance Analysis