dm2part2 - University of Florida CISE department Data...

University of Florida CISE department Gator Engineering
Data Preprocessing
Dr. Sanjay Ranka
Professor, Computer and Information Science and Engineering
University of Florida, Gainesville
ranka@cise.ufl.edu

University of Florida CISE department Gator Engineering Data Mining Sanjay Ranka Spring 2011
Data Preprocessing
• What preprocessing steps can or should we apply to the data to make it more suitable for data mining?
  – Aggregation
  – Sampling
  – Dimensionality Reduction
  – Feature Subset Selection
  – Feature Creation
  – Discretization and Binarization
  – Attribute Transformation
Aggregation
• Aggregation refers to combining two or more attributes (or objects) into a single attribute (or object)
• For example, merging daily sales figures to obtain monthly sales figures
• Why aggregation?
  – Data reduction
    • Fewer objects or attributes allow the use of more computationally expensive algorithms
  – If done properly, aggregation can act as a change of scope or scale, providing a high-level view of the data instead of a low-level view
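A minimal Python sketch of the daily-to-monthly example, assuming sales arrive as (date, amount) pairs. The function name `monthly_totals` and the sales figures are hypothetical, chosen only for illustration. Every record that shares a (year, month) key is merged into a single object, so five daily records shrink to three monthly ones.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(daily_sales):
    """Aggregate (date, amount) pairs into {(year, month): total amount}."""
    totals = defaultdict(float)
    for day, amount in daily_sales:
        # All days that fall in the same calendar month collapse into one key.
        totals[(day.year, day.month)] += amount
    return dict(totals)


# Hypothetical daily sales figures, for illustration only.
daily = [
    (date(2011, 1, 3), 120.0),
    (date(2011, 1, 17), 80.0),
    (date(2011, 2, 1), 200.0),
    (date(2011, 2, 14), 50.0),
    (date(2011, 3, 9), 75.0),
]

print(monthly_totals(daily))
# → {(2011, 1): 200.0, (2011, 2): 250.0, (2011, 3): 75.0}
```

A plain dictionary keyed by (year, month) keeps the sketch dependency-free; on larger data, a group-by in a dataframe library would play the same role.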

  – The behavior of a group of objects is more stable than that of individual objects
    • Aggregate quantities have less variability than the individual objects being aggregated
[Figure: Standard Deviation of Average Monthly Precipitation; Standard Deviation of Average Yearly Precipitation]
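The reduced variability of aggregates can be checked with a small simulation. This sketch assumes, purely for illustration, that each "monthly" value is independent Gaussian noise around a fixed level. Real precipitation data has seasonality and correlation, so the exact ratio will differ. Under that assumption, the standard deviation of yearly averages (non-overlapping blocks of 12 months) should come out near the monthly standard deviation divided by √12. The helper `block_means` is hypothetical.

```python
import random
import statistics


def block_means(values, block):
    """Average consecutive, non-overlapping blocks of `block` values."""
    return [
        statistics.fmean(values[i:i + block])
        for i in range(0, len(values) - block + 1, block)
    ]


rng = random.Random(0)
# Synthetic monthly values: 200 "years" of independent noise around 50 (an assumption).
monthly = [rng.gauss(50, 10) for _ in range(12 * 200)]
yearly = block_means(monthly, 12)

sd_monthly = statistics.stdev(monthly)
sd_yearly = statistics.stdev(yearly)
print(f"monthly sd: {sd_monthly:.2f}, yearly-average sd: {sd_yearly:.2f}")
```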
Sampling
• Sampling is the process of understanding characteristics of data or models based on a subset of the original data. It is used extensively in all aspects of data exploration and mining
• Why sample?
  – Obtaining the entire set of “data of interest” is too expensive or time-consuming
  – Obtaining the entire set of data may not be necessary (and hence a waste of resources)
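As a sketch of estimating a characteristic of the data from a subset, Python's standard `random.sample` draws a uniform sample without replacement. The 100,000 synthetic values stand in for the entire data set (an assumption). The mean of a 1% sample should land close to the mean computed over all of the data, at a fraction of the cost.

```python
import random
import statistics

rng = random.Random(42)
# Stand-in for "the entire data set": 100,000 synthetic values (an assumption).
population = [rng.uniform(0, 100) for _ in range(100_000)]

# Inspect only 1% of the objects, chosen uniformly at random without replacement.
sample = rng.sample(population, 1_000)

true_mean = statistics.fmean(population)
estimate = statistics.fmean(sample)
print(f"full data: {true_mean:.2f}, 1% sample: {estimate:.2f}")
```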

Representative Sample
• A sample is representative for a particular operation if it yields approximately the same outcome as if the entire data set were used
• A sample that is representative for one operation may not be representative for another
  – For example, a sample may be representative for a histogram along one dimension but not good enough for computing the correlation between two dimensions
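The definition of a representative sample can be phrased as a check: apply the same operation to the sample and to the full data, then see whether the results agree within a tolerance. In this sketch, the helper `is_representative` and the 5% relative tolerance are hypothetical choices. Mean versus variance also stands in for the slide's histogram-versus-correlation example. Taking the middle half of the values keeps the mean unchanged but shrinks the variance by about a factor of four, so the same subset is representative for one operation and not for the other.

```python
import math
import statistics


def is_representative(data, sample, operation, rel_tol=0.05):
    """True if `operation` on the sample is within rel_tol of its value on the full data."""
    return math.isclose(operation(sample), operation(data), rel_tol=rel_tol)


data = list(range(1000))   # values 0..999
middle = data[250:750]     # a deliberately biased subset: the middle half

print(is_representative(data, middle, statistics.fmean))      # True: both means are 499.5
print(is_representative(data, middle, statistics.pvariance))  # False: variance is about 4x smaller
```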
Sampling Approaches
• Simple Random Sampling
  – There is an equal probability of selecting any particular item
  – Sampling without replacement: once an item is selected, it is removed from the population for obtaining future samples
  – Sampling with replacement: the selected item is not removed from the population for obtaining future samples
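Both variants of simple random sampling map onto Python's standard `random` module. `random.sample` selects distinct items (without replacement), and `random.choices` with no weights gives every item the same probability on each draw (with replacement). The five-item population and the seed are arbitrary choices for illustration.

```python
import random

rng = random.Random(7)
population = ["a", "b", "c", "d", "e"]

# Without replacement: a selected item is removed from the pool, so no item
# can repeat and the sample size cannot exceed the population size.
without_repl = rng.sample(population, 3)

# With replacement: selected items stay in the pool, so repeats are possible
# and the sample size may exceed the population size.
with_repl = rng.choices(population, k=8)

print(without_repl, with_repl)
```

Because eight draws are taken from only five items, the with-replacement sample is guaranteed to contain at least one repeated item.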


This note was uploaded on 11/13/2011 for the course CIS 4930 taught by Professor Staff during the Spring '08 term at University of Florida.
