class_08_27

# class_08_27 - Statistical Data Mining ORIE 474 Fall 2007...

Statistical Data Mining, ORIE 474, Fall 2007. Tatiyana V. Apanasovich. Introduction, 08/27/07.

1.2 Nature of Data Sets

What is a data set?

- A set of measurements taken from some environment or process.
- Simplest case: a collection of n objects, where the same p measurements are recorded for each object. This gives an n × p data matrix (rows 1, …, n are objects; columns 1, 2, …, p are measurements).
- The observed data matrix can also be referred to as: data set, training data, sample, database.

Ex: Adult data

| ID | Age | Work class | Education | Marital Status | Sex | Income class |
|----|-----|------------|-----------|----------------|-----|--------------|
| 25 | 59 | Private | HS graduate | Divorced | Female | <=50K |
| 26 | 56 | Local gov | Bachelors | Married | Male | >50K |
| 27 | 19 | Private | HS graduate | Never married | Male | <=50K |
| 28 | 54 | N/A | Some college | Married | Male | >50K |
| 29 | 39 | Private | HS graduate | Divorced | Male | <=50K |
| 30 | 49 | Private | HS graduate | Married | Male | <=50K |
| 31 | 23 | Local gov | Assoc academic | Never married | Male | <=50K |

Source: Machine Learning Repository of UCI
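
To make the n × p notation concrete, the Adult rows can be stored as a small data matrix. This is a minimal sketch in plain Python, assuming that ID acts as a row label rather than a measurement and that the N/A entry is stored as `None`; the names `COLUMNS`, `ROWS` and `shape` are illustrative and not part of the course material.

```python
# Rows copied from the Adult example; ID is used as the row label (an assumption).
COLUMNS = ["Age", "Work class", "Education", "Marital Status", "Sex", "Income class"]

ROWS = {
    25: [59, "Private", "HS graduate", "Divorced", "Female", "<=50K"],
    26: [56, "Local gov", "Bachelors", "Married", "Male", ">50K"],
    27: [19, "Private", "HS graduate", "Never married", "Male", "<=50K"],
    28: [54, None, "Some college", "Married", "Male", ">50K"],  # N/A stored as None
    29: [39, "Private", "HS graduate", "Divorced", "Male", "<=50K"],
    30: [49, "Private", "HS graduate", "Married", "Male", "<=50K"],
    31: [23, "Local gov", "Assoc academic", "Never married", "Male", "<=50K"],
}


def shape(rows, columns):
    """Return (n, p): number of objects and number of measurements per object."""
    n = len(rows)
    p = len(columns)
    # Every object must carry the same p measurements.
    assert all(len(values) == p for values in rows.values())
    return n, p


n, p = shape(ROWS, COLUMNS)
print(f"n = {n} objects, p = {p} measurements")  # n = 7 objects, p = 6 measurements
```

Note that the matrix mixes numeric (Age) and categorical (Work class, Sex, ...) measurements, and that a missing value still occupies a cell.
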

1.3 Models and Patterns

Model structure

- A global summary of a data set.
- A model can make a statement about any point in the full measurement space, where each row of the data matrix is viewed as a p-dimensional vector.
- Ex: diagnosis based on test results.

Pattern structure

- Makes statements only about restricted regions of the space spanned by the variables.
- Ex: mail order purchases may reveal that people who buy certain combinations of products are likely to buy others.

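
The difference between a model and a pattern can be sketched with two toy functions on Adult-style rows. Both rules are invented purely for illustration (the age threshold in `toy_model` is not fitted to any data); the point is only that the model returns an answer for every row, while the pattern answers inside its region and stays silent elsewhere.

```python
from typing import Optional

Row = dict  # one object: measurement name -> value


def toy_model(row: Row) -> str:
    """A 'model': makes a statement for ANY point in the measurement space.

    The threshold of 50 is hypothetical, chosen only for illustration.
    """
    return ">50K" if row["Age"] >= 50 else "<=50K"


def toy_pattern(row: Row) -> Optional[str]:
    """A 'pattern': speaks only about a restricted region of the space
    (divorced HS graduates) and returns None everywhere else."""
    if row["Marital Status"] == "Divorced" and row["Education"] == "HS graduate":
        return "<=50K"
    return None


inside = {"Age": 39, "Education": "HS graduate", "Marital Status": "Divorced"}
outside = {"Age": 56, "Education": "Bachelors", "Marital Status": "Married"}

print(toy_model(inside), toy_model(outside))      # the model answers for both rows
print(toy_pattern(inside), toy_pattern(outside))  # the pattern answers only for the first
```
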
1.4 DM Objectives

Model Building

- A. Exploratory Data Analysis
- B. Descriptive Modeling
- C. Predictive Modeling

Pattern Recognition

- D. Discovering Patterns and Rules
- E. Retrieval by Content

A. Exploratory Data Analysis (Chapter 3)

- Goal: explore the data without a clear idea of what we are looking for.
- Techniques:
  - For p > 3, projection techniques are useful and necessary.
  - Principal components
  - Spatial displays

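
As an illustration of a projection technique, the sketch below finds the first principal component of a small p = 4 data set and projects each object onto it. It is a minimal version under stated assumptions: the data are synthetic (points lying exactly on a line), the leading eigenvector of the sample covariance matrix is found by power iteration rather than a full eigendecomposition, and all function names are illustrative.

```python
import math


def center(data):
    """Subtract the column means so that each variable has mean zero."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in data]


def covariance(centered):
    """Sample covariance matrix (p x p) of already-centered data."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iterations=200):
    """Leading eigenvector of cov by power iteration, scaled to unit length."""
    p = len(cov)
    v = [1.0] * p  # starting vector; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def project(centered, direction):
    """Score of each object along the given direction."""
    return [sum(a * b for a, b in zip(row, direction)) for row in centered]


# Synthetic p = 4 data lying exactly on a line with direction (1, 2, 0, -1).
data = [[1.0 * t, 2.0 * t, 0.0, -1.0 * t] for t in range(6)]
centered = center(data)
pc1 = first_component(covariance(centered))
scores = project(centered, pc1)
print([round(x, 3) for x in pc1])
print([round(s, 3) for s in scores])
```

A two-dimensional display would need a second component as well, for example by removing the first direction from the data and repeating the iteration; in practice a library routine would normally be used instead of this hand-rolled version.
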
B. Descriptive Modeling (Chapter 9)

- Goal: describe all of the data (or the process generating it).
- Ex:
  - Models for the overall probability distribution of the data (density estimation)
  - Partitioning of the p-dimensional measurement space into
  - Models describing the relationship between variables (dependency modeling)
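
Density estimation can be illustrated on the Age column of the Adult example. The sketch below uses a Gaussian kernel density estimate, which is one common choice and not necessarily the method covered in the course; the bandwidth of 8 years is picked by hand as an assumption, and `gaussian_kde` is an illustrative name.

```python
import math

AGES = [59, 56, 19, 54, 39, 49, 23]  # Age column of the Adult example


def gaussian_kde(sample, bandwidth):
    """Return a function x -> estimated density at x.

    Each observation contributes a Gaussian bump of width `bandwidth`;
    the estimate is the average of those bumps.
    """
    n = len(sample)

    def density(x):
        total = sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)
        return total / (n * bandwidth * math.sqrt(2.0 * math.pi))

    return density


density = gaussian_kde(AGES, bandwidth=8.0)  # bandwidth is an assumption
for age in (20, 40, 60):
    print(age, round(density(age), 4))
```

The result is a global summary in the sense of a model: `density` can be evaluated at any age, not only at the observed ones. With only seven observations the estimate is, of course, very rough.
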

C. Predictive Modeling (Chapter 10

## This note was uploaded on 12/23/2009 for the course ORIE 474 at Cornell.
