class_08_27

class_08_27 - Statistical Data Mining ORIE 474 Fall 2007...

Statistical Data Mining
ORIE 474, Fall 2007
Tatiyana V. Apanasovich
Introduction, 08/27/07

1.2 Nature of Data Sets

What is a data set?
- A set of measurements taken from some environment or process.
- Simplest case: a collection of n objects, with the same p measurements recorded for each object. This gives an n × p data matrix: rows 1, …, n are the objects and columns 1, …, p are the measurements.
- The observed data matrix is also referred to as the data set, training data, sample, or database.
Ex: Adult data

ID  Age  Work class  Education       Marital status  Sex     Income class
25  59   Private     HS graduate     Divorced        Female  <=50K
26  56   Local gov   Bachelors       Married         Male    >50K
27  19   Private     HS graduate     Never married   Male    <=50K
28  54   N/A         Some college    Married         Male    >50K
29  39   Private     HS graduate     Divorced        Male    <=50K
30  49   Private     HS graduate     Married         Male    <=50K
31  23   Local gov   Assoc academic  Never married   Male    <=50K

Source: UCI Machine Learning Repository
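As a small illustration of the n × p data matrix idea, the sketch below stores the seven Adult rows as plain Python lists. Treating ID as a row label rather than one of the p measurements, and storing the "N/A" work class as None, are assumptions made here; the helper names data_matrix and column are hypothetical.

```python
# The Adult excerpt stored as an n x p data matrix (plain Python lists).
# Assumption: ID is a row label, not a measurement, so p counts the other columns.
COLUMNS = ["Age", "Work class", "Education", "Marital status", "Sex", "Income class"]

ROWS = {
    25: [59, "Private", "HS graduate", "Divorced", "Female", "<=50K"],
    26: [56, "Local gov", "Bachelors", "Married", "Male", ">50K"],
    27: [19, "Private", "HS graduate", "Never married", "Male", "<=50K"],
    28: [54, None, "Some college", "Married", "Male", ">50K"],  # "N/A" stored as None
    29: [39, "Private", "HS graduate", "Divorced", "Male", "<=50K"],
    30: [49, "Private", "HS graduate", "Married", "Male", "<=50K"],
    31: [23, "Local gov", "Assoc academic", "Never married", "Male", "<=50K"],
}


def data_matrix(rows):
    """Return (ids, X): the sorted row labels and the n x p matrix X."""
    ids = sorted(rows)
    X = [rows[i] for i in ids]
    p = len(X[0])
    # Every object must carry the same p measurements.
    if any(len(r) != p for r in X):
        raise ValueError("every object must have the same p measurements")
    return ids, X


def column(X, j):
    """Return the j-th variable (column) across all n objects."""
    return [row[j] for row in X]


ids, X = data_matrix(ROWS)
n, p = len(X), len(X[0])
print(n, p)  # 7 6
ages = column(X, COLUMNS.index("Age"))
print("mean age:", sum(ages) / n)
```

Rows are objects and columns are variables, so selecting a column gives one variable measured on all n objects.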
1.3 Models and Patterns

Model structure: a global summary of a data set.
- If we consider the rows as p-dimensional vectors, a model can make statements about any point in the full measurement space.
- Ex: diagnosis based on test results.

Pattern structure: makes statements only about restricted regions of the space spanned by the variables.
- Ex: mail-order purchases may reveal that people who buy certain combinations of products are likely to buy others.
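To make the "restricted region" idea concrete, here is a minimal sketch of one common kind of pattern, an association rule scored by support and confidence. The baskets and product names are hypothetical, made up for illustration; the point is that the rule says nothing about baskets that lack its left-hand side.

```python
from typing import FrozenSet, List

# Hypothetical mail-order baskets (illustration only, not real data).
BASKETS: List[FrozenSet[str]] = [
    frozenset({"tent", "stove", "lantern"}),
    frozenset({"tent", "stove"}),
    frozenset({"tent", "lantern"}),
    frozenset({"stove", "fuel"}),
    frozenset({"tent", "stove", "lantern", "fuel"}),
    frozenset({"book"}),
]


def support(itemset: FrozenSet[str], baskets: List[FrozenSet[str]]) -> float:
    """Fraction of all baskets that contain every item in itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)


def confidence(lhs: FrozenSet[str], rhs: FrozenSet[str],
               baskets: List[FrozenSet[str]]) -> float:
    """Estimate of P(rhs | lhs): only baskets containing lhs are considered."""
    covered = [b for b in baskets if lhs <= b]
    if not covered:
        return float("nan")  # the rule covers no baskets at all
    return sum(rhs <= b for b in covered) / len(covered)


# Rule: people who buy {tent, stove} are likely to also buy {lantern}.
lhs, rhs = frozenset({"tent", "stove"}), frozenset({"lantern"})
covered = [b for b in BASKETS if lhs <= b]
print("rule covers", len(covered), "of", len(BASKETS), "baskets")  # rule covers 3 of 6 baskets
print(f"support={support(lhs | rhs, BASKETS):.3f}")                # support=0.333
print(f"confidence={confidence(lhs, rhs, BASKETS):.3f}")           # confidence=0.667
```

A global model, by contrast, would have to produce a prediction for every basket, including the one containing only "book".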
1.4 Data Mining Objectives

Model building
  A. Exploratory Data Analysis
  B. Descriptive Modeling
  C. Predictive Modeling
Pattern recognition
  D. Discovering Patterns and Rules
  E. Retrieval by Content
A. Exploratory Data Analysis (Chapter 3)

Goal: explore the data without a clear idea of what we are looking for.
Techniques: for p > 3, projection techniques are useful and necessary:
- Principal components
- Spatial displays
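One way to project p > 3 variables down to something plottable is principal components. The sketch below computes them with NumPy's SVD of the centered data matrix; the synthetic 200 × 5 data (two latent factors plus small noise) and the helper name pca_project are assumptions for illustration, not the course's own code.

```python
import numpy as np


def pca_project(X: np.ndarray, k: int = 2):
    """Project the rows of an n x p matrix X onto its first k principal components.

    Returns (scores, components, explained_ratio)."""
    Xc = X - X.mean(axis=0)                      # center each variable (column)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                          # k x p matrix of directions
    scores = Xc @ components.T                   # n x k coordinates, e.g. for a scatter plot
    var = s**2 / (X.shape[0] - 1)                # variance along each direction
    explained_ratio = var[:k] / var.sum()
    return scores, components, explained_ratio


# Synthetic n = 200, p = 5 data: two latent factors plus small noise (illustration only).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 5))

scores, components, ratio = pca_project(X, k=2)
print(scores.shape)  # (200, 2)
print("fraction of variance kept:", ratio.sum().round(3))
```

Because the synthetic data really has two underlying factors, the first two components should retain almost all of the variance, so a 2-D scatter plot of scores loses little structure.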
B. Descriptive Modeling (Chapter 9)

Goal: describe all of the data (or the process generating it).
Ex:
- Models for the overall probability distribution of the data (density estimation)
- Partitioning of the p-dimensional measurement space into …
- Models describing the relationships between variables (dependency modeling)
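Density estimation can be illustrated with a Gaussian kernel density estimate, f_hat(x) = 1/(n h) · Σ_i φ((x − x_i)/h), where φ is the standard normal density. The sketch below is a self-written NumPy version (not scipy.stats.gaussian_kde) applied to the Age column of the Adult excerpt; the bandwidth h = 5 years is an arbitrary assumption.

```python
import numpy as np


def gaussian_kde(sample, h):
    """Return f_hat with f_hat(x) = 1/(n*h) * sum_i phi((x - x_i) / h).

    phi is the standard normal density and h > 0 is the bandwidth."""
    data = np.asarray(sample, dtype=float)
    n = data.size

    def f_hat(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        u = (x[:, None] - data[None, :]) / h        # shape (len(x), n)
        phi = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
        return phi.sum(axis=1) / (n * h)

    return f_hat


ages = [59, 56, 19, 54, 39, 49, 23]   # Age column of the Adult excerpt
f_hat = gaussian_kde(ages, h=5.0)     # h = 5 years is an arbitrary choice

# Riemann-sum check that the estimate integrates to (approximately) 1.
grid = np.linspace(0.0, 100.0, 2001)
dx = grid[1] - grid[0]
total = float(np.sum(f_hat(grid)) * dx)
print(f"integral ~ {total:.4f}")
print("density at 55:", f_hat(55.0)[0], "density at 90:", f_hat(90.0)[0])
```

A smaller h follows the seven observations more closely; a larger h gives a smoother summary of the whole distribution.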
C. Predictive Modeling (Chapter 10)
This note was uploaded on 12/23/2009 for the course ORIE 474 at Cornell.
