class_10_24 - Statistical Data Mining ORIE 474 Fall 2007...

Statistical Data Mining, ORIE 474, Fall 2007
Tatiyana Apanasovich, 10/24/07
Classification Modeling

10.1 Predictive Modeling
Aims to predict the unknown value of a variable of interest based on the known values of other variables.
Learn a mapping from the input X (an n × p matrix) to a scalar output Y (supervised learning).
Partition the data set $\{(x(i), y(i)) : i = 1, \dots, n\}$ into
Training data set: $D_{train} = \{(x(i), y(i)) : i = 1, \dots, m\}$
Validation data set: $D_{validation} = \{(x(i), y(i)) : i = m+1, \dots, n\}$
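A minimal Python sketch of this partition, assuming the data arrive as a list of (x(i), y(i)) pairs. The function name `partition` and the optional `shuffle` flag are hypothetical additions; the slide simply takes the first m pairs for training, which is what the default does. The toy pairs are made up.

```python
import random

def partition(data, m, shuffle=False, seed=0):
    """Split [(x(i), y(i)), ...] into D_train (first m pairs) and D_validation (the rest)."""
    if not 0 < m < len(data):
        raise ValueError("m must satisfy 0 < m < n")
    rows = list(data)
    if shuffle:
        # Hypothetical option: randomize the order before splitting
        random.Random(seed).shuffle(rows)
    # Python is 0-based: rows[:m] holds i = 1..m, rows[m:] holds i = m+1..n
    return rows[:m], rows[m:]

# Made-up toy data with n = 5
data = [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 8.8)]
train, validation = partition(data, m=3)
print(len(train), len(validation))  # 3 2
```

Shuffling first is a common safeguard when the rows are stored in some systematic order, since otherwise the validation set may not resemble the training set.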
Predictive Modeling (cont’d)
From the training data, estimate a mapping (function) f such that $y = f(x; \theta)$, where
f is the functional form of the model structure, and
$\theta$ is a vector of model parameters that have to be estimated.
Input by the data miner:
Model structure(s)
Score function
Search method
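The three inputs supplied by the data miner can be kept as separate, swappable pieces. The sketch below is illustrative only: it assumes a hypothetical linear model structure $f(x; \theta) = \theta_0 + \theta_1 x$, squared distance inside the score, and a brute-force grid search standing in for a real search method. All function names and the toy data are made up.

```python
from itertools import product

def linear_structure(x, theta):
    # Hypothetical model structure: f(x; theta) = theta0 + theta1 * x
    return theta[0] + theta[1] * x

def squared_distance(y, y_hat):
    # One possible distance measure d(y, y_hat)
    return (y - y_hat) ** 2

def grid_search(score, grid):
    # Search method: evaluate the score at every candidate theta, keep the smallest
    return min(grid, key=score)

def fit(structure, distance, search, grid, train):
    # Score function: S(theta) = sum over D_train of d(y(i), f(x(i); theta))
    def score(theta):
        return sum(distance(y, structure(x, theta)) for x, y in train)
    return search(score, grid)

# Made-up training data lying exactly on y = 1 + 2x
train = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
values = [v / 2 for v in range(-8, 9)]  # -4.0, -3.5, ..., 4.0
grid = list(product(values, values))    # candidate (theta0, theta1) pairs
theta_star = fit(linear_structure, squared_distance, grid_search, grid, train)
print(theta_star)  # (1.0, 2.0)
```

Replacing any one piece (for example, a gradient-based search instead of the grid) leaves the other two untouched.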

Predictive Modeling (cont’d)
Score function: $S(\theta) = \sum_{i \in D_{train}} d\big(y(i), \hat{y}(i)\big) = \sum_{i \in D_{train}} d\big(y(i), f(x(i); \theta)\big)$, where $\hat{y}(i) = f(x(i); \theta)$ and $d(\cdot, \cdot)$ is a distance measure.
Search method: minimize S as a function of $\theta$ on the training data set.
To compare different predictive models (e.g., different f's), evaluate $S_f(\theta^*)$ on the validation data set at the optimal $\theta^*$ for each model structure f.
Choose $f^*$ such that $S_f(\theta^*)$ is minimal.
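The selection rule can be sketched as follows, assuming squared distance $d(y, \hat{y}) = (y - \hat{y})^2$ and two hypothetical model structures (constant and linear). For these two structures the minimizer $\theta^*$ of the training score has a closed form (ordinary least squares), so no iterative search is needed. The names and toy data are made up.

```python
def fit_constant(train):
    # f(x; theta) = theta0; minimizing the training S gives theta0 = mean of y
    ys = [y for _, y in train]
    theta0 = sum(ys) / len(ys)
    return lambda x: theta0

def fit_linear(train):
    # f(x; theta) = theta0 + theta1 * x; closed-form least-squares solution
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    theta1 = sxy / sxx
    theta0 = my - theta1 * mx
    return lambda x: theta0 + theta1 * x

def score(f, data):
    # S = sum of d(y(i), f(x(i))) with squared distance
    return sum((y - f(x)) ** 2 for x, y in data)

def select_model(structures, train, validation):
    # Fit each structure on D_train, then compare S_f(theta*) on D_validation
    val_scores = {name: score(fit(train), validation) for name, fit in structures.items()}
    best = min(val_scores, key=val_scores.get)
    return best, val_scores

# Made-up toy data
train = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8)]
validation = [(4.0, 9.1), (5.0, 10.9)]
best, scores = select_model({"constant": fit_constant, "linear": fit_linear}, train, validation)
print(best)  # linear
```

Scoring on the validation set rather than the training set matters: a more flexible structure can always lower the training score, so only held-out data gives a fair comparison.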
10.2 Classification Modeling
The target variable Y is categorical and is often called the class variable.
Notation: instead of Y we will use C, taking values in $\{c_1, \dots, c_m\}$.
Input variables $X_1, \dots, X_p$; $x(i) = (x_1(i), \dots, x_p(i))^T$ is the input vector for the i-th object.
Concepts:
Discriminative viewpoint (decision boundaries)
Probabilistic viewpoint
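The two viewpoints are linked: a probabilistic output $P(C = c_k \mid x)$ can be turned into a single label, which is what the discriminative viewpoint produces directly. A minimal sketch, assuming the common "pick the most probable class" (argmax) rule, which is one choice among several and not stated on the slides; the function name `decide` and the probabilities are hypothetical.

```python
def decide(class_probs):
    """Map probabilistic output {c_k: P(C = c_k | x)} to a single class label (argmax)."""
    total = sum(class_probs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("class probabilities must sum to 1")
    # On ties, max() keeps the first class listed
    return max(class_probs, key=class_probs.get)

# Hypothetical probabilistic output for one input vector x(i), with m = 3 classes
p = {"c1": 0.2, "c2": 0.7, "c3": 0.1}
print(decide(p))  # c2
```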

Ex: Red blood cell data
Source: http://www.ics.uci.edu/~smyth/courses/ics274/
182 individuals: healthy vs. iron deficient anemia
Discriminative Classification: Ex

Probabilistic Classification: Ex
[Figure: probability levels 0.01, 0.50 and 0.99 marked on the plot]
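Under the probabilistic viewpoint the classifier returns $P(C = c_1 \mid x)$ rather than a label, so the plane can be annotated with levels such as 0.01, 0.50 and 0.99. The slide does not say which model produced its figure; this sketch assumes a hypothetical logistic model with made-up weights `w`, for which each probability level is a straight line in the $(X_1, X_2)$ plane.

```python
import math

def prob_class1(x1, x2, w=(0.0, 1.0, 1.0)):
    # Hypothetical logistic model: P(C = c1 | x) = 1 / (1 + exp(-(w0 + w1*x1 + w2*x2)))
    z = w[0] + w[1] * x1 + w[2] * x2
    return 1.0 / (1.0 + math.exp(-z))

def contour_level(p):
    # Points with P(C = c1 | x) = p satisfy w0 + w1*x1 + w2*x2 = log(p / (1 - p))
    return math.log(p / (1.0 - p))

print(prob_class1(0.0, 0.0))              # 0.5  (on the 0.50 level: x1 + x2 = 0)
print(round(prob_class1(3.0, 2.0), 2))    # 0.99
print(round(prob_class1(-3.0, -2.0), 2))  # 0.01
```

With these weights the 0.50 level is the line $x_1 + x_2 = 0$, and the 0.99 and 0.01 levels are parallel lines at $x_1 + x_2 = \pm\log 99$.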
A. Discriminative Classification
$f: x(i) \to \{c_1, \dots, c_m\}$
If m = 2, f produces a piecewise constant surface over the $\{X_1, X_2\}$ plane.
[Figure: class value C plotted as a surface over the X_1, X_2 plane]
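To see why a discriminative classifier gives a piecewise constant surface, evaluate one on a grid of points. This sketch uses a nearest-centroid rule, swapped in purely for illustration (the slides do not name a method), with hypothetical centroids for m = 2 classes.

```python
def nearest_centroid(centroids):
    """Return f: (x1, x2) -> class label, assigning x to the class with the closest centroid."""
    def f(x1, x2):
        return min(centroids,
                   key=lambda c: (x1 - centroids[c][0]) ** 2 + (x2 - centroids[c][1]) ** 2)
    return f

# Hypothetical class centroids for m = 2 classes in the (X1, X2) plane
f = nearest_centroid({"c1": (0.0, 0.0), "c2": (2.0, 2.0)})

# The output C is constant within each region and jumps only at the
# decision boundary x1 + x2 = 2 (grid values avoid points exactly on it)
for x2 in [2.25, 1.25, 0.25]:
    print(" ".join(f(x1, x2) for x1 in [0.25, 1.25, 2.25]))
```

The printed rows (top row is the largest x2) are `c2 c2 c2`, `c1 c2 c2` and `c1 c1 c2`: two flat regions separated by a single boundary line.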


This note was uploaded on 12/23/2009 for the course ORIE 474 at Cornell University (Engineering School).
