chap4_classification_sh

Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation

Lecture Notes for Chapter 4
Introduction to Data Mining
by Tan, Steinbach, Kumar

Edited for STATS202, Stanford University, Fall 2010

© Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004

Classification: Definition

- Given a collection of records (training set)
    - Each record contains a set of attributes/variables; one of the attributes is the class.
- Find a model for the class attribute as a function of the values of the other attributes.
- Goal: previously unseen records should be assigned a class as accurately as possible.
    - A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.

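
The split into training and test sets, and the accuracy measured on the test set, can be sketched as follows. This is a minimal illustration, not the course's code: the function names `train_test_split` and `accuracy`, the 30% test fraction, the toy data, and the hand-written `threshold_model` that stands in for a learned model are all assumptions made for the example.

```python
import random

def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (training, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(model, records):
    """Fraction of (attributes, label) records whose label the model predicts."""
    if not records:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for attributes, label in records if model(attributes) == label)
    return correct / len(records)

# Hypothetical toy data: one numeric attribute x; the class is "high" when x > 50.
data = [({"x": v}, "high" if v > 50 else "low") for v in range(0, 100, 5)]
train, test = train_test_split(data)

# Stand-in for a model built from the training set (assumed, not learned here).
def threshold_model(attributes):
    return "high" if attributes["x"] > 50 else "low"

test_accuracy = accuracy(threshold_model, test)
print(len(train), len(test), test_accuracy)
```

The model is only ever scored on records it was not built from, which is what makes the test-set accuracy an estimate of performance on previously unseen records.
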
Illustrating Classification Task

Training set (Learn Model):

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test set (Apply Model):

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
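
The learn-then-apply workflow can be sketched on these two tables. The learner below is deliberately the simplest possible one, a majority-class baseline that ignores the attributes; it is an assumption chosen to show the two steps, not the method the slides go on to describe. The names `learn` and `apply_model` are hypothetical.

```python
from collections import Counter

# Training set from the slide: (Tid, Attrib1, Attrib2, Attrib3 in thousands, Class).
training_set = [
    (1, "Yes", "Large", 125, "No"),
    (2, "No", "Medium", 100, "No"),
    (3, "No", "Small", 70, "No"),
    (4, "Yes", "Medium", 120, "No"),
    (5, "No", "Large", 95, "Yes"),
    (6, "No", "Medium", 60, "No"),
    (7, "Yes", "Large", 220, "No"),
    (8, "No", "Small", 85, "Yes"),
    (9, "No", "Medium", 75, "No"),
    (10, "No", "Small", 90, "Yes"),
]

# Test set from the slide: the class is unknown, so it is not stored.
test_set = [
    (11, "No", "Small", 55),
    (12, "Yes", "Medium", 80),
    (13, "Yes", "Large", 110),
    (14, "No", "Small", 95),
    (15, "No", "Large", 67),
]

def learn(records):
    """Learn step (baseline assumption): remember the most frequent class label."""
    counts = Counter(record[-1] for record in records)
    majority_label, _ = counts.most_common(1)[0]
    return lambda record: majority_label

def apply_model(model, records):
    """Apply step: map each Tid to the class the model predicts for it."""
    return {record[0]: model(record) for record in records}

model = learn(training_set)
predictions = apply_model(model, test_set)
print(predictions)
```

Any real classifier slots into the same two functions: `learn` would return something that inspects the attributes, and `apply_model` would stay unchanged.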

Examples of Classification Task

- Predicting tumor cells as benign or malignant
- Classifying credit card transactions as legitimate or fraudulent
- Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
- Categorizing news stories as finance, weather, entertainment, sports, etc.

Classification Techniques

- Decision Tree based Methods
- Rule-based Methods
- Memory based reasoning
- Neural Networks
- Naïve Bayes and Bayesian Belief Networks
- Support Vector Machines

Example of a Decision Tree

Training Data (Refund and Marital Status are categorical, Taxable Income is continuous, Cheat is the class):

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Model: Decision Tree (splitting attributes: Refund, MarSt, TaxInc)

Refund
  Yes -> NO
  No  -> MarSt
           Married          -> NO
           Single, Divorced -> TaxInc
                                 < 80K -> NO
                                 > 80K -> YES

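
The tree on this slide can be written as nested conditionals and checked against the training data. This is a sketch under stated assumptions: the record values are taken to match the ten-row table shown earlier (Attrib1 as Refund, Attrib3 as Taxable Income, Class as Cheat), incomes are in thousands, and an income of exactly 80K is sent to the "> 80K" branch because the slide does not say where it goes. The function name `classify` is hypothetical.

```python
# Assumed training records: (Tid, Refund, Marital Status, Taxable Income in K, Cheat).
TRAINING_DATA = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
]

def classify(refund, marital_status, taxable_income_k):
    """Walk the tree from the root (Refund) to a leaf and return the Cheat label."""
    if refund == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    # Single or Divorced: split on taxable income.
    # Assumption: exactly 80K follows the "> 80K" branch.
    return "No" if taxable_income_k < 80 else "Yes"

# Collect the Tids whose predicted label differs from the recorded one.
misclassified = [
    tid
    for tid, refund, status, income, cheat in TRAINING_DATA
    if classify(refund, status, income) != cheat
]
print("misclassified Tids:", misclassified)
```

Each `if` corresponds to one internal node of the tree, and each `return` to one leaf, so the depth of nesting mirrors the path length from root to leaf.
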
Another Example of Decision Tree

Training Data: the same ten records as in the previous example (Refund and Marital Status categorical, Taxable Income continuous, Cheat the class).

MarSt
  Married          -> NO
  Single, Divorced -> Refund
                        Yes -> NO
                        No  -> TaxInc
                                 < 80K -> NO
                                 > 80K -> YES

There could be more than one tree that fits the same data!
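
That more than one tree fits the same data can be checked directly by coding both trees and comparing them. As before, this is a sketch: the record values are assumed to match the ten-row table shown earlier, incomes are in thousands, exactly 80K is assumed to follow the "> 80K" branch, and the function names are hypothetical.

```python
from itertools import product

# Assumed training records: (Tid, Refund, Marital Status, Taxable Income in K, Cheat).
TRAINING_DATA = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
]

def tree_refund_first(refund, status, income_k):
    """Tree with Refund at the root, then MarSt, then TaxInc."""
    if refund == "Yes":
        return "No"
    if status == "Married":
        return "No"
    return "No" if income_k < 80 else "Yes"

def tree_marst_first(refund, status, income_k):
    """Tree with MarSt at the root, then Refund, then TaxInc."""
    if status == "Married":
        return "No"
    if refund == "Yes":
        return "No"
    return "No" if income_k < 80 else "Yes"

def fits(tree):
    """True when the tree reproduces the Cheat label of every training record."""
    return all(tree(r, s, i) == c for _, r, s, i, c in TRAINING_DATA)

fits_refund_first = fits(tree_refund_first)
fits_marst_first = fits(tree_marst_first)

# Compare the two trees on a small grid of attribute values, including unseen ones.
grid = list(product(["Yes", "No"], ["Single", "Married", "Divorced"], [60, 79, 80, 81, 220]))
agree_on_grid = all(tree_refund_first(*x) == tree_marst_first(*x) for x in grid)
print(fits_refund_first, fits_marst_first, agree_on_grid)
```

These two particular trees test the same three conditions in a different order, so they also agree on the unseen grid points. In general, two trees that fit the same training data need not agree on records outside it.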

Decision Tree Classification Task

Diagram labels: Learn Model, Apply Model.

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No

This note was uploaded on 07/29/2011 for the course STAT 202 at Stanford.
