class07

class07 - August 29, 2011 Data Mining: Concepts and...

August 29, 2011 Data Mining: Concepts and Techniques

Data Mining: Concepts and Techniques
Chapter 6
Jiawei Han
Department of Computer Science
University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
2006 Jiawei Han and Micheline Kamber, All rights reserved

Chapter 6. Classification and Prediction
  - What is classification? What is prediction?
  - Issues regarding classification and prediction
  - Classification by decision tree induction
  - Bayesian classification
  - Rule-based classification
  - Classification by back propagation
  - Support Vector Machines (SVM)
  - Associative classification
  - Lazy learners (or learning from your neighbors)
  - Other classification methods
  - Prediction
  - Accuracy and error measures
  - Ensemble methods
  - Model selection

Classification vs. Prediction
  Classification
    - predicts categorical class labels (discrete or nominal)
    - classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute and uses it in classifying new data
  Prediction
    - models continuous-valued functions, i.e., predicts unknown or missing values
  Typical applications
    - Credit approval
    - Target marketing
    - Medical diagnosis
    - Fraud detection

Classification: A Two-Step Process
  Model construction: describing a set of predetermined classes
    - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
    - The set of tuples used for model construction is training set
    - The model is represented as classification rules, decision trees, or mathematical formulae
  Model usage: for classifying future or unknown objects
    - Estimate accuracy of the model
        - The known label of test sample is compared with the classified result from the model
        - Accuracy rate is the percentage of test set samples that are correctly classified by the model
        - Test set is independent of training set, otherwise over-fitting will occur
    - If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known

Process (1): Model Construction
  Training Data

    NAME  RANK            YEARS  TENURED
    Mike  Assistant Prof  3      no
    Mary  Assistant Prof  7      yes
    Bill  Professor       2      yes
    Jim   Associate Prof  7      yes
    Dave  Assistant Prof  6      no
    Anne  Associate Prof  3      no

  Classification Algorithms

  Classifier (Model)
    IF rank = professor OR years > 6 THEN tenured = yes

...
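
The model-construction slide can be made concrete with a minimal Python sketch. It hard-codes the six training tuples and the rule "IF rank = professor OR years > 6 THEN tenured = yes" from the slide, then measures how many training tuples the rule labels correctly. The names `TRAINING_DATA`, `classify`, and `accuracy` are illustrative assumptions, and the unseen tuple at the end is hypothetical rather than taken from the slides. Because the check runs on the training set itself, it only confirms that the rule fits the data it was built from; as the two-step slide notes, an accuracy estimate needs an independent test set.

```python
# Training tuples copied from the slide: (name, rank, years, tenured).
TRAINING_DATA = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor", 2, "yes"),
    ("Jim", "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]


def classify(rank, years):
    """Apply the slide's rule: IF rank = professor OR years > 6 THEN tenured = yes."""
    return "yes" if rank == "Professor" or years > 6 else "no"


def accuracy(rows):
    """Fraction of rows whose known label matches the rule's prediction."""
    correct = sum(
        1 for _name, rank, years, label in rows if classify(rank, years) == label
    )
    return correct / len(rows)


# Accuracy on the training set only (not an independent test set).
print(accuracy(TRAINING_DATA))

# Model usage on a hypothetical unseen tuple (not from the slides).
print(classify("Associate Prof", 5))
```

Swapping in a separate list of labeled tuples for `TRAINING_DATA` in the `accuracy` call would give the test-set accuracy rate the slide describes.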