# Stats 202 - Lecture 7 - Statistics 202 Statistical Aspects...

Statistics 202: Statistical Aspects of Data Mining
Professor Rajan Patel

Lecture 7 = Start Chapter 4

Agenda:
1) Assign Homework 3
2) Start lecturing over Chapter 4

Introduction to Data Mining by Tan, Steinbach, Kumar

Chapter 4: Classification: Basic Concepts, Decision Trees, and Model Evaluation

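Illustration of the Classification Task

The training set is passed to a learning algorithm, which learns a model (induction). The model is then applied to the test set (deduction).

Training Set

| Tid | Attrib1 | Attrib2 | Attrib3 | Class |
|-----|---------|---------|---------|-------|
| 1   | Yes     | Large   | 125K    | No    |
| 2   | No      | Medium  | 100K    | No    |
| 3   | No      | Small   | 70K     | No    |
| 4   | Yes     | Medium  | 120K    | No    |
| 5   | No      | Large   | 95K     | Yes   |
| 6   | No      | Medium  | 60K     | No    |
| 7   | Yes     | Large   | 220K    | No    |
| 8   | No      | Small   | 85K     | Yes   |
| 9   | No      | Medium  | 75K     | No    |
| 10  | No      | Small   | 90K     | Yes   |

Test Set

| Tid | Attrib1 | Attrib2 | Attrib3 | Class |
|-----|---------|---------|---------|-------|
| 11  | No      | Small   | 55K     | ?     |
| 12  | Yes     | Medium  | 80K     | ?     |
| 13  | Yes     | Large   | 110K    | ?     |
| 14  | No      | Small   | 95K     | ?     |
| 15  | No      | Large   | 67K     | ?     |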
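
The induction and deduction steps in this illustration can be sketched in a few lines of Python. The slide does not say which learning algorithm is used, so this sketch swaps in a very simple one as an assumption: a one-rule learner that picks the single categorical attribute whose per-value majority vote makes the fewest training errors. The names `learn_one_rule` and `apply_model` are hypothetical, and the continuous Attrib3 column is left out to keep the example short.

```python
from collections import Counter, defaultdict

# Training set from the slide as (Attrib1, Attrib2, Class).
# Attrib3 (continuous) is omitted: this simple learner only handles categories.
TRAIN = [
    ("Yes", "Large", "No"), ("No", "Medium", "No"), ("No", "Small", "No"),
    ("Yes", "Medium", "No"), ("No", "Large", "Yes"), ("No", "Medium", "No"),
    ("Yes", "Large", "No"), ("No", "Small", "Yes"), ("No", "Medium", "No"),
    ("No", "Small", "Yes"),
]

# Test set from the slide (Tid 11 to 15); the class is unknown.
TEST = [("No", "Small"), ("Yes", "Medium"), ("Yes", "Large"),
        ("No", "Small"), ("No", "Large")]


def learn_one_rule(records):
    """Induction: choose the attribute whose majority-vote rule errs least."""
    n_attrs = len(records[0]) - 1
    best = None
    for a in range(n_attrs):
        counts = defaultdict(Counter)
        for rec in records:
            counts[rec[a]][rec[-1]] += 1
        # For each attribute value, predict its most common class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[rec[a]] != rec[-1] for rec in records)
        if best is None or errors < best[0]:
            best = (errors, a, rule)
    return best[1], best[2]


def apply_model(model, record):
    """Deduction: look up the predicted class for an unseen record."""
    attr_index, rule = model
    return rule[record[attr_index]]


model = learn_one_rule(TRAIN)
predictions = [apply_model(model, rec) for rec in TEST]
print(model)        # (1, {'Large': 'No', 'Medium': 'No', 'Small': 'Yes'})
print(predictions)  # ['Yes', 'No', 'No', 'Yes', 'No']
```

On this data the learner settles on Attrib2 (2 training errors, versus 3 for Attrib1). A decision tree, the technique this chapter covers, would typically combine several attributes rather than stopping at one.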

Classification: Definition

• Given a collection of records (training set). Each record contains a set of attributes (x), with one additional attribute which is the class (y).
• Find a model to predict the class as a function of the values of other attributes.
• Goal: previously unseen records should be assigned a class as accurately as possible. A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
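
The train/test procedure described above might look like the following minimal sketch. Everything here is an assumption made for illustration: the helper names (`split_records`, `learn_majority_class`, `accuracy`), the 30% test fraction, the fixed seed, and the placeholder learner that always predicts the most common training class. The records reuse the taxable income and Cheat columns from the lecture's decision tree example.

```python
import random
from collections import Counter


def split_records(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and hold out a fraction as the test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def learn_majority_class(training_set):
    """Placeholder learner: always predict the most common training class."""
    majority = Counter(y for _, y in training_set).most_common(1)[0][0]
    return lambda x: majority


def accuracy(model, labelled_records):
    """Fraction of records whose predicted class equals the true class."""
    correct = sum(model(x) == y for x, y in labelled_records)
    return correct / len(labelled_records)


# (x, y) pairs: x = taxable income in thousands, y = class label.
records = [(125, "No"), (100, "No"), (70, "No"), (120, "No"), (95, "Yes"),
           (60, "No"), (220, "No"), (85, "Yes"), (75, "No"), (90, "Yes")]

train_set, test_set = split_records(records)
model = learn_majority_class(train_set)   # built on the training set only
test_accuracy = accuracy(model, test_set)  # validated on unseen records
print(f"test accuracy: {test_accuracy:.2f}")
```

The key design point is that `accuracy` is computed on records the learner never saw, which is what makes it an estimate of performance on previously unseen data.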
Classification Examples

• Classifying credit card transactions as legitimate or fraudulent
• Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
• Categorizing news stories as finance, weather, entertainment, sports, etc.
• Predicting tumor cells as benign or malignant

Classification Techniques

• There are many techniques/algorithms for carrying out classification
• In this chapter we will study only decision trees
• In Chapter 5 we will study other techniques, including some very modern and effective techniques

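An Example of a Decision Tree

Training Data

| Tid | Refund | Marital Status | Taxable Income | Cheat |
|-----|--------|----------------|----------------|-------|
| 1   | Yes    | Single         | 125K           | No    |
| 2   | No     | Married        | 100K           | No    |
| 3   | No     | Single         | 70K            | No    |
| 4   | Yes    | Married        | 120K           | No    |
| 5   | No     | Divorced       | 95K            | Yes   |
| 6   | No     | Married        | 60K            | No    |
| 7   | Yes    | Divorced       | 220K           | No    |
| 8   | No     | Single         | 85K            | Yes   |
| 9   | No     | Married        | 75K            | No    |
| 10  | No     | Single         | 90K            | Yes   |

Attribute types: Refund (categorical), Marital Status (categorical), Taxable Income (continuous), Cheat (class).

Model: Decision Tree (splitting attributes: Refund, MarSt, TaxInc)

• Refund = Yes: NO
• Refund = No: test MarSt
  • MarSt = Married: NO
  • MarSt = Single, Divorced: test TaxInc
    • TaxInc < 80K: NO
    • TaxInc > 80K: YES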
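
One way to check this tree against the training data is to write it as nested conditionals, as in the sketch below. The function name `classify` and the tuple layout are hypothetical. The slide labels the income branches "< 80K" and "> 80K" and does not say where exactly 80K goes, so sending it to the YES branch is an assumption (no training record sits on that boundary).

```python
# Training data from the slide: (Tid, Refund, MarSt, TaxInc in thousands, Cheat).
TRAINING = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
]


def classify(refund, marst, taxinc):
    """Walk the slide's tree from the root and return the predicted Cheat class."""
    if refund == "Yes":
        return "No"
    if marst == "Married":
        return "No"
    # Single or Divorced: split on taxable income.
    # Assumption: exactly 80K is treated as the "> 80K" branch.
    return "No" if taxinc < 80 else "Yes"


correct = sum(classify(r, m, t) == cheat for _, r, m, t, cheat in TRAINING)
print(f"{correct} of {len(TRAINING)} training records classified correctly")
# prints "10 of 10 training records classified correctly"
```

The tree reproduces every training label, but that says little about how it would do on unseen records, which is why accuracy is measured on a separate test set.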

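Applying the Tree Model to Predict the Class for a New Observation

Tree: Refund (Yes → NO; No → MarSt (Married → NO; Single, Divorced → TaxInc (< 80K → NO; > 80K → YES)))

New observation:

| Refund | Marital Status | Taxable Income | Cheat |
|--------|----------------|----------------|-------|
| No     | Married        | 80K            | ?     |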
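
Applying the tree to this observation means starting at the root and following the branch that matches each attribute value until a leaf is reached. The sketch below records that path; the function name `trace` and the dictionary keys are hypothetical, and the handling of an income of exactly 80K is an assumption, since the slide only labels the branches "< 80K" and "> 80K".

```python
def trace(record):
    """Return (path, predicted_class) for one record using the slide's tree."""
    path = [f"Refund = {record['Refund']}"]
    if record["Refund"] == "Yes":
        return path, "No"
    path.append(f"MarSt = {record['MarSt']}")
    if record["MarSt"] == "Married":
        return path, "No"
    # Single or Divorced: split on taxable income (in thousands).
    # Assumption: exactly 80K follows the "> 80K" branch.
    if record["TaxInc"] < 80:
        path.append("TaxInc < 80K")
        return path, "No"
    path.append("TaxInc >= 80K")
    return path, "Yes"


new_observation = {"Refund": "No", "MarSt": "Married", "TaxInc": 80}
path, predicted = trace(new_observation)
print(" -> ".join(path), "=>", predicted)
# prints "Refund = No -> MarSt = Married => No"
```

For this record the path ends at the Married leaf, so the predicted class is No and Taxable Income is never examined.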