Lecture02 - Data Mining Principles and Algorithms Jianyong...

October 9, 2009 Data Mining: Principles and Algorithms

Data Mining: Principles and Algorithms
Jianyong Wang
Database Lab, Institute of Software
Department of Computer Science and Technology
Tsinghua University
[email protected]

Course Administrivia
Homepage of this course
- The link is http://dbgroup.cs.tsinghua.edu.cn/wangjy/DM/DataMining.html
- Tentative class schedule
  » http://dbgroup.cs.tsinghua.edu.cn/wangjy/DM/ClassSchedule.htm
  » Slides can be downloaded from 网络学堂 (Web Learning, Tsinghua's online course platform)
- Course evaluation
  » http://dbgroup.cs.tsinghua.edu.cn/wangjy/DM/CourseEvaluation.htm

Chapter 2: Data Preprocessing
- What is data? <==
- Why preprocess the data?
- Data summarization
- Data cleaning
- Data integration and transformation
- Data reduction
- Discretization and concept hierarchy generation
- Summary

What is Data?
Data is a collection of data objects and their attributes.
An attribute is a property or characteristic of an object
- Examples: eye color of a person, temperature, etc.
- Attribute is also known as variable, field, characteristic, feature, or observation
A collection of attributes describes an object
- Object is also known as record, point, case, sample, entity, or instance

Example data set (columns are attributes, rows are objects):

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
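To make the object/attribute distinction concrete, here is a minimal Python sketch that stores the example table as a list of records: each record is one object (row), and each field is one attribute (column). The class name `Record`, the field names, the helper `column`, and storing Taxable Income as a whole number of thousands (125K becomes 125) are hypothetical choices for this illustration, not part of the slide.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Record:
    """One object (row); every field is an attribute (column).

    Field names are hypothetical; taxable_income_k stores "125K" as 125.
    """
    tid: int
    refund: str
    marital_status: str
    taxable_income_k: int
    cheat: str


# The ten objects from the example table.
DATA = [
    Record(1, "Yes", "Single", 125, "No"),
    Record(2, "No", "Married", 100, "No"),
    Record(3, "No", "Single", 70, "No"),
    Record(4, "Yes", "Married", 120, "No"),
    Record(5, "No", "Divorced", 95, "Yes"),
    Record(6, "No", "Married", 60, "No"),
    Record(7, "Yes", "Divorced", 220, "No"),
    Record(8, "No", "Single", 85, "Yes"),
    Record(9, "No", "Married", 75, "No"),
    Record(10, "No", "Single", 90, "Yes"),
]

# The attributes are the field names shared by every object.
ATTRIBUTES = [f.name for f in fields(Record)]


def column(name: str) -> list:
    """Return the values of one attribute across all objects."""
    return [getattr(record, name) for record in DATA]


if __name__ == "__main__":
    print(f"{len(DATA)} objects x {len(ATTRIBUTES)} attributes")  # 10 objects x 5 attributes
    print(column("marital_status"))
```

Giving every row the same fixed set of named fields mirrors the slide's point that each object is described by the same collection of attributes; `column` gives the complementary view of one attribute across all objects.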
Attribute Values
Attribute values are numbers or symbols assigned to an attribute
Distinction between attributes and attribute values
- The same attribute can be mapped to different attribute values
  » Example: height can be measured in feet or meters
- Different attributes can be mapped to the same set of values
  » Example: attribute values for ID and age are integers
  » But the properties of the attribute values can differ: ID has no limit, but age has a maximum and a minimum value
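Both halves of this distinction can be sketched in a few lines of Python. The conversion factor is the definition of the international foot (exactly 0.3048 m); the age bounds of 0 to 150 and the function names are hypothetical, chosen only to show that ID and age share a value set (integers) while having different properties.

```python
# Same attribute, different values: height recorded in feet or in meters.
FEET_TO_METERS = 0.3048  # exact: the international foot is defined as 0.3048 m


def feet_to_meters(feet: float) -> float:
    return feet * FEET_TO_METERS


# Different attributes, same value set: ID and age are both integers,
# but only age is bounded. These bounds are hypothetical.
AGE_MIN, AGE_MAX = 0, 150


def is_valid_id(value: int) -> bool:
    # An ID is only a label, so any integer is acceptable.
    return isinstance(value, int)


def is_valid_age(value: int) -> bool:
    return isinstance(value, int) and AGE_MIN <= value <= AGE_MAX


if __name__ == "__main__":
    print(feet_to_meters(6))
    print(is_valid_id(200), is_valid_age(200))  # True False
```

The value 200 is a perfectly good ID but not a plausible age, even though both are integers; the validity rule belongs to the attribute, not to its value set.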

Types of Attributes
There are four basic types of attributes:
- Nominal
  » Examples: ID numbers, eye color, zip codes
- Ordinal
  » Examples: rankings (e.g., taste of potato chips on a scale from 1 to 10), grades, height in {tall, medium, short}
- Interval
  » Examples: calendar dates, temperatures in Celsius or Fahrenheit
- Ratio
  » Examples: length, time, counts
- How can we determine the attributes of complex data such as graphs?
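The slide names the four types without saying what separates them. One common way to characterize them, the levels-of-measurement view, is by which operations are meaningful: nominal values support only equality tests, ordinal values add ordering, interval values add differences, and ratio values add ratios. The Python sketch below encodes that convention; the enum, the function names, and the exact operator sets are illustrative assumptions, not taken from the slide. It also shows why Celsius temperature is an interval attribute: a ratio of two Celsius readings is not meaningful, but after converting to Kelvin, which has a true zero, it is.

```python
from enum import IntEnum


class AttributeType(IntEnum):
    """Levels of measurement; each level supports the operations of the ones below it."""
    NOMINAL = 1   # distinctness: =, !=
    ORDINAL = 2   # plus order: <, >
    INTERVAL = 3  # plus differences: +, -
    RATIO = 4     # plus ratios: *, /


def allowed_operations(kind: AttributeType) -> set[str]:
    """Return the operators that are meaningful for values of this type."""
    ops = {"=", "!="}
    if kind >= AttributeType.ORDINAL:
        ops |= {"<", ">"}
    if kind >= AttributeType.INTERVAL:
        ops |= {"+", "-"}
    if kind >= AttributeType.RATIO:
        ops |= {"*", "/"}
    return ops


# Examples from the slide, tagged with the type the slide assigns them.
EXAMPLES = {
    "eye color": AttributeType.NOMINAL,
    "zip code": AttributeType.NOMINAL,
    "height in {tall, medium, short}": AttributeType.ORDINAL,
    "temperature in Celsius": AttributeType.INTERVAL,
    "length": AttributeType.RATIO,
}

# Ordinal values can be ranked, but the gaps between ranks carry no meaning.
HEIGHT_RANK = {"short": 0, "medium": 1, "tall": 2}


def celsius_to_kelvin(celsius: float) -> float:
    # Kelvin has a true zero, so ratios of Kelvin temperatures are meaningful.
    return celsius + 273.15


if __name__ == "__main__":
    for name, kind in EXAMPLES.items():
        print(f"{name}: {kind.name.lower()} {sorted(allowed_operations(kind))}")
    # 20 degrees C is not "twice as hot" as 10 degrees C; compare on the Kelvin scale instead.
    print(celsius_to_kelvin(20) / celsius_to_kelvin(10))
```

Using an ordered `IntEnum` lets a single comparison express that each type inherits every operation of the types before it.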