Lecture02 - Data Mining Principles and Algorithms Jianyong Wang Database Lab Institute of Software Department of Computer Science and Technology

October 9, 2009 Data Mining: Principles and Algorithms
Data Mining: Principles and Algorithms
Jianyong Wang
Database Lab, Institute of Software
Department of Computer Science and Technology
Tsinghua University
[email protected]

Course Administrivia
Homepage of this course
- The link is http://dbgroup.cs.tsinghua.edu.cn/wangjy/DM/DataMining.html
- Tentative class schedule
  » http://dbgroup.cs.tsinghua.edu.cn/wangjy/DM/ClassSchedule.htm
  » Slides can be downloaded from “网络学堂” (Tsinghua's Web Learning platform)
- Course evaluation
  » http://dbgroup.cs.tsinghua.edu.cn/wangjy/DM/CourseEvaluation.htm

Chapter 2: Data Preprocessing
- What is data? <==
- Why preprocess the data?
- Data summarization
- Data cleaning
- Data integration and transformation
- Data reduction
- Discretization and concept hierarchy generation
- Summary

What is Data?
Collection of data objects and their attributes
An attribute is a property or characteristic of an object
- Examples: eye color of a person, temperature, etc.
- An attribute is also known as a variable, field, characteristic, feature, or observation
A collection of attributes describes an object
- An object is also known as a record, point, case, sample, entity, or instance

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
(Columns are attributes; rows are objects.)

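The object/attribute view of the table above can be sketched in Python. This is only an illustration: the class name `TaxRecord`, its field names, and the choice to store income as an integer number of thousands are assumptions made for the example, not part of the slides.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TaxRecord:
    """One data object (a row); each field is one attribute (a column)."""
    tid: int
    refund: bool
    marital_status: str
    taxable_income_k: int  # assumption: income stored in thousands, e.g. 125 for 125K
    cheat: bool


# The ten objects from the slide's table.
RECORDS = [
    TaxRecord(1, True, "Single", 125, False),
    TaxRecord(2, False, "Married", 100, False),
    TaxRecord(3, False, "Single", 70, False),
    TaxRecord(4, True, "Married", 120, False),
    TaxRecord(5, False, "Divorced", 95, True),
    TaxRecord(6, False, "Married", 60, False),
    TaxRecord(7, True, "Divorced", 220, False),
    TaxRecord(8, False, "Single", 85, True),
    TaxRecord(9, False, "Married", 75, False),
    TaxRecord(10, False, "Single", 90, True),
]

# The attributes are the fields of the record type.
attribute_names = [f.name for f in fields(TaxRecord)]

# Selecting objects by the value of one attribute.
cheater_tids = [r.tid for r in RECORDS if r.cheat]

if __name__ == "__main__":
    print(attribute_names)
    print(cheater_tids)
```

Each `TaxRecord` instance plays the role of an object (record, sample, instance), and each field plays the role of an attribute (variable, feature).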
Attribute Values
Attribute values are numbers or symbols assigned to an attribute
Distinction between attributes and attribute values
- Same attribute can be mapped to different attribute values
  » Example: height can be measured in feet or meters
- Different attributes can be mapped to the same set of values
  » Example: Attribute values for ID and age are integers
  » But properties of attribute values can be different
    ID has no limit but age has a maximum and minimum value
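A small Python sketch may make the distinction concrete. The helper names, the sample IDs and ages, and the age bounds of 0 and 150 are hypothetical values chosen for illustration; only the feet/meters and ID/age examples come from the slide.

```python
METERS_PER_FOOT = 0.3048  # exact, by definition of the international foot


def meters_to_feet(meters: float) -> float:
    """Same attribute (height), different attribute value depending on the unit."""
    return meters / METERS_PER_FOOT


def is_plausible_age(age: int, lowest: int = 0, highest: int = 150) -> bool:
    """Age is bounded; the bounds here are assumed for illustration."""
    return lowest <= age <= highest


# One attribute, two representations of its value.
height_m = 1.8
height_ft = meters_to_feet(height_m)

# Two different attributes that share the same value type (integers).
ids = [1001, 1002, 1003]   # hypothetical IDs
ages = [23, 35, 41]        # hypothetical ages

mean_age = sum(ages) / len(ages)  # meaningful: an average age
mean_id = sum(ids) / len(ids)     # computable, but carries no meaning for an ID

if __name__ == "__main__":
    print(f"{height_m} m is about {height_ft:.2f} ft")
    print(f"mean age = {mean_age}, 'mean id' = {mean_id} (not meaningful)")
```

The integers stored for ID and age look alike, but only the age values support arithmetic and range checks in a meaningful way.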

Types of Attributes
There are different types of attributes
- Nominal
  » Examples: ID numbers, eye color, zip codes
- Ordinal
  » Examples: rankings (e.g., taste of potato chips on a scale from 1-10), grades, height in {tall, medium, short}
- Interval
  » Examples: calendar dates, temperatures in Celsius or Fahrenheit
- Ratio
  » Examples: length, time, counts
- How to determine the attributes of some complex data like graphs?

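The four types are commonly distinguished by which operations are meaningful on their values: equality for nominal, ordering for ordinal, differences for interval, and ratios for ratio attributes, with each level inheriting the operations of the levels before it. The slide does not spell this out, so the sketch below is an illustration under that standard convention; the names `AttributeType`, `NEW_OPERATIONS`, and `supports` are made up for the example.

```python
from enum import IntEnum


class AttributeType(IntEnum):
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# The operation each level adds on top of the levels below it.
NEW_OPERATIONS = {
    AttributeType.NOMINAL: {"equality"},
    AttributeType.ORDINAL: {"order"},
    AttributeType.INTERVAL: {"difference"},
    AttributeType.RATIO: {"ratio"},
}


def supports(attr_type: AttributeType, operation: str) -> bool:
    """True if the operation is meaningful for this attribute type."""
    return any(
        operation in NEW_OPERATIONS[level]
        for level in AttributeType
        if level <= attr_type
    )


def celsius_to_fahrenheit(celsius: float) -> float:
    return celsius * 9 / 5 + 32


# Temperature in Celsius or Fahrenheit is an interval attribute:
# the ratio of two readings changes when the unit changes.
ratio_in_celsius = 20 / 10
ratio_in_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Length is a ratio attribute: the ratio of two lengths survives a unit change.
ratio_in_meters = 4.0 / 2.0
ratio_in_feet = (4.0 / 0.3048) / (2.0 / 0.3048)

if __name__ == "__main__":
    print(supports(AttributeType.INTERVAL, "ratio"))
    print(ratio_in_celsius, ratio_in_fahrenheit)
    print(ratio_in_meters, ratio_in_feet)
```

The temperature comparison shows why "twice as hot" is not a meaningful statement on the Celsius or Fahrenheit scale, while "twice as long" is meaningful for length.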
Types of Attributes
There are different types of attributes
- Nominal
- Ordinal
- Interval
- Ratio
- How to determine the attributes of some complex data like sequences and graphs?
[Figure: DNA]
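The slide leaves the question open. One possible approach for sequence data, shown here purely as an assumption and not as the lecture's answer, is to derive count-valued (ratio) attributes from a sequence by counting its length-k substrings (k-mers). The function name `kmer_counts` and the sample sequence are hypothetical.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in the sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Hypothetical DNA fragment; its 2-mers are AC, CG, GT, TA, AC.
counts = kmer_counts("ACGTAC", 2)

if __name__ == "__main__":
    for kmer, count in sorted(counts.items()):
        print(kmer, count)
```

Each distinct k-mer then acts as one attribute of the sequence object, and its count is the attribute value.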

This note was uploaded on 06/02/2010 for the course COMPUTER DM2009F taught by Professor Wangwei during the Fall '09 term at Tsinghua University.
