chap02_new - Data Preprocessing Chapter 2 Data Mining...

Data Mining: Concepts and Techniques (May 31, 2017)
Data Preprocessing (Chapter 2)

Data Preprocessing Step in the KDD Process
[Figure: KDD process diagram with the labels Databases, Relevant Data, Data Preprocessing, Data Mining, Pattern, Evaluation/Interpretation, Knowledge]

Main Steps of a KDD Process (Fully) [slide from Introduction to Data Mining]
  • Domain knowledge acquisition
      • Learning relevant prior knowledge and the goals of the application
  • Data collection and preprocessing (may take 60% of the effort!)
      • Data selection and integration: creating a target data set
      • Data cleaning, data transformation, and data reduction
  • Data mining
      • Choosing the data mining functions: association, classification, clustering, regression, summarization
      • Choosing the mining algorithm(s)
      • Searching for patterns of interest
  • Pattern evaluation and knowledge presentation
      • Visualization, transformation, removing redundant patterns, etc.
  • Use of discovered knowledge

Chapter 2: Data Preprocessing
  • Why preprocess the data?
  • Data cleaning
  • Data integration and transformation
  • Data reduction
  • Discretization and concept hierarchy generation
  • Summary

Why Data Preprocessing?
  • Data in the real world is dirty
      • Incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
      • Noisy: containing errors or outliers
      • Inconsistent: containing discrepancies in codes or names
  • No quality data, no quality mining results!
      • Quality decisions must be based on quality data
      • Data quality is important to data mining

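The three kinds of dirty data named above can be illustrated with a minimal Python sketch. The records, field names, plausibility range, and canonical code set below are all hypothetical and chosen only for illustration; the slides do not prescribe any particular detection method.

```python
# Hypothetical toy records; field names and values are invented for illustration.
records = [
    {"id": 1, "age": 34, "country": "US"},
    {"id": 2, "age": None, "country": "USA"},   # incomplete: age is missing
    {"id": 3, "age": 29, "country": "US"},
    {"id": 4, "age": 31, "country": "U.S."},    # inconsistent: different code for the same country
    {"id": 5, "age": 420, "country": "US"},     # noisy: implausible age
    {"id": 6, "age": 36, "country": "US"},
]


def find_incomplete(rows, field):
    """Return ids of rows whose value for `field` is missing (None)."""
    return [r["id"] for r in rows if r.get(field) is None]


def find_out_of_range(rows, field, low, high):
    """Return ids of rows whose value lies outside the assumed range [low, high]."""
    return [
        r["id"]
        for r in rows
        if r.get(field) is not None and not (low <= r[field] <= high)
    ]


def find_inconsistent_codes(rows, field, canonical):
    """Return the distinct values of `field` that are not in the canonical code set."""
    return sorted({r[field] for r in rows if r[field] not in canonical})


incomplete_ids = find_incomplete(records, "age")
noisy_ids = find_out_of_range(records, "age", 0, 120)   # 0..120 is an assumed range
bad_codes = find_inconsistent_codes(records, "country", {"US"})

print(incomplete_ids)  # [2]
print(noisy_ids)       # [5]
print(bad_codes)       # ['U.S.', 'USA']
```

A fixed range check is only one simple way to flag noise; real data usually needs domain-specific rules or statistical outlier tests.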
Multi-Dimensional Measure of Data Quality
A well-accepted multidimensional view (people, events, time, place, objects):
  • Accuracy
  • Completeness
  • Consistency
  • Timeliness
  • Believability
  • Value added
  • Interpretability
  • Accessibility

Major Tasks in Data Preprocessing
  • Data cleaning
      • Processing missing values and noisy data
  • Data integration
      • Integrating multiple records, files, tables, databases, or databases managed by different DBMSs
  • Data transformation
      • Normalization and aggregation
  • Data reduction
      • Obtaining a representation that is smaller in volume but produces the same or similar analytical results (especially relevant for cloud computing)
  • Data discretization
      • Part of data reduction: reducing the number of attribute values (especially for numerical attributes)

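Three of the tasks listed above (cleaning, transformation, discretization) can be sketched in a few lines of Python. The specific techniques used here (mean imputation, min-max normalization, equal-width binning) are common choices assumed for illustration, not methods this slide names, and the `incomes` values are hypothetical.

```python
def fill_missing_with_mean(values):
    """Data cleaning: replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    avg = sum(observed) / len(observed)
    return [avg if v is None else v for v in values]


def min_max_normalize(values):
    """Data transformation: rescale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values equal; avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, k):
    """Data discretization: map each value to the index of one of k equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


incomes = [20, None, 40, 60, 80]  # hypothetical values, e.g. in thousands

cleaned = fill_missing_with_mean(incomes)   # [20, 50.0, 40, 60, 80]
normalized = min_max_normalize(cleaned)     # first value 0.0, last value 1.0
bins = equal_width_bins(cleaned, 3)         # [0, 1, 1, 2, 2]

print(cleaned)
print(normalized)
print(bins)
```

Discretization here reduces five distinct numeric values to three interval labels, which is the sense in which the slide treats it as part of data reduction.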
[Figure: Forms of data preprocessing]

Chapter 2: Data Preprocessing
  • Why preprocess the data?