
Data Mining: Exploring Data
Lecture Notes for Chapter 3
Slides by Tan, Steinbach, Kumar, adapted by Michael Hahsler
Topics
- Exploratory Data Analysis
- Summary Statistics
- Visualization
What is data exploration?
A preliminary exploration of the data to better understand its characteristics.
Key motivations of data exploration include
- Helping to select the right tool for preprocessing or analysis
- Making use of humans’ abilities to recognize patterns
  People can recognize patterns not captured by data analysis tools
Related to the area of Exploratory Data Analysis (EDA)
- Created by statistician John Tukey
- Seminal book is Exploratory Data Analysis by Tukey
- A nice online introduction can be found in Chapter 1 of the NIST Engineering Statistics Handbook
Iris Sample Data Set
Many of the exploratory data techniques are illustrated with the Iris Plant data set.
- Can be obtained from the UCI Machine Learning Repository
- From the statistician Ronald A. Fisher
- Three flower types (classes):
  Setosa
  Virginica
  Versicolour
- Four (non-class) attributes
  Sepal width and length
  Petal width and length
Virginica. Robert H. Mohlenbrock. USDA NRCS. 1995. Northeast wetland flora: Field office guide to plant species. Northeast National Technical Center, Chester, PA. Courtesy of USDA NRCS Wetland Science Institute.
Summary Statistics
Summary statistics are numbers that summarize properties of the data
- Summarized properties include frequency, location and spread
  Examples: location - mean
            spread - standard deviation
- Most summary statistics can be calculated in a single pass through the data
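The single-pass claim can be illustrated with a small sketch. The slides do not name a method; the code below assumes Welford's online algorithm, one common way to update the mean and standard deviation one value at a time. The function name `running_mean_std` and the sample values are hypothetical.

```python
import math


def running_mean_std(values):
    """Return (count, mean, sample standard deviation) in one pass.

    Assumption: Welford's online update is used here; the slides only
    state that a single pass is possible, not how it is done.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    # The sample standard deviation needs at least two values.
    std = math.sqrt(m2 / (count - 1)) if count > 1 else 0.0
    return count, mean, std


# Hypothetical measurements, not taken from the Iris data set.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
count, mean, std = running_mean_std(sample)
print(count, mean, round(std, 3))
```

Because each value is seen once and only three running quantities are kept, the same loop works on a stream that does not fit in memory.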
Frequency and Mode
The frequency of an attribute value is the percentage of times the value occurs in the data set
- For example, given the attribute ‘gender’ and a representative population of people, the gender ‘female’ occurs about 50% of the time.
The mode of an attribute is the most frequent attribute value
The notions of frequency and mode are typically used with categorical data
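As a minimal sketch of these two definitions, frequency and mode can be computed for a categorical attribute with Python's standard `collections.Counter`. The helper names `frequencies` and `mode` and the list of class labels are hypothetical and only serve the illustration.

```python
from collections import Counter


def frequencies(values):
    """Map each attribute value to the percentage of times it occurs."""
    counts = Counter(values)
    total = len(values)
    return {value: 100.0 * n / total for value, n in counts.items()}


def mode(values):
    """Return the most frequent value (first one encountered on ties)."""
    return Counter(values).most_common(1)[0][0]


# Hypothetical class labels, not the actual Iris class distribution.
species = ["setosa", "virginica", "setosa", "versicolour", "setosa", "virginica"]
freq = frequencies(species)
most_frequent = mode(species)
print(freq, most_frequent)
```

Here `setosa` appears in three of six records, so its frequency is 50% and it is also the mode.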
Percentiles
For continuous data, the notion of a percentile is more useful.
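The slide's own definition is cut off in this preview, so the following is only a sketch under a stated assumption: it uses the nearest-rank convention, in which the p-th percentile is the smallest observed value with at least p% of the data at or below it. Other conventions interpolate between neighbouring values and can give slightly different results. The function name `percentile_nearest_rank` and the sample values are hypothetical.

```python
import math


def percentile_nearest_rank(values, p):
    """Return the p-th percentile (0 < p <= 100) by the nearest-rank rule.

    Assumption: nearest-rank is one of several percentile conventions;
    the slides do not say which one they use.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the interval (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based position
    return ordered[rank - 1]


# Hypothetical continuous measurements.
lengths = [15, 20, 35, 40, 50]
median = percentile_nearest_rank(lengths, 50)
maximum = percentile_nearest_rank(lengths, 100)
print(median, maximum)
```

With five sorted values, the 50th percentile falls at rank 3 and the 100th percentile is the largest value.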