Topic2-EDAViz - DataVisualization Chapter2 credits...

Data Mining 2011 - Volinsky - Columbia University

Exploratory Data Analysis and Data Visualization
Chapter 2

credits:
• Interactive and Dynamic Graphics for Data Analysis: Cook and Swayne
• Padhraic Smyth’s UCI lecture notes
• R Graphics: Paul Murrell
• Graphics of Large Datasets: Visualizing a Million: Unwin, Theus and Hofmann

Outline
• EDA
• Visualization
  – One variable
  – Two variables
  – More than two variables
  – Other types of data
  – Dimension reduction
EDA and Visualization

Exploratory Data Analysis (EDA) and Visualization are important (necessary?) steps in any analysis task.

• get to know your data!
  – distributions (symmetric, normal, skewed)
  – data quality problems
  – outliers
  – correlations and inter-relationships
  – subsets of interest
  – suggest functional relationships
• Sometimes EDA or viz might be the goal!
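One way to look for "correlations and inter-relationships" is to compute a pairwise correlation for every pair of numeric variables. The sketch below is only an illustration: it uses plain Python rather than R, the Pearson coefficient is one choice among several, and the column names and values in `columns` are made up for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(columns):
    """Correlation for every unordered pair of named columns."""
    names = list(columns)
    result = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            result[(a, b)] = pearson(columns[a], columns[b])
    return result


# Hypothetical toy data: three variables observed on five records.
columns = {
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],   # exactly 2 * x
    "z": [5, 3, 4, 1, 2],    # tends to fall as x rises
}

correlations = pairwise_correlations(columns)
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

A coefficient near +1 or -1 suggests a linear relationship worth plotting; a value near 0 does not rule out a nonlinear one, which is one reason to look at the scatterplot as well.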

[Figure: flowingdata.com, 9/9/11]
[Figure: NYTimes, 7/26/11]

Exploratory Data Analysis (EDA)

Goal: get a general sense of the data
  – means, medians, quantiles, histograms, boxplots
• You should always look at every variable - you will learn something!
• data-driven (model-free)
• Think interactive and visual
  – Humans are the best pattern recognizers
  – You can use more than 2 dimensions! x, y, z, space, color, time…
• especially useful in early stages of data mining
  – detect outliers (e.g. assess data quality)
  – test assumptions (e.g. normal distributions or skewed?)
  – identify useful raw data & transforms (e.g. log(x))
• Bottom line: it is always well worth looking at your data!
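Two of the early-stage checks above, detecting outliers and trying a transform such as log(x), can be sketched in a few lines. This is a hedged illustration, not the lecture's method. It is written in plain Python rather than R. The 1.5 × IQR fence is the common boxplot convention and is assumed here. The quartile indexing (rounding up to the next position) is also an assumption, and `values` is made-up data.

```python
import math


def iqr_outliers(x, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (boxplot-style fences)."""
    xs = sorted(x)
    n = len(xs)
    # Simple order-statistic quartiles; rounding up is an assumption.
    q1 = xs[max(1, math.ceil(0.25 * n)) - 1]
    q3 = xs[max(1, math.ceil(0.75 * n)) - 1]
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in x if v < low or v > high]


def skewness(x):
    """sum((x - mean)^3) / (sum((x - mean)^2))^(3/2); zero if symmetric."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x)
    m3 = sum((v - mean) ** 3 for v in x)
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


# Hypothetical right-skewed sample with one extreme value.
values = [1, 2, 2, 3, 3, 4, 5, 8, 13, 120]

outliers = iqr_outliers(values)
logged = [math.log(v) for v in values]  # requires strictly positive values

print("outliers:", outliers)
print("skewness raw:", round(skewness(values), 3))
print("skewness log:", round(skewness(logged), 3))
```

On this sample the log transform pulls in the long right tail, so the skewness of `logged` is smaller than that of `values`. A flagged value still has to be inspected by hand, because it may be a data quality problem or a genuine extreme.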
Summary Statistics

not visual

sample statistics of data X
  – mean: $\mu = \sum_i X_i / n$
  – mode: most common value in X
  – median: $X = \mathrm{sort}(X)$, median $= X_{n/2}$ (half below, half above)
  – quartiles of sorted X: Q1 value $= X_{0.25n}$, Q3 value $= X_{0.75n}$
      interquartile range: value(Q3) - value(Q1)
      range: max(X) - min(X) $= X_n - X_1$
  – variance: $\sigma^2 = \sum_i (X_i - \mu)^2 / n$
  – skewness: $\sum_i (X_i - \mu)^3 \,/\, \left[ \sum_i (X_i - \mu)^2 \right]^{3/2}$
      zero if symmetric; right-skewed more common (what kind of data is right skewed?)
  – number of distinct values for a variable (see unique() in R)
  – Don’t need to report all of these: Bottom line… do these numbers make sense???
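The statistics listed above can be computed directly from their definitions. The following is a minimal sketch in plain Python (the slide itself points to R's unique()). The function name `summary_stats`, the sample `data`, and the choice to round a non-integer position such as 0.25n up to the next index are assumptions of this example. The median follows the slide's $X_{n/2}$ convention and does not average the two middle values when n is even.

```python
import math
from collections import Counter


def summary_stats(x):
    """Summary statistics of a list of numbers, following the slide's formulas."""
    xs = sorted(x)
    n = len(xs)
    mean = sum(xs) / n

    def order_stat(p):
        # Value at 1-indexed position p*n in the sorted data.
        # Rounding up for non-integer p*n is an assumption of this sketch.
        k = max(1, math.ceil(p * n))
        return xs[k - 1]

    q1, median, q3 = order_stat(0.25), order_stat(0.5), order_stat(0.75)
    m2 = sum((v - mean) ** 2 for v in xs)  # sum of squared deviations
    m3 = sum((v - mean) ** 3 for v in xs)  # sum of cubed deviations

    return {
        "mean": mean,
        # Ties go to the smallest value because xs is sorted.
        "mode": Counter(xs).most_common(1)[0][0],
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "range": xs[-1] - xs[0],
        "variance": m2 / n,  # divides by n, as on the slide
        # Slide's form; the usual moment coefficient is sqrt(n) times this.
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "n_distinct": len(set(xs)),  # analogous to length(unique(x)) in R
    }


# Hypothetical sample with a long right tail.
data = [2, 4, 4, 5, 7, 9, 11, 30]
stats = summary_stats(data)
for name, value in stats.items():
    print(f"{name:>10}: {value}")
```

Here the mean (9) sits well above the median (5) and the skewness is positive, which is the kind of sanity check the slide's bottom line asks for.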