Statistics Primer
• Numerical Data Analysis
• Covariance / Correlation
• Errors
• Visualizing

Lesson Objectives
• Just enough statistics for data science
Copyright © 2017 Elephant Scale. All rights reserved.
Data Types

Data Types
Type        | Description                                              | Example
Continuous  | Data can take any value within an interval. Numeric, float | Exam score: 0 - 100
Discrete    | Only integer values. Int, count                          | Clicks per day
Categorical | Specific values from a set. Enums, factors               | Colors: Red, White, Blue. States: AL, CA
Binary      | Just two values: 0/1 or true/false                       | Transaction is fraud or not
Ordinal     | Categorical data, but with ordering                      | Grades: A, B, C, D (A > B > C > D)
Structured Data
• Dataframe: spreadsheet-like data.
• Feature: a column in the table. Also called attribute, input, predictor, or variable.
• Outcome: the value being predicted. Also called dependent variable, response, target, or output.
• Record: a row in the dataframe.
[Figure: a dataframe with two rows (Application 1, Application 2), two feature columns (Income, Assets), and one outcome column (Approved?)]
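The terms above can be sketched in plain Python. This is only an illustration: the slide's figure shows no actual values, so every number below is hypothetical, and the names `applications`, `feature_names`, and `outcome_name` are assumptions introduced here.

```python
# A tiny "dataframe" as a list of records (rows).
# All values are hypothetical; the slide's figure shows none.
applications = [
    {"income": 50_000, "assets": 120_000, "approved": True},   # Application 1
    {"income": 30_000, "assets": 10_000, "approved": False},   # Application 2
]

feature_names = ["income", "assets"]  # feature columns (inputs / predictors)
outcome_name = "approved"             # outcome column (target / response)

# Split each record into its features and its outcome.
features = [[row[name] for name in feature_names] for row in applications]
outcomes = [row[outcome_name] for row in applications]

print(features)
print(outcomes)
```

In practice a library such as pandas would usually hold this data, but the roles of rows, features, and outcome stay the same.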

Numerical Data Analysis
Numeric Data Analysis
• Analyze the following salary data:
  [30k, 35k, 22k, 70k, 50k, 55k, 45k, 40k, 25k, 42k, 60k, 65k]
• Sorting the data:
  [22k, 25k, 30k, 35k, 40k, 42k, 45k, 50k, 55k, 60k, 65k, 70k]
• Min: 22k, Max: 70k
• Average / Mean = total sum of all salaries / 12 (number of salaries) = 539k / 12 ≈ 44.9k
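The same analysis can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; salaries are stored in thousands, and the variable names are illustrative.

```python
from statistics import mean

# Salary data from the slide, in thousands (k).
salaries = [30, 35, 22, 70, 50, 55, 45, 40, 25, 42, 60, 65]

sorted_salaries = sorted(salaries)   # ascending order
lowest = min(salaries)
highest = max(salaries)
average = mean(salaries)             # sum of all salaries / number of salaries

print(sorted_salaries)
print(f"Min: {lowest}k  Max: {highest}k")
print(f"Mean: {average:.1f}k")
```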

Mean
• Mean: sum(values) / total number of samples
• Weighted Mean: sum(values * weights) / sum(weights)
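As a rough sketch, both definitions can be written as small Python functions. The weighted mean here divides by the sum of the weights, which is the standard definition; with all weights equal to 1 it reduces to the plain mean. The `scores` and `weights` values are hypothetical and chosen only for illustration.

```python
def mean(values):
    """Plain mean: sum of the values divided by the number of samples."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Weighted mean: sum of value * weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical example: three scores, the second counted twice as heavily.
scores = [80, 90, 70]
weights = [1, 2, 1]

plain = mean(scores)
weighted = weighted_mean(scores, weights)

print(plain)
print(weighted)
```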
