ECON STATISTICS Class 1: EXPLORATORY DATA ANALYSIS (EDA) ----- John Tukey

The median of a data set is found by putting the numbers in numerical order and finding:
-the middle number if n is odd.
-the average of the two middle numbers if n is even.

The median is called the 50th percentile.

Tukey states that the median divides the data set into a lower half and an upper half. If n is odd, Tukey considers the median to belong to both the lower half and the upper half.

Tukey’s lower hinge is the median of the lower half and the upper hinge is the median of the upper half.
(The lower hinge is approximately equal to the 25th percentile, or Q1, which is the first quartile.)
(The upper hinge is approximately equal to the 75th percentile, or Q3, which is the third quartile.)

Tukey’s h-spread is equal to the upper hinge minus the lower hinge, which is approximately equal to the inter-quartile range, Q3-Q1.

A number in the data set is called a Tukey outlier if it is either greater than the upper inner fence or less than the lower inner fence.

Why is it important for statisticians to find the outliers? Because they show the extremes of a data set and, when verified, you sometimes realize they were
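
The definitions above (median, hinges, h-spread, Tukey outliers) can be sketched in a few lines of Python. This is only an illustration, not part of the class notes: the function name tukey_summary and the sample numbers are made up for the example. The notes do not say where the inner fences lie, so the sketch assumes the common convention of placing them 1.5 h-spreads below the lower hinge and above the upper hinge (the parameter k).

```python
from statistics import median


def tukey_summary(values, k=1.5):
    """Return the median, hinges, h-spread, inner fences and Tukey outliers.

    Hypothetical helper for illustration. The multiplier k = 1.5 is an
    assumed convention for the inner fences; the notes do not state it.
    """
    data = sorted(values)  # put the numbers in numerical order
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points")

    half = n // 2
    # If n is odd, the median belongs to BOTH halves (n % 2 == 1 adds it
    # to the lower half; the upper half starts at the median itself).
    lower_half = data[:half + n % 2]
    upper_half = data[half:]

    lower_hinge = median(lower_half)
    upper_hinge = median(upper_half)
    h_spread = upper_hinge - lower_hinge

    # Assumed inner fences: k h-spreads beyond each hinge.
    lower_fence = lower_hinge - k * h_spread
    upper_fence = upper_hinge + k * h_spread

    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "median": median(data),
        "lower_hinge": lower_hinge,
        "upper_hinge": upper_hinge,
        "h_spread": h_spread,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Made-up sample with n = 9 (odd), so the median 8 sits in both halves.
sample = [2, 4, 5, 7, 8, 9, 11, 13, 40]
summary = tukey_summary(sample)
print(summary["lower_hinge"], summary["upper_hinge"])  # 5 11
print(summary["outliers"])  # [40]
```

For this sample the h-spread is 11 - 5 = 6, so under the assumed 1.5 rule the inner fences are -4 and 20, and 40 is the only value flagged as a Tukey outlier.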

