Sept. 23, 2010 LEC #1 ECON 240A-1 L. Phillips

Exploratory Data Analysis I

I. Introduction

At the beginning of the course we will study three branches of statistics: (1) data analysis, (2) probability, and (3) statistical inference. Data analysis is the gathering, display, and summary of data. We will use visual devices and quantitative measures to accomplish these tasks. Probability has its origins in gambling and the laws of chance. This topic is interesting in its own right, but we will also use probability to better understand the binomial distribution, the central limit theorem, and the relationship between the binomial distribution and the normal distribution.

II. Data Description

One use of statistics is to describe data with summary measures. Two notions are central: central tendency and dispersion.

There are several measures of central tendency. An intuitive and relatively easy one to use is the mode, i.e., the data value that is observed most frequently. One issue, of course, is what to do when the data have two or three modes, that is, multiple peaks.

Another measure of central tendency is the median. Sort the data from the highest value to the lowest; the data point in the middle is the median, with one half of the data values above it and one half below. When there is an even number of observations there are two middle values, and the median is conventionally taken to be their average.

A measure of central tendency that requires some arithmetic is the sample mean: add up all the data values and divide by the number of observations (data points).

III. Exploratory Data Analysis
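As a bridge from Section II, here is a minimal Python sketch of the three measures of central tendency described there. The function names and the data set are hypothetical, chosen only for illustration. The data are deliberately bimodal so that the "multiple peaks" issue with the mode shows up. The sketch assumes a non-empty list of numbers.

```python
from collections import Counter


def modes(data):
    """Return every value observed most frequently (more than one if the data are multimodal)."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(value for value, n in counts.items() if n == top)


def median(data):
    """Sort the data and take the middle value; with an even count, average the two middle values."""
    ordered = sorted(data)  # ascending order; the median is the same in either direction
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def sample_mean(data):
    """Add up all the data values and divide by the number of observations."""
    return sum(data) / len(data)


# Hypothetical data, for illustration only.
scores = [3, 7, 7, 2, 9, 4, 4, 12]

print(modes(scores))        # [4, 7]  (two modes: 4 and 7 each occur twice)
print(median(scores))       # sorted: 2 3 4 4 7 7 9 12 -> (4 + 7) / 2 = 5.5
print(sample_mean(scores))  # 48 / 8 = 6.0
print(median([5, 1, 3]))    # odd count: the middle value, 3
```

In this example the single large value (12) pulls the mean (6.0) above the median (5.5). The median depends only on the middle of the sorted data.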

John Tukey developed exploratory data analysis to describe the characteristics of data visually. Two visual tools useful for this purpose are the stem-and-leaf diagram and the box-and-whiskers plot. An example of the methodology of the stem-and-leaf plot is its application to

