© Sven Thommesen 2011

Chapter 3: Descriptive statistics: numbers

[Edited 01/30/08]

This chapter discusses different ways of illustrating or representing a given data set by the use of various summary numbers that we compute. Why? Because if you have a large data set, just staring at the raw data may not tell you all that much! You can’t see the forest for the trees, so to speak.

The summary numbers we compute are themselves called “statistics”. So, to get ahead of ourselves, the average value for a variable in a set of data is one such statistic. By itself, this statistic tells us something about the data set. (That’s why this chapter is about “descriptive statistics.”)

Recall from our discussion in chapter 1 the distinction between a population and a sample. The terminology is: numbers that we calculate from a sample data set are called sample statistics. The corresponding numbers calculated from a population data set (a census) are referred to as population parameters. We use the sample statistics to “predict” what the corresponding population parameter is for the population from which the sample was selected. Thus we say: a sample statistic is a predictor for the corresponding population parameter. For example, a sample mean can be used as a predictor for the population mean.

NOTE: for this chapter, you need to be comfortable with the summation-sign notation. If you are not, review Appendix C in the textbook or my Appendix to these chapter notes.

These notes present the same material as the textbook chapter, but in a different order.

For purposes of discussing the concepts in this chapter, let us use the following little data set:

     i    X_i    Y_i
    --    ---    ---
     1      7     -2
     2      3     -3
     3      5      0
     4     10      2
     5      4      3

The number “i” is called the “index”. It is used to point to a specific row or observation in the data set, counting from the beginning.

A note about the ORDERING OF THE DATA: Some of the statistics we will look at in this chapter require that the data set be SORTED first, i.e. ordered from the smallest to the largest data value. For others we do not care.

If we sort the data set above by the X variable, we get:

     i    X_i    Y_i
    --    ---    ---
     1      3     -3
     2      4      3
     3      5      0
     4      7     -2
     5     10      2

If we sort it by the Y variable, we get:

     i    X_i    Y_i
    --    ---    ---
     1      3     -3
     2      7     -2
     3      5      0
     4     10      2
     5      4      3

A couple of points to note:

a) the X and Y values for a given observation stay together, even if we sort the data according to the values in one of them;

b) the INDEX we use to number the observations does not “follow the data” when we do a sort: i=1 refers to the first record, no matter how the data have been sorted. [Important for how we use summation signs.]

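
The two points about sorting can be checked with a short program. What follows is only a sketch in Python, assuming the data set is stored as a list of (X, Y) pairs; the helper names sort_by and with_index are made up for this illustration and are not part of the notes or the textbook.

```python
# The five (X, Y) observations from the little data set, in original order.
data = [(7, -2), (3, -3), (5, 0), (10, 2), (4, 3)]


def sort_by(rows, column):
    """Return a new list of rows ordered by one column (0 = X, 1 = Y).

    Each row is moved as a whole, so the X and Y values of an
    observation stay together.
    """
    return sorted(rows, key=lambda row: row[column])


def with_index(rows):
    """Attach the index i = 1, 2, 3, ... according to current position.

    The index is assigned after sorting, so it does not follow the data.
    """
    return [(i, x, y) for i, (x, y) in enumerate(rows, start=1)]


sorted_by_x = sort_by(data, 0)
sorted_by_y = sort_by(data, 1)

# Print the data set sorted by X, numbered from the first record.
for i, x, y in with_index(sorted_by_x):
    print(i, x, y)
```

After sorting by X, the record with i=1 holds X=3 and Y=-3, even though that observation was second in the original ordering.
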
MAXIMUM AND MINIMUM

To begin with, we may be interested in the smallest and the largest data values for a given variable in our data set: X_MIN is the smallest data value, and X_MAX is the largest data value.
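
For the little data set used in these notes, the minimum and maximum can be found as follows. This is only a sketch in Python using the built-in min and max functions; the variable names (x, y, x_min and so on) are chosen for this illustration.

```python
# X and Y values of the little data set, in their original order.
# Sorting is not required to find the minimum or the maximum.
x = [7, 3, 5, 10, 4]
y = [-2, -3, 0, 2, 3]

x_min, x_max = min(x), max(x)
y_min, y_max = min(y), max(y)

print("X_MIN =", x_min, " X_MAX =", x_max)
print("Y_MIN =", y_min, " Y_MAX =", y_max)
```

Here X_MIN = 3 and X_MAX = 10, while Y_MIN = -3 and Y_MAX = 3. If the data have already been sorted by the variable in question, the minimum is simply the first value and the maximum is the last.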

