Chapter 2 – Organizing and Summarizing Data

Definition: When data are in their original form, as collected, they are called raw data.

We want to be able to visualize the characteristics of a data set, so we construct graphical representations of the data. To do so, we must first look at how often each data value occurs.

Definition: A categorical frequency distribution, used for categorical (qualitative) data, is a table listing the categories together with the frequency of occurrence of each category in the observed data.

Definition: The frequency for a category is the number of data values falling in that category. The relative frequency for a category is the fraction, proportion, or percentage of the data values that fall within that category; it equals the category's frequency divided by the total number of data values.

Example: The following table shows data on the class rank of students receiving financial aid at a small 4-year college.

College Class Rank   Frequency   Relative Frequency
Fr                   18          18/40 = 0.45 = 45%
So                   12          12/40 = 0.30 = 30%
Jr                    6           6/40 = 0.15 = 15%
Sr                    4           4/40 = 0.10 = 10%
Total                40          40/40 = 1.00 = 100%

Often, when the data are numeric, there are too many distinct data values for a listing of the raw data to reveal the characteristics of the data. It is common to divide the interval spanned by the data values into a relatively small number of subintervals, called classes, and to tabulate the data using frequencies. Each frequency is the number of data values that fall in one of the classes.

Definition: A grouped frequency distribution is an organization of raw data in table form, using classes and frequencies.

Definition: The largest data value that can be included in a class is the upper class limit for that class; the smallest data value that can be included is the lower class limit.

Definition: The class width is the difference between the upper class limit of one class and the upper class limit of the next-higher class. For example, for the classes 10-19 and 20-29, the class width is 29 - 19 = 10.
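The categorical frequency table in the class-rank example can be reproduced from raw data with a short sketch. Python is an assumption here (these notes contain no code), `categorical_frequency_distribution` is a hypothetical helper name, and the raw list is rebuilt from the table's counts rather than taken from an original data file.

```python
from collections import Counter


def categorical_frequency_distribution(raw_data, categories):
    """Return (category, frequency, relative frequency) rows.

    Categories appear in the order given; a category that never occurs
    still gets a row with frequency 0.
    """
    counts = Counter(raw_data)
    n = len(raw_data)
    # Relative frequency = frequency / total number of data values.
    return [(c, counts[c], counts[c] / n) for c in categories]


# Hypothetical raw data rebuilt from the table's counts (18, 12, 6, 4).
ranks = ["Fr"] * 18 + ["So"] * 12 + ["Jr"] * 6 + ["Sr"] * 4

table = categorical_frequency_distribution(ranks, ["Fr", "So", "Jr", "Sr"])
for rank, freq, rel in table:
    # Show the fraction, the proportion, and the percentage side by side.
    print(f"{rank}  {freq:2d}  {freq}/{len(ranks)} = {rel:.2f} = {rel:.0%}")
```

Passing the category list explicitly keeps the rows in class-rank order (Fr, So, Jr, Sr) instead of the order in which values happen to appear in the raw data.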
Definition: The cumulative frequency for a class is the count of all observed data values in that class or in lower classes.

Rules for constructing a frequency distribution:
1) The number of classes should be between 5 and 20: about 5 for small data sets and up to 20 for large data sets. “Small” means roughly 25 to 30 observations; “large” means around 1000 or more observations.
2) Each observed data value must belong to one, and only one, class. This means that the classes must be non-overlapping, or mutually exclusive.
3) The classes must be continuous, with no gaps between them; even if no observed data values fall in a given class, that class must be included, with a frequency of 0.
4) The classes must be exhaustive; i.e., together they must include all of the data.
5) The classes must be equal in width.
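A grouped frequency distribution with cumulative frequencies can be sketched as follows, checking rules 2 to 5 along the way. This is a minimal sketch that assumes whole-number data, so that consecutive classes such as 50-59 and 60-69 are contiguous when the next lower class limit is one unit above the previous upper class limit. The function name `grouped_frequency_distribution` and the scores are hypothetical, made up for illustration only.

```python
def grouped_frequency_distribution(data, classes):
    """Tabulate whole-number data into classes given as (lower, upper) limits.

    Returns rows of (lower limit, upper limit, frequency, cumulative frequency).
    Raises ValueError if the classes break rules 2 to 5.
    """
    for (_, prev_upper), (lower, _) in zip(classes, classes[1:]):
        # Whole-number data: the next class must start one unit above the
        # previous upper limit, else classes overlap (rule 2) or leave a gap (rule 3).
        if lower != prev_upper + 1:
            raise ValueError("classes overlap or leave a gap")
    # Equal spans plus no gaps means consecutive upper limits differ by one
    # common class width (rule 5).
    if len({upper - lower for lower, upper in classes}) != 1:
        raise ValueError("classes are not equal in width")
    if min(data) < classes[0][0] or max(data) > classes[-1][1]:
        raise ValueError("classes do not include all of the data")  # rule 4

    rows, cumulative = [], 0
    for lower, upper in classes:
        # A class with no observations is still listed, with frequency 0.
        frequency = sum(1 for x in data if lower <= x <= upper)
        cumulative += frequency  # this class plus all lower classes
        rows.append((lower, upper, frequency, cumulative))
    return rows


# Made-up test scores, for illustration only.
scores = [52, 67, 71, 74, 58, 88, 93, 79, 64, 85, 77, 69, 81, 90, 73]
classes = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]
width = classes[1][1] - classes[0][1]  # difference of consecutive upper limits: 10

rows = grouped_frequency_distribution(scores, classes)
for lower, upper, frequency, cumulative in rows:
    print(f"{lower}-{upper}  frequency {frequency}  cumulative {cumulative}")
```

The cumulative frequency of the last class always equals the total number of observations, which is a quick check that the classes are exhaustive.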

Procedure for constructing a grouped frequency distribution:
1) Find the range by subtracting the lowest value of the data from the highest.
2) Select the number of classes desired (between 5 and 20).
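The two procedure steps listed so far can be sketched in a few lines. This covers only steps 1 and 2; the helper names `data_range` and `check_number_of_classes` are hypothetical, and the data values are made up for illustration.

```python
def data_range(data):
    """Step 1: the range is the highest data value minus the lowest."""
    return max(data) - min(data)


def check_number_of_classes(k):
    """Step 2: the number of classes should be between 5 and 20 (rule 1)."""
    if not 5 <= k <= 20:
        raise ValueError(f"{k} classes is outside the recommended 5 to 20")
    return k


# Made-up data for illustration only (15 observations, a small data set).
data = [52, 67, 71, 74, 58, 88, 93, 79, 64, 85, 77, 69, 81, 90, 73]
print(data_range(data))  # 93 - 52 = 41
k = check_number_of_classes(5)  # small data set, so choose few classes
```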
