### Chapter 2 Notes

Course: ISDS 2000, Summer 2009
School: LSU
II. Chapter 2. Descriptive Statistics: Tabular and Graphical Presentations

Objectives:

1. Recognize which summaries are used for quantitative data and which for qualitative data.
2. Construct a frequency table, bar graph, and pie chart for qualitative data.
3. Convert raw data into a data array.
4. Construct a frequency table, relative and cumulative frequency tables, and a histogram for quantitative data.
5. Construct a stem-and-leaf display to represent quantitative data.

A. Introduction: Data are usually collected, entered, and saved into some form of database. In this form, trends and characteristics are not easily detectable, as there can sometimes be millions of pieces of data. We want to summarize (reduce) the data to a form that is more easily interpreted and that will aid in decision making. Many summaries are found in newspapers, magazines, the internet, annual reports, and research studies; therefore, it is important for you to understand how these summaries are constructed.

B. Summarizing Qualitative Data (Sect 2.1)

1. Frequency Distribution - a tabular summary of data showing the frequency of items in each of several distinct categories.

Example: I recorded the number of students in each of the following academic majors and wanted to summarize:

| MAJOR | FREQ | RELATIVE FREQ (% freq) |
|-------|------|------------------------|
| ISDS  | 24   | 0.253 (25.3)           |
| FIN   | 9    | 0.095 (9.5)            |
| MKT   | 15   | 0.158 (15.8)           |
| ACCT  | 7    | 0.074 (7.4)            |
| PBADM | 40   | 0.421 (42.1)           |
| TOTAL | 95   | 1.001* (100.1)         |

(*The total differs from 1 because each relative frequency is rounded.)

2. Bar Graph - a graphical representation of data where each category is depicted by a bar representing the frequency or proportion of observations falling into that category. (Note: bars do not touch.)

[Bar graph: ISDS 2000 - FALL 2001. Horizontal axis: ACADEMIC MAJORS (ISDS, FIN, MKT, ACCT, PBADM); vertical axis: frequency, 0 to 45; bar heights 24, 9, 15, 7, 40.]

3. Pie Chart - a graphical representation of data where slices of the pie, represented by degrees, are associated with the frequency or proportion of observations falling into a category.
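
As a rough illustration, the frequency and relative frequency columns of the academic majors table can be reproduced in a few lines of Python. This is only a sketch and not part of the course's Excel procedures: the list `majors` is a hypothetical raw data set built to match the counts in the table, and `frequency_table` is an illustrative helper name.

```python
from collections import Counter

# Hypothetical raw data: one major code per student, built to match
# the frequencies in the table (24 + 9 + 15 + 7 + 40 = 95 students).
majors = ["ISDS"] * 24 + ["FIN"] * 9 + ["MKT"] * 15 + ["ACCT"] * 7 + ["PBADM"] * 40


def frequency_table(values):
    """Return {category: (frequency, relative frequency)} in first-seen order."""
    counts = Counter(values)
    n = len(values)
    return {category: (freq, freq / n) for category, freq in counts.items()}


table = frequency_table(majors)
for major, (freq, rel) in table.items():
    # Relative frequency shown as a proportion and as a percentage.
    print(f"{major:<6} {freq:>3}  {rel:.3f} ({rel * 100:.1f}%)")
```

The unrounded relative frequencies sum to exactly 1; it is only after rounding each one to three decimals that the total reads 1.001, as in the table.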
[Pie chart: ISDS 2000 - FALL 2001, ACADEMIC MAJORS. Slices for ISDS, FIN, MKT, ACCT, and PBADM, each labeled with its percentage.]

For directions on how to create a Summary Table using Excel: On Bb, go to Course Docs, Excel Procedures. Click on Qualitative Variable - CREATING A FREQUENCY TABLE. Click on Save As to save to hard drive. Open that file and run the given steps.

For directions on how to create either a Bar Graph or Pie Chart using Excel: On Bb, go to Course Docs, Excel Procedures. Click on Qualitative Variable - CREATING A BAR GRAPH OR PIE CHART. Click on Save As to save to hard drive. Open that file and run the given steps.

C. Summarizing Quantitative Data (Sect 2.2)

1. Data Array (sometimes referred to as an ordered array) - the listing resulting from placing raw data in rank order from the smallest to the largest observation.

Example: Suppose you are provided with a data set containing the time in days required to complete year-end audits for a sample of 20 clients of a particular accounting firm:

Year-End Audit Time (days): 12 14 19 18 15 15 18 17 20 27 22 23 22 21 33 28 14 18 16 13

Data Array: 12 13 14 14 15 15 16 17 18 18 18 19 20 21 22 22 23 27 28 33

(Immediately you can see min = 12, max = 33, range = 21, and that 18 occurs most often.)

For directions on how to get a data array using Excel: On Bb, go to Course Docs, Excel Procedures. Click on Quantitative Data - CREATING DATA ARRAY. Click on Save As to save to hard drive. Open that file and run the given steps to get the data array.

2. Frequency Distribution - sometimes we may prefer to arrange data into categories or class groups so that interpretation is more manageable; however, the original observations are lost in the grouping process.

Definition: A Frequency Distribution is a tabular display of data showing the frequency of observations in each of the defined categories (or classes).

Creating a Frequency Distribution. Select:

a. Number of Classes: usually 5 to 15 classes.
(Larger data sets require more classes and smaller data sets require fewer classes. This is a very subjective decision; try to avoid the pancake (wide/flat) and skyscraper (tall/thin) effects.) (In this example, let's use 5 classes for summarizing.)

b. Width of Class (approx):

$$\text{Width} = \frac{\text{Range}}{\text{Number of Classes}} = \frac{33 - 12}{5} = 4.2$$

We will round up to 5, as that value is commonly used and is easily read. (Note: each category has the same width.)

c. Class Limits: the boundaries for each class. These are very subjective, but they must be defined so that all observations are included. (Note: we must include the smallest value; however, instead of using 12 to begin the class definitions, we begin with 10 to make interpretation easier.)

Frequency Distribution for Audit Time Data

| Audit Time (Days) | Frequency |
|-------------------|-----------|
| 10 - under 15     | 4         |
| 15 - under 20     | 8         |
| 20 - under 25     | 5         |
| 25 - under 30     | 2         |
| 30 - under 35     | 1         |
| Total             | 20        |

d. Class Midpoint: the halfway point between the class boundaries.

3. Relative Frequency Distribution - a tabular summary of a set of data showing the proportion of observations in each of the defined categories.

$$\text{Relative Frequency} = \frac{\text{Frequency}}{n}$$

Relative Frequency Distribution - Audit Time Data

| Audit Time (Days) | Relative Frequency (Proportion) |
|-------------------|---------------------------------|
| 10 - under 15     | 0.20                            |
| 15 - under 20     | 0.40                            |
| 20 - under 25     | 0.25                            |
| 25 - under 30     | 0.10                            |
| 30 - under 35     | 0.05                            |
| Total             | 1.00                            |

(Useful when comparing data sets of different sizes.)

4. Cumulative Distribution - a tabular summary of a set of data that accumulates information from class to class. This type of tabular summary can be constructed from frequency and relative frequency distributions.

Cumulative Distribution - Audit Time Data

| Audit Time (Days) | Cumulative Frequency | Cumulative Relative Frequency |
|-------------------|----------------------|-------------------------------|
| Under 15          | 4                    | 0.20                          |
| Under 20          | 12                   | 0.60                          |
| Under 25          | 17                   | 0.85                          |
| Under 30          | 19                   | 0.95                          |
| Under 35          | 20                   | 1.00                          |

5. Histogram - a vertical bar chart in which the rectangular bars are constructed at the boundaries of each class.

a.
Horizontal Axis: represents the values of the random variable (in this case, the time of audit in days).

b. Vertical Axis: represents frequencies or proportions; the height of each bar represents the frequency (or proportion) of observations in that particular class.

[Histogram: horizontal axis X = # of Audit Days, with class boundaries 10, 15, 20, 25, 30, 35; vertical axis Frequency, 0 to 10.]

6. Ogive - represents the cumulative frequency polygon.

[Histogram with ogive: horizontal axis # Audit Days (10, 15, 20, 25, 30, 35, More); left vertical axis frequency, 0 to 10; right vertical axis cumulative percentage, .00% to 120.00%.]

For directions on how to create a Frequency Distribution, Histogram, and/or Ogive using Excel: On Bb, go to Course Docs, Excel Procedures. Click on Quantitative Variable - CREATING A FREQ DISTRIBUTION AND HISTOGRAM. Click on Save As to save to hard drive. Open that file and run the steps provided.

D. Exploratory Data Analysis: Stem-and-Leaf Display - a quick approach to get an ordered array and the shape of the data distribution. (Not in EXCEL)

1. Stem-and-Leaf Display - separates data into stems (leading digits) and leaves (trailing digits).

2. Example:

| Student | Number of Credit Hours | Leaves |
|---------|------------------------|--------|
| John    | 30                     | 0      |
| Carol   | 43                     | 3      |
| Fred    | 66                     | 6      |
| Molly   | 31                     | 1      |
| Robert  | 78                     | 8      |
| Barbara | 44                     | 4      |
| Jill    | 38                     | 8      |

Note: Right-most digits are leaves; the remaining digits are stems.

Stem-and-Leaf Display:

    3|018
    4|34
    5|
    6|6
    7|8

3. Characteristics of Stem-and-Leaf

a. Most effective for relatively small data sets.

b. Can be used to determine the minimum, maximum, range, and mode (this information is usually lost when we use frequency tables).

c. Gives an idea of how the individual values are distributed across the range of the data.

d. Retains all data - each observation remains distinctly identifiable.

4. Other data values (ex: 1475)

a. Use the value as is: 147|5

b. Round to the nearest 10s (1480): 14|8

c. Round to the nearest 100s (1500): 1|5
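
Since the stem-and-leaf display is not available in Excel, the construction above can be sketched in Python. This is a minimal sketch assuming non-negative integer data; `stem_and_leaf` and its `leaf_unit` parameter are illustrative names that do not come from the course materials. `leaf_unit` is the place value of the leaf digit, so 10 and 100 correspond to rounding to the nearest 10s and 100s as in item 4.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=1):
    """Return a stem-and-leaf display as a list of 'stem|leaves' strings.

    Assumes non-negative integers. Each value is rounded (half up) to the
    nearest leaf_unit; the last remaining digit is the leaf and the digits
    before it form the stem.
    """
    leaves_by_stem = defaultdict(list)
    for value in sorted(values):
        scaled = (value + leaf_unit // 2) // leaf_unit  # integer round half up
        stem, leaf = divmod(scaled, 10)
        leaves_by_stem[stem].append(leaf)

    lowest, highest = min(leaves_by_stem), max(leaves_by_stem)
    rows = []
    for stem in range(lowest, highest + 1):
        # Stems with no observations are kept as empty rows (e.g. "5|").
        leaves = "".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        rows.append(f"{stem}|{leaves}")
    return rows


credit_hours = [30, 43, 66, 31, 78, 44, 38]
print("\n".join(stem_and_leaf(credit_hours)))
# 3|018
# 4|34
# 5|
# 6|6
# 7|8

print(stem_and_leaf([1475], leaf_unit=10))   # ['14|8']
print(stem_and_leaf([1475], leaf_unit=100))  # ['1|5']
```

Sorting the values first puts the leaves in each row in ascending order, so the display also serves as an ordered array.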
