BIT 2405 - Class 2 Notes

Objectives:
  I. Statistical Vocabulary
  II. Classification and Presentation of Data

I. Statistical Vocabulary

Definition of Statistics:

Key Point: We need to consider the variability of data when describing a data set.

Statistics finds extensive application in every business discipline. A few examples:
  Finance:
  Agricultural Economics:
  Management:
  Hospitality and Tourism Management:
  Accounting:
  International Affairs:
  Horticulture:
  Public Affairs:
  Information Technology:
  Sports:

Data: The facts and figures collected, analyzed, and summarized for presentation and interpretation.
Data Set: All the data collected in a particular study.
Element: The entities on which data are collected.
Experimental Study:
Observational Study:

Quality Control Example:
Suppose that the design specification for the length of a connecting pin used in a particular type of transmission is 50 ± 1 mm, with the target (optimal value) being 50 mm. 100 pins from the last production shift are randomly selected and measured. 10 of these measurements are shown below for illustrative purposes.

  X1 = 50.9847   X2 = 51.6542   X3 = 51.4015   X4 = 51.1069   X5  = 49.0515
  X6 = 50.2169   X7 = 51.6010   X8 = 51.6651   X9 = 50.5420   X10 = 50.8872

Question: How is the process performing?

We might begin by ________

Frequency distribution
Definition:

The following is a graphical presentation of a frequency distribution called a ________

[Figure: Frequency Distribution of Pin Lengths. Vertical axis: Frequency; horizontal axis: Pin Lengths, 47 to 54.]

Here we have summarized the data graphically. Often there is a need to summarize the data numerically:

  Variable   N     Mean     Median   StDev   Min      Max
  Length     100   50.041   50.026   1.15    47.536   53.188

The above summaries are examples of descriptive statistics. Based on this information, we would most likely conclude that the process is performing ..... how?

The Discipline of Statistics
  Statistics
    Descriptive Statistics
      Graphs
      Measures of Center, Spread, and Position
    Inferential Statistics
      Estimation
      Hypothesis Testing

Our study of statistics will be partitioned into two major categories:
  1. Descriptive Statistics – numerical and graphical procedures to examine a data set, to summarize the information contained in the data set, and to present the information in a meaningful manner.
  2. Inferential Statistics – use of sample data to make estimates, decisions, and other generalizations about the population from which the sample is taken.

Some Statistical Terms:
  1. Population:
  2. Variable
  3. Observation
  4. Sample
  5. Parameter
  6. Statistic

Graphical Representation:

II. Classification and Presentation of Data

  1. Classification of Data (Variables)
     a. Nominal/Ordinal/Interval/Ratio
     b. Qualitative/Quantitative
  2. Presentation of Data:
     a. Graphical Presentation of Quantitative Information
        i. Frequency Distribution Tables
        ii. Histograms
            1. Absolute Frequency Histogram
            2. Relative Frequency Histogram
            3. Cumulative Frequency Histogram
        iii. Stem-and-Leaf Diagrams
        iv. Crosstabulations
        v. Scatter Diagrams
     b. Graphical Presentation of Qualitative Information
        i. Frequency Distribution Tables
        ii. Bar Charts
        iii. Pie Charts

Classification of Data

What we can do with a data set (e.g., summarize, present, make inferences) depends on the type of variable we are working with. There are two major classification schemes:

  1. Nominal, Ordinal, Interval, Ratio Classification
     a. Nominal data
     b. Ordinal data
     c. Interval data
     d. Ratio data
  2. Qualitative, Quantitative Classification
     a. Qualitative data
     b. Quantitative data

Quantitative data can be further classified as continuous or discrete.
  Continuous data
  Discrete data

Summary Exercises:

  1. A supervisor must give a summary evaluation rating from among the following choices:
       1) Poor  2) Fair  3) Good  4) Very Good  5) Excellent
     Are these data qualitative or quantitative?   Qualitative   Quantitative
     Are these data discrete or continuous?   Discrete   Continuous   Neither
     What is the highest level of measurement the data possess?   Nominal   Ordinal   Interval   Ratio

  2. A company is evaluating customer satisfaction with one of its products. A survey of 400 persons is conducted. Each person is asked: “What is your level of satisfaction with the company’s products?”
       1) Poor  2) Average  3) Good  4) Excellent
     Are these data qualitative or quantitative?   Qualitative   Quantitative
     Are these data discrete or continuous?   Discrete   Continuous   Neither
     What is the highest level of measurement the data possess?   Nominal   Ordinal   Interval   Ratio

  3. The weight of 50 newborn babies at a local hospital.
     Are these data qualitative or quantitative?   Qualitative   Quantitative
     Are these data discrete or continuous?   Discrete   Continuous   Neither
     What is the highest level of measurement the data possess?   Nominal   Ordinal   Interval   Ratio

  4. You want to order a pizza. There are four kinds of pizza:
       1) Pepperoni  2) Mushroom  3) Black Olive  4) Sausage
     Are these data qualitative or quantitative?   Qualitative   Quantitative
     Are these data discrete or continuous?   Discrete   Continuous   Neither
     What is the highest level of measurement the data possess?   Nominal   Ordinal   Interval   Ratio

  5. You toss a coin and record “head” as 0 and “tail” as 1.
     Are these data qualitative or quantitative?   Qualitative   Quantitative
     Are these data discrete or continuous?   Discrete   Continuous   Neither
     What is the highest level of measurement the data possess?   Nominal   Ordinal   Interval   Ratio

Presentation of Data

The alternatives we have for presenting a given set of data are dictated by the data type. We will consider quantitative data presentation methods first.

Quantitative Data Presentation Methods:

Three basic tools:
  1. Frequency distribution tables
  2. Histograms
  3. Stem-and-leaf diagrams

Example data set: We will be using a subset of a study of out-of-state tuition charged by 60 Texas universities and colleges.
  College or University        Tuition (in $000)   Type      Setting
  Prairie View A&M U.          2.4                 Public    Rural
  U. of Houston                3.4                 Public    Urban
  U. of Texas at Arlington     3.5                 Public    Suburban
  U. of Texas, San Antonio     3.5                 Public    Urban
  Paul Quinn C.                3.6                 Private   Urban
  Texas C.                     3.6                 Private   Urban
  Wiley C.                     3.6                 Private   Urban
  Jarvis Christian C.          3.8                 Private   Rural
  Sul Ross State U.            3.9                 Public    Rural
  Texas Woman's U.             3.9                 Public    Urban
  U. of Houston-Downtown       3.9                 Public    Urban
  U. of North Texas            3.9                 Public    Urban
  U. of Texas-Pan American     3.9                 Public    Urban
  U. of Texas at El Paso       4.1                 Public    Urban
  Texas A & I U.               4.4                 Public    Rural
  Midwestern State U.          4.5                 Public    Urban
  East Texas State U.          4.6                 Private   Urban
  East Texas Baptist U.        4.7                 Private   Urban
  Huston-Tillotson C.          4.7                 Private   Urban

Frequency Distribution Tables

What we want to do is display the data by showing the frequency with which the observations making up the data set fall in a set of specified intervals.

Consider the out-of-state tuition data presented above. The data have been ordered from the smallest to the largest value.

  2.4, 3.4, 3.5, 3.5, 3.6, 3.6, 3.6, 3.8, 3.9, 3.9, 3.9, 3.9, 3.9, 4.1, 4.4, 4.5, 4.6, 4.7, 4.7, 4.8,
  4.8, 4.8, 4.8, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 5.0, 5.4, 5.8, 5.8, 5.9, 6.0, 6.4, 6.4,
  6.6, 7.0, 7.2, 7.4, 7.7, 7.9, 8.0, 8.0, 8.0, 8.3, 8.3, 8.5, 8.6, 8.8, 10.3, 10.4, 10.7, 11.0, 11.6, 12.0

In constructing a frequency distribution table (and later a histogram) there are three important considerations:
  1.
  2.
  3.

Suppose we try using 6 classes; then the width of each interval would be something like:

One form for our frequency table would then be:

  Tuition Rates (in $000)     Number of Schools
  2.0 but less than 4.0       13
  4.0 but less than 6.0       24
  6.0 but less than 8.0        9
  8.0 but less than 10.0       8
  10.0 but less than 12.0      5
  12.0 but less than 14.0      1

A variation of the above absolute frequency table is to display the relative frequency of observations that fall in the specified intervals rather than the absolute frequencies.
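As a rough check on the table above, the class counts and the matching relative frequencies can be computed directly from the ordered data. This is a minimal Python sketch under stated assumptions, not the procedure used in class: the starting point of 2.0 and the rule of rounding (max - min) / (number of classes) up to a whole number are assumptions chosen so that the classes line up with the table.

```python
import math

# Ordered out-of-state tuition values (in $000), copied from the notes.
tuition = [
    2.4, 3.4, 3.5, 3.5, 3.6, 3.6, 3.6, 3.8, 3.9, 3.9, 3.9, 3.9, 3.9, 4.1, 4.4,
    4.5, 4.6, 4.7, 4.7, 4.8, 4.8, 4.8, 4.8, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9,
    4.9, 4.9, 5.0, 5.4, 5.8, 5.8, 5.9, 6.0, 6.4, 6.4, 6.6, 7.0, 7.2, 7.4, 7.7,
    7.9, 8.0, 8.0, 8.0, 8.3, 8.3, 8.5, 8.6, 8.8, 10.3, 10.4, 10.7, 11.0, 11.6,
    12.0,
]

num_classes = 6

# Assumed rule of thumb: range / number of classes, rounded up to a whole number.
raw_width = (max(tuition) - min(tuition)) / num_classes  # about 1.6
width = math.ceil(raw_width)                             # 2

# Assumed starting point: a convenient value at or below the minimum.
start = 2.0

counts = []
for k in range(num_classes):
    lower = start + k * width
    upper = lower + width
    # Each class is "lower but less than upper", as in the table.
    counts.append(sum(1 for x in tuition if lower <= x < upper))

# Relative frequency = class count / total number of observations.
relative = [round(c / len(tuition), 3) for c in counts]

print(counts)    # [13, 24, 9, 8, 5, 1]
print(relative)  # [0.217, 0.4, 0.15, 0.133, 0.083, 0.017]
```

The counts sum to 60, so every observation falls in exactly one class.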
A relative frequency table has 3 or 4 columns. Its components are described below.
  Category
  Frequency
  Relative Frequency
  Percent (optional)
  Total

For our data set one possible form for a relative frequency table is as follows:

  Tuition Rates (in $000)     Proportion of Schools
  2.0 but less than 4.0       0.217
  4.0 but less than 6.0       0.400
  6.0 but less than 8.0       0.150
  8.0 but less than 10.0      0.133
  10.0 but less than 12.0     0.083
  12.0 but less than 14.0     0.017

Yet another variation is to display the cumulative frequency distribution; i.e., display the number of observations that are less than the upper boundary of each class interval.

For example, the Data Analysis routine in Excel provides us with the following output:

  Upper Limit   Frequency   Cumulative %
  2             13           21.67%
  4             24           61.67%
  6              9           76.67%
  8              8           90.00%
  10             5           98.33%
  12             1          100.00%

Histograms

A histogram is simply a graphical display of a frequency distribution (table). There are a number of different forms of histograms. We will consider three types of histograms:
  1. Absolute frequency histograms
  2. Relative frequency histograms
  3. Cumulative frequency histograms

Similar to constructing a frequency table, we have three major considerations:
  1.
  2.
  3.

Absolute frequency histogram: A graphical display of the information found in an absolute frequency table.

Figure: Distribution of Out-of-State Tuition Rates
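The bars of the absolute frequency histogram for the tuition data can be sketched in plain text from the class counts in the frequency table. The helper name `text_histogram` and the choice of one asterisk per school are illustrative assumptions, not part of the notes.

```python
# Class labels and counts taken from the absolute frequency table in the notes.
classes = ["2 to < 4", "4 to < 6", "6 to < 8", "8 to < 10", "10 to < 12", "12 to < 14"]
counts = [13, 24, 9, 8, 5, 1]


def text_histogram(labels, freqs, symbol="*"):
    """Return one text line per class: label, a bar of symbols, and the count."""
    label_width = max(len(label) for label in labels)
    return [
        f"{label.ljust(label_width)} | {symbol * freq} {freq}"
        for label, freq in zip(labels, freqs)
    ]


lines = text_histogram(classes, counts)
print("\n".join(lines))
```

Because every class has the same width, bar length alone carries the frequency, which is the property a histogram relies on.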
[Histogram: vertical axis Frequency; horizontal axis Tuition Dollars (in $1,000). Bar heights by class: 0 to < 2: 0; 2 to < 4: 13; 4 to < 6: 24; 6 to < 8: 9; 8 to < 10: 8; 10 to < 12: 5; 12 to < 14: 1.]

Note: When we examine a frequency distribution (either in tabular or graphical form) we are very much interested in two things:
  1.
  2.

Relative frequency histogram: A graphical display of the information found in a relative frequency table.

Figure: Distribution of Out-of-State Tuition Rates
[Histogram: vertical axis labeled Percent (values shown as proportions); horizontal axis Tuition (in $1,000). Bar heights by class: 0 to < 2: 0.00; 2 to < 4: 0.22; 4 to < 6: 0.40; 6 to < 8: 0.15; 8 to < 10: 0.13; 10 to < 12: 0.08; 12 to < 14: 0.02.]

Cumulative frequency histogram: A graphical display of the cumulative frequency distribution.

Figure: Distribution of Out-of-State Tuition Rates
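The heights plotted in a cumulative frequency histogram are running totals of the class counts, and dividing by the total gives the cumulative percentages reported in the Excel output earlier in the notes. A minimal Python sketch, assuming the six class counts from the absolute frequency table:

```python
from itertools import accumulate

# Class counts for 2 to < 4, 4 to < 6, ..., 12 to < 14 (from the frequency table).
counts = [13, 24, 9, 8, 5, 1]

# Running totals: observations below the upper boundary of each class.
cumulative = list(accumulate(counts))

# Cumulative percentage of all observations, rounded to two decimals.
total = cumulative[-1]
cumulative_pct = [round(100 * c / total, 2) for c in cumulative]

print(cumulative)      # [13, 37, 46, 54, 59, 60]
print(cumulative_pct)  # [21.67, 61.67, 76.67, 90.0, 98.33, 100.0]
```

The last running total always equals the number of observations, so the final cumulative percentage is 100.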
[Histogram: vertical axis Cumulative Frequency; horizontal axis Tuition (in $1,000). Bar heights by class: 0 to < 2: 0; 2 to < 4: 13; 4 to < 6: 37; 6 to < 8: 46; 8 to < 10: 54; 10 to < 12: 59; 12 to < 14: 60.]

Common errors in constructing histograms:

Examples:

Figure: Frequency Distribution of Out-of-State Tuition Rates
[Figures: three example histograms of the tuition data, each titled "Frequency Distribution of Out-of-State Tuition Rates" with the horizontal axis labeled "Tuition in thousands of dollars".]
Stem-and-Leaf Diagrams

A stem-and-leaf diagram is a tool similar to a histogram. Like a histogram, it reflects frequencies, concentrations of data, and shapes.

Advantages:

Given a set of numbers, generally the first digit or two will be the 'stem'; the rest will be the 'leaf.'

Example: Consider the tuition data for the 60 Texas schools presented in the ordered array.

  2.4, 3.4, 3.5, 3.5, 3.6, 3.6, 3.6, 3.8, 3.9, 3.9, 3.9, 3.9, 3.9, 4.1, 4.4, 4.5, 4.6, 4.7, 4.7, 4.8,
  4.8, 4.8, 4.8, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 5.0, 5.4, 5.8, 5.8, 5.9, 6.0, 6.4, 6.4,
  6.6, 7.0, 7.2, 7.4, 7.7, 7.9, 8.0, 8.0, 8.0, 8.3, 8.3, 8.5, 8.6, 8.8, 10.3, 10.4, 10.7, 11.0, 11.6, 12.0

The resulting stem-and-leaf diagram is presented below:

  Stem-and-leaf of Tuition   N = 60; Leaf Unit = 0.10

  Count   Stem   Leaves
      1      2   4
     12      3   455666899999
     19      4   1456778888999999999
      5      5   04889
      4      6   0446
      5      7   02479
      8      8   00033568
      0      9
      3     10   347
      2     11   06
      1     12   0

BE CAREFUL! Too few or too many stems give little descriptive information about the distribution of numbers.

Crosstabulations

  Restaurant   Quality      Price
  1            Good         18
  2            Very Good    22
  3            Good         28
  4            Excellent    38
  5            Very Good    33
  6            Good         28
  7            Very Good    19
  8            Very Good    11
  9            Very Good    23
  10           Good         13
  .            .            .
  .            .            .
  .            .            .

                             Meal Price
  Quality      $10-19   $20-29   $30-39   $40-49   Total
  Good             42       40        2        0      84
  Very Good        34       64       46        6     150
  Excellent         2       14       28       22      66
  Total            78      118       76       28     300

Scatter Diagram

Qualitative Data Presentation Methods:

Three basic tools:
  1. Frequency distribution tables
  2. Bar Graphs
  3. Pie Charts

1. Frequency Distribution Tables

Same rules as Frequency Distribution Tables for quantitative data.

2. Bar Graphs

Bar graphs are almost identical to histograms. However, there is a (vague) distinction:

                          Histograms     Bar Graphs
  o Bars
  o Height of the bar

3. Pie Charts

Pie charts are especially useful for unordered nominal data.
  o Pie slices
  o Size of the slice

Figure: Job Description of Graduates
[Pie chart slices: Accounting, 30%; Marketing, 25%; Finance, 20%; Gen Mgmt, 15%; Other, 10%.]
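In a pie chart the size of each slice is proportional to the category's share of the whole, so each percentage maps to a fraction of the full 360 degree circle. A minimal Python sketch, assuming the five percentages shown for the "Job Description of Graduates" chart; the variable names are illustrative only.

```python
# Percent of graduates in each job category, as shown on the pie chart.
shares = {
    "Accounting": 30,
    "Marketing": 25,
    "Finance": 20,
    "Gen Mgmt": 15,
    "Other": 10,
}

# The categories must account for the whole (100%) for a pie chart to make sense.
total_percent = sum(shares.values())

# Central angle of each slice in degrees: percent of the whole times 360.
angles = {job: 360 * pct / 100 for job, pct in shares.items()}

for job, angle in angles.items():
    print(f"{job}: {shares[job]}% -> {angle} degrees")
```

The angles add up to 360 only because the percentages add up to 100, which is one reason pie charts suit categories that partition the whole data set.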
 Summer '08
 PLKitchin
