Chapter 1 Data and Statistics
• Applications in Business and Economics
• Data
• Data Sources
• Descriptive Statistics
• Statistical Inference
• Statistical Analysis Using Microsoft Excel

Applications in Business and Economics
• Accounting
Public accounting firms use statistical sampling procedures when conducting audits for their clients.
• Economics
Economists use statistical information in making forecasts about the future of the economy or some aspect of it.
• Marketing
Electronic point-of-sale scanners at retail checkout counters are used to collect data for a variety of marketing research applications.
• Production
A variety of statistical quality control charts are used to monitor the output of a production process.
• Finance
Financial advisors use price-earnings ratios and dividend yields to guide their investment recommendations.

Data and Data Sets
• Data are the facts and figures collected, summarized, analyzed, and interpreted.
• The data collected in a particular study are referred to as the data set.

Elements, Variables, and Observations
• The elements are the entities on which data are collected.
• A variable is a characteristic of interest for the elements.
• The set of measurements collected for a particular element is called an observation.
• The total number of data values in a data set is the number of elements multiplied by the number of variables.

Data, Data Sets, Elements, Variables, and Observations

Company        Stock Exchange   Annual Sales ($M)   Earn/Share ($)
Dataram        NQ                     73.10              0.86
EnergySouth    N                      74.00              1.67
Keystone       N                     365.70              0.86
LandCare       NQ                    111.40              0.33
Psychemedics   N                      17.60              0.13

Element names: the companies. Variables: stock exchange, annual sales, and earnings per share. Each row is an observation, and the table as a whole is the data set.
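The counting rule for data values can be checked against the five-company table with a short Python sketch. The dictionary keys (exchange, sales_m, eps) are illustrative names chosen here and do not come from the source.

```python
# Data set from the table above: one dict per element (company).
# Every key except "company" is a variable (key names are illustrative).
data_set = [
    {"company": "Dataram", "exchange": "NQ", "sales_m": 73.10, "eps": 0.86},
    {"company": "EnergySouth", "exchange": "N", "sales_m": 74.00, "eps": 1.67},
    {"company": "Keystone", "exchange": "N", "sales_m": 365.70, "eps": 0.86},
    {"company": "LandCare", "exchange": "NQ", "sales_m": 111.40, "eps": 0.33},
    {"company": "Psychemedics", "exchange": "N", "sales_m": 17.60, "eps": 0.13},
]

num_elements = len(data_set)
# The element name identifies the observation; it is not counted as a variable.
variables = [key for key in data_set[0] if key != "company"]
num_variables = len(variables)

# Total data values = number of elements x number of variables.
total_data_values = num_elements * num_variables

print(num_elements, num_variables, total_data_values)  # 5 3 15
```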
Scales of Measurement
Scales of measurement include:
• Nominal
• Ordinal
• Interval
• Ratio
The scale determines the amount of information contained in the data. The scale indicates the data summarization and statistical analyses that are most appropriate.
• Nominal
Data are labels or names used to identify an attribute of the element. A nonnumeric label or numeric code may be used.
• Nominal Example: Students of a university are classified by the school in which they are enrolled using a nonnumeric label such as Business, Humanities, Education, and so on. Alternatively, a numeric code could be used for the school variable (e.g., 1 denotes Business, 2 denotes Humanities, 3 denotes Education, and so on).
• Ordinal
The data have the properties of nominal data, and the order or rank of the data is meaningful. A nonnumeric label or numeric code may be used.
• Ordinal Example: Students of a university are classified by their class standing using a nonnumeric label such as Freshman, Sophomore, Junior, or Senior. Alternatively, a numeric code could be used for the class standing variable (e.g., 1 denotes Freshman, 2 denotes Sophomore, and so on).
• Interval
The data have the properties of ordinal data, and the interval between observations is expressed in terms of a fixed unit of measure. Interval data are always numeric.
• Interval Example: Melissa has an SAT score of 1205, while Kevin has an SAT score of 1090. Melissa scored 115 points more than Kevin.
• Ratio
The data have all the properties of interval data, and the ratio of two values is meaningful. Variables such as distance, height, weight, and time use the ratio scale. This scale must contain a zero value that indicates that nothing exists for the variable at the zero point.
• Ratio Example: Melissa’s college record shows 36 credit hours earned, while Kevin’s record shows 72 credit hours earned. Kevin has twice as many credit hours earned as Melissa.

Qualitative and Quantitative Data
Data can be further classified as being qualitative or quantitative. The statistical analysis that is appropriate depends on whether the data for the variable are qualitative or quantitative. In general, there are more alternatives for statistical analysis when the data are quantitative.

Qualitative Data
• Labels or names used to identify an attribute of each element
• Often referred to as categorical data
• Use either the nominal or ordinal scale of measurement
• Can be either numeric or nonnumeric
• Appropriate statistical analyses are rather limited

Quantitative Data
• Quantitative data indicate how many or how much:
  – discrete, if measuring how many
  – continuous, if measuring how much
• Quantitative data are always numeric.
• Ordinary arithmetic operations are meaningful for quantitative data.

Scales of Measurement
Data
• Qualitative
  – Numerical: Nominal, Ordinal
  – Nonnumerical: Nominal, Ordinal
• Quantitative
  – Numerical: Interval, Ratio

Descriptive Statistics
• Descriptive statistics are the tabular, graphical, and numerical methods used to summarize data.

Statistical Inference
• Population: the set of all elements of interest in a particular study
• Sample: a subset of the population
• Statistical inference: the process of using data obtained from a sample to make estimates and test hypotheses about the characteristics of a population
• Census: collecting data for a population
• Sample survey: collecting data for a sample

Process of Statistical Inference
1. Population consists of all tune-ups. Average cost of parts is unknown.
2. A sample of 50 engine tune-ups is examined.
3. The sample data provide a sample average parts cost of $79 per tune-up.
4. The sample average is used to estimate the population average.

Chapter 2 Descriptive Statistics: Tabular and Graphical Presentations
Part A
• Summarizing Qualitative Data
• Summarizing Quantitative Data

Summarizing Qualitative Data
• Frequency Distribution
• Relative Frequency Distribution
• Percent Frequency Distribution
• Bar Graph
• Pie Chart

Frequency Distribution
A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several nonoverlapping classes. The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.

Example: Marada Inn
Guests staying at Marada Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average, or poor. The ratings provided by a sample of 20 guests are:

Below Average Average Above Average Above Average Above Average Average Above Average Average Above Average Above Average Below Average Poor Excellent Above Average Average Above Average Below Average Poor Above Average Average Average

Frequency Distribution

Rating           Frequency
Poor                 2
Below Average        3
Average              5
Above Average        9
Excellent            1
Total               20

Relative Frequency Distribution
The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class. A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.

Percent Frequency Distribution
The percent frequency of a class is the relative frequency multiplied by 100. A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.

Relative Frequency and Percent Frequency Distributions

Rating           Relative Frequency   Percent Frequency
Poor                   .10                   10
Below Average          .15                   15
Average                .25                   25
Above Average          .45                   45
Excellent              .05                    5
Total                 1.00                  100

Example calculations: Excellent, 1/20 = .05; Poor, .10(100) = 10.

Bar Graph
A bar graph is a graphical device for depicting qualitative data. On one axis (usually the horizontal axis), we specify the labels that are used for each of the classes. A frequency, relative frequency, or percent frequency scale can be used for the other axis (usually the vertical axis). Using a bar of fixed width drawn above each class label, we extend the height of the bar to the frequency, relative frequency, or percent frequency of that class. The bars are separated to emphasize the fact that each class is a separate category.
Figure: Bar Graph, Marada Inn Quality Ratings (horizontal axis: Rating, with bars for Poor, Below Average, Average, Above Average, Excellent; vertical axis: Frequency)

Pie Chart
The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data.
• First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class.
• Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.
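The relative frequency, percent frequency, and pie-chart sector arithmetic for the Marada Inn ratings can be sketched in Python as follows. It starts from the class frequencies in the frequency distribution rather than from the raw ratings. The variable names are illustrative.

```python
# Frequencies from the Marada Inn frequency distribution (sample of 20 guests).
frequencies = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

n = sum(frequencies.values())  # total number of data items (20)

summary = {}
for rating, freq in frequencies.items():
    relative = freq / n          # relative frequency, e.g. Excellent: 1/20 = .05
    percent = relative * 100     # percent frequency, e.g. Poor: .10(100) = 10
    degrees = relative * 360     # pie-chart sector, e.g. Average: .25(360) = 90
    summary[rating] = (relative, percent, degrees)

for rating, (relative, percent, degrees) in summary.items():
    print(f"{rating:<14} {relative:.2f} {percent:5.1f}% {degrees:6.1f} degrees")
```

The relative frequencies sum to 1 and the sectors sum to 360 degrees, so the sectors exactly fill the circle.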
Figure: Pie Chart, Marada Inn Quality Ratings (Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%)

Example: Marada Inn
Insights Gained from the Preceding Pie Chart
• One-half of the customers surveyed gave Marada a quality rating of “above average” or “excellent” (looking at the left side of the pie). This might please the manager.
• For each customer who gave an “excellent” rating, there were two customers who gave a “poor” rating (looking at the top of the pie). This should displease the manager.

Summarizing Quantitative Data
• Frequency Distribution
• Relative Frequency and Percent Frequency Distributions
• Histogram
• Cumulative Distributions
• Ogive

Example: Hudson Auto Repair
The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed below.

Sample of Parts Cost ($) for 50 Tune-ups
 91   78   93   57   75   52   99   80   97   62
 71   69   72   89   66   75   79   75   72   76
104   74   62   68   97  105   77   65   80  109
 85   97   88   68   83   68   71   69   67   74
 62   82   98  101   79  105   79   69   62   73
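The Python sketch below averages the 50 parts costs listed above, which connects this sample to the Chapter 1 inference example. The costs total $3,949, so the sample average is 3949/50 = $78.98, which rounds to the $79 figure quoted there. The variable names are illustrative.

```python
# Parts cost ($) for the 50 Hudson Auto Repair tune-ups, row by row as listed above.
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

n = len(parts_cost)
# The sample average is used as the estimate of the population average cost.
sample_mean = sum(parts_cost) / n

print(n, sum(parts_cost), round(sample_mean, 2))  # 50 3949 78.98
```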
Guidelines for Selecting Number of Classes
• Use between 5 and 20 classes.
• Data sets with a larger number of elements usually require a larger number of classes.
• Smaller data sets usually require fewer classes.

Guidelines for Selecting Width of Classes
• Use classes of equal width.
• Approximate Class Width = (Largest Data Value − Smallest Data Value) / Number of Classes
For Hudson Auto Repair, if we choose six classes:
Approximate Class Width = (109 − 52)/6 = 9.5 ≅ 10

Parts Cost ($)   Frequency
50-59                2
60-69               13
70-79               16
80-89                7
90-99                7
100-109              5
Total               50
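The class-width guideline and the resulting frequency distribution can be reproduced with a minimal Python sketch. It assumes the approximate width is rounded up to the next whole dollar and that the first class starts at $50, as in the table. The source does not prescribe either rule beyond this example.

```python
import math

# Hudson Auto Repair parts costs ($), the same 50 values as the sample table.
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

num_classes = 6
approx_width = (max(parts_cost) - min(parts_cost)) / num_classes  # (109 - 52)/6 = 9.5
class_width = math.ceil(approx_width)  # rounded up to 10 (assumed rounding rule)
first_lower_limit = 50                 # lower limit of the first class, as in the table

frequency = {}
for k in range(num_classes):
    lower = first_lower_limit + k * class_width
    upper = lower + class_width - 1    # whole-dollar data: 50-59, 60-69, ...
    # Classes do not overlap, so each cost is counted in exactly one class.
    frequency[f"{lower}-{upper}"] = sum(1 for x in parts_cost if lower <= x <= upper)

for label, count in frequency.items():
    print(f"{label:>8}  {count:>2}")
print(f"{'Total':>8}  {sum(frequency.values()):>2}")
```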
Relative Frequency and Percent Frequency Distributions

Parts Cost ($)   Relative Frequency   Percent Frequency
50-59                  .04                   4
60-69                  .26                  26
70-79                  .32                  32
80-89                  .14                  14
90-99                  .14                  14
100-109                .10                  10
Total                 1.00                 100

Example calculations for the 50-59 class: 2/50 = .04 and .04(100) = 4.
Insights Gained from the Percent Frequency Distribution
• Only 4% of the parts costs are in the $50-59 class.
• 30% of the parts costs are under $70.
• The greatest percentage (32%, or almost one-third) of the parts costs are in the $70-79 class.
• 10% of the parts costs are $100 or more.

Histogram
Another common graphical presentation of quantitative data is a histogram. The variable of interest is placed on the horizontal axis. A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency, relative frequency, or percent frequency. Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.
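The percent frequencies, two of the insights above, and a rough text version of the histogram can all be computed from the class frequencies. This Python sketch is only an illustration. It prints the histogram rotated 90 degrees, with classes listed down the side and bars extending to the right.

```python
# Class frequencies from the Hudson Auto Repair frequency distribution.
classes = ["50-59", "60-69", "70-79", "80-89", "90-99", "100-109"]
frequencies = [2, 13, 16, 7, 7, 5]
n = sum(frequencies)  # 50

# Percent frequency = (frequency / n) * 100 for each class.
percent = [100 * f / n for f in frequencies]

# Two of the insights above, recomputed from the table.
pct_under_70 = percent[0] + percent[1]   # 50-59 and 60-69 classes
pct_100_or_more = percent[-1]            # 100-109 class

# Rotated text histogram: one '#' per tune-up, so bar length equals frequency.
for label, f in zip(classes, frequencies):
    print(f"{label:>7} | {'#' * f}")
```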
Figure: Histogram, Tune-up Parts Cost (horizontal axis: Parts Cost ($), one bar per class from 50-59 to 100-109; vertical axis: Frequency)

Histogram
• Symmetric
  – Left tail is the mirror image of the right tail
  – Examples: heights and weights of people
• Moderately Skewed Left
  – A longer tail to the left
  – Example: exam scores
• Moderately Skewed Right
  – A longer tail to the right
  – Example: housing values
• Highly Skewed Right
  – A very long tail to the right
  – Example: executive salaries
Figure (one per shape): relative frequency histogram; vertical axis: Relative Frequency, 0 to .35.

Cumulative Distributions
• Cumulative frequency distribution: shows the number of items with values less than or equal to the upper limit of each class.
• Cumulative relative frequency distribution: shows the proportion of items with values less than or equal to the upper limit of each class.
• Cumulative percent frequency distribution: shows the percentage of items with values less than or equal to the upper limit of each class.
Hudson Auto Repair

Cost ($)   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
≤ 59               2                       .04                              4
≤ 69              15                       .30                             30
≤ 79              31                       .62                             62
≤ 89              38                       .76                             76
≤ 99              45                       .90                             90
≤ 109             50                      1.00                            100

Example calculations for the ≤ 69 row: 2 + 13 = 15; 15/50 = .30; .30(100) = 30.

Ogive
• An ogive is a graph of a cumulative distribution.
• The data values are shown on the horizontal axis.
• Shown on the vertical axis are the:
  – cumulative frequencies, or
  – cumulative relative frequencies, or
  – cumulative percent frequencies
• The frequency (one of the above) of each class is plotted as a point.
• The plotted points are connected by straight lines.
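The cumulative distributions in the Hudson Auto Repair table, and the points an ogive plots, follow from running totals of the class frequencies. In this Python sketch the plotting positions use the halfway-between-class-limits convention (59.5, 69.5, ...) that the Hudson Auto Repair ogive discussion applies. Drawing the graph itself is left out.

```python
from itertools import accumulate

# Hudson Auto Repair classes: upper class limits and class frequencies.
upper_limits = [59, 69, 79, 89, 99, 109]
frequencies = [2, 13, 16, 7, 7, 5]
n = sum(frequencies)

# Running totals give the number of items <= each upper class limit.
cum_freq = list(accumulate(frequencies))
cum_rel = [c / n for c in cum_freq]
cum_pct = [100 * c / n for c in cum_freq]

# Ogive points: x is halfway between adjacent class limits (59.5, 69.5, ...),
# y is the cumulative percent frequency; the points are joined by straight lines.
ogive_points = [(u + 0.5, p) for u, p in zip(upper_limits, cum_pct)]

for u, c, r, p in zip(upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"<= {u:>3}: {c:>2}  {r:.2f}  {p:5.1f}")
print(ogive_points)
```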
Hudson Auto Repair
• Because the class limits for the parts-cost data are 50-59, 60-69, and so on, there appear to be one-unit gaps from 59 to 60, 69 to 70, and so on.
• These gaps are eliminated by plotting points halfway between the class limits.
• Thus, 59.5 is used for the 50-59 class, 69.5 is used for the 60-69 class, and so on.

Ogive with Cumulative Percent Frequencies
Figure: Ogive, Tune-up Parts Cost (horizontal axis: Parts Cost ($), 50 to 110; vertical axis: Cumulative Percent Frequency, 0 to 100; the point (89.5, 76) is labeled)

Chapter 2 Descriptive Statistics: Tabular and Graphical Presentations
Part B
• Exploratory Data Analysis
• Crosstabulations and Scatter Diagrams

Exploratory Data Analysis
The techniques of exploratory data analysis consist of simple arithmetic and easy-to-draw pictures that can be used to summarize data quickly. One such technique is the stem-and-leaf display.

Stem-and-Leaf Display
A stem-and-leaf display shows both the rank order and shape of the distribution of the data. It is similar to a histogram on its side, but it has the advantage of showing the actual data values.
• The first digits of each data item are arranged to the left of a vertical line.
• To the right of the vertical line we record the last digit for each item in rank order.
• Each line in the display is referred to as a stem.
• Each digit on a stem is a leaf.

Example: Hudson Auto Repair
Stem-and-Leaf Display
(The parts-cost data are the same 50 values listed in Part A.)
 5 | 2 7
 6 | 2 2 2 2 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3 5 8 9
 9 | 1 3 7 7 7 8 9
10 | 1 4 5 5 9
(In the first line, 5 is a stem and 2 is a leaf.)

Stretched Stem-and-Leaf Display
If we believe the original stem-and-leaf display has condensed the data too much, we can stretch the display by using two stems for each leading digit(s). Whenever a stem value is stated twice, the first value corresponds to leaf values of 0-4, and the second value corresponds to leaf values of 5-9.
 5 | 2
 5 | 7
 6 | 2 2 2 2
 6 | 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4
 7 | 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3
 8 | 5 8 9
 9 | 1 3
 9 | 7 7 7 8 9
10 | 1 4
10 | 5 5 9
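The construction of both displays above can be sketched as a small Python function. The function name stem_and_leaf and its leaf_unit and stretched parameters are hypothetical. The sketch assumes non-negative data. Like the Leaf Unit = 10 example that follows, it drops any digits beyond the leaf instead of rounding to the nearest leaf.

```python
from collections import defaultdict
from decimal import Decimal


def stem_and_leaf(values, leaf_unit=1, stretched=False):
    """Return the lines of a stem-and-leaf display as 'stem | leaves' strings.

    Sketch for non-negative data: each value is divided by leaf_unit, digits
    beyond the leaf are dropped, the last remaining digit is the leaf, and the
    digits before it form the stem.
    """
    unit = Decimal(str(leaf_unit))
    rows = defaultdict(list)
    for v in values:
        # Decimal avoids float error (8.6 / 0.1 must be exactly 86);
        # int() truncates, so 1682 with leaf unit 10 becomes 168 (82 -> 80).
        scaled = int(Decimal(str(v)) / unit)
        stem, leaf = divmod(scaled, 10)
        # Stretched display: leaves 0-4 on the first line, 5-9 on the second.
        second_half = stretched and leaf >= 5
        rows[(stem, second_half)].append(leaf)

    halves = (False, True) if stretched else (False,)
    stems = [stem for stem, _ in rows]
    width = len(str(max(stems)))
    lines = []
    for stem in range(min(stems), max(stems) + 1):  # empty stems are kept
        for half in halves:
            leaves = " ".join(str(d) for d in sorted(rows.get((stem, half), [])))
            lines.append(f"{stem:>{width}} | {leaves}".rstrip())
    return lines


# Hudson Auto Repair parts costs ($).
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

print("\n".join(stem_and_leaf(parts_cost)))
print("\n".join(stem_and_leaf(parts_cost, stretched=True)))
```

Using Decimal for the scaling step is a deliberate choice. With plain floats, 8.6 / 0.1 evaluates to slightly less than 86, and truncation would then produce the wrong leaf.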
Leaf Units
• A single digit is used to define each leaf.
• In the preceding example, the leaf unit was 1.
• Leaf units may be 100, 10, 1, 0.1, and so on.
• Where the leaf unit is not shown, it is assumed to equal 1.

Example: Leaf Unit = 0.1
If we have data with values such as

8.6 11.7 9.4 9.1 10.2 11.0 8.8

a stem-and-leaf display of these data will be

Leaf Unit = 0.1
 8 | 6 8
 9 | 1 4
10 | 2
11 | 0 7

Example: Leaf Unit = 10
If we have data with values such as

1806 1717 1974 1791 1682 1910 1838

a stem-and-leaf display of these data will be

Leaf Unit = 10
16 | 8
17 | 1 9
18 | 0 3
19 | 1 7

The 82 in 1682 is rounded down to 80 and is represented as an 8.

Crosstabulations and Scatter Diagrams
Thus far we have focused on methods that are used to summarize the data for one variable at a time. Often a manager is interested in tabular and graphical methods that help in understanding the relationship between two variables. Crosstabulation and a scatter diagram are two methods for summarizing the data for two variables simultaneously.

Crosstabulation
A crosstabulation is a tabular summary of data for two variables.
Crosstabulation can be used when:
• one variable is qualitative and the other is quantitative,
• both variables are qualitative, or
• both variables are quantitative.
The left and top margin labels define the classes for the two variables.

Example: Finger Lakes Homes
The number of Finger Lakes homes sold for each style and price for the past two years is shown below. Price is a quantitative variable; home style is a qualitative variable.

                            Home Style
Price Range     Colonial   Log   Split   A-Frame   Total
≤ $99,000          18        6     19      12        55
> $99,000          12       14     16       3        45
Total              30       20     35      15       100

Insights Gained from Preceding Crosstabulation
• The greatest number of homes (19) in the sample are a split-level style and priced at less than or equal to $99,000.
• Only three homes in the sample are an A-Frame style and priced at more than $99,000.
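The Python sketch below computes the margin totals and the row and column percentages for the Finger Lakes Homes table. The dictionary keys and the style name "Split-Level" are labels chosen here for illustration. Percentages are rounded to two decimals to match the percentage tables that follow.

```python
# Finger Lakes Homes crosstabulation: rows are price ranges, columns are home styles.
styles = ["Colonial", "Log", "Split-Level", "A-Frame"]
counts = {
    "<= $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}

# Margins: each margin is the frequency distribution of one variable.
row_totals = {price: sum(row) for price, row in counts.items()}
col_totals = [sum(column) for column in zip(*counts.values())]
grand_total = sum(row_totals.values())

# Row percentages divide by the row total; column percentages by the column total.
row_pct = {
    price: [round(100 * c / row_totals[price], 2) for c in row]
    for price, row in counts.items()
}
col_pct = {
    price: [round(100 * c / total, 2) for c, total in zip(row, col_totals)]
    for price, row in counts.items()
}

for price in counts:
    print(price, row_pct[price], col_pct[price])
```

The rounded row percentages in each row add to 100.01, which is the rounding effect the row-percentage table notes.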
In the crosstabulation above, the Total column is the frequency distribution for the price variable, and the Total row is the frequency distribution for the home style variable.

Crosstabulation: Row or Column Percentages
• Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables.

Crosstabulation: Row Percentages

Price Range     Colonial   Log     Split   A-Frame   Total
≤ $99,000        32.73     10.91   34.55   21.82     100
> $99,000        26.67     31.11   35.56    6.67     100

Note: row totals are actually 100.01 due to rounding.
Example: (Colonial and > $99K)/(All > $99K) × 100 = (12/45) × 100 = 26.67

Crosstabulation: Column Percentages

Price Range     Colonial   Log     Split   A-Frame
≤ $99,000        60.00     30.00   54.29   80.00
> $99,000        40.00     70.00   45.71   20.00
Total              100       100     100     100

Example: (Colonial and > $99K)/(All Colonial) × 100 = (12/30) × 100 = 40.00

Crosstabulation: Simpson’s Paradox
Data in two or more crosstabulations are often aggregated to produce a summary crosstabulation. We must be careful in drawing conclusions about the relationship between the two variables in the aggregated crosstabulation.
Simpson’s Paradox: In some cases the conclusions based upon an aggregated crosstabulation can be completely reversed if we look at the unaggregated data.

Scatter Diagram and Trendline
A scatter diagram is a graphical presentation of the relationship between two quantitative variables. One variable is shown on the horizontal axis and the other variable is shown on the vertical axis. The general pattern of the plotted points suggests the overall relationship between the variables. A trendline is an approximation of the relationship.

Scatter Diagram
• A Positive Relationship
• A Negative Relationship
• No Apparent Relationship
Figure (one per pattern): scatter diagram with y on the vertical axis and x on the horizontal axis.

Example: Panthers Football Team
• Scatter Diagram
The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored.

x = Number of Interceptions   y = Number of Points Scored
            1                             14
            3                             24
            2                             18
            1                             17
            3                             30

Figure: Scatter diagram (horizontal axis: Number of Interceptions, 0 to 4; vertical axis: Number of Points Scored, 0 to 35)
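The source does not say how the trendline is fitted. The Python sketch below assumes an ordinary least-squares line, which is one common choice. It also reports the residuals, which show that the points do not all lie on the line.

```python
# Panthers data: x = interceptions made, y = points scored (five observations).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 30]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept (assumed fitting method for the trendline).
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residuals: vertical distance of each point from the trendline.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"y-hat = {intercept:.2f} + {slope:.2f}x")
print([round(e, 2) for e in residuals])
```

A positive slope is consistent with the positive relationship described next. The nonzero residuals reflect that the relationship is not perfect.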
Insights Gained from the Preceding Scatter Diagram
• The scatter diagram indicates a positive relationship between the number of interceptions and the number of points scored.
• Higher points scored are associated with a higher number of interceptions.
• The relationship is not perfect; not all of the plotted points lie on a straight line.

Tabular and Graphical Procedures
Data
• Qualitative Data
  – Tabular Methods: Frequency Distribution, Rel. Freq. Dist., Percent Freq. Distribution, Crosstabulation
  – Graphical Methods: Bar Graph, Pie Chart
• Quantitative Data
  – Tabular Methods: Frequency Distribution, Rel. Freq. Dist., Cum. Freq. Dist., Cum. Rel. Freq. Distribution, Stem-and-Leaf Display, Crosstabulation
  – Graphical Methods: Histogram, Ogive, Scatter Diagram ...
This note was uploaded on 08/28/2011 for the course BUS 300 taught by Professor White during the Spring '09 term at Rutgers.