Business Statistics (BUSA 3101)
Dr. Lari H. Arjomand
[email protected]

Slide 1 Chapter 2 & 3 (Part B) Descriptive Statistics: Tabular and Graphical Presentations
• Exploratory Data Analysis
• Crosstabulations and Scatter Diagrams
• Descriptive Statistics: These are statistical methods used to describe data that have been collected.

Slide 2 Exploratory Data Analysis
The techniques of exploratory data analysis consist of simple arithmetic and easy-to-draw pictures that can be used to summarize data quickly. One such technique is the stem-and-leaf display.
Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41

Slide 3 Stem-and-Leaf Display
A stem-and-leaf display shows both the rank order and the shape of the distribution of the data. It is similar to a histogram on its side, but it has the advantage of showing the actual data values. The first digits of each data item are arranged to the left of a vertical line. To the right of the vertical line we record the next digit for each item in rank order. When the leaf unit is not shown, it is assumed to equal 1. Each line in the display is referred to as a stem. Each digit on a stem is a leaf.

Slide 4 Example: Hudson Auto Repair
The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.

Slide 5 Example: Hudson Auto Repair
• Sample of Parts Cost for 50 Tune-ups

 91   78   93   57   75   52   99   80   97   62
 71   69   72   89   66   75   79   75   72   76
104   74   62   68   97  105   77   65   80  109
 85   97   88   68   83   68   71   69   67   74
 62   82   98  101   79  105   79   69   62   73

The first step is to rearrange these data in rank order. See the next slide.

Slide 6 Solution: Hudson Auto Repair
• Sample of Parts Cost for 50 Tune-ups

 52   57   62   62   62   62   65   66   67   68
 68   68   69   69   69   71   71   72   72   73
 74   74   75   75   75   76   77   78   79   79
 79   80   80   82   83   85   88   89   91   93
 97   97   97   98   99  101  104  105  105  109

Data are rearranged in rank order.

Slide 7 Solution: Stem-and-Leaf Display
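One way to build this display from the 50 parts costs is sketched below. It assumes whole-number data and a leaf unit of 1; the function name `stem_and_leaf` is illustrative and not part of the course material.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return (stem, leaves) pairs for whole-number data with leaf unit 1."""
    rows = defaultdict(list)
    for v in sorted(values):            # rank order first
        stem, leaf = divmod(v, 10)      # leading digit(s) and final digit
        rows[stem].append(leaf)
    # Keep every stem between the smallest and largest so the shape is preserved.
    return [(s, rows.get(s, [])) for s in range(min(rows), max(rows) + 1)]

costs = [91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
         71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
         104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
         85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
         62, 82, 98, 101, 79, 105, 79, 69, 62, 73]

display = stem_and_leaf(costs)
for stem, leaves in display:
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```

Sorting before splitting is what puts the leaves on each stem in rank order.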
 5 | 2 7
 6 | 2 2 2 2 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3 5 8 9
 9 | 1 3 7 7 7 8 9
10 | 1 4 5 5 9

Each row is a stem; each digit to the right of the line is a leaf.
So what? Explain!
• When the leaf unit is not shown, it is assumed to equal 1.

Slide 8 Stretched Stem-and-Leaf Display
If we believe the original stem-and-leaf display has condensed the data too much, we can stretch the display by using two stems for each leading digit(s). Whenever a stem value is stated twice, the first value corresponds to leaf values of 0 to 4, and the second value corresponds to leaf values of 5 to 9.

Slide 9 Example: Hudson Auto Repair Stretched Stem-and-Leaf Display
• Sample of Parts Cost for 50 Tune-ups: the same 50 values listed on Slide 5. The first step is again to rearrange the data in rank order.

Slide 10 Solution: Hudson Auto Repair
• The rank-ordered data are the same as on Slide 6.

Slide 11 Solution: Stretched Stem-and-Leaf Display
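The stretched display can be produced by splitting each stem into two rows. The following is a minimal sketch under the same assumptions as before (whole-number data, leaf unit 1); `stretched_stem_and_leaf` is an illustrative name.

```python
def stretched_stem_and_leaf(values):
    """Two rows per stem: leaves 0-4 on the first row, leaves 5-9 on the second."""
    values = sorted(values)
    first, last = values[0] // 10, values[-1] // 10
    rows = {(stem, half): [] for stem in range(first, last + 1) for half in (0, 1)}
    for v in values:
        stem, leaf = divmod(v, 10)
        rows[(stem, leaf // 5)].append(leaf)   # leaf // 5 is 0 for 0-4 and 1 for 5-9
    return [(stem, leaves) for (stem, _), leaves in sorted(rows.items())]

costs = [91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
         71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
         104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
         85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
         62, 82, 98, 101, 79, 105, 79, 69, 62, 73]

stretched = stretched_stem_and_leaf(costs)
for stem, leaves in stretched:
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```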
 5 | 2
 5 | 7
 6 | 2 2 2 2
 6 | 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4
 7 | 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3
 8 | 5 8 9
 9 | 1 3
 9 | 7 7 7 8 9
10 | 1 4
10 | 5 5 9

The first value corresponds to leaf values of 0 to 4, and the second value corresponds to leaf values of 5 to 9.

Slide 12 Stem-and-Leaf Display
• Leaf Units
  • A single digit is used to define each leaf.
  • In the preceding example, the leaf unit was 1.
  • Leaf units may be 100, 10, 1, 0.1, and so on.
  • Where the leaf unit is not shown, it is assumed to equal 1.

Slide 13 Example: Leaf Unit = 0.1
If we have data with values such as
8.6  11.7  9.4  9.1  10.2  11.0  8.8
a stem-and-leaf display of these data will be

Leaf Unit = 0.1
 8 | 6 8
 9 | 1 4
10 | 2
11 | 0 7

Slide 14 Example: Leaf Unit = 10
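For leaf units other than 1, each value can first be expressed in leaf units, with any digits beyond the leaf dropped, and then split into stem and leaf. The sketch below covers the 0.1 data above and the leaf-unit-10 data that follows. The helper names are illustrative, and using `Decimal` is an implementation choice made here to avoid binary floating-point surprises, not something the slides prescribe.

```python
from decimal import Decimal

def split_value(value, leaf_unit):
    """Return (stem, leaf) for one value; digits beyond the leaf are dropped."""
    # Decimal(str(...)) keeps 8.6 / 0.1 exactly 86 instead of 85.999...
    units = int(Decimal(str(value)) / Decimal(str(leaf_unit)))  # truncate, do not round
    return divmod(units, 10)

def stem_and_leaf(values, leaf_unit):
    """Map each stem to its leaves, both in rank order."""
    rows = {}
    for v in sorted(values):
        stem, leaf = split_value(v, leaf_unit)
        rows.setdefault(stem, []).append(leaf)
    return rows

tenths = stem_and_leaf([8.6, 11.7, 9.4, 9.1, 10.2, 11.0, 8.8], leaf_unit=0.1)
tens = stem_and_leaf([1806, 1717, 1974, 1791, 1682, 1910, 1838], leaf_unit=10)
print(tenths)
print(tens)
```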
If we have data with values such as
1806  1717  1974  1791  1682  1910  1838
a stem-and-leaf display of these data will be

Leaf Unit = 10
16 | 8
17 | 1 9
18 | 0 3
19 | 1 7

The 82 in 1682 is rounded down to 80 and is represented as an 8. You do this for all of the data.

Slide 15 Another Example & Solution
The Dean of the School of Business at OU reports the following number of students in the 15 sections of basic statistics offered this semester. Construct a stem-and-leaf chart for the data.
27  36  29  21  24  26  32  30  36  30  28  23  17  41  19

STEM | LEAF
   1 | 7 9
   2 | 1 3 4 6 7 8 9
   3 | 0 0 2 6 6
   4 | 1

So what? Explain!

Slide 16 Crosstabulation Or Contingency Table
• Shows number of observations jointly in two categorical variables
  • Example: Male accounting student
  • Gender variable and Major variable
  • Can use categorized numerical variables
• May include row %, column %, or total %
• Helps find relationships
• Used widely in marketing

Slide 17 Crosstabulation Or Contingency Table Example
Residence: C C O O C C O O C O
Gender:    M F F M M M F M M F
(C = On-Campus, O = Off-Campus; M = Male, F = Female)
Use gender as the explanatory variable.

                 Gender
Residence     Male  Female  Total
On-Campus        4       1      5
Off-Campus       2       3      5
Total            6       4     10

Slide 18 Crosstabulation Or Contingency Table Example
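The cell counts on Slide 17 can be tallied directly from the ten paired residence and gender codes. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from collections import Counter

residence = list("CCOOCCOOCO")   # C = On-Campus, O = Off-Campus
gender = list("MFFMMMFMMF")      # M = Male, F = Female

counts = Counter(zip(residence, gender))   # joint (residence, gender) counts
row_totals = Counter(residence)
col_totals = Counter(gender)

print(f"{'Residence':<12}{'Male':>6}{'Female':>8}{'Total':>7}")
for code, label in (("C", "On-Campus"), ("O", "Off-Campus")):
    print(f"{label:<12}{counts[code, 'M']:>6}{counts[code, 'F']:>8}{row_totals[code]:>7}")
print(f"{'Total':<12}{col_totals['M']:>6}{col_totals['F']:>8}{len(residence):>7}")
```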
(Row %)
                 Gender
Residence     Male  Female  Total
On-Campus        4       1      5
              (80)    (20)  (100)
Off-Campus       2       3      5
              (40)    (60)  (100)
Total            6       4     10
              (60)    (40)  (100)

Row % = (Cell Count / Row Total)(100). Example, Off-Campus and Female: (3/5)(100) = 60%

Slide 19 Crosstabulation Or Contingency Table Example
(Column %)
                 Gender
Residence     Male  Female  Total
On-Campus        4       1      5
              (67)    (25)   (50)
Off-Campus       2       3      5
              (33)    (75)   (50)
Total            6       4     10
             (100)   (100)  (100)

Column % = (Cell Count / Column Total)(100). Example, Off-Campus and Female: (3/4)(100) = 75%

Slide 20 Crosstabulation Or Contingency Table Example
(Total %)
                 Gender
Residence     Male  Female  Total
On-Campus        4       1      5
              (40)    (10)   (50)
Off-Campus       2       3      5
              (20)    (30)   (50)
Total            6       4     10
              (60)    (40)  (100)

Total % = (Cell Count / Grand Total)(100). Example, Off-Campus and Female: (3/10)(100) = 30%

Slide 21 Which Percentage?
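The row, column, and total percentages on the three preceding slides all divide the same cell counts by a different total. A minimal sketch of that idea, with an illustrative `percentages` helper and whole-number rounding as on the slides:

```python
counts = {"On-Campus": {"Male": 4, "Female": 1},
          "Off-Campus": {"Male": 2, "Female": 3}}

row_total = {r: sum(cells.values()) for r, cells in counts.items()}
col_total = {}
for cells in counts.values():
    for c, n in cells.items():
        col_total[c] = col_total.get(c, 0) + n
grand_total = sum(row_total.values())

def percentages(kind):
    """kind is 'row', 'column', or 'total'; returns whole-number percentages."""
    base = {"row": lambda r, c: row_total[r],
            "column": lambda r, c: col_total[c],
            "total": lambda r, c: grand_total}[kind]
    return {(r, c): round(100 * n / base(r, c))
            for r, cells in counts.items() for c, n in cells.items()}

for kind in ("row", "column", "total"):
    print(kind, percentages(kind))
```

In this table gender is laid out in the columns, so the column percentages are the ones computed in the direction of gender.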
• Compute % in direction of explanatory variable
• Then, for example, if explanatory variable is in row, use row total
• In previous example, gender is explanatory variable
  • ‘Explains’ residence choice

Slide 22 Thinking Challenge Example
You’re a marketing research analyst for Visa. You want to analyze data on credit card use and annual income. Using the following information, create a contingency table. In this example, use income as the explanatory variable.

Income (000):  12  20  32  45  72  46  18  55
Use:            Y   N   N   Y   Y   Y   N   Y

(Income categories: Under $25,000; $25,000 & over. Use categories: Y = Use credit cards, N = Don’t use)

Slide 23 Solution
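One way to work the challenge is to classify each income into its category, tally the joint counts, and then compute row percentages, since income (the explanatory variable) defines the rows. This is a sketch only; `income_class` and the category labels are illustrative names.

```python
incomes = [12, 20, 32, 45, 72, 46, 18, 55]            # in thousands of dollars
uses = ["Y", "N", "N", "Y", "Y", "Y", "N", "Y"]       # Y = uses credit cards

def income_class(income_thousands):
    """Categorize a numerical income into the two classes from the slide."""
    return "Under $25K" if income_thousands < 25 else "$25K & Over"

table = {"Under $25K": {"N": 0, "Y": 0}, "$25K & Over": {"N": 0, "Y": 0}}
for income, use in zip(incomes, uses):
    table[income_class(income)][use] += 1

row_pct = {}
for label, cells in table.items():
    total = sum(cells.values())
    row_pct[label] = {k: round(100 * n / total) for k, n in cells.items()}
    print(label, cells, "row %:", row_pct[label])
```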
Row percentages (income is the explanatory variable)

                     Use
Income          No     Yes    Total
Under $25K       2       1        3
              (67)    (33)    (100)
$25K & Over      1       4        5
              (20)    (80)    (100)
Total            3       5        8
              (38)    (62)    (100)

Example, $25K & Over and Yes: (4/5)(100) = 80%

Slide 24 Crosstabulations and Scatter Diagrams
As we indicated, a manager is often interested in tabular and graphical methods that will help in understanding the relationship between two variables. Crosstabulation (or contingency table) and a scatter diagram are two methods for summarizing the data for two (or more) variables simultaneously.

Slide 25 Crosstabulation Or Contingency Table
Remember that a crosstabulation is a tabular summary of data for two variables.
• Crosstabulation can be used when:
  • one variable is qualitative and the other is quantitative,
  • both variables are qualitative, or
  • both variables are quantitative.
As we said, the left and top margin labels define the classes for the two variables.

Slide 26 Crosstabulations Using SWStat
STEPS:
1. From Data Area Set New Data Area
2. From Statistics Select Grouped Data
3. From the Window Select CrossTab
4. Click Calculate
SEE NEXT EXAMPLE

Slide 27 Crosstabulations Using SWStat
DATA

Slide 28 Crosstabulations Using SWStat

Slide 29 Crosstabulations Using SWStat: Solution

Slide 30 Crosstabulation Or Contingency Table
• Example: Finger Lakes Homes
The number of Finger Lakes homes sold for each style and price for the past two years is shown below.

Price                   Home Style
Range        Colonial    Log   Split   A-Frame   Total
≤ $99,000          18      6      19        12      55
> $99,000          12     14      16         3      45
Total              30     20      35        15     100

Price range is a quantitative variable; home style is a qualitative variable.

Slide 31 Crosstabulation Or Contingency Table
Price                   Home Style
Range        Colonial    Log   Split   A-Frame   Total
≤ $99,000          18      6      19        12      55
> $99,000          12     14      16         3      45
Total              30     20      35        15     100

The Total column is the frequency distribution for the price variable.
The Total row is the frequency distribution for the home style variable.

Slide 32 Crosstabulation Or Contingency Table
• Insights Gained from Preceding Crosstabulation
  • The greatest number of homes in the sample (19) are a split-level style and priced at less than or equal to $99,000.
  • Only three homes in the sample are an A-Frame style and priced at more than $99,000.

Slide 33 Crosstabulation Row Or Column Percentages?
• As we said, converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables. See the next two slides.

Slide 34 Crosstabulation or Contingency Table
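The conversion from counts to percentages for the Finger Lakes table can be sketched as follows. Percentages are rounded to two decimals, which is also why rounded row percentages need not sum to exactly 100. The dictionary keys are illustrative labels for the two price ranges.

```python
styles = ["Colonial", "Log", "Split", "A-Frame"]
counts = {"<= $99,000": [18, 6, 19, 12],
          "> $99,000": [12, 14, 16, 3]}

# Row percentages: divide each cell by its price-range (row) total.
row_pct = {price: [round(100 * n / sum(row), 2) for n in row]
           for price, row in counts.items()}

# Column percentages: divide each cell by its home-style (column) total.
col_totals = [sum(col) for col in zip(*counts.values())]
col_pct = {price: [round(100 * n / t, 2) for n, t in zip(row, col_totals)]
           for price, row in counts.items()}

for price in counts:
    print(price, "row %:", row_pct[price], "sum:", round(sum(row_pct[price]), 2))
    print(price, "column %:", col_pct[price])
```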
(Row %)
Price                   Home Style
Range        Colonial    Log    Split   A-Frame   Total
≤ $99,000       32.73  10.91    34.55     21.82     100
> $99,000       26.67  31.11    35.56      6.67     100

Note: row totals are actually 100.01 due to rounding.
Row % = (Cell Count / Row Total)(100). Example: (Colonial and > $99K)/(All > $99K) x 100 = (12/45) x 100

Slide 35 Crosstabulation or Contingency Table
(Column %)
Price                   Home Style
Range        Colonial    Log    Split   A-Frame
≤ $99,000       60.00  30.00    54.29     80.00
> $99,000       40.00  70.00    45.71     20.00
Total             100    100      100       100

Column % = (Cell Count / Column Total)(100). Example: (Colonial and > $99K)/(All Colonial) x 100 = (12/30) x 100

Slide 36 Scatter Diagram and Trendline
A scatter diagram is a graphical presentation of the relationship between two quantitative variables. One variable is shown on the horizontal axis and the other variable is shown on the vertical axis. The general pattern of the plotted points suggests the overall relationship between the variables. A trendline is a line that provides an approximation of the relationship.

Slide 37 Scatter Diagram and Trendline
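• A Positive Relationship
[Scatter diagram: x on the horizontal axis, y on the vertical axis]

Slide 38 Scatter Diagram and Trendline
• A Negative Relationship
[Scatter diagram: x on the horizontal axis, y on the vertical axis]

Slide 39 Scatter Diagram and Trendline
• No Apparent Relationship
[Scatter diagram: x on the horizontal axis, y on the vertical axis]

Slide 40 Example: Panthers Football Team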
• Scatter Diagram
The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored.

x = Number of Interceptions    y = Number of Points Scored
            1                              14
            3                              24
            2                              18
            1                              17
            3                              30

Slide 41 Scatter Diagram
[Scatter diagram: Number of Interceptions (x) on the horizontal axis, Number of Points Scored (y) on the vertical axis]
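The slides do not say how the trendline is fitted. As one common choice, and purely as an assumption here, the sketch below fits a least-squares line to the five Panthers observations; the sign of the slope indicates the direction of the relationship.

```python
x = [1, 3, 2, 1, 3]          # number of interceptions
y = [14, 24, 18, 17, 30]     # number of points scored

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares slope and intercept (an assumed method, not stated on the slides).
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

direction = "positive" if slope > 0 else "negative" if slope < 0 else "none"
print(f"trendline: y = {intercept:.2f} + {slope:.2f} x ({direction} relationship)")
```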

Slide 42 Panthers Football Team Example (Continued):
• Insights Gained from the Preceding Scatter Diagram
  • The scatter diagram indicates a positive relationship between the number of interceptions and the number of points scored.
  • Higher points scored are associated with a higher number of interceptions.
  • The relationship is not perfect; not all plotted points in the scatter diagram lie on a straight line.

Slide 43 Tabular and Graphical Procedures
Data
• Qualitative Data
  • Tabular Methods
    • Frequency Distribution
    • Rel. Freq. Dist.
    • Percent Freq. Distribution
    • Crosstabulation (Contingency Table)
  • Graphical Methods
    • Bar Graph
    • Pie Chart
• Quantitative Data
  • Tabular Methods
    • Frequency Distribution
    • Rel. Freq. Dist.
    • Cum. Freq. Dist.
    • Cum. Rel. Freq. Distribution
    • Stem-and-Leaf Display
    • Crosstabulation (Contingency Table)
  • Graphical Methods
    • Dot Plot
    • Histogram
    • Ogive
    • Scatter Diagram

Slide 44 End of Chapter 2 & 3, Part B

Slide 45 ...