Business Statistics (BUSA 3101)
Dr. Lari H. Arjomand

Chapter 4 (Part A) Descriptive Statistics: Numerical Measures
• Measures of Location
• Measures of Variability

Numerical Data Properties
• Central Tendency: Mean, Median, Mode, Midrange, Midhinge
• Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
• Shape: Skew, Kurtosis

Measures of Location
• Mean
• Median
• Mode
• Percentiles
• Quartiles

If the measures are computed for data from a sample, they are called sample statistics. If the measures are computed for data from a population, they are called population parameters. A sample statistic is referred to as the point estimator of the corresponding population parameter. For example, the sample mean is a point estimator of the population mean.

Mean
• The mean of a data set is the average of all the data values.
• As we said, the sample mean $\bar{x}$ is the point estimator of the population mean $\mu$.

Sample Mean
$$\bar{x} = \frac{\sum x_i}{n}$$
Numerator: the sum of the values of the n observations. Denominator: the number of observations in the sample.

Population Mean
$$\mu = \frac{\sum x_i}{N}$$
Numerator: the sum of the values of the N observations. Denominator: the number of observations in the population.

Sample Mean Example: Apartment Rents
Seventy efficiency apartments were randomly sampled in a small college town. The monthly rent prices for these apartments are listed in ascending order on the next slide.

Sample Mean Example Continued
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Monthly Rent for 70 Apartments (ascending order, read across each row)

Sample Mean
Example Continued
$$\bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80$$
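The arithmetic on this slide can be checked with a short Python sketch using only the standard library. The list name `rents` is our own choice; its values are the 70 rents from the table, typed in row by row.

```python
from statistics import mean

# The 70 monthly rents from the table, in ascending order (10 values per row).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(rents)       # number of observations in the sample
total = sum(rents)   # sum of the values of the n observations
x_bar = total / n    # sample mean

print(n, total, round(x_bar, 2))  # 70 34356 490.8
print(mean(rents))                # 490.8 (standard-library cross-check)
```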
Properties of the Arithmetic Mean
1. Every set of interval-level and ratio-level data has a mean.
2. All the values are included in computing the mean.
3. A set of data has a unique mean.
4. The mean is affected by unusually large or small data values.
5. The arithmetic mean is the only measure of central tendency where the sum of the deviations of each value from the mean is zero: $\sum (x_i - \bar{x}) = 0$. See the next slide for an example.

Illustration of Item Number 5 on Previous Slide
Consider the set of values 3, 8, and 4. The mean is 5, so (3 − 5) + (8 − 5) + (4 − 5) = −2 + 3 − 1 = 0. Symbolically we write: $\sum (x_i - \bar{x}) = 0$.

Median
The median of a data set is the value in the middle when the data items are arranged in ascending order. Whenever a data set has extreme values, the median is the preferred measure of central location. The median is the measure of location most often reported for annual income and property value data, because a few extremely large incomes or property values can inflate the mean.
$$\text{Positioning Point} = \frac{n + 1}{2}$$
The positioning point is the position of the median in the ordered data; when it falls halfway between two positions, the median is the average of those two values.

Median
For an odd number of observations: 26 18 27 12 14 27 19
In ascending order (7 observations): 12 14 18 19 26 27 27
The median is the middle value.
Median = 19

Median
For an even number of observations: 26 18 27 12 14 27 30 19
In ascending order (8 observations): 12 14 18 19 26 27 27 30
The median is the average of the middle two values.
Median = (19 + 26)/2 = 22.5

Median: Example
Averaging the 35th and 36th data values:
Median = (475 + 475)/2 = 475
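A minimal Python sketch of the median rule, assuming the data fit in a list. `median_by_position` is a hypothetical helper written to mirror the positioning point (n + 1)/2; the standard library's `statistics.median` is printed alongside as a cross-check.

```python
from statistics import median

def median_by_position(values):
    """Median via the positioning point (n + 1) / 2 on the sorted data."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: (n + 1) / 2 is a whole position, e.g. the 4th of 7 values.
        return data[mid]
    # Even n: (n + 1) / 2 falls halfway between two positions; average them.
    return (data[mid - 1] + data[mid]) / 2

odd = [26, 18, 27, 12, 14, 27, 19]
even = [26, 18, 27, 12, 14, 27, 30, 19]
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

print(median_by_position(odd), median(odd))        # 19 19
print(median_by_position(even), median(even))      # 22.5 22.5
print(median_by_position(rents), median(rents))    # 475.0 475.0 (35th and 36th values)
```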
Mode
The mode of a data set is the value that occurs with greatest frequency. The greatest frequency can occur at two or more different values. If the data have exactly two modes, the data are bimodal. If the data have more than two modes, the data are multimodal.

Mode: Example
450 occurred most frequently (7 times)
Mode = 450
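A sketch of finding the mode in Python. `modes` is a hypothetical helper built on `collections.Counter`; it returns every value tied for the highest count, so bimodal and multimodal data come back as lists. The short lists at the end are made up purely to show those cases.

```python
from collections import Counter
from statistics import multimode

def modes(values):
    """All values tied for the greatest frequency; [] when every value occurs once."""
    freq = Counter(values)
    if not freq:
        return []
    top = max(freq.values())
    if top == 1:
        return []  # no value repeats: treated as "no mode"
    return sorted(v for v, k in freq.items() if k == top)

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

print(modes(rents), Counter(rents)[450])   # [450] 7
print(multimode(rents))                    # [450]
print(modes([1, 2, 2, 3, 3]))              # [2, 3]  (bimodal, made-up data)
print(modes([4, 7, 9]))                    # []      (no mode, made-up data)
print(multimode([4, 7, 9]))                # [4, 7, 9]
```

`statistics.multimode` agrees for the rent data, but when no value repeats it returns every value rather than reporting no mode, which is why the helper returns an empty list in that case.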
Mode: Another Example
• No Mode
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• One Mode
Raw Data: 6.0 4.9 6.0 8.9 6.3 4.9 4.9
• More Than 1 Mode
Raw Data: 21 28 28 41 43 43

Use Excel to Compute the Mean, Median, and Mode
Using the monthly rents for the 70 apartments listed earlier, use Excel to compute the mean, median, and mode, and explain the answers.
Percentiles
A percentile provides information about how the data are spread over the interval from the smallest value to the largest value. Admission test scores for colleges and universities are frequently reported in terms of percentiles.
• You are familiar with percentile scores on national educational tests such as the ACT and SAT, which tell you where you stand in comparison with others.
• For example, if you are in the 83rd percentile, then 83% of the test takers scored below you and you are in the top 17% of the test takers.

Percentiles
Definition
• The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 − p) percent of the items take on this value or more.

Steps for Finding Percentiles
1. Arrange the data in ascending order.
2. Compute index i, the position of the pth percentile: $i = (p/100)n$.
3. If i is not an integer, round up. The pth percentile is the value in the ith position.
4. If i is an integer, the pth percentile is the average of the values in positions i and i + 1.

80th Percentile: Example
i = (p/100)n = (80/100)70 = 56
Averaging the 56th and 57th data values:
80th Percentile = (535 + 549)/2 = 542
Note: The data are in ascending order.

80th Percentile: Example Continued
“At least 80% of the items take on a value of 542 or less.” 56/70 = .8 or 80%
“At least 20% of the items take on a value of 542 or more.” 14/70 = .2 or 20%
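The steps for finding percentiles translate directly into code. Below is a Python sketch under our own naming (`percentile_textbook` is a hypothetical helper, not a library function). It uses exact fractions so that the "is i an integer?" test is not thrown off by floating-point rounding, and it also counts the items behind the two "at least" statements for the 80th percentile.

```python
import math
from fractions import Fraction

def percentile_textbook(values, p):
    """pth percentile via the slide steps: i = (p/100)n; round up, or average i and i + 1."""
    data = sorted(values)                 # step 1: ascending order
    n = len(data)
    if n == 0 or not 0 < p <= 100:
        raise ValueError("need data and 0 < p <= 100")
    i = Fraction(p) * n / 100             # step 2: index i, computed exactly
    if i.denominator != 1:
        return data[math.ceil(i) - 1]     # step 3: round up (positions are 1-based)
    i = int(i)
    if i == n:                            # p = 100: there is no position n + 1
        return data[-1]
    return (data[i - 1] + data[i]) / 2    # step 4: average positions i and i + 1

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

p80 = percentile_textbook(rents, 80)              # i = 56 -> average of 56th and 57th
at_or_below = sum(1 for x in rents if x <= p80)
at_or_above = sum(1 for x in rents if x >= p80)
print(p80)                                        # 542.0
print(at_or_below / len(rents), at_or_above / len(rents))  # 0.8 0.2
```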
Use Excel to Find 80th Percentile: Excel Formula Worksheet
     A           B
1    Apartment   Monthly Rent ($)
2    1           525
3    2           440
4    3           450
5    4           615
6    5           480
80th Percentile: =PERCENTILE(B2:B71,.8)
Note: Rows 7–71 are not shown. It is not necessary to put the data in ascending order.
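Excel's PERCENTILE (equivalent to PERCENTILE.INC in current versions) does not follow the round-up/average steps; as we understand its documented behavior, it interpolates linearly at position 1 + k(n − 1) in the sorted data, where k is the percentile written as a fraction. A Python sketch of that rule, with a hypothetical helper name, plus a cross-check against `statistics.quantiles(..., method="inclusive")`, which uses the same interpolation:

```python
import math
from statistics import quantiles

def percentile_inclusive(values, k):
    """Percentile by linear interpolation at 1-based position 1 + k(n - 1).

    k is a fraction in [0, 1] (0.8 for the 80th percentile). The data are
    sorted here, so the input order does not matter.
    """
    data = sorted(values)
    if not data or not 0 <= k <= 1:
        raise ValueError("need data and 0 <= k <= 1")
    pos = k * (len(data) - 1)          # the same position, counted from 0
    lo = math.floor(pos)
    frac = pos - lo                    # how far to move toward the next value
    if lo + 1 >= len(data):            # k = 1 lands exactly on the maximum
        return float(data[-1])
    return data[lo] + frac * (data[lo + 1] - data[lo])

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

p80 = percentile_inclusive(rents, 0.8)
print(round(p80, 2))                                   # 537.8
# Standard-library cross-check: 10 cut points, index 7 is the 80th percentile.
print(quantiles(rents, n=10, method="inclusive")[7])   # 537.8
```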
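80th Percentile: Excel Value Worksheet
With the same worksheet (rows 7–71 not shown), the cell containing the formula displays:
80th Percentile = 537.8
Excel's PERCENTILE function interpolates between the 56th and 57th ordered values (535 + 0.2 × (549 − 535) = 537.8) instead of averaging them, so its result differs from the 542 found with the steps for finding percentiles.

EXAMPLE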
Given the following data, use Excel to find the 25th percentile:
357, 654, 763, 621, 900, 550, 290, 700, 789, 605

Quartiles
Quartiles are specific percentiles.
• First Quartile = 25th Percentile
• Second Quartile = 50th Percentile = Median
• Third Quartile = 75th Percentile
• Unless the sample size is large, percentiles may not make sense, since percentiles divide the data into 100 groups. In smaller samples, we might divide the data into four groups (quartiles). Since almost any sample can be divided into four groups, the quartiles are important descriptive statistics to explain.

Third Quartile: Excel Formula Worksheet
     A           B
1    Apartment   Monthly Rent ($)
2    1           525
3    2           440
4    3           450
5    4           615
6    5           480
Third Quartile: =QUARTILE(B2:B71,3)
Note: Rows 7–71 are not shown. It is not necessary to put the data in ascending order.
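Excel's QUARTILE(range, q) returns the same value as PERCENTILE(range, q/4), so it shares the interpolation rule rather than the round-up/average steps. In Python, `statistics.quantiles` with `n=4` and `method="inclusive"` should reproduce Excel's three quartiles for this worksheet; a sketch (the list name `rents` is our own):

```python
from statistics import quantiles

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

# "inclusive" interpolates at position 1 + p(n - 1); n=4 gives Q1, Q2, Q3.
excel_like = quantiles(rents, n=4, method="inclusive")
print(excel_like)  # [446.25, 475.0, 522.5]
# The round-up/average percentile steps give 445, 475, and 525 for the same data.
```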
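Third Quartile: Excel Value Worksheet
With the same worksheet (rows 7–71 not shown), the cell containing the formula displays:
Third Quartile = 522.5
Excel interpolates between the 52nd and 53rd ordered values (515 + 0.75 × (525 − 515) = 522.5), so its result differs slightly from the 525 obtained with the steps for finding percentiles.

EXAMPLE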
Given the following data, use Excel to find the second quartile:
357, 654, 763, 621, 900, 550, 290, 700, 789, 605

Measures of Variability (Dispersion)
It is often desirable to consider measures of variability (dispersion), as well as measures of location. For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.

Measures of Variability (Dispersion)
• Range
• Interquartile Range or Midspread
• Variance
• Standard Deviation
• Coefficient of Variation

Range
The range of a data set is the difference between the largest and smallest data values. It is the simplest measure of variability. It is very sensitive to the smallest and largest data values.

Range: Example
Range = largest value − smallest value
Range = 615 − 425 = 190
Interquartile Range or Midspread
The interquartile range of a data set is the difference between the third quartile and the first quartile. It is the range for the middle 50% of the data. It overcomes the sensitivity to extreme data values: it is not affected by the extreme values.
$$\text{Interquartile Range} = Q_3 - Q_1$$

Interquartile Range: Example
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 − Q1 = 525 − 445 = 80
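Range and interquartile range in a Python sketch. `percentile_textbook` is a hypothetical helper that follows the round-up/average steps for finding percentiles, so Q1 and Q3 here match the hand calculation rather than Excel's interpolated quartiles.

```python
import math
from fractions import Fraction

def percentile_textbook(values, p):
    """pth percentile via the slide steps: i = (p/100)n; round up, or average i and i + 1."""
    data = sorted(values)
    n = len(data)
    if n == 0 or not 0 < p <= 100:
        raise ValueError("need data and 0 < p <= 100")
    i = Fraction(p) * n / 100          # exact arithmetic for the "is i an integer?" test
    if i.denominator != 1:
        return data[math.ceil(i) - 1]  # round up; positions are 1-based
    i = int(i)
    if i == n:                         # p = 100: there is no position n + 1
        return data[-1]
    return (data[i - 1] + data[i]) / 2

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

data_range = max(rents) - min(rents)   # largest value - smallest value
q1 = percentile_textbook(rents, 25)    # i = 17.5 -> 18th value
q3 = percentile_textbook(rents, 75)    # i = 52.5 -> 53rd value
iqr = q3 - q1                          # spread of the middle 50% of the data
print(data_range, q1, q3, iqr)         # 190 445 525 80
```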
EXAMPLE
Given the following data, use Excel to find the interquartile range:
357, 654, 763, 621, 900, 550, 290, 700, 789, 605

Variance
The variance is a measure of variability that utilizes all the data. It is based on the difference between the value of each observation ($x_i$) and the mean ($\bar{x}$ for a sample, $\mu$ for a population).

Variance
The variance is the average of the squared differences between each data value and the mean (for a sample, the sum of the squared differences is divided by n − 1 rather than n). The variance is computed as follows:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \quad \text{for a sample}$$
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \quad \text{for a population}$$

Standard Deviation
The standard deviation of a data set is the positive square root of the variance. It is measured in the same units as the data, making it more easily interpreted than the variance.

Standard Deviation
The standard deviation is computed as follows:
$$s = \sqrt{s^2} \quad \text{for a sample} \qquad \sigma = \sqrt{\sigma^2} \quad \text{for a population}$$

Coefficient of Variation
The coefficient of variation indicates how large the standard deviation is in relation to the mean. The coefficient of variation is computed as follows:
$$\frac{s}{\bar{x}} \times 100\% \quad \text{for a sample} \qquad \frac{\sigma}{\mu} \times 100\% \quad \text{for a population}$$

Coefficient of Variation (Continued)
• Measure of relative dispersion
• Always expressed as a percentage
• CV is the standard deviation expressed as a percent of the mean
• Used to compare the variability of two or more groups
• Weakness: CV is undefined if the mean is zero and can be misleading if the data include negative values. Thus, CV is used only for variables whose values are nonnegative (X ≥ 0).

Example Continued
Given the monthly rent prices for the 70 apartments listed earlier, find the variance, standard deviation, and coefficient of variation using the equations and Excel.

Solutions
• Variance
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16$$
• Standard Deviation
$$s = \sqrt{s^2} = \sqrt{2{,}996.16} = 54.74$$
• Coefficient of Variation
$$\frac{s}{\bar{x}} \times 100\% = \frac{54.74}{490.80} \times 100\% = 11.15\%$$
The standard deviation is about 11% of the mean.
Note that CV is the standard deviation expressed as a percent of the mean.
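These solutions can be reproduced with Python's `statistics` module, in a short sketch. `variance` and `stdev` are the sample versions (dividing by n − 1); the list name `rents` is our own. The sum-of-squares lines repeat the variance formula by hand as a check.

```python
import math
from statistics import mean, stdev, variance

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar = mean(rents)
s2 = variance(rents)     # sample variance: squared deviations summed, divided by n - 1
s = stdev(rents)         # sample standard deviation: square root of s2
cv = s / x_bar * 100     # coefficient of variation, in percent

# The same sample variance written out from the formula.
ss = sum((x - x_bar) ** 2 for x in rents)
s2_manual = ss / (len(rents) - 1)

print(round(s2, 2), round(s, 2), round(cv, 2))   # 2996.16 54.74 11.15
print(math.isclose(s2, s2_manual))               # True
```

For data that form a whole population, `statistics.pvariance` and `statistics.pstdev` divide by N instead.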
Thinking Challenge Example
You’re a financial analyst for Prudential-Bache Securities. You have collected the following closing stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11.
Describe the volatility (variation) of the stock prices.

EXAMPLE
Given the following data, use Excel to find the following:
357, 654, 763, 621, 900, 550, 290, 700, 789, 605

EXAMPLE
Given the following data:
357, 654, 763, 621, 900, 550, 290, 700, 789, 605
If you need help with this, see the next slides.
Use Excel to find:
A. The mean
B. The mode
C. The median
D. The 75th percentile
E. The first and third quartiles
F. The range
G. The interquartile range or midspread
H. The standard deviation
I. The coefficient of variation

A Problem Using Excel
A private research organization studying families in various countries reported the following data for the amount of time 4-year-old children spent alone with their fathers each day.
Country     Time with Dad (minutes)
Belgium     30
Canada      44
China       54
Finland     50
Germany     36
Nigeria     42
Sweden      46
U.S.A.      42

A Problem Using Excel (Continued)
Use Excel to answer the following questions and explain your answers (round all numbers to two decimal places):
A. The mean
B. The mode
C. The median
D. The 75th percentile
E. The first and third quartiles
F. The range
G. The interquartile range or midspread
H. The standard deviation
I. The coefficient of variation
Note: All results are rounded to two decimal places.

Using SWStat+ (Creating Data Area)
Using SWStat+ (Choose Statistics; Ungrouped Data; Choose Measures)
Using SWStat+ (Numerical Data, Summary Measures (Sample); Calculate)
Using SWStat+ (Results)
Using SWStat+ (Numerical Data; Percentile; Calculate)
Using SWStat+ (Results)

End of Chapter 4, Part A
