Business Statistics (BUSA 3101)
Dr. Lari H. Arjomand
[email protected]
Slide 1

Chapter 4 (Part B)
Descriptive Statistics: Numerical Measures
• Measures of Distribution Shape, Relative Location, and Detecting Outliers
• Exploratory Data Analysis
• Measures of Association Between Two Variables
• The Weighted Mean and Working with Grouped Data
Slide 2

Measures of Distribution Shape, Relative Location, and Detecting Outliers
• Distribution Shape
• z-Scores
• Chebyshev’s Theorem
• Empirical Rule
• Detecting Outliers
Slide 3

Distribution Shape: Skewness
• An important measure of the shape of a distribution is called skewness.
• The formula for computing skewness for a data set is somewhat complex.
• Skewness can be easily computed using statistical software.
Slide 4

Distribution Shape: Skewness
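Pearson’s Coefficient of Skewness Equation
\[ SK = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}, \qquad -3 \le SK \le +3 \]
[Figure: three distribution curves showing the typical ordering of the measures of location]
• Left-skewed: Mean < Median < Mode
• Symmetric: Mean = Median = Mode
• Right-skewed: Mode < Median < Mean
Slide 5

Distribution Shape: Skewness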
Symmetric (not skewed, SK = 0)
• If skewness is zero, then the mean and median are equal.
[Figure: relative frequency histogram of a symmetric distribution; Skewness = 0]
Slide 6

Distribution Shape: Skewness
Moderately Skewed Left
• If skewness is negative (left-skewed, −3 ≤ SK < 0), then the mean will usually be less than the median.
[Figure: relative frequency histogram of a moderately left-skewed distribution; Skewness = −.31]
Slide 7

Distribution Shape: Skewness
Moderately Skewed Right
• If skewness is positive (right-skewed, 0 < SK ≤ +3), then the mean will usually be more than the median.
[Figure: relative frequency histogram of a moderately right-skewed distribution; Skewness = .31]
Slide 8

Distribution Shape: Skewness
Highly Skewed Right
• Skewness is positive (often above 1.0).
• Mean will usually be more than the median.
[Figure: relative frequency histogram of a highly right-skewed distribution; Skewness = 1.25]
Slide 9

Distribution Shape: Skewness
Example: Apartment Rents
Seventy efficiency apartments were randomly sampled in a small college town. The monthly rent prices for these apartments are listed in ascending order on the next slide.
Slide 10

Distribution Shape: Skewness
Monthly rents ($) in ascending order, read across each row (Skewness = .87):

425  430  430  435  435  435  435  435  440  440
440  440  440  445  445  445  445  445  450  450
450  450  450  450  450  460  460  460  465  465
465  470  470  472  475  475  475  480  480  480
480  485  490  490  490  500  500  500  500  510
510  515  525  525  525  535  549  550  570  570
575  575  580  590  600  600  600  600  615  615

\[ \bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80 \]
\[ s = \sqrt{s^2} = \sqrt{2996.16} = 54.74 \]
Median = (475 + 475)/2 = 475
Mode = 450
Slide 11

Distribution Shape: Skewness
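As a cross-check on the rent data, here is a minimal Python sketch (not part of the original slides) that recomputes the summary statistics and Pearson's coefficient SK = 3(mean − median)/s with the standard library only. The names `RENT_COUNTS` and `pearson_skewness` are illustrative assumptions; the pairs simply encode the 70 listed rents as (value, count).

```python
import statistics

# The 70 apartment rents from the slides, encoded as (rent, count) pairs.
RENT_COUNTS = [
    (425, 1), (430, 2), (435, 5), (440, 5), (445, 5), (450, 7), (460, 3),
    (465, 3), (470, 2), (472, 1), (475, 3), (480, 4), (485, 1), (490, 3),
    (500, 4), (510, 2), (515, 1), (525, 3), (535, 1), (549, 1), (550, 1),
    (570, 2), (575, 2), (580, 1), (590, 1), (600, 4), (615, 2),
]
rents = [rent for rent, count in RENT_COUNTS for _ in range(count)]


def pearson_skewness(data):
    """Pearson's coefficient: SK = 3 * (mean - median) / sample std deviation."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    return 3 * (mean - median) / s


mean = statistics.mean(rents)
median = statistics.median(rents)
mode = statistics.mode(rents)
s = statistics.stdev(rents)
sk = pearson_skewness(rents)

print(f"n={len(rents)} mean={mean:.2f} median={median} mode={mode}")
print(f"s={s:.2f} SK={sk:.2f}")
```

Under these assumptions the script should agree with the slide values to two decimals: mean 490.80, median 475, mode 450, s of about 54.74, and SK of about 0.87.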
Example (Continued)
SK = 3(Mean − Median) / Standard deviation
SK = 3(490.80 − 475) / 54.74 = 0.87
Slide 12

Distribution Shape: Skewness
[Figure: relative frequency histogram of the apartment rents; Skewness = .87]
Slide 13

z-Score or Standardized Value
The z-score is often called the standardized value. It denotes the number of standard deviations a data value \(x_i\) is from the mean.
\[ z_i = \frac{x_i - \bar{x}}{s} \]
Slide 14

z-Scores
• An observation’s z-score is a measure of the relative location of the observation in a data set.
• A data value less than the sample mean will have a z-score less than zero; the z-score is negative.
• A data value greater than the sample mean will have a z-score greater than zero; the z-score is positive.
• A data value equal to the sample mean will have a z-score of zero.
Slide 15

z-Scores or Standardized Values
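The z-score definition translates directly into code. The sketch below is illustrative only: the function name `z_score` and the constants are assumptions, and the mean (490.80) and standard deviation (54.74) are the apartment-rent values reported in the slides.

```python
def z_score(x, mean, s):
    """Number of standard deviations that x lies from the mean."""
    if s <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / s


# Sample mean and standard deviation of the apartment rents, as reported in the slides.
RENT_MEAN = 490.80
RENT_STD = 54.74

z_smallest = z_score(425, RENT_MEAN, RENT_STD)       # smallest rent in the sample
z_largest = z_score(615, RENT_MEAN, RENT_STD)        # largest rent in the sample
z_at_mean = z_score(RENT_MEAN, RENT_MEAN, RENT_STD)  # a value equal to the mean

print(f"{z_smallest:.2f} {z_largest:.2f} {z_at_mean:.2f}")
```

A rent below the mean gives a negative z-score, a rent above it gives a positive one, and a value equal to the mean gives zero, matching the three cases listed on the slide.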
Example
• z-Score of Smallest Value (425)
\[ z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20 \]

Standardized Values for Apartment Rents (same order as the rent listing, read across each row):

-1.20  -1.11  -1.11  -1.02  -1.02  -1.02  -1.02  -1.02  -0.93  -0.93
-0.93  -0.93  -0.93  -0.84  -0.84  -0.84  -0.84  -0.84  -0.75  -0.75
-0.75  -0.75  -0.75  -0.75  -0.75  -0.56  -0.56  -0.56  -0.47  -0.47
-0.47  -0.38  -0.38  -0.34  -0.29  -0.29  -0.29  -0.20  -0.20  -0.20
-0.20  -0.11  -0.01  -0.01  -0.01   0.17   0.17   0.17   0.17   0.35
 0.35   0.44   0.62   0.62   0.62   0.81   1.06   1.08   1.45   1.45
 1.54   1.54   1.63   1.81   1.99   1.99   1.99   1.99   2.27   2.27
Slide 16

Chebyshev’s Theorem
At least \(1 - 1/z^2\) of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.
In other words, Chebyshev’s theorem indicates that for any set of observations (sample or population), the percentage of the values that lie within z standard deviations of the mean is at least \(100[1 - 1/z^2]\), where z is any constant greater than 1.
Slide 17

Chebyshev’s Theorem
• At least 75% of the data values must be within z = 2 standard deviations of the mean.
• At least 89% of the data values must be within z = 3 standard deviations of the mean.
• At least 94% of the data values must be within z = 4 standard deviations of the mean.
Slide 18

Chebyshev’s Theorem
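The Chebyshev bound is simple enough to tabulate. The following is a small sketch, with hypothetical helper names `chebyshev_min_proportion` and `chebyshev_interval`, that evaluates 1 − 1/z² and the interval mean ± z·s. The rent mean (490.80) and standard deviation (54.74) are taken from the slides.

```python
def chebyshev_min_proportion(z):
    """Minimum proportion of values within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z**2


def chebyshev_interval(mean, s, z):
    """Interval (mean - z*s, mean + z*s) to which the bound applies."""
    return mean - z * s, mean + z * s


# Bounds for the z values quoted on the slide.
for z in (2, 3, 4):
    print(f"z = {z}: at least {chebyshev_min_proportion(z):.0%}")

# Apartment-rent figures with z = 1.5.
low, high = chebyshev_interval(490.80, 54.74, 1.5)
share = chebyshev_min_proportion(1.5)
print(f"at least {share:.0%} of rents between {low:.0f} and {high:.0f}")
```

The bound holds for any distribution shape, so it is usually looser than what a particular data set actually shows.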
For example:
Let z = 1.5 with \(\bar{x} = 490.80\) and s = 54.74.
At least \(1 - 1/(1.5)^2 = 1 - 0.44 = 0.56\), or 56%, of the rent values must be between
\(\bar{x} - z(s) = 490.80 - 1.5(54.74) = 409\)
and
\(\bar{x} + z(s) = 490.80 + 1.5(54.74) = 573\)
Slide 19

Empirical Rule
For data having a bell-shaped distribution:
• 68.26% of the values of a normal random variable are within +/− 1 standard deviation of its mean.
• 95.44% of the values of a normal random variable are within +/− 2 standard deviations of its mean.
• 99.72% of the values of a normal random variable are within +/− 3 standard deviations of its mean.
Slide 20

Empirical Rule
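The empirical-rule percentages can be recovered from the standard normal distribution. The sketch below relies on the identity P(|Z| < k) = erf(k/√2). The helper name `normal_within` is an assumption, and the class size of 100 is the one used in the slides' exam example. The computed values (about 68.27%, 95.45%, 99.73%) differ from the slide figures in the last digit, most likely because the slide figures come from a rounded four-decimal normal table.

```python
import math


def normal_within(k):
    """P(|Z| < k) for a standard normal variable Z, via the error function."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {normal_within(k):.2%}")

# Expected number of students, out of 100, within two standard deviations.
expected_within_2sd = 100 * normal_within(2)
print(round(expected_within_2sd))
```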
[Figure: normal curve over x showing 68.26% of values within µ ± 1σ, 95.44% within µ ± 2σ, and 99.72% within µ ± 3σ]
Slide 21

Example
• Q: Suppose we give an exam to 100 students. How many students will score within two standard deviations of the mean?
• A: Assuming the scores are bell-shaped, 95.44% of 100, or about 95 students. Only about 5 students will have scores more than two standard deviations from the mean.
Slide 22

Detecting Outliers
An outlier is an unusually small or unusually large value in a data set. A data value with a z-score less than −3 or greater than +3 might be considered an outlier. It might be:
• an incorrectly recorded data value
• a data value that was incorrectly included in the data set
• a correctly recorded data value that belongs in the data set
Slide 23

Detecting Outliers
The most extreme z-scores are −1.20 and 2.27. Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set.
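The |z| > 3 screen can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the slides' own procedure: `find_outliers` is a hypothetical helper, the rents are the 70 values from the slides encoded as (value, count) pairs, and the mistyped rent of 6150 is invented purely to show a flagged value.

```python
import statistics

# The 70 apartment rents from the slides, encoded as (rent, count) pairs.
RENT_COUNTS = [
    (425, 1), (430, 2), (435, 5), (440, 5), (445, 5), (450, 7), (460, 3),
    (465, 3), (470, 2), (472, 1), (475, 3), (480, 4), (485, 1), (490, 3),
    (500, 4), (510, 2), (515, 1), (525, 3), (535, 1), (549, 1), (550, 1),
    (570, 2), (575, 2), (580, 1), (590, 1), (600, 4), (615, 2),
]
rents = [rent for rent, count in RENT_COUNTS for _ in range(count)]


def find_outliers(data, threshold=3.0):
    """Return the values whose z-score exceeds the threshold in absolute value."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    return [x for x in data if abs((x - mean) / s) > threshold]


rent_outliers = find_outliers(rents)

# Hypothetical data-entry error: a rent of 615 recorded as 6150.
typo_outliers = find_outliers(rents + [6150])

print(rent_outliers, typo_outliers)
```

A flagged value is only a candidate: as the slide notes, it may still be a correctly recorded value that belongs in the data set.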
Slide 24

Measures of Association Between Two Variables
• Covariance
• Correlation Coefficient
These concepts will be explained in more detail in Chapter 12.
Slide 25

The Weighted Mean and Working with Grouped Data
• Weighted Mean
• Mean for Grouped Data
• Variance for Grouped Data
• Standard Deviation for Grouped Data
Slide 26

Weighted Mean
• When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean.
• In the computation of a grade point average (GPA), the weights are the number of credit hours earned for each grade.
• When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value.
Slide 27

Weighted Mean
\[ \bar{x} = \frac{\sum w_i x_i}{\sum w_i} \]
where:
  \(x_i\) = value of observation i
  \(w_i\) = weight for observation i
Slide 28

Weighted Mean Example
During a one-hour period on a busy Friday night, fifty soft drinks were sold at the Kruzin Cafe. Compute the weighted mean of the price of the soft drinks.
(Price ($), Number sold): (0.50, 5), (0.75, 15), (0.90, 15), and (1.10, 15).
The weighted mean is $[0.50 × 5 + 0.75 × 15 + 0.90 × 15 + 1.10 × 15] / [5 + 15 + 15 + 15] = $43.75/50 = $0.875.
Slide 29

Grouped Data
• The weighted mean computation can be used to obtain approximations of the mean, variance, and standard deviation for grouped data.
• To compute the weighted mean, we treat the midpoint of each class as though it were the mean of all items in the class.
• We compute a weighted mean of the class midpoints using the class frequencies as weights.
• Similarly, in computing the variance and standard deviation, the class frequencies are used as weights.
Slide 30

Mean for Grouped Data
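The weighted mean that the grouped-data formulas build on can be sketched as follows. The function name `weighted_mean` is an illustrative assumption, and the prices and quantities are the soft-drink figures from the Kruzin Cafe example.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


prices = [0.50, 0.75, 0.90, 1.10]  # price of each soft drink ($)
sold = [5, 15, 15, 15]             # number sold at each price

avg_price = weighted_mean(prices, sold)
print(f"${avg_price:.3f}")
```

For grouped data the same function applies with class midpoints as the values and class frequencies as the weights.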
• Sample Data
\[ \bar{x} = \frac{\sum f_i M_i}{n} \]
• Population Data
\[ \mu = \frac{\sum f_i M_i}{N} \]
where:
  \(f_i\) = frequency of class i
  \(M_i\) = midpoint of class i
Slide 31

Sample Mean for Grouped Data
Example
Given below is the previous sample of monthly rents for 70 efficiency apartments, presented here as grouped data in the form of a frequency distribution.

Rent ($)    Frequency
420-439      8
440-459     17
460-479     12
480-499      8
500-519      7
520-539      4
540-559      2
560-579      4
580-599      2
600-619      6
Slide 32

Sample Mean for Grouped Data
(Example Continued)

Rent ($)    f_i    M_i      f_i M_i
420-439      8    429.5     3436.0
440-459     17    449.5     7641.5
460-479     12    469.5     5634.0
480-499      8    489.5     3916.0
500-519      7    509.5     3566.5
520-539      4    529.5     2118.0
540-559      2    549.5     1099.0
560-579      4    569.5     2278.0
580-599      2    589.5     1179.0
600-619      6    609.5     3657.0
Total       70             34525.0

\[ \bar{x} = \frac{\sum f_i M_i}{n} = \frac{34{,}525}{70} = 493.21 \]
This approximation differs by $2.41 from the actual sample mean of $490.80.
Slide 33

Variance for Grouped Data
• For sample data
\[ s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1} \]
• For population data
\[ \sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N} \]
Slide 34

Sample Variance for Grouped Data
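The grouped-data formulas for the sample mean and variance can be checked with a short Python sketch. The helper name `grouped_sample_stats` and the tuple layout are assumptions made for illustration. The class limits and frequencies are the rent distribution from the example, and each midpoint is taken as the average of the class limits.

```python
import math

# Rent classes from the example: (lower limit, upper limit, frequency).
RENT_CLASSES = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8),
    (500, 519, 7), (520, 539, 4), (540, 559, 2), (560, 579, 4),
    (580, 599, 2), (600, 619, 6),
]


def grouped_sample_stats(classes):
    """Approximate sample mean, variance, and standard deviation from grouped data."""
    midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
    freqs = [freq for _, _, freq in classes]
    n = sum(freqs)
    # Weighted mean of the midpoints, with class frequencies as weights.
    mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
    # Frequency-weighted squared deviations, divided by n - 1 for a sample.
    squared_dev = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints))
    variance = squared_dev / (n - 1)
    return mean, variance, math.sqrt(variance)


g_mean, g_var, g_std = grouped_sample_stats(RENT_CLASSES)
print(f"mean={g_mean:.2f} variance={g_var:.2f} std={g_std:.2f}")
```

For population data the only change would be dividing the squared deviations by N instead of n − 1.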
(Example Continued)

Rent ($)    f_i    M_i     M_i - x̄    (M_i - x̄)²    f_i(M_i - x̄)²
420-439      8    429.5    -63.7       4058.96        32471.71
440-459     17    449.5    -43.7       1910.56        32479.59
460-479     12    469.5    -23.7        562.16         6745.97
480-499      8    489.5     -3.7         13.76          110.11
500-519      7    509.5     16.3        265.36         1857.55
520-539      4    529.5     36.3       1316.96         5267.86
540-559      2    549.5     56.3       3168.56         6337.13
560-579      4    569.5     76.3       5820.16        23280.66
580-599      2    589.5     96.3       9271.76        18543.53
600-619      6    609.5    116.3      13523.36        81140.18
Total       70                                       208234.29
(continued)
Slide 35

Sample Variance for Grouped Data
(Example Continued)
• Sample Variance
\[ s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1} = \frac{208{,}234.29}{70 - 1} = 3{,}017.89 \]
• Sample Standard Deviation
\[ s = \sqrt{3{,}017.89} = 54.94 \]
This approximation differs by only $.20 from the actual standard deviation of $54.74.
Slide 36

An Example of Grouped Data Using Excel
(Student)
Given:

Rent ($)    f_i
420-439      8
440-459     17
460-479     12
480-499      8
500-519      7
520-539      4
540-559      2
560-579      4
580-599      2
600-619      6
Total       70

Use Excel to find:
Mean, Median, Mode, Variance, Standard Deviation, Range, First and Third Quartiles, CV, Interquartile Range. Is the data skewed? Explain.
Slide 37

Using SWStat+ For Grouped Data
[Figure not recoverable; surviving label: DATA]
Slide 38

Problem to be Solved by Students
• The following is a frequency distribution of grades for a statistics examination.

Examination Grade    Frequency
40-49                 3
50-59                 5
60-69                11
70-79                22
80-89                15
90-99                 6

• Treating these data as a sample, compute the following:
  a. Mean, median, and mode
  b. Variance, standard deviation, and coefficient of variation
  c. First, second, and third quartiles
  d. 25th, 50th, and 75th percentiles
Slide 39

Solution Using SWStat+
[Figure not recoverable; surviving labels: 50th Percentile, 25th and 75th percentile, DATA]
Slide 40

End of Chapter 4, Part B
Slide 41
...
