of the data. As the realisation of this intention requires a well-defined concept of distance, the measures of variability are meaningful for data relating to metrically scaled 1–D variables $X$ only. One can distinguish two kinds of such measures: (i) simple 2-data-point measures, and (ii) sophisticated $n$-data-point measures. We begin with two examples belonging to the first category.
3.2. MEASURES OF VARIABILITY

3.2.1 Range

For a raw data set $\{x_i\}_{i=1,\ldots,n}$ of $n$ observed values for $X$, the dimensionful range $R$ (metr) simply expresses the difference between the largest and the smallest value in this set, i.e.,

$$R := x_{(n)} - x_{(1)} . \tag{3.10}$$

The basis of this measure is the ordered data set $x_{(1)} \leq x_{(2)} \leq \ldots \leq x_{(n)}$. Alternatively, the range can be denoted by $R = Q_4 - Q_0$.

SPSS: Analyze → Descriptive Statistics → Frequencies . . . → Statistics . . . : Range

3.2.2 Interquartile range

In the same spirit as the range, the dimensionful interquartile range $d_Q$ (metr) is defined as the difference between the third quartile and the first quartile of the relative frequency distribution for some $X$, i.e.,

$$d_Q := \tilde{x}_{0.75} - \tilde{x}_{0.25} . \tag{3.11}$$

Alternatively, this is $d_Q = Q_3 - Q_1$.

3.2.3 Sample variance

The most frequently employed measure of variability in Statistics is the dimensionful $n$-data-point sample variance $s^2$ (metr), together with the related sample standard deviation to be discussed below. Given a raw data set $\{x_i\}_{i=1,\ldots,n}$ for $X$, its spread is essentially quantified in terms of the sum of squared deviations of the $n$ data points from their common mean $\bar{x}$. Due to the algebraic identity

$$(x_1 - \bar{x}) + \ldots + (x_n - \bar{x}) = \sum_{i=1}^{n} (x_i - \bar{x}) = \left(\sum_{i=1}^{n} x_i\right) - n\bar{x} \overset{\text{Eq. (3.6)}}{=} 0 ,$$

there are only $n - 1$ degrees of freedom involved in this measure: once $n - 1$ of the deviations are known, the last one is fixed. The sample variance is thus defined by:

(i) From raw data set:

$$s^2 := \frac{1}{n-1}\left[(x_1 - \bar{x})^2 + \ldots + (x_n - \bar{x})^2\right] =: \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 ; \tag{3.12}$$

alternatively, by the shift theorem:²

$$s^2 = \frac{1}{n-1}\left[x_1^2 + \ldots + x_n^2 - n\bar{x}^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right] . \tag{3.13}$$

² That is, the algebraic identity $\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} \left(x_i^2 - 2x_i\bar{x} + \bar{x}^2\right) \overset{\text{Eq. (3.6)}}{=} \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} \bar{x}^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$.
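As a minimal sketch (not part of the original text), the raw-data measures of Eqs. (3.10) to (3.13) can be computed as below. The function names are hypothetical. The quartiles use Python's `statistics.quantiles` with `method="inclusive"` (linear interpolation between order statistics); this is an assumption, and the quantile rule for $\tilde{x}_\alpha$ defined earlier in this chapter may give a slightly different $d_Q$ for small $n$.

```python
from statistics import quantiles


def data_range(xs):
    """Range R := x_(n) - x_(1), Eq. (3.10)."""
    ordered = sorted(xs)  # ordered data set x_(1) <= x_(2) <= ... <= x_(n)
    return ordered[-1] - ordered[0]


def interquartile_range(xs):
    """Interquartile range d_Q := x~_0.75 - x~_0.25, Eq. (3.11).

    ASSUMPTION: quartiles from statistics.quantiles(method="inclusive");
    the text's own quantile definition may differ for small n.
    """
    q1, _median, q3 = quantiles(xs, n=4, method="inclusive")
    return q3 - q1


def sample_variance(xs):
    """Eq. (3.12): sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_variance_shift(xs):
    """Eq. (3.13), shift theorem: (sum of x_i^2 - n * mean^2) / (n - 1)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    total = 0.0
    total_sq = 0.0
    for x in xs:  # a single pass accumulates both sums
        total += x
        total_sq += x * x
    mean = total / n
    return (total_sq - n * mean * mean) / (n - 1)
```

For the data set 2, 4, 4, 4, 5, 5, 7, 9 this gives $R = 7$, $d_Q = 1.5$ under the inclusive quartile rule, and $s^2 = 32/7 \approx 4.571$ from both variance functions. The shift-theorem version needs only one pass over the data; in floating-point arithmetic, however, it can lose precision through cancellation when $|\bar{x}|$ is large compared with $s$, which is why the two-pass form (3.12) is often the safer default.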
(ii) From relative frequency distribution:

$$s^2 := \frac{n}{n-1}\left[(a_1 - \bar{x})^2\,h_n(a_1) + \ldots + (a_k - \bar{x})^2\,h_n(a_k)\right] =: \frac{n}{n-1}\sum_{j=1}^{k} (a_j - \bar{x})^2\,h_n(a_j) ; \tag{3.14}$$

alternatively:

$$s^2 = \frac{n}{n-1}\left[a_1^2\,h_n(a_1) + \ldots + a_k^2\,h_n(a_k) - \bar{x}^2\right] = \frac{n}{n-1}\left[\sum_{j=1}^{k} a_j^2\,h_n(a_j) - \bar{x}^2\right] . \tag{3.15}$$

Remarks:

(i) We point out that the alternative formulae (3.13) and (3.15) for a sample variance prove computationally more efficient, since they require only the sums of the data values and of their squares rather than the individual deviations from $\bar{x}$.

(ii) For binned data, when one selects the midpoint of each class interval $K_j$ to represent the $a_j$ (given the raw data set is no longer accessible), a correction of Eqs. (3.14) and (3.15) by an additional term $\frac{1}{12}\,\frac{n}{n-1}\sum_{j=1}^{k} b_j^2\,h_j$ becomes necessary, assuming uniformly distributed data within each class interval $K_j$ of width $b_j$; cf. Eq. (8.31).
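A minimal sketch (not from the original text) of Eqs. (3.14) and (3.15), and of the binned-data correction in Remark (ii), follows. All function names are hypothetical. It assumes the relative frequency distribution is given as a mapping from each distinct value $a_j$ to $h_n(a_j)$, and, for binned data, as parallel lists of class midpoints, class widths $b_j$ and relative class frequencies $h_j$.

```python
from collections import Counter


def relative_frequencies(xs):
    """Hypothetical helper: map each distinct value a_j to h_n(a_j)."""
    n = len(xs)
    return {a: count / n for a, count in Counter(xs).items()}


def freq_mean(h):
    """Mean from the relative frequency distribution: sum_j a_j h_n(a_j)."""
    return sum(a * hj for a, hj in h.items())


def variance_from_freqs(h, n):
    """Eq. (3.14): n/(n-1) * sum_j (a_j - mean)^2 h_n(a_j)."""
    mean = freq_mean(h)
    return n / (n - 1) * sum((a - mean) ** 2 * hj for a, hj in h.items())


def variance_from_freqs_shift(h, n):
    """Eq. (3.15): n/(n-1) * (sum_j a_j^2 h_n(a_j) - mean^2)."""
    mean = freq_mean(h)
    return n / (n - 1) * (sum(a * a * hj for a, hj in h.items()) - mean ** 2)


def binned_variance(midpoints, widths, h, n):
    """Eq. (3.14) with class midpoints as the a_j, plus the Remark (ii) term
    (1/12) * n/(n-1) * sum_j b_j^2 h_j, which assumes the data are uniformly
    distributed within each class interval K_j of width b_j."""
    mean = sum(a * hj for a, hj in zip(midpoints, h))
    between = sum((a - mean) ** 2 * hj for a, hj in zip(midpoints, h))
    within = sum(b * b * hj for b, hj in zip(widths, h)) / 12
    return n / (n - 1) * (between + within)
```

Applied to the eight values 2, 4, 4, 4, 5, 5, 7, 9 (so $n = 8$), both frequency-based functions return $32/7 \approx 4.571$, matching Eqs. (3.12) and (3.13). As a check of the binned correction, two classes $[0, 2)$ and $[2, 4)$ with midpoints 1 and 3, width 2 and $h_j = 1/2$ each give a between-class part of 1 and a within-class part of $1/3$, i.e. the variance $4/3$ of a uniform distribution on $[0, 4]$, before the factor $n/(n-1)$ is applied.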