Empirical Cumulative Distribution Function as an Informative Tool
Interquartile Range:
The interquartile range (IQR) describes the extent to which the middle
50% of the observations are scattered or dispersed. It is the distance between the first and the third
quartiles:
IQR = Q3 - Q1,
which is twice the Quartile Deviation. For data that are skewed, the relative dispersion, similar
to the coefficient of variation (C.V.), is given (provided the denominator is not zero) by the
Coefficient of Quartile Variation:
CQV = (Q3 - Q1) / (Q3 + Q1).
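As a rough illustration of the two formulas above, the following Python sketch computes the IQR and the CQV for a small sample. The sample values and the function name `iqr_and_cqv` are hypothetical. The quartile rule used (`method="inclusive"` in the standard `statistics` module) is an assumption, since textbooks and software packages define quartiles in slightly different ways and may therefore give slightly different answers.

```python
from statistics import quantiles


def iqr_and_cqv(data):
    """Return (IQR, CQV) for a sequence of numbers."""
    # Assumption: the "inclusive" quartile rule. Other conventions exist
    # and can give slightly different Q1 and Q3 for the same data.
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    if q3 + q1 == 0:
        # The CQV is defined only when the denominator is not zero.
        raise ValueError("CQV is undefined when Q3 + Q1 is zero")
    return iqr, iqr / (q3 + q1)


# Hypothetical sample, chosen only for illustration.
sample = [2, 3, 5, 7, 8, 10, 13, 15, 20]
iqr, cqv = iqr_and_cqv(sample)
print(f"IQR = {iqr}, CQV = {cqv:.3f}")
```

Because the CQV is a ratio, it is free of the unit of measurement, which is what makes it comparable across data sets in the same way as the C.V.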
Note that almost all of the statistics covered up to now can be obtained, and understood more
deeply, by a graphical method using the Empirical (i.e., observed) Cumulative Distribution Function
(ECDF) JavaScript. However, the numerical Descriptive Statistics provides a complete set of
information about all the statistics that you are ever likely to need.
The Duality between the ECDF and the Histogram:
Notice that the height of the empirical (i.e., observed) cumulative distribution function (ECDF)
at a particular point is numerically equal to the area in the corresponding histogram to the left
of that point. Therefore, either or both could be used, depending on the intended application.
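The duality can be checked numerically. The sketch below is a minimal illustration, not a general-purpose tool: the sample, the bin edges, and both function names are hypothetical, the histogram is assumed to be scaled so that its total area is 1, and each bin is assumed to include its right endpoint. Under these assumptions the two quantities agree at every bin edge; between edges the histogram gives only an approximation.

```python
def ecdf(data, x):
    """Proportion of observations less than or equal to x."""
    return sum(1 for v in data if v <= x) / len(data)


def histogram_area_left_of(data, edges, x):
    """Area of a density-scaled histogram to the left of the bin edge x.

    Bins are (edges[k], edges[k + 1]] and each bar has height
    count / (n * width), so the total area of the histogram is 1.
    """
    n = len(data)
    area = 0.0
    for lo, hi in zip(edges, edges[1:]):
        if hi > x:
            break  # this bar lies to the right of x
        count = sum(1 for v in data if lo < v <= hi)
        height = count / (n * (hi - lo))
        area += height * (hi - lo)
    return area


# Hypothetical sample and bin edges, chosen only for illustration.
sample = [2, 3, 5, 7, 8, 10, 13, 15, 20]
edges = [0, 5, 10, 15, 20]
for edge in edges[1:]:
    print(edge, ecdf(sample, edge), histogram_area_left_of(sample, edges, edge))
```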
Mean Absolute Deviation (MAD):
A simple measure of variability is the mean absolute deviation:
MAD = Σ |x_i - x̄| / n.
The mean absolute deviation is widely used as a performance measure to assess the quality of a
model, such as a forecasting technique. However, MAD does not lend itself to further use in
making inference. Moreover, even in error analysis studies the variance is preferred, since
variances of independent (and hence uncorrelated) errors are additive, whereas MAD has no such
property.
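The additivity claim can be seen in a very small exact example. The sketch below assumes two independent error terms, each equal to -1 or +1 with equal probability; this setup and the helper names are hypothetical and serve only as an illustration. It enumerates the four equally likely outcomes of the total error and compares its variance and MAD with the sums of the individual ones.

```python
from fractions import Fraction
from itertools import product


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Variance of a list of equally likely outcomes."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def mad(values):
    """Mean absolute deviation (from the mean) of equally likely outcomes."""
    m = mean(values)
    return sum(abs(v - m) for v in values) / len(values)


# Assumption: two independent errors, each -1 or +1 with probability 1/2.
errors_x = [Fraction(-1), Fraction(1)]
errors_y = [Fraction(-1), Fraction(1)]
# Independence means every pair (x, y) is equally likely.
total = [x + y for x, y in product(errors_x, errors_y)]

print("variance:", variance(total), "vs", variance(errors_x) + variance(errors_y))
print("MAD:     ", mad(total), "vs", mad(errors_x) + mad(errors_y))
```

In this example the variance of the total error equals the sum of the two variances, while the MAD of the total is smaller than the sum of the two MADs.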
The MAD is a simple measure of variability which, unlike the range and the quartile deviation,
takes every item into account. It is also simpler to compute than the standard deviation and less
affected by extreme deviations. It is therefore often used in small samples that include extreme
values.
In theory, the mean absolute deviation should be measured from the median, since the sum of
absolute deviations is at its minimum there; however, it is more convenient to measure the
deviations from the mean.
As a numerical example, consider the price (in $) of the same item at 5 different stores: $4.75,
$5.00, $4.65, $6.10, and $6.30. The mean absolute deviation from the mean is $0.67, while from the
median it is $0.60, which is a better representative of the deviation among the prices.
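The arithmetic of this example can be reproduced with a few lines of Python. This is only a sketch of the calculation; the helper name `mean_abs_dev` is hypothetical, and the prices are the five values quoted above.

```python
from statistics import mean, median


def mean_abs_dev(data, center):
    """Average absolute deviation of the data from a chosen center."""
    return sum(abs(x - center) for x in data) / len(data)


prices = [4.75, 5.00, 4.65, 6.10, 6.30]

mad_mean = mean_abs_dev(prices, mean(prices))      # deviations from the mean, 5.36
mad_median = mean_abs_dev(prices, median(prices))  # deviations from the median, 5.00

print(f"MAD from the mean:   ${mad_mean:.2f}")
print(f"MAD from the median: ${mad_median:.2f}")
```

The deviation measured from the median comes out smaller, in line with the remark that the sum of absolute deviations is at its minimum at the median.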
Variance: