Accountability Modules
Data Analysis: Describing Data - Descriptive Statistics
Texas State Auditor's Office, Methodology Manual, rev. 5/95
WHAT IT IS
Descriptive statistics include the numbers, tables, charts, and graphs used to
describe, organize, summarize, and present raw data.
Descriptive statistics are
most often used to examine:
+ central tendency (location) of data, i.e., where data tend to fall, as measured
  by the mean, median, and mode.
+ dispersion (variability) of data, i.e., how spread out data are, as measured
  by the variance and its square root, the standard deviation.
+ skew (symmetry) of data, i.e., how concentrated data are at the low or high
  end of the scale, as measured by the skew index.
+ kurtosis (peakedness) of data, i.e., how concentrated data are around a
  single value, as measured by the kurtosis index.
Any description of a data set should examine all four of the above. As a rule,
looking only at central tendency (via the mean, median, and mode) and dispersion
(via the variance or standard deviation) is not sufficient; skew and kurtosis
should be examined as well. (See the definitions below for more details.)
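
As an illustration only, the four properties listed above can be computed with
a short Python sketch. The data values and the function name describe are
hypothetical. The manual does not give formulas for the skew and kurtosis
indexes at this point, so the sketch assumes one common moment-based
definition (third and fourth central moments scaled by the second, with 3
subtracted from kurtosis so that a normal distribution scores 0). Other
definitions exist and give slightly different numbers.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers.

    Assumes at least two values that are not all identical.
    """
    n = len(values)
    mean = statistics.fmean(values)

    # Central moments about the mean (population form, dividing by n).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n

    return {
        # Central tendency
        "mean": mean,
        "median": statistics.median(values),
        "modes": statistics.multimode(values),  # list: data may have ties
        # Dispersion (sample form, dividing by n - 1)
        "variance": statistics.variance(values),
        "std_dev": statistics.stdev(values),
        # Shape (assumed moment-based definitions, see the note above)
        "skew": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3,
    }


# Hypothetical data set, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(data)
for name, value in summary.items():
    print(name, value)
```

In this sketch a positive skew value means the data are concentrated at the
low end of the scale with a longer tail toward high values, and a negative
kurtosis value means the data are flatter than a normal distribution.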
DEFINITIONS
The following definitions are vital in understanding descriptive statistics:
+ Variables are quantities or qualities that may assume any one of a set of
  values. Variables may be classified as nominal, ordinal, or interval.
  — Nominal variables use names, categories, or labels for qualitative
    values. Typical nominal variables include gender, ethnicity, and job
    title.
  — Ordinal variables, like nominal variables, are categorical variables.
    However, the order or rank of the categories is meaningful. For
    example, staff members may be asked to indicate their satisfaction
    with a training course on an ordinal scale ranging from “poor” to
    “excellent.” Such categories could be converted to a numerical scale
    for further analysis.
  — Interval variables are purely numeric variables. The nominal and
    ordinal variables noted above are categorical, so they do not permit
    statements about degree, e.g., “Person A is three times more
    male than person B” or “Person A rated the course as five times more
    excellent than person B.” Interval variables are measured on a numeric
    scale, so the difference between values is meaningful and allows
    statements about extent or degree. Income and age are interval
    variables.
+ Frequency distributions summarize and compress data by grouping them
  into classes and recording how many data points fall into each class. The
  frequency distribution is the foundation of descriptive statistics. It is a
  prerequisite for the various graphs used to display data and for the basic
  statistics used to describe a data set, such as the mean, median, mode,
  variance, and standard deviation. (See the module on Frequency
  Distribution for more information.)
+ Measures of Central Tendency indicate the middle and commonly
  occurring points in a data set. The three main measures of central tendency
  are discussed below.