¹ This sample is drawn from the Survey of Consumer Finance (SCF), 1996.
This large survey
questionnaire was completed by almost 100,000 adult Canadians and provides information on sources of
income, hours of work and family characteristics during 1995.
Further information about the SCF is
available at http://trex.econ.uoguelph.ca/dprescot/courses/scf_info.htm.
The subsample of 3,921
individuals used here is a random subset of the full sample.
Some restrictions were imposed when the
sample was drawn.
In particular, only workers who stated they worked full-time throughout the year
were included.
² Refer to the appendix of this chapter for details on the properties of the summation operator,
Chapter 1
Univariate Distributions
1 Descriptive Statistics
The most basic application of statistical concepts is to describe data.
In many situations, large quantities of data are available to researchers, and typically the most
urgent problem is to find a way of presenting the data so that the most important features can be
highlighted.
One useful approach is to construct a diagram known as a histogram for each variable.
Figure 1.1 is a histogram that was
constructed from 3,921 observations on the hourly pay earned by full-time Canadian workers in 1995.¹
The data have been sorted into 10 bins.
The centre of each bin is recorded on the horizontal axis.
For example, the first bin contains all the wage rates in the sample that lie between $2.00 and
$6.00 per hour; its centre is at $4.00 per hour.
The number of observations within a bin is called the frequency, and this type of histogram is
known as a frequency distribution because it shows how the frequencies are distributed amongst
the bins.
Since each observation falls in only one bin, the sum of the frequencies is
the sample size, 3,921. By rescaling the vertical axis, the heights of the bars in Figure 1.1 can also be
interpreted as the relative frequencies, which are obtained by dividing each frequency by the sample size.
For example, the relative frequency of the first bin is 177/3,921 = 0.045.
In other words, 4.5% of the sample falls in the first bin.
Clearly, the sum of the relative frequencies (or shares) must be unity.
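
The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the SCF wage data are not reproduced here, so the `wages` list is synthetic (uniform random draws, which real wages are not), the helper name `frequency_distribution` is hypothetical, and all 10 bins are assumed to share the $4.00 width of the first bin, which the text does not state explicitly.

```python
import random


def frequency_distribution(values, lower, width, m):
    """Sort values into m equal-width bins starting at `lower`.

    Returns the bin centres, the frequencies and the relative frequencies.
    """
    upper = lower + m * width
    centres = [lower + (j + 0.5) * width for j in range(m)]
    freqs = [0] * m
    for v in values:
        if not lower <= v <= upper:
            raise ValueError(f"value {v} lies outside the bins")
        # A value exactly on the top edge is placed in the last bin.
        j = min(int((v - lower) // width), m - 1)
        freqs[j] += 1
    n = len(values)
    rel_freqs = [f / n for f in freqs]
    return centres, freqs, rel_freqs


# Synthetic stand-in for the 3,921 hourly wage observations (assumption).
random.seed(0)
wages = [random.uniform(2.0, 42.0) for _ in range(3921)]

centres, freqs, rel_freqs = frequency_distribution(wages, lower=2.0, width=4.0, m=10)

print(centres[0])        # centre of the first bin
print(sum(freqs))        # frequencies add up to the sample size
print(sum(rel_freqs))    # relative frequencies add up to one (up to rounding)
```

Because every observation is assigned to exactly one bin, the frequencies necessarily sum to the sample size, and dividing each by the sample size gives shares that sum to unity.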
It is useful to introduce some notation for the key concepts.
The size of the entire sample is
defined to be n (n = 3,921 in the example).
The number of bins is m, where m < n; in the wage example, m = 10.
The frequency of observations in the jth bin is denoted by f_j for j = 1, 2, ..., m.
In the example, f_1 = 177.
The sum of the frequencies must equal the total number of observations in the
sample²: