Chapters 1 and 2: Summary and Display of Univariate Data
Let’s start our introduction to data with an example. Suppose that people are interviewed at UBC and asked a few questions about their study habits. The results are recorded in the following data table.
Subject  Age  Gender  Hours Spent Studying / Week  Most Common Study Location  Stress Level
1        19   M       2                            IBLC                        Low
2        20   M       4                            Koerner                     Low
3        20   F       18                           IBLC                        Medium
4        28   M       10                           Coffee Shop                 Medium
5        21   F       7                            IBLC                        Low
6        21   F       4                            IBLC                        Low
7        20   F       5                            Koerner                     Medium
8        21   M       7                            IBLC                        Low
This is the rawest representation of data. When your data set is small, sometimes the best summary of the data is simply the data itself: it’s easy enough to read and digest this table by just looking at it. However, more typically, we’re going to have a big, ugly data set, so we need some nicer ways to summarize it, both numerically and with pictures. This is the ‘theme’ of chapters 1 and 2.
First, let’s note that data does not have to be numerical; it can be a label as well.
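As a rough sketch of what this looks like in practice, the data table could be stored in Python as one dictionary per subject. The key names (`subject`, `hours`, and so on) are shortened versions of the column headers chosen for this illustration; they are an assumption, not part of the survey. The last lines simply check which columns are stored as numbers and which as labels.

```python
# One dictionary per row of the data table.
# Key names are hypothetical short forms of the column headers.
rows = [
    {"subject": 1, "age": 19, "gender": "M", "hours": 2, "location": "IBLC", "stress": "Low"},
    {"subject": 2, "age": 20, "gender": "M", "hours": 4, "location": "Koerner", "stress": "Low"},
    {"subject": 3, "age": 20, "gender": "F", "hours": 18, "location": "IBLC", "stress": "Medium"},
    {"subject": 4, "age": 28, "gender": "M", "hours": 10, "location": "Coffee Shop", "stress": "Medium"},
    {"subject": 5, "age": 21, "gender": "F", "hours": 7, "location": "IBLC", "stress": "Low"},
    {"subject": 6, "age": 21, "gender": "F", "hours": 4, "location": "IBLC", "stress": "Low"},
    {"subject": 7, "age": 20, "gender": "F", "hours": 5, "location": "Koerner", "stress": "Medium"},
    {"subject": 8, "age": 21, "gender": "M", "hours": 7, "location": "IBLC", "stress": "Low"},
]

# Split the columns by how their values are stored in the first row.
numeric_columns = [name for name, value in rows[0].items() if isinstance(value, int)]
label_columns = [name for name, value in rows[0].items() if isinstance(value, str)]

print(numeric_columns)  # ['subject', 'age', 'hours']
print(label_columns)    # ['gender', 'location', 'stress']
```

Note that this only describes how the values are stored; whether a column carries quantitative meaning is a separate question.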
Types of Variables
What is a variable? Let’s note that we’re defining ‘variable’ in a different way from the typical ‘Math’ variable. So…
• A variable is, for our purposes at this point, a set of data describing one characteristic of the measured object or individual.
‘Subject’, ‘Age’, ‘Gender’, etc. are all variables. We can separate variables into two broad types: categorical and quantitative.
• Categorical variables, as the name implies, denote a ‘category’, and can be either ordinal or nominal.
  o An ordinal categorical variable has an implied ordering (e.g. Stress Level), but we do not necessarily know the ‘distance’ between the categories.
  o A nominal categorical variable simply denotes different labels (e.g. Study Location).
• Quantitative variables denote quantities, i.e. numerical data. There are two types of quantitative variables: discrete and continuous.
  o If a variable has a countable range (its possible values can be listed one by one, such as 0, 1, 2, …), then we call it a discrete variable.
  o If a variable has an uncountable range (for example, any value in an interval is possible), then we call it a continuous variable.
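To make the distinction concrete, here is a minimal Python sketch of the kind of summary each type of variable supports, using values copied from the data table. The variable names and the `STRESS_ORDER` list are assumptions made for this illustration; only the two stress levels that actually appear in the table are listed.

```python
from collections import Counter

# Values copied column by column from the data table.
location = ["IBLC", "Koerner", "IBLC", "Coffee Shop", "IBLC", "IBLC", "Koerner", "IBLC"]
stress = ["Low", "Low", "Medium", "Medium", "Low", "Low", "Medium", "Low"]
hours = [2, 4, 18, 10, 7, 4, 5, 7]

# Nominal: the labels have no order, so all we can do is count them.
location_counts = Counter(location)
print(location_counts.most_common())  # [('IBLC', 5), ('Koerner', 2), ('Coffee Shop', 1)]

# Ordinal: the labels can be sorted once the ordering is stated explicitly,
# but there is no meaningful 'distance' between Low and Medium.
STRESS_ORDER = ["Low", "Medium"]  # assumed ordering; only levels seen in the table
sorted_stress = sorted(stress, key=STRESS_ORDER.index)
print(sorted_stress)

# Quantitative: arithmetic on the values is meaningful.
mean_hours = sum(hours) / len(hours)
print(mean_hours)  # 7.125
```

The point of the sketch is that the type of a variable determines which operations make sense: counting for nominal data, counting and ordering for ordinal data, and arithmetic for quantitative data.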
Note that there is a strange grey area between discrete and continuous variables, both in the data itself and in how we will treat it later. For example, in our data set, age is recorded as an integer, so perhaps it’s better characterized as a discrete variable; however, we could imagine that age could be measured to some arbitrary degree of precision, so perhaps it’s better characterized as a continuous variable. In most cases, though, either the distinction is obvious or we are comfortable assuming that it isn’t so important, so don’t worry about it too much for now.
Just because a variable is represented with numbers does not mean it is a quantitative variable! The entries for ‘Subject’ are numbers from 1 to 8, but they do not have any particular quantitative meaning: they are simply labels saying ‘this row represents the observations for subject 1’ and so on.
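A short Python sketch of this point, using the ‘Subject’ and ‘Hours Spent Studying / Week’ columns from the data table: nothing stops us from averaging the subject numbers, but the result describes nobody. The `S` prefix used to turn the numbers into text labels is a hypothetical choice for this illustration.

```python
subjects = [1, 2, 3, 4, 5, 6, 7, 8]
hours = [2, 4, 18, 10, 7, 4, 5, 7]

# The arithmetic runs, but the answer has no quantitative meaning:
# there is no 'subject 4.5', and the value says nothing about study habits.
mean_subject = sum(subjects) / len(subjects)
print(mean_subject)  # 4.5

# Storing the subject numbers as text makes their role as labels explicit.
subject_labels = [f"S{n}" for n in subjects]  # 'S' prefix is an arbitrary choice
hours_by_subject = dict(zip(subject_labels, hours))
print(hours_by_subject["S3"])  # 18
```

Used this way, the subject entry only identifies a row; the quantitative information lives in the other columns.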
Population vs. Sample
As Statisticians, we must always keep the interplay between the sample we collect and the