Working with Data in Base R
STAT UN2102
Applied Statistical Computing
Gabriel Young
Columbia University
January 31, 2019
Gabriel Young, Lecture 2: Data in R, January 31, 2019

Last Time
• Vectors. Elements must all be the same type. Access like v[5], create with v <- c().
• Matrices. Two-dimensional (rows and columns) version of an array. Access like m[1,3], m[2, ], m[ ,"colname"]. Create with matrix().
• Linear algebra for matrices: matrix multiplication, determinant, inverse.
• Lists. Elements can all be different types. Access like l[[3]], l$name. Create with list().
• Filtering. Accessing elements of a vector based on some criteria, e.g. v[v>5].
• NA and NULL values. NA marks missing data; NULL represents something that does not exist.
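The recap above can be tried out with a short sketch. The objects v, m, and l and their contents are made up for illustration and are not taken from the previous lecture.

```r
# Vector: all elements share one type
v <- c(2, 4, 6, 8, 10)
v[5]                   # fifth element: 10
v[v > 5]               # filtering: 6 8 10

# Matrix: filled column by column by default
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
m[1, 3]                # row 1, column 3: 5
m[2, ]                 # all of row 2
m[, "b"]               # the column named "b"

# List: elements may have different types
l <- list(name = "Ann", age = 30, scores = c(90, 85))
l[[3]]                 # third element
l$name                 # element called "name"

# NA keeps its place in a vector; NULL is dropped
length(c(1, NA, 3))    # 3
length(c(1, NULL, 3))  # 2
```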

Section XI
Factors and Tables

Factors Definition
• Qualitative data that can assume only a discrete number of values (i.e. categorical data) can be represented as a factor in R.
• For example: Democrat, Republican, or Independent; Male or Female; Control or Treatment; etc.
• In R, think of factors as vectors with additional information: a record of the distinct values of the factor, called the levels.
• R automatically treats factors specially in many functions.
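One way to see what the levels add is to compare a factor with the plain character vector it came from. The sketch below uses made-up party data (the names party and party_f are hypothetical) and declares a level that never occurs in the data.

```r
# Hypothetical data: "Independent" is a valid category that is not observed
party   <- c("Democrat", "Republican", "Democrat")
party_f <- factor(party, levels = c("Democrat", "Republican", "Independent"))

levels(party_f)    # the recorded set of categories
nlevels(party_f)   # 3
table(party_f)     # counts every level, including Independent with 0
table(party)       # the character vector only counts values it contains
```

Here table() is one example of a function that treats a factor specially: it reports a zero count for the unobserved level, which a character vector cannot know about.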

Factors Example
> data <- rep(c("Control","Treatment"),c(3,4))
> data # A character vector
[1] "Control"   "Control"   "Control"   "Treatment"
[5] "Treatment" "Treatment" "Treatment"
> group <- factor(data)
> group
[1] Control   Control   Control   Treatment Treatment
[6] Treatment Treatment
Levels: Control Treatment
The levels of the factor group are Control and Treatment.

Factors Example
> str(group)
 Factor w/ 2 levels "Control","Treatment": 1 1 1 2 2 2 2
> mode(group) # Numeric?
[1] "numeric"
> summary(group)
  Control Treatment 
        3         4 
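The "numeric" mode reflects how a factor is stored: as integer codes that point into the vector of levels. A minimal sketch, rebuilding the same group factor so the block runs on its own:

```r
group <- factor(rep(c("Control", "Treatment"), c(3, 4)))

typeof(group)      # "integer": the underlying storage
as.integer(group)  # 1 1 1 2 2 2 2: the codes shown by str()
levels(group)      # "Control" "Treatment": the labels the codes refer to
class(group)       # "factor": what makes R treat it specially

# Looking the codes up in the levels recovers the original labels
levels(group)[as.integer(group)]
```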

Functions on Factors
The split() function takes as input a vector and a factor (or list of factors), splitting the input according to the groups of the factor. The output is a list.
Example
Suppose that we knew the ages and sex of the members of the Control and Treatment groups.
> group
[1] Control   Control   Control   Treatment Treatment
[6] Treatment Treatment
Levels: Control Treatment
> ages <- c(20, 30, 40, 35, 35, 35, 35)
> sex <- c("M", "M", "F", "M", "F", "F", "F")
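With these vectors in hand, split() can be applied as sketched below. The result names by_group and by_both are chosen here for illustration; the data are the ones defined above, repeated so the block runs on its own.

```r
group <- factor(rep(c("Control", "Treatment"), c(3, 4)))
ages  <- c(20, 30, 40, 35, 35, 35, 35)
sex   <- c("M", "M", "F", "M", "F", "F", "F")

# One factor: a list with one element per level of group
by_group <- split(ages, group)
by_group$Control      # 20 30 40
by_group$Treatment    # 35 35 35 35

# A list of factors: one element per combination of group and sex
by_both <- split(ages, list(group, sex))
names(by_both)        # "Control.F" "Treatment.F" "Control.M" "Treatment.M"
by_both$Control.M     # 20 30
```

Because the output is a list, it can be passed on to functions such as sapply(by_group, mean) to summarize each group.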