Working with Data in Base R
STAT UN2102 Applied Statistical Computing
Gabriel Young, Columbia University
Lecture 2: Data in R
January 31, 2019
Last Time

• Vectors. Elements must all be the same type. Access like v[5], create with v <- c().
• Matrices. Two-dimensional (rows and columns) version of an array. Access like m[1,3], m[2, ], m[ ,"colname"]. Create with matrix().
• Linear algebra for matrices: matrix multiplication, determinant, inverse.
• Lists. Elements can all be different types. Access like l[[3]], l$name. Create with list().
• Filtering. Accessing elements of a vector based on some criterion, e.g. v[v>5].
• NA and NULL values. NA marks missing data; NULL means the object does not exist.
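The recap above can be tried out in a short session. This is a minimal sketch: the objects v, m, l, and A and their contents are made up here for illustration and are not taken from the lecture.

```r
# Vectors: all elements share one type
v <- c(2, 7, 4, 9, 6)
v[5]                   # fifth element
v[v > 5]               # filtering: keep elements greater than 5

# Matrices: filled column by column by default
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
m[1, 3]                # row 1, column 3
m[2, ]                 # all of row 2
m[, "b"]               # the column named "b"

# Linear algebra: matrix product, determinant, inverse
A <- matrix(c(2, 0, 0, 2), nrow = 2)
A %*% A
det(A)
solve(A)

# Lists: elements may differ in type
l <- list(name = "Ann", scores = c(90, 85), passed = TRUE)
l[[3]]                 # third element
l$name                 # element accessed by name

# NA occupies a position in a vector; NULL is dropped
length(c(1, NA, 3))
length(c(1, NULL, 3))
```

Note that single brackets on a list, such as l[3], would return a one-element list rather than the element itself, which is why the recap uses l[[3]].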
Section XI: Factors and Tables
Factors

Definition: Qualitative data that can take only a limited number of distinct values (i.e. categorical data) can be represented as a factor in R. Examples include Democrat, Republican, or Independent; Male or Female; Control or Treatment.

In R, think of a factor as a vector with additional information: a record of the distinct values of the factor, called the levels.

Many R functions automatically treat factors specially.
Factors Example

    > data <- rep(c("Control","Treatment"),c(3,4))
    > data # A character vector
    [1] "Control"   "Control"   "Control"   "Treatment"
    [5] "Treatment" "Treatment" "Treatment"
    > group <- factor(data)
    > group
    [1] Control   Control   Control   Treatment Treatment
    [6] Treatment Treatment
    Levels: Control Treatment

The levels of the factor group are Control and Treatment.
Factors Example (continued)

    > str(group)
     Factor w/ 2 levels "Control","Treatment": 1 1 1 2 2 2 2
    > mode(group) # Numeric?
    [1] "numeric"
    > summary(group)
      Control Treatment
            3         4

The mode is numeric because R stores a factor as integer codes that index its levels.
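The integer codes behind a factor can be inspected directly. A minimal sketch, rebuilding group from the example so the block runs on its own; the name codes is introduced here for illustration and does not appear in the lecture.

```r
data  <- rep(c("Control", "Treatment"), c(3, 4))
group <- factor(data)

levels(group)                # distinct values, sorted alphabetically by default
nlevels(group)               # number of levels
typeof(group)                # storage type of the underlying codes

codes <- as.integer(group)   # illustrative name: the integer code of each element
codes

levels(group)[codes]         # indexing the levels by the codes recovers the labels
table(group)                 # counts per level, matching summary(group)
```

Because the codes index the levels, changing the order of the levels changes the codes but not the labels that are printed.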
Functions on Factors

The split() function takes as input a vector and a factor (or a list of factors) and splits the vector according to the groups defined by the factor. The output is a list.

Example: Suppose that we know the ages and sex of the members of the Control and Treatment groups.

    > group
    [1] Control   Control   Control   Treatment Treatment
    [6] Treatment Treatment
    Levels: Control Treatment
    > ages <- c(20, 30, 40, 35, 35, 35, 35)
    > sex <- c("M", "M", "F", "M", "F", "F", "F")
Functions on Factors (continued)

Use the split() function to list the ages in each group and sex pair.

    > split(ages, list(group, sex))
    $Control.F
    [1] 40

    $Treatment.F
    [1] 35 35 35

    $Control.M
    [1] 20 30

    $Treatment.M
    [1] 35

split() has coerced sex into a factor variable.
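Because split() returns a list, it pairs naturally with sapply() for per-group summaries. The sketch below reuses the example data; the names by_group, group_means, and sex_f are introduced here for illustration, and the use of sapply() is an addition that the slide does not show.

```r
group <- factor(rep(c("Control", "Treatment"), c(3, 4)))
ages  <- c(20, 30, 40, 35, 35, 35, 35)
sex   <- c("M", "M", "F", "M", "F", "F", "F")

# Split by a single factor, then summarise each piece of the list
by_group    <- split(ages, group)
group_means <- sapply(by_group, mean)
group_means

# Converting sex to a factor by hand gives the same result as
# passing the character vector and letting split() coerce it
sex_f <- factor(sex)
identical(split(ages, list(group, sex)),
          split(ages, list(group, sex_f)))
```

The list names such as Control.F come from joining one level of each factor with a dot, with the levels of the first factor varying fastest.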