
# Stat133Lecture2


A few announcements: If you haven’t gotten your computer account, be sure to email Daisy (yanhuang@stat.berkeley.edu) ASAP. If you are just joining the course this week, please see me after class, in office hours, or send me an email if you have not done so already.

Tuesday, September 2, 2008

Last time in our introduction to R, we learned how to

• start and quit R in interactive mode
• do basic calculations in R
• assign, print, list, and remove variables
• save some or all of the variables in the workspace
• find the arguments for a function and use the help system
• create numeric, character, and logical vectors and concatenate them using c()
• name the elements of a vector
• extract elements of a vector five different ways

A few notes before we move on...

You can assign values to variables using = rather than <- if you like. You can use c to concatenate existing vectors.

    > x1 <- c(1, 8)
    > x2 <- 2:5
    > x3 <- 4
    > c(x1, x2, x3)
    [1] 1 8 2 3 4 5 4

Remember that unlike other languages you may have used, R does not start indexing with 0. Also, it does not allow mixing of positive and negative subscripts. (Why not?)

Indexing by exclusion can be used to remove elements of a vector.

    > x <- 1:5
    > x
    [1] 1 2 3 4 5
    > x <- x[-c(1,3,5)]
    > x
    [1] 2 4

There was a question about indexing by name when the names are not unique. It appears that R returns only the first element with that name. So I’d avoid repeating names.

    > x <- 1:2
    > names(x) <- c("a", "a")
    > x["a"]
    a
    1

Today, we’ll cover

• missing values and other special values
• assigning parts of a vector using indexing
• vector arithmetic and the recycling rule
• making patterned vectors
• some built-in summary functions for vectors
• basic manipulation of character vectors
• logical vectors and Boolean algebra
• a new data type: factors

Next time: more complicated data structures, reading data into R

Next week: graphics

The missing value symbol is NA. Note that this is different from “NA”, so don’t include the quotation marks. You can check for the presence of NA values using the is.na function.

    > x <- c(1, 5, NA)
    > is.na(x)
    [1] FALSE FALSE  TRUE

Another special value is NaN, for “not a number,” which typically arises when you try to compute an indeterminate form such as 0/0. The result of dividing a non-zero number by zero is Inf (or -Inf).

In general, the same indexing may be used to assign values to elements of a vector. Make sure the vector exists first, or you will get an error. Can you guess what x will look like after each of the following lines?

    > x <- 1:10
    > names(x) <- letters[1:10]
    > x[1:2] <- 2:1       # By inclusion
    > x[-(1:2)] <- 10:3   # By exclusion
    > x["a"] <- 100       # By name
    > x[x==100] <- NA     # By logical index
    > x[] <- 10           # No index
    > x <- 10             # Watch out - what happens here?

A very important feature of R is that it can carry out vectorized calculations. What this means is that basic arithmetic, as well as many built-in R functions, will operate on each element of a vector. This avoids much of the looping that’s used in lower-level languages.

    > x <- 1:3
    > x * 10
    [1] 10 20 30
    > x^2
    [1] 1 4 9
    > y <- 0:2
    > x + y
    [1] 1 3 5
    > x / y
    [1] Inf 2.0 1.5

When the vectors in a calculation are of different lengths, R follows the recycling rule.
That is, it starts repeating elements from the shorter one.

    > x <- 1:3
    > y <- 1:2
    > x + y
    [1] 2 4 4
    Warning message:
    In x + y : longer object length is not a multiple of shorter object length

We’ve actually used this before. It would be a good exercise for you to go through the notes so far and identify where R is applying the recycling rule.

R has a number of built-in functions for making patterned vectors, including seq and rep. We’ve seen “:” many times, which is just a special case of the seq function.

    > 1:5
    [1] 1 2 3 4 5
    > 5:1
    [1] 5 4 3 2 1
    > seq(0, 10, by = 2)
    [1]  0  2  4  6  8 10
    > seq(0, 0.5, length = 6)
    [1] 0.0 0.1 0.2 0.3 0.4 0.5
    > seq(1, 0, by = -0.1)
    [1] 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0
    > rep(c(0, 1), times = 5)
    [1] 0 1 0 1 0 1 0 1 0 1
    > rep(letters[1:5], each = 2)
    [1] "a" "a" "b" "b" "c" "c" "d" "d" "e" "e"

R also has many built-in summary functions.

    > x <- rnorm(100)
    > summary(x)
        Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
    -1.92500 -0.71430 -0.19300 -0.09377  0.49810  2.68200
    > mean(x)
    [1] -0.09377121
    > min(x)
    [1] -1.925202
    > max(x)
    [1] 2.682179
    > range(x)
    [1] -1.925202  2.682179
    > length(x)
    [1] 100
    > sum(x)
    [1] -9.377121
    > prod(x)
    [1] 1.482105e-25

A handy way to make patterned character vectors is to use the paste function.

    > args(paste)
    function (..., sep = " ", collapse = NULL)
    NULL

The help says ... represents “one or more R objects, to be converted to character vectors.” This actually depends on the function, but “one or more R objects” is a good way to think of it for now. For another example, see the help for c(). Type help(paste) to see more about how this function works.
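
The recycling rule described earlier also applies to the arguments passed to paste through `...`: shorter arguments are repeated to match the longest one. The following is a minimal sketch in base R; the variable names (`labels`, `pairs`, `shifted`) are illustrative and do not come from the lecture.

```r
# Illustrative names only; none of these variables appear in the lecture.

# The length-1 string "Sample" is recycled to match 1:3.
labels <- paste("Sample", 1:3)

# The length-2 vector c("a", "b") is recycled twice to match 1:4.
pairs <- paste(c("a", "b"), 1:4, sep = "")

# Same rule in arithmetic: c(0, 10) is recycled three times.
# No warning here, because 6 is a multiple of 2.
shifted <- 1:6 + c(0, 10)

print(labels)
print(pairs)
print(shifted)
```

Comparing `pairs` with `shifted` shows that paste and `+` line up their arguments the same way, which is one place to start when looking for recycling in the earlier notes.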

Some examples using paste:

    > paste("Iteration", 1:3)
    [1] "Iteration 1" "Iteration 2" "Iteration 3"
    > paste("Iteration", 1:3, sep = "")
    [1] "Iteration1" "Iteration2" "Iteration3"
    > words <- c("Hi", "everyone")
    > paste(words, collapse = " ")
    [1] "Hi everyone"
    > paste(letters[1:5], collapse = "-")
    [1] "a-b-c-d-e"
    > paste("Iteration", 1:3, sep = "", collapse = "-")
    [1] "Iteration1-Iteration2-Iteration3"

The substr function allows us to extract parts of a string.

    > some.letters <- paste(letters[1:5], collapse = "-")
    > some.letters
    [1] "a-b-c-d-e"
    > substr(some.letters, start = 1, stop = 3)
    [1] "a-b"

It also allows us to assign parts of a string.

    > substr(some.letters, start = 1, stop = 3) <- "A*B"
    > some.letters
    [1] "A*B-c-d-e"

We’ll talk a lot more about working with text later in the course.

We learned that one of the three main vector types in R is the logical vector, whose elements are either TRUE or FALSE. To understand how R operates on logical vectors, you need to know a bit about Boolean algebra.

Boolean algebra is a mathematical formalization of the truth or falsity of statements. It has three operations, which we’ll call “not,” “or,” and “and.” Boolean algebra tells us how to evaluate the truth or falsity of compound statements that are built using these operations. For example, if A and B are statements, some compound statements are

• A and B
• (not A) or B

The “not” operation just causes the statement following it to switch its truth value. So (not TRUE) is FALSE and (not FALSE) is TRUE.

The compound statement A and B is TRUE only if both A and B are TRUE.

The compound statement A or B is TRUE if either A or B, or both, is TRUE.

In R, we write ! for “not,” & for “and,” and | for “or.” Note: all of these are vectorized!

    > A <- c(TRUE, TRUE, FALSE, FALSE)
    > B <- c(TRUE, FALSE, TRUE, FALSE)
    > !A
    [1] FALSE FALSE  TRUE  TRUE
    > A & B
    [1]  TRUE FALSE FALSE FALSE
    > A | B
    [1]  TRUE  TRUE  TRUE FALSE

We often need to test various conditions using the relational operators. Again, these are vectorized and follow the recycling rule.

    > x <- 1:5
    > x > 2
    [1] FALSE FALSE  TRUE  TRUE  TRUE
    > x < 2
    [1]  TRUE FALSE FALSE FALSE FALSE
    > x == 2
    [1] FALSE  TRUE FALSE FALSE FALSE
    > x >= 2
    [1] FALSE  TRUE  TRUE  TRUE  TRUE
    > x <= 2
    [1]  TRUE  TRUE FALSE FALSE FALSE
    > x != 2
    [1]  TRUE FALSE  TRUE  TRUE  TRUE

Two other useful functions that operate on logical vectors are all and any. Can you guess what they do?

When used in arithmetic, logical vectors in R are treated as numeric vectors filled with 1’s (for TRUE) and 0’s (for FALSE). Treating them as 1’s and 0’s in calculations where we’d otherwise use their numeric value is one of those instances in which implicit coercion is ok, even helpful.

    > x <- rnorm(1000)
    > sum(x > 0)    # Number of times the condition is TRUE
    [1] 468
    > mean(x > 0)   # Proportion of times the condition is TRUE
    [1] 0.468
    > y <- x * (x > 0)   # Multiplying by an indicator variable
    > min(y)
    [1] 0

Factors are a special storage class in R used for categorical data.

    > group <- rep(c("control", "treatment"), each = 2)
    > group
    [1] "control"   "control"   "treatment" "treatment"
    > group <- factor(group)
    > group
    [1] control   control   treatment treatment
    Levels: control treatment
    > levels(group)
    [1] "control"   "treatment"

Because the levels of a factor are internally coded as integers, this is more efficient than using character vectors. However, we still have the advantage of seeing what the levels represent (rather than just the integer codes).
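
The integer coding behind a factor can be inspected directly. The sketch below rebuilds the `group` factor from the example above and uses only base R; the names `codes`, `labels`, `recovered`, and `counts` are illustrative and not part of the lecture.

```r
# Rebuild the factor from the slide; the other names are illustrative.
group <- factor(rep(c("control", "treatment"), each = 2))

# The underlying integer codes, one per observation.
codes <- as.integer(group)

# The labels that the codes refer to, in code order.
labels <- levels(group)

# Indexing the labels by the codes recovers the original strings.
recovered <- labels[codes]

# table() counts the observations in each level.
counts <- table(group)

print(codes)
print(recovered)
print(counts)
```

Here `codes` holds the integers R stores, while `labels[codes]` shows how the printed level names are recovered from them.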

## This note was uploaded on 10/08/2010 for the course STAT 133 taught by Professor Staff during the Spring '08 term at University of California, Berkeley.
