

A Tutorial on R Programming
Ping Ma, 8/24/2009

Introduction
R is GNU S, an open-source implementation of the S language (also the basis of the commercial S-Plus). It is a flexible programming language for statistical computing.
A multitude of packages exist for computational biology analyses, e.g. the BioConductor Project.
Some programming gems:
• Fantastic graphics!
• Extensibility: interfaces to Perl, Python, Java, HTML, etc.
• Support: an active user community, especially in computational biology.
• Open source in design and nature.
http://www.r-project.org
http://cran.r-project.org

R Projects
CRAN, the Comprehensive R Archive Network, hosts packages covering all areas of mathematical and statistical software applications: finance modeling, time series, spatial modeling, high-performance parallel computing, and more.

Outline
• Data Structures
• Functionality
• Input/Output
• Workspace Management

Getting Started
Installation: (usually) a snap. Download the installer, unzip it if needed, and run the setup wizard.
Start up: via the icon, or inside a shell by typing R.

R Basics
• Note: everything in R is case sensitive.
• Assignments are usually made with "<-"; they can also be made using "=".
> x <- 1 + 5
> x
[1] 6
• Variable names may be delimited by a '.'
> a.meaningful.name <- 6
• Indices always begin with 1.
• Comments start with #.
> y <- c(1,2,3,4)
> y
[1] 1 2 3 4
> z <- 1:4
> z
[1] 1 2 3 4
> z[1]
[1] 1

Mathematical Operators
R as a calculator:
> 2+3
[1] 5
> 3*4/6 + 2*(1 + 9)
[1] 22
> A %*% B    # matrix multiplication (A and B must be conformable matrices)

Built-In R Functions
R comes with a suite of built-in mathematical and statistical functions.
> sqrt(54)
[1] 7.348469
> mean(1:5)
[1] 3
> lm(y ~ x)    # simple linear regression
For more specialized functions, look at CRAN or BioConductor.

Matrices
Matrices are two-dimensional vectors.
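The A %*% B line above assumes two conformable matrices already exist. Here is a small sketch, using hypothetical 2 x 2 matrices P and Q (not from the slides), showing how matrix() fills its values and how %*% differs from element-wise *:

```r
# Hypothetical 2 x 2 matrices P and Q, used only for illustration.
# matrix() fills its values column by column unless byrow = TRUE.
P <- matrix(1:4, nrow = 2)                # columns are (1, 2) and (3, 4)
Q <- matrix(1:4, nrow = 2, byrow = TRUE)  # rows are (1, 2) and (3, 4)

# Filling by row gives the transpose of filling by column
identical(t(P), Q)

# Element-wise product: each entry multiplied by the matching entry
P * Q    # rows (1, 6) and (6, 16)

# Matrix product: row i of P times column j of Q, summed
P %*% Q  # rows (10, 14) and (14, 20)
```

The byrow argument matters for the 3 x 3 example that follows, where byrow=TRUE makes each row read 1 2 3, 4 5 6, 7 8 9.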
> A <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE)
> A
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
> rownames(A) <- c("a", "b", "c")
> colnames(A) <- c("f", "g", "h")
> A
  f g h
a 1 2 3
b 4 5 6
c 7 8 9

Extracting and Extending Matrices
Extract information from the matrix using indices.
> A[,1]
a b c
1 4 7
> A[1,]
f g h
1 2 3
Extend the matrix by adding rows or columns.
> B <- cbind(A, c(-10,-20,-30))
> B
  f g h
a 1 2 3 -10
b 4 5 6 -20
c 7 8 9 -30
> C <- rbind(A, c(-10,-20,-30))
> C
    f   g   h
a   1   2   3
b   4   5   6
c   7   8   9
  -10 -20 -30
A matrix can hold only one data type, e.g. numeric or character.

Interrogating a Matrix Object
Useful functions are:
> dim(A)
[1] 3 3
> ncol(A)
[1] 3
> nrow(A)
[1] 3
> length(A)
[1] 9
Similarly for a vector object:
> length(x)

Operating on Matrices
A really useful function for matrices is apply(), which applies a function to each row or to each column.
> apply(A, 1, mean)   # the 1 means row-wise; use 2 for column-wise
a b c
2 5 8

Data Frame
A data frame is a collection of column vectors. It is a useful way to store table-like information.

              Gpdh   Sod   Xdh  AvRate  Myr
Drosophila    1.50  25.7  30.4    22.4   55
Fungi        40.0   24.9  13.7    21.4  300
Animal Phyla 13.2   19.2  19.2    17.5  600

> molclock <- data.frame(Gpdh=c(1.50, 40, 13.2),
+   Sod=c(25.7, 24.9, 19.2), Xdh=c(30.4, 13.7, 19.2),
+   AvRate=c(22.4, 21.4, 17.5), Myr=c(55, 300, 600),
+   row.names=c("Drosophila", "Fungi", "Animal Phyla"))

Working with Data Frame
To extract a column from a data frame object, we can use indices or names:
> molclock[,1]
[1]  1.5 40.0 13.2
> molclock[,"Gpdh"]
[1]  1.5 40.0 13.2
For rows, use row indices (row names also work, e.g. molclock["Fungi",]):
> molclock[2,]
      Gpdh  Sod  Xdh AvRate Myr
Fungi   40 24.9 13.7   21.4 300
Recall: a data.frame object is a collection of column vectors, so a column comes back as a vector while a row comes back as a one-row data frame.
> class(molclock[,1])
[1] "numeric"
> class(molclock[2,])
[1] "data.frame"

List Structures
Up until now, our data structures have needed a uniform data type: a vector or matrix holds a single type, and each column of a data frame holds a single type.
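A quick sketch of what that restriction means in practice: c() and matrix() silently coerce mixed values to one common type, while a data frame allows a different type per column. The object names mixed, m and df are hypothetical.

```r
# Hypothetical objects illustrating type coercion.
# c() converts mixed values to the most flexible common type (here character).
mixed <- c(1.5, "Homo sapiens", TRUE)
mixed          # "1.5" "Homo sapiens" "TRUE"
class(mixed)   # "character"

# A matrix is also limited to one type, so the numbers become strings
m <- matrix(c(1, 2, "a", "b"), nrow = 2)
is.character(m)   # TRUE

# A data frame keeps one type per column, but columns may differ
df <- data.frame(gene = c("actin", "gapdh"), rate = c(1.3, 99.6),
                 stringsAsFactors = FALSE)
sapply(df, class)  # gene: "character", rate: "numeric"
```

stringsAsFactors = FALSE is set explicitly so the gene column stays character in both older and newer R versions.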
List structures are powerful because we can store multiple data types in the same object.
> miscObjs <- list("actin"=c(1.3, 99.6, 2.45),
+   "gapdh"=matrix(rnorm(100), nrow=10), "atp"=molclock)
We extract data from a list using names or indices.
> names(miscObjs)
[1] "actin" "gapdh" "atp"
> miscObjs$actin
[1]  1.30 99.60  2.45
> miscObjs[[1]]
[1]  1.30 99.60  2.45

Visualizing Data: Plot Function
A simple scatter plot:
> x.dat <- rnorm(100)   # 100 N(0,1) random variates
> plot(x.dat, xlab="Index", ylab="Normal RVS",
+   main="Figure 1: Scatter Plot")

Exporting Graphics
In Windows:
• right mouse click on the plot to copy it to the clipboard.
For most operating systems, open a file device, draw the plot, then close the device:
> png("file.png")
> plot(x.dat)   # <- insert code for making plot here
> dev.off()
You can export graphics to many file formats with the matching device function: bmp(), jpeg(), png(), tiff(), pdf(), postscript(), etc.

Classes
A class describes the way an object in R is stored.
• Strings (character): "Homo sapiens"
• Numeric: 3.141593
• Boolean (logical): TRUE, FALSE
We can interrogate an object to find out its class:
> a <- FALSE
> class(a)
[1] "logical"
> is.numeric(a)
[1] FALSE
Classes also reflect the data structure, e.g. matrix, data.frame, function.

Working with Strings
While Perl or Python are more capable languages for text parsing, R does have capabilities for manipulating and creating strings.
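As a sketch of those capabilities, here are a few more base-R string helpers applied to a hypothetical variable species:

```r
# Hypothetical example string
species <- "Homo sapiens"

nchar(species)          # 12: number of characters, including the space
toupper(species)        # "HOMO SAPIENS"
substr(species, 1, 4)   # "Homo": characters 1 through 4
sub(" ", "_", species)  # "Homo_sapiens": replaces the first match of the pattern
sprintf("%s has %d characters", species, nchar(species))
```

sub() treats its first argument as a regular expression, like grep(); gsub() would replace every match instead of only the first.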
Pasting Strings Together
> paste("Cat", "Dog", sep="")
[1] "CatDog"
> paste(c("Cat", "Dog"), collapse="")   # joins the elements of a single vector
[1] "CatDog"
Splitting Strings
> strsplit("Seuss", "")
[[1]]
[1] "S" "e" "u" "s" "s"
Searching for Patterns
> grep("and", "Brown eggs and ham")
[1] 1
# grep returns the indices of the matching elements; it also lets you search with regexp patterns

Boolean Algebra
In R, to test for equality use "==", and for inequality use "!=":
> 1 == 3
[1] FALSE
> 1 != 3
[1] TRUE
Another powerful tip: we can test for inclusion in a vector with "%in%".
> x <- 1:10 ; even.numbers <- seq(from=2, to=10, by=2)
> x
 [1]  1  2  3  4  5  6  7  8  9 10
> even.numbers
[1]  2  4  6  8 10
> x %in% even.numbers
 [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
We can subset vectors with TRUE/FALSE flags:
> x[x %in% even.numbers]
[1]  2  4  6  8 10

Missing Values
NA is the all-inclusive symbol for a missing value in R.
> mean(c(1, 4, NA))
[1] NA
> mean(c(1, 4, NA), na.rm=TRUE)
[1] 2.5
We can test whether an object is a missing value.
> NA == NA
[1] NA   # this doesn't work!
> is.na(NA)
[1] TRUE
> na.omit(c(1, 4, NA))   # the printout also shows an "na.action" attribute recording the dropped position
[1] 1 4
Other special values: NaN (not a number) and Inf (infinity).

For Loops
For loops are very simple in R.
> for( m in 1:3 ){
+   print(m) }
[1] 1
…
> for( m in c("actin", "myosin", "gapdh") ){
+   print(m) }
[1] "actin"
…
Note: for loops in R are relatively slow; for large data, prefer vectorized operations or functions such as apply where you can.

Conditional Statements
We can use conditional statements to automate tasks and functions. In these examples x is a single number, since if() and while() test one TRUE/FALSE value.
If..Else Block
If (condition 1 holds) then do task 1. Else, do task 2.
> if( x > 0 ){ print("positive")
+ } else { print("negative") }
While Block
While (condition 1 holds) do task 1. When condition 1 no longer holds, stop.
> while( x > 0 ){ x <- x + rnorm(1) }
You can put the break command inside an if( … ) to break out of the loop early.

Writing Your Own Functions
Imagine you need to write a simple function that returns both the mean and the standard deviation of a vector in a list structure.
> mean.and.sd <- function(x){
+   res.mean <- mean(x) ; res.sd <- sd(x)
+   res <- list(mean=res.mean, sd=res.sd)
+   return(res)
+ }
> mean.and.sd(rpois(10,5))   # random input, so your numbers will differ
$mean
[1] 4.4

$sd
[1] 0.9660918

You can use the args function to find out what arguments a function needs.
> args(mean.and.sd)
function (x) 
NULL

Inputting Data into R
R has capabilities for reading in data files of many different formats. For simple ASCII text files we can use the read.table function.
> my.data <- read.table("forbes.txt", header = TRUE)
> my.data     # first four of the 17 rows shown
   Temp Pressure  Lpres
1 194.5    20.79 131.79
2 194.3    20.79 131.79
3 197.9    22.40 135.02
4 198.4    22.67 135.55
> my.data$Temp
 [1] 194.5 194.3 197.9 198.4 199.4 199.9 200.9 201.1
 [9] 201.4 201.3 203.6 204.6 209.5 208.6 210.7 211.9
[17] 212.2
Other read-in functions: read.csv, scan, readLines.

Outputting Data from R
To output data to a simple table text file, we can use write.table.
> write.table(my.data, "my.forbes.txt")
> write.table(my.data, "my.forbes.txt", row.names=FALSE)   # omit the row names
Other write functions: write, cat.

Porting to Other Languages
A port is a piece of software that provides a means to get one programming language to communicate with another.
The Omega Project for Statistical Computing
An umbrella project to link different programming languages seamlessly. Some packages available: RSPython, RSPerl, RMatlab (plus a variety of others).
Example: RSPython
To call Python from R: load RSPython, then call Python commands using .Python(func, args1, args2, …).
To call R from Python: load the RS module, then call RS.call("plot", x, y).

Workspace Management
Where am I?
> getwd()           # returns the working directory
> setwd("C:/Jess")  # sets the working directory
> dir()             # lists files in the working directory
> list.files()      # same as dir()
How can I tell what objects I have?
> ls()
To remove individual objects use rm():
> rm("name.of.object")
To save specific objects use save():
> save(x, file="fileName.RData")
At a later date, you can load this into your workspace (the file name is case sensitive on most systems):
> load("fileName.RData")

Libraries
Libraries (packages) are collections of R functions that together perform a specialized analysis or task. Load one with library():
> library(alr3)
Consult CRAN for more: http://cran.us.r-project.org/

Helpful Functions
To boot up the HTML help files:
> help.start()
To pop up the help file for an individual function:
> help(mean)     # ?mean is a shortcut
To search for help on a topic or function:
> help.search("plot")
To list objects whose names contain a given string:
> apropos("string")

More Info & Resources
For R tutorials and simple documents to learn more about R, consult the R website, which has lots of resources: www.r-project.org/ (go to Documentation > Other > Contributed Documentation).
• Really great HTML tutorial: "Kickstarting R" by Jim Lemon, http://cran.r-project.org/doc/contrib/Lemon-kickstart/index.html
• "R for Beginners" by Emmanuel Paradis [short pdf]
There are also reference cards that list the most important R functions (and their descriptions) you need to know, like a cheat sheet:
• "R Reference Card" by Jonathan Baron [1-page list]

This note was uploaded on 12/12/2010 for the course STAT 425, taught by Professor P. Ma during the Fall 2008 term at the University of Illinois at Urbana-Champaign.
