8/24/2009
A Tutorial on R Programming
Ping Ma

Introduction
• R is a GNU implementation of the S language (as in S-Plus): a flexible programming language for statistical computing.
• A multitude of packages exist for computational biology analyses, e.g. the BioConductor Project.

Some programming gems:
• Fantastic graphics!
• Extensibility: ports to Perl, Python, Java, HTML, etc.
• Support: active user community, especially in computational biology.
• Open source in design and nature.

http://www.r-project.org
http://cran.r-project.org

R Projects
CRAN: Comprehensive R Archive Network
• Covers all areas of mathematical and statistical software applications.
• Examples include finance modeling, time series, spatial modeling, and high-performance parallel computing.

Outline
• Data Structures
• Functionality
• Input/Output
• Workspace Management

Getting Started
Installation: (usually) a snap. Download the file, unzip it, and run the wizard.
Start up: via the icon, or by typing R inside a shell.

R Basics
> x <- 1 + 5
> x
[1] 6
> a.meaningful.name <- 6
> y <- c(1,2,3,4)
> y
[1] 1 2 3 4
> z <- 1:4
> z
[1] 1 2 3 4
> z[1]
[1] 1

• Note: everything in R is case sensitive.
• Assignments can also be made using "=".
• Variable names may be delimited by a '.'.
• Indices always begin with 1.
• Comments start with #.

Mathematical Operators
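The basics above (assignment, 1-based indexing) combine directly with arithmetic operators, which act element by element on vectors. A minimal sketch, assuming a hypothetical vector x that is not part of the slides:

```r
# Hypothetical example vector; the values are illustrative only.
x <- c(2, 4, 6, 8)

first <- x[1]                 # indices start at 1
doubled <- x * 2              # arithmetic is applied element by element
average <- sum(x) / length(x) # same value as mean(x)

print(first)
print(doubled)
print(average)
```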
R as a calculator:
> 2+3
[1] 5
> 3*4/6 + 2*(1 + 9)
[1] 22
> A %*% B   # matrix multiplication

Built-In R Functions
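Operators and built-in functions are usually combined. The sketch below simulates a small data set and fits a simple linear regression with lm; the variables, the seed, and the true slope of 2 are assumptions made for illustration.

```r
set.seed(1)                           # make the simulated noise reproducible
x <- 1:20                             # hypothetical predictor
y <- 3 + 2 * x + rnorm(20, sd = 0.5)  # hypothetical response with true slope 2

fit <- lm(y ~ x)                      # simple linear regression
print(coef(fit))                      # estimated intercept and slope

# Root mean squared residual, built from sqrt() and mean().
rmse <- sqrt(mean((y - fitted(fit))^2))
print(rmse)
```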
R comes with a suite of built-in mathematical and statistical functions.
> sqrt(54)
[1] 7.348469
> mean(1:5)
[1] 3
> lm(y~x)   # simple linear regression

For more specialized functions, look at CRAN or BioConductor.

Matrices
Matrices are 2-dimensional vectors.
> A <- matrix(1:9, nrow=3, ncol=3, byrow=T)
> A
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
> row.names(A) <- c("a", "b", "c")
> colnames(A) <- c("f", "g", "h")
> A
  f g h
a 1 2 3
b 4 5 6
c 7 8 9

Extracting and Extending Matrices
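The matrix A built above can be indexed by position or by name, and extended with cbind. A minimal sketch; the "total" column is a hypothetical addition for illustration:

```r
A <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
rownames(A) <- c("a", "b", "c")
colnames(A) <- c("f", "g", "h")

first_col <- A[, 1]            # first column, returned as a named vector
entry <- A["b", "g"]           # index by row and column name
sub <- A[1:2, c("f", "h")]     # 2 x 2 sub-matrix

# Append a column holding the sum of each row.
wider <- cbind(A, total = rowSums(A))
print(wider)
```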
Extract information from the matrix using indices.
> A[,1]
a b c 
1 4 7 
> A[1,]
f g h 
1 2 3 

Extend the matrix by adding rows or columns.
> B <- cbind(A, c(10,20,30))
> B
  f g h   
a 1 2 3 10
b 4 5 6 20
c 7 8 9 30
> C <- rbind(A, c(10,20,30))
> C
   f  g  h
a  1  2  3
b  4  5  6
c  7  8  9
  10 20 30

A matrix can consist of only one data type, e.g. numeric or character.

Interrogating a Matrix Object
Useful functions are:
> dim(A)
[1] 3 3
> ncol(A)
[1] 3
> nrow(A)
[1] 3
> length(A)
[1] 9

Similarly for a vector object:
> length(x)

Operating on Matrices
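The dimension queries above pair naturally with apply: the result has one value per row or per column, depending on the margin. A minimal sketch, assuming the same 3 x 3 matrix without row or column names:

```r
A <- matrix(1:9, nrow = 3, byrow = TRUE)

row_means <- apply(A, 1, mean)   # margin 1: one value per row
col_max <- apply(A, 2, max)      # margin 2: one value per column

# The length of each result matches the dimension that was kept.
print(length(row_means) == nrow(A))
print(length(col_max) == ncol(A))

# rowMeans() is a built-in shortcut for the first call.
print(rowMeans(A))
```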
A really useful function for matrices is the apply function. It lets us apply a specific function row-wise or column-wise.
> apply(A, 1, mean)   # the 1 means row-wise; use 2 for column-wise
a b c 
2 5 8 

Data Frame
A data frame is a collection of column vectors.

              Gpdh  Sod  Xdh AvRate Myr
Drosophila     1.5 25.7 30.4   22.4  55
Fungi         40.0 24.9 13.7   21.4 300
Animal Phyla  13.2 19.2 19.2   17.5 600

A useful way to store table-like information.
> molclock <- data.frame(Gpdh=c(1.50, 40, 13.2),
+   Sod=c(25.7, 24.9, 19.2), Xdh=c(30.4, 13.7, 19.2),
+   AvRate=c(22.4, 21.4, 17.5), Myr=c(55, 300, 600),
+   row.names=c("Drosophila", "Fungi", "Animal Phyla"))

Working with Data Frame
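The molclock data frame defined above can be indexed by column name, by row name, or by a logical condition. A minimal sketch; the cut-off of 22 on AvRate is an arbitrary value chosen for illustration:

```r
molclock <- data.frame(
  Gpdh = c(1.50, 40, 13.2), Sod = c(25.7, 24.9, 19.2),
  Xdh = c(30.4, 13.7, 19.2), AvRate = c(22.4, 21.4, 17.5),
  Myr = c(55, 300, 600),
  row.names = c("Drosophila", "Fungi", "Animal Phyla"))

col_by_name <- molclock$Gpdh               # a numeric vector
row_by_name <- molclock["Fungi", ]         # a one-row data frame
slow <- molclock[molclock$AvRate < 22, ]   # rows that satisfy a condition

print(dim(molclock))
print(rownames(slow))
```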
To extract data from a data frame object by column, we can use indices or names:
> molclock[,1]
[1]  1.5 40.0 13.2
> molclock[,"Gpdh"]
[1]  1.5 40.0 13.2

For rows, we use row indices (row names also work):
> molclock[2,]
      Gpdh  Sod  Xdh AvRate Myr
Fungi   40 24.9 13.7   21.4 300

Recall: a data.frame object is a collection of column vectors.
> class(molclock[,1])
[1] "numeric"
> class(molclock[2,])
[1] "data.frame"

List Structures
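The remark above that a data frame is a collection of column vectors can be checked directly: R stores a data frame as a list of equal-length columns. A general list drops the equal-length restriction. A minimal sketch with hypothetical objects df and misc:

```r
# Hypothetical two-column data frame.
df <- data.frame(gene = c("actin", "gapdh"), level = c(1.3, 99.6))
print(is.list(df))               # a data frame is a list of columns

# A general list can mix types and lengths freely.
misc <- list(label = "run1", values = c(1.3, 99.6, 2.45), table = df)
print(sapply(misc, class))       # class of each component

by_name <- misc$values           # extract a component by name
by_index <- misc[[2]]            # extract the same component by position
sub_list <- misc[2]              # single brackets return a shorter list
```

Double brackets return the component itself, while single brackets keep the list wrapper.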
Up until now, the vectors and matrices we have built (and each data frame column) have needed a uniform data type. List structures are powerful because we can store multiple data types in the same object.
> miscObjs <- list("actin"=c(1.3, 99.6, 2.45),
+   "gapdh"=matrix(rnorm(100), nrow=10), "atp"=molclock)

We extract data from a list using names or indices.
> names(miscObjs)
[1] "actin" "gapdh" "atp"
> miscObjs$actin
[1]  1.30 99.60  2.45
> miscObjs[[1]]
[1]  1.30 99.60  2.45

Visualizing Data: Plot Function
A simple scatter plot:
> x.dat <- rnorm(100)   # 100 N(0,1) rvs
> plot(x.dat, xlab="Index", ylab="Normal RVS",
+   main="Figure 1: Scatter Plot")

Exporting Graphics
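The scatter plot above can be written to a file by opening a file device, plotting, and closing the device. A minimal sketch using the pdf device; the temporary file path and the page size are assumptions made for illustration:

```r
x.dat <- rnorm(100)

out_file <- tempfile(fileext = ".pdf")   # hypothetical output path
pdf(out_file, width = 6, height = 4)     # open the file device
plot(x.dat, xlab = "Index", ylab = "Normal RVS",
     main = "Figure 1: Scatter Plot")
invisible(dev.off())                     # close the device so the file is written

print(file.exists(out_file))
```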
In Windows:
• right mouse click to copy to clipboard.

For most operating systems:
> bitmap("file.bmp")
> plot(x.dat)   # insert code for making the plot here
> dev.off()

You can export graphics to many file formats: bitmap, jpeg, gif, postscript, etc.

Classes
A class describes the way an object in R is stored.
• Strings: "Homo sapiens"
• Numeric: 3.141593
• Boolean: TRUE, FALSE

We can interrogate an object to find out its class:
> a <- FALSE
> class(a)
[1] "logical"
> is.numeric(a)
[1] FALSE

Classes also reflect their data structure, e.g. matrix, data.frame, function.

Working with Strings
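Strings are one of the classes listed above (class "character"). When joining them with paste, the sep argument joins separate arguments, while collapse joins the elements of a single vector. A minimal sketch; the word vectors are illustrative:

```r
words <- c("Cat", "Dog")
print(class(words))                            # "character"

joined_args <- paste("Cat", "Dog", sep = "")   # joins separate arguments
joined_vec <- paste(words, collapse = "")      # joins elements of one vector
unchanged <- paste(words, sep = "")            # sep alone leaves a vector as it is

chars <- strsplit("Seuss", "")[[1]]            # split into single characters
hits <- grep("s$", c("eggs", "ham", "Seuss"))  # positions of strings ending in "s"
```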
While Perl or Python are more capable languages for text parsing, R does have capabilities for manipulating and creating strings.

Pasting Strings Together
> paste("Cat", "Dog", sep="")
[1] "CatDog"

Splitting Strings
> strsplit("Seuss", "")
[[1]]
[1] "S" "e" "u" "s" "s"

Searching for Patterns
> grep("and", "Brown eggs and ham")
[1] 1
# grep also lets you search with regexp patterns

Boolean Algebra
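A minimal sketch of logical tests in R, assuming a hypothetical cut-off of 6: comparisons return one TRUE/FALSE flag per element, and flags can be combined with & (and), | (or), and ! (not) before being used to subset.

```r
x <- 1:10
even.numbers <- seq(from = 2, to = 10, by = 2)

is_even <- x %in% even.numbers   # one flag per element of x
is_big <- x > 6                  # hypothetical cut-off

both <- x[is_even & is_big]      # element-wise AND
either <- x[is_even | is_big]    # element-wise OR
not_even <- x[!is_even]          # negation

print(1 != 3)                    # != tests for inequality
```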
In R, to test for equality use "==":
> 1 == 3
[1] FALSE
> 1 != 3   # inequality
[1] TRUE

Another powerful tip: we can test for inclusion in a vector with "%in%":
> x <- 1:10 ; even.numbers <- seq(from=2, to=10, by=2)
> x
 [1]  1  2  3  4  5  6  7  8  9 10
> even.numbers
[1]  2  4  6  8 10
> x %in% even.numbers
 [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE

We can subset vectors with TRUE/FALSE flags:
> x[x %in% even.numbers]
[1]  2  4  6  8 10

Missing Values
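Subsetting with TRUE/FALSE flags, shown above, is also a common way to handle missing values: is.na returns one flag per element. A minimal sketch, assuming a hypothetical vector v:

```r
v <- c(1, 4, NA, 7)              # hypothetical data with one missing value

missing <- is.na(v)              # TRUE where the value is missing
n_missing <- sum(missing)        # TRUE counts as 1 when summed
clean <- v[!missing]             # keep only the observed values

print(n_missing)
print(mean(v, na.rm = TRUE))     # same as mean(clean)
```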
NA is the all-inclusive symbol for a missing value in R.
> mean(c(1, 4, NA))
[1] NA
> mean(c(1, 4, NA), na.rm=T)
[1] 2.5

We can test whether an object is a missing value.
> NA == NA
[1] NA   # this doesn't work!
> is.na(NA)
[1] TRUE
> na.omit(c(1, 4, NA))   # the printed result also lists the omitted positions as attributes
[1] 1 4

Other objects: NaN, Inf.

For Loops
For loops are very simple in R.
> for( m in 1:3 ){
+   print(m) }
[1] 1
…
> for( m in c("actin", "myosin", "gapdh") ){
+   print(m) }
[1] "actin"
…

Note: R does not process for loops very quickly; try to avoid them for large data if you can (e.g. use apply).

Conditional Statements
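A for loop like those above often wraps a conditional. The sketch below labels each element of a hypothetical vector with an if/else inside a loop, then repeats the task with the vectorized ifelse function, which avoids the explicit loop:

```r
values <- c(-1.5, 0.3, 2.0)             # hypothetical inputs
labels <- character(length(values))     # preallocate the result

for (i in seq_along(values)) {
  if (values[i] > 0) {
    labels[i] <- "positive"
  } else {
    labels[i] <- "negative"
  }
}

# Vectorized equivalent: one call, no explicit loop.
labels_vec <- ifelse(values > 0, "positive", "negative")
print(labels)
```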
We can use conditional statements to automate tasks and functions.

If..Else Block
If( condition 1 holds ) then do task 1. Else, do task 2. At the prompt, else must follow the closing brace on the same line.
> if( x > 0 ){ print("positive") } else { print("negative") }

While Block
While( condition 1 holds ) do task 1. If condition 1 no longer holds, stop.
> while( x > 0 ){ x <- x + rnorm(1) }

You can put the break command inside an if( … ) to break out of the loop.

Writing Your Own Functions
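The while loop and break command above can be packaged as a function. A minimal sketch; the function name walk.until.nonpositive and its max.steps argument are hypothetical and chosen only for illustration:

```r
# Hypothetical helper: add N(0,1) steps to a starting value until it
# drops to zero or below, or until max.steps steps have been taken.
walk.until.nonpositive <- function(start, max.steps = 1000) {
  x <- start
  steps <- 0
  while (x > 0) {
    if (steps >= max.steps) break   # safety stop so the loop always ends
    x <- x + rnorm(1)
    steps <- steps + 1
  }
  list(final = x, steps = steps)    # return several values in a list
}

set.seed(42)
res <- walk.until.nonpositive(1)
print(res)
```

The default value for max.steps means callers can omit it, as in the call above.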
Imagine you need to write a simple function that returns both the mean and the standard deviation of a vector in a list structure.
> mean.and.sd <- function(x){
+   res.mean <- mean(x) ; res.sd <- sd(x)
+   res = list(mean=res.mean, sd=res.sd)
+   return(res)
+ }
> mean.and.sd(rpois(10,5))   # random input, so results vary
$mean
[1] 4.4

$sd
[1] 0.9660918

You can use the args function to find out what arguments a function needs.
> args(mean.and.sd)
function (x) 
NULL

Inputting Data into R
R has capabilities for reading in data files of many different formats. For simple ASCII text files we can use the read.table function.
> my.data <- read.table("forbes.txt", header = TRUE)
> my.data
   Temp Pressure  Lpres
1 194.5    20.79 131.79
2 194.3    20.79 131.79
3 197.9    22.40 135.02
4 198.4    22.67 135.55
…
> my.data$Temp
 [1] 194.5 194.3 197.9 198.4 199.4 199.9 200.9 201.1
 [9] 201.4 201.3 203.6 204.6 209.5 208.6 210.7 211.9
[17] 212.2

Other read-in functions: read.csv, scan, readLines.

Outputting Data from R
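Reading and writing are two halves of a round trip. The sketch below writes the first four Temp and Pressure rows shown above to a text file and reads them back; the temporary file path is an assumption made so the example does not depend on forbes.txt.

```r
my.data <- data.frame(Temp = c(194.5, 194.3, 197.9, 198.4),
                      Pressure = c(20.79, 20.79, 22.40, 22.67))

out_file <- tempfile(fileext = ".txt")              # hypothetical output path
write.table(my.data, out_file, row.names = FALSE)   # column names are written by default
back <- read.table(out_file, header = TRUE)         # header = TRUE restores the names

print(all.equal(my.data, back))
```

Setting row.names = FALSE keeps the row numbers out of the file, so the data reads back with the same columns.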
To output data to a simple table text file, we can use write.table.
> write.table(my.data, "my.forbes.txt")
> write.table(my.data, "my.forbes.txt", row.names=F)

Other write functions: write, cat.

Porting to Other Languages
A port is a piece of software that provides a means to get one programming language to communicate with another.

The Omega Project for Statistical Computing
• An umbrella project to link different programming languages seamlessly.
• Some packages available: RSPython, RSPerl, RMatlab (plus a variety of others).

Example: RSPython
• To call Python from R: load RSPython, call py commands using .Python(func, args1, args2, …)
• To call R from Python: load RS module, RS.call("plot", x, y).

Workspace Management
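A minimal sketch of saving objects to a file, removing them from the workspace, and restoring them. The object names and the temporary directory are assumptions made for illustration:

```r
x <- 1:5
note <- "keep me"

rdata_file <- file.path(tempdir(), "fileName.RData")   # hypothetical location
save(x, note, file = rdata_file)   # write both objects to one file
rm(x, note)                        # remove them from the workspace

restored <- load(rdata_file)       # load() returns the names it restored
print(restored)
print(x)
```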
Where am I?
> getwd()             # returns the working directory
> setwd("C:/Jess")    # sets the working directory
> dir()               # lists files in working directory
> list.files()

How can I tell what objects I have?
> ls()

To remove individual objects use rm():
> rm("name.of.object")

To save specific objects use save():
> save(x, file="fileName.RData")

At a later date, you can load this into your workspace:
> load("fileName.RData")

Libraries
Libraries are collections of R functions that together perform a specialized analysis or task.
> library(alr3)

Consult CRAN for more: http://cran.us.r-project.org/

Helpful Functions
To boot up HTML help files:
> help.start()

To pop up a help file on an individual function:
> help(mean)   # replace mean with the function of interest

To search for help on a topic or function:
> help.search("plot")

To search for objects whose names contain a string:
> apropos("string")

More Info & Resources
For R tutorials and simple documents to learn more about R, consult the R website for lots of resources: www.r-project.org/ (go to Documentation > Other > Contributed Documentation).
• Really great HTML tutorial: Kickstarting R by Jim Lemon, http://cran.r-project.org/doc/contrib/Lemon-kickstart/index.html
• "R for Beginners" by Emmanuel Paradis [short pdf]

There are also reference cards that contain the most important R functions (and their descriptions) you need to know (like a cheat sheet).
• "R Reference Card" by Jonathan Baron [1 page list]