R for beginners
Emmanuel Paradis

1 What is R?
2 The few things to know before starting
  2.1 The operator <-
  2.2 Listing and deleting the objects in memory
  2.3 The online help
3 Data with R
  3.1 The ‘objects’
  3.2 Reading data from files
  3.3 Saving data
  3.4 Generating data
    3.4.1 Regular sequences
    3.4.2 Random sequences
  3.5 Manipulating objects
    3.5.1 Accessing a particular value of an object
    3.5.2 Arithmetics and simple functions
    3.5.3 Matrix computation
4 Graphics with R
  4.1 Managing graphic windows
    4.1.1 Opening several graphic windows
    4.1.2 Partitioning a graphic window
  4.2 Graphic functions
  4.3 Low-level plotting commands
  4.4 Graphic parameters
5 Statistical analyses with R
6 The programming language R
  6.1 Loops and conditional executions
  6.2 Writing your own functions
7 How to go farther with R?
8 Index

The goal of the present document is to give a starting point for people newly interested in R. I
tried to simplify the explanations as much as I could to make them understandable by all,
while giving useful details, sometimes with tables. Commands, instructions and examples are
written in Courier font.

I thank Julien Claude, Christophe Declercq, Friedrich Leisch and Mathieu Ros for their
comments and suggestions on an earlier version of this document. I am also grateful to all the
members of the R Development Core Team for their considerable efforts in developing R and
animating the discussion list ‘r-help’. Thanks also to the R users whose questions or
comments helped me to write “R for beginners”.

© 2000, Emmanuel Paradis (20 October 2000)

1 What is R?

R is a statistical analysis system created by Ross Ihaka & Robert Gentleman (1996, J.
Comput. Graph. Stat., 5: 299-314). R is both a language and a piece of software; its most
remarkable features are:

• an effective data handling and storage facility,
• a suite of operators for calculations on arrays, matrices, and other complex operations,
• a large, coherent, integrated collection of tools for statistical analysis,
• numerous graphical facilities which are particularly flexible, and
• a simple and effective programming language which includes many facilities.

R is a language considered as a dialect of the language S created by the AT&T Bell
Laboratories. S is available as the software S-PLUS commercialized by MathSoft (see
http://www.splus.mathsoft.com/ for more information). There are important differences in
the conceptions of R and S, but they are not of interest to us here: those who want to know
more on this point can read the paper by Gentleman & Ihaka (1996) or the R-FAQ
(http://cran.r-project.org/doc/FAQ/R-FAQ.html), a copy of which is also distributed with the
software.

R is freely distributed under the terms of the GNU General Public Licence of the Free Software
Foundation (for more information: http://www.gnu.org/); its development and distribution are
carried on by several statisticians known as the R Development Core Team. A key element in
this development is the Comprehensive R Archive Network (CRAN).

R is available in several forms: the sources written in C (and some routines in Fortran77)
ready to be compiled, essentially for Unix and Linux machines, or some binaries ready for use
(after a very easy installation) according to the following table.

Architecture     Operating system(s)

Intel            Windows 95/98/NT4.0/2000
                 Linux (Debian 2.2, Mandrake 7.1, RedHat 6.x, SuSe 5.3/6.4/7.0)
PPC              MacOS
                 LinuxPPC 5.0
Alpha Systems    Digital Unix 4.0
                 Linux (RedHat 6.x)
Sparc            Linux (RedHat 6.x)

The files to install these binaries are at http://cran.r-project.org/bin/ (except for the Macintosh
version; see note 1), where you can also find the installation instructions for each operating system.

Note 1: The Macintosh port of R has just been finished by Stefano Iacus, and should be available
soon on CRAN.

R is a language with many functions for statistical analyses and graphics; the latter are
visualized immediately in their own window and can be saved in various formats (for
example, jpg, png, bmp, eps, or wmf under Windows, ps, bmp, pictex under Unix). The
results from a statistical analysis can be displayed on the screen; some intermediate results
(P-values, regression coefficients) can be written in a file or used in subsequent analyses. The R
language allows the user, for instance, to program loops of commands to successively analyse
several data sets. It is also possible to combine in a single program different statistical
functions to perform more complex analyses. R users can benefit from a large number of
routines written for S and available on the internet (for example: http://stat.cmu.edu/S/); most of
these routines can be used directly with R.

At first, R could seem too complex for a non-specialist (for instance, a biologist). This may
not be true actually. In fact, a prominent feature of R is its flexibility. Whereas classical
software (SAS, SPSS, Statistica, ...) displays (almost) all the results of an analysis, R stores
these results in an object, so that an analysis can be done with no result displayed. The user
may be surprised by this, but such a feature is very useful. Indeed, the user can extract only
the part of the results which is of interest. For example, if one runs a series of 20 regressions
and wants to compare the different regression coefficients, R can display only the estimated
coefficients: thus the results will take 20 lines, whereas classical software could well open
20 results windows. One could cite many other examples illustrating the superiority of a
system such as R compared to classical software; I hope the reader will be convinced of this
after reading this document.
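
The 20-regression example mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated (the true intercept 2 and slope 3 are arbitrary choices of this sketch), and the functions lm(), coef() and sapply() are used here before being introduced in the text.

```r
# Simulate one explanatory variable shared by the 20 regressions
# (simulated data: an assumption of this sketch, not the author's data).
set.seed(42)
x <- rnorm(50)

# Run 20 regressions; each result is stored in an object and only the
# estimated coefficients are extracted, nothing else is displayed.
coefs <- t(sapply(1:20, function(i) {
  y <- 2 + 3 * x + rnorm(50)   # simulated response
  coef(lm(y ~ x))              # keep only the coefficients of this fit
}))

# One line per regression: 20 rows, 2 columns (intercept and slope).
print(coefs)
```

The whole comparison fits in a single 20-row table because each fitted model is kept as an object and only the part of interest is extracted from it.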

2 The few things to know before starting

Once R is installed on your computer, the software is accessed by launching the corresponding
executable (RGui.exe or Rterm.exe under Windows, R under Unix). The prompt ‘>’ indicates
that R is waiting for your commands.

Under Windows, some commands related to the system (accessing the online help, opening
files, ...) can be executed via the pull-down menus, but most of them have to be typed on
the keyboard. We shall first see three aspects of R: creating and modifying elements,
listing and deleting objects in memory, and accessing the online help.

2.1 The operator <-

R is an object-oriented language: the variables, data, matrices, functions, results, etc. are
stored in the active memory of the computer in the form of objects which have a name: one
has just to type the name of the object to display its content. For example, if an object n has
the value 10:

> n
[1] 10

The digit 1 within brackets indicates that the display starts at the first element of n (see §
3.4.1). The assignment symbol is used to give a value to an object. This symbol is written with a
bracket (< or >) together with a minus sign so that they make a small arrow which can be
directed from left to right, or the reverse:

> n <- 15
> n
[1] 15
> 5 -> n
> n
[1] 5

The value which is so given can be the result of an arithmetic expression:

> n <- 10+2
> n
[1] 12

Note that you can simply type an expression without assigning its value to an object; the result
is then displayed on the screen but not stored in memory:

> (10+2)*5
[1] 60
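
The behaviour described in this section can be checked with a short script. It is a minimal sketch that only reuses the object n from the text; print() is used so that the values also appear when the lines are run non-interactively.

```r
# Assign with the left-pointing arrow, then display the object.
n <- 15
print(n)

# The arrow can also point to the right: the value on the left goes into n.
5 -> n
print(n)

# The assigned value can be the result of an arithmetic expression.
n <- 10 + 2

# An expression typed without assignment is evaluated and displayed,
# but nothing is stored: n keeps its previous value.
(10 + 2) * 5
print(n)
```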

2.2 Listing and deleting the objects in memory

The function ls() simply lists the objects in memory: only the names of the objects are
displayed.

> name <- "Laure"; n1 <- 10; n2 <- 100; m <- 0.5
> ls()
[1] "m"    "n1"   "n2"   "name"

Note the use of the semicolon ";" to separate distinct commands on the same line. If there are
a lot of objects in memory, it may be useful to list those which contain a given character in their
name: this can be done with the option pattern (which can be abbreviated to pat):

> ls(pat="m")
[1] "m"    "name"

If we want to restrict the list to the objects whose names start with this character:

> ls(pat="^m")
[1] "m"

To display the details on the objects in memory, we can use the function ls.str():

> ls.str()
m: num 0.5
n1: num 10
n2: num 100
name: chr "Laure"

The option pattern can also be used as described above. Another useful option of ls.str()
is max.level which specifies the level of details to be displayed for composite objects. By
default, ls.str() displays the details of all objects in memory, including the columns of data
frames, matrices and lists, which can result in a very long display. One can avoid displaying all
these details with the option max.level=-1:

> M <- data.frame(n1,n2,m)
> ls.str(pat="M")
M: ‘data.frame’: 1 obs. of 3 variables:
 $ n1: num 10
 $ n2: num 100
 $ m: num 0.5
> ls.str(pat="M", max.level=-1)
M: ‘data.frame’: 1 obs. of 3 variables:

To delete objects in memory, we use the function rm(): rm(x) deletes the object x, rm(x,y)
deletes both objects x and y, and rm(list=ls()) deletes all the objects in memory; the same
options mentioned for the function ls() can then be used to delete selectively some objects:
rm(list=ls(pat="m")).
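
The listing and deleting commands above can be tried safely as sketched below. The separate environment e is an assumption of this example (it is not part of the original text); it is used only so that the sketch does not delete anything from the user's own workspace.

```r
# Work in a separate environment so that the user's workspace is untouched
# ('e' is a name chosen for this sketch).
e <- new.env()
assign("name", "Laure", envir = e)
assign("n1", 10, envir = e)
assign("n2", 100, envir = e)
assign("m", 0.5, envir = e)

all_objects <- ls(envir = e)                  # the four names
with_m      <- ls(envir = e, pattern = "m")   # names containing "m"
starts_m    <- ls(envir = e, pattern = "^m")  # names starting with "m"

# Delete selectively: every object whose name starts with "n".
rm(list = ls(envir = e, pattern = "^n"), envir = e)
remaining <- ls(envir = e)
print(remaining)
```

In an interactive session the envir argument can simply be dropped, which gives the calls shown in the text.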

2.3 The online help

The online help of R gives some very useful information on how to use the functions. The
help in html format is called by typing:

> help.start()

A search with keywords is possible with this html help. A search of functions can be done
with apropos("what") which lists the functions with "what" in their name:

> apropos("anova")
 [1] "anova"           "anova.glm"       "anova.glm.null"  "anova.glmlist"
 [5] "anova.lm"        "anova.lm.null"   "anova.mlm"       "anovalist.lm"
 [9] "print.anova"     "print.anova.glm" "print.anova.lm"  "stat.anova"

The help is available in text format for a given function, for instance:

> ?lm

displays the help file for the function lm(). The function help(lm) or help("lm") has the
same effect. This last function must be used to access the help with non-conventional
characters:

> ?*
Error: syntax error
> help("*")
Arithmetic              package:base              R Documentation

Arithmetic Operators
...
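
As a small sketch of the search functions above: apropos() returns an ordinary character vector, so its result can be stored and filtered like any other object. The list obtained depends on the version of R and on the packages loaded, so it will probably differ from the output shown in the text.

```r
# apropos() returns the names of the objects matching a pattern.
hits <- apropos("anova")
print(hits)

# The pattern is a regular expression: "^anova" keeps only the names
# that start with "anova".
hits_start <- apropos("^anova")
print(hits_start)

# Operators and other non-conventional names must be quoted when asking
# for help. The call is left commented out because it opens the help viewer:
# help("*")
```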

3 Data with R

3.1 The ‘objects’

R works with objects which all have two intrinsic attributes: mode and length. The mode is
the kind of elements of an object; there are four modes: numeric, character, complex, and
logical. There are other modes but these do not characterize data, for instance: function,
expression, or formula. The length is the total number of elements of the object. The
following table summarizes the different objects manipulated by R.

object       possible modes                                      several modes possible
                                                                 in the same object?

vector       numeric, character, complex, or logical             No
factor       numeric, or character                               No
array        numeric, character, complex, or logical             No
matrix       numeric, character, complex, or logical             No
data.frame   numeric, character, complex, or logical             Yes
ts           numeric, character, complex, or logical             Yes
list         numeric, character, complex, logical, function,     Yes
             expression, or formula

A vector is a variable in the commonly admitted meaning. A factor is a categorical variable.
An array is a table with k dimensions, a matrix being a particular case of array with k = 2.
Note that the elements of an array or of a matrix are all of the same mode. A data.frame is a
table composed of several vectors all of the same length but possibly of different modes. A
ts is a time-series data set and so contains supplementary attributes such as the frequency and the
dates.

Among the non-intrinsic attributes of an object, one is to be kept in mind: it is dim which
corresponds to the dimensions of a multivariate object. For example, a matrix with 2 rows and
2 columns has for dim the pair of values [2,2], but its length is 4.
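
The attributes mode, length and dim can be inspected directly with the functions of the same names. The objects v, s, b and M below are made up for this sketch.

```r
v <- c(1.5, 2, 3.5)                    # a numeric vector
s <- c("a", "b")                       # a character vector
b <- c(TRUE, FALSE, TRUE)              # a logical vector
M <- matrix(1:4, nrow = 2, ncol = 2)   # a matrix with 2 rows and 2 columns

mode(v); mode(s); mode(b)   # "numeric", "character", "logical"
length(v)                   # 3
dim(M)                      # 2 2
length(M)                   # 4: the total number of elements
dim(v)                      # NULL: a plain vector has no dim attribute
```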

It is useful to know that R discriminates, for the names of the objects, the uppercase
characters from the lowercase ones (i.e., it is case-sensitive), so that x and X can be used to
name distinct objects (even under Windows):

> x <- 1; X <- 10
> ls()
[1] "X" "x"
> X
[1] 10
> x
[1] 1

3.2 Reading data from files

R can read data stored in text (ASCII) files; three functions can be used: read.table()
(which has two variants: read.csv() and read.csv2()), scan() and read.fwf(). For
example, if we have a file data.dat, one can just type:

> mydata <- read.table("data.dat")

mydata will then be a data.frame, and each variable will be named, by default, V1, V2, ...
and could be accessed individually by mydata$V1, mydata$V2, ..., or by mydata["V1"],
mydata["V2"], ..., or, still another solution, by mydata[,1], mydata[,2], etc. (see note 2). There are
several options available for the function read.table(); their default values (i.e. those
used by R if omitted by the user) and other details are given in the following table:

> read.table(file, header=FALSE, sep="", quote="\"'", dec=".", row.names=,
             col.names=, as.is=FALSE, na.strings="NA", skip=0, check.names=TRUE,
             strip.white=FALSE)

file          the name of the file (within ""), possibly with its path (the symbol \ is not
              allowed and must be replaced by /, even under Windows)
header        a logical (FALSE or TRUE) indicating if the file contains the names of the
              variables on its first line
sep           the field separator used in the file, for instance sep="\t" if it is a tabulation
quote         the characters used to cite the variables of mode character
dec           the character used for the decimal point
row.names     a vector with the names of the rows which can be a vector of mode character, or
              the number (or the name) of a variable of the file (by default: 1, 2, 3, ...)
col.names     a vector with the names of the variables (by default: V1, V2, V3, ...)
as.is         controls the conversion of character variables as factors (if FALSE) or keeps
              them as characters (TRUE)
na.strings    the value given to missing data (converted as NA)
skip          the number of lines to be skipped before reading the data
check.names   if TRUE, checks that the variable names are valid for R
strip.white   (conditional to sep) if TRUE, scan deletes extra spaces before and after the
              character variables

Two variants of read.table() are useful because they have different default options:

¥ ¥ ¥ ¥ ¥ G U T c T U U U c read.csv(file, header = TRUE, sep = ",", quote="\"", dec=".", ...)
read.csv2(file, header = TRUE, sep = ";", quote="\"", dec=",", ...) ¿ Â Á e ¿ ¦ ¤ ¦ Â Á e ¤ ¤ R § ¦ R ` ¦ ¦ ` ¤ The function scan() is more flexible than read.table() and has more options. The main
difference is that it is possible to specify the mode of the variables, for example :
a ¥ ¥ a ¥ S b S G F b S S F y b S S U R ` R § ¦ ¤ ` ¤ ` ¥ G b F F y ¦ R ¥ ¦ ¦ ¦ ¥ b c § ¥ G T ¤ ¦ ¥ ` ` ¦ ¥ G S F > mydata < scan("data.dat", what=list("",0,0))
` ¤ ¤ ` ¦ ¦ ` ¤ R § ¦ ¤ R ¦ ` ¤ ¦ reads in the file data.dat three variables, the first is of mode character and the next two are of
mode numeric. The options are as follows.
> scan(file="", what=double(0), nmax=-1, n=-1, sep="", quote=if (sep=="\n")
       "" else "'\"", dec=".", skip=0, nlines=0, na.strings="NA", flush=FALSE,
       strip.white=FALSE, quiet=FALSE)

file          the name of the file (within ""), possibly with its path (the symbol \ is not
              allowed and must be replaced by /, even under Windows); if file="", the data are
              input with the keyboard (the entry is terminated with a blank line)
what          specifies the mode(s) of the data
nmax          the number of data to read, or, if what is a list, the number of lines to read (by
              default, scan reads the data up to the end of file)
n             the number of data to read (by default, no limit)
sep           the field separator used in the file
quote         the characters used to cite the variables of mode character
dec           the character used for the decimal point
skip          the number of lines to be skipped before reading the data
nlines        the number of lines to read
na.strings    the value given to missing data (converted as NA)
flush         a logical, if TRUE, scan goes to the next line once the number of columns has
              been reached (allows the user to add comments in the data file)
strip.white   (conditional to sep) if TRUE, scan deletes extra spaces before and after the
              character variables
quiet         a logical, if FALSE, scan displays a line showing which fields have been read

Note 2: Nevertheless, there is a difference: mydata$V1 and mydata[,1] are vectors whereas
mydata["V1"] is a data.frame.

The function read.fwf() can be used to read in a file some data in fixed width format:

> read.fwf(file, widths, sep="\t", as.is=FALSE, skip=0, row.names,
           col.names)

The options are the same as for read.table() except widths which specifies the width of
the fields. For example, if the file data.txt has the following data:

A1.501.2
A1.551.3
B1.601.4
B1.651.5
C1.701.6
C1.751.7

one can read them with:

> mydata <- read.fwf("data.txt", widths=c(1,4,3))
> mydata
  V1   V2  V3
1  A 1.50 1.2
2  A 1.55 1.3
3  B 1.60 1.4
4  B 1.65 1.5
5  C 1.70 1.6
6  C 1.75 1.7
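
The three reading functions of this section can be tried without any existing data file, as in the sketch below. The temporary files created with tempfile() and writeLines(), and the object names, are assumptions of this example; they stand in for data.dat and data.txt.

```r
# Write a small space-separated file to a temporary location.
f <- tempfile(fileext = ".dat")
writeLines(c("A 1.50 1.2", "A 1.55 1.3", "B 1.60 1.4"), f)

# read.table() returns a data.frame with default names V1, V2, V3.
mydata <- read.table(f)
print(names(mydata))
v_dollar <- mydata$V2     # a vector
v_index  <- mydata[, 2]   # the same vector
df_name  <- mydata["V2"]  # still a data.frame, with one column

# scan() with 'what' given as a list: one character and two numeric variables.
fields <- scan(f, what = list("", 0, 0), quiet = TRUE)

# read.fwf() splits fixed-width records: 1, 4 and 3 characters here.
g <- tempfile(fileext = ".txt")
writeLines(c("A1.501.2", "A1.551.3", "B1.601.4"), g)
fixed <- read.fwf(g, widths = c(1, 4, 3))
print(fixed)

# Remove the temporary files.
unlink(c(f, g))
```

Comparing v_dollar, v_index and df_name shows the difference between extracting a vector and extracting a one-column data.frame.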

3.3 Saving data

The function write(x, file="data.txt") writes an object x (a vector, a matrix, or an
array) in the file data.txt. There are two options: nc (or ncol) which defines the number of
columns in the file (by default nc=1 if x is of mode character, nc=5 for the other modes), and
append (a logical) to add the data without erasing those possibly already present in the file
(TRUE), or erasing these (FALSE, the default value).

The function write.table() writes a data.frame in a file. The options are:

> write.table(x, file, append=FALSE, quote=TRUE, sep=" ", eol="\n",
              na="NA", dec=".", row.names=TRUE, col.names=TRUE)

sep         the field separator used in the file
col.names   a logical indicating whether the names of the columns are written in the file
row.names   idem for the names of the rows
quote       a logical or a numeric vector; if TRUE, the variables of mode character are quoted
            with ""; if a numeric vector, its elements give the indices of the columns to be
            quoted with "". In both cases, the names of the rows and of the columns are also
            quoted with "" if they are written.
dec         the character to be used for the decimal point
na          the character to be used for missing data
eol         the character to be used at the end of each line ("\n" is a carriage-return)

To record a group of objects in a binary form, we can use the function save(x, y, z,
file="Mystuff.RData"). To ease the transfer of data between different machines, the
option ascii=TRUE can be used. The data (which are now called an image) can be loaded later in
memory with load("Mystuff.RData"). The function save.image() is a shortcut for
save(list=ls(all=TRUE), file=".RData").
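
A round trip through a text file and through a binary file can be sketched as follows. The data and the temporary file names are assumptions of this example; tempfile() is used so that nothing is written in the working directory.

```r
x <- data.frame(group = c("A", "B"), value = c(1.5, 2.5))

# Text output: write.table() followed by read.table() to check the round trip.
txt <- tempfile(fileext = ".txt")
write.table(x, file = txt, quote = TRUE, sep = " ",
            row.names = FALSE, col.names = TRUE)
back <- read.table(txt, header = TRUE)
print(back)

# Binary output: save() stores the named objects, load() restores them
# under the same names and returns these names.
y <- 1:10
bin <- tempfile(fileext = ".RData")
save(x, y, file = bin)
rm(y)
restored <- load(bin)
print(restored)

# Remove the temporary files.
unlink(c(txt, bin))
```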

3.4 Generating data

3.4.1 Regular sequences

A regular sequence of integers, for example from 1 to 30, can be generated with:

> x <- 1:30

The resulting vector x has 30 elements. The operator ‘:’ has priority over the arithmetic
operators within an expression:

> 1:10-1
[1] 0 1 2 3 4 5 6 7 8 9
> 1:(10-1)
[1] 1 2 3 4 5 6 7 8 9

The function seq() can generate sequences of real numbers as follows:

> seq(1, 5, 0.5)
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

where the first number indicates the start of the sequence, the second one the end, and the
third one the increment to be used to generate the sequence. One can use also:

> seq(length=9, from=1, to=5)
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

It is also possible to type directly the values using the function c():

> c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5)
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

which gives exactly the same result, but is obviously longer. We shall see later that the
function c() is more useful in other situations. The function rep() creates a vector with
elements all identical:

> rep(1, 30)
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
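
The equivalences stated above can be verified with a few lines. The object names are chosen for this sketch, and length.out is the full name of the argument abbreviated to length in the text.

```r
a <- 1:10 - 1      # ':' is evaluated first: (1:10) - 1, i.e. 0 to 9
b <- 1:(10 - 1)    # the parentheses force the subtraction first: 1 to 9

s1 <- seq(1, 5, 0.5)                         # start, end, increment
s2 <- seq(length.out = 9, from = 1, to = 5)  # same values, given by length
s3 <- c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5)   # same values, typed directly

r <- rep(1, 30)    # thirty identical elements

print(a); print(b)
print(all.equal(s1, s2)); print(all.equal(s1, s3))
```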

The function sequence() creates a series of sequences of integers each ending by the
numbers given as arguments:

> sequence(4:5)
[1] 1 2 3 4 1 2 3 4 5
> sequence(c(10,5))
[1]  1  2  3  4  5  6  7  8  9 10  1  2  3  4  5

The function gl() is very useful because it generates regular series of factor variables. The
usage of this function is gl(k, n) where k is the number of levels (or classes), and n is the
number of replications in each level. Two options may be used: length to specify the number
of data produced, and labels to specify the names of the levels of the factor. Examples:

> gl(3,5)
[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
Levels: 1 2 3
> gl(3,5,30)
[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
Levels: 1 2 3
> gl(2,8, labels=c("Control","Treat"))
 [1] Control Control Control Control Control Control Control Control Treat
[10] Treat   Treat   Treat   Treat   Treat   Treat   Treat
Levels: Control Treat
> gl(2, 1, 20)
[1] 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2
Levels: 1 2
> gl(2, 2, 20)
[1] 1 1 2 2 1 1 2 2 1 1 2 2 1 1 2 2 1 1 2 2
Levels: 1 2

Finally, expand.grid() creates a data.frame with all combinations of vectors or factors
given as arguments:

> expand.grid(h=seq(60,80,10), w=seq(100,300,100), sex=c("Male","Female"))
    h   w    sex
1  60 100   Male
2  70 100   Male
3  80 100   Male
4  60 200   Male
5  70 200   Male
6  80 200   Male
7  60 300   Male
8  70 300   Male
9  80 300   Male
10 60 100 Female
11 70 100 Female
12 80 100 Female
13 60 200 Female
14 70 200 Female
15 80 200 Female
16 60 300 Female
17 70 300 Female
18 80 300 Female
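
The sizes of the objects produced by sequence(), gl() and expand.grid() follow directly from their arguments, as the sketch below illustrates with smaller, made-up arguments.

```r
# sequence(): one run 1..3 followed by one run 1..2
sq <- sequence(c(3, 2))
print(sq)

# gl(): 2 levels, 3 replications each, with explicit level names
f <- gl(2, 3, labels = c("Control", "Treat"))
print(levels(f))
print(table(f))    # 3 observations per level

# expand.grid(): one row per combination, so 2 * 2 * 2 = 8 rows
grid <- expand.grid(h = c(60, 80), w = c(100, 300), sex = c("Male", "Female"))
print(nrow(grid))
```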

3.4.2 Random sequences

It is classical in statistics to generate random data, and R can do it for a large number of
probability density functions. These functions are built on the following form:

> rfunc(n, p[1], p[2], ...)

where func indicates the law of probability, n the number of data to generate and p[1], p[2],
... are the values for the parameters of the law. The following table gives the details for each
law, and the possible default values (if no default value is indicated, this means that the
parameter must be specified by the user).

law                       command

Gaussian (normal)         rnorm(n, mean=0, sd=1)
exponential               rexp(n, rate=1)
gamma                     rgamma(n, shape, scale=1)
Poisson                   rpois(n, lambda)
Weibull                   rweibull(n, shape, scale=1)
Cauchy                    rcauchy(n, location=0, scale=1)
beta                      rbeta(n, shape1, shape2)
‘Student’ (t)             rt(n, df)
Fisher (F)                rf(n, df1, df2)
Pearson (χ2)              rchisq(n, df)
binomial                  rbinom(n, size, prob)
geometric                 rgeom(n, prob)
hypergeometric            rhyper(nn, m, n, k)
logistic                  rlogis(n, location=0, scale=1)
lognormal                 rlnorm(n, meanlog=0, sdlog=1)
negative binomial         rnbinom(n, size, prob)
uniform                   runif(n, min=0, max=1)
Wilcoxon’s statistics     rwilcox(nn, m, n), rsignrank(nn, n)

Note that all these functions can be used by replacing the letter r with d, p or q to get, respectively,
the probability density (dfunc(x)), the cumulative probability density (pfunc(x)), and the
value of quantile (qfunc(p), with 0 < p < 1).
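
The four prefixes r, d, p and q can be illustrated with the normal law. The seed value and the sample size below are arbitrary choices of this sketch.

```r
set.seed(1)                          # fix the seed so that the draw can be reproduced
x <- rnorm(1000, mean = 0, sd = 1)   # 1000 random values from N(0, 1)

d <- dnorm(0)       # density of N(0, 1) at 0, i.e. 1 / sqrt(2 * pi)
p <- pnorm(1.96)    # cumulative probability P(X <= 1.96), close to 0.975
q <- qnorm(0.975)   # quantile: the value q such that P(X <= q) = 0.975

# pnorm() and qnorm() are inverse functions of each other.
back <- pnorm(qnorm(0.3))

print(c(d, p, q, back))
```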
¥ T ¥ G c ¥ ¥ F ¥ F ¥ ¥ ¥ F S G F T S U ¥ S S ¥ U ¤ ¦ ¥ ¦ ¥ S R ¦ § § ¦ R ¥ S T ¤ ¥ ¦ F T G ¥ c b U ¥ U R ¦ § § ¤ ¥ S T ¦ F ¥ G T ¤ ¦ R ¦ ` ¥ R ¥ S U U c 3.4 Manipulating objects
Ö ~ ~
}  ¸
} z ° 3.4.1 Accessing a particular value of an object
To access, for example, the third value of a vector x, we just type x[3]. If x is a matrix or a data.frame, the value of the ith line and jth column is accessed with x[i,j]. To change all values of the third column, we can type:

> x[,3] <- 10.2
This indexing system is easily generalised to arrays, with as many indices as the number of dimensions of the array (for example, a three dimensional array: x[i,j,k], x[,,3], ...). It is useful to keep in mind that indexing is made with square brackets [ ], whereas parentheses ( ) are used for the arguments of a function:

> x(1)
Error: couldn’t find function "x"
Indexing can be used to suppress one or several lines or columns. For example, x[-1,] will suppress the first line, and x[-c(1,15),] will do the same for the 1st and 15th lines.
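A minimal sketch of these indexing rules, assuming a small hypothetical matrix (the names x, y and z are chosen only for the illustration). Note that a negative index returns a copy without the given lines; the original object is changed only if the result is assigned back to it.

```r
x <- matrix(1:12, nrow = 4, ncol = 3)  # hypothetical 4 x 3 matrix
x[2, 3]                 # single value: 2nd line, 3rd column
x[, 3]                  # the whole 3rd column
y <- x[-1, ]            # copy of x without its 1st line
z <- x[-c(1, 4), ]      # copy of x without its 1st and 4th lines
dim(y)                  # 3 lines and 3 columns remain
dim(z)                  # 2 lines and 3 columns remain
```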
For vectors, matrices and arrays, it is possible to access the values of an element with a comparison expression as index:

> x <- 1:10
> x[x >= 5] <- 20
> x
 [1]  1  2  3  4 20 20 20 20 20 20
> x[x == 1] <- 25
> x
 [1] 25  2  3  4 20 20 20 20 20 20

The six comparison operators used by R are: < (less than), > (greater than), <= (less than or equal to), >= (greater than or equal to), == (equal to), and != (different from). Note that these operators return a variable of mode logical (TRUE or FALSE).
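Since a comparison returns a logical vector, it can be stored, counted, or combined with other comparisons before being used as an index. A short sketch with hypothetical data:

```r
x <- c(3, 8, 1, 12, 5)       # hypothetical data
big <- x >= 5                # logical vector, one value per element of x
mode(big)                    # the mode of big is "logical"
x[big]                       # keeps only the elements where big is TRUE
sum(big)                     # number of TRUE values (TRUE counts as 1)
x[x > 2 & x != 12]           # two conditions combined with & (and)
```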
3.4.2 Arithmetics and simple functions

There are numerous functions in R to manipulate data. We have already seen the simplest one, c() which concatenates the objects listed in parentheses. For example:

> c(1:5, seq(10, 11, 0.2))
 [1]  1.0  2.0  3.0  4.0  5.0 10.0 10.2 10.4 10.6 10.8 11.0
Vectors can be manipulated with classical arithmetic expressions:

> x <- c(1,2,3,4)
> y <- c(1,1,1,1)
> z <- x + y
> z
[1] 2.0 3.0 4.0 5.0

Vectors of different lengths can be added; in this case, the shortest vector is recycled. Examples:

> x <- c(1,2,3,4)
> y <- c(1,2)
> z <- x+y
> z
[1] 2 4 4 6
> x <- c(1,2,3)
> y <- c(1,2)
> z <- x+y
Warning message:
longer object length
 is not a multiple of shorter object length in: x + y
> z
[1] 2 4 4
Note that R has given a warning message and not an error message, thus the operation has been done. If we want to add (or multiply) the same value to all the elements of a vector:

> x <- c(1,2,3,4)
> a <- 10
> z <- a*x
> z
[1] 10 20 30 40

The arithmetic operators are +, -, *, /, and ^ for powers.
Two other useful operators are x %% y for “x modulo y”, and x %/% y for integer division (returns the integer part of the division of x by y).
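The two operators are complementary: the integer part and the remainder together rebuild the original value. A small sketch with hypothetical values:

```r
x <- 17
y <- 5
q <- x %/% y        # integer part of the division of 17 by 5
r <- x %% y         # remainder of that division
q * y + r == x      # the two results rebuild x
(1:6) %% 2 == 0     # both operators are vectorised: flags the even values
```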
The functions available in R are too many to be listed here. One can find all basic mathematical functions (log, exp, log10, log2, sin, cos, tan, asin, acos, atan, abs, sqrt, ...), special functions (gamma, digamma, beta, besselI, ...), as well as diverse functions useful in statistics. Some of these functions are detailed in the following table.
sum(x)                sum of the elements of x
prod(x)               product of the elements of x
max(x)                maximum of the elements of x
min(x)                minimum of the elements of x
which.max(x)          returns the index of the greatest element of x
which.min(x)          returns the index of the smallest element of x
range(x)              has the same result as c(min(x),max(x))
length(x)             number of elements in x
mean(x)               mean of the elements of x
median(x)             median of the elements of x
var(x) or cov(x)      variance of the elements of x (calculated on n - 1); if x is a matrix or a
                      data.frame, the variance-covariance matrix is calculated
cor(x)                correlation matrix of x if it is a matrix or a data.frame (1 if x is a vector)
var(x,y) or cov(x,y)  covariance between x and y, or between the columns of x and the columns
                      of y if they are matrices or data.frames
cor(x,y)              linear correlation between x and y, or correlation matrix if they are
                      matrices or data.frames

These functions return a single value (thus a vector of length one), except range() which returns a vector of length two, and var(), cov() and cor() which may return a matrix. The following functions return more complex results.
round(x,n)            rounds the elements of x to n decimals
rev(x)                reverses the elements of x
sort(x)               sorts the elements of x in increasing order; to sort in decreasing order:
                      rev(sort(x))
rank(x)               ranks of the elements of x
log(x,base)           computes the logarithm of x with base base
pmin(x,y,...)         a vector whose ith element is the minimum of x[i], y[i], ...
pmax(x,y,...)         id. for the maximum
cumsum(x)             a vector whose ith element is the sum from x[1] to x[i]
cumprod(x)            id. for the product
cummin(x)             id. for the minimum
cummax(x)             id. for the maximum
match(x,y)            returns a vector of the same length as x with the positions in y of the
                      elements of x which are in y (else NA)
which(x==a)           returns a vector of the indices of x if the comparison operation is true
which(x!=a)           (TRUE), i.e. the values of i for which x[i]==a (or x!=a; the argument of
...                   this function must be a variable of mode logical)
choose(n,k)           computes the combinations of k events among n repetitions = n!/[(n - k)!k!]
na.omit(x)            suppresses the observations with missing data (NA) (suppresses the
                      corresponding line if x is a matrix or a data.frame)
na.fail(x)            returns an error message if x contains NA(s)
table(x)              returns a table with the numbers of the different values of x (typically for
                      integers or factors)
subset(x,...)         returns a selection of x with respect to criteria (...) depending on the mode
                      of x (typically comparisons: x$V1 < 10); if x is a data.frame, the option
                      select allows the user to identify variables to be kept (or dropped using a
                      minus sign -)
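A few of the functions of the table applied to the same vector give an idea of the kind of results they return. This is only a sketch with hypothetical data:

```r
x <- c(4, 1, 7, 1, 9)           # hypothetical data
cumsum(x)                       # running totals, same length as x
rev(sort(x))                    # values sorted in decreasing order
which(x == 1)                   # positions where x equals 1
which.max(x)                    # position of the largest value
match(c(7, 2), x)               # position of 7 in x, NA for 2 (absent from x)
table(x)                        # number of occurrences of each distinct value
```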
3.4.3 Matrix computation

R has facilities for matrix computation and manipulation. A matrix can be created with the function matrix():

> matrix(data=5, nr=2, nc=2)
     [,1] [,2]
[1,]    5    5
[2,]    5    5
> matrix(1:6, nr=2, nc=3)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
The functions rbind() and cbind() bind matrices with respect to the lines or the columns, respectively:

> m1 <- matrix(data=1, nr=2, nc=2)
> m2 <- matrix(data=2, nr=2, nc=2)
> rbind(m1,m2)
     [,1] [,2]
[1,]    1    1
[2,]    1    1
[3,]    2    2
[4,]    2    2
> cbind(m1,m2)
     [,1] [,2] [,3] [,4]
[1,]    1    1    2    2
[2,]    1    1    2    2
The operator for the product of two matrices is ‘%*%’. For example, considering the two matrices m1 and m2 above:

> rbind(m1,m2) %*% cbind(m1,m2)
     [,1] [,2] [,3] [,4]
[1,]    2    2    4    4
[2,]    2    2    4    4
[3,]    4    4    8    8
[4,]    4    4    8    8
> cbind(m1,m2) %*% rbind(m1,m2)
     [,1] [,2]
[1,]   10   10
[2,]   10   10
The transposition of a matrix is done with the function t(); this function also works with a data.frame.

The function diag() can be used to extract or modify the diagonal of a matrix, or to build a diagonal matrix.

> diag(m1)
[1] 1 1
> diag(rbind(m1,m2) %*% cbind(m1,m2))
[1] 2 2 8 8
> v <- c(10,20,30)
> diag(v)
     [,1] [,2] [,3]
[1,]   10    0    0
[2,]    0   20    0
[3,]    0    0   30
> diag(2.1, nr=3, nc=5)
     [,1] [,2] [,3] [,4] [,5]
[1,]  2.1  0.0  0.0    0    0
[2,]  0.0  2.1  0.0    0    0
[3,]  0.0  0.0  2.1    0    0
> diag(3)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
> diag(m1) <- 10
> m1
     [,1] [,2]
[1,]   10    1
[2,]    1   10
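The transposition with t() and the product ‘%*%’ can be combined. The following sketch uses a hypothetical matrix m (not one of the objects above) to show how the dimensions behave:

```r
m <- matrix(1:6, nrow = 2, ncol = 3)   # hypothetical 2 x 3 matrix
tm <- t(m)                             # its transpose, a 3 x 2 matrix
dim(tm)
p <- m %*% tm                          # (2 x 3) %*% (3 x 2) gives a 2 x 2 matrix
diag(p)                                # sums of squares of each line of m
```

The product is only defined when the number of columns of the first matrix equals the number of lines of the second one, which is why m is multiplied by its transpose here.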
4 Graphics with R

R offers a remarkable variety of graphics. To get an idea, one can type demo(graphics). It is not possible to detail here the possibilities of R in terms of graphics, particularly because each graphic function has a large number of options making the production of graphics very flexible. I will first give a few details on how to manage graphic windows.

4.1 Managing graphic windows

4.1.1 Opening several graphic windows

When a graphic function is typed, a graphic window is opened with the graph required. It is possible to open another window by typing:

> x11()

The window so opened becomes the active window, and the subsequent graphs will be displayed on it. To know the graphic windows which are currently open:

> dev.list()
windows windows
      2       3

The figures displayed under windows are the numbers of the windows which can be used to change the active window:

> dev.set(2)
windows
      2
4.1.2 Partitioning a graphic window
The function split.screen() partitions the active graphic window. For instance, split.screen(c(1,2)) divides the window in two parts which can be selected with screen(1) or screen(2); erase.screen() erases the last drawn graph.

The function layout() allows more complex partitions: it partitions the active graphic window in several parts where the graphs will be displayed successively. For example, to divide the window in four equal parts:

> layout(matrix(c(1,2,3,4), 2, 2))

where the vector gives the numbers of the subwindows, and the two figures 2 indicate that the window will be divided in two rows and two columns. The command:

> layout(matrix(c(1,2,3,4,5,6), 3, 2))

will create six subwindows arranged in three rows and two columns, whereas:

> layout(matrix(c(1,2,3,4,5,6), 2, 3))

will also create six subwindows, but arranged in two rows and three columns. The subwindows may be of different sizes:

> layout(matrix(c(1,2,3,3), 2, 2))

will open two subwindows, one above the other, in the left half of the window, and a third subwindow in the right half. Finally, to create an inset in a graphic:

> layout(matrix(c(1,1,2,1), 2, 2), c(3,1), c(1,3))

the vectors c(3,1) and c(1,3) giving the relative dimensions of the subwindows.

To visualize the partition created by layout() before drawing the graphs, we can use the function layout.show(2), if, for example, two subwindows have been defined.
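The partition with layout() can be tried without opening a window. The sketch below assumes a null pdf device (pdf(file = NULL)) so that it runs in any session; the data are hypothetical. The value returned by layout() is the number of subwindows, which can be passed to layout.show().

```r
pdf(file = NULL)                            # null device: nothing is written to disk
n <- layout(matrix(c(1, 2, 3, 3), 2, 2))    # returns the number of subwindows
layout.show(n)                              # outlines and numbers the subwindows
plot(1:10)                                  # drawn in subwindow 1
hist(c(2, 3, 3, 5, 8))                      # drawn in subwindow 2
boxplot(c(2, 3, 3, 5, 8))                   # drawn in subwindow 3
invisible(dev.off())                        # closes the device
```

In an interactive session the first and last lines can be dropped, and the graphs then appear in the active graphic window.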
4.2 Graphic functions

Here is a brief overview of the graphics functions in R.

plot(x)                    plot of the values of x (on the y-axis) ordered on the x-axis
plot(x,y)                  bivariate plot of x (on the x-axis) and y (on the y-axis)
sunflowerplot(x,y)         id. than plot() but the points with similar coordinates are drawn as
                           flowers whose petal number represents the number of points
piechart(x)                circular pie-chart
boxplot(x)                 “box-and-whiskers” plot
stripplot(x)               plot of the values of x on a line (an alternative to boxplot() for small
                           sample sizes)
coplot(x~y|z)              bivariate plot of x and y for each value of z (if z is a factor)
interaction.plot(f1,f2,y)  if f1 and f2 are factors, plots the means of y (on the y-axis) with respect
                           to the values of f1 (on the x-axis) and of f2 (different curves); the option
                           fun= allows to choose the summary statistic of y (by default fun=mean)
matplot(x,y)               bivariate plot of the first column of x vs. the first one of y, the second
                           one of x vs. the second one of y, etc.
dotplot(x)                 if x is a data.frame, plots a Cleveland dot plot (stacked plots line-by-line
                           and column-by-column)
pairs(x)                   if x is a matrix or a data.frame, draws all possible bivariate plots
                           between the columns of x
plot.ts(x)                 if x is an object of class ts, plot of x with respect to time, x may be
                           multivariate but the series must have the same frequency and dates
ts.plot(x)                 id. but if x is multivariate the series may have different dates and must
                           have the same frequency [3]
hist(x)                    histogram of the frequencies of x
barplot(x)                 histogram of the values of x
qqnorm(x)                  quantiles of x with respect to the values expected under a normal law
qqplot(x,y)                quantiles of y with respect to the quantiles of x
contour(x,y,z)             creates a contour plot (data are interpolated to draw the curves), x and y
                           must be vectors and z must be a matrix so that
                           dim(z)=c(length(x),length(y))
image(x,y,z)               id. but with colours (actual data are plotted)
persp(x,y,z)               id. but in 3-D (actual data are plotted)

For each function, the options may be found with the on-line help in R. Some of these options are identical for several graphic functions; here are the main ones (with their possible default values):

add=FALSE      if TRUE superposes the plot on the previous one (if it exists)
axes=TRUE      if FALSE does not draw the axes
type="p"       specifies the type of plot, "p": points, "l": lines, "b": points connected by
               lines, "o": id. but the lines are over the points, "h": vertical lines, "s": steps,
               the data are represented by the top of the vertical lines, "S": id. but the data
               are represented by the bottom of the vertical lines
xlab=, ylab=   annotates the axes, must be variables of mode character (either a character
               variable, or a string within "")
main=          main title, must be a variable of mode character
sub=           sub-title (written in a smaller font)

[3] The function ts.plot() is in the package ts and not in base as for the other graphic functions listed in the table (see § 5 for details on packages in R).
4.3 Low-level plotting commands

R has a set of graphic functions which affect an already existing graph: they are called low-level plotting commands. Here are the main ones:

points(x,y)              adds points (the option type= can be used)
lines(x,y)               id. but with lines
text(x,y,labels,...)     adds text given by labels at coordinates (x,y); a typical usage is:
                         plot(x,y,type="n"); text(x,y,names)
segments(x0,y0,x1,y1)    draws a line from point (x0,y0) to point (x1,y1)
arrows(x0,y0,x1,y1,      id. with an arrow at point (x0,y0) if code=1, at point (x1,y1) if code=2,
  angle=30, code=2)      or at both points if code=3; angle controls the angle from the shaft of
                         the arrow to the edge of the arrow head
abline(a,b)              draws a line of slope b and intercept a
abline(h=y)              draws a horizontal line at ordinate y
abline(v=x)              draws a vertical line at abscissa x
abline(lm.obj)           draws the regression line given by lm.obj (see § 5)
rect(x1,y1,x2,y2)        draws a rectangle whose left, right, bottom, and top limits are x1, x2, y1,
                         and y2, respectively
polygon(x,y)             draws a polygon linking the points with coordinates given by x and y
legend(x,y,legend)       adds the legend at the point (x,y) with symbols given by legend
title()                  adds a title and optionally a sub-title
axis(side,vect)          adds an axis at the bottom (side=1), on the left (2), at the top (3), or on
                         the right (4); vect (optional) gives the abscissa (or ordinates) where
                         tick-marks are drawn
rug(x)                   draws the data x on the x-axis as small vertical lines
locator(n, type="n",     returns the coordinates (x,y) after the user has clicked n times on the plot
  ...)                   with the mouse; also draws symbols (type="p") or lines (type="l") with
                         respect to optional graphic parameters (...); by default nothing is drawn
                         (type="n")
Note the possibility to add mathematical expressions on a plot with text(x, y, expression(...)), where the function expression() transforms its argument into a mathematical equation according to a coding similar to the one used by the typesetting system TeX. For example, text(x, y, expression(Uk[37]==over(1, 1+e^{-epsilon*(T-theta)}))) will display, on the plot, the following equation at the point of coordinates (x,y):

Uk_{37} = \frac{1}{1 + e^{-\epsilon (T - \theta)}}
To include a variable in an expression, we can use the function substitute() together with the function as.expression(); for example, to include a value of R² (previously computed and stored in an object named Rsquared):

> text(x, y, as.expression(substitute(R^2==r, list(r=Rsquared))))

will display on the plot at the point of coordinates (x,y):

R² = 0.9856298

To display only three decimals, we can modify the code as follows:

> text(x, y, as.expression(substitute(R^2==r, list(r=round(Rsquared,3)))))

which will result in:

R² = 0.986

Finally, to write the R in italics (as is the mathematical convention):

> text(x, y, as.expression(substitute(italic(R)^2==r,
  list(r=round(Rsquared,3)))))

R² = 0.986 (with the R in italics)
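What substitute() builds can be inspected before it is sent to text(). In the sketch below, the value of Rsquared and the object name lab are hypothetical, and a null pdf device is assumed so that the code runs without opening a window:

```r
Rsquared <- 0.9856298                      # hypothetical value, as in the text
# substitute() replaces r by its value and returns an unevaluated call
lab <- substitute(italic(R)^2 == r, list(r = round(Rsquared, 3)))
deparse(lab)                               # the label as a character string

pdf(file = NULL)                           # null device: nothing is written to disk
plot(1:10)
text(2, 9, as.expression(lab))             # draws the label at coordinates (2, 9)
invisible(dev.off())
```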
4.4 Graphic parameters

In addition to low-level plotting commands, the presentation of graphics can be improved with graphic parameters. They can be used either as options of graphic functions (but it does not work for all), or with the function par() to change permanently the graphic parameters, i.e. the subsequent plots will be drawn with respect to the parameters specified by the user. For instance, the following command:

> par(bg="yellow")

will draw all subsequent plots with a yellow background. There are 68 graphic parameters, some of them have very close functions. The exhaustive list of graphic parameters can be read with ?par; I will limit the following table to the most usual ones.
adj     controls text justification (0 left-justified, 0.5 centred, 1 right-justified)
bg      specifies the colour of the background (e.g.: bg="red", bg="blue", ... the list of the 657
        available colours is displayed with colors())
bty     controls the type of box drawn around the plot, allowed values are: "o", "l", "7", "c", "u"
        or "]" (the box looks like the corresponding character); if bty="n" the box is not drawn
cex     a value controlling the size of texts and symbols with respect to the default; the
        following parameters have the same control for numbers on the axes, cex.axis,
        annotations on the axes, cex.lab, the title, cex.main, and the sub-title, cex.sub
col     controls the colour of symbols; as for cex there are: col.axis, col.lab, col.main, col.sub
font    an integer which controls the style of text (1: normal, 2: bold, 3: italics, 4: bold
        italics); as for cex there are: font.axis, font.lab, font.main, font.sub
las     an integer which controls the orientation of annotations on the axes (0: parallel to the
        axes, 1: horizontal, 2: perpendicular to the axes, 3: vertical)
lty     controls the type of lines, can be an integer (1: solid, 2: dashed, 3: dotted, 4: dotdash,
        5: longdash, 6: twodash), or a string of up to eight characters (between "0" and "9") which
        specifies alternatively the length, in points or pixels, of the drawn elements and the
        blanks, for example lty="44" will have the same effect as lty=2
lwd     a numeric which controls the width of lines
mar     a vector of 4 numeric values which control the space between the axes and the border of
        the figure of the form c(bottom, left, top, right), the default values are
        c(5.1, 4.1, 4.1, 2.1)
mfcol   a vector of the form c(nr,nc) which partitions the graphic window as a matrix of nr lines
        and nc columns, the plots are then drawn in columns (cf. § 4.1.2)
mfrow   id. but the plots are drawn in rows (cf. § 4.1.2)
pch     controls the type of symbol, either an integer between 1 and 25, or any single character
        within ""
ps      an integer which controls the size in points of texts and symbols
pty     a character which specifies the type of the plotting region, "s": square, "m": maximal
tck     a value which specifies the length of tick-marks on the axes as a fraction of the smallest
        of the width or height of the plot; if tck=1 a grid is drawn
tcl     a value which specifies the length of tick-marks on the axes as a fraction of the height
        of a line of text (by default tcl=-0.5)
xaxt    if xaxt="n" the x-axis is set but not drawn (useful in conjunction with axis(side=1, ...))
yaxt    if yaxt="n" the y-axis is set but not drawn (useful in conjunction with axis(side=2, ...))
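Because par() changes the parameters permanently, it is often convenient to keep the previous values and to restore them afterwards. The sketch below relies on the fact that par() returns the old values of the parameters it modifies; the object names and the null pdf device are assumptions made for the illustration.

```r
pdf(file = NULL)                         # null device, so the sketch runs in any session
old <- par(bg = "yellow", las = 1)       # par() returns the previous values (invisibly)
plot(1:10, pch = 19, col = "blue")       # pch and col given as options of plot()
now <- par("bg")                         # value of bg while the change is active
par(old)                                 # restores the previous values
after <- par("bg")                       # value of bg after the restoration
invisible(dev.off())                     # closes the device
```

Parameters given as options of a graphic function, like pch and col here, only affect that call, whereas those set with par() stay in force until they are changed again.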
5 Statistical analyses with R

Even more than for graphics, it is impossible here to go into the details of the possibilities offered by R with respect to statistical analyses. A wide range of functions is available in the base package and in others distributed with base.

Several contributed packages increase the potentialities of R. They are distributed separately and must be loaded in memory to be used by R. An exhaustive list of the contributed packages, together with their descriptions, is at the following URL: http://cran.r-project.org/src/contrib/PACKAGES.html. Among the most remarkable ones, there are:

gee         generalised estimating equations;
multiv      multivariate analyses, includes correspondence analysis (by contrast to mva which
            is distributed with base);
nlme        linear and nonlinear models with mixed-effects;
survival5   survival analyses;
tree        trees and classification;
tseries     time-series analyses (has more methods than ts which is distributed with base).
Jim K. Lindsey distributes on his site (http://alpha.luc.ac.be/~jlindsey/rcode.html) several interesting packages:

dna        manipulation of molecular sequences (includes the ports of ClustalW and of flip);
gnlm       nonlinear generalized models;
stable     probability functions and generalized regressions for stable distributions;
growth     models for normal repeated measures;
repeated   models for nonnormal repeated measures;
event      models and procedures for historical processes (branching, Poisson, ...);
rmutil     tools for nonlinear regressions and repeated measures.
“An Introduction to R” (pp 51-63) gives an excellent introduction to statistical models with R. Only some points are given here so that a new user can take the first steps. The main statistical functions in the base package are:

lm       linear models;
glm      generalised linear models;
aov      analysis of variance;
anova    comparison of models;
loglin   log-linear models;
nlm      nonlinear minimisation of functions.

For example, if we have two vectors x and y each with five observations, and we wish to perform a linear regression of y on x:

> x <- 1:5
> y <- rnorm(5)
> lm(y~x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
     0.2252       0.1809
As for any function in R, the result of lm(y~x) can be copied in an object:

> mymodel <- lm(y~x)

If we type mymodel, the display will be the same as previously. Several functions allow the user to display details relative to a statistical model; among the useful ones, summary() displays details on the results of a model fitting procedure (statistical tests, ...), residuals() displays the regression residuals, predict() displays the values predicted by the model, and coef() displays a vector with the parameter estimates.

> summary(mymodel)

Call:
lm(formula = y ~ x)

Residuals:
     1      2      3      4      5
1.0070 1.0711 0.2299 0.3550 0.6490

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.2252     1.0062   0.224    0.837
x             0.1809     0.3034   0.596    0.593

Residual standard error: 0.9594 on 3 degrees of freedom
Multiple R-Squared: 0.1059,     Adjusted R-squared: -0.1921
F-statistic: 0.3555 on 1 and 3 degrees of freedom,      p-value: 0.593

> residuals(mymodel)
        1         2         3         4         5
1.0070047 1.0710587 0.2299374 0.3549681 0.6489594
> predict(mymodel)
        1         2         3         4         5
0.4061329 0.5870257 0.7679186 0.9488115 1.1297044
> coef(mymodel)
(Intercept)           x
  0.2252400   0.1808929
It may be useful to use these values in subsequent computations, for instance, to compute the values predicted by a model for new data:

> a <- coef(mymodel)[1]
> b <- coef(mymodel)[2]
> newdata <- c(11, 13, 18)
> a + b*newdata
[1] 2.215062 2.576847 3.481312
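The same predicted values can be obtained directly with predict() and its argument newdata, which must be a data.frame with a column named like the predictor. This is a minimal sketch: the data are generated again with a hypothetical seed, so the numbers differ from those shown above.

```r
set.seed(42)                               # hypothetical data, not the values shown above
x <- 1:5
y <- rnorm(5)
mymodel <- lm(y ~ x)
newdata <- c(11, 13, 18)
# computation by hand from the two coefficients
by_hand <- coef(mymodel)[1] + coef(mymodel)[2] * newdata
# same computation done by predict(); the column name x matches the formula
by_predict <- predict(mymodel, newdata = data.frame(x = newdata))
all.equal(unname(by_hand), unname(by_predict))
```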
To list the elements in the results of an analysis, we can use the function names(); in fact, this function may be used with any object in R.

> names(mymodel)
 [1] "coefficients"  "residuals"     "effects"       "rank"
 [5] "fitted.values" "assign"        "qr"            "df.residual"
 [9] "xlevels"       "call"          "terms"         "model"
> names(summary(mymodel))
 [1] "call"          "terms"         "residuals"     "coefficients"
 [5] "sigma"         "df"            "r.squared"     "adj.r.squared"
 [9] "fstatistic"    "cov.unscaled"

The elements may be extracted in the following way:

> summary(mymodel)["r.squared"]
$r.squared
[1] 0.09504547
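Extraction with single brackets returns a list of length one, as the $r.squared label in the display suggests. To get the number itself, for use in a computation, the operators $ or [[ ]] can be used instead. A sketch with hypothetical data and object names:

```r
set.seed(42)                          # hypothetical data
x <- 1:5
y <- rnorm(5)
s <- summary(lm(y ~ x))
a <- s["r.squared"]                   # a list of length one
b <- s$r.squared                      # the number itself
d <- s[["r.squared"]]                 # same result as s$r.squared
is.list(a)
is.numeric(b)
```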
e e e Formulae are a keyingrediant in statistical analyses with R: the notation used is actually the
same for (almost) all functions. A formula is typically of the form y ~ model where y is the
analysed response and model is a set of terms for which some parameters are to be estimated.
These terms are separated with arithmetic symbols but they have here a particular meaning.

a+b          additive effects of a and of b
a:b          interactive effect between a and b
a*b          identical to a+b+a:b
poly(a,n)    polynomials of a up to degree n
^n           includes all interactions up to level n, e.g. (a+b+c)^2 is identical to
             a+b+c+a:b+a:c+b:c
b%in%a       the effects of b are nested in a (identical to a+a:b)
a-b          removes the effect of b, for example: (a+b+c)^2-a:b is identical to
             a+b+c+a:c+b:c; y~x-1 forces the regression through the origin
             (identical for y~x+0 or y~0+x)

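These equivalences can be checked without fitting any model, since terms() expands a formula symbolically. In the sketch below, labels_of() is a hypothetical helper written for this illustration, and a, b, c, x, y are just symbols (no data are needed).

```r
# Hypothetical helper: term labels of a formula after expansion by R
labels_of <- function(f) attr(terms(f), "term.labels")

labels_of(y ~ a * b)                 # "a" "b" "a:b"
labels_of(y ~ (a + b + c)^2)         # "a" "b" "c" "a:b" "a:c" "b:c"
labels_of(y ~ (a + b + c)^2 - a:b)   # "a" "b" "c" "a:c" "b:c"

# Removing the intercept: the attribute "intercept" is 0 when it is dropped
attr(terms(y ~ x - 1), "intercept")
attr(terms(y ~ x + 0), "intercept")
```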
We see that the arithmetic operators of R have in a formula a different meaning than the one they
have in a classical expression. For example, the formula y~x1+x2 defines the model y = β1x1 +
β2x2 + α, and not (as it would if the operator + had its usual meaning) y = β(x1 + x2) + α. To
include arithmetic operations in a formula, we can use the function I(): the formula
y~I(x1+x2) defines the model y = β(x1 + x2) + α.
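The difference between the two formulae shows up in the number of estimated coefficients. The following sketch assumes simulated variables x1, x2 and y (all illustrative):

```r
# Illustrative data
set.seed(2)
x1 <- rnorm(20)
x2 <- rnorm(20)
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(20)

m_sep <- lm(y ~ x1 + x2)      # one slope per variable
m_sum <- lm(y ~ I(x1 + x2))   # a single slope for the sum x1 + x2

length(coef(m_sep))   # intercept, x1, x2
length(coef(m_sum))   # intercept, I(x1 + x2)
names(coef(m_sum))
```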
The following table lists standard packages which are distributed with base.

Package    Description
ctest      classical tests (Fisher, “Student”, Wilcoxon, Pearson, Bartlett, Kolmogorov-Smirnov, ...)
eda        methods described in “Exploratory data analysis” by Tukey
lqs        resistant regression and estimation of covariance
modreg     modern regression: smoothing and local regression
mva        multivariate analyses
nls        nonlinear regression
splines    splines
stepfun    empirical distribution functions
ts         time-series analyses

A package must be loaded in memory to be used:

> library(eda)

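To check that a package is attached after calling library(), we can look at the search path. This sketch uses the package splines from the table above and assumes an R installation in which it is shipped as a separate package:

```r
library(splines)                  # load the package in memory
"package:splines" %in% search()   # TRUE once the package is attached
```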
6 The programming language R
6.1 Loops and conditional executions
An advantage of R compared to software with pull-down menus is the possibility to program
simply a series of analyses which will be executed successively. Let us consider a few
examples to get an idea.
Suppose we have a vector x, and for each element of x with the value b, we want to give the
value 0 to another variable y, else 1. We first create a vector y of the same length as x:

> y <- numeric(length(x))
> for (i in 1:length(x)) if (x[i] == b) y[i] <- 0 else y[i] <- 1

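The loop can be tried on a small example. The values of x and b below are illustrative; the sketch also shows ifelse(), a vectorised function which gives the same result without an explicit loop.

```r
x <- c("a", "b", "c", "b")   # illustrative values
b <- "b"

# Loop version, as in the text
y <- numeric(length(x))
for (i in seq_along(x)) if (x[i] == b) y[i] <- 0 else y[i] <- 1

# Vectorised version with ifelse()
y2 <- ifelse(x == b, 0, 1)

y
identical(y, y2)
```

Here seq_along(x) is used instead of 1:length(x) so that the loop is simply skipped when x is empty.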
Several instructions can be executed if they are placed within braces:

> for (i in 1:length(x))
> {
>   y[i] <- 0
    ...
> }

> if (x[i] == b)
> {
>   y[i] <- 0
    ...
> }

Another possible situation is to execute an instruction as long as a condition is true:

> while (myfun > minimum)
> {
    ...
> }

Typically, an R program is written in a file saved in ASCII format and named with the
extension .R. In the following example, we want to do the same plot for three different
species, the data being in three distinct files; the file names and species names are thus used as
variables. The first command partitions the graphic window into three parts arranged as rows.

layout(matrix(c(1,2,3), 3, 1))       # partition the window
for(i in 1:3) {
  if (i==1) { file <- "Swal.dat"; species <- "swallow" }
  if (i==2) { file <- "Wren.dat"; species <- "wren" }
  if (i==3) { file <- "Dunn.dat"; species <- "dunnock" }
  data <- read.table(file)           # read the data
  plot(data$V1, data$V2, type="l")
  title(species)                     # adds the title
}

The character # is used to add comments in a program: R ignores the rest of the line and goes to
the next one. Note that there are no quotes "" around file in the function read.table() since it
is a variable of mode character. The command title adds a title on a plot already displayed. A
variant of this program is given by:

layout(matrix(c(1,2,3), 3, 1))       # partition the window
species <- c("swallow", "wren", "dunnock")
file <- c("Swal.dat", "Wren.dat", "Dunn.dat")
for(i in 1:3) {
  data <- read.table(file[i])        # read the data
  plot(data$V1, data$V2, type="l")
  title(species[i])                  # add the title
}

These programs will work correctly if the data files *.dat are located in the working directory
of R; if they are not, the user must either change the working directory, or specify the path in
the program (for example: file <- "C:/data/Swal.dat"). If the program is written in the
file Mybirds.R, it will be called by typing:

> source("C:/data/Mybirds.R")

or selecting it with the appropriate pull-down menu under Windows. Note that you must use the
symbol “slash” (/) and not “backslash” (\), even under Windows.
6.2 Writing your own functions
We have seen that most of the work of R is done with functions with arguments given within
parentheses. The user can actually write his/her own functions, and they will have the same
properties as any other function in R. Writing your own functions allows a more efficient,
flexible, and rational use of R. Let us come back to the above example of reading data in a
file, then plotting them. If we want to do a similar analysis with any species, it may be a good
idea to write a function to do this job:

myfun <- function(S, F) {
  data <- read.table(F)
  plot(data$V1, data$V2)
  title(S)
}

Then, we can, with a single command, read the data and plot them, for example
myfun("swallow", "Swal.dat"). To do as in the two previous programs, we can type:

> layout(matrix(c(1,2,3), 3, 1))
> myfun("swallow", "Swal.dat")
> myfun("wren", "Wren.dat")
> myfun("dunnock", "Dunn.dat")

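To try myfun() without the bird data files, we can write a small file ourselves. The sketch below is self-contained under a few assumptions: the data are made up, the file is written to a temporary location, the plot is sent to a null device with pdf(NULL), and myfun() is extended with invisible(data) (an addition for this illustration) so that the data read can be inspected.

```r
myfun <- function(S, F) {
  data <- read.table(F)
  plot(data$V1, data$V2)
  title(S)
  invisible(data)   # addition for this sketch: return the data silently
}

# Hypothetical data file written to a temporary location
f <- tempfile(fileext = ".dat")
write.table(data.frame(1:10, (1:10)^2), f, row.names = FALSE, col.names = FALSE)

pdf(NULL)                  # null graphics device: no file or window is created
d <- myfun("swallow", f)
dev.off()

dim(d)                     # number of rows and columns read from the file
```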
As a second example, here is a function to get a bootstrap sample with a pseudo-random
resampling of a variable x. The technique used here is to select randomly an observation with
the pseudo-random number generator according to the uniform law; the operation is repeated
as many times as the number of observations. The first step is to extract the sample size of x
with the function length and store it in n; then x is copied in a vector named sample (this
operation ensures that sample will have the same characteristics (mode, ...) as x). A random
number uniformly distributed between 0 and n is drawn and rounded up to the next integer value
using the function ceiling(), which is a variant of round() (see ?round for more details and
other variants): this results in drawing randomly an integer between 1 and n. The
corresponding value of x is extracted and stored in sample, which is finally returned using the
function return().

bootsamp <- function(x) {
  n <- length(x)
  sample <- x
  for (i in 1:n) {
    u <- ceiling(runif(1, 0, n))
    sample[i] <- x[u]
  }
  return(sample)
}

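R also provides a built-in function sample() which draws with replacement when replace=TRUE. The sketch below compares it with the loop approach on illustrative data; the local vector is named out here (instead of sample) only so that the built-in function is not hidden by a variable of the same name.

```r
bootsamp <- function(x) {
  n <- length(x)
  out <- x
  for (i in 1:n) {
    u <- ceiling(runif(1, 0, n))   # random index between 1 and n
    out[i] <- x[u]
  }
  out
}

set.seed(42)
x <- c(2.1, 3.4, 5.6, 7.8, 9.0)    # illustrative data
b1 <- bootsamp(x)
b2 <- sample(x, replace = TRUE)    # built-in resampling with replacement

length(b1) == length(x)
all(b1 %in% x)
all(b2 %in% x)
```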
Thus, one can, with a few, relatively simple lines of code, program a method of bootstrap with
R. This function can then be called by a second function to compute the standard error of an
estimated parameter, for instance the mean:

meanboot <- function(x, rep=500) {
  M <- numeric(rep)
  for (i in 1:rep) M[i] <- mean(bootsamp(x))
  print(sqrt(var(M)))   # standard error: square root of the variance of the bootstrap means
}

Note the default value of the argument rep: it can be omitted if we are satisfied with
500 replications. The two following commands are thus equivalent:

> meanboot(x)
> meanboot(x, rep=500)

If we want to increase the number of replications, we can do, for example:

> meanboot(x, rep=5000)

The use of default values in the definition of a function is, of course, very useful and adds to
the flexibility of the system.
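As a rough check of the bootstrap, the result can be compared with the classical standard error of the mean, sd(x)/sqrt(n). The sketch below makes several assumptions: the data are simulated, bootsamp() is written in a compact vectorised form, and boot_se() is a hypothetical variant of meanboot() which returns the standard error instead of printing a value.

```r
# Compact, vectorised form of the resampling function
bootsamp <- function(x) x[ceiling(runif(length(x), 0, length(x)))]

# Hypothetical variant of meanboot(): returns the bootstrap standard error
boot_se <- function(x, rep = 500) {
  M <- numeric(rep)
  for (i in 1:rep) M[i] <- mean(bootsamp(x))
  sqrt(var(M))
}

set.seed(123)
x <- rnorm(50, mean = 10, sd = 2)      # illustrative data
se_boot   <- boot_se(x)                # uses the default rep = 500
se_theory <- sd(x) / sqrt(length(x))   # classical standard error of the mean

c(bootstrap = se_boot, classical = se_theory)
```

The two values should be close but not identical, since the bootstrap estimate depends on the random resampling.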
The last example of a function is not purely statistical, but it illustrates well the flexibility of R.
Suppose we wish to study the behaviour of a nonlinear model: Ricker’s model, defined by:

$$N_{t+1} = N_t \exp\left[ r \left( 1 - \frac{N_t}{K} \right) \right]$$

This model is widely used in population dynamics, particularly of fish. We want, using a
function, to simulate this model with respect to the growth rate r and the initial number in the
population N0 (the carrying capacity K is often taken equal to 1 and this value will be taken as
default); the results will be displayed as a plot of numbers with respect to time. We will add
an option to allow the user to display only the numbers in the last time steps (by default all
results will be plotted). The function below can do this numerical analysis of Ricker’s model.

ricker <- function(nzero, r, K=1, time=100, from=0, to=time) {
  N <- numeric(time+1)
  N[1] <- nzero
  for (i in 1:time) N[i+1] <- N[i]*exp(r*(1 - N[i]/K))
  Time <- 0:time
  plot(Time, N, type="l", xlim=c(from,to))
}

Try it yourself with:

> layout(matrix(1:3, 3, 1))
> ricker(0.1, 1); title("r = 1")
> ricker(0.1, 2); title("r = 2")
> ricker(0.1, 3); title("r = 3")

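To look at the numbers rather than at the plots, the same recursion can be written so that it returns the vector N. In this sketch, ricker_sim() is a hypothetical helper (not part of the programs above) which uses the same default values as ricker().

```r
# Hypothetical helper: same recursion as ricker(), but returns N instead of plotting
ricker_sim <- function(nzero, r, K = 1, time = 100) {
  N <- numeric(time + 1)
  N[1] <- nzero
  for (i in 1:time) N[i + 1] <- N[i] * exp(r * (1 - N[i] / K))
  N
}

N1 <- ricker_sim(0.1, 1)   # r = 1: the population settles at K
N3 <- ricker_sim(0.1, 3)   # r = 3: the population keeps fluctuating

tail(N1, 3)                # last three values for r = 1
range(tail(N3, 20))        # spread of the last twenty values for r = 3
```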
7 How to go farther with R?
The basic reference on R is a collective document by its developers (the “R Development Core
Team”):
R Development Core Team. 2000. An Introduction to R.
http://cran.r-project.org/doc/manuals/R-intro.pdf.
If you install the latest version of R (1.1.1), you will find in the directory RHOME/doc/manual/
(RHOME is the path where R is installed) three files in PDF format, including “An
Introduction to R” and the reference manual “The R Reference Index” (refman.pdf) detailing
all functions of R.
The R-FAQ gives a very general introduction to R:
http://cran.r-project.org/doc/FAQ/R-FAQ.html
For those interested in the history and development of R:
Ihaka R. 1998. R: Past and Future History.
http://cran.r-project.org/doc/html/interface98-paper/paper.html.
There are three discussion lists on R; to subscribe see:
http://cran.r-project.org/doc/html/mail.html
Several statisticians have written documents on R, for example:
Altham P.M.E. 1998. Introduction to generalized linear modelling in R. University of
Cambridge, Statistical Laboratory. http://www.statslab.cam.ac.uk/~pat.
Maindonald J.H. 2000. Data Analysis and Graphics Using R—An Introduction. Statistical
Consulting Unit of the Graduate School, Australian National University.
http://room.anu.edu.au/~johnm/
Finally, if you mention R in a publication, cite the original article:
Ihaka R. & Gentleman R. 1996. R: a language for data analysis and graphics. Journal of
Computational and Graphical Statistics 5: 299-314.
8 Index
This index covers the functions and operators introduced in this document.

Operators and symbols: #  %%  %*%  %/%  %in%  *  +  -  /  !=  :  ;  <  <-  <=  ==  >  >=  ?  []  ^  {}  ~

a: abline, abs, acos, anova, aov, apropos, array, arrows, asin, atan, axis
b: barplot, besselI, beta, boxplot
c: c, cbind, character, choose, coef, complex, contour, coplot, cor, cos, cov, cummax, cummin,
   cumprod, cumsum
d: data.frame, dev.list, dev.set, diag, digamma, dim, dotplot
e: else, erase.screen, exp, expand.grid, expression
f: for, function
g: gamma, gl, glm
h: help, help.start, hist
i: I, if, image
l: layout, layout.show, legend, length, library, lines, list, lm, load, locator, log, log10, log2,
   logical, loglin, ls, ls.str
m: match, matplot, matrix, max, mean, median, min, mode
n: na.fail, na.omit, names, nlm, numeric
p: pairs, par, persp, piechart, plot, plot.ts, pmax, pmin, points, poly, polygon, predict, prod
q: qqnorm, qqplot
r: range, rank, rbeta, rbind, rbinom, rcauchy, rchisq, read.csv, read.csv2, read.fwf, read.table,
   rect, rep, residuals, rev, rexp, rf, rgamma, rgeom, rhyper, rlnorm, rlogis, rm, rnbinom,
   rnorm, round, rpois, rsignrank, rt, rug, runif, rweibull, rwilcox
s: save, save.image, scan, screen, segments, seq, sequence, sin, sort, source, split.screen, sqrt,
   stripplot, subset, substitute, sum, summary, sunflowerplot
t: t, table, tan, text, title, ts, ts.plot
v: var
w: which, which.max, which.min, while, write, write.table
x: x11
This note was uploaded on 11/17/2011 for the course STOR 664 taught by Professor Staff during the Fall '11 term at UNC.