Descriptive methods 27
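exponential function is given as
$$\exp(\lambda) = \sum_{n=0}^{\infty} \frac{\lambda^n}{n!}.$$
If $\lambda > 0$ the numbers
$$p(n) = \exp(-\lambda)\frac{\lambda^n}{n!}$$
are positive and
$$\sum_{n=0}^{\infty} \exp(-\lambda)\frac{\lambda^n}{n!} = \exp(-\lambda)\sum_{n=0}^{\infty} \frac{\lambda^n}{n!} = \exp(-\lambda)\exp(\lambda) = 1.$$
Hence $(p(n))_{n \in \mathbb{N}_0}$ can be regarded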
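The computation can be checked numerically. The R sketch below uses the illustrative value λ = 5 (any λ > 0 would do) and truncates the infinite sum at n = 100, where the remaining terms are negligible. dpois is R's built-in Poisson point probability.

```r
lambda <- 5                                    # illustrative choice of the parameter
n <- 0:100                                     # truncate the infinite sum at n = 100
p <- exp(-lambda) * lambda^n / factorial(n)    # p(n) = exp(-lambda) lambda^n / n!
all(p > 0)                                     # TRUE: all numbers are positive
sum(p)                                         # 1, up to floating-point error
all.equal(p, dpois(n, lambda))                 # TRUE: same as R's Poisson point probabilities
```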
34 Probability Theory
Figure 2.5: The logistic distribution function (left, see Example 2.6.4). The Gumbel
distribution function (right, see Example 2.6.5). Note the characteristic S-shape of
both di
Probability measures on the real line 33
We immediately observe that since $(-\infty, y] \cup (y, x] = (-\infty, x]$ for $y < x$, and since
the sets $(-\infty, y]$ and $(y, x]$ are disjoint, the additive property implies that
Figure 2.3: Example of a theoretical bar plot.
sesses a density $f$. A histogram is then an approximation to the density $f$, and thus it
is a function
$$\hat{f} : \mathbb{R} \to [0, \infty).$$
When plotting this function the convention is, however, to plot r
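As a sketch of how the histogram values can be computed, assuming simulated N(0, 1) data as a hypothetical dataset: with bins (b_{j-1}, b_j], the value of the histogram on bin j is the relative frequency of the bin divided by its width. The direct computation below is compared with the density component returned by R's hist, which by default uses right-closed bins.

```r
set.seed(1)
x <- rnorm(200)                             # hypothetical data: 200 N(0, 1) draws
breaks <- seq(-5, 5, by = 0.5)              # equidistant bins covering the data
h <- hist(x, breaks = breaks, plot = FALSE)

# Direct computation: relative frequency in each bin divided by the bin width
counts <- table(cut(x, breaks = breaks))    # cut() also uses (a, b] bins
fhat <- as.vector(counts) / (length(x) * diff(breaks))

all.equal(fhat, h$density)                  # TRUE
sum(fhat * diff(breaks))                    # 1: the total area of the bars
```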
If the observations are all realizations from the same probability measure $P$, the
sample mean is an estimate of the (unknown) mean under $P$, provided there is a
mean. Likewise, the
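A small simulation illustrates the point, assuming observations drawn from a Poisson distribution with λ = 5, whose mean and variance are both λ. The sample size and the seed are arbitrary choices.

```r
set.seed(2)
lambda <- 5                  # illustrative Poisson parameter; the mean under P is lambda
x <- rpois(10000, lambda)    # hypothetical observations, all from the same P
mean(x)                      # sample mean, close to 5
var(x)                       # sample variance, close to the variance 5
```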
Probability measures on discrete sets 23
Definition 2.4.4. If $E$ is discrete we call $(p(x))_{x \in E}$ a vector of point probabilities
indexed by $E$ if $0 \leq p(x) \leq 1$ and
$$\sum_{x \in E} p(x) = 1. \tag{2.8}$$
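The two conditions in Definition 2.4.4 are easy to check for a finite E. The helper is_point_prob below is hypothetical (it is not a function from the text or from R), and its tolerance tol absorbs floating-point rounding in the sum.

```r
# Hypothetical helper: TRUE if 0 <= p(x) <= 1 for all x and the p(x) sum to 1
is_point_prob <- function(p, tol = 1e-12) {
  all(p >= 0 & p <= 1) && abs(sum(p) - 1) < tol
}

# Uniform point probabilities on E = {A, C, G, T}
p_dna <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)
is_point_prob(p_dna)           # TRUE
is_point_prob(c(0.5, 0.6))     # FALSE: the sum is 1.1
is_point_prob(c(-0.5, 1.5))    # FALSE: sums to 1 but the entries are out of range
```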
We define a probability
Figure 2.10: The density for the β-distribution (Example 2.6.12) with parameters
$(\lambda_1, \lambda_2) = (4, 2)$ (left) and $(\lambda_1, \lambda_2) = (0.5, 3)$ (right).
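The densities in Figure 2.10 can be evaluated directly. This sketch assumes the standard β-density f(x) = x^(λ1 - 1) (1 - x)^(λ2 - 1) / B(λ1, λ2) on (0, 1), which is what R's dbeta computes with shape1 = λ1 and shape2 = λ2; beta() is R's beta function B.

```r
lambda1 <- 4; lambda2 <- 2      # the parameters of the left panel
f <- function(x) x^(lambda1 - 1) * (1 - x)^(lambda2 - 1) / beta(lambda1, lambda2)
f(0.75)                         # 2.109375
dbeta(0.75, lambda1, lambda2)   # the same value from R's built-in density
integrate(f, 0, 1)$value        # 1, so f is a density on (0, 1)
```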
Example
R Box 2.4.1 (Vectors). A fundamental data structure in R is a vector of, e.g.,
integers or reals. A vector of real valued numbers can be typed in by a command
like
> x <- c(1.302
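A few basic operations on numeric vectors, using hypothetical values: indexing starts at 1, and arithmetic as well as functions such as sum act on the whole vector.

```r
x <- c(1.3, 0.2, 4.5, 2.1)   # hypothetical values, for illustration only
length(x)                    # 4 entries
x[2]                         # the second entry, 0.2 (R indexes from 1)
x * 2                        # arithmetic is applied entry by entry
sum(x)                       # 8.1
```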
The distribution function for such a probability measure is given by
$$F(x) = \int_{-\infty}^{x} f(y)\,dy.$$
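The formula can be checked numerically for a concrete density. Taking the standard normal density dnorm as an example (an illustrative choice), integrate computes the integral from -∞ to x, which should agree with pnorm, R's normal distribution function.

```r
x <- 1.5
Fx <- integrate(dnorm, lower = -Inf, upper = x)$value   # integral of f over (-Inf, x]
Fx                                                      # approximately 0.9331928
pnorm(x)                                                # R's distribution function at x
```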
The reader may be unfamiliar with doing integrations over an arbitrary event A. If
f is a conti
R Box 2.7.2 (Kernel density estimation). The density function evaluates
the density estimate for a dataset in a finite number of
points. Default choice of kernel i
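A minimal usage sketch on hypothetical simulated data. By default density uses the Gaussian kernel, the bandwidth rule bw.nrd0 and n = 512 evaluation points; the returned object holds the evaluation points in x and the estimated density values in y.

```r
set.seed(3)
x <- rnorm(100)               # hypothetical dataset
d <- density(x)               # Gaussian kernel, default bandwidth rule bw.nrd0
length(d$x)                   # 512 evaluation points (default n = 512)
d$bw                          # the bandwidth that was used
# The estimate is itself a density: a Riemann sum over the grid is close to 1
sum(d$y) * diff(d$x[1:2])
```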
Figure 2.15: The empirical distribution function for the log (base 2) expression levels
for the gene 4048U_s_
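Empirical distribution functions are available in R via ecdf, which returns a function. The values below are hypothetical and only stand in for the expression levels in Figure 2.15; ecdf(x)(t) is the relative frequency of observations less than or equal to t.

```r
x <- c(1.3, 2.0, 1.9, 2.2, 2.1)   # hypothetical expression levels
Fn <- ecdf(x)                     # the empirical distribution function, as an R function
Fn(2.0)                           # 0.6: three of the five values are <= 2.0
mean(x <= 2.0)                    # the same relative frequency computed directly
Fn(1.0); Fn(3.0)                  # 0 below all observations, 1 above all observations
```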
is given by
$$F(x) = P((-\infty, x]) = \sum_{n \leq x} p(n).$$
Figure 2.6: The distribution function for the Poisson distribution, with $\lambda = 5$ (left
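For the Poisson distribution with λ = 5, as in Figure 2.6, the cumulative sums of the point probabilities can be compared with R's ppois. Since all the mass sits on the integers, F is constant between them.

```r
lambda <- 5                          # as in Figure 2.6
x <- 0:15
F_sum <- cumsum(dpois(x, lambda))    # F(x) as the sum of p(n) over n <= x
all.equal(F_sum, ppois(x, lambda))   # TRUE
# F is a step function: it does not change between consecutive integers
all.equal(ppois(3.7, lambda), ppois(3, lambda))   # TRUE
```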
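Observe that by the definition of a kernel
$$\int_{-\infty}^{\infty} \hat{f}_h(x)\,dx = \frac{1}{n}\sum_{i=1}^{n} \int_{-\infty}^{\infty} K_h(x, x_i)\,dx = 1.$$
Popular choices of kernels include the Gaussian kernel
$$K_h(x, y) = \frac{1}{\sqrt{2\pi}\,h} \exp\left(-\frac{(x-y)^2}{2h^2}\right)$$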
and the Epanechnikov kernel
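A from-scratch sketch of a kernel density estimate, assuming the Gaussian kernel K_h(x, y) = exp(-(x - y)^2 / (2h^2)) / (sqrt(2π) h), which is the N(y, h²) density evaluated at x. The data, the bandwidth h and the function names K_h and fhat are hypothetical. The integral is taken over [-1, 5], which contains practically all of the mass for these data.

```r
# Gaussian kernel K_h(x, y): the N(y, h^2) density evaluated at x
K_h <- function(x, y, h) exp(-(x - y)^2 / (2 * h^2)) / (sqrt(2 * pi) * h)

# Kernel density estimate: the average of the kernels centred at the observations
fhat <- function(x, data, h) sapply(x, function(x0) mean(K_h(x0, data, h)))

data <- c(1.3, 1.9, 2.0, 2.1, 2.2)   # hypothetical observations
h <- 0.2                             # hypothetical bandwidth
integrate(fhat, -1, 5, data = data, h = h)$value   # approximately 1
all.equal(K_h(0.3, 0, 0.5), dnorm(0.3, 0, 0.5))   # TRUE: the kernel is a normal density
```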
R Box 2.4.2 (Biostrings). The package Biostrings comes as part of the
Bioconductor standard bundle of packages for R. With
> library(Biostrings)
you get access to some represe
(2.13). Rearranging that equation then says that for small $h > 0$
$$f(x) \approx \frac{1}{2h} P([x-h, x+h]),$$
and if we then use $P([x-h, x+h]) \approx \varepsilon_n([x-h, x+h])$ from the frequency interpretation
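The approximation can be tried out numerically. This sketch assumes the standard normal distribution (an illustrative choice) with x = 0.5 and h = 0.05. First the exact probability from pnorm is used; then it is replaced by the relative frequency of simulated observations falling in [x - h, x + h].

```r
x0 <- 0.5; h <- 0.05
# Exact probability of [x0 - h, x0 + h] under N(0, 1), divided by the length 2h
(pnorm(x0 + h) - pnorm(x0 - h)) / (2 * h)   # close to dnorm(x0)
dnorm(x0)                                    # 0.3520653

# Replacing the probability by a relative frequency in simulated data
set.seed(4)
x <- rnorm(100000)
mean(abs(x - x0) <= h) / (2 * h)             # also close to dnorm(x0)
```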
R Box 2.5.1 (Bar plots). We can use either the standard plot function in
R or the barplot function to produce the bar plot. The theoretical bar plot
of the point probabilities f
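A short sketch of both approaches, assuming Poisson point probabilities with λ = 5 as a hypothetical example. With type = "h", plot draws vertical lines from zero, which gives the look of a bar plot.

```r
lambda <- 5                       # illustrative Poisson parameter
n <- 0:15
p <- dpois(n, lambda)             # point probabilities p(n)
# Theoretical bar plot with barplot() ...
barplot(p, names.arg = n, xlab = "n", ylab = "Probability")
# ... or with the standard plot() using vertical lines
plot(n, p, type = "h", xlab = "n", ylab = "Probability")
```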
Figure 2.11: The density for the logistic distribution (left, see Example 2.6.14) and
the density for the Gumbel distribution (right, see Example 2.6.15). The density
for the G
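The two densities can be obtained by differentiating the distribution functions. This sketch assumes the standard forms with location 0 and scale 1, F(x) = 1/(1 + exp(-x)) for the logistic and F(x) = exp(-exp(-x)) for the Gumbel distribution. Base R's stats package provides dlogis but no Gumbel density, so both are written out, in forms that avoid overflow for large |x|.

```r
# Logistic density exp(-x) / (1 + exp(-x))^2, written via |x| (it is symmetric)
f_logis <- function(x) { e <- exp(-abs(x)); e / (1 + e)^2 }
# Gumbel density exp(-x) * exp(-exp(-x)), combined into one exponential
f_gumbel <- function(x) exp(-x - exp(-x))

integrate(f_logis, -Inf, Inf)$value    # 1
integrate(f_gumbel, -Inf, Inf)$value   # 1
all.equal(f_logis(1.3), dlogis(1.3))   # TRUE: matches R's logistic density

# The logistic density is symmetric around 0, the Gumbel density is not
f_logis(-2) - f_logis(2)               # 0
f_gumbel(-2) - f_gumbel(2)             # clearly non-zero
```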
on the quantiles, which are more suitable. We define the quantiles for the dataset
first and then subsequently define quantiles for a theoretical distribution in terms
of its distr
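For theoretical distributions, a common convention, and the one documented for R's discrete quantile functions such as qpois, is that the p-quantile is the smallest x with F(x) ≥ p; for a continuous, strictly increasing F this is simply F^{-1}(p). Whether this coincides exactly with the definition in the text is an assumption here. The sketch uses the standard normal and the Poisson distribution with λ = 5 as examples.

```r
# Continuous case: the quantile inverts the distribution function
q <- qnorm(0.975)
q                          # 1.959964
pnorm(q)                   # 0.975

# Discrete case: the smallest x with F(x) >= p
lambda <- 5
ppois(4:5, lambda)         # F(4) is about 0.4405, F(5) is about 0.6160
qpois(0.5, lambda)         # 5, since F(4) < 0.5 <= F(5)
```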
Figure 2.14: Histograms and examples of kernel density estimates (with a Gaussian
kernel) of the log (base 2) expression levels for the gene 1635_at from the ALL
m
Exercise 2.5.7. Compute the mean, variance and standard deviation for the uniform
distribution on $\{1, \ldots, 977\}$.
Exercise 2.5.8. Show by using the definition of mean and var
That is, $f$ is constantly equal to $1/(b-a)$ on $[a, b]$ and $0$ outside. Then we find that
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_a^b \frac{1}{b-a}\,dx = \frac{b-a}{b-a} = 1.$$
Since $f$ is clearly non-negative it is a density for a probabi
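A numerical check with the illustrative endpoints a = 2 and b = 5: the density integrates to 1 over [a, b], agrees with R's dunif, and punif gives the distribution function (x - a)/(b - a) on [a, b].

```r
a <- 2; b <- 5                                     # illustrative endpoints
f <- function(x) ifelse(x >= a & x <= b, 1 / (b - a), 0)
integrate(f, a, b)$value                           # 1
all.equal(f(3), dunif(3, a, b))                    # TRUE: R's uniform density
punif(4, a, b)                                     # (4 - a) / (b - a) = 2/3
```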
Figure 2.7: The density (left) and the distribution function (right) for the normal
distribution.
Example 2.6.8 (The Normal Distribution). The normal or Gaus
Figure 2.4: Example of an empirical bar plot. This is a plot of the tabulated values
of 50
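An empirical bar plot is drawn from tabulated data. The sketch below uses 50 simulated Poisson observations as a hypothetical stand-in for the data behind Figure 2.4. table counts how often each value occurs, and dividing by the number of observations gives relative frequencies that can be compared with point probabilities.

```r
set.seed(5)
x <- rpois(50, 5)                  # hypothetical data: 50 observations
tab <- table(x)                    # frequency of each observed value
tab
sum(tab)                           # 50
barplot(tab, xlab = "x", ylab = "Frequency")
tab / length(x)                    # relative frequencies
```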
R Box 2.7.4 (Quantiles). If x is a numeric vector then
> quantile(x)
computes the 0%, 25%, 50%, 75%, and 100% quantiles. That is, the minimum, the lower quartile, the median
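A small example on hypothetical data. R's quantile implements nine different definitions selected by its type argument, with type = 7 as the default, so for small datasets the result may differ from other definitions of the sample quantiles.

```r
x <- c(2.1, 1.3, 2.0, 1.9, 2.2)   # hypothetical data
quantile(x)
#   0%  25%  50%  75% 100%
#  1.3  1.9  2.0  2.1  2.2
median(x)                         # 2, the same as the 50% quantile
```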
Exercise 2.6.2. Argue that the function
$$F(x) = 1 - \frac{x_0^{\beta}}{x^{\beta}}, \qquad x \geq x_0 > 0,$$
for $\beta > 0$ is a distribution function on $[x_0, \infty)$. It is called the Pareto distribution on
the interva
and it is unfortunately not possible to give a (more) closed form expression for this
integral. It is, however, common usage to always denote this particular distribution
functio
Example 2.7.2. We consider the histograms of 100 and of 1000 random variables whose
distribution is N(0, 1). They are generated by the computer. We choose the breaks
to be equidistan
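A sketch in the spirit of Example 2.7.2, assuming equidistant breaks with spacing 0.5 on [-6, 6] (a choice made here, not taken from the text). The hypothetical helper hist_error simulates n standard normal variables and returns the largest absolute difference between the histogram and the N(0, 1) density at the bin midpoints; the difference is typically smaller for n = 1000 than for n = 100.

```r
# Hypothetical helper: max deviation between histogram and N(0, 1) density
hist_error <- function(n, breaks) {
  x <- rnorm(n)                                  # n simulated N(0, 1) variables
  h <- hist(x, breaks = breaks, plot = FALSE)    # histogram values in h$density
  max(abs(h$density - dnorm(h$mids)))            # compare at the bin midpoints
}

set.seed(6)
breaks <- seq(-6, 6, by = 0.5)                   # equidistant breaks
err100 <- hist_error(100, breaks)
err1000 <- hist_error(1000, breaks)
c(err100, err1000)
```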
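which is clearly seen to be symmetric, cf. Example 2.7.9. Moreover, with the substitution $y = x^2/2$ we have that $dy = x\,dx$, so
$$\int_0^{\infty} x \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx = \frac{1}{\sqrt{2\pi}} \int_0^{\infty} e^{-y}\,dy = \frac{1}{\sqrt{2\pi}} < \infty.$$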
Thus by Exampl
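The substitution y = x²/2 can be checked numerically with integrate: the integral of x times the standard normal density over [0, ∞) should equal 1/√(2π), and the integral of x exp(-x²/2) over [0, ∞) should equal the integral of exp(-y) over [0, ∞), which is 1.

```r
lhs <- integrate(function(x) x * dnorm(x), 0, Inf)$value   # approximately 0.3989423
lhs
1 / sqrt(2 * pi)                                           # 0.3989423
inner <- integrate(function(x) x * exp(-x^2 / 2), 0, Inf)$value
inner                                                      # 1, the integral of exp(-y) over [0, Inf)
```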