[Figure: Relative frequency histogram (n=20). x-axis: Times; y-axis: Relative Frequency.]
Note that when the y-axis is labelled with relative frequencies, the area under the histogram is
always
Begin
Formulate the research problem
Define population and sample
Collect the data
Do descriptive data analysis
Use appropriate statistical methods to solve the research problem
Report the results
E
Chapter 1
Introduction to Statistics
1.1 Introduction, examples and definitions
1.1.1 Introduction
We begin the module with some basic data analysis. Since Statistics involves the collection and
inter
Since we only have 10 categories, there is no need to amalgamate them.
For continuous data, the choice of categories is more arbitrary. We usually use 8 to 12 non-overlapping consecutive intervals of e
3.6.1 Poisson as the limit of a binomial
Let $X \sim B(n, p)$. Put $\lambda = E(X) = np$ and let $n$ increase and $p$ decrease so that $\lambda$ remains constant.
\[
P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \qquad k = 0, 1, 2, \ldots, n.
\]
Replacing p b
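The limiting behaviour described in this section can be checked numerically. The sketch below is only an illustration: the values of `lam` and `k` are arbitrary choices, not taken from the notes, and the Poisson PMF used is the standard form $e^{-\lambda}\lambda^k/k!$. It compares the binomial probability with $p = \lambda/n$ against the Poisson probability as $n$ grows.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

lam = 2.0  # illustrative value of lambda = n * p, held constant
k = 3      # illustrative point at which the two PMFs are compared

# Absolute gap between the binomial and Poisson probabilities as n grows.
gaps = {
    n: abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam))
    for n in (10, 100, 1000, 10000)
}

for n, gap in gaps.items():
    print(f"n = {n:>5}: |binomial - Poisson| = {gap:.6f}")
```

The printed gap should shrink as $n$ increases, which is what the limit argument predicts.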
5. Like the mean and standard deviation, the correlation is strongly affected by a few outlying observations. Use r with caution when outliers
appear in the scatterplot.
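A small numerical illustration of this point, using made-up data (the values of `x` and `y` below are hypothetical and not from the notes): a single outlying observation added to an almost perfectly linear data set pulls r down sharply.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data with a strong positive linear relationship.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.1, 8.0]

r_clean = pearson_r(x, y)

# The same data plus one outlying observation at (9, 0.0).
r_outlier = pearson_r(x + [9], y + [0.0])

print(f"r without outlier: {r_clean:.3f}")
print(f"r with outlier:    {r_outlier:.3f}")
```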
Example 11.3. What are th
so it is not a fixed characteristic of the population, and cannot be used to compare variability of
different sized samples.
Mean absolute deviation (M.A.D.)
This is the average absolute deviation fro
Therefore, the density of $Z$, which is usually denoted $\phi(z)$, is given by
\[
\phi(z) = \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{1}{2} z^2 \right\}, \qquad -\infty < z < \infty.
\]
It is important to note that the PDF of the standard normal is symmetric about zero. The distrib
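The symmetry of the standard normal density can be seen directly by evaluating it at $\pm z$. The following is a minimal sketch; the function name `phi` and the test points are illustrative choices.

```python
from math import exp, pi, sqrt

def phi(z):
    """Standard normal density: exp(-z^2 / 2) / sqrt(2 * pi)."""
    return exp(-0.5 * z * z) / sqrt(2 * pi)

# Illustrative points; the density depends on z only through z^2.
zs = [0.5, 1.0, 1.96, 3.0]
symmetric = all(phi(z) == phi(-z) for z in zs)

for z in zs:
    print(f"phi({z}) = {phi(z):.6f}, phi({-z}) = {phi(-z):.6f}")
print("symmetric about zero:", symmetric)
```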
but for a continuous random quantity we have the continuous analogue
\[
\begin{aligned}
F_X(x) &= P(X \le x) \\
&= P(-\infty < X \le x) \\
&= \int_{-\infty}^{x} f_X(z)\,dz.
\end{aligned}
\]
Just as in the discrete case, the distribution function is defined for all $x \in \mathbb{R}$, even if
Example
The histogram for Example 2 is as follows.
[Figure: Raw frequency histogram (n=20). x-axis: Times; y-axis: Frequency.]
Note that here we have labelled the y-axis with the raw frequenci
and plotted graphically as follows.
[Figure: Probability Mass Function. x-axis: Value; y-axis: Probability.]
3.1.3 Cumulative distribution functions (CDFs)
For any discrete random quan
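if it has PDF
\[
f_X(x) =
\begin{cases}
\lambda e^{-\lambda x}, & x \ge 0, \\
0, & \text{otherwise.}
\end{cases}
\]
The distribution function, $F_X(x)$, is therefore given by
\[
F_X(x) =
\begin{cases}
0, & x < 0, \\
1 - e^{-\lambda x}, & x \ge 0.
\end{cases}
\]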
The PDF and CDF for an Exp(1) are shown below.
PDF for X ~ Ex
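As a rough check that the distribution function is the integral of the PDF, the sketch below integrates the exponential density numerically with a midpoint rule and compares the result with the closed form. The rate parameter name `lam`, its default of 1 (the Exp(1) case), and the evaluation point are illustrative assumptions.

```python
from math import exp

def exp_pdf(x, lam=1.0):
    """Exponential density with rate lam; zero for negative x."""
    return lam * exp(-lam * x) if x >= 0 else 0.0

def exp_cdf(x, lam=1.0):
    """Closed-form exponential distribution function."""
    return 1.0 - exp(-lam * x) if x >= 0 else 0.0

def integrate_pdf(upper, lam=1.0, steps=10_000):
    """Midpoint-rule approximation to the integral of the PDF over [0, upper]."""
    h = upper / steps
    return sum(exp_pdf((i + 0.5) * h, lam) for i in range(steps)) * h

x = 2.0  # illustrative evaluation point
numeric = integrate_pdf(x)
closed_form = exp_cdf(x)

print(f"numerical integral of PDF: {numeric:.8f}")
print(f"closed-form CDF:           {closed_form:.8f}")
```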
by
\[
\mathrm{Var}(X) = \sum_{x \in S_X} (x - E(X))^2 \, P(X = x).
\]
The variance is often denoted $\sigma_X^2$, or even just $\sigma^2$. Again, this is a known function of the probability
distribution. It is not random, and it is not the sample v
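The variance formula can be evaluated exactly for a small distribution. The PMF below is a hypothetical example chosen for easy arithmetic, not one from the notes; exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

# Hypothetical PMF on {0, 1, 2}; the probabilities sum to 1.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# E(X) = sum of x * P(X = x) over the sample space.
mean = sum(x * p for x, p in pmf.items())

# Var(X) = sum of (x - E(X))^2 * P(X = x) over the sample space.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

print("E(X)   =", mean)
print("Var(X) =", variance)
```

Both quantities are fixed numbers determined by the PMF, in contrast to a sample variance, which changes from sample to sample.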
Chapter 4
Continuous Probability Models
4.1 Introduction, PDF and CDF
4.1.1 Introduction
We now have a fairly good understanding of discrete probability models, but as yet we haven't
developed any te
[Figure: Relative frequency histogram (n=20). x-axis: Times; y-axis: Relative Frequency.]
The y-axis values are chosen so that the area of each rectangle is the proportion of observat
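To make the area interpretation concrete, here is a minimal sketch that computes the rectangle heights for a histogram with unequal class widths. The sample `times` and the class boundaries `breaks` are hypothetical, since the original data are not reproduced here. Each height is the class proportion divided by the class width, so each rectangle's area equals the proportion and the areas sum to one.

```python
# Hypothetical sample of n = 20 times; not the data from the notes.
times = [5, 12, 18, 22, 30, 35, 41, 48, 55, 60,
         66, 72, 78, 90, 105, 120, 150, 170, 200, 230]

# Assumed class boundaries with unequal widths.
breaks = [0, 20, 50, 80, 160, 240]
classes = list(zip(breaks, breaks[1:]))

n = len(times)
heights = []
for lo, hi in classes:
    count = sum(lo <= t < hi for t in times)
    # Height = proportion of observations / class width.
    heights.append(count / (n * (hi - lo)))

# Area of each rectangle = height * width = proportion in that class.
areas = [h * (hi - lo) for h, (lo, hi) in zip(heights, classes)]
total_area = sum(areas)

for (lo, hi), h, a in zip(classes, heights, areas):
    print(f"[{lo:>3}, {hi:>3}): height = {h:.5f}, area = {a:.2f}")
print("total area =", round(total_area, 10))
```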
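where $I_j \sim \mathrm{Bern}(p)$, $j = 1, 2, \ldots, n$, and the $I_j$ are mutually independent. So we then have
\[
\begin{aligned}
E(X) &= E\left( \sum_{j=1}^{n} I_j \right) \\
&= \sum_{j=1}^{n} E(I_j) \qquad \text{(expectation of a sum)} \\
&= \sum_{j=1}^{n} p \\
&= np
\end{aligned}
\]
and similarly,
\[
\mathrm{Var}(X) = \mathrm{Var}\left( \sum_{j=1}^{n} I_j \right)
\]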
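The representation of a binomial random quantity as a sum of independent Bernoulli indicators lends itself to a quick simulation. The sketch below is illustrative only: the values of `n`, `p`, the number of trials and the random seed are arbitrary assumptions. It builds each draw as a sum of indicators and compares the sample mean with $np$.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

n, p = 10, 0.3   # illustrative parameters
trials = 20_000  # number of simulated values of X

def binomial_draw(n, p):
    """One value of X, built as a sum of n independent Bernoulli(p) indicators."""
    return sum(1 if random.random() < p else 0 for _ in range(n))

draws = [binomial_draw(n, p) for _ in range(trials)]
sample_mean = sum(draws) / trials

print(f"sample mean of X: {sample_mean:.3f}")
print(f"n * p:            {n * p:.3f}")
```

The sample mean is itself random, so it will be close to, but not exactly equal to, $np$.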
Example
Throw a die, and let X be the number showing. We have
$S_X = \{1, 2, 3, 4, 5, 6\}$
and each value is equally likely. Now suppose that we are actually interested in the square of the
number showin
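A short sketch of this example in code, under the assumption that the transformed quantity is $Y = X^2$ (the name `Y` is introduced here for illustration). It derives the PMF of the square from the PMF of the die and computes its expectation exactly.

```python
from fractions import Fraction

# PMF of a fair die: each of 1..6 has probability 1/6.
pmf_x = {x: Fraction(1, 6) for x in range(1, 7)}

# PMF of Y = X^2: collect the probability of every x that maps to each y.
pmf_y = {}
for x, p in pmf_x.items():
    pmf_y[x ** 2] = pmf_y.get(x ** 2, 0) + p

expected_x = sum(x * p for x, p in pmf_x.items())
expected_y = sum(y * p for y, p in pmf_y.items())

print("sample space of Y:", sorted(pmf_y))
print("E(X)   =", expected_x)
print("E(Y)   =", expected_y)
print("E(X)^2 =", expected_x ** 2)
```

Note that $E(X^2)$ and $E(X)^2$ differ, so the expectation of the square cannot be found by squaring the expectation.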
Example
For the light bulb lifetime, X, what is the median, upper and lower quartile of the distribution?
4.2 Properties of continuous random quantities
4.2.1 Expectation and variance of continuous ra
Example 7.3.
[Figure: two panels, "U-shaped" and "Distribution of sample mean (n=100)". Axis labels: Values of the Variable (Low to High), Values of mean, Frequency, Relative Frequency.]
Figure 12: U-shaped and Sampl
[Figure: Cumulative Distribution Function. x-axis: Value; y-axis: Cumulative probability.]
It is clear that for any random variable $X$, for all $x \in \mathbb{R}$, $F_X(x) \in [0, 1]$ and that $F_X(x) \to 0$ as
x
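These properties of a distribution function can be checked on a concrete discrete example. The sketch below assumes, purely for illustration, that the random quantity is the sum of two fair dice; it builds the CDF by accumulating the PMF and evaluates it at a few points, including values outside the sample space.

```python
from fractions import Fraction
from itertools import product

# Assumed example: PMF of the sum of two fair dice.
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    pmf[a + b] = pmf.get(a + b, 0) + Fraction(1, 36)

def cdf(x):
    """F_X(x) = P(X <= x), defined for every real x."""
    return sum(p for value, p in pmf.items() if value <= x)

for x in (-5, 1, 2, 7, 11.5, 12, 100):
    print(f"F({x}) = {cdf(x)}")
```

The function is zero below the smallest possible value, one at and above the largest, and never decreases in between.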
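respectively. The expectation of a uniform random quantity is
\[
\begin{aligned}
E(X) &= \int_{-\infty}^{\infty} x f_X(x)\,dx \\
&= \int_{-\infty}^{a} x f_X(x)\,dx + \int_{a}^{b} x f_X(x)\,dx + \int_{b}^{\infty} x f_X(x)\,dx \\
&= 0 + \int_{a}^{b} \frac{x}{b - a}\,dx + 0 \\
&= \left[ \frac{x^2}{2(b - a)} \right]_{a}^{b} \\
&= \frac{b^2 - a^2}{2(b - a)} \\
&= \frac{a + b}{2}.
\end{aligned}
\]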
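The result of this derivation can be confirmed numerically. The following sketch uses a midpoint rule on $[a, b]$ with the uniform density $1/(b - a)$; the endpoints `a` and `b` are arbitrary illustrative values.

```python
def uniform_mean_numeric(a, b, steps=10_000):
    """Midpoint-rule approximation to the integral of x / (b - a) over [a, b]."""
    h = (b - a) / steps
    density = 1.0 / (b - a)
    return sum((a + (i + 0.5) * h) * density for i in range(steps)) * h

a, b = 2.0, 5.0  # illustrative endpoints
numeric = uniform_mean_numeric(a, b)
closed_form = (a + b) / 2

print(f"numerical E(X): {numeric:.6f}")
print(f"(a + b) / 2:    {closed_form:.6f}")
```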
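since then
\[
\begin{aligned}
\int_{-\infty}^{\infty} f_X(x)\,dx
&= \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right\} dx \\
&= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left\{ -\frac{1}{2\sigma^2} z^2 \right\} dz \qquad \text{(putting } z = x - \mu \text{)} \\
&= \frac{1}{\sigma\sqrt{2\pi}} \sqrt{\frac{\pi}{1/(2\sigma^2)}} \\
&= \frac{1}{\sigma\sqrt{2\pi}} \sqrt{2\pi\sigma^2} \\
&= 1.
\end{aligned}
\]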
Now we know that the given PDF represents a valid density, we can
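The same conclusion can be reached numerically. The sketch below integrates the normal density over a wide interval around the mean using a midpoint rule; the values of `mu` and `sigma`, the interval half-width of ten standard deviations, and the step count are illustrative assumptions.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def total_area(mu, sigma, width=10.0, steps=20_000):
    """Midpoint rule over [mu - width * sigma, mu + width * sigma]."""
    lo = mu - width * sigma
    h = 2 * width * sigma / steps
    return sum(normal_pdf(lo + (i + 0.5) * h, mu, sigma) for i in range(steps)) * h

area = total_area(mu=3.0, sigma=2.0)
print(f"approximate area under the density: {area:.10f}")
```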
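Consequently we have
\[
\begin{aligned}
P(x \le X \le x + \delta x) &= \int_{x}^{x + \delta x} f_X(y)\,dy \\
&\simeq f_X(x)\,\delta x \qquad \text{(for small } \delta x \text{)} \\
\Rightarrow \quad f_X(x) &\simeq \frac{P(x \le X \le x + \delta x)}{\delta x}
\end{aligned}
\]
and so we may interpret the PDF as
\[
f_X(x) = \lim_{\delta x \to 0} \frac{P(x \le X \le x + \delta x)}{\delta x}.
\]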
Example
The manufacturer of
4.2.2 PDF and CDF of a linear transformation
Let $X$ be a continuous random quantity with PDF $f_X(x)$ and CDF $F_X(x)$, and let $Y = aX + b$ where
$a > 0$. What are the PDF and CDF of $Y$? It turns out to be eas
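A simulation sketch of the linear transformation, under stated assumptions: $X$ is taken to be exponential with rate 1, the constants `a` and `b` are arbitrary, and the comparison uses the standard result $F_Y(y) = F_X((y - b)/a)$ for $a > 0$, which is not shown in the text above. The empirical proportion of simulated $Y$ values at or below a point is compared with that formula.

```python
import random
from math import exp

random.seed(0)  # fixed seed so the run is repeatable

a, b = 2.0, 1.0  # illustrative constants with a > 0

def cdf_x(x):
    """CDF of X, assumed here to be exponential with rate 1."""
    return 1.0 - exp(-x) if x >= 0 else 0.0

def cdf_y(y):
    """CDF of Y = a * X + b via the standard result F_Y(y) = F_X((y - b) / a)."""
    return cdf_x((y - b) / a)

# Simulate Y by transforming simulated values of X.
samples_y = [a * random.expovariate(1.0) + b for _ in range(50_000)]

y0 = 4.0  # illustrative evaluation point
empirical = sum(s <= y0 for s in samples_y) / len(samples_y)
theoretical = cdf_y(y0)

print(f"empirical P(Y <= {y0}): {empirical:.4f}")
print(f"F_X((y - b) / a):      {theoretical:.4f}")
```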