Fitting distributions with R
1.0 Introduction
Fitting a distribution means finding a mathematical function that adequately represents a statistical
variable. A statistician often faces this problem: he has some observations x_1, x_2, ..., x_n of a
quantitative character, regarded as a sample from an unknown population, and he wishes to test whether
they come from a population with a pdf (probability density function) f(x, θ), where θ is a vector of
parameters to be estimated from the available data.
We can identify 4 steps in fitting distributions:
1) Model/function choice: hypothesize families of distributions;
2) Estimate parameters;
3) Evaluate quality of fit;
4) Goodness of fit statistical tests.
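As a rough illustration, the four steps can be sketched in R for one simple case. This is only a minimal
sketch under stated assumptions: the data are simulated, and the normal family, the seed and the variable
names (x, mu.hat, sigma.hat, q.cor, sw) are chosen here for illustration. The Shapiro-Wilk test stands in
for whichever goodness of fit test suits the problem at hand.

```r
set.seed(1)                          # assumed seed, only for reproducibility
x <- rnorm(200, mean = 10, sd = 2)   # simulated data standing in for real observations

# 1) Model choice: hypothesize a normal family (an assumption of this sketch)

# 2) Parameter estimation: sample mean and standard deviation
mu.hat    <- mean(x)
sigma.hat <- sd(x)

# 3) Quality of fit: correlation between ordered data and fitted normal quantiles
p      <- ppoints(length(x))
q.theo <- qnorm(p, mean = mu.hat, sd = sigma.hat)
q.cor  <- cor(sort(x), q.theo)       # values close to 1 suggest a good fit

# 4) Goodness of fit test: Shapiro-Wilk test of normality
sw <- shapiro.test(x)
sw$p.value
```

A correlation near 1 in step 3 and a large p-value in step 4 would both be compatible with the hypothesized
normal family; neither proves it.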
This paper addresses distribution fitting, dealing briefly with both theoretical and practical issues, using
the statistical environment and language R¹.
R is a flexible and powerful language and environment for statistical computing and graphics. We are
going to use some R statements concerning graphical techniques (§ 2.0), model/function choice (§ 3.0),
parameter estimation (§ 4.0), measures of goodness of fit (§ 5.0) and the most common goodness of fit
tests (§ 6.0).
A basic knowledge of R is needed to understand this work. We suggest reading “An Introduction to R”².
Unless otherwise specified, the R statements used here are included in the stats package.
2.0 Graphics
Exploratory data analysis can be the first step: computing descriptive statistics (mean, standard deviation,
skewness, kurtosis, etc.) and using graphical techniques (histograms, density estimates, ECDF), which can
suggest the kind of pdf to use to fit the model.
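A minimal sketch of such an exploratory step follows. The sample obs is hypothetical (simulated here from a
gamma distribution with arbitrary parameters), and skewness and kurtosis are computed directly from their
moment definitions rather than with an add-on package.

```r
set.seed(42)                                 # assumed seed, only for reproducibility
obs <- rgamma(150, shape = 2, rate = 0.5)    # hypothetical observations

# Descriptive statistics
m <- mean(obs)
s <- sd(obs)

# Moment-based skewness and excess kurtosis (population-style denominators)
z    <- (obs - m) / sqrt(mean((obs - m)^2))
skew <- mean(z^3)
kurt <- mean(z^4) - 3

# Objects behind the graphical techniques
dens <- density(obs)   # kernel density estimate; plot(dens) draws it
Fn   <- ecdf(obs)      # empirical cumulative distribution function; plot(Fn) draws it

c(mean = m, sd = s, skewness = skew, kurtosis = kurt)
```

A clearly positive skewness, as one would expect for data like these, already argues against a symmetric
family such as the normal.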
We can obtain samples from some pdf (such as Gaussian, Poisson, Weibull, gamma, etc.) using R statements
and then draw a histogram of these data. Suppose we have a sample of size n=200 drawn from a
normal population N(10,2) with mean=10 and standard deviation=2:
x.norm<-rnorm(n=200,mean=10,sd=2)
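Samples from the other families mentioned above can be drawn in the same way with rpois(), rweibull() and
rgamma(). The sketch below is only illustrative: the object names, the seed and all parameter values are
arbitrary assumptions, not values taken from any data set.

```r
set.seed(2004)                                         # assumed seed, only for reproducibility

x.pois <- rpois(n = 200, lambda = 2.5)                 # Poisson with mean 2.5
x.wei  <- rweibull(n = 200, shape = 2.1, scale = 1.1)  # Weibull
x.gam  <- rgamma(n = 200, shape = 3.5, rate = 0.5)     # gamma

# Quick numerical look at each sample
summary(x.pois)
summary(x.wei)
summary(x.gam)
```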
We can get a histogram using the hist() statement (Fig. 1):
hist(x.norm,main="Histogram of observed data")
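To compare the histogram with candidate densities, it can be drawn on the density scale and overlaid with
curves. The following is a hedged sketch, not part of the original example: it regenerates x.norm with an
assumed seed so that it runs on its own, and the line types are arbitrary choices.

```r
set.seed(123)                                   # assumed seed, only for reproducibility
x.norm <- rnorm(n = 200, mean = 10, sd = 2)

# freq = FALSE puts the bars on the density scale (total area equal to 1)
h <- hist(x.norm, freq = FALSE, main = "Histogram of observed data")

# Kernel density estimate of the sample (dashed line)
lines(density(x.norm), lty = 2)

# Density of the generating N(10, 2) population (solid line)
curve(dnorm(x, mean = 10, sd = 2), add = TRUE)
```

If the dashed and solid curves roughly agree with each other and with the bars, the normal family is a
plausible candidate for the next steps.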
¹ R Development Core Team (2004). R: A language and environment for statistical computing. R Foundation for
Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL: http://www.r-project.org.