Chapter 3
Validation
In many textbooks the author will say something like "The waiting times between
buses arriving at a bus stop follow an exponential distribution." But how
do we know this is true? It's all well and good to say that a set of data follows a
particular distribution, but how do we show that it does? As with most statistical
tests, we cannot directly prove something to be true.
It is important to remember that hypothesis tests are not conclusive. They
test whether a particular trend is present given both the data and a margin of
error. So how can we determine which distribution to use in modeling, knowing
that we can never actually prove that a particular set of data has
a given distribution? Consider the following set of completely random data.
Data Point
Value
1
1
4
2
1
2
3
3
4
4
1
Pretend that it isn't painfully obvious that this data seems to follow a
uniform distribution with parameters (0,1). How can we examine the data in such
a way that we can pick a distribution that seems to fit? The best approach is to
graph the data and compare the graph to known distributions.
What is meant by the Empirical CDF? It is a cumulative distribution function
that is determined entirely by the data set it represents: at each point x it
gives the fraction of observations that are less than or equal to x. Notice
that it is a step function taking values from 0 to 1. Even though we cannot
strictly prove that this is a Uniform(0,1) distribution, we can show that it likely
is. Given sufficiently many data points and a small enough margin of
error, it becomes more reasonable to use certain models even though they are not
necessarily the "right" one. There are two major tests available for our
data: one assumes that the data are discrete, and the other assumes that the data are
continuous.
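The Empirical CDF is easy to compute directly. Below is a minimal Python sketch that uses a small hypothetical sample, not the data listed above. The names ecdf, uniform01_cdf and max_gap_to_uniform are our own. Besides evaluating the step function, the sketch measures the largest vertical distance between the Empirical CDF and the Uniform(0,1) CDF. That is one way to make "compare it to a known distribution" quantitative. This distance is the statistic of the Kolmogorov-Smirnov test; it is named here only as an illustration, not as the method these notes adopt.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF F_n of `sample`, where F_n(x) = #{x_i <= x} / n."""
    xs = sorted(sample)
    n = len(xs)

    def F(x):
        # bisect_right gives the number of sorted points that are <= x
        return bisect_right(xs, x) / n

    return F


def uniform01_cdf(x):
    """CDF of the Uniform(0, 1) distribution."""
    return min(max(x, 0.0), 1.0)


def max_gap_to_uniform(sample):
    """Largest vertical distance between the empirical CDF and the Uniform(0, 1) CDF.

    Because the model CDF is continuous, the largest gap is reached (or approached)
    at a data point: just after the ECDF steps up (value i/n) or just before ((i-1)/n).
    """
    xs = sorted(sample)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs, start=1):
        u = uniform01_cdf(x)
        gap = max(gap, i / n - u, u - (i - 1) / n)
    return gap


# Hypothetical sample, used only for illustration
sample = [0.1, 0.4, 0.35, 0.8, 0.65]
F = ecdf(sample)
print(F(0.5))  # 0.6, since three of the five points are <= 0.5
print(max_gap_to_uniform(sample))
```

For this sample the largest gap is roughly 0.2. Smaller values mean the step function hugs the Uniform(0,1) CDF more closely.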
3.1 χ² Test for Goodness of Fit
For discrete models we can use the χ² Test for Goodness of Fit (sometimes
referred to as Pearson's χ² Test). If we have data that can be categorized
into groups called bins, then we can use this test to compare a theoretical
distribution to the distribution of a set of data. Consider the following example,
which illustrates what a bin is, along with other aspects of the test.
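For binned data, Pearson's statistic is a sum over all bins of (observed count - expected count)^2 / expected count. The expected count in a bin is n times the probability the model assigns to that bin. Here is a minimal Python sketch. It assumes a hypothetical set of 60 die rolls checked against a fair-die model; the data and the function names bin_counts and pearson_chi_square are illustrative only.

```python
from collections import Counter


def bin_counts(data, bins):
    """Observed count in each bin (category), in the order the bins are listed."""
    counts = Counter(data)
    return [counts[b] for b in bins]


def pearson_chi_square(observed, expected):
    """Pearson's statistic: sum over bins of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of bins")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die, tested against a fair-die model
rolls = [1] * 8 + [2] * 12 + [3] * 9 + [4] * 11 + [5] * 10 + [6] * 10
faces = [1, 2, 3, 4, 5, 6]
model_probs = [1 / 6] * 6

observed = bin_counts(rolls, faces)               # [8, 12, 9, 11, 10, 10]
expected = [len(rolls) * p for p in model_probs]  # about 10 per face
print(pearson_chi_square(observed, expected))     # about 1.0
```

Large values of the statistic indicate a poor fit. When no model parameters are estimated from the data, the statistic is compared with a χ² distribution with k - 1 degrees of freedom, where k is the number of bins. Here k = 6, giving 5 degrees of freedom.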
Consider a study that compares two subgroups of a population (males
and females) and records whether or not each member of that population is
currently engulfed in flames. The surveying team found 351 participants and
recorded the following data:
              Males  Females    Total
On Fire         104       73      177
Not on Fire      89       85      174
Total           193      158      351
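Before running any test, we can recompute the row and column totals from the four cell counts. We can also compare the share of each sex that is on fire. The following is a minimal Python sketch; the function and variable names are our own.

```python
# Observed counts from the table: rows are On Fire / Not on Fire,
# columns are Males / Females.
observed = [[104, 73],
            [89, 85]]


def margins(table):
    """Row totals, column totals and grand total of a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals, sum(row_totals)


def on_fire_share_by_column(table):
    """Fraction of each column (sex) that falls in the first row (On Fire)."""
    _, col_totals, _ = margins(table)
    return [table[0][j] / col_totals[j] for j in range(len(col_totals))]


rows, cols, n = margins(observed)
print(rows, cols, n)                      # [177, 174] [193, 158] 351
print(on_fire_share_by_column(observed))  # roughly 0.54 for males, 0.46 for females
```

The two shares differ. The question a test must answer is whether a difference of this size could plausibly arise by chance if sex and being on fire were unrelated.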
The aim of this study is to determine whether or not being on fire is
independent of one's sex. Recall that in a hypothesis test it is common to
assume that something is true and then ask how probable the observed data would be
under that assumption. For this experiment, we assume that the two are independent. Let the
Null Hypothesis be that these two are independent, and let the Alternative
Hypothesis be that sex and being on fire are not independent.
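Below is a minimal sketch of how these hypotheses could be tested with the standard Pearson statistic for a two-way table. Under the Null Hypothesis, the expected count in row i and column j is (row i total) x (column j total) / N. The statistic is compared with a χ² distribution with (r - 1)(c - 1) degrees of freedom, which is 1 for a 2 x 2 table. The 5% significance level is our assumption, and the function name is our own. The sketch applies no continuity correction. Library routines such as SciPy's scipy.stats.chi2_contingency apply Yates' correction to 2 x 2 tables by default, so their statistic will be somewhat smaller.

```python
def chi_square_independence(table):
    """Pearson's chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / n  # count expected in cell (i, j) if independence holds
            stat += (table[i][j] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


observed = [[104, 73],   # On Fire: Males, Females
            [89, 85]]    # Not on Fire: Males, Females

stat, dof = chi_square_independence(observed)
CRITICAL_5_PERCENT_DF1 = 3.841  # upper 5% point of the chi-square distribution with 1 df
print(round(stat, 2), dof)      # 2.05 1
print("reject independence" if stat > CRITICAL_5_PERCENT_DF1
      else "do not reject independence")
```

With these counts the statistic is about 2.05, which is below 3.841. At the 5% level, this sketch would therefore not reject the Null Hypothesis of independence.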
 Winter '12
 RILEY