Statistical Methods I (EXST 7005)
SAS example (#1a) from Freund & Wilson (1997) Table 1.1
PROC UNIVARIATE DATA=HouseSales PLOT;   /* descriptive statistics plus plots */
   VAR SP;                              /* analysis variable (sales price, per the title) */
   TITLE4 'Proc Univariate of house sales price';
RUN;
See SAS output for results.

Probability distributions
PROBABILITY – a measure of the likelihood of the occurrence of some event
An event can be any outcome (e.g. verbal, mathematical or graphical)

Some rules of Probability
If an event (A) is certain to occur, the probability is 1 (one, unity), so P(A) = 1
If the event is certain to NOT occur, the probability is 0 (zero, null), so P(A) = 0
The probability of an event will always be between 0 and 1 (inclusive). 0 ≤ P(A) ≤ 1
The sum of the probability of all possible events, where the events are mutually exclusive, is
one (1).
Where a number of mutually exclusive events are denoted Ai, for i = 1, 2, ..., r, ΣP(Ai) = 1
when summed across all of the possible events
Note that for truly continuous variables the probability of a given number is zero (0).

The Binomial Distribution
First example of a distribution
A binomial distribution consists of a set of binomial observations or Bernoulli trials.
These are observations with two possible, mutually exclusive, outcomes.
Examples of binomial observations or Bernoulli trials, where each observation takes one of
two possible values.
• Have you taken EXST7005? Yes, No
• A given fish is: Male, Female
• A given fish is: Tagged, Untagged
• A coin toss is: Heads, Tails

James P. Geaghan Copyright 2010

Our first experiment: toss three coins, each coin having two possible outcomes. For the three coins together there are 8 possible outcomes. We are interested in the distribution of the outcomes.
Outcome           1     2     3     4     5     6     7     8
Coin 1            Tail  Tail  Tail  Head  Tail  Head  Head  Head
Coin 2            Tail  Tail  Head  Tail  Head  Tail  Head  Head
Coin 3            Tail  Head  Tail  Tail  Head  Head  Tail  Head
Number of heads   0     1     1     1     2     2     2     3

Note that the eight outcomes are equally likely and mutually exclusive. Prepare a frequency table of the results.
Number of Heads              0            1            2            3            Total
frequency (f)                1            3            3            1            8
relative frequency (r.f.)    1/8          3/8          3/8          1/8          1
Probability                  P(0)=0.125   P(1)=0.375   P(2)=0.375   P(3)=0.125   1

[Figure: bar chart of frequency (vertical axis, 0 to 3.5) against number of heads (0, 1, 2, 3)]

This chart represents the distribution of all possible outcomes of tossing 3 coins, each with a binomial outcome. This is the binomial distribution.
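The frequency table above can be reproduced by listing every outcome and counting heads. The following is a minimal Python sketch, not part of the original notes; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the 2**3 = 8 equally likely outcomes of tossing three coins.
outcomes = list(product(["Tail", "Head"], repeat=3))

# Count the heads in each outcome, then tally how often each count occurs.
frequency = Counter(outcome.count("Head") for outcome in outcomes)

# Relative frequency = frequency / total number of outcomes.
distribution = {
    heads: Fraction(f, len(outcomes)) for heads, f in sorted(frequency.items())
}

for heads, prob in distribution.items():
    print(f"P({heads}) = {prob} = {float(prob):.3f}")
```

Using exact fractions keeps the relative frequencies in the same 1/8, 3/8 form as the table, and their sum is exactly 1.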
Probability defined: If an event can occur in “n” mutually exclusive and equally likely ways, and
if “m” of these ways hold the attribute “A”, the probability of the occurrence of “A” will be
the ratio of “m” to “n”.
Where A is some particular attribute
n is the number of possible outcomes (Trials)
m is the number of ways A can occur (Successes)
Then P(A) = m / n , or the number of successes over the number of trials
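The ratio m / n can be sketched as a small counting function. This is an illustrative Python example under the assumptions stated in the definition (mutually exclusive, equally likely outcomes); the function name `classical_probability` and the attribute "at least one head" are hypothetical choices made for the illustration.

```python
from fractions import Fraction
from itertools import product


def classical_probability(outcomes, has_attribute):
    """Return m / n for equally likely, mutually exclusive outcomes."""
    n = len(outcomes)                                  # number of possible outcomes
    m = sum(1 for o in outcomes if has_attribute(o))   # ways the attribute can occur
    return Fraction(m, n)


# The three-coin experiment: each outcome is a tuple such as ('H', 'T', 'H').
three_coins = list(product("HT", repeat=3))

# Attribute A (chosen for illustration): at least one head.
p_at_least_one_head = classical_probability(three_coins, lambda o: "H" in o)
print(p_at_least_one_head)  # 7/8, since only TTT lacks a head
```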
Example from our 3 coins: find the probability of event A, where A is the attribute “2 heads”.
n = 8 possible outcomes (HHH, HHT, HTH, THH, TTH, THT, HTT, TTT); the outcomes are equally likely.
m = 3 outcomes with the chosen attribute (2 heads): HHT, HTH, THH
P(A) = m / n = 3/8 = 0.375

Working with Probabilities
We will be working first with Probability Distributions, similar to the bar charts we examined
earlier.
The probability will be the proportion of the area under the graph between given limits.
The probability is the relative frequency of occurrence of observations within the set limits.

The Uniform Distribution
A uniform distribution is a distribution where every outcome has an equal probability of occurrence. We will consider two similar uniform distributions, discrete and continuous.

Discrete Uniform Distribution (1, 10)
[Figure: Discrete Uniform (1,10); relative frequency of 0.1 for each Y value from 1 to 10]
The probability in each cell is 1/10 = 0.1.

Continuous Uniform (0, 10)
[Figure: Continuous Uniform (0,10); relative frequency of 0.1 over Y values from 0 to 10]
The probability between any two consecutive integers is 1/10 = 0.1.

Finding Probabilities from the DISCRETE Uniform Distribution
Find P(Yi > 5). Note that 5 itself is excluded, so the cells 6 through 10 qualify and the probability is 5(0.1) = 0.5.
[Figure: Uniform (1,10); relative frequency against Y value]
Find P(Yi ≥ 5). Now 5 is included, so the cells 5 through 10 qualify and the probability is 6(0.1) = 0.6.
[Figure: Uniform (1,10); relative frequency against Y value]

Find the following probabilities for a discrete Uniform (1,10) distribution.
a) P(2 ≤ Yi ≤ 7)
b) P(Yi = 9)
c) P(Yi ≥ 9)
d) P(Yi > 10)

[Figure: Uniform (1,10); panels for a) P(2 ≤ Yi ≤ 7) and b) P(Yi = 9)]
For b), this is the probability of a single cell.
[Figure: Uniform (1,10); panels for c) P(Yi ≥ 9) and d) P(Yi > 10)]
For d), this probability is zero.

Find the following probabilities for a discrete Uniform (1,10) distribution.
a) P(2 ≤ Yi ≤ 4 OR 6 ≤ Yi ≤ 9)
This type of statement is true if either of the two statements is true. The two ranges do not overlap, so the individual probabilities are added.
b) P(2 ≤ Yi ≤ 5 AND 4 ≤ Yi ≤ 9)
This type of statement is true only if BOTH of the statements are true; we determine the area of overlap between the two statements.
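The discrete Uniform (1,10) questions above all reduce to counting the cells that satisfy a condition and multiplying by 0.1. Here is a minimal Python sketch of that counting, offered as an illustration rather than as the course's own method; the helper name `prob` is hypothetical.

```python
from fractions import Fraction

# Discrete Uniform (1,10): ten cells, each with probability 1/10.
cells = range(1, 11)
cell_prob = Fraction(1, 10)


def prob(event):
    """Sum the cell probabilities for every Y value that satisfies `event`."""
    return sum((cell_prob for y in cells if event(y)), Fraction(0))


results = {
    "P(Y > 5)": prob(lambda y: y > 5),
    "P(Y >= 5)": prob(lambda y: y >= 5),
    "P(2 <= Y <= 7)": prob(lambda y: 2 <= y <= 7),
    "P(Y = 9)": prob(lambda y: y == 9),
    "P(Y >= 9)": prob(lambda y: y >= 9),
    "P(Y > 10)": prob(lambda y: y > 10),
    # OR: either range may hold, so the qualifying cells are pooled.
    "P(2 <= Y <= 4 OR 6 <= Y <= 9)": prob(lambda y: 2 <= y <= 4 or 6 <= y <= 9),
    # AND: both ranges must hold, so only the overlapping cells (4 and 5) count.
    "P(2 <= Y <= 5 AND 4 <= Y <= 9)": prob(lambda y: 2 <= y <= 5 and 4 <= y <= 9),
}

for label, p in results.items():
    print(f"{label} = {float(p):.1f}")
```

Counting cells directly handles the OR case safely even when the two ranges overlap, where simply adding the two probabilities would count the shared cells twice.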
[Figure: Uniform (1,10); panels for P(2 ≤ Yi ≤ 4 OR 6 ≤ Yi ≤ 9) and P(2 ≤ Yi ≤ 5 AND 4 ≤ Yi ≤ 9)]

Finding Probabilities from the CONTINUOUS Uniform Distribution
Find the following probabilities for a continuous Uniform (0,10) distribution.
This is a little trickier because we don't just count cells; we consider the range between limits.
a) P(2 ≤ Yi ≤ 7)
b) P(Yi = 9)
c) P(Yi ≥ 8)
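For the continuous case, the probability is the width of the interval (clipped to the range 0 to 10) times the height 0.1. The following Python sketch works through the three questions on that basis; the function name `uniform_prob` and its default arguments are assumptions made for the illustration.

```python
import math


def uniform_prob(lower, upper, a=0.0, b=10.0):
    """P(lower <= Y <= upper) for Y distributed as continuous Uniform(a, b)."""
    lo = max(lower, a)   # clip the interval to the support of the distribution
    hi = min(upper, b)
    return max(hi - lo, 0.0) / (b - a)


p_a = uniform_prob(2, 7)          # a) width 5 out of 10
p_b = uniform_prob(9, 9)          # b) a single point has zero width
p_c = uniform_prob(8, math.inf)   # c) everything from 8 up to the maximum of 10

print(p_a, p_b, p_c)
```

Because a single point has zero width, P(Yi ≥ 8) and P(Yi > 8) are the same here, unlike in the discrete case.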
[Figure: Continuous Uniform (0,10); panels for b) P(Yi = 9) and c) P(Yi ≥ 8)]
For b), this probability is zero.
[Figure: Continuous Uniform (0,10); panel for a) P(2 ≤ Yi ≤ 7)]
The Normal Distribution: N(μ, σ²)
The equation for the Normal Distribution is:

$$ f(Y) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(Y-\mu)^2}{2\sigma^2}} $$

Note that there are two separate and distinct parameters, mu (μ) and sigma (σ).

Characteristics of the Normal Distribution
For a variable distributed normally, N(μ, σ²):
• The distribution is symmetric about the mean
• The distribution has only two parameters

The probability that a random observation will fall within specified limits is given by the area under the curve between those limits.

The middle 68% of the distribution is included in the interval μ ± 1σ
[Figure: normal curve with 34% on each side of μ between μ − 1σ and μ + 1σ, and 16% in each tail]

The middle 95% of the distribution is included in the interval μ ± 1.96σ
[Figure: normal curve with 47.5% on each side of μ between μ − 1.96σ and μ + 1.96σ, and 2.5% in each tail]

The middle 99% of the distribution is included in the interval μ ± 2.576σ
[Figure: normal curve with 49.5% on each side of μ between μ − 2.576σ and μ + 2.576σ, and 0.5% in each tail]

Examples from the Normal Distribution
The knowledge of the ranges on the previous pages allows us to make some probability
statements from the normal distribution.
Suppose we are examining the height (in inches) of adult males. For the particular population
of interest, the mean is μ = 5' 10" = 70"
The standard deviation, σ = 3"
The middle 68% of the population of all individuals is between what limits?
From our previous discussion we know that 68% fall between μ ± 1σ. μ ± 1σ = 70 ± 1(3)
So, the lower limit is 70 – 3 = 67 and the upper limit is 70 + 3 = 73, and we can state that P(67 ≤ Y ≤ 73) = 0.68
[Figure: normal curve with the central 68% between 67 and 73 and 16% in each tail]

95% of individuals are included in what interval? From our previous discussion we know that 95% fall between μ ± 1.96σ. μ ± 1.96σ = 70 ± 1.96(3)
So, the limits are 70 – 5.88 = 64.12 and 70 + 5.88 = 75.88, and P(64.12 ≤ Y ≤ 75.88) = 0.95
[Figure: normal curve with the central 95% between 64.12 and 75.88 and 2.5% in each tail]

99% of all individuals are included in what interval? From our previous discussion we know that 99% fall between μ ± 2.576σ. μ ± 2.576σ = 70 ± 2.576(3)
So, the limits are 70 – 7.73 = 62.27 and 70 + 7.73 = 77.73, and P(62.27 ≤ Y ≤ 77.73) = 0.99
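The three height intervals worked above can be checked numerically. This is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later), with μ = 70 and σ = 3 taken from the text; the dictionary name `intervals` is illustrative.

```python
from statistics import NormalDist

# Adult male height in inches: mean 70, standard deviation 3 (values from the text).
heights = NormalDist(mu=70, sigma=3)

intervals = {}
for multiplier in (1, 1.96, 2.576):
    lower = heights.mean - multiplier * heights.stdev
    upper = heights.mean + multiplier * heights.stdev
    # Area under the curve between the limits = difference of cumulative probabilities.
    area = heights.cdf(upper) - heights.cdf(lower)
    intervals[multiplier] = (lower, upper, area)
    print(f"70 ± {multiplier}(3): {lower:.2f} to {upper:.2f}, area = {area:.4f}")
```

The computed areas are close to, but not exactly, 0.68, 0.95 and 0.99, because the multipliers 1, 1.96 and 2.576 are themselves rounded values.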
[Figure: normal curve with the central 99% between 62.27 and 77.73 and 0.5% in each tail]

The empirical rule
Sometimes referred to as the three sigma rule, it states that approximately 68% and 95% of the observations are within one and two standard deviations of the mean, respectively. Nearly all of the observations (99.73%) will be within 3 standard deviation units of the mean.

Other distributions
The distributions mentioned previously, along with several others, are summarized below.

Binomial (a discrete distribution)
Where π is the true population probability of an event (estimated in a sample by “p”), and where n is the sample size;
• Mean = nπ (for a sample, Mean = np)
• Variance = nπ(1 – π) (for a sample, Var = np(1 – p))
note that the variance is less than the mean

Uniform (can be either discrete or continuous, but most of our distributions will be continuous)
• Mean = (Max + Min)/2
• Variance = (Max – Min)²/12 (continuous case)

Normal (a continuous distribution)
• Mean = μ
• Variance = σ²
the variance and mean are two distinct parameters

Poisson – a discrete distribution
• Mean = λ
• Variance = λ
a single parameter describes both variance and the mean

Negative binomial – a discrete distribution with a parameter k that provides an index of dispersion.
• Mean = μ
• Variance = μ + kμ²
the variance is greater than the mean

Log normal – a continuous distribution.
The logarithms of the values in this distribution are normally distributed.

Standard normal – a normal distribution with mean = 0 and variance = 1
The distributions that we will be most concerned with are the normal and the standard normal.
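As a check on the binomial formulas listed above, the mean and variance can be computed directly from the three-coin probabilities found earlier and compared with nπ and nπ(1 – π). This is an illustrative Python sketch that assumes fair coins, so n = 3 and π = 1/2; the variable names are not from the original notes.

```python
from fractions import Fraction

# Probabilities for the number of heads in three coin tosses (from the frequency table).
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# Mean and variance computed directly from the distribution.
mean = sum(k * p for k, p in pmf.items())
variance = sum((k - mean) ** 2 * p for k, p in pmf.items())

# The same quantities from the binomial formulas, assuming n = 3 and pi = 1/2.
n, pi = 3, Fraction(1, 2)
formula_mean = n * pi
formula_variance = n * pi * (1 - pi)

print(mean, formula_mean)          # both 3/2
print(variance, formula_variance)  # both 3/4
```

The variance (3/4) is smaller than the mean (3/2), in line with the note under the binomial formulas.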
This note was uploaded on 12/29/2011 for the course EXST 7005 taught by Professor Geaghan,j during the Fall '08 term at LSU.