EXST7005 Fall2010 04a Probability distributions 01


Statistical Methods I (EXST 7005)                                   Page 14

SAS example (#1a) from Freund & Wilson (1997), Table 1.1

   PROC UNIVARIATE DATA=HouseSales PLOT;
      VAR SP;
      TITLE4 'Proc Univariate of house sales price';
   RUN;

See the SAS output for the results.

Probability distributions

PROBABILITY: a measure of the likelihood of the occurrence of some event. An event can be any outcome (e.g. verbal, mathematical or graphical).

Some rules of probability
• If an event (A) is certain to occur, its probability is 1 (one, unity), so P(A) = 1.
• If the event is certain NOT to occur, its probability is 0 (zero, null), so P(A) = 0.
• The probability of an event is always between 0 and 1 (inclusive): 0 ≤ P(A) ≤ 1.
• The probabilities of all possible events, where the events are mutually exclusive, sum to one (1). If the mutually exclusive events are denoted Ai, for i = 1, 2, ..., r, then ΣP(Ai) = 1 when summed across all of the possible events.
• For a truly continuous variable, the probability of any single exact value is zero (0).

The Binomial Distribution

This is our first example of a distribution. A binomial distribution consists of a set of binomial observations, or Bernoulli trials. These are observations with two possible, mutually exclusive outcomes. Examples of binomial observations, where each observation takes one of two possible values:
• Have you taken EXST7005? Yes, No
• A given fish is: Male, Female
• A given fish is: Tagged, Untagged
• A coin toss is: Heads, Tails

James P. Geaghan Copyright 2010

Our first experiment: toss three coins, each coin having two possible outcomes. For the three coins together there are 2 × 2 × 2 = 8 possible outcomes. We are interested in the distribution of these outcomes.
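The eight outcomes and the tally of heads shown in the tables that follow can also be generated mechanically. Here is a minimal Python sketch. H and T stand for heads and tails, and the variable names are only illustrative:

```python
from collections import Counter
from itertools import product

# All 2 ** 3 = 8 equally likely outcomes of tossing three coins (H = head, T = tail).
outcomes = list(product("HT", repeat=3))

# Frequency (f) of each number of heads across the 8 outcomes.
freq = Counter(outcome.count("H") for outcome in outcomes)

# Relative frequency (r.f.) = f / total number of outcomes.
rel_freq = {heads: freq[heads] / len(outcomes) for heads in sorted(freq)}

for heads, rf in rel_freq.items():
    print(heads, freq[heads], rf)
# prints, one per line: 0 1 0.125, 1 3 0.375, 2 3 0.375, 3 1 0.125
```

The outcomes are listed in a different order than in the table, but the frequencies (1, 3, 3, 1) and relative frequencies (1/8, 3/8, 3/8, 1/8) are the same.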
Outcome              1     2     3     4     5     6     7     8
Coin 1             Tail  Tail  Tail  Head  Tail  Head  Head  Head
Coin 2             Tail  Tail  Head  Tail  Head  Tail  Head  Head
Coin 3             Tail  Head  Tail  Tail  Head  Head  Tail  Head
Number of heads      0     1     1     1     2     2     2     3

Note that the outcomes are equally likely and mutually exclusive. Prepare a frequency table of the results.

Number of heads                0           1           2           3       Total
Frequency (f)                  1           3           3           1         8
Relative frequency (r.f.)     1/8         3/8         3/8         1/8        1
Probability               P(0)=0.125  P(1)=0.375  P(2)=0.375  P(3)=0.125    1

[Figure: bar chart of frequency against number of heads (0, 1, 2, 3)]

This chart represents the distribution of all possible outcomes of tossing 3 coins, each with a binomial outcome. This is the binomial distribution.

Probability defined: if an event can occur in “n” mutually exclusive and equally likely ways, and if “m” of these ways have the attribute “A”, the probability of the occurrence of “A” is the ratio of “m” to “n”, where
• A is some particular attribute,
• n is the number of possible outcomes (trials),
• m is the number of ways A can occur (successes).
Then P(A) = m / n, the number of successes over the number of trials.

Example from our 3 coins: find the probability of event A, where A is the attribute “2 heads”. There are n = 8 possible outcomes (HHH, HHT, HTH, THH, TTH, THT, HTT, TTT), and the outcomes are equally likely. m = 3 outcomes have the chosen attribute (2 heads: HHT, HTH, THH). So P(A) = m / n = 3/8 = 0.375.

Working with Probabilities

We will work first with probability distributions, similar to the bar charts we examined earlier. The probability is the proportion of the area under the graph between given limits, that is, the relative frequency of occurrence of observations within the set limits.

The Uniform Distribution

A uniform distribution is a distribution in which every outcome has an equal probability of occurrence.
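The classical definition P(A) = m / n translates directly into code. In this sketch the function name classical_probability is a hypothetical name. Exact fractions are used so that 3/8 is not rounded. The sketch also applies the same rule to the ten equally likely values of a discrete Uniform (1,10). Each value receives 1/10, and the probabilities of these mutually exclusive outcomes sum to 1.

```python
from fractions import Fraction
from itertools import product

def classical_probability(outcomes, has_attribute):
    """P(A) = m / n for n mutually exclusive, equally likely outcomes."""
    n = len(outcomes)                                  # number of possible outcomes (trials)
    m = sum(1 for o in outcomes if has_attribute(o))   # outcomes with attribute A (successes)
    return Fraction(m, n)

# Three coins: A = "exactly 2 heads".
coins = list(product("HT", repeat=3))
p_two_heads = classical_probability(coins, lambda o: o.count("H") == 2)
print(p_two_heads, float(p_two_heads))   # 3/8 0.375

# Discrete Uniform(1, 10): each value is 1 of 10 equally likely outcomes,
# and the probabilities of these mutually exclusive outcomes sum to 1.
values = list(range(1, 11))
cell_probs = [classical_probability(values, lambda y, v=v: y == v) for v in values]
print(cell_probs[0], sum(cell_probs))    # 1/10 1
```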
[Figure: Discrete Uniform (1,10); relative frequency 0.1 at each Y value from 1 to 10]

We will consider two similar uniform distributions, discrete and continuous.

Discrete Uniform Distribution (1, 10)
[Figure: Uniform (1,10); relative frequency against Y value]
The probability in each cell is 1/10 = 0.1.

Continuous Uniform (0, 10)
[Figure: Uniform (0,10); relative frequency 0.1 across Y values from 0 to 10]
The probability between any two consecutive integers is 1/10 = 0.1.

Finding Probabilities from the DISCRETE Uniform Distribution

Find P(Yi > 5). Note that 5 itself is excluded.
Find P(Yi ≥ 5). Now 5 is included.
[Figures: Uniform (1,10); relative frequency against Y value, one panel for each probability]

Find the following probabilities for a discrete Uniform (1,10) distribution.
a) P(2 ≤ Yi ≤ 7)
b) P(Yi = 9). This is the probability of a single cell.
c) P(Yi ≥ 9)
d) P(Yi > 10). This probability is zero, because no value exceeds 10.
[Figures: Uniform (1,10); relative frequency against Y value, one panel for each part]

Find the following probabilities for a discrete Uniform (1,10) distribution.
a) P(2 ≤ Yi ≤ 4 OR 6 ≤ Yi ≤ 9)
   This type of statement is true if either of the two statements is true. Because the two ranges do not overlap (they are mutually exclusive), the individual probabilities are added.
b) P(2 ≤ Yi ≤ 5 AND 4 ≤ Yi ≤ 9)
   This type of statement is true only if BOTH of the statements are true. We determine the area of overlap between the two statements, here 4 ≤ Yi ≤ 5.
[Figures: Uniform (1,10); relative frequency against Y value for parts a) and b)]

Finding Probabilities from the CONTINUOUS Uniform Distribution

Find the following probabilities for a continuous Uniform (0,10) distribution. This is a little trickier because we do not just count cells; we consider the range between the limits.
a) P(2 ≤ Yi ≤ 7)
b) P(Yi = 9). This probability is zero, since a single value of a continuous variable has no width.
c) P(Yi ≥ 8)
[Figures: Continuous Uniform (0,10); relative frequency against Y value for each part]

The Normal Distribution: N(μ, σ²)

The equation for the Normal Distribution is:

   $f(Y) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(Y-\mu)^2}{2\sigma^2}}$

Note that there are two separate and distinct parameters, mu (μ) and sigma (σ).

Characteristics of the Normal Distribution
For a variable distributed normally, N(μ, σ²):
• The distribution is symmetric about the mean.
• The distribution has only two parameters.
• The probability that a random observation will fall within specified limits is given by the area under the curve between those limits.

The middle 68% of the distribution is included in the interval μ ± 1σ (34% on each side of μ, 16% in each tail).
The middle 95% of the distribution is included in the interval μ ± 1.96σ (47.5% on each side of μ, 2.5% in each tail).
The middle 99% of the distribution is included in the interval μ ± 2.576σ (49.5% on each side of μ, 0.5% in each tail).
[Figures: normal curves marking μ, μ ± 1σ, μ ± 1.96σ and μ ± 2.576σ]

Examples from the Normal Distribution

Knowing these ranges allows us to make some probability statements from the normal distribution. Suppose we are examining the height (in inches) of adult males. For the particular population of interest, the mean is μ = 5' 10" = 70" and the standard deviation is σ = 3".

The middle 68% of the population of all individuals is between what limits?
From our previous discussion we know that 68% fall between μ ± 1σ.
μ ± 1σ = 70 ± 1(3)
So the lower limit is 70 – 3 = 67 and the upper limit is 70 + 3 = 73, and we can state that P(67 ≤ Y ≤ 73) = 0.68, with 16% below 67 and 16% above 73.

95% of individuals are included in what interval?
From our previous discussion we know that 95% fall between μ ± 1.96σ.
μ ± 1.96σ = 70 ± 1.96(3)
So the limits are 70 – 5.88 = 64.12 and 70 + 5.88 = 75.88, and P(64.12 ≤ Y ≤ 75.88) = 0.95, with 2.5% in each tail.

99% of all individuals are included in what interval?
From our previous discussion we know that 99% fall between μ ± 2.576σ.
μ ± 2.576σ = 70 ± 2.576(3)
So the limits are 70 – 7.73 = 62.27 and 70 + 7.73 = 77.73, and P(62.27 ≤ Y ≤ 77.73) = 0.99, with 0.5% in each tail.

The empirical rule

Sometimes referred to as the three-sigma rule, it states that approximately 68% and 95% of the observations are within one and two standard deviations of the mean, respectively. Nearly all of the observations (99.73%) will be within 3 standard deviation units of the mean.
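The uniform and normal calculations above can be checked numerically. This is a sketch, not the course's method, since the course itself uses SAS. The helpers discrete_prob, continuous_prob and normal_pdf are hypothetical names. The sketch makes the following assumptions:
• Discrete probabilities are summed cell by cell.
• Continuous-uniform probabilities are computed as width × height (1/10).
• Normal areas come from Python's statistics.NormalDist, using the height example's μ = 70 and σ = 3.

```python
import math
from fractions import Fraction
from statistics import NormalDist

# --- Discrete Uniform(1, 10): each integer 1..10 has probability 1/10 ---
support = range(1, 11)

def discrete_prob(event):
    """Add up the 1/10 cells for which event(y) is true."""
    return sum((Fraction(1, 10) for y in support if event(y)), Fraction(0))

discrete = {
    "P(Y > 5)": discrete_prob(lambda y: y > 5),
    "P(Y >= 5)": discrete_prob(lambda y: y >= 5),
    "P(2 <= Y <= 7)": discrete_prob(lambda y: 2 <= y <= 7),
    "P(Y = 9)": discrete_prob(lambda y: y == 9),
    "P(Y >= 9)": discrete_prob(lambda y: y >= 9),
    "P(Y > 10)": discrete_prob(lambda y: y > 10),
    # OR of two non-overlapping ranges: the probabilities add.
    "P(2 <= Y <= 4 or 6 <= Y <= 9)": discrete_prob(lambda y: 2 <= y <= 4 or 6 <= y <= 9),
    # AND: only the overlap 4 <= Y <= 5 satisfies both statements.
    "P(2 <= Y <= 5 and 4 <= Y <= 9)": discrete_prob(lambda y: 2 <= y <= 5 and 4 <= y <= 9),
}

# --- Continuous Uniform(0, 10): density 1/10, so probability = width / 10 ---
def continuous_prob(lo, hi, a=0.0, b=10.0):
    lo, hi = max(lo, a), min(hi, b)      # clip the limits to the range [a, b]
    return max(hi - lo, 0.0) / (b - a)

continuous = {
    "P(2 <= Y <= 7)": continuous_prob(2, 7),
    "P(Y = 9)": continuous_prob(9, 9),          # zero-width interval
    "P(Y >= 8)": continuous_prob(8, math.inf),
}

# --- Normal: the density formula, and areas for the height example ---
def normal_pdf(y, mu, sigma):
    """f(Y) = exp(-(Y - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    return math.exp(-((y - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

heights = NormalDist(mu=70, sigma=3)     # adult male heights in inches
intervals = {}
for z in (1.0, 1.96, 2.576):
    lower = heights.mean - z * heights.stdev
    upper = heights.mean + z * heights.stdev
    area = heights.cdf(upper) - heights.cdf(lower)
    intervals[z] = (round(lower, 2), round(upper, 2), round(area, 4))

# Empirical rule: area within 1, 2 and 3 standard deviations of the mean.
std = NormalDist()
empirical = {k: round(std.cdf(k) - std.cdf(-k), 4) for k in (1, 2, 3)}

for table in (discrete, continuous, intervals, empirical):
    for key, value in table.items():
        print(key, value)
```

Running the sketch gives the following results:
• Discrete case: P(Y > 5) = 1/2 and P(Y ≥ 5) = 3/5.
• Continuous case: P(2 ≤ Y ≤ 7) = 0.5.
• Height example: the intervals 67 to 73, 64.12 to 75.88 and 62.27 to 77.73, with areas of about 0.6827, 0.95 and 0.99.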
Other distributions

Previously mentioned were the following.

Binomial (a discrete distribution), where π is the true population probability of an event (estimated in a sample by “p”) and n is the sample size:
• Mean = nπ (for a sample, Mean = np)
• Variance = nπ(1 – π) (for a sample, Var = np(1 – p)); note that the variance is less than the mean

Uniform (can be either discrete or continuous, but most of our distributions will be continuous):
• Mean = (Max + Min)/2
• Variance = (Max – Min)²/12 (continuous case)

Normal (a continuous distribution):
• Mean = μ
• Variance = σ²; the variance and mean are two distinct parameters

Poisson – a discrete distribution:
• Mean = λ
• Variance = λ; a single parameter describes both the variance and the mean

Negative binomial – a discrete distribution with a parameter k that provides an index of dispersion:
• Mean = μ
• Variance = μ + kμ²; the variance is greater than the mean

Log normal – a continuous distribution. The logarithm of the values in this distribution is normally distributed.

Standard normal – a normal distribution with mean = 0 and variance = 1.

The distributions that we will be most concerned with are the normal and the standard normal.
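The mean and variance formulas listed above can be checked numerically. This sketch rests on the following assumptions:
• The binomial check uses the three-coin experiment, with n = 3 and π = 0.5.
• The uniform check uses Min = 0 and Max = 10, and approximates the continuous distribution with a fine grid of equally spaced midpoints.
• The Poisson check uses an arbitrarily chosen λ = 4, with the sum truncated at k = 99. For λ = 4 the omitted tail is negligible.

```python
import math
from itertools import product

# Binomial: the three-coin experiment has n = 3 trials with pi = 0.5.
n, pi = 3, 0.5
heads = [toss.count("H") for toss in product("HT", repeat=n)]
binom_mean = sum(heads) / len(heads)
binom_var = sum((h - binom_mean) ** 2 for h in heads) / len(heads)
print(binom_mean, n * pi)              # 1.5 1.5
print(binom_var, n * pi * (1 - pi))    # 0.75 0.75

# Continuous Uniform(Min=0, Max=10): approximate with many equally spaced
# midpoints and compare with (Max + Min)/2 and (Max - Min)**2 / 12.
lo, hi, steps = 0.0, 10.0, 10_000
grid = [lo + (i + 0.5) * (hi - lo) / steps for i in range(steps)]
unif_mean = sum(grid) / steps
unif_var = sum((y - unif_mean) ** 2 for y in grid) / steps
print(unif_mean, unif_var, (hi - lo) ** 2 / 12)

# Poisson: mean and variance should both equal lambda (lam = 4 is arbitrary).
lam = 4.0
pmf = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(100)]
pois_mean = sum(k * p for k, p in enumerate(pmf))
pois_var = sum((k - pois_mean) ** 2 * p for k, p in enumerate(pmf))
print(pois_mean, pois_var)
```

In the binomial case the variance (0.75) is smaller than the mean (1.5). In the Poisson case the mean and variance agree, as the list above states.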

This note was uploaded on 12/29/2011 for the course EXST 7005 taught by Professor Geaghan,j during the Fall '08 term at LSU.
