STA 3032 (7661): Engineering Statistics
Rob Gordon, University of Florida, Fall 2011

Introduction: Sampling & Descriptive Statistics
Definition
A population is the entire collection of objects or outcomes about which information is sought.

Examples:
- the entire United States
- the entire State of Florida
- all UF students

Definition
A sample is a subset of the population, containing the objects or outcomes that are actually observed.
A common question might be: "How do I know whether a sample is truly representative of its population?"

Ideally, we select the members of the sample in the least biased way possible. Throughout the rest of this course we will assume our samples follow the definition of a simple random sample:

Definition
A simple random sample of size n is a sample chosen by a method in which each collection of n population items is equally likely to comprise the sample (as in a lottery).
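To make the lottery idea concrete, here is a minimal sketch in Python (the population of 1000 hypothetical students and the seed are invented for illustration):

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw a simple random sample of size n: random.sample picks n
    distinct items so that every size-n subset is equally likely."""
    rng = random.Random(seed)  # seeded only for reproducibility
    return rng.sample(list(population), n)

# Hypothetical population of 1000 students
students = [f"student_{i}" for i in range(1000)]
sample = simple_random_sample(students, 25, seed=1)
```

Note that random.sample draws without replacement, so the 25 picks are distinct, matching the lottery description above.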
"Summary statistics" help the important features of a sample stand out.

Definition
Let x_1, x_2, ..., x_n denote the numbers in a sample. The sample mean is

x̄ = (1/n) Σ_{i=1}^{n} x_i.

Definition
The sample variance is

s² = [1/(n−1)] Σ_{i=1}^{n} (x_i − x̄)² = [1/(n−1)] [ Σ_{i=1}^{n} x_i² − n x̄² ].

Definition
The sample standard deviation is s = √(s²).
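These formulas can be checked numerically; the short sketch below (the data values are arbitrary, made up for illustration) computes both forms of the variance and confirms they agree:

```python
import math

def sample_stats(xs):
    """Return the sample mean, variance (n - 1 denominator), and
    standard deviation, using both forms of the variance formula."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    # Computational shortcut: (sum of squares - n * mean^2) / (n - 1)
    var_shortcut = (sum(x * x for x in xs) - n * mean ** 2) / (n - 1)
    assert math.isclose(var, var_shortcut)
    return mean, var, math.sqrt(var)

mean, var, sd = sample_stats([2, 4, 4, 4, 5, 5, 7, 9])
```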
Definition
Outliers are points in the sample that are much smaller or larger than the rest.

Outliers often result from data-entry errors (e.g., an incorrect decimal place) and can present many problems for statisticians (more on this later).
Caution: Only delete an outlier if it exists due to error!

Definition
The sample median is the numerical value separating the higher half of a sample from the lower half:

x̃ = x_{(n+1)/2}, if n is odd;
x̃ = (1/2)(x_{n/2} + x_{n/2+1}), if n is even.
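A direct translation of the median formula (note the shift from the formula's 1-based indexing to Python's 0-based lists):

```python
def sample_median(xs):
    """Median per the definition above: the middle value when n is odd,
    the average of the two middle values when n is even."""
    s = sorted(xs)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]            # x_{(n+1)/2}
    return (s[n // 2 - 1] + s[n // 2]) / 2    # (x_{n/2} + x_{n/2+1}) / 2
```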
The median divides the sample into halves, while quartiles divide the data into quarters. Let
Q1 = 1st quartile = the number greater than 25% of all data points,
Q2 = 2nd quartile = the number greater than 50% of all data points,
Q3 = 3rd quartile = the number greater than 75% of all data points.
Note: Sometimes quartiles are not numbers in the sample.

Definition
Q1 = x_{0.25(n+1)} if 0.25(n + 1) is an integer; otherwise, Q1 is the average of the values just above and just below that position.
Example 1
Sample = {1, 2, 3, 4, 5, 6, 7}
Q1 = x_{0.25(7+1)} = x_2 = 2

Example 2
Sample = {1, 2, 3, 4, 5, 6, 7, 8, 9}
Q1 = x_{0.25(9+1)} = x_{2.5} = (1/2)(x_2 + x_3) = (1/2)(2 + 3) = 2.5
Similarly,

Definition
Q2 = median
Q3 = x_{0.75(n+1)} if 0.75(n + 1) is an integer; otherwise, Q3 is the average of the values just above and just below that position.

We are not restricted to 25, 50, and 75%.

Definition
The p-th percentile of a sample, for a number p between 0 and 100, divides the sample so that, as nearly as possible, p% of the sample values are less than the p-th percentile. It is calculated as x_{(p/100)(n+1)}.
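The position rule above can be sketched as follows; it reproduces the two Q1 examples worked earlier:

```python
def percentile(xs, p):
    """p-th percentile via the position rule: position (p/100)(n + 1)
    in the sorted sample (1-based); when the position is not an
    integer, average the values just above and below it."""
    s = sorted(xs)
    pos = (p / 100) * (len(s) + 1)
    if pos == int(pos):
        return s[int(pos) - 1]
    below = s[int(pos) - 1]   # value just below the fractional position
    above = s[int(pos)]       # value just above it
    return (below + above) / 2

q1_odd = percentile([1, 2, 3, 4, 5, 6, 7], 25)          # Example 1: 2
q1_even = percentile([1, 2, 3, 4, 5, 6, 7, 8, 9], 25)   # Example 2: 2.5
```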
Sometimes it's nice to get a visual picture of the data.

Stem & Leaf Plot
Each item is divided into two parts:
1. Stem: the leftmost one or two (usually one) digits.
2. Leaf: the next digit.

Example
Sample = {400, 410, 411, 550, 600, 612, 613}

Stem (hundreds) | Leaf
4               | 0 1 1
5               | 5
6               | 0 1 1
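The same plot can be built programmatically; this sketch assumes three-digit values so the stem is the hundreds digit and the leaf is the tens digit:

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Stem-and-leaf for 3-digit values: stem = hundreds digit,
    leaf = the next (tens) digit, as in the example above."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 100].append((v // 10) % 10)
    return dict(plot)

plot = stem_and_leaf([400, 410, 411, 550, 600, 612, 613])
# {4: [0, 1, 1], 5: [5], 6: [0, 1, 1]}
```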
Boxplots
Graphs presenting the median, Q1, Q3, and outliers.

We previously described outliers as really big or really small. What do we mean by really big or really small?

Definition
Interquartile Range (IQR) = Q3 − Q1.

Definition
An outlier is any point in the sample that is either more than 1.5 × IQR above Q3, or more than 1.5 × IQR below Q1.

[Figure: sample boxplot of simulated data, values roughly 21 to 27]

The previous graph was made using R (http://www.r-project.org/). For
your convenience, the code used to generate the plot from the previous
slide is below:
> x <- rnorm(100, mean = 24)
> pdf("boxplot.pdf")                  # open a PDF graphics device
> boxplot(x, main = "Sample Boxplot")
> dev.off()                           # close the device, writing boxplot.pdf
You will never be tested on the specifics of R code. Slides like these are provided only as a convenience.

There are other ways to visually represent data. Some examples are:
- dot plots
- histograms

Please read chapter one of your textbook.

Chapter 4: Probability
Pierre-Simon Laplace: "The most important questions of life are indeed, for the most part, really only problems of probability."

Definition
An experiment is a process whose outcomes cannot be predicted in advance with absolute certainty.

Definition
The set of all possible outcomes of an experiment is called the sample space of the experiment. The sample space is often denoted by S.
Definition
The empty set or null set is the set containing zero elements. You may see it denoted as {} or ∅.

Definition
A subset of a sample space is called an event.

Example: Roll one 6-sided die. S = {1, 2, 3, 4, 5, 6}.
Let E be the event that I roll an even number. Then E = {2, 4, 6}.

Is the empty set an event? Yes, since the empty set is a subset of every set.

Set Theory
To understand probability on a basic level, some discussion of set theory is needed. Some definitions:

Definition
A set is a list or collection of objects.

Definition
Let A and B be two arbitrary events defined on the same sample space. The union of A and B, denoted A ∪ B, is the event containing all elements of both A and B.

Example: S = {1, 2, 3, 4, 5, 6}, E = {2, 4, 6}, A = {1, 2}.
E ∪ A = {1, 2, 4, 6}
Definition
Let A and B be two arbitrary events defined on the same sample space. The intersection of A and B, denoted A ∩ B (or AB), is the set of outcomes that belong to both A and B.

Example: S = {1, 2, 3, 4, 5, 6}, E = {2, 4, 6}, A = {1, 2}.
E ∩ A = {2}

Definition
Let A be an event defined on some sample space. The complement of an event A, denoted Ā (also written A^c or A′), is the set of outcomes in the sample space not belonging to A.

Example: S = {1, 2, 3, 4, 5, 6}, E = {2, 4, 6}.
Ē = {1, 3, 5}
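Python's built-in set type implements exactly these operations; here is the die-roll example above, expressed in code:

```python
S = {1, 2, 3, 4, 5, 6}   # sample space: one roll of a 6-sided die
E = {2, 4, 6}            # event: roll an even number
A = {1, 2}

union = E | A            # E union A
intersection = E & A     # E intersect A
complement = S - E       # complement of E, relative to S
```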
The previous definitions are examples of set operations. Venn diagrams help illustrate these.
Definition
Let A and B be two events defined on a sample space. A and B are said to be mutually exclusive if they have no outcomes in common, i.e., AB = ∅.

Probability
Definition
The probability of an event is a quantitative measure of how likely the event is to occur.

Given an experiment and some event A defined on a sample space:
- P(A) denotes the probability that event A occurs.
- P(A) is the proportion of times event A would occur in the long run, if the experiment were repeated over and over again.

Consider a regular two-sided coin. If I flip it 10 times, how many heads will I get? What if I flip it 100 times? 1000?
Let S denote the sample space.

Axioms of Probability
1. P(S) = 1.
2. For any event A, 0 ≤ P(A) ≤ 1.
3. If A and B are mutually exclusive events, P(A ∪ B) = P(A) + P(B).

From these axioms we can say
P(A^c) = 1 − P(A)
P(∅) = 0
Why? Since A and A^c are mutually exclusive and A ∪ A^c = S, axioms 1 and 3 give 1 = P(S) = P(A) + P(A^c); taking A = S then gives P(∅) = 0.
What if A and B are not mutually exclusive?

Theorem
Given two events A and B defined on some sample space S,

P(A ∪ B) = P(A) + P(B) − P(A ∩ B).    (1)

The theorem above can be proven using set theory, but it is quicker to think it through with a Venn diagram.
Example: Let E be the event that a new car requires engine work, and T the event that it requires transmission work. Assume P(E) = 0.10, P(T) = 0.02, P(E ∩ T) = 0.01. Find the probability that the car needs:
1. either E or T or both;
2. neither E nor T;
3. E but not T.

Answers:
1. P(either E or T or both) = P(E ∪ T) = P(E) + P(T) − P(E ∩ T) = 0.10 + 0.02 − 0.01 = 0.11
2. P(neither E nor T) = P((E ∪ T)^c) = 1 − P(E ∪ T) = 1 − 0.11 = 0.89
3. P(E but not T) = P(E ∩ T^c) = P(E) + P(T^c) − P(E ∪ T^c) = 0.10 + 0.98 + ???

How do we calculate P(E ∪ T^c)?
First think about what we mean by the union of E and T^c. Remember that they are just two symbols representing sets. The union is the set containing all elements found in either E or T^c. Drawn in a Venn diagram, the shaded region contains everything that is not T, in addition to all of E. The term 1 − P(T) covers everything outside T, and P(E ∩ T) adds back the part of E that lies inside T, so

P(E ∪ T^c) = 1 − P(T) + P(E ∩ T).
Now put it all together. We get:
P(E ∩ T^c) = P(E) + P(T^c) − P(E ∪ T^c)
= 0.10 + 0.98 − (1 − P(T) + P(E ∩ T))
= 0.10 + 0.98 − (1 − 0.02 + 0.01) = 0.09.
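A quick numeric check of all three answers, including the shortcut P(E ∩ T^c) = P(E) − P(E ∩ T), which follows from splitting E into its parts inside and outside T:

```python
import math

p_E, p_T, p_ET = 0.10, 0.02, 0.01

p_union = p_E + p_T - p_ET        # P(E or T or both), inclusion-exclusion
p_neither = 1 - p_union           # complement of the union
p_E_not_T = p_E - p_ET            # shortcut: E splits into (E and T), (E and not T)

# The route taken on the slide gives the same number:
p_union_Tc = 1 - p_T + p_ET                     # P(E union T-complement)
p_E_not_T_slide = p_E + (1 - p_T) - p_union_Tc
assert math.isclose(p_E_not_T, p_E_not_T_slide)
```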
Calculating Probabilities
Let A be an event defined on some sample space S. Then the classical definition of probability is the following:

P(A) = (# of ways A occurs) / (# of possible outcomes) = (# of elements in A) / (# of elements in S)

Sometimes it is difficult (and often tedious) to list the items in an event. It's easier instead to count the number of ways the event occurs.

Homework: Sections 4.1 and 4.2, all odd-numbered problems (not to be handed in).

Counting Methods
The Fundamental Principle of Counting

Theorem
Assume k operations are performed. If there are n_1 ways to perform the 1st operation, n_2 ways to perform the 2nd, ..., n_k ways to perform the k-th, then the total number of ways to perform the sequence of k operations is

∏_{i=1}^{k} n_i = n_1 · n_2 · · · n_k.

Example: How many ways can I flip a coin and roll a six-sided die? How many ways can I flip a head and roll an even number? What is the probability that I flip a head and roll an even number?

Answer:
(2 ways to flip a coin)(6 ways to roll a die) = 12 ways to do both
(1 way to flip a head)(3 ways to roll an even number) = 3 ways to do both
P(flip head & roll even number) = 3/12 = 1/4.
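The answer can be verified by brute-force enumeration of the 12 outcomes:

```python
from itertools import product

# All (coin, die) outcomes: 2 * 6 = 12 by the counting principle
outcomes = list(product(["H", "T"], [1, 2, 3, 4, 5, 6]))

favorable = [(c, d) for c, d in outcomes if c == "H" and d % 2 == 0]
prob = len(favorable) / len(outcomes)   # 3/12 = 1/4
```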
Sometimes an event is described as the number of ways a collection of objects is arranged.

Definition
A permutation is an ordering of a collection of objects.

Example: There are 6 permutations of the letters ABC:
ABC ACB BAC BCA CAB CBA

What if we have more than 3 objects we want to arrange? What if we have 1000? It is significantly more difficult to list all the possibilities. We can derive a formula using the Fundamental Principle of Counting.

How many permutations exist for a collection of n objects?
Think about the problem as placing n objects in n places.
How many ways can you place an object in the first place? n
How many ways can you place an object in the second place? n − 1
...
How many ways can you place an object in the last place? 1

Using the Fundamental Principle of Counting, how many permutations do we have?
Answer: n!
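A quick check of the n! count, and of the six ABC orderings listed earlier:

```python
from itertools import permutations
from math import factorial

# The 6 orderings of ABC from the earlier slide
assert sorted("".join(p) for p in permutations("ABC")) == [
    "ABC", "ACB", "BAC", "BCA", "CAB", "CBA"]

# In general, n objects have n! permutations
for n in range(1, 7):
    assert len(list(permutations(range(n)))) == factorial(n)
```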
How many ways can we order k objects from n total?

Example: A basketball coach needs to choose 5 players from a roster of 12 to be starters. The coach is awful and wants to choose a starting lineup at random by picking 5 numbers out of a hat. Also assume that order matters, so that the 1st player chosen plays point guard, the 2nd is shooting guard, etc. How many starting-lineup permutations are there?

Consider a method similar to how we answered the last question:
How many ways can the coach assign a player from his roster to the 1st starting spot? 12
How many ways can the coach assign a player to the 2nd starting spot? 11
...
How many ways can the coach assign a player to the 5th starting spot? 8

So the number of starting-lineup permutations is (12)(11)(10)(9)(8).
Based on this reasoning, can we come to some conclusion for general n and k?
Definition
The permutation count nPk is the number of ordered arrangements of k objects selected from n distinct objects (k ≤ n). It is given by

nPk = n! / (n − k)!    (2)

Notice that
(12)(11)(10)(9)(8) = [(12)(11)(10)(9)(8)(7)(6)(5)(4)(3)(2)(1)] / [(7)(6)(5)(4)(3)(2)(1)] = 12!/7! = 12!/(12 − 5)!

Remember, permutations require that the order of the objects is of particular importance. What if we want to pick objects with no regard to order?
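Formula (2) can be sketched directly; the standard-library function math.perm (Python 3.8+) computes the same quantity:

```python
from math import factorial, perm

def n_perm_k(n, k):
    """nPk = n! / (n - k)!: ordered arrangements of k of n objects."""
    return factorial(n) // factorial(n - k)

lineups = n_perm_k(12, 5)   # the basketball example: 12*11*10*9*8
assert lineups == perm(12, 5)
```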
Definition
Each distinct group of objects that can be selected, without regard to order, is called a combination.

Back to our basketball-coach example: how many combinations of 5 from 12 can we choose? Each group of 5 is counted only once, regardless of how many permutations of it exist. So

# of combinations of starting 5 = (# of permutations of 5 from 12) / (# of permutations of 5 from 5)

Based on this reasoning, can we come to some conclusion for general n and k?
Definition
The combination count C(n, k) is the number of distinct subsets, or combinations, of size k that can be selected from n distinct objects (k ≤ n). It is given by

C(n, k) = n! / (k!(n − k)!)    (3)

What if we need to partition the objects into 3 or more groups instead of 2?

Definition
The number of ways of partitioning n distinct objects into k groups containing n_1, n_2, ..., n_k objects, respectively, is

n! / (n_1! n_2! · · · n_k!), where Σ_{i=1}^{k} n_i = n.    (4)
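Formulas (3) and (4) in code; the 5/3/2 group sizes in the partition example are invented purely for illustration:

```python
from math import comb, factorial

# Combinations: the coach's 5 starters from 12, order ignored
assert comb(12, 5) == factorial(12) // (factorial(5) * factorial(7))

def n_partitions(n, sizes):
    """Ways to split n distinct objects into groups of the given sizes
    (sizes must sum to n): n! / (n1! n2! ... nk!)."""
    assert sum(sizes) == n
    out = factorial(n)
    for s in sizes:
        out //= factorial(s)
    return out

ways = n_partitions(10, [5, 3, 2])   # hypothetical: 10 objects into 5/3/2
```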
To Review
If we need to count the number of ways to arrange objects from a larger set of objects:
- Use permutations if order matters.
- Use combinations if order doesn't matter.
- Use partitioning if we want combinations for 3 or more groups.
Example: A friend owns 10 Jazz CDs, 5 rap CDs, 6 Metal CDs. He picks
3 CDs to bring on a car ride in a completely random way.
1 What is the probability that he brings 1 of each? 2 What is the probability he brings 2 Rap and 1 Metal? Answer: P (1 each) =
= P (2R + 1M) = Rob Gordon (University of Florida) # ways to pick 1 each
(# 1J)(# 1R)(# 1M)
=
# ways to pick any 3
# any 3
10
1 10
0 5
1
21
3 6
1 5
2
21
3 6
1 = (10)(5)(6)
(21)(20)(19)
(3)(2) STA 3032 (7661) = ··· 30
= 0.23
133 Fall 2011 35 / 251 Counting Methods
Example: A friend owns 10 Jazz CDs, 5 rap CDs, 6 Metal CDs. He picks
3 CDs to bring on a car ride in a completely random way.
1 What is the probability that he brings 1 of each? 2 What is the probability he brings 2 Rap and 1 Metal? Answer: P (1 each) =
= P (2R + 1M) = Rob Gordon (University of Florida) # ways to pick 1 each
(# 1J)(# 1R)(# 1M)
=
# ways to pick any 3
# any 3
10
1 10
0 5
1
21
3 6
1 5
2
21
3 6
1 = = (10)(5)(6)
(21)(20)(19)
(3)(2)
(5)(4)
2 30
= 0.23
133 (6) (21)(20)(19)
(3)(2) STA 3032 (7661) = ··· = ··· 6
= 0.05
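A quick numerical check of both answers, sketched with Python's `math.comb`:

```python
from math import comb

total = comb(21, 3)  # all ways to choose 3 CDs from the 21 owned

p_one_each = comb(10, 1) * comb(5, 1) * comb(6, 1) / total
p_two_rap_one_metal = comb(10, 0) * comb(5, 2) * comb(6, 1) / total

print(round(p_one_each, 4))           # 30/133 ≈ 0.2256
print(round(p_two_rap_one_metal, 4))  # 6/133  ≈ 0.0451
```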
Homework: Section 4.3, all (not to be handed in).

Conditional Probability & Independence
Recall that when we ﬁnd a probability, we do it in terms of all possible
outcomes (i.e. in reference to the entire sample space.)
If we know some additional information, we eﬀectively reduce the sample
space. Intuitively, P(A) = (Area of A)/(Area of S), but what if we are told that B
has already occurred?

If B occurs, we may want to modify our calculation of A’s probability.
In this case we are only concerned about the instances when A occurs at
the same time that B occurs. We’ll keep track of the occurrence of B
when we write:

Definition
Conditional Probability: P(A|B) = P(A ∩ B) / P(B).

Note: Read P(A|B) as “P(A given B).”
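For a finite sample space with equally likely outcomes, the definition can be checked by brute-force counting. A small sketch (the particular events A and B here are our own illustration, not from the slides):

```python
from fractions import Fraction

S = set(range(1, 7))              # one roll of a fair 6-sided die
A = {x for x in S if x % 2 == 0}  # A: roll is even
B = {x for x in S if x > 3}       # B: roll is greater than 3

def P(event):
    """Probability under equally likely outcomes."""
    return Fraction(len(event), len(S))

p_A_given_B = P(A & B) / P(B)     # P(A|B) = P(A ∩ B) / P(B)
print(p_A_given_B)  # 2/3: of the outcomes {4, 5, 6}, two are even
```

Knowing B occurred shrinks the sample space from {1, ..., 6} to {4, 5, 6}, exactly the "reduced sample space" idea above.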
From the definition of conditional probability we define the notion of
independence:

Definition
Two events A and B are independent if the probability of each event
remains the same whether or not the other occurs, i.e. A and B are
independent if P(B|A) = P(B) and P(A|B) = P(A).

Caution: Independence does not mean mutually exclusive. Why?

As a consequence of A and B being independent, we can say
P(A ∩ B) = P(A)P(B).

Why?  P(A ∩ B)/P(B) = P(A|B) = P(A)  ⇒  P(A ∩ B) = P(A)P(B)
Back to a previous example: What is the probability of flipping a coin and
seeing a Head, and rolling a 6-sided die and getting an even number?

P(flip head & roll even) = P(flip head)P(roll even)     (since the two outcomes are independent)
                         = (1/2)(1/2) = 1/4

Homework: Section 4.4, all odd-numbered problems (do not hand in).
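The product rule can also be sanity-checked by simulation; a sketch (the sample size and seed are arbitrary choices of ours):

```python
import random

random.seed(1)
trials = 100_000

# count trials where the coin shows heads AND the die roll is even
hits = sum(
    1
    for _ in range(trials)
    if random.random() < 0.5 and random.randint(1, 6) % 2 == 0
)
print(hits / trials)  # should be close to (1/2)(1/2) = 0.25
```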
More consequences of Conditional Probability
Slight problem: It is often diﬃcult to know P (A ∩ B ). If this is the case
and A and B are dependent, how do we calculate conditional probabilities?
Solution: Notice that P(A|B) = P(A ∩ B)/P(B) and P(B|A) = P(A ∩ B)/P(A).
We can manipulate these formulas. Replace P(A ∩ B) with P(B|A)P(A) and say

    P(A|B) = P(B|A)P(A) / P(B).

We can take this even further. What if we don’t know P(B)?

The Law of Total Probability and Bayes’ Rule
Consider the following example: In a factory many machines make
lightbulbs. Each machine makes the same lightbulb and each machine
makes a certain % of defective lightbulbs. Let Ai be the event that a
lightbulb is from machine i . Let B be the event a defective item is
produced. Then the Venn diagram (not reproduced here) shows the entire sample
space partitioned according to the machines (the Ai’s) in the factory (10 in this
case), with an oval representing the event B that a lightbulb is defective.

Intuitively we see that
    P(B) = Σ_{j=1}^{10} P(pieces of B)
         = Σ_{j=1}^{10} P(Aj ∩ B)          (Law of Total Probability)
         = Σ_{j=1}^{10} P(B|Aj)P(Aj)
Now consider what we’ve done so far. We have shown

Definition
Bayes’ Rule:

    P(Ai|B) = P(B|Ai)P(Ai) / Σ_{j=1}^{n} P(B|Aj)P(Aj)     (5)

Proof: Given the past few slides, we can rewrite our previous formula:

    P(Ai|B) = P(Ai ∩ B) / P(B)
            = P(B|Ai)P(Ai) / P(B)                          (Def. of Conditional Probability)
            = P(B|Ai)P(Ai) / Σ_{j=1}^{n} P(B|Aj)P(Aj)      (Law of Total Probability)

Bayes’ Rule Example:
Suppose a factory has 3 machines. Machine 1 (M1) makes 40% of the
lightbulbs and has a 5% defect rate. M2 makes 35% at a 6% defect rate
and M3 makes 25% at an 8% defect rate. A quality assurance manager
selects 1 lightbulb from a pile of defects. What is the probability it is from
M1?

Answer: Another way to ask the question is: “What is the probability the
lightbulb is from M1, given it is defective?” Let D denote the event that a
lightbulb is defective.

    P(M1|D) = P(M1 ∩ D) / P(D)         (but we aren’t given these explicitly)
            = P(D|M1)P(M1) / P(D)      (closer, but we still don’t know P(D))
            = P(D|M1)P(M1) / [P(D|M1)P(M1) + P(D|M2)P(M2) + P(D|M3)P(M3)]
            = (0.05)(0.4) / [(0.05)(0.4) + (0.06)(0.35) + (0.08)(0.25)]
            ≈ 0.33

The hardest part is always determining what you know versus what you
need to find. The only way to get good at this is to practice!
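The factory example translates directly into a few lines of Python, and the code makes the structure explicit: the denominator is exactly the Law of Total Probability. A minimal sketch:

```python
share = {"M1": 0.40, "M2": 0.35, "M3": 0.25}   # P(Mi): share of production
defect = {"M1": 0.05, "M2": 0.06, "M3": 0.08}  # P(D|Mi): defect rate per machine

# Law of Total Probability: P(D) = sum over machines of P(D|Mi) P(Mi)
p_defect = sum(defect[m] * share[m] for m in share)

# Bayes' Rule: P(M1|D) = P(D|M1) P(M1) / P(D)
p_m1_given_defect = defect["M1"] * share["M1"] / p_defect

print(round(p_defect, 3))           # 0.061
print(round(p_m1_given_defect, 2))  # 0.33
```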
End of Chapter 4

Homework: read Example 4.39 and do all problems from Section 4.5 (at
the very least read the example and do 4.37, 4.39, 4.40, 4.41, 4.43, 4.47).

We will skip Section 4.6 (Odds and Odds Ratios) for now. We may come
back to it later in the semester if time allows.

Quiz Announcement: Friday, September 2nd. The quiz will take place
during the first 20 minutes (or so) of class. A lecture will follow once the
quiz is complete. Covers Chapters 1 and 4.

Chapter 5: Discrete Probability Distributions
Dr. Hani Doss: “A Random Variable is neither random nor a variable.”

Definition
A random variable (RV) assigns a numerical value to each outcome in a
sample space, and is denoted by a capital letter (usually from the end of
the alphabet, e.g. X, Y, Z, U, V, etc.)
You can think of a random variable as a function (or map if you prefer)
from a sample space to a number, e.g.
X :S →R
It is easier to think of a RV as a “regular” variable that assumes certain
values with some element of chance involved.

Simple Examples
Rolling a 6-sided die: S = {1, 2, 3, 4, 5, 6} → {1, 2, 3, 4, 5, 6}
Flipping a coin: S = {H, T} → {0, 1}
Rolling two 6-sided dice: S = {(1, 1), (1, 2), . . . , (6, 6)} → {2, 3, . . . , 12}
Methods for working with Random Variables are more or less the same if
you use some advanced mathematics, but we will make a distinction
between two types of Random Variables: discrete and continuous.

Definition
A Random Variable is discrete if its possible values form a discrete set.
By “discrete” set we mean there are gaps between items in the set.
Examples
{1.5, 2.4, 35, 50.3}
set of all integers
the 3 examples above

Chapter 5
Remember: we are still talking about computing probabilities.

Definition
The probability mass function (pmf) of a discrete Random Variable, X,
is the function p(x) = P(X = x).
You might also see a pmf denoted as f(x).

Definition
The cumulative distribution function (cdf) of X is the function

    F(x) = P(X ≤ x) = Σ_{t ≤ x} p(t) = Σ_{t ≤ x} P(X = t)     (6)

Furthermore, the above are defined such that

    Σ_x p(x) = Σ_x P(X = x) = 1.     (7)

Oh man, what?
This is a radical shift in the way we think about calculating probabilities.
Once you get your head wrapped around everything, you’ll see that
redefining probability this way actually makes things much easier.
Think of a RV as representing a population, and specific observations
of the RV as numbers in our sample.
Recall that on the ﬁrst day of class we talked about things like mean and
variance of a sample. We can talk about the means and variances of
Random Variables as well.

Expectation
Example: Think about the 6-sided die. The mean of the numbers is

    x̄ = (1/6) Σ_i x_i = (1/6)(1 + 2 + 3 + 4 + 5 + 6) = 3.5

Think about it another way: each face of the die has a 1/6 chance of
coming up.

    x̄ = (1/6)(1) + (1/6)(2) + (1/6)(3) + (1/6)(4) + (1/6)(5) + (1/6)(6)
      = (1)P(X = 1) + (2)P(X = 2) + (3)P(X = 3) + (4)P(X = 4)
          + (5)P(X = 5) + (6)P(X = 6)
      = Σ_x (possibilities)(probabilities)

Expectation and Variance
Definition
Let X be a RV with pmf P(X = x). The mean or expectation of X is
denoted by µ (or µ_X) and is given by

    µ = Σ_x x P(X = x)     (8)

You may also see the expectation referred to as the expected value, and
other symbols denoting it are EX, E(X) and E[X].

Definition
Let X be a discrete RV. The variance is denoted by σ² (or σ²_X) and is
given by

    σ² = Var(X) = Σ_x x² P(X = x) − µ² = E[X²] − [EX]²     (9)

Homework: Show E(X − µ)² = EX² − (EX)²
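Definitions (8) and (9) computed directly for the fair-die pmf; a sketch using exact fractions so the answers match the hand computation:

```python
from fractions import Fraction

# pmf of a fair 6-sided die: P(X = x) = 1/6 for x = 1, ..., 6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum(x * p for x, p in pmf.items())      # (8): E[X]
ex2 = sum(x**2 * p for x, p in pmf.items())  # E[X^2]
var = ex2 - mu**2                            # (9): E[X^2] - (EX)^2

print(mu)   # 7/2, i.e. the 3.5 computed on the previous slide
print(var)  # 35/12
```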
Linear Operators
Definition
We say L is a linear operator if for any functions f and g and any
constants a and b,

    L[af + bg] = aL[f] + bL[g]

Examples:
From previous courses: ∂/∂x, Σ, ∫
From this course: E[·]

What about Var(·)? Is that a linear operator too? Nope. Prove it for
homework as an exercise.

Also for Homework: read Example 5.3 from Section 5.1, and do #5.13
and 5.15

Types of Discrete Distributions

We will model our experiment with a Random Variable depending on the
type of experiment.
Section 5.3: Bernoulli Distribution
Imagine an experiment that results in 1 of 2 outcomes, one labelled
“Success” and the other “Failure.” This is called a Bernoulli Trial. We
deﬁne a Random Variable, X, as
    X = 1 if success, 0 if failure.

X is a discrete RV with pmf defined by P(X = 1) = p, P(X = 0) = 1 − p.
Examples:

Fair coin: P(X = 1) = P(X = 0) = 1/2

Let a success be rolling a 6 on a 6-sided die and any other # a failure, i.e.

    X = 1 if we roll a 6, 0 if we roll a # ∈ {1, 2, 3, 4, 5}

    p = P(X = 1) = 1/6
    1 − p = P(X = 0) = 5/6

    EX = Σ_x x P(X = x) = (1)P(X = 1) + (0)P(X = 0) = P(X = 1) = p

    Var(X) = EX² − (EX)² = P(X = 1) − (P(X = 1))² = p − p² = p(1 − p)

Discrete Uniform Distribution
Suppose we have an experiment where each outcome is equally likely, and
that there are k possible outcomes.
Then the pmf is defined the following way:

    P(X = x) = 1/k, where x ∈ {1, 2, . . . , k}     (10)

Then the expectation and variance are given by:

    EX = Σ_{x=1}^{k} x P(X = x) = (1/k) Σ_{x=1}^{k} x = (1/k) · k(k + 1)/2 = (k + 1)/2     (11)

    Var(X) = Σ_{x=1}^{k} x² P(X = x) − [(k + 1)/2]² = · · · = (k² − 1)/12     (12)

Examples: 6-sided die, fair coin, etc.
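Formulas (11) and (12) can be verified against definitions (8) and (9) for any particular k; a sketch with k = 6 (the die):

```python
from fractions import Fraction

k = 6
p = Fraction(1, k)  # (10): P(X = x) = 1/k for x = 1, ..., k

ex = sum(x * p for x in range(1, k + 1))              # definition (8)
var = sum(x**2 * p for x in range(1, k + 1)) - ex**2  # definition (9)

assert ex == Fraction(k + 1, 2)       # (11): (k + 1)/2
assert var == Fraction(k**2 - 1, 12)  # (12): (k^2 - 1)/12
print(ex, var)  # 7/2 35/12
```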
More Discrete Distributions
Suppose we have a situation where we are flipping a coin more than once.
I might ask: What is the probability that I get k heads in n flips?
We have a distribution for that:

Suppose a total of n Bernoulli trials are conducted and
  Trials are independent
  Each trial has the same success probability p
  The RV X represents the # of successes in n trials
  The # n is fixed and known when the experiment starts.

Then X has the binomial distribution with parameters n and p, denoted
by X ∼ Bin(n, p).

Binomial Distribution
What is the pmf of the Binomial distribution?
Say we go back to coin tossing. We toss the coin (n =) 3 times. Let’s say
we want P(X = 2), i.e. the probability of seeing (k =) 2 successes (we
define heads as success, where P(H) = p for this example).
We could get 2 heads a few ways: HHT, HTH, THH. So

    P(X = 2) = P(HHT or HTH or THH)
             = P(HHT) + P(HTH) + P(THH)                              (why?)
             = P(H)P(H)P(T) + P(H)P(T)P(H) + P(T)P(H)P(H)
             = p²(1 − p) + p(1 − p)p + (1 − p)p² = 3p²(1 − p)
More generally speaking, what if we have n trials and we want P(X = k)?
From our previous example of n = 3, k = 2 we saw that

    P(X = 2) = 3p²(1 − p) = (constant) p^k (1 − p)^{n−k}

It’s tempting to say the constant = n, but that’s not always the case.
Let’s figure out what the constant is.
For a binomial distribution we are not asking whether the successes and
failures come in a certain order; we only care about the total number of
successes.
Then the number of arrangements of successes and failures is given by C(n, k).
Definition
The pmf of the Binomial Distribution is given by the following:

    f(x; n, p) = P(X = x) = C(n, x) p^x (1 − p)^{n−x}, where x ∈ {0, 1, . . . , n}     (13)

Special case of the Binomial Distribution:
n = 1: P(X = x) = p^x (1 − p)^{1−x}     (Bernoulli Distribution)

It turns out that a Binomial RV is a sum of independent identically
distributed Bernoulli Random Variables (we’ll talk more about this in
detail in Chapter 7).

Also notice that by the Binomial Theorem:

    Σ_{x=0}^{n} P(X = x) = (p + 1 − p)^n = 1
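Formula (13) in code, with the Binomial Theorem check that the pmf sums to 1; a minimal sketch using the 3-toss example:

```python
from math import comb

def binom_pmf(x, n, p):
    """(13): P(X = x) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# the 3-toss example: P(X = 2) = 3 p^2 (1 - p) with p = 1/2
print(binom_pmf(2, 3, 0.5))  # 0.375

# Binomial Theorem: the pmf sums to (p + 1 - p)^n = 1
total = sum(binom_pmf(x, 3, 0.5) for x in range(3 + 1))
print(total)  # 1.0
```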
Finding the expectation and variance is tricky.

    EX = Σ_{x=0}^{n} x P(X = x) = Σ_{x=0}^{n} x C(n, x) p^x (1 − p)^{n−x}
       = Σ_{x=1}^{n} x C(n, x) p^x (1 − p)^{n−x}
       = Σ_{x=1}^{n} x [n!/(x!(n − x)!)] p^x (1 − p)^{n−x}
       = Σ_{x=1}^{n} [n!/((x − 1)!(n − x)!)] p^x (1 − p)^{n−x}            (Let y = x − 1.)
       = Σ_{y=0}^{n−1} [n!/(y!(n − y − 1)!)] p^{y+1} (1 − p)^{n−y−1}
       = np Σ_{y=0}^{n−1} C(n − 1, y) p^y (1 − p)^{n−1−y} = np(1) = np     (14)

    Var(X) = np(1 − p)     (Prove for HW)     (15)

Binomial Distribution: Examples
Binomial Distribution: Examples

A die is rolled 6 times. Let a success be the event I roll a six. Find the probability I roll (A) 2 sixes, (B) less than 5 sixes, and (C) at least 2 and at most 4 sixes.

First, recognize that X ∼ Bin(n = 6, p = 1/6).

(A) P(roll 2 sixes) = P(X = 2) = \binom{6}{2} (1/6)^2 (5/6)^{6−2} = 0.2

(B) P(less than 5 sixes) = P(X ≤ 4) = 1 − P(X > 4) = 1 − [P(X = 5) + P(X = 6)]
    = 1 − [\binom{6}{5} (1/6)^5 (5/6)^1 + \binom{6}{6} (1/6)^6 (5/6)^0] = 0.999
Binomial Distribution: Examples Continued

X ∼ Bin(n = 6, p = 1/6)

(C) P(2 ≤ X ≤ 4) = P(X ≤ 4) − P(X ≤ 1)
    = F(4) − F(1), where F is the cdf of X
    = 0.9993 − 0.7368 ≈ 0.2626

Homework: Section 5.4 (all odds)
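All three parts of the die example can be reproduced by summing the Bin(6, 1/6) pmf over the relevant ranges. A short sketch in Python (standalone, not from the course's R materials; `binom_pmf` is my helper name):

```python
from math import comb

def binom_pmf(x, n, p):
    # P(X = x) = C(n, x) p^x (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 6, 1/6
pA = binom_pmf(2, n, p)                            # (A) P(X = 2), about 0.20
pB = sum(binom_pmf(x, n, p) for x in range(5))     # (B) P(X <= 4), about 0.999
pC = sum(binom_pmf(x, n, p) for x in range(2, 5))  # (C) P(2 <= X <= 4) = F(4) - F(1)
assert abs(pA - 0.2009) < 5e-4
assert abs(pB - 0.9993) < 1e-3
# (C) equals (B) minus the first two pmf terms, matching F(4) - F(1)
assert abs(pC - (pB - binom_pmf(0, n, p) - binom_pmf(1, n, p))) < 1e-12
```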
Poisson Distribution

For just a second we skip Section 5.5. We'll do 5.6, then immediately go back to 5.5.

To review: use the binomial distribution when we have a sum of independent Bernoulli trials (each trial is a success or failure, each trial independent, etc.).

What if we have a situation where a binomial RV is appropriate, but the parameters are extreme? If n → ∞ and p → 0 with np → λ (a constant), we have what's called a Poisson random variable.
Poisson Distribution

Definition
We say X ∼ Poisson(λ), where λ > 0, if the pmf is given by

f(x; λ) = P(X = x) = \frac{e^{−λ} λ^x}{x!},   x ∈ {0, 1, . . .}   (16)

Remember that this is just a very extreme case of the binomial distribution. It's possible to use calculus/magic/voodoo/kung fu to show

\lim_{n→∞, p→0, np→λ} \binom{n}{x} p^x (1 − p)^{n−x} = \frac{e^{−λ} λ^x}{x!}

A derivation of this can be found on page 247 of the book. Take a look at it if you're curious, but you don't need to memorize it.
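The limit can be seen numerically: hold np = λ fixed, let n grow, and the binomial pmf values approach the Poisson pmf. A quick sketch in Python (my own illustration with arbitrary λ = 2 and x = 3, not from the book's derivation):

```python
from math import comb, exp, factorial

lam, x = 2.0, 3
poisson = exp(-lam) * lam**x / factorial(x)   # Poisson pmf at x
for n in (10, 100, 10_000):
    p = lam / n   # keep np = lambda fixed as n grows
    binom = comb(n, x) * p**x * (1 - p)**(n - x)
    # binom gets closer to poisson at each step
assert abs(binom - poisson) < 1e-3   # at n = 10,000 the two nearly coincide
```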
Poisson Distribution

What does n → ∞, p → 0, np → λ mean in the real world, though? Use the Poisson distribution to model rare events, or events that happen eventually over a long enough time span.

Examples:
Car crashes
A pitcher throwing a no-hitter
A stenographer making a typographic error

E(X) = \sum_{x=0}^{∞} x \frac{e^{−λ} λ^x}{x!} = \sum_{x=1}^{∞} x \frac{e^{−λ} λ^x}{x!} = \sum_{x=1}^{∞} \frac{e^{−λ} λ^x}{(x − 1)!} = e^{−λ} \sum_{y=0}^{∞} \frac{λ^{y+1}}{y!} = λ e^{−λ} \sum_{y=0}^{∞} \frac{λ^y}{y!} = λ e^{−λ} e^{λ} = λ   (17)

Var(X) = λ   (prove for HW)   (18)
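The series manipulation in (17), and the homework claim (18), can be sanity-checked by truncating the infinite sums. A standalone sketch in Python (λ = 0.7 is an arbitrary choice of mine):

```python
from math import exp, factorial

lam = 0.7
def pois_pmf(x):
    # P(X = x) = e^-lambda lambda^x / x!
    return exp(-lam) * lam**x / factorial(x)

# truncate the infinite sums at x = 100; the remaining tail is negligible here
mean = sum(x * pois_pmf(x) for x in range(101))
var = sum(x**2 * pois_pmf(x) for x in range(101)) - mean**2
assert abs(mean - lam) < 1e-9   # E[X] = lambda, equation (17)
assert abs(var - lam) < 1e-9    # Var(X) = lambda, equation (18)
```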
Examples

A circuit board has 300 diodes, each with probability 0.002 of failing. Find:
(1) the probability exactly 2 diodes fail.
(2) the mean number of diodes that fail.
(3) the standard deviation.
(4) the probability that the board works.
(5) the probability that out of 5 boards that are shipped to a customer, 4 or more work.

We have a large number of "trials" and a very small probability of success. Use X ∼ Poisson(λ = (300)(0.002) = 0.6).

(1) P(exactly 2 fail) = P(X = 2) = \frac{e^{−0.6} (0.6)^2}{2!} = 0.099

(2) µ = (300)(0.002) = 0.6

(3) σ = \sqrt{Var(X)} = \sqrt{λ} = \sqrt{0.6} = 0.77

(4) P(board works) = P(X = 0) = \frac{e^{−0.6} (0.6)^0}{0!} = 0.55
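Parts (1) through (4) follow directly from the Poisson pmf with λ = 0.6. A standalone check in Python (the `pois_pmf` helper is my naming, not the course's):

```python
from math import exp, factorial, sqrt

lam = 300 * 0.002   # expected number of failing diodes per board
def pois_pmf(x):
    return exp(-lam) * lam**x / factorial(x)

p_two_fail = pois_pmf(2)   # (1) exactly 2 diodes fail
mu = lam                   # (2) mean number of failures
sigma = sqrt(lam)          # (3) standard deviation
p_works = pois_pmf(0)      # (4) board works iff zero diodes fail
assert abs(p_two_fail - 0.0988) < 1e-3
assert abs(mu - 0.6) < 1e-9
assert abs(sigma - 0.7746) < 1e-3
assert abs(p_works - 0.5488) < 1e-3
```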
Examples

A circuit board has 300 diodes, each with probability 0.002 of failing.
(5) Find the probability that out of 5 boards that are shipped to a customer, 4 or more work.

This is actually a pretty tricky problem because we have to "combine" two distributions to solve it. We need to define success on 2 levels: success in that a single board works (no diode fails), and success in the number of boards that work.

Let X ∼ Poisson(λ = 0.6), and let Y ∼ Binom(n = 5, p = P(X = 0)).

P(4 or more work) = P(Y ≥ 4) = P(Y = 4) + P(Y = 5)
= \binom{5}{4} (e^{−0.6})^4 (1 − e^{−0.6})^{5−4} + \binom{5}{5} (e^{−0.6})^5 (1 − e^{−0.6})^{5−5}
= 0.25.

Homework: 5.6 odds
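The two-level calculation in (5) reduces to one binomial sum once p = P(X = 0) = e^{−0.6} is in hand. A standalone sketch in Python (variable names are mine):

```python
from math import comb, exp

p_board = exp(-0.6)   # P(X = 0): probability a single board works
n = 5
# P(Y >= 4) for Y ~ Binom(5, p_board)
p_ship = sum(comb(n, y) * p_board**y * (1 - p_board)**(n - y) for y in (4, 5))
assert abs(p_ship - 0.2544) < 1e-3   # rounds to the slide's 0.25
```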
Back to Section 5.5

Now let's do another modification of the binomial experiment. Recall one of the assumptions for using a binomial RV: a fixed n (number of trials). Now let's do the opposite: instead of fixing the number of trials (n) and counting the number of successes (x), let's run the experiment a variable number of times until we get the desired number of successes. Essentially we are reversing the roles of x and n.

Example: Roll a die until you get 2 sixes. The sequence might look like this: (F)(F)(S)(F)(F)(F)(S).

By independence of the die rolls, the probability looks something like

P[(F)(F)(S)(F)(F)(F)(S)] = c P(F)P(F)P(S)P(F)P(F)P(F)P(S)

where c is some constant.
Why do we have a constant?

Remember, we only care about rolling the die until we get 2 sixes. If we see (F)(F)(S)(F)(F)(F)(S), we notice that the second success happens on our 7th roll, but this isn't the only way we can write out this sequence of letters such that the 7th letter is S. The calculation of the probability that we get that 2nd six on the 7th roll depends on how many ways we can rearrange all the other letters in the sequence. That means we're free to arrange those first 6 letters in any way that we like.

How many ways can we arrange those letters?

\binom{6}{1} = \binom{\text{# of trials} − 1}{\text{# successes} − 1}
Negative Binomial

More generally, we don't know the # of trials (= x), but we do know the # of successes (= k). How many ways can we arrange those letters?

\binom{6}{1} = \binom{\text{# of trials} − 1}{\text{# successes} − 1} = \binom{x − 1}{k − 1}

Definition
We say X ∼ NegBin(k, p) if it has a pmf given by

f(x; k, p) = P(X = x) = \binom{x − 1}{k − 1} p^k (1 − p)^{x−k},   x ∈ {k, k + 1, . . .}   (19)

where k is the number of successes and p is the probability of a single success.

EX = \frac{k}{p},   Var(X) = \frac{k(1 − p)}{p^2}   (20)
Special Case of NegBinom: Geometric Distribution

If we perform an experiment until we obtain 1 success (k = 1), then

f(x; p, k = 1) = P(X = x) = \binom{x − 1}{1 − 1} p^1 (1 − p)^{x−1} = p(1 − p)^{x−1}.

Stated formally:

Definition
We say X has a geometric distribution, denoted X ∼ Geo(p), if its pmf is given by

f(x; p) = P(X = x) = p(1 − p)^{x−1},   x ∈ {1, 2, . . .}   (21)

EX = \frac{1}{p},   Var(X) = \frac{1 − p}{p^2}   (22)

Note: the negative binomial is the sum of k independent geometric random variables.
Examples

(1) A coin is flipped until 3 heads are obtained. What is the probability of this happening on the 5th flip?
(2) A 6-sided die is rolled until a 6 is rolled. What is the probability of this happening on the 4th roll?

Answers:
(1) Let X ∼ NegBin(k = 3, p = 1/2).

P(X = 5) = \binom{5 − 1}{3 − 1} \left(\frac{1}{2}\right)^3 \left(1 − \frac{1}{2}\right)^{5−3} = · · · = \frac{3}{16}

(2) Let Y ∼ Geo(p = 1/6).

P(Y = 4) = \frac{1}{6} \left(1 − \frac{1}{6}\right)^{4−1} = 0.096

Homework: 5.5 odds
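Both answers plug straight into the pmfs (19) and (21). A standalone check in Python (not part of the course's R materials):

```python
from math import comb

# (1) 3rd head on the 5th flip: NegBin(k = 3, p = 1/2), pmf (19)
k, p, x = 3, 0.5, 5
p1 = comb(x - 1, k - 1) * p**k * (1 - p)**(x - k)
assert abs(p1 - 3/16) < 1e-12

# (2) first six on the 4th roll: Geo(p = 1/6), pmf (21)
q = 1/6
p2 = q * (1 - q)**(4 - 1)
assert abs(p2 - 0.0965) < 1e-3
```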
To summarize:

You will not pass this course if you do not recognize how and where to apply the discrete distributions we have discussed. To review:

Distribution       | Situation
Discrete Uniform   | Each element of the support has equal probability
Bernoulli          | 1 trial resulting in Success or Failure
Binomial           | (fixed) n Bernoulli trials, results independent, # successes unknown
Poisson            | large n, small p, np → λ; # successes unknown
Negative Binomial  | unknown # trials, # successes known
Geometric          | Neg. Binomial with k = 1
Chapter 6: Continuous Random Variables

Definition
A random variable is continuous if its probabilities are given as areas under a curve. The curve is called a probability density function (pdf) for the random variable.

The pdf has most of the same properties as the pmf, with some slight changes.
If f(x) is a pdf, 0 ≤ f(x) for all x.
If X is continuous, P(X = x) = 0. Why?

For continuous RVs, probabilities are given by areas under the curve. In other words you'll see things like

P(a ≤ X ≤ b) = \int_a^b f(x) dx
P(X ≤ b) = \int_{−∞}^b f(x) dx
P(X ≥ a) = \int_a^{∞} f(x) dx
So why is P(X = x) = 0 for continuous RVs?

Remember, probabilities are defined as areas under a curve, i.e. integrals.

P(X = x) = P(x ≤ X ≤ x) = \int_x^x f(t) dt = 0.

This leads to some weird things with the notation. Unlike with pmfs, we make no distinction between P(a ≤ X ≤ b) and P(a < X < b).

If X is discrete, P(X ≥ 0) = P(X = 0) + P(X = 1) + . . .
If X is continuous, P(X ≥ 0) = P(X = 0) + P(X > 0) = 0 + P(X > 0)

One more property of continuous RVs:

\int_{−∞}^{∞} f(x) dx = 1

This is just the continuous analog of the property for discrete RVs where \sum_x P(X = x) = 1.
Chapter 6: Continuous RVs

Definition
Let X be a continuous random variable with pdf f(x). The cumulative distribution function (cdf) of X is

F(x) = P(X ≤ x) = \int_{−∞}^{x} f(t) dt.   (23)

Examples: Let f(x) = 2x for 0 ≤ x ≤ 1, and f(x) = 0 otherwise.
(1) Is f a valid pdf?
(2) Find the cdf of X.
(3) Find P(0.25 ≤ X ≤ 0.75).
Examples: Continuous Random Variables

Let f(x) = 2x for 0 ≤ x ≤ 1, and f(x) = 0 otherwise.

(1) Is f a valid pdf?

\int_{−∞}^{∞} f(x) dx = \int_{−∞}^{0} f(x) dx + \int_0^1 f(x) dx + \int_1^{∞} f(x) dx = 0 + \int_0^1 2x dx + 0 = \left[\frac{2x^2}{2}\right]_0^1 = 1.

(2) Find the cdf of X.

F(x) = 0 for −∞ < x < 0;   F(x) = x^2 for 0 ≤ x ≤ 1;   F(x) = 1 for 1 < x < ∞.
Examples Continued

(3) Find P(0.25 ≤ X ≤ 0.75).

P(0.25 ≤ X ≤ 0.75) = \int_{0.25}^{0.75} 2x dx = \left[x^2\right]_{0.25}^{0.75} = \frac{1}{2}

Equivalently, using the cdf:

P(0.25 ≤ X ≤ 0.75) = P(X ≤ 0.75) − P(X ≤ 0.25) = F(0.75) − F(0.25) = \frac{9}{16} − \frac{1}{16} = \frac{1}{2}
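Since the cdf here has the closed form F(x) = x², both the validity check and part (3) reduce to evaluating it. A standalone sketch in Python (the helper `F` mirrors the cdf derived above):

```python
# cdf of the pdf f(x) = 2x on [0, 1]: F(x) = x^2 on that interval
def F(x):
    if x < 0:
        return 0.0
    return x * x if x <= 1 else 1.0

assert F(1) == 1.0                             # total probability is 1
assert abs((F(0.75) - F(0.25)) - 0.5) < 1e-12  # 9/16 - 1/16 = 1/2
```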
More fun stuff

The mean and variance of continuous RVs are similar to those of their discrete counterparts. Just replace \sum with \int.

Definition
The mean of a continuous random variable X is given by the following:

µ = EX = \int_{−∞}^{∞} x f(x) dx   (24)

The variance is given by

σ^2 = E(X − µ)^2 = \int_{−∞}^{∞} (x − µ)^2 f(x) dx   (25)

An alternate (and equivalent) calculation is

σ^2 = E[X^2] − (EX)^2 = \int_{−∞}^{∞} x^2 f(x) dx − µ^2   (26)
Example

Find the standard deviation of X with pdf f(x) = 2x for 0 ≤ x ≤ 1, and f(x) = 0 otherwise.

EX = \int_{−∞}^{∞} x f(x) dx = \int_0^1 2x^2 dx = \left[\frac{2x^3}{3}\right]_0^1 = \frac{2}{3}

E[X^2] = \int_{−∞}^{∞} x^2 f(x) dx = \int_0^1 2x^3 dx = \left[\frac{2x^4}{4}\right]_0^1 = \frac{1}{2}

σ^2 = E[X^2] − (EX)^2 = \frac{1}{2} − \left(\frac{2}{3}\right)^2 = \frac{1}{18}

σ = \sqrt{1/18} = \frac{1}{3\sqrt{2}}.

Homework: Section 6.1 #2(a, b, d), 3, 5(a, b, c); Section 6.2 #9, 11
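The integrals above can be approximated numerically, which is a handy way to check hand computations. A standalone sketch in Python using a simple midpoint rule (the discretization choice is mine, not part of the course):

```python
from math import sqrt

f = lambda x: 2 * x                      # pdf on [0, 1]
N = 100_000                              # midpoint-rule subintervals
xs = [(i + 0.5) / N for i in range(N)]
mean = sum(x * f(x) for x in xs) / N     # approximates E[X] = 2/3
ex2 = sum(x * x * f(x) for x in xs) / N  # approximates E[X^2] = 1/2
var = ex2 - mean**2
assert abs(mean - 2/3) < 1e-6
assert abs(var - 1/18) < 1e-6
assert abs(sqrt(var) - 1/(3 * sqrt(2))) < 1e-5   # sigma = 1/(3 sqrt 2)
```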
Frequently Used pdfs

Continuous Uniform Distribution
Similar to the discrete uniform. Recall that for the discrete uniform distribution, every point in the support has an equal chance of occurring. For the continuous uniform distribution, every interval of equal width has an equal chance of occurring.

Definition
X is said to have a continuous uniform pdf if

f(x) = \frac{1}{b − a} for a ≤ x ≤ b, and f(x) = 0 otherwise.   (27)

[Figure: Continuous Uniform pdf with a = 2, b = 6]
Continuous Uniform Distribution

EX = \int_a^b x \frac{1}{b − a} dx = \frac{1}{b − a} \left[\frac{x^2}{2}\right]_a^b = \frac{b + a}{2}

Var(X) = \frac{(b − a)^2}{12}   (Verify for HW)

Example: The RTS Route 9 bus is supposed to come to a certain stop every 12 minutes. Waiting times at bus stops are often modeled with the continuous uniform distribution. Suppose X is a random variable representing waiting time, i.e. X ∼ Unif(0, 12).
(1) What is the probability of waiting 7 or more minutes?
(2) What is the average waiting time?
Example: Continuous Uniform Distribution

P(X ≥ 7) = \int_7^{∞} f(x) dx = \int_7^{12} \frac{1}{12 − 0} dx = \left[\frac{x}{12}\right]_7^{12} = \frac{5}{12}

EX = \frac{12 + 0}{2} = 6 minutes

What is the probability of waiting less than 5 minutes?
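Uniform probabilities are just interval lengths divided by the width of the support, which also answers the closing question (waiting less than 5 minutes has probability 5/12). A standalone sketch in Python (the `unif_prob` helper is my naming):

```python
a, b = 0.0, 12.0   # X ~ Unif(0, 12), waiting time in minutes

def unif_prob(lo, hi):
    # P(lo <= X <= hi) = (length of [lo, hi] inside [a, b]) / (b - a)
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0.0) / (b - a)

assert abs(unif_prob(7, b) - 5/12) < 1e-12   # (1) wait 7 or more minutes
assert abs(unif_prob(0, 5) - 5/12) < 1e-12   # wait less than 5 minutes
assert (a + b) / 2 == 6.0                    # (2) average waiting time
```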
The Normal Distribution

The normal distribution is the most commonly used distribution.

Definition
X has a normal distribution if its pdf is given by

f(x; µ, σ²) = (1/√(2πσ²)) exp( −(x − µ)²/(2σ²) ), −∞ < x < ∞ (28)

where µ_X = µ and σ²_X = σ², denoted X ∼ N(µ, σ²).

The normal distribution is the bell-shaped curve with the following (very important) properties:
Symmetric about x = µ.
x = µ ± σ are inflection points (concavity changes).
For any normal distribution:
About 68% of the population is in µ ± σ.
About 95% of the population is in µ ± 2σ.
About 99.7% of the population is in µ ± 3σ.

The above is usually referred to as the empirical rule. It's easier to understand with a picture.
Problem: Probabilities are found by calculating areas under the curve, but integrating the pdf for the normal distribution is tedious (and not possible with the "usual" methods).
Solution: Computers! It's really easy to do this in R. For example, if X ∼ N(1, 4) and I want P(X ≤ 0.5), I can just type

> pnorm(q = 0.5, mean = 1, sd = sqrt(4), lower.tail = TRUE)

Problem: So... how do I calculate probabilities for HW/Quizzes/Tests?
Solution: We'll use a table. Depending on what the problem wants, we'll manipulate the problem so that the numbers in a table give us our probabilities.
Problem: There are an infinite number of (µ, σ²) pairs. Are there an infinite number of normal probability tables?
Solution: No. We'll translate every normal distribution into a normal distribution with µ = 0, σ² = 1. Note: This is called the standard normal distribution and is usually denoted Z.
Translate every normal distribution into a normal distribution with µ = 0, σ² = 1. How does that work?

Theorem
Let X ∼ N(µ, σ²). If Z = (X − µ)/σ, then Z ∼ N(0, 1).

Proof: Transformations of random variables are covered in higher-level stat classes. Just go with it.

Here are some examples:
(1) P(Z ≤ 0.5) = 0.6915
(2) P(Z ≥ −0.5) = 1 − P(Z ≤ −0.5) = 1 − 0.3085 = 0.6915, or... use symmetry
(3) P(−1.96 ≤ Z ≤ 1.96) = P(Z ≤ 1.96) − P(Z ≤ −1.96) = 0.975 − 0.025 = 0.95
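The table values in examples (1)–(3), and the standardization theorem itself, can be sanity-checked without R using Python's standard library (`statistics.NormalDist`, available since Python 3.8). A sketch, not part of the slides:

```python
from statistics import NormalDist

Z = NormalDist()                  # standard normal: mu = 0, sigma = 1
X = NormalDist(mu=1, sigma=2)     # the X ~ N(1, 4) example, so sigma = sqrt(4) = 2

# Example (1): P(Z <= 0.5)
print(round(Z.cdf(0.5), 4))                      # 0.6915

# Example (3): P(-1.96 <= Z <= 1.96)
print(round(Z.cdf(1.96) - Z.cdf(-1.96), 4))      # 0.95

# Standardization: P(X <= 0.5) equals P(Z <= (0.5 - 1)/2)
print(abs(X.cdf(0.5) - Z.cdf((0.5 - 1) / 2)) < 1e-12)   # True
```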
Examples: Normal Distribution
(4) Let X ∼ N(5, 100). Find P(X ≤ 0).

P(X ≤ 0) = P( (X − 5)/10 ≤ (0 − 5)/10 ) = P(Z ≤ −0.5) = 0.3085

(5) P(−5 ≤ X ≤ 15) = P( (−5 − 5)/10 ≤ (X − 5)/10 ≤ (15 − 5)/10 ) = P(−1 ≤ Z ≤ 1)
= P(Z ≤ 1) − P(Z ≤ −1) = 0.8413 − 0.1587 = 0.6826, or... just use the empirical rule!

(6) P(Z ≤ k) = 0.1762. Find k. Answer: k = −0.93

(7) Let X ∼ N(5, 100). Suppose P(X ≥ k) = 0.8531. Find k.

P(X ≥ k) = P( (X − 5)/10 ≥ (k − 5)/10 ) = P(Z ≥ (k − 5)/10) = 0.8531
⇒ 1 − P(Z ≤ (k − 5)/10) = 0.8531
⇒ P(Z ≤ (k − 5)/10) = 0.1469
⇒ (k − 5)/10 = −1.05
⇒ k = −5.5.

Don't we only care about calculating probabilities? Why would we ever care about going backwards?
This will become clear once we move from talking about probability to talking about statistics.
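Going "backwards" as in examples (6) and (7) is an inverse-cdf (quantile) lookup. As a sketch using the standard library's `NormalDist.inv_cdf` (an assumption of this example; the slides use tables):

```python
from statistics import NormalDist

# Example (6): find k with P(Z <= k) = 0.1762
k6 = NormalDist().inv_cdf(0.1762)
print(round(k6, 2))    # -0.93

# Example (7): find k with P(X >= k) = 0.8531 for X ~ N(5, 100),
# i.e. P(X <= k) = 1 - 0.8531 = 0.1469
k7 = NormalDist(5, 10).inv_cdf(0.1469)
print(round(k7, 1))    # -5.5
```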
The Normal Table

On a test/quiz I will only provide the positive part of a normal table. How does this change your problem-solving techniques?
If your question is of the form P(Z ≤ z) where z ≥ 0, then it changes nothing.
If your question is of the form P(Z ≤ z) where z < 0, then you have to do a couple of extra steps. Note that for z < 0:

P(Z ≤ z) = P(Z ≥ −z) = 1 − P(Z ≤ −z)

Homework:
Cont. Uniform Distribution: Section 6.3 # 15(a,b), 19, 21(a,b), 25, 27, 29
Normal Distribution: Section 6.6 # 55, 59, 61a, 63, 65, 67

Gamma Function

Definition
For r > 0 the gamma function is defined by

Γ(r) = ∫_0^∞ t^(r−1) e^(−t) dt

The properties of the gamma function are as follows:
If r is a positive integer, Γ(r) = (r − 1)!.
For every r > 0, Γ(r + 1) = r Γ(r).
Γ(1/2) = √π.
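All three properties can be spot-checked with the standard library's `math.gamma` (a quick sketch, not from the slides):

```python
import math

# Gamma(r) = (r - 1)! for a positive integer r
print(math.gamma(5))    # 24.0  (= 4!)

# Recursion: Gamma(r + 1) = r * Gamma(r)
print(math.isclose(math.gamma(4.5), 3.5 * math.gamma(3.5)))   # True

# Gamma(1/2) = sqrt(pi)
print(math.isclose(math.gamma(0.5), math.sqrt(math.pi)))      # True
```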
Gamma Distribution

Definition
The pdf of the gamma distribution with parameters α > 0 and β > 0 is

f(x; α, β) = 1/(Γ(α) β^α) · x^(α−1) e^(−x/β) (29)

where x > 0, and f(x; α, β) = 0 otherwise.

EX = ∫_0^∞ x · 1/(Γ(α) β^α) x^(α−1) e^(−x/β) dx = 1/(Γ(α) β^α) ∫_0^∞ x^α e^(−x/β) dx
   = Γ(α + 1) β^(α+1) / (Γ(α) β^α) = α Γ(α) β^α · β / (Γ(α) β^α) = αβ

Var(X) = αβ² (Verify for HW)
Special Cases of the Gamma Distribution
β = 2 (with α = ν/2): Chi-square distribution with ν degrees of freedom (we'll talk more about this later)
α = 1: Exponential distribution (Section 6.4)

Applications
Exponential distribution: models the waiting time between one Poisson event and the next
When α is an integer, the gamma distribution models the time between α Poisson events

Theorem
If X₁, X₂, . . . , Xₙ are independent Exp(β), then Σᵢ₌₁ⁿ Xᵢ ∼ Gamma(n, β).

Gamma Distribution: Examples
(1) In a certain city the daily consumption of electric power, in millions of kW-hours, is X ∼ Gamma with mean = 6 and variance = 12.
(a) What are α and β?
Recall µ = αβ and σ² = αβ². Then we have two equations and two unknowns:
αβ = 6 ⇒ 6β = 12 ⇒ β = 2, α = 3.
(b) Find the probability that on any given day the daily power consumption will exceed 12 million kW-hours.

P(X > 12) = ∫_12^∞ 1/(Γ(α) β^α) x^(α−1) e^(−x/β) dx = ∫_12^∞ 1/(Γ(3) 2³) x^(3−1) e^(−x/2) dx
          = (1/16) ∫_12^∞ x² e^(−x/2) dx
          = (1/16) ( [−2 x² e^(−x/2)]_12^∞ + 2 ∫_12^∞ e^(−x/2) (2x) dx )   (integration by parts)
          = · · · ≈ 0.06.
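The "· · ·" can be checked numerically: for integer α the gamma (Erlang) survival function has a closed form, P(X > x) = e^(−x/β) Σ_{k=0}^{α−1} (x/β)^k / k!. A Python sketch using the α = 3, β = 2 values from the example (the closed form is standard but not stated in the slides):

```python
import math

def gamma_sf(x, alpha, beta):
    """P(X > x) for X ~ Gamma(alpha, beta) with integer alpha (Erlang)."""
    rate = x / beta
    return math.exp(-rate) * sum(rate**k / math.factorial(k) for k in range(alpha))

p = gamma_sf(12, alpha=3, beta=2)
print(round(p, 4))    # 0.062
```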
Gamma Distribution: More Examples

(2) The length of time for one individual to be served at a cafeteria is a random variable having the exponential distribution with a mean of 4 minutes. What is the probability that a person is served in less than 3 minutes on at least 4 of the next 6 days?

Let X ∼ Bin(6, p), Y ∼ Exp(β = 4).

p = P(Y ≤ 3) = ∫_0^3 (1/4) e^(−x/4) dx = · · · = 1 − e^(−3/4) ≈ 0.53

P(X ≥ 4) = P(X = 4) + P(X = 5) + P(X = 6)
         = C(6,4) (1 − e^(−3/4))⁴ (e^(−3/4))^(6−4)
         + C(6,5) (1 − e^(−3/4))⁵ (e^(−3/4))^(6−5)
         + C(6,6) (1 − e^(−3/4))⁶ (e^(−3/4))^(6−6)
         ≈ 0.40.
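This two-stage computation — an exponential probability feeding a binomial — is easy to mis-key on a calculator. A stdlib sketch mirroring the slide's numbers (again Python rather than the course's R):

```python
import math

# p = P(Y <= 3) for Y ~ Exp(beta = 4); the exponential cdf is 1 - exp(-y/beta)
p = 1 - math.exp(-3 / 4)
print(round(p, 2))       # 0.53

# P(X >= 4) for X ~ Bin(6, p): sum the pmf over k = 4, 5, 6
prob = sum(math.comb(6, k) * p**k * (1 - p)**(6 - k) for k in range(4, 7))
print(round(prob, 2))    # 0.4
```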
Homework for Gamma & Exponential

Homework:
Section 6.5: # 45, 46, 48, 50, 51
Section 6.4: # 31, 33, 37, 41

Beta Distribution
The Beta distribution models random variables that take on values in the interval [0, 1].

Definition
We say that a random variable X has a Beta distribution if its pdf is of the form:

f(x) = Γ(α + β)/(Γ(α)Γ(β)) · x^(α−1) (1 − x)^(β−1), 0 < x < 1, 0 elsewhere (30)

EX = ∫_0^1 x · Γ(α + β)/(Γ(α)Γ(β)) x^(α−1) (1 − x)^(β−1) dx = Γ(α + β)/(Γ(α)Γ(β)) ∫_0^1 x^α (1 − x)^(β−1) dx
   = Γ(α + β)/(Γ(α)Γ(β)) · Γ(α + 1)Γ(β)/Γ(α + β + 1) = α/(α + β)

Var(X) = αβ/((α + β)²(α + β + 1)) (Prove for HW)

Beta Example
(#85b) The proportion of pure iron in certain ore samples has a beta distribution with α = 3 and β = 1. Find the probability that two out of three randomly selected samples will have less than 30% pure iron.

Let X ∼ Beta(α = 3, β = 1) and Y ∼ Bin(n = 3, p = P(X < 0.3)).

p = P(X < 0.3) = ∫_0^0.3 Γ(3 + 1)/(Γ(3)Γ(1)) x^(3−1) (1 − x)^(1−1) dx = 3 ∫_0^0.3 x² dx = [x³]_0^0.3 = 27/1000

P(Y = 2) = C(3,2) (27/1000)² (1 − 27/1000)^(3−2) ≈ 0.002128
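With β = 1 the Beta(3, 1) cdf reduces to x³, so the whole example fits in a few stdlib lines (a sketch for checking the arithmetic, not part of the slides):

```python
import math

# For X ~ Beta(3, 1) the pdf is 3x^2 on (0, 1), so P(X < 0.3) = 0.3^3
p = 0.3 ** 3
print(round(p, 3))       # 0.027

# P(Y = 2) for Y ~ Bin(3, p)
ans = math.comb(3, 2) * p**2 * (1 - p)**(3 - 2)
print(round(ans, 6))     # 0.002128
```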
More Continuous Distributions

Homework: Section 6.8: At least # 78, 79, 83

Recall from previous slides that the Gamma and Exponential distributions are used to model lifetimes. Two other distributions mentioned in your book are also used to model lifetime distributions:
Lognormal Distribution (Section 6.7)
Weibull Distribution (Section 6.9)
They work similarly to the distributions already covered in class and aren't very interesting. You may find yourself using these distributions in your further studies/job, but for the sake of time we'll skip them.

This marks the end of the material for Exam 1.
Decide: Take Exam 1 on Monday Sept 26 or Wednesday Sept 28.

Chapter 7
So far we've only dealt with one random variable at a time. In order to accomplish anything in statistics we'll need to learn how to handle multiple random variables at a time.
We'll start by taking the notion of multiple events and generalizing it to multiple random variables.

    Chapter 4                                  Chapter 7
(1) P(A ∩ B)                                   fX,Y(x, y) = P(X = x, Y = y)
(2) A, B independent if                        X, Y are independent RVs if
    P(A ∩ B) = P(A)P(B)                        P(X = x, Y = y) = P(X = x)P(Y = y)
(3) P(A | B) = P(A ∩ B)/P(B)                   P(X = x | Y = y) = fX,Y(x, y)/fY(y)

Chapter 7
Let's describe each one of these 3 features in greater depth.
(1) If X, Y are discrete, the joint pmf of X and Y is

fX,Y(x, y) = P(X = x, Y = y)

The marginal pmfs can be calculated from the joint pmf:

P(X = x) = Σ_y fX,Y(x, y),  P(Y = y) = Σ_x fX,Y(x, y)

For the continuous case, we just say

fX(x) = ∫_y fX,Y(x, y) dy and fY(y) = ∫_x fX,Y(x, y) dx

We say that fX,Y(x, y) is a valid joint pmf if fX,Y(x, y) ∈ [0, 1] and Σ_x Σ_y fX,Y(x, y) = 1.
Similarly, it is a valid joint pdf if fX,Y(x, y) ≥ 0 and ∫_x ∫_y fX,Y(x, y) dy dx = 1.
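In the discrete case, the marginal and validity checks are a few lines of plain Python. The joint pmf below is a made-up illustration (not from the slides):

```python
# Hypothetical joint pmf of (X, Y), stored as a dict: (x, y) -> probability
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

# Validity: every value in [0, 1] and total mass 1
assert all(0 <= p <= 1 for p in joint.values())
assert abs(sum(joint.values()) - 1) < 1e-12

# Marginals: P(X = x) sums over y; P(Y = y) sums over x
px = {x: sum(p for (xx, y), p in joint.items() if xx == x) for x in (0, 1)}
py = {y: sum(p for (x, yy), p in joint.items() if yy == y) for y in (0, 1)}
print(round(px[0], 1), round(py[0], 1))    # 0.3 0.4
```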
Example
Let fX,Y(x, y) = 4xy for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and 0 otherwise.
(a) Is f a valid joint pdf?

∫_x ∫_y 4xy dy dx = 4 ∫_0^1 x ( ∫_0^1 y dy ) dx = 4 ∫_0^1 x [y²/2]_0^1 dx = 2 ∫_0^1 x dx = 1.

(b) Find P(X ≤ 1/2, Y ≤ 1/3):

P(X ≤ 1/2, Y ≤ 1/3) = ∫_{x=0}^{1/2} ∫_{y=0}^{1/3} 4xy dy dx = · · · = 1/36

(c) Find fX(x) and fY(y), the marginal distributions of X and Y.

fX(x) = ∫_{y=0}^{1} fX,Y(x, y) dy = ∫_0^1 4xy dy = 4x [y²/2]_0^1 = 2x

fY(y) = ∫_{x=0}^{1} fX,Y(x, y) dx = · · · = 2y.
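Part (b)'s "· · ·" can be verified numerically with a midpoint-rule double integral in plain Python (the grid size is an arbitrary choice; this sketch is not part of the slides):

```python
def double_integral(f, ax, bx, ay, by, n=100):
    """Midpoint-rule approximation of the integral of f over [ax, bx] x [ay, by]."""
    hx, hy = (bx - ax) / n, (by - ay) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            x = ax + (i + 0.5) * hx   # midpoint of cell i in x
            y = ay + (j + 0.5) * hy   # midpoint of cell j in y
            total += f(x, y) * hx * hy
    return total

val = double_integral(lambda x, y: 4 * x * y, 0, 0.5, 0, 1 / 3)
print(abs(val - 1 / 36) < 1e-9)    # True
```

Because 4xy is linear in each variable separately, the midpoint rule here is exact up to floating-point rounding.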
(2) Now let us describe in detail what we mean by the notion of independence of random variables.

Theorem
X and Y are independent if and only if P(X ∈ I₁, Y ∈ I₂) = P(X ∈ I₁)P(Y ∈ I₂), where I₁ and I₂ are subsets of the support of X and Y respectively.

In other words, X and Y are independent if their joint distribution factors into the product of their marginals, i.e.

fX,Y(x, y) = fX(x) fY(y).

Independence
Caution: Pay attention to where the random variables "live" when determining if they are independent.
We saw earlier that

fX,Y(x, y) = 4xy for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 (0 otherwise)

can be factored into a product of 2 marginals. However, we can't do that with

gX,Y(x, y) = 8xy for 0 < x < y < 1 (0 otherwise)

In this case we would say that X and Y are dependent.

Independence & Dependence
Independence & Dependence

We can describe the relationship between variables in the following way:

Definition
The covariance, sometimes denoted σ_XY, is given by the following:

Cov(X, Y) = E[XY] − E[X] E[Y]  (= E[(X − E[X])(Y − E[Y])])
= ∫_x ∫_y xy f_{X,Y}(x, y) dy dx − (∫_x x f_X(x) dx)(∫_y y f_Y(y) dy).

To get a unitless measure of the dependence of two variables we use:

Definition
The correlation (or correlation coefficient), denoted ρ, between two random variables X and Y is given by

ρ = Cov(X, Y) / √(Var(X) Var(Y)) = σ_XY / (σ_X σ_Y).    (31)
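To make these formulas concrete, here is a Python sketch (scipy is our addition, not part of the course) that evaluates the covariance and correlation for the dependent density g_{X,Y}(x, y) = 8xy on 0 < x < y < 1 by numeric integration:

```python
# Covariance and correlation for the joint density 8xy on 0 < x < y < 1.
from math import sqrt
from scipy.integrate import dblquad

def moment(h):
    # E[h(X, Y)] = double integral of h(x, y) * 8xy over 0 < x < y < 1.
    # Note: dblquad integrates func(y, x) with y-limits depending on x.
    val, _ = dblquad(lambda y, x: h(x, y) * 8 * x * y,
                     0, 1, lambda x: x, lambda x: 1)
    return val

EX, EY = moment(lambda x, y: x), moment(lambda x, y: y)
EXY = moment(lambda x, y: x * y)
cov = EXY - EX * EY                       # exact value is 4/225
var_x = moment(lambda x, y: x**2) - EX**2
var_y = moment(lambda x, y: y**2) - EY**2
rho = cov / sqrt(var_x * var_y)           # a unitless measure in [-1, 1]
```

The positive covariance here matches the intuition that the constraint x < y ties the two variables together.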
Independence & Dependence

What happens to the covariance if X and Y are independent?

Recall that if X and Y are independent then the joint pmf (pdf) is just the product of the marginals. That means we can say:

Cov(X, Y) = ∫_x ∫_y xy f_X(x) f_Y(y) dy dx − (∫_x x f_X(x) dx)(∫_y y f_Y(y) dy)
= (∫_x x f_X(x) dx)(∫_y y f_Y(y) dy) − (∫_x x f_X(x) dx)(∫_y y f_Y(y) dy) = 0.

So there we have it: X, Y independent ⇒ Cov(X, Y) = 0.

If Cov(X, Y) = 0, can we say that X and Y are independent?

No! The fact that the covariance is 0 has no bearing on whether or not the joint pmf (pdf) can be factored into separate marginals (see Table 7.4 on page 348 of the textbook for details).
How does Independence affect Expectations and Variances of Sums of RVs?

Let X and Y be arbitrary Random Variables.

E(X + Y) = E(X) + E(Y)

Var(X + Y) = E[(X + Y)²] − [E(X + Y)]²
= E[X² + Y² + 2XY] − ((E X)² + (E Y)² + 2 E[X] E[Y])
= (E[X²] − (E X)²) + (E[Y²] − (E Y)²) + 2(E[XY] − E[X] E[Y])
= Var(X) + Var(Y) + 2 Cov(X, Y).

More generally,

Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y).

If X and Y are independent, the covariance term disappears and we get

Var(aX + bY) = a² Var(X) + b² Var(Y).
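The variance identity can be verified exactly on a small joint pmf. The following Python sketch uses a made-up four-point pmf (the numbers are illustrative only, not from the course):

```python
# Exact check of Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)
# on a small discrete joint pmf.
pmf = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}
a, b = 2.0, -3.0

def E(h):
    # expectation of h(X, Y) under the joint pmf
    return sum(p * h(x, y) for (x, y), p in pmf.items())

var_x = E(lambda x, y: x**2) - E(lambda x, y: x)**2
var_y = E(lambda x, y: y**2) - E(lambda x, y: y)**2
cov   = E(lambda x, y: x*y)  - E(lambda x, y: x) * E(lambda x, y: y)

lhs = E(lambda x, y: (a*x + b*y)**2) - E(lambda x, y: a*x + b*y)**2
rhs = a**2 * var_x + b**2 * var_y + 2*a*b*cov
assert abs(lhs - rhs) < 1e-12
```

Because the pmf is tiny, both sides are computed exactly rather than by simulation.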
More on Independence

Independence is also an assumption for the following facts:

Let X1, X2, . . . , Xn be iid Bernoulli(p). Then Y = Σ_{i=1}^n Xi ∼ Bin(n, p).
Let X1, X2, . . . , Xk be iid Geo(p). Then Y = Σ_{i=1}^k Xi ∼ NegBin(k, p).
Let X1, X2, . . . , Xn be iid Exp(β). Then Y = Σ_{i=1}^n Xi ∼ Gamma(n, β).

Recall that if α is an integer and X ∼ Gamma(α/2, β = 2), then X ∼ chi-squared(α), also denoted X ∼ χ²(α).

If Z1, Z2, . . . , Zn are iid N(0, 1), then Σ_{i=1}^n Zi² ∼ χ²(n); n is called the “degrees of freedom.”
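The first fact can be checked exactly: convolving the pmf of a single Bernoulli(p) with itself n times gives the pmf of the sum, which should match Bin(n, p). A Python sketch (numpy/scipy are our additions, not part of the course):

```python
# The pmf of a sum of n iid Bernoulli(p) variables, built by repeated
# convolution, matches the Bin(n, p) pmf exactly.
import numpy as np
from scipy.stats import binom

n, p = 8, 0.3
bern = np.array([1 - p, p])        # pmf of one Bernoulli(p): P(0), P(1)
pmf = np.array([1.0])              # pmf of an empty sum (always 0)
for _ in range(n):
    pmf = np.convolve(pmf, bern)   # pmf of the running sum

assert np.allclose(pmf, binom.pmf(np.arange(n + 1), n, p))
```

Convolution is exactly how the distribution of a sum of independent discrete variables is assembled, which is why independence is essential here.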
Conditional pmfs & pdfs

(3) Let X and Y be two arbitrary Random Variables and I1 and I2 be two subsets of the supports of X and Y respectively.

Definition
The conditional probability density (mass) function of X given Y is defined through

P(X ∈ I1 | Y ∈ I2) = P(X ∈ I1, Y ∈ I2) / P(Y ∈ I2).    (32)

In other words we can say

f_{X|Y}(x | y) = f_{X,Y}(x, y) / f_Y(y).    (33)

If X and Y are independent, then f_{X|Y}(x | y) = f_X(x) (prove for homework).
Conditional Probability (revisited)

f_{X|Y} is a legitimate distribution, so we can talk about its probabilities and expectations.

Example: Suppose X and Y have a joint pdf given by
f_{X,Y}(x, y) = 8xy for 0 ≤ x ≤ y ≤ 1, and 0 otherwise.

Find P(Y < 0.5 | X = 0.25).

P(Y < 0.5 | X = 0.25) = ∫_{0.25}^{0.5} f_{X,Y}(0.25, y) dy / f_X(0.25)
= ∫_{0.25}^{0.5} f_{X,Y}(0.25, y) dy / ∫_{x}^{1} f_{X,Y}(x, y) dy |_{x=0.25}
= ∫_{0.25}^{0.5} 8(0.25)y dy / ∫_{0.25}^{1} 8(0.25)y dy = · · · = 0.2.

Homework:
Section 7.3: 7.3, 7.5, 7.7, 7.9, 7.11
Section 7.4: 7.17
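The conditional probability above is just a ratio of two one-dimensional integrals, so it is easy to check numerically. A Python sketch (scipy is our addition, not part of the course):

```python
# Numeric check of P(Y < 0.5 | X = 0.25) for the joint density 8xy on
# 0 <= x <= y <= 1: the ratio of two integrals in y along the slice x = 0.25.
from scipy.integrate import quad

x0 = 0.25
f = lambda y: 8 * x0 * y            # f_{X,Y}(x0, y), valid for y >= x0
num, _ = quad(f, x0, 0.5)           # integral of f(0.25, y), y from 0.25 to 0.5
den, _ = quad(f, x0, 1.0)           # f_X(0.25): integrate y out over [0.25, 1]
p = num / den
assert abs(p - 0.2) < 1e-8
```

Note that the denominator is the marginal f_X(0.25), so the conditional density integrates to 1 over y ∈ [0.25, 1].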
Conditional Probability
How do expectations work for conditional distributions? Some basic principles are the following:

Definition
If X and Y are two arbitrary Random Variables, the conditional expectation of X given that Y = y is defined to be

E(X | Y = y) = ∫_{−∞}^{∞} x f_{X|Y}(x | y) dx.

Theorem
Let X and Y denote two arbitrary random variables. Then E(X) = E(E(X | Y)).

The deeper properties of conditional expectation are well above the scope of this class, so we won’t focus too much on them. Just know that the concept exists.
The Multinomial Distribution

We’ve gone over some common pmfs and pdfs for some common Random Variables. Are there any common joint distributions?

Consider an experiment similar to that for the binomial random variable (fixed number of trials, independent trials, etc.), but let there be k possible outcomes instead of just 2.

Definition
The multinomial distribution is represented by the following pmf:

P(Y1 = y1, . . . , Yk = yk) = n! / (y1! y2! · · · yk!) · p1^{y1} p2^{y2} · · · pk^{yk},    (34)

where y1 + · · · + yk = n and p1 + · · · + pk = 1.
The Multinomial Distribution

The Multinomial Distribution has a few interesting qualities:

Marginally, Yi ∼ Bin(n, pi).
While the trials in the experiment are independent, the Random Variables Y1, . . . , Yk are not.

This makes sense on an intuitive level, since there’s no way to factor the pmf into a product of other pmfs. We can also prove this by showing that Cov(Yi, Yj) ≠ 0 for i ≠ j. The proof is given in Example 7.17 on page 357 of the textbook.
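These properties are easy to confirm numerically. A Python sketch (scipy is our addition, not part of the course; the numbers n = 10 and p = (0.2, 0.3, 0.5) are made up for illustration):

```python
# The multinomial pmf from (34), and a check that the marginal of one
# count Y1 is Bin(n, p1): sum the joint pmf over the other counts.
from scipy.stats import multinomial, binom

n, p = 10, [0.2, 0.3, 0.5]
rv = multinomial(n, p)

# pmf at one outcome vector (the counts must sum to n)
assert abs(rv.pmf([2, 3, 5]) - 0.08505) < 1e-5

# Marginal of Y1 at y1 = 3: sum over all ways to split the remaining trials
y1 = 3
marg = sum(rv.pmf([y1, y2, n - y1 - y2]) for y2 in range(n - y1 + 1))
assert abs(marg - binom.pmf(y1, n, 0.2)) < 1e-12
```

The counts are negatively related because the trials must split among the k categories: a large Yi forces the other counts down.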
Homework: 7.25, 7.27, 7.29, 7.31

Statistics

Now that some of the major foundations of probability have been
established, we can talk more about what we mean by statistics.

Many of the assumptions we made previously (even those that were not explicitly labelled as such) will be explored from here on out. For example...

Is the probability of flipping a coin and getting a head really 1/2?
Is the mean of a normal distribution really what you say it is?
How can we be sure that trials are independent?
Is the variance really constant throughout the entire experiment?
Statistics

First let’s get some definitions out of the way:

Definition
Parameters are numerical descriptive measures of a population.
Statistics are numerical descriptive measures of a sample.

The only difference between these two definitions is the scope of what they describe. For example, suppose we have a population that is best described by a normal distribution with parameters µ and σ². The mean µ is a parameter describing the mean of the population, while x̄ would be the mean of any sample that we take.

Since we can’t survey every person in a population, usually we take a sample instead and use the mean of that sample (x̄) to estimate the mean of that population (µ).
Statistics

Let’s define some terms with a little more rigor than we saw on the last slide.

Definition
A statistic is a function of Random Variables.

Definition
A point estimate of some population parameter θ is a single value θ̂ of a statistic Θ̂.

Statistic (Θ̂)                      Value of Statistic (θ̂)             Parameter Estimated (θ)
X̄ = (1/n) Σ Xi                     x̄ = (1/n) Σ xi                     µ
S² = (1/(n−1)) Σ (Xi − X̄)²         s² = (1/(n−1)) Σ (xi − x̄)²         σ²
P̂ = X/n,  X ∼ Bin(n, p)            p̂ = (# successes)/n                p
Estimators

Definition
A statistic Θ̂ is said to be an unbiased estimator of the parameter θ if E[Θ̂] = θ.

Examples: Let Xi be iid N(µ, σ²), i = 1, . . . , n, and let Y ∼ Bin(n, p).

E[X̄] = E[(1/n) Σ_{i=1}^n Xi] = (1/n) Σ_{i=1}^n E[Xi] = (1/n) · n E[X1] = µ

E[P̂] = E[Y/n] = (1/n) E(Y) = (1/n) np = p.
Estimators

Again consider Xi iid N(µ, σ²), i = 1, . . . , n.

E[S²] = (1/(n−1)) Σ_{i=1}^n E[(Xi − X̄)²]
= (1/(n−1)) Σ_{i=1}^n E[(Xi − µ + µ − X̄)²]
= (1/(n−1)) Σ_{i=1}^n E[((Xi − µ) + (µ − X̄))²]
= (1/(n−1)) Σ_{i=1}^n E[(Xi − µ)² + (µ − X̄)² + 2(Xi − µ)(µ − X̄)]
= · · · = σ²  (continue the proof for HW)

So S² is an unbiased estimator of σ². This is why the estimate for variance commonly includes 1/(n−1): to force the estimate to be unbiased.
Biased vs. Unbiased Estimators

Clearly if an estimator is not unbiased, it is “biased.” Why use a biased estimator?

It certainly wouldn’t make sense to estimate µ with (1/(n+1)) Σ Xi instead of (1/n) Σ Xi. In many contexts it does make sense to use an unbiased estimator.

One way to choose an estimator is by picking the statistic with the smaller MSE, where

MSE_Θ̂ = (E[Θ̂] − θ)² + Var(Θ̂) = E[(Θ̂ − θ)²],
         “bias²”

Unbiased estimators can be good choices because the bias term disappears. Sometimes biased estimators can lead to a smaller MSE though. Usually if you can find an unbiased estimator with minimal variance then you’re in good shape.
Biased vs. Unbiased Estimators

Are there any other reasons to use Biased Estimators? Actually there are plenty. Here are two examples:

Maximum Likelihood Estimator (MLE): This is literally the most likely value of an unknown parameter given the values from the sample.
  X̄ is both unbiased and the MLE of µ.
  ((n−1)/n) S² = (1/n) Σ (Xi − X̄)² is the MLE for σ², and it is biased.

Bayes Estimator: If you have some sort of prior knowledge with respect to a sample, then you can apply Bayes’ Theorem to our idea of Random Variables and come up with an estimator that way. These estimators are almost always biased.

We may end up talking about MLEs later on if we have time, but we will almost certainly not talk about Bayes estimators. Understanding the properties of both requires upper-level undergraduate & graduate statistics courses, and Bayes estimators can spill into discussions about Decision Theory, which is well beyond the scope of this course.
The Sampling Distribution

Here are some more definitions:

Definition
A sampling distribution is the probability distribution of a sample statistic. The standard deviation of a statistic is known as the standard error of the statistic.

Remember that a statistic is literally a function of Random Variables. If a single Random Variable has its own distribution, then a combination of a bunch of Random Variables should have a certain distribution as well.

Our first example will be a discussion of the distribution of the statistic X̄.
Central Limit Theorem (CLT)

This is the most important result in statistics.

Simply stated: take a sample from a population. If we take a large enough sample, the distribution of the sample mean is approximately normal, regardless of the distribution it was sampled from.

Formally stated:

Theorem
Let X1, X2, . . . , Xn be random variables from a random sample from a population with mean µ and variance σ², and let Sn = X1 + X2 + · · · + Xn and X̄ = Sn/n. Then for large enough n (> 30),

Sn ∼ N(nµ, nσ²)
X̄ ∼ N(µ, σ²/n)
First some examples:

The amount of warpage in a type of wafer used in the manufacture of integrated circuits has mean 1.3 mm and standard deviation 0.1 mm. A random sample of 200 wafers is drawn. What is the probability that the sample mean warpage exceeds 1.305 mm?

First let’s write down the facts. Remember from the previous slide that X̄ ∼ N(µ, σ²/n). Why?

X̄ ∼ N(E[X̄], Var(X̄))
E(X̄) = E[(1/n) Σ_{i=1}^n Xi] = (1/n) Σ_{i=1}^n E[Xi] = (1/n) Σ_{i=1}^n µ = µ
Var(X̄) = ???
Example Continued

Var(X̄) = Var((1/n) Σ_{i=1}^n Xi)
= (1/n²) Var(Σ_{i=1}^n Xi)    (why? Var(aX) = a² Var(X), from the variances-of-sums slide)
= (1/n²) Σ_{i=1}^n Var(Xi)    (why? the Xi are independent, so the covariance terms vanish)
= (1/n²) Σ_{i=1}^n σ² = nσ²/n² = σ²/n.
Example Continued

The amount of warpage in a type of wafer used in the manufacture of integrated circuits has mean 1.3 mm and standard deviation 0.1 mm. A random sample of 200 wafers is drawn. What is the probability that the sample mean warpage exceeds 1.305 mm?

P(X̄ > 1.305) = P((X̄ − E[X̄]) / √Var(X̄) > (1.305 − E[X̄]) / √Var(X̄))
= P(Z > (1.305 − µ) / √(σ²/n))
= P(Z > (1.305 − 1.3) / (0.1/√200))
= P(Z > 0.707) = 1 − P(Z < 0.71)
= 1 − 0.7611 = 0.2389.
What’s really going on here?

First let’s review what’s happening when I talk about probabilities associated with statistics (functions of random variables). The sample x1, x2, . . . , xn are realizations of the Random Variables (X1, X2, . . . , Xn) taking on specific values. If I add them, I create another Random Variable, Sn, with its own distribution.

How do we find the distribution of Sn? The answer is complicated and we can’t discuss it completely. We care more about situations and conclusions anyway, so we’ll just talk about those: if n is large enough, Sn “becomes” normally distributed. See the formal CLT statement above for the exact result.
To be more specific:

It’s not exactly correct to say Sn “becomes” normal. It’s better to say that Sn converges to a normal Random Variable in the sense that the cdfs converge. Here is roughly how it works:

Let Fn be the cdf of Sn.
Let Φ be the cdf of some normal Random Variable.
Then as n → ∞, Fn → Φ.

This is called convergence in distribution, since the cumulative distribution functions are converging to another distribution function. If we take

Zn = (X̄ − E[X̄]) / √Var(X̄) = (X̄ − µ) / (σ/√n),

and n > 30, then we would write something like Zn →d Z ∼ N(0, 1).
More Examples

Let X ∼ Bin(n, p). Recall that if the Yi are iid Bernoulli(p), then X = Σ_{i=1}^n Yi. We can apply the CLT in this case as well. If we have a “large enough” n we can say the following:

Theorem
Let X ∼ Bin(n, p). If np > 15 and n(1 − p) > 15, then the following hold by the CLT:

X ∼ N(np, np(1 − p))
P̂ = X/n ∼ N(p, p(1 − p)/n)

In this case a large sample size isn’t the only thing we need to have. Let’s see this in action:
www.stat.tamu.edu/~west/applets/binomialdemo1.html
Slight Problem

The binomial distribution is represented by a discrete pmf, yet the last slide said we can approximate it with a continuous distribution. This sometimes leads to some information loss when we approximate binomials with the CLT.

General Rule (continuity correction): Let X ∼ Bin(n, p). If I have to find P(a ≤ X ≤ b), pretend it’s P(a − 1/2 ≤ X ≤ b + 1/2).

Example
A machine makes 1000 steel O-rings per day. Each ring has 0.9 probability of meeting a thickness specification. What is the probability that fewer than 890 O-rings meet the specification?
CLT for Binomial

Example
A machine makes 1000 steel O-rings per day. Each ring has 0.9 probability of meeting a thickness specification. What is the probability that fewer than 890 O-rings meet the specification?

Let X ∼ Bin(1000, 0.9). Note that np = 900 and n(1 − p) = 100, so we can apply the CLT. Since P(X < 890) = P(X ≤ 889), the continuity correction gives

P(X < 890) ≈ P(X ≤ 889 + 1/2)
= P((X − np)/√(np(1 − p)) ≤ (889.5 − 1000(0.9))/√(1000(0.9)(0.1)))
→ P(Z ≤ −1.11) = · · · = 0.1335.
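We can compare the normal approximation with the exact binomial probability. A Python sketch (scipy is our addition, not part of the course):

```python
# Normal approximation with continuity correction vs the exact binomial cdf
# for P(X < 890) with X ~ Bin(1000, 0.9).
from math import sqrt
from scipy.stats import binom, norm

n, p = 1000, 0.9
exact = binom.cdf(889, n, p)                                 # P(X <= 889)
approx = norm.cdf((889.5 - n * p) / sqrt(n * p * (1 - p)))   # z about -1.11
assert abs(exact - approx) < 0.01
```

With np = 900 and n(1 − p) = 100 both comfortably above 15, the two answers agree to about two decimal places.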
CLT

Some things to consider:

The book uses 10 instead of 15 as a cutoff.
The book does not use the 1/2 trick in any examples.
On tests and quizzes I will ask you to check that np and n(1 − p) are greater than 15. I will not require the 1/2 trick.

Similarly we can say something about the Poisson distribution:

Theorem
If X ∼ Poisson(λ) where λ > 15, then by the CLT, X ∼ N(λ, λ).

What if n < 30?

If n < 30 then we can’t make inference about µ if we have no idea what
the distribution of our sample is. Also, what if we don’t know σ? It turns out that if n is large, then s → σ, so we don’t have to worry about that unless n is small.

Theorem
If X1, X2, . . . , Xn are normally distributed and n < 30, then we say that

T = (X̄ − µ) / (S/√n)

follows a t-distribution with (n − 1) degrees of freedom.
What the heck is the t-distribution?

Consider the form of the statistic T:

T = (X̄ − µ)/(S/√n) = [(X̄ − µ)/(σ/√n)] / [(S/√n)/(σ/√n)] = Z/(S/σ) = Z/√(S²/σ²) = Z/√(((n−1)S²/σ²)/(n−1)) = Z/√(Q/(n − 1)),

where Z ∼ N(0, 1) and Q = (n − 1)S²/σ² ∼ chi-squared(n − 1).

It turns out that T = Z/√(Q/(n − 1)) has a t distribution. In other words, T is a function of 2 random variables which itself has a pdf that looks like the following:

f(t; ν) = Γ((ν + 1)/2) / (√(νπ) Γ(ν/2)) · (1 + t²/ν)^{−(ν+1)/2},  −∞ < t < ∞
More about the t distribution

Like Z, T has a bell-shaped curve, but with “longer” tails. How do we find probabilities? We could integrate the function from the last slide, but it’s very nasty. Instead we’ll use a table (Table 4 in your text), just like we did for the Z case. We’ll see how to do this shortly.

Remember that the use of the t distribution depends on whether we think the sample comes from a normal distribution. How can we assume normality if we are not clearly told?

Recall the empirical (68-95-99.7) rule and our definition of outliers (more than 1.5 IQR beyond the quartiles of the sample). If we have potential outliers in our data then we shouldn’t use the t distribution to estimate µ!
Using the t distribution

Example
8.22) It is known from past samples that the pH of water in Bolton Creek tends to be approximately normally distributed. The average pH level of water in the creek is estimated regularly by taking 12 samples from different parts of the creek. Assuming they represent random samples from the creek, find the approximate probability that the sample mean of the 12 pH measurements will be within 0.2 units of the true average pH. The most recent sample measurements were as follows:

6.63, 6.59, 6.65, 6.67, 6.54, 6.13
6.62, 7.13, 6.68, 6.82, 7.62, 6.56

A quick calculation tells us that s ≈ 0.362. Since n = 12 < 30 and we are told the data come from a normal distribution, we can say

P(|X̄ − µ| ≤ 0.2) = P(−0.2 ≤ X̄ − µ ≤ 0.2)
Example Continued

P(|X̄ − µ| ≤ 0.2) = P(−0.2 ≤ X̄ − µ ≤ 0.2)
= P(−0.2/(s/√n) ≤ (X̄ − µ)/(S/√n) ≤ 0.2/(s/√n))
= P(−0.2/(0.362/√12) ≤ T_{12−1} ≤ 0.2/(0.362/√12))
≈ P(−1.916 ≤ T11 ≤ 1.916) = ?

Note that P(T11 > 1.916) ∈ [0.025, 0.05].
That means that P(T11 > 1.916) + P(T11 < −1.916) ∈ [0.05, 0.10].

So we can say that

P(−1.916 ≤ T11 ≤ 1.916) ∈ [0.90, 0.95].
Example Continued

If we use software we can actually say that

P(−1.916 ≤ T11 ≤ 1.916) = 0.9183019

using the following code in R:

> v = c(6.63, 6.59, 6.65, 6.67, 6.54, 6.13, 6.62, 7.13, 6.68, 6.82, 7.62, 6.56)
> s = sqrt(var(v))
> t = 0.2/(s/sqrt(12))
> pt(t, 11) - pt(-1*t, 11)

What if we weren’t explicitly told that the data was normally distributed? What would we do?
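For comparison, the same computation in Python (scipy is our addition; the course itself uses R as above):

```python
# The interval probability P(-t* <= T11 <= t*) for the 12 pH measurements,
# mirroring the R code above.
from math import sqrt
from statistics import stdev          # sample sd, with the n-1 divisor
from scipy.stats import t

v = [6.63, 6.59, 6.65, 6.67, 6.54, 6.13, 6.62, 7.13, 6.68, 6.82, 7.62, 6.56]
tt = 0.2 / (stdev(v) / sqrt(len(v)))  # standardized half-width, about 1.92
p = t.cdf(tt, 11) - t.cdf(-tt, 11)    # about 0.918, as R reports
```

The answer lands inside the [0.90, 0.95] bracket we read off the t table.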
Example Continued

We could look for outliers. If we have outliers then the data probably doesn’t come from a normal population. Let’s use R to get some summary statistics:

> summary(v)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  6.130   6.582   6.640   6.720   6.715   7.620

Then IQR = Q3 − Q1 = 6.715 − 6.582 = 0.133 and 1.5 × IQR = 0.1995. So

(Q1 − 1.5·IQR, Q3 + 1.5·IQR) = (6.582 − 0.1995, 6.715 + 0.1995) = (6.3825, 6.9145).

Clearly we have some points in our data set that might qualify as outliers. This is reflected in the boxplot as well. We need only type boxplot(v) in R to get one:

Example Continued

[Boxplot of v, y-axis roughly 6.5 to 7.5, with the largest measurements flagged as potential outliers above the upper whisker.]
Example Concluded

So what can we say about this example? The homework problem on its own is good practice to see how the t-table works. If I were an actual scientist in the field, I might look at this information and question whether we could actually assume that the pH values have a normal distribution.

Remember, we never want to take a shortcut and just delete “bad” numbers. While they may be outliers by our definition, we need to keep them in our study as long as we can determine that there were no measurement/recording errors.

Homework: 8.1 - 8.4 odds
More on the chi-squared distribution

Suppose that X1, X2, . . . , Xn are independent random variables sampled from a normal distribution with mean µ and variance σ². Recall that S² = (1/(n−1)) Σ (Xi − X̄)² is itself a random variable, and thus has a distribution function associated with it.

Theorem
Let U = (n − 1)S²/σ², with the assumptions from above. Then U ∼ chi-squared (χ²) with (n − 1) degrees of freedom. The probability density function of a χ² random variable is given by:

f(u) = 1 / (Γ((n−1)/2) 2^{(n−1)/2}) · u^{(n−1)/2 − 1} e^{−u/2},  u > 0.
Another awkward integral!

Remember that we didn’t do too much with the gamma distribution. Depending on the parameters, the integral can get messy really quickly. We’ll be using the χ² distribution a lot though. Since we won’t be integrating that nasty function, we’ll instead just use a table (Table 5 in your book).

Before we get into an example, let’s state one quick fact about U. Recall that the χ² distribution has a mean equal to its degrees of freedom. In other words,

E[U] = E[(n − 1)S²/σ²] = ((n − 1)/σ²) E[S²] = n − 1
⇒ E[S²] = ((n − 1)/(n − 1)) σ² = σ².
How does the table work?

Example
8.42) Ammeters produced by a certain company are marketed under the specification that the standard deviation of gauge readings be no larger than 0.2 amp. Ten independent readings on a test circuit of constant current, using one of these ammeters, gave a sample variance of 0.065. Does this suggest that the ammeter used does not meet the company’s specification? [Hint: Find the approximate probability of a sample variance exceeding 0.065 if the true population variance is 0.04.]

P(S² > 0.065) = P((n − 1)S²/σ² > ((n − 1)/σ²) · 0.065)
= P(U > ((10 − 1)/0.04) · 0.065) = P(U > 14.625)
> P(U > 14.684) = 0.1
Example Continued

P(U > 14.625) > P(U > 14.684) = 0.1

The inequality doesn’t say much on its own, but since 14.625 is so close to the table value 14.684 we can be sure that P(U > 14.625) ≈ 0.1. If we use R we can find the exact value:

> pchisq(14.625, df = 9, lower.tail=FALSE)
[1] 0.1017651

which agrees with our intuition from above.
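The same tail probability in Python (scipy is our addition; the course uses R as shown):

```python
# Chi-squared upper-tail probability P(U > 14.625) with 9 degrees of
# freedom, mirroring the R pchisq(..., lower.tail=FALSE) call above.
from scipy.stats import chi2

p = chi2.sf(14.625, df=9)   # about 0.102, matching R's 0.1017651
```

`sf` (the survival function) is 1 − cdf, i.e. exactly the upper-tail probability the table approximates.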
Homework: 8.5 odds

What if I want to compare the means of 2 Different Populations?

If X̄ is an estimate of µX, what do you think a good estimate of µX − µY would be? There are 3 different cases that we need to consider:

Large samples
Small samples, equal variances
Small samples, unequal variances

Let’s go through the cases one-by-one and do a few examples. In each case suppose we sample randomly from 2 populations, i.e.

X1, X2, . . . , X_{n1},  Y1, Y2, . . . , Y_{n2}
Cases 1 and 2:

Theorem
Suppose the sizes of the samples, n1 and n2, are both ≥ 25. Then

(X̄ − Ȳ − (µX − µY)) / √(σX²/n1 + σY²/n2) ∼ N(0, 1)    (35)

Theorem
Suppose n1 and n2 are small and the variances are unknown but assumed equal. Then if the populations are normal,

(X̄ − Ȳ − (µX − µY)) / (Sp √(1/n1 + 1/n2)) ∼ t_{n1+n2−2}    (36)

where

Sp² = ((n1 − 1)SX² + (n2 − 1)SY²) / ((n1 − 1) + (n2 − 1)).    (37)
Case 3:

Sp² from the last slide is called the pooled variance.

Theorem
Suppose n1 and n2 are small and the variances are unknown and assumed unequal. Then if the populations are normal,

(X̄ − Ȳ − (µX − µY)) / √(SX²/n1 + SY²/n2) ∼ t_ν    (38)

where

ν = (SX²/n1 + SY²/n2)² / [ (SX²/n1)²/(n1 − 1) + (SY²/n2)²/(n2 − 1) ].

Note that ν is rarely a whole number. In these cases round ν up to get a conservative estimate when using the t-table.
Examples from the Exercises:

Example
8.48) Soil acidity is measured by a quantity called pH. A scientist wants to estimate the difference in the average pH for two large fields using pH measurements from randomly selected core samples. If the scientist selects 20 core samples from field 1 and 15 core samples from field 2, independently of each other, find the approximate probability that the sample mean of the pH measurements for field 1 will be larger than that for field 2 by at least 0.5. The sample variances for pH measurements for fields 1 and 2 are 1 and 0.8 respectively. In the past, both fields have had approximately the same mean soil acidity levels.

Let’s write down what we know so far:

n1 = 20, s1² = 1.0
n2 = 15, s2² = 0.8
Example 48 continued

n1 = 20, s1² = 1.0
n2 = 15, s2² = 0.8

P(X̄1 − X̄2 ≥ 0.5)
= P( (X̄1 − X̄2 − (µ1 − µ2)) / √(s1²/n1 + s2²/n2) ≥ (0.5 − (µ1 − µ2)) / √(s1²/n1 + s2²/n2) )

What is the distribution of the quantity on the left? What is the value of the term on the right? (Here µ1 − µ2 = 0, since the fields have had approximately the same mean soil acidity.)

= P( Tν ≥ (0.5 − 0) / √(1.0/20 + 0.8/15) ) = P(Tν ≥ 1.555428),

where

ν = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]
= (1/20 + 0.8/15)² / [ (1/20)²/19 + (0.8/15)²/14 ] = 31.897.
¯
P X1 − X2 ≥ 0.5 = P (Tν ≥ 1.555428)
where ν = 31.897
≈ P (T32 ≥ 1.555428)
∈ [0.05, 0.10] Rob Gordon (University of Florida) STA 3032 (7661) Fall 2011 155 / 251 One more example
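As a quick computer check (using Python's scipy, which the slides don't assume), the Welch degrees of freedom and the tail probability can be computed directly instead of bracketed from the t-table:

```python
# Check of Example 8.48: Welch approximation for P(X-bar1 - X-bar2 >= 0.5).
# The input values are the ones given on the slide.
import math
from scipy.stats import t

n1, s1_sq = 20, 1.0
n2, s2_sq = 15, 0.8

se = math.sqrt(s1_sq / n1 + s2_sq / n2)   # standard error of the difference
t_stat = (0.5 - 0.0) / se                 # mu1 - mu2 = 0 (equal mean acidity)

# Welch-Satterthwaite degrees of freedom
nu = (s1_sq / n1 + s2_sq / n2) ** 2 / (
    (s1_sq / n1) ** 2 / (n1 - 1) + (s2_sq / n2) ** 2 / (n2 - 1)
)

prob = t.sf(t_stat, nu)                   # upper-tail probability
print(round(t_stat, 6), round(nu, 3))
```

The exact probability lands inside the [0.05, 0.10] bracket read off the t-table.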
One more example

Example
8.52) The service times for customers coming through a checkout counter
in a retail store are independent random variables with a mean of 15
minutes and a variance of 4. At the end of the work day, the manager
independently selects a random sample of 100 customers served by
checkout counter A and 100 served by counter B. Approximate the
probability that the sample mean service time for counter A is lower than
that for counter B by 5 minutes or more.

So far we know the following:

µ = 15, σ² = 4
nA = nB = 100
Exercise 52 continued

µ = 15, σ² = 4
nA = nB = 100

P(X̄B − X̄A ≥ 5)
  = P( [X̄B − X̄A − (µB − µA)] / √(σB²/nB + σA²/nA) ≥ [5 − (µB − µA)] / √(σB²/nB + σA²/nA) )
  = P( Z ≥ (5 − 0) / √(4/100 + 4/100) )     (both counters: µ = 15, σ² = 4)
  = P(Z ≥ 17.67767)
  ≈ 0.

Homework: 8.47, 49, 51, 53
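A quick numerical check of this z-calculation (scipy is an editing-time addition, not part of the course materials):

```python
# Check of Exercise 8.52: normal approximation for P(X-barB - X-barA >= 5).
import math
from scipy.stats import norm

mu, var = 15, 4
nA = nB = 100

se = math.sqrt(var / nA + var / nB)   # sqrt(4/100 + 4/100)
z = (5 - 0) / se                      # muB - muA = 0: identical distributions
prob = norm.sf(z)                     # upper tail; astronomically small
print(round(z, 5))
```

A z-score near 17.7 is so far in the tail that the probability is 0 for any practical purpose, which is exactly the slide's conclusion.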
Dependent Samples

In the last section we talked about differences between means for 3 cases
of independent samples.

In certain situations we can also say something about the difference of two
means between 2 dependent samples. (See the examples in the textbook.)

Theorem
Consider the random variables Xi, Yi, Di, where i = 1, . . . , n and
Di = Xi − Yi. If the Xi and Yi are drawn from a normal distribution, then

    (D̄ − µD) / (SD/√n) ∼ Tn−1        (39)

where D̄ = Σ Di/n, E(Di) = µD = µX − µY, and

    SD = √( (1/(n−1)) Σ (Di − D̄)² ).
Example

Six bean plants had their carbohydrate concentrations (in percent by
weight) measured both in the shoot and in the root. The following results
were obtained:

    Plant   Shoot   Root
    1       4.42    3.66
    2       5.81    5.51
    3       4.65    3.91
    4       4.77    4.47
    5       5.25    4.69
    6       4.75    3.93

Previous experience indicates that the shoot concentration is about 0.5%
more than the root concentration. Find the probability that the shoot
measurements are greater than the root measurements on average by more
than 0.55.
Example Continued

Find the probability that the shoot measurements are greater than the root
measurements on average by more than 0.55.

The differences are given by: 0.76, 0.30, 0.74, 0.30, 0.56, 0.82.
Using a calculator we find that d̄ = 0.58 and sd ≈ 0.2336.

P(D̄ > 0.55) = P( (D̄ − µD)/(SD/√n) > (0.55 − µD)/(SD/√n) )
            = P( T5 > (0.55 − 0.5)/(0.2336/√6) )
            = P(T5 > 0.524) > P(T5 > 0.727) = 0.25.

In fact, P(T5 > 0.524) = 0.31.
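The paired calculation above can be reproduced end to end (differences, d̄, sd, and the t-tail) with a few lines of Python; scipy here is a sanity-check tool, not something the slides rely on:

```python
# Check of the bean-plant example: paired differences D_i = shoot_i - root_i,
# then P(D-bar > 0.55) under mu_D = 0.5 using a t-distribution with n-1 df.
import math
from scipy.stats import t

shoot = [4.42, 5.81, 4.65, 4.77, 5.25, 4.75]
root = [3.66, 5.51, 3.91, 4.47, 4.69, 3.93]
d = [s - r for s, r in zip(shoot, root)]   # 0.76, 0.30, 0.74, 0.30, 0.56, 0.82

n = len(d)
dbar = sum(d) / n
sd = math.sqrt(sum((x - dbar) ** 2 for x in d) / (n - 1))

t_stat = (0.55 - 0.5) / (sd / math.sqrt(n))
prob = t.sf(t_stat, n - 1)
print(round(dbar, 2), round(sd, 4), round(t_stat, 3))
```

Note that using the unrounded sd (≈ 0.2336) is what gives the slide's t-value of 0.524; rounding sd to 0.23 first would shift it slightly.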
Difference of Two Proportions

Previously we applied the Central Limit Theorem to find the limiting
distribution of P̂. Using the properties of a normal distribution, we can
also find the distribution of the difference P̂1 − P̂2, i.e. the difference in
the estimates of two proportions from independent samples.

Theorem
Suppose X1 ∼ Bin(n1, p1) and X2 ∼ Bin(n2, p2). If the assumptions for
applying the CLT to X1 and X2 both hold (see slide 133), then

    [P̂1 − P̂2 − (p1 − p2)] / √( p1(1−p1)/n1 + p2(1−p2)/n2 ) ∼ N(0, 1).    (40)
Example

The specification for the pull strength of a wire that connects an
integrated circuit to its frame is 10 g or more. In a sample of 85 units
made with gold wire, 68 met the specification, and in a sample of 120
units made with aluminum wire, 105 met the specification. Scientists at
the facility believe the true difference of proportions is about 0.5. Find the
probability that the difference in the proportions is less than 0.4 in
absolute value.

Note that p̂1 = 68/85 = 0.8 and p̂2 = 105/120 = 0.875.

P(|P̂1 − P̂2| < 0.4)
  = P(−0.4 < P̂1 − P̂2 < 0.4)
  = P( (−0.4 − 0.5)/SE < (P̂1 − P̂2 − 0.5)/SE < (0.4 − 0.5)/SE ),
    where SE = √( p̂1(1−p̂1)/n1 + p̂2(1−p̂2)/n2 )
  = P(−17.03 < Z < −1.89)
  ≈ 0.0292.
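The two z-limits and the final probability can be verified with a short Python computation (scipy usage is an added check, not part of the slides):

```python
# Check of the pull-strength example: P(|P-hat1 - P-hat2| < 0.4) when the
# believed true difference is 0.5, via the normal approximation with
# estimated standard error.
import math
from scipy.stats import norm

p1, n1 = 68 / 85, 85       # gold wire
p2, n2 = 105 / 120, 120    # aluminum wire

se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_lo = (-0.4 - 0.5) / se   # about -17.03
z_hi = (0.4 - 0.5) / se    # about -1.89
prob = norm.cdf(z_hi) - norm.cdf(z_lo)
print(round(z_lo, 2), round(z_hi, 2), round(prob, 4))
```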
Comparing Population Variances

So far we've studied distributions of the statistics that estimate the means
and proportions of 1 or more populations, and the statistic that estimates
the variance of one population. All that's left for chapter 8 is to
compare two variances.

Recall previous sections where we discussed the difference in population
parameters: we estimated their difference, i.e. we estimated θ1 − θ2 with
Θ̂1 − Θ̂2. We can't take the same approach when comparing 2 variances,
though.

When we talked about probabilities in terms of S² we used U = (n−1)S²/σ²,
which has a χ² distribution: a distribution defined on only the positive
half of the real number line. Any time we subtract two random variables,
we risk getting a negative value with positive probability. We'll have to
change our approach in order to compare two variances.
A slightly different approach

Suppose I have two unknown numbers, say θ1 and θ2. I could test whether

    θ1 > θ2,

but that's the same thing as checking whether

    θ1 − θ2 > 0.

We did things like this for comparing means or proportions, since it's easy
to find the distribution of the difference of normal random variables (they
are almost always normal). For comparing variances it doesn't work, since a
difference of chi-square random variables doesn't give us anything useful.
Equivalently (for positive θ2), we can use

    θ1 > θ2  ⇔  θ1/θ2 > 1.
The point of all this:

It turns out that we can throw in some constants and get the distribution
of S1²/S2². We'll just use that to compare two population variances.

Theorem
Suppose two independent random samples from normal distributions with
respective sizes n1 and n2 yield sample variances S1² and S2². Let
Ui = (ni − 1)Si²/σi², i = 1, 2, where σ1² and σ2² are the variances of
population 1 and 2 respectively. Then

    F = [U1/(n1 − 1)] / [U2/(n2 − 1)]        (41)

has a known sampling distribution, called an F-distribution with
ν1 = n1 − 1 and ν2 = n2 − 1 degrees of freedom.

Note: the book goes ahead and cancels out the degrees of freedom from
the Ui and just gives the statistic as F = (S1²/σ1²) / (S2²/σ2²). It's the
same thing.
More about the F-distribution

So why use the F-distribution? Why write it in terms of the Ui and not
the way the book does it?

Notice that each Ui has a chi-square distribution, so F is just a ratio of
two nonnegative quantities... meaning F is nonnegative as well. Now we
don't have to worry about dealing with negative numbers when talking
about variances.

In the statement of our theorem, F is written as a ratio of two chi-square
random variables, each divided by its degrees of freedom. This is how the
F-distribution is constructed in all cases, not just in the specific case of
comparing population variances. "My" definition is just more in line with
the statistics literature. Also... we'll be using it in future chapters.
More about the F-distribution

What does the pdf of the F-distribution look like? Let d1 and d2
represent the numerator and denominator degrees of freedom respectively.

    f(x; d1, d2) = (1 / (x B(d1/2, d2/2))) √( (d1 x)^d1 d2^d2 / (d1 x + d2)^(d1+d2) )
                 = (1 / B(d1/2, d2/2)) (d1/d2)^(d1/2) x^(d1/2 − 1) (1 + (d1/d2) x)^(−(d1+d2)/2)

where B(x, y) = Γ(x)Γ(y) / Γ(x + y).

This is yet another pdf that we don't want to integrate directly. We'll be
using yet another table (Tables 6 and 7 in your appendix) for this
calculation. We'll see how to use the table in an example.
Using the F-distribution

Example
Pull-strength tests on 10 soldered leads for a semiconductor device yield
the following results in pounds of force required to rupture the bond:

    19.8  12.7  13.2  16.9  10.6
    18.8  11.1  14.3  17.0  12.5

Another set of 8 leads was tested after encapsulation to determine whether
the pull strength has been increased by encapsulation of the device, with
the following results:

    24.9  22.8  23.6  22.1  20.4  21.6  21.8  22.5

Comment on the evidence available concerning equality of the two
population variances.

With a calculator we can see: ν1 = 10 − 1 = 9, ν2 = 8 − 1 = 7 and
s1² = 10.441, s2² = 1.846.
Example Continued

With a calculator we can see: ν1 = 10 − 1 = 9, ν2 = 8 − 1 = 7 and
s1² = 10.441, s2² = 1.846.

P(S1²/S2² > 1) = P( (S1²/σ1²)/(S2²/σ2²) > (1/σ1²)/(1/σ2²) ) = P(F9,7 > 1)

since we test under the initial assumption that the population variances
are equal,

    ≈ 0.51.

It's difficult to get an accurate idea of the probability without using a
computer, since now we have 3 dimensions (numerator df, denominator df
and probability), whereas we had 2 dimensions for the t-table (df and
probability) and only one dimension for the z-table (probability).

Another example of using the F-table

Of course we can go forwards and backwards using the F-table.
Example
Find the value F0 such that P(F9,7 > F0) = 0.05.

We see from the α = 0.05 F-table that F0 = 3.6767.
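Both F-distribution lookups on these slides, the tail probability at 1 and the inverse table lookup, are exactly the situations where a computer beats the 3-dimensional F-table. A scipy check (added for verification; not part of the course):

```python
# Two F(9, 7) lookups: a tail probability and an inverse (quantile) lookup.
from scipy.stats import f

p = f.sf(1.0, 9, 7)        # P(F(9,7) > 1), roughly 0.51 per the slide
f0 = f.ppf(0.95, 9, 7)     # 95th percentile = the upper 5% point F0
print(round(f0, 4))
```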
Chapter 9: Some review

To be honest, we already covered the material from 9.1, but it doesn't
hurt to do a little review. Table 9.1 on page 428 is a good resource:

    Parameter                                   Estimator
    µ = population mean                         µ̂ = X̄ = sample mean
    σ² = population variance                    σ̂² = S² = sample variance
    p = population proportion                   p̂ = X/n = sample proportion
    µ1 − µ2 = diff in population means          X̄1 − X̄2 = diff in sample means
    p1 − p2 = diff in population proportions    p̂1 − p̂2 = X1/n1 − X2/n2
                                                = diff in sample proportions
    σ1²/σ2² = ratio of two population variances S1²/S2² = ratio of sample variances
Topics from 9.1 we've discussed already:

Estimators (biased and unbiased).

Mean Squared Error (MSE):

    MSE(Θ̂) = [E(Θ̂) − θ]² + Var(Θ̂)

Concepts of "better" estimators:

Theorem
If Θ̂1 and Θ̂2 are two estimators of θ, then the estimator Θ̂1 is considered
a better estimator than Θ̂2 if

    MSE(Θ̂1) ≤ MSE(Θ̂2).

Just read 9.1 to remind yourself of these concepts, or just reread those
slides.

Limitations of Point Estimators
Consider the sample mean, X̄. It only takes one outlier to throw off your
estimate completely.

It is often convenient to estimate parameters with an interval estimate,
where the interval contains all the reasonable values that a parameter
could take on. We call this a Confidence Interval.

We derive the confidence interval using the sampling distribution of the
point estimate (e.g. Z and T for X̄, etc.)

Confidence Interval Defined

Generally speaking:

Definition
Suppose Θ̂ is an estimator of θ with a known sampling distribution, and
we can find two quantities that depend on Θ̂, say g1(Θ̂) and g2(Θ̂),
such that

    P( g1(Θ̂) ≤ θ ≤ g2(Θ̂) ) = 1 − α

where α ∈ (0, 1). Then we can say that (g1(Θ̂), g2(Θ̂)) forms an
interval that has probability (1 − α) of capturing the true θ.
Example

Let Z ∼ N(0, 1) and α ∈ (0, 1).
Let zα/2 be the value such that P(Z ≥ zα/2) = α/2. Since the pdf of Z is
symmetric about 0, we can also say P(Z ≤ −zα/2) = α/2.

Then

    1 − α = P(−zα/2 ≤ Z ≤ zα/2)
          = P( −zα/2 ≤ (X̄ − µ)/(σ/√n) ≤ zα/2 ).

Now if we just take the whole argument within P(·) and solve for µ, we
get an interval for µ. Namely,

    x̄ − zα/2 σ/√n ≤ µ ≤ x̄ + zα/2 σ/√n.

The above is the confidence interval for µ when we have a large sample
size.

Caution:

We need to be very careful about how we discuss confidence intervals.
For example, we cannot say "The probability that µ falls into its interval is
95%." Why is that? Because µ is a parameter (a fixed unknown real number)
and not a random variable.

Well, why not just get a 100% confidence interval? The only way to do
this would be to say that our confidence interval is (−∞, ∞), and that's
just useless. Traditionally we find 90%, 95%, 99% intervals
(α = 0.10, 0.05, 0.01 respectively).

So what's really going on here?
[Figure: 95% confidence bands for the mean from 100 simulated samples of
size 10 from N(0,1), with each interval marked by whether or not it
includes 0.]
Example
Suppose a random sample of 114 students was chosen, and each student
was asked how many hours he or she studies each week. The resulting
95% conﬁdence interval for µ was (8.9, 11.8). Determine if each one of
the following statements is true or false:
95% of all students study between 8.9 and 11.8 hours per week:
FALSE
95% of all sample means will be between 8.9 and 11.8: FALSE
95% of samples will have averages between 8.9 and 11.8: FALSE
For 95% of all samples, µ will be between 8.9 and 11.8: FALSE
For 95% of all samples, µ will be included in the resulting 95%
conﬁdence interval: TRUE
The formula produces intervals that capture the sample mean for 95%
of all samples: FALSE
The formula produces intervals that capture the population mean for
95% of all samples: TRUE
Rob Gordon (University of Florida) STA 3032 (7661) Fall 2011 178 / 251 Conﬁdence Intervals
Each confidence interval can be written in the following form:

    (Point Estimate − Margin of Error, Point Estimate + Margin of Error)

Theorem
A large random sample confidence interval for µ with confidence
coefficient approximately (1 − α) is given by

    ( X̄ − zα/2 σ/√n,  X̄ + zα/2 σ/√n ).        (42)

If σ is unknown, replace it with s, the sample standard deviation, with no
serious loss of accuracy.

Definition
The (1 − α)100% margin of error to estimate µ from a large sample is

    B = zα/2 σ/√n.

Confidence Intervals
Sometimes confidence intervals can be very wide, and as such do not
always give very valuable information. Consider what happens if we let α
vary. How does increasing/decreasing α affect the width of the confidence
interval?

Suppose we have a fixed level of α and a fixed margin of error in mind.
We can guarantee that size of the margin of error if we take a large
enough sample.

Theorem
The sample size for establishing a confidence interval of the form X̄ ± B
with confidence coefficient (1 − α) is given by

    n ≥ ( zα/2 σ / B )².        (43)
Example
How many samples will it take so that a 95% confidence interval specifies
the mean to within ±25? Suppose σ = 221.

    n ≥ ( (1.96)(221) / 25 )² = 300.2041, so take n = 301.

We always round up for our final answer since we need whole numbers for
sample sizes. We don't round down because that would leave us with less
than 95% confidence.
One more thing...

One last thing before we write out all the confidence intervals we'll be
using: sometimes we are interested in only upper or lower bounds. For
example:

    1 − α = P(Z ≤ zα) = P( (X̄ − µ)/(σ/√n) ≤ zα )
          ⇒ µ ≥ X̄ − zα σ/√n.

Theorem
A one-sided (upper) confidence interval for µ is given by

    ( X̄ − zα σ/√n,  ∞ ).

Similarly, a one-sided (lower) confidence interval is given by

    ( −∞,  X̄ + zα σ/√n ).

What's left?
Essentially we've covered all the theory about confidence intervals in
chapter 8; all the derivations look more or less the same, and most (if
not all) are in the book. Let's just list the confidence intervals and do
some examples.

The following are confidence intervals for single parameters:

    Parameter   Details                       CI
    µ           n > 30                        x̄ ± zα/2 σ/√n
    µ           n < 30 & σ unknown,           x̄ ± tα/2,ν s/√n,  where ν = n − 1
                normal population
    p           CLT                           p̂ ± zα/2 √( p̂(1−p̂)/n )
    σ²          normal population             ( (n−1)s²/χ²α/2,ν ,  (n−1)s²/χ²1−α/2,ν ),
                                              where ν = n − 1

The following are confidence intervals for comparing 2 parameters:
    Parameter   Details               CI
    µ1 − µ2     large ind. samples    X̄1 − X̄2 ± zα/2 √( σ1²/n1 + σ2²/n2 )

    µ1 − µ2     small samples,        X̄1 − X̄2 ± tα/2,ν √( s1²/n1 + s2²/n2 ),
                unknown σi,           where ν = (s1²/n1 + s2²/n2)² /
                normal ind. pops        [ (s1²/n1)²/(n1−1) + (s2²/n2)²/(n2−1) ]

    µ1 − µ2     pooled variances      X̄1 − X̄2 ± tα/2,ν sp √( 1/n1 + 1/n2 ),
                                      where ν = n1 + n2 − 2 and
                                      sp² = [ (n1−1)s1² + (n2−1)s2² ] / (n1 + n2 − 2)

    p1 − p2     CLT                   p̂1 − p̂2 ± zα/2 √( p̂1(1−p̂1)/n1 + p̂2(1−p̂2)/n2 )

    σ1²/σ2²     normal ind. pops      ( (s2²/s1²) F1−α/2,ν1,ν2 ,  (s2²/s1²) Fα/2,ν1,ν2 ),
                                      where ν1 = n1 − 1, ν2 = n2 − 1
Example

(9.12) An important property of plastic clays is the percent of shrinkage
on drying. For a certain type of plastic clay, 45 test specimens showed an
average shrinkage percentage of 18.4 and a standard deviation of 1.2.
Estimate the true average percent of shrinkage for specimens of this type
in a 98% confidence interval.

Which CI do we use? The large-sample µ CI.

n = 45, x̄ = 18.4, σ ≈ s = 1.2
98% confidence ⇒ 0.98 = 1 − α ⇒ α = 0.02 ⇒ α/2 = 0.01

    x̄ ± zα/2 σ/√n = 18.4 ± 2.326 (1.2/√45) = (17.984, 18.816)

This stuff is really easy! The hard part is just figuring out which CI to
use; from there you just fill in the blanks.
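Filling in the blanks is also easy to automate. A scipy check of this interval (added for verification; `norm.ppf(0.99)` is the z-value corresponding to α/2 = 0.01):

```python
# Check of Example 9.12: large-sample 98% z-interval for mu.
import math
from scipy.stats import norm

n, xbar, s = 45, 18.4, 1.2
z = norm.ppf(0.99)                 # alpha = 0.02, so z_{alpha/2} = z_{0.01}
half = z * s / math.sqrt(n)
lo, hi = xbar - half, xbar + half
print(round(lo, 3), round(hi, 3))
```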
Example

(9.18) Careful inspection of 70 precast concrete supports to be used in a
construction project revealed 28 with hairline cracks. Estimate the true
proportion of supports of this type with cracks in a 98% confidence
interval.

Which CI do we use? The p CI.
Wait, can we say the CLT holds? np̂ = 28 and n(1 − p̂) = 70 − 28 = 42,
so yes.

p̂ = 28/70, n = 70, α = 0.02

    p̂ ± zα/2 √( p̂(1−p̂)/n ) = 28/70 ± 2.326 √( (28/70)(42/70)/70 )
                            = (0.264, 0.536)
Example

(9.22) The warp-wise breaking strength measured on five specimens of a
certain cloth gave a sample mean of 180 psi and a standard deviation of 5
psi. Estimate the true mean warp-wise breaking strength for cloth of this
type in a 95% confidence interval. What assumption is necessary for your
answer to be valid?

What CI do we use? The small-sample µ CI.
What assumption do we need? The data come from a normal population.

x̄ = 180, s = 5, n = 5, α = 0.05

    x̄ ± tα/2,ν s/√n = 180 ± t0.025,4 (5/√5) = 180 ± (2.776)(2.236)
                    = (173.79, 186.21)
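A quick check of this small-sample interval with scipy's t quantile; note that the margin here is t0.025,4 · s/√n ≈ 2.776 × 2.236 ≈ 6.21, not the t-value alone:

```python
# Check of Example 9.22: 95% small-sample t-interval for mu
# with n = 5, x-bar = 180, s = 5.
import math
from scipy.stats import t

n, xbar, s = 5, 180, 5
half = t.ppf(0.975, n - 1) * s / math.sqrt(n)   # t_{0.025,4} * s / sqrt(n)
lo, hi = xbar - half, xbar + half
print(round(lo, 2), round(hi, 2))
```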
(9.42c) For a certain species of fish, the LC50 measurements (in parts per
million) for DDT in 12 experiments were as follows, according to the EPA:
16, 5, 21, 19, 10, 5, 8, 2, 7, 2, 4, 9
Another common insecticide, Diazinon, gave LC50 measurements of 7.8,
1.6, and 1.3 in three independent experiments. Estimate the true variance
ratio in a 90% confidence interval.
What CI do we use? The σ1²/σ2² CI.
s1² = 41.27273, s2² = 13.46333, n1 = 12, n2 = 3, α = 0.10

( (s2²/s1²) F_{1−α/2,ν1,ν2} , (s2²/s1²) F_{α/2,ν1,ν2} )
= ( (13.46333/41.27273)(0.2511113) , (13.46333/41.27273)(19.40496) )
= (0.08191351, 6.329975)
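As a check of the arithmetic, here is a Python sketch (again, the course uses R and MINITAB; the two F quantiles with (11, 2) degrees of freedom are hard-coded from tables, since they are not available in the standard library):

```python
import statistics

# Sample variances for problem 9.42c.
ddt = [16, 5, 21, 19, 10, 5, 8, 2, 7, 2, 4, 9]  # n1 = 12
diazinon = [7.8, 1.6, 1.3]                      # n2 = 3

s1_sq = statistics.variance(ddt)       # about 41.27273
s2_sq = statistics.variance(diazinon)  # about 13.46333

# F_{0.95,11,2} and F_{0.05,11,2}, taken from F tables.
f_lower, f_upper = 0.2511113, 19.40496

ratio = s2_sq / s1_sq
ci = (ratio * f_lower, ratio * f_upper)
print(ci)  # roughly (0.0819, 6.3300)
```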
End of Chapter 9

For now we're done with chapter 9. We might go back to 9.4 later on if
time allows.
Homework: 9.2 odds, 9.3 odds
The summary in section 9.7 is very good, but covers some situations not
covered in slides (linear functions of means). You are only responsible for
the situations covered in the slides.

Chapter 10: Hypothesis Testing
The ﬁrst page of the text summarizes the ideas behind Hypothesis Testing.
For homework, please read it.
Generally speaking, hypothesis testing is the practice of checking claims:
after looking at a sample, a claim about a population (usually about a
population parameter such as µ, σ, or p) is either rejected or not
rejected.
Usually, instead of the word "claim," we'll use the word hypothesis.

Definition
A hypothesis is a statement about the population parameter or process
characteristic.

Types of hypotheses
There are two types: the null and the alternative.

Definition
A null hypothesis is a statement that speciﬁes a particular value (or
values) for the parameter being studied. It is denoted by H0 .
The null hypothesis represents ideas that are currently accepted as the
norm.

Definition
An alternative hypothesis is a statement of the change from the null
hypothesis that the investigation is designed to check. It is denoted by Ha
(also sometimes H1 .)
Think of Ha as a new idea that contradicts the null hypothesis.

The process

Every hypothesis test starts by clearly stating H0 and H1 .

Example
The following are generic examples of how to state hypotheses:

H0 : µ = 5       H0 : µ ≤ 5       H0 : p ≥ 0.25
H1 : µ ≠ 5       H1 : µ > 5       H1 : p < 0.25

Note that the hypotheses never "overlap." After a hypothesis test is over,
we should come to some concrete decision; it doesn't make sense to choose
one hypothesis over the other if they both share something in common.

Steps of the Hypothesis Test
1. State H0 , H1 .
2. Assume H0 is true.
3. Compute a relevant test statistic.

Definition
The test statistic (TS) is the sample quantity on which the decision to
reject or not reject H0 is based.
We've seen some test statistics already (e.g., estimate µ with X̄ ,
estimate p with X /n where X ∼ Bin(n, p ), etc.).
4. Come to some conclusion based upon the value of the test statistic.

Definition
The rejection region (or critical region) is the set of values of the test
statistic that leads to the rejection of null hypothesis in favor of an
alternative hypothesis.
More about conclusions...
The rejection region is just an extension of our idea regarding conﬁdence
intervals: if the conﬁdence interval gives us the place where a parameter is
likely to “live,” then the rejection region is everywhere else on the real line.
In the real world though, we might only care about making a decision one
way or another rather than writing out the entire interval of where a
parameter could be. Instead of reporting conﬁdence intervals/rejection
regions, we instead talk about something called a p-value.
Before we formally define the p-value, we need to discuss the basis by
which we come to conclusions: everything we do is based on minimizing
the probability that we make an error. We ﬁrst formally deﬁne the types of
errors we can make and their associated probabilities.

Types of Errors
Think about the ways we can be right or wrong about anything:

                    H0 true        H0 false
Do not reject H0    correct        type II error
Reject H0           type I error   correct

Denote
α = P (type I error) = P (reject H0 | H0 true)
β = P (type II error) = P (don't reject H0 | H0 false)
Remember that our goal is to make decisions in a way that minimizes the
probabilities of being wrong.

Minimizing Errors
Often we can't minimize both α and β at the same time (more on this
later), so we minimize the more serious error. Consider the following
example:

                    H0 : Parachute Broken   Ha : Parachute Works
Do not reject H0    correct                 type II error
Reject H0           type I error            correct

One of these mistakes has a worse consequence! The more serious error
probability is given to α. When we start the experiment, we fix α to be a
small value, and let β be whatever it ends up being.
Our decision to reject H0 depends on whether or not our p-value is less
than α.

Definition
The p-value is the probability of observing a test statistic value at least as
extreme as the one computed from the sample data if H0 is true.
Example
Suppose H0 : µ ≥ 5, Ha : µ < 5, x̄ = 3, s = 1, n = 30.

p-value = P ( X̄ < 3 | H0 true ) = P ( X̄ < 3 | µ = 5 )
        = P ( (X̄ − µ)/(σ/√n) < (3 − µ)/(σ/√n) | µ = 5 )
        = P ( Z < (3 − 5)/(1/√30) )
        = P (Z < −10.95) ≈ 3.163034e−28
Note: The inequality in the p-value calculation is the same one given in Ha .
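A Python sketch of the same computation (not a course tool, just an arithmetic check; `erfc` is used for the normal tail because the ordinary CDF underflows to 0 this far out):

```python
import math

# p-value for the one-sided test H0: mu >= 5 vs Ha: mu < 5,
# with xbar = 3, s = 1, n = 30 (s used in place of sigma).
xbar, mu0, s, n = 3.0, 5.0, 1.0, 30

z = (xbar - mu0) / (s / math.sqrt(n))   # about -10.95
# Phi(z) computed via erfc, which stays accurate far into the tail.
p_value = 0.5 * math.erfc(-z / math.sqrt(2))
print(z, p_value)  # about -10.95 and about 3.16e-28
```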
How do we interpret the p-value?
The probability of getting x̄ = 3 or less when µ = 5 is very small. This
leads us to believe that the true value of µ may actually be a number less
than 5.
Therefore, we reject H0 : µ ≥ 5 in favor of the alternative hypothesis.

Hypothesis Tests

Remember one very important fact when conducting a hypothesis test
(quoted from the textbook):
Note that not rejecting the hypothesis that µ = 2 is not the same as
accepting the hypothesis that µ = 2. When we do not reject the
hypothesis µ = 2, we are saying that 2 is a plausible value of µ, but there
are other equally plausible values for µ. We cannot conclude that µ is
equal to 2 and 2 alone.
So what's the cutoff? How small does the p-value have to be so that we
are comfortable with rejecting H0 ?

Hypothesis Tests
Procedure:
Fix α to be small (usually 0.05, sometimes 0.01 or 0.10).
If p-value < α, reject H0 .
A quick word on notation:
H0 : µ = µ0 (vs Ha : µ ≠ µ0 ) is called a two-sided test;
H0 : µ ≤ µ0 (vs Ha : µ > µ0 ) is called a one-sided test.

Earlier we saw an example of a one-sided test. Let's see an example of a
two-sided test.

Example
10.26) Yield stress measurements on 51 steel rods with 10 mm diameters
gave a mean of 485 N/mm² and a standard deviation of 17.2. Suppose
the manufacturer claims that the mean yield stress for these bars is 490.
Does the sample information suggest rejecting the manufacturer's claim,
at the 5% significance level?
Step 0: Write down what we know: n = 51, x̄ = 485, s = 17.2, α = 0.05
Step 1: State the null and alternative hypotheses:
H0 : µ = 490
Ha : µ ≠ 490
Step 2: Assume H0 is true.
Step 3: Find the value of the test statistic:
z = (x̄ − µ)/(σ/√n) = (485 − 490)/(17.2/√51) = −2.075997
Step 4: Find the p-value:
p-value = 2P (Z < −2.075997) = 2(0.01894713) = 0.0379
Step 5: Come to a conclusion:
Since p-value < α = 0.05, we reject H0 .
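The whole two-sided test fits in a few lines of Python (just a check of the arithmetic; the course's tools remain R and MINITAB):

```python
import math

# Two-sided large-sample z test for problem 10.26.
n, xbar, s, mu0, alpha = 51, 485.0, 17.2, 490.0, 0.05

z = (xbar - mu0) / (s / math.sqrt(n))        # about -2.076
p_value = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * P(Z < -|z|)
reject = p_value < alpha
print(z, p_value, reject)  # about -2.076, about 0.0379, True
```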
Some more terms...
Before we see a few more examples, let's define a few more terms.

Definition
The power of a statistical test is the probability of rejecting the null
hypothesis when an alternative hypothesis is true.
Power = 1 − β = 1 − P (type II error)     (44)

The power tells us how good our test is. If we want to compare our test to
some other test with the same α level, then the power lets us know which
test is better.
Earlier we mentioned that we usually can't decrease α and β at the same
time: with a fixed sample, decreasing one increases the other. We can
decrease both by increasing the sample size, though in many realistic
situations (high cost of generating new samples, time to complete trials,
etc.) this isn't possible.
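To see this trade-off concretely, here is a hedged Python sketch with made-up numbers (none of these values come from the slides): a one-sided z test of H0: µ ≥ 5 vs Ha: µ < 5 with known σ, where α is held fixed and the power is computed at a particular alternative.

```python
import math
from statistics import NormalDist

# Hypothetical illustration: alpha = 0.05, sigma = 1, true mean mu_a = 4.5.
alpha, mu0, mu_a, sigma = 0.05, 5.0, 4.5, 1.0
z_alpha = NormalDist().inv_cdf(alpha)  # about -1.645

def power(n):
    # Reject H0 when Xbar falls below this cutoff (chosen so the
    # type I error probability is exactly alpha)...
    cutoff = mu0 + z_alpha * sigma / math.sqrt(n)
    # ...and the power is the chance of that event when mu really is mu_a.
    return NormalDist().cdf((cutoff - mu_a) / (sigma / math.sqrt(n)))

# A larger sample buys more power (smaller beta) at the same alpha.
print(power(10), power(40))
```

With these made-up numbers, quadrupling the sample size raises the power substantially while α stays at 0.05.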
End of Chapter 10 (for now)

Make sure you are comfortable with the examples we did from class.
Your homework for chapter ten is the odd-numbered problems from 10.1,
10.2, and 10.3.
Everything after this slide will not be on exam 2. We may come back to
other sections of chapter 10 later on in the course if we have time.

Chapter 2

It is often the case that a scientist is interested in the relationship between
two variables. Examples of questions include
Is exposure to the sun related to skin cancer?
Is a person’s height related to his/her weight?
Is the number of beers I drink the day before a test related to the
score I receive?
We can apply some of the principles of probability and statistics to answer
questions like these.
For homework read sections 2.1 and 2.2.

Scatterplots
Before digging deeply into the statistics of determining relationships
between variables, let's first look at a quick way to see how strongly
variables are related.
“The simplest graphical tool used for detecting association between two
variables is the scatterplot, which simply plots the ordered pairs of data
points on a rectangular coordinate system.”
Here are some more facts from section 2.3 about scatterplots:
If the plot has a roughly elliptical cloud shape, then it is reasonable to
say a linear relationship exists.
If the ellipse tilts up and to the right, the association is positive. If
the tilt is down and to the right, the association is negative.
If the ellipse is thin and long the relationship is strong. Fat and round
imply a weak relationship.
Examples

[Figure: four example scatterplots, plotting y1 vs x1, y2 vs x2, y3 vs x3, and y4 vs x4.]

Top Left: No clear linear relationship.
Top Right: Positive linear relationship.
Bottom Left: Strong negative linear relationship.
Bottom Right: Negative linear relationship.
Rob Gordon (University of Florida) STA 3032 (7661) Fall 2011 205 / 251 Terms explained Positive and negative relationships determine the direction of the
relationship and not the strength of the relationship.
A positive relationship means that as one variable increases (decreases),
the other variable increases (decreases).
A negative relationship means that as one variable increases (decreases),
the other variable decreases (increases).
For homework read the rest of section 2.3.

Measuring Linear Relationships
Once we decide that it is reasonable to assume that a relationship is linear,
it is usually a good idea to measure that relationship somehow.
We talked briefly about the relationship between random variables in a
previous slide, and on slide 110 we defined the notion of the correlation
between two random variables:
ρ = Cov (X , Y ) / √( Var (X ) Var (Y ) ) = σXY /(σX σY ) = E [(X − EX )(Y − EY )] /(σX σY )

Then a reasonable estimate of ρ is given by
ρ̂ ≡ r = (1/(n − 1)) Σ_{i=1}^{n} [(xi − x̄)/sx ][(yi − ȳ)/sy ]     (45)

More about r
r is called Pearson’s correlation coeﬃcient. It has the following properties:
−1 ≤ r ≤ 1
A value of r near 0 implies little to no linear relationship between y
and x .
In contrast, the closer r is to 1 or −1, the stronger the linear
relationship between y and x . If r = ±1, all points fall exactly on the
line.
A positive value of r implies that y increases as x increases.
A negative value of r implies that y decreases as x increases.
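Formula (45) is easy to compute directly. As a check, here is a Python sketch applied to the oak-tree data from the example 2.32 R session later in the slides, where R's `cor` reports 0.8891367 (Python is used here only because it is handy for a self-contained check):

```python
import math

# Oak-tree data from example 2.32 (age in years, diameter in inches).
age = [4, 5, 8, 8, 8, 10, 10, 12, 13, 14, 16, 18, 20,
       22, 23, 25, 28, 29, 30, 30, 33, 34, 35, 38, 38, 40, 42]
diam = [0.8, 0.8, 1, 2, 3, 2, 3.5, 4.9, 3.5, 2.5, 2.5,
        4.6, 5.5, 5.8, 4.7, 6.5, 6, 4.5, 6, 7, 8, 6.5, 7, 5, 7,
        7.5, 7.5]

def pearson_r(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
    sy = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))
    # r = (1/(n-1)) * sum of standardized cross-products, formula (45)
    return sum((xi - xbar) * (yi - ybar) / (sx * sy)
               for xi, yi in zip(x, y)) / (n - 1)

print(pearson_r(diam, age))  # matches R's cor(age, diam): about 0.8891
```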
These situations are illustrated in section 2.4 of the book. Please read that
section and do a few of the odd-numbered questions.

Modeling Linear Relationships
It's one thing to measure a linear relationship using Pearson's correlation
coefficient; it's another thing entirely to accurately model the relationship.
If two variables have a linear relationship then we should be able to model
that relationship with the equation of a line. In general, a linear relation
between two variables is given by the following equation (called a simple
linear regression model):
y = β0 + β1 x     (46)
where
y is the response variable
x is the explanatory variable or predictor variable
β0 is the y-intercept. It is the value of y for x = 0.
β1 is the slope of the line. It gives the amount of change in y for a
unit change in the value of x .
Fitting the model
Fitting this "regression" line to a data set involves estimating the slope
and intercept to produce a line that is denoted by ŷ = β̂0 + β̂1 x .

[Figure: scatterplot of y vs x.]

The question then becomes, "How do we go about constructing a line that
best fits the data?" The real question is "What do we mean by best?"

Fitting the model
There are many conceivable ways to deﬁne the “best” line. The one
presented in this class is not the only one, but it is considered one of the
most basic and straightforward.
The line that we create to model the linear relationship is the one that
minimizes what is known as the sum of squared errors (SSE).

Definition
SSE = Σ_{i=1}^{n} (yi − ŷi )² = Σ_{i=1}^{n} (yi − β̂0 − β̂1 xi )²     (47)

We can think of SSE as a function of two variables: β̂0 and β̂1 . The
minimization problem is easily solved with calculus and linear algebra.

Fitting the model
I'll spare the gory details and just give the formulas for the regression
coefficients:

Definition
The least-squares regression line is ŷ = β̂0 + β̂1 x with
slope β̂1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²     (48)
and y-intercept β̂0 = ȳ − β̂1 x̄ .

Note: r = β̂1 (sx /sy ) (proof on page 85).
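Formula (48) can be sketched directly in Python on the oak-tree data of example 2.32; the R session later in the slides reports the same coefficients (this is only an arithmetic check, not a replacement for the course tools):

```python
# Least-squares slope and intercept (formula (48)) for the oak-tree data.
age = [4, 5, 8, 8, 8, 10, 10, 12, 13, 14, 16, 18, 20,
       22, 23, 25, 28, 29, 30, 30, 33, 34, 35, 38, 38, 40, 42]
diam = [0.8, 0.8, 1, 2, 3, 2, 3.5, 4.9, 3.5, 2.5, 2.5,
        4.6, 5.5, 5.8, 4.7, 6.5, 6, 4.5, 6, 7, 8, 6.5, 7, 5, 7,
        7.5, 7.5]

n = len(diam)
xbar, ybar = sum(diam) / n, sum(age) / n

slope = (sum((x - xbar) * (y - ybar) for x, y in zip(diam, age))
         / sum((x - xbar) ** 2 for x in diam))
intercept = ybar - slope * xbar

print(slope, intercept)  # about 4.7618 and about -0.1883, as in R's lm
```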
By now it should be clear that these formulas can get tedious really quickly.
Rest assured I won't ask you to do anything too taxing on a quiz or test.

Fitting the model
For homework, however, you will definitely have to do some kind of model
fitting. I suggest investing in a nice calculator or acquiring decent
software including one of the following:
Excel/Open Oﬃce (limited in functionality, low learning curve)
MATLAB (very good. I’m not sure if they oﬀer cheap student
licenses, slightly high learning curve.)
MiniTab (very user friendly, free 30day license, very low learning
curve, can only do basic analyses)
R (completely free, open source, available on all operating systems,
can do everything from basic to complicated analyses, slightly high
learning curve)
In class I’ll be presenting solutions to problems in MiniTab (the book
presents results with this) and R. I highly recommend you acquire one or
both.
Fitting the model

Example
(2.32) Chapman and Demeritt reported diameters (in inches) and ages (in
years) of oak trees. The data are shown (in the book).
a. Make a scatterplot. Is there any association between the age of the
oak tree and the diameter? If yes, discuss the nature of the relation.
b. Can the diameter of an oak tree be useful for predicting the age of the
tree? If yes, construct a model for the relationship. If no, discuss why
not.
c. If the diameter of one oak tree is 5.5 inches, what do you predict the
age of this tree to be?

Make a scatterplot. Is there any association between the age of the oak
tree and the diameter?

[Figure: scatterplot of Age (y) versus Diameter (x) for the 27 oak trees.]

Can the diameter of an oak tree be useful for predicting the age of the tree?
If yes, construct a model for the relationship.
Let y represent the tree age and x represent the diameter. We’ll use R to
compute β̂0 and β̂1 .

> age = c(4, 5, 8, 8, 8, 10, 10, 12, 13, 14, 16, 18, 20,
22, 23, 25, 28, 29, 30, 30, 33, 34, 35, 38, 38, 40, 42)
> diam = c(0.8, 0.8, 1, 2, 3, 2, 3.5, 4.9, 3.5, 2.5, 2.5,
4.6, 5.5, 5.8, 4.7, 6.5, 6, 4.5, 6, 7, 8, 6.5, 7, 5, 7,
7.5, 7.5)
> plot(diam, age, pch = 19, xlab = "Diameter", ylab = "Age")
> fit = lm(age ~ diam)
> fit$coeff
(Intercept)        diam
 -0.1882781   4.7618114

This output tells us β̂0 = −0.1882781 and β̂1 = 4.7618114.
If the diameter of one oak tree is 5.5 inches, what do you predict the age
of this tree to be?
The predicted value of y is ŷ . This is easily found using our fitted
regression formula:

ŷ = β̂0 + β̂1 x = −0.1882781 + 4.7618114(5.5) = 26.00168

It's easy to find this in R using the following code:
> fit$coeff[[1]] + fit$coeff[[2]]*5.5
[1] 26.00168

We can easily plot the fitted line on top of the scatterplot using the
following code:

plot(diam, age, pch = 19, xlab = "Diameter", ylab = "Age")
abline(fit$coeff[[1]], fit$coeff[[2]])

[Figure: scatterplot of Age vs Diameter with the fitted line drawn on top.]

"How do we get the correlation in R?"

To get the correlation in R we just type
cor(age, diam, method="pearson")
[1] 0.8891367
Another way to determine how well x predicts Y
How do we know if our line "fits" our data? Do a "goodness-of-fit" test.
Recall that r measures the strength of the linear relationship between X
and Y . r² is also a good statistic for goodness-of-fit.
Let's say that we wish to estimate some variable Y . We can do this by
minimizing

total sum of squares = SStotal = SSyy = Σ_{i=1}^{n} (yi − ȳ )²     (49)

Similarly we could do this by minimizing

SSE = Σ_{i=1}^{n} (yi − ŷi )²     (50)

We can say that our line is a good estimate if SSE is much smaller than
SStotal .
Then SStotal − SSE is a goodness-of-fit statistic.
Problem: This diﬀerence has units of y 2 . It would be nice if we had some
kind of unitless measurement like we have with r . Instead we’ll use:
coefficient of determination = r² = (SStotal − SSE) / SStotal     (51)

Furthermore it can be shown that

Σ (yi − ȳ )² = Σ (yi − ŷi )² + Σ (ŷi − ȳ )²     (52)

where Σ (ŷi − ȳ )² is referred to as the Sum of Squares for Regression
(denoted by SSR ).

To summarize:

Definition
The square of the coeﬃcient of correlation (r ) is called the coeﬃcient of
determination. It represents the proportion of the sum of squares of
deviations of the y values about their mean that can be attributed to a
linear relation between y and x .
r² = (SStotal − SSE) / SStotal = 1 − SSE/SStotal = SSR /SStotal     (53)

Note that while r ∈ [−1, 1], r² ∈ [0, 1].
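Formula (53) can be verified numerically on the oak-tree data of example 2.32: fit the least-squares line, form 1 − SSE/SStotal, and compare against the square of the correlation (a Python sketch for checking only; the course uses R and MINITAB):

```python
# Oak-tree data from example 2.32.
age = [4, 5, 8, 8, 8, 10, 10, 12, 13, 14, 16, 18, 20,
       22, 23, 25, 28, 29, 30, 30, 33, 34, 35, 38, 38, 40, 42]
diam = [0.8, 0.8, 1, 2, 3, 2, 3.5, 4.9, 3.5, 2.5, 2.5,
        4.6, 5.5, 5.8, 4.7, 6.5, 6, 4.5, 6, 7, 8, 6.5, 7, 5, 7,
        7.5, 7.5]

n = len(diam)
xbar, ybar = sum(diam) / n, sum(age) / n
slope = (sum((x - xbar) * (y - ybar) for x, y in zip(diam, age))
         / sum((x - xbar) ** 2 for x in diam))
intercept = ybar - slope * xbar

fitted = [intercept + slope * x for x in diam]
sse = sum((y - f) ** 2 for y, f in zip(age, fitted))
ss_total = sum((y - ybar) ** 2 for y in age)

r_squared = 1 - sse / ss_total
print(r_squared)  # about 0.7906, which is (0.8891...)^2
```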
Last time we saw an example of fitting the line with R. Let's see how it's
done with MINITAB. Pay close attention... this part isn't in the slides!

Example
12 motors are operated under high temperature conditions until engine
failure.
Temp   40  45  50  55  60  65  70  75  80  85  90
Hours 851 635 764 708 469 661 586 371 337 245 129
Make scatterplot of hours (y) vs temp(x) and verify if linear model is
appropriate.
Compute LS line.
Compute ﬁtted values and residuals for each point
If temp increased by 5 degrees, how much would you predict lifetime
to increase or decrease?
Predict lifetime for temp of 73 degrees.
Should we estimate the LS line for temp = 120?
For what temp would you predict a lifetime of 500 hours?

Welcome back

Since the last time we saw slides, we saw lectures related to simple and
multiple linear regression (chapters 2 and 11).
Quiz 9 will be Wednesday and will cover only simple linear regression
(sections 2.3 to 2.7 and 11.1 to 11.3).
It is recommended that you all do the homework problems for those
sections. We'll give a few more examples of multiple linear regression.

Example 1
The article "Application of Analysis of Variance to Wet Clutch
Engagement" (Mansouri, Khonsari, et al., Proceedings of the Institution
of Mechanical Engineers, 2002:117-125) presents the following fitted model
for predicting clutch engagement time in seconds (y ) from engagement
starting speed in m/s (x1 ), maximum drive torque in N · m (x2 ), system
inertia in kg · m² (x3 ), and applied force rate in kN/s (x4 ):

ŷ = −0.83 + 0.017x1 + 0.0895x2 + 42.771x3 + 0.027x4 − 0.0043x2 x4

The sum of squares for regression was SSR = 1.08613 and the sum of
squares for error was SSE = 0.036310. There were 44 degrees of freedom
for error.

Example 1 continued...
a. Predict the clutch engagement time when the starting speed is 20
m/s, the maximum drive torque is 17 N·m, the system inertia is 0.006
kg·m², and the applied force rate is 10 kN/s.
Solution:
ŷ = −0.83 + 0.017x1 + 0.0895x2 + 42.771x3 + 0.027x4 − 0.0043x2 x4
  = −0.83 + 0.017(20) + 0.0895(17) + 42.771(0.006) + 0.027(10) − 0.0043(17)(10)
  = 0.827126

Example 1 continued...
b. Is it possible to predict the change in engagement time associated
with an increase of 2m/s in starting speed? If so, ﬁnd the predicted
change. If not, explain why not.
Solution: Just look at the coeﬃcient next to x1 . The coeﬃcient
represents the change in y per 1-unit change in x1 . Then we can
predict the change in y to be 2×0.017 = 0.034 seconds.
c. Is it possible to predict the change in engagement time associated
with an increase of 2 N·m in maximum drive torque? Why or why
not?
Solution: No. Note that max drive torque (x2 ) is present in the
model as both a main effect and an interaction term. Since we don't
know the value of x4 , we can't predict the change in y .

Example 1 continued...
d. Compute the coefficient of determination R² .
Solution: Recall the last sentence from the problem: The sum of
squares for regression was SSR = 1.08613 and the sum of squares for
error was SSE = 0.036310. There were 44 degrees of freedom for
error. Then
R² = SSR /SST = SSR /(SSR + SSE ) = 1.08613/(1.08613 + 0.036310) = 0.9676508
e. Compute the F statistic for testing the null hypothesis that all the
coefficients are equal to 0. Can this hypothesis be rejected?
Solution:
F = (SSR /p) / (SSE /(n − p − 1)) = (1.08613/5) / (0.036310/44) = 263.2317
The F table gives a critical value of around 3.4, so the p-value is
much smaller than 0.05 and we reject the hypothesis.

Example 2
The following MINITAB output is for a multiple regression. Some of the
numbers got smudged and are illegible. Fill in the missing numbers.
Predictor   Coef     SE Coef   T      P
Constant    (a)      1.4553    5.91   0.000
X1          1.2127   (b)       1.71   0.118
X2          7.8369   3.2109    (c)    0.035
X3          (d)      0.8943    3.56   0.050

S = 0.82936   R-Sq = 78.0%   R-Sq(adj) = 71.4%
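Since each T entry in such a table is just Coef divided by SE Coef, the smudged entries (a) through (d) follow by simple division and multiplication. A Python sketch (magnitudes only, since signs can't be recovered from this output alone):

```python
# Recovering the smudged coefficient-table entries using T = Coef / SE Coef.
a = 5.91 * 1.4553        # Constant's Coef = T * SE Coef
b = 1.2127 / 1.71        # X1's SE Coef = Coef / T
c = 7.8369 / 3.2109      # X2's T = Coef / SE Coef
d = 3.56 * 0.8943        # X3's Coef = T * SE Coef

print(round(a, 3), round(b, 4), round(c, 3), round(d, 3))
# prints: 8.601 0.7092 2.441 3.184
```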
Source           DF    SS       MS       F        P
Regression       (e)   (f)      8.1292   11.818   0.01
Residual Error   10    6.8784   (g)
Total            13    (h)

Example 3
A research article describes an experiment involving a chemical process
designed to separate enantiomers. A model was ﬁt to estimate the cycle
time (y ) in terms of the ﬂow rate (x1 ), sample concentration (x2 ), and
mobile-phase composition (x3 ). The results of a least-squares fit are
presented in the following table.
Predictor   Coefficient   T        P
Constant    1.603
x1          0.619         22.289   0.000
x2          0.086         3.084    0.018
x3          0.306         11.011   0.000
x1²         0.272         8.542    0.000
x2²         0.057         1.802    0.115
x3²         0.105         3.300    0.013
x1 x2       0.022         0.630    0.549
x1 x3       0.036         1.004    0.349
x2 x3       0.036         1.018    0.343

Example 3

Of the following, which is the best next step in the analysis?
i. Nothing needs to be done. This model is fine.
ii. Drop x1², x2², and x3² from the model, and then perform an F test.
iii. Drop x1 x2 , x1 x3 , and x2 x3 from the model, and then perform an F test.
iv. Drop x1 and x1² from the model, and then perform an F test.
v. Add cubic terms x1³, x2³, and x3³ to the model to try to improve the fit.

Example 4
The following MINITAB output is for a best subsets regression involving
five independent variables X1 , . . . , X5 .

Vars   R-Sq   R-Sq(adj)   S
1      77.3   77.1        1.40510
1      10.2    9.3        2.79400
2      89.3   89.0        0.97126
2      77.8   77.3        1.39660
3      90.5   90.2        0.91630
3      89.4   89.1        0.96763
4      90.7   90.3        0.91446
4      90.6   90.2        0.91942
5      90.7   90.2        0.91895

[The table's X marks, showing which of X1 , . . . , X5 appear in each model, are garbled in the original.]

Example 4

a. Which variables are in the model selected by the adjusted R²
criterion?
b. Are there any other good models?

Example 5
Suppose we try to fit the model equation

y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + β5 x3 x4 + β6 x6 + β7 x7 + ε

and we get the following fit from some statistical software:

ŷ = −0.257 + 0.778x1 − 0.105x2 + 1.213x3 − 0.00624x4 + 0.00386x3 x4 − 0.00740x6 − 0.00148x7 .

Furthermore, we get the following output:

Example 5 continued...
Predictor    Coef         SE Coef     T       P
Constant     -0.2565      0.7602      -0.34   0.736
x1            0.77818     0.05270     14.77   0.000
x2           -0.10479     0.03647     -2.87   0.005
x3            1.2128      0.4270       2.84   0.005
x4           -0.0062446   0.01351     -0.46   0.645
x3*x4         0.0038642   0.008414     0.46   0.647
x6           -0.007404    0.009313    -0.79   0.428
x7           -0.0014773   0.0005170   -2.86   0.005

S = 0.22039    R-Sq = 93.5%    R-Sq(adj) = 93.2%

(continued on next slide)

Example 5 continued...

Analysis of Variance
Source           DF    SS        MS         F        P
Regression         7   111.35    15.907     323.06   0.000
Residual Error   157     7.7302   0.049237
Total            164   119.08

Notice that x4, x3*x4 and x6 have large p-values. After dropping those
terms from the model, the fit for the reduced model is

ŷ = −0.219 + 0.779 x1 − 0.108 x2 + 1.354 x3 − 0.00134 x7

and the MINITAB output is given as

Example 5 continued...
Predictor    Coef         SE Coef     T       P
Constant     -0.21947     0.4503      -0.49   0.627
x1            0.779       0.04909     15.87   0.000
x2           -0.10827     0.0352      -3.08   0.002
x3            1.3536      0.2880       4.70   0.000
x7           -0.0013431   0.0004722   -2.84   0.005

S = 0.22039    R-Sq = 93.5%    R-Sq(adj) = 93.3%

Analysis of Variance

Source           DF    SS        MS         F       P
Regression         4   111.31    27.827     572.9   0.000
Residual Error   160     7.7716   0.048573
Total            164   119.08

Example 5 continued...

a. Compute the F statistic for testing the plausibility of the reduced model.
b. How many degrees of freedom does the F statistic have?
c. Find the P-value for the F statistic. Is the reduced model plausible?
d. Someone claims that since each of the variables being dropped had large
   P-values, the reduced model must be plausible, and it was not necessary to
   perform an F test. Is this correct? Explain why or why not.
e. The total sum of squares is the same in both models, even though the
   independent variables are different. Is there a mistake? Explain.

Model Selection
One systematic way to perform model selection is Best Subsets Regression.

There are a few ways we can go about this, but the general strategy is the
following:

  Create the full model.
  Generate all possible reduced models.
  Choose a model based on goodness-of-fit statistics:
    R² (always tells us to choose the full model)
    Adjusted R² (penalizes for too many variables)
    others (AIC, BIC, Mallows' Cp, etc.)

Specific Types of Variable Selection
Forward selection
  Start with the intercept-only model.
  Add a single predictor variable. If its t-test has a p-value < α, keep it
  and continue adding variables.

Backward selection
  Start with the full model.
  If one variable has a p-value > α, delete it and fit the reduced model.
  Continue until all remaining variables are significant.

Neither method is perfect. Some potential problems:
  Sometimes the order in which variables are added in a forward selection
  process can affect the model obtained.
  Sometimes adding a variable causes p-values of previously added variables
  to become insignificant.

A compromise: Stepwise Regression
(1) Choose "threshold p-values" αin and αout, usually with αin ≤ αout.
(2) Do a forward selection: choose the predictor variable with the smallest
    p-value, and add it to the model provided its p-value < αin.
(3) Do another forward selection: if a second variable has p-value < αin,
    add it to the model; otherwise stop.
(4) Backward selection: sometimes adding the 2nd variable increases the
    p-value of the 1st variable. If the p-value of the 1st variable > αout,
    remove it from the model.
(5) Return to (3). Continue until no variable outside the model has
    p-value < αin and no variable in the model has p-value > αout.

Warning
Model selection procedures sometimes produce models that don't make sense.

Example: The annual birthrate in Great Britain was almost perfectly
correlated with the annual production of pig iron* in the United States
from 1875 to 1920.

*Pig iron is a byproduct of smelting iron ore with a type of coal.

Don't blatantly throw predictors into a model when they make no sense.
If you aren't sure whether a relationship makes sense, redo the experiment
to verify the results.

Example
(Navidi, 2nd ed., pg. 616) In mobile ad hoc computer networks, messages
must be forwarded from computer to computer until they reach their
destinations. The data overhead is the number of bytes of information that
must be transmitted along with the messages to get them to the right
places. A successful protocol will generally have a low data overhead.

A study is conducted on 25 simulated computer networks. The overhead,
average speed, pause time, and link change rate (LCR) are recorded. The LCR
for a given computer is the rate at which other computers in the network
enter and leave the transmission range of the given computer.

To start, let's say we fit the following model (raw data not shown; see
course website):

Overhead = β0 + β1 LCR + β2 Speed + β3 Pause + β4 Speed·Pause
           + β5 LCR² + β6 Speed² + β7 Pause² + ε

Fit the Model
We can fit the model in R with the following statements:

> mydata = read.table('/Users/robertgordon/Documents/table84.txt',
    sep=',', header=TRUE)
> attach(mydata)
> fit1 = lm(Overhead ~ LCR + Speed + Pause + I(Speed*Pause) +
    I(LCR^2) + I(Speed^2) + I(Pause^2))

We can see the results of fitting the model by typing

> summary(fit1)

Coefficients:
                 Estimate    Std. Error  t value  Pr(>|t|)
(Intercept)      367.96413   19.40264    18.965   7.12e-13 ***
LCR                3.47669    2.12913     1.633   0.12087
Speed              3.04382    1.59133     1.913   0.07278  .
Pause              2.29237    0.69838     3.282   0.00439  **
I(Speed * Pause)   0.01222    0.01534     0.797   0.43663
I(LCR^2)           0.10412    0.03192     3.262   0.00459  **
I(Speed^2)         0.03131    0.01906     1.643   0.11885
I(Pause^2)         0.01318    0.01045     1.261   0.22442
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.723 on 17 degrees of freedom
Multiple R-squared: 0.9723,  Adjusted R-squared: 0.9609
F-statistic: 85.33 on 7 and 17 DF,  p-value: 5.409e-12

Look at the high p-values on the previous slide:
  LCR
  Speed · Pause
  Speed²
  Pause²

Leave LCR in the model, since LCR² is significant. Try the following
reduced model:

Overhead = β0 + β1 LCR + β2 Speed + β3 Pause + β5 LCR² + ε

and perform the F test to see if the reduced model works.

Note: SSE_full = 556.8 (how?) on 17 degrees of freedom.
Fitting the reduced model gives SSE_reduced = 830.3.

Then the F test statistic is

    f = [(830.3 − 556.8)/(7 − 4)] / (556.8/17) = 2.78.

Under H0 we say that f has an F(3, 17) distribution. We can find the
p-value with R:

> pf(q=2.78, df1=3, df2=17, lower.tail=FALSE)
[1] 0.072727
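The same arithmetic can be cross-checked outside R. Below is a minimal sketch in Python, assuming SciPy is available; `scipy.stats.f.sf` plays the role of R's `pf(..., lower.tail=FALSE)`. It also answers the "(how?)" above: SSE_full is recovered from the R summary's residual standard error, 5.723 on 17 degrees of freedom.

```python
# Cross-check of the partial F test above (a sketch, not the course's code).
from scipy.stats import f as f_dist

df_full = 17                    # residual df of the full model (25 - 7 - 1)
sse_full = 5.723**2 * df_full   # = S^2 * df, approx. 556.8 -- the "(how?)"
sse_reduced = 830.3             # from fitting the reduced model
dropped = 7 - 4                 # number of predictors removed

f_stat = ((sse_reduced - sse_full) / dropped) / (sse_full / df_full)
p_value = f_dist.sf(f_stat, dropped, df_full)  # upper tail, like lower.tail=FALSE

print(round(f_stat, 2), round(p_value, 3))
```

Since the p-value is above 0.05, this agrees with the conclusion on the slide.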
This means the reduced model is plausible.

Can we do a Best Subsets procedure in R? Yes. Consider how many ways
we can do this: C(7,1) + C(7,2) + C(7,3) + C(7,4) + C(7,5) + C(7,6) + C(7,7) = 127.
(why?)
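The count comes from choosing any nonempty subset of the 7 candidate predictor terms, which can be checked directly:

```python
# Number of candidate reduced models: one for each nonempty subset of the
# 7 predictor terms.
from math import comb

n_models = sum(comb(7, k) for k in range(1, 8))
print(n_models)  # -> 127, i.e. 2**7 - 1
```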
That’s a lot of models to go through. Let’s automate this process. To do
this in R we ﬁrst need to install the ‘leaps’ package.
> install.packages(’leaps’)
> library(leaps)
Then use the regsubsets function (instead of lm) to ﬁt the model:
> leaps<regsubsets(Overhead ~ LCR + Speed + Pause +
I(Speed*Pause) + I(LCR^2) + I(Speed^2) + I(Pause^2),
data=mydata,nbest=2)
The nbest=2 option tells R to only report the 2 best models for that
subgroup.
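For intuition about what regsubsets is doing, here is a small sketch of the same idea in Python: enumerate every subset of candidate columns, fit least squares, and keep the best model of each size by R². The data here are synthetic and purely illustrative (NumPy assumed available), not the course's table84 data, and this brute force skips regsubsets' branch-and-bound pruning.

```python
# Best-subsets search by brute force: for each subset size, keep the subset
# of predictor columns with the highest R^2. Synthetic data for illustration.
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 3))            # three candidate predictors
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

def r_squared(cols):
    """R^2 from regressing y on an intercept plus the chosen columns."""
    design = np.column_stack([np.ones(n), X[:, list(cols)]])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - np.sum(resid**2) / tss

# Best subset of each size, as tuples of column indices (like nbest=1).
best = {k: max(combinations(range(3), k), key=r_squared) for k in (1, 2, 3)}
print(best)
```

With these coefficients, the strongest single predictor is column 0 and the best pair is columns 0 and 1, since column 2 contributes only noise.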
          LCR  Speed  Pause  I(S*P)  I(LCR^2)  I(S^2)  I(P^2)
1  ( 1 )                       *
1  ( 2 )                *
2  ( 1 )                *      *
2  ( 2 )         *      *
3  ( 1 )   *            *               *
3  ( 2 )   *     *                                *
4  ( 1 )   *     *      *               *
4  ( 2 )   *            *               *         *
5  ( 1 )   *     *      *               *                 *
5  ( 2 )         *      *      *        *         *
6  ( 1 )   *     *      *               *         *       *
6  ( 2 )   *     *      *      *        *         *
7  ( 1 )   *     *      *      *        *         *       *

The list of best subgroups is nice, but we can make a more informed
decision if we knew the R² values associated with each row. We can see
this graphically with the following:

> par(mfrow=c(1,2))
> plot(leaps, scale="r2")
> library(car)
> subsets(leaps, statistic="rsq")

Note:
  par tells R to partition the graphics window into 1 row and 2 columns.
  library tells R to load the car package.
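The plots produced above rank subsets by plain R², which always favors the largest model; adjusted R² is the variant that penalizes size. As a quick check of the formula against numbers already printed in Example 5 (n = 165 observations, since the Total line of the ANOVA table has 164 degrees of freedom):

```python
# Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1): the (n - 1)/(n - p - 1)
# factor grows with the number of predictors p, penalizing larger models.
def adjusted_r2(r2, n, p):
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Both Example 5 models printed R-Sq = 93.5%; only the penalty differs.
full = adjusted_r2(0.935, n=165, p=7)     # full model: 7 predictors
reduced = adjusted_r2(0.935, n=165, p=4)  # reduced model: 4 predictors

print(round(full, 3), round(reduced, 3))  # -> 0.932 0.933, as in the output
```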
LSPI(LI(SI(P
LSPI*PI(LI(S
LSPI(LI(P
SPI*PI(LI(S
LSPI(L
LPI(LI(S
LPI(L 0.97 0.9 0.97
0.97 LSI(S 0.97 r2 0.95
0.93
0.9 PI*P
SP 0.8 0.96 I*P
0.7 Statistic: rsq 0.97 0.83
0.6 0.82
0.74
0.55
(Intercept)
LCR
Speed
Pause
I(Speed * Pause)
I(LCR^2)
I(Speed^2)
I(Pause^2) P Rob Gordon (University of Florida) 1 2 3 4 5 6 7 Subset Size STA 3032 (7661) Fall 2011 251 / 251 ...
This note was uploaded on 12/13/2011 for the course STA 3032 taught by Professor Kyung during the Fall '08 term at University of Florida.