STA1501/101/3/2011
IMPORTANT INFORMATION
Read this tutorial letter first.
DEPARTMENT OF STATISTICS
STA1501
Descriptive Statistics and Probability
Tutorial letter 101 for 2011
SCHEME OF WORK, STUDY RESOURCES AND ASSIGNMENTS
CONTENTS
1. A word of welco
STA1501/101/3/2013
Tutorial Letter 101/3/2013
Descriptive Statistics and Probability
STA1501
Semesters 1 & 2
Department of Statistics
IMPORTANT INFORMATION:
This tutorial letter contains important
information about your module and
includes the assignment
Introduction to Probability
Distributions
Random Variable
Represents a possible numerical value from
an uncertain event
[Diagram: Random Variables divide into Discrete Random Variable (Ch. 6) and Continuous Random Variable (Ch. 7)]
Collectively Exhaustive Events
Collectively exhaustive events
One of the events must occur
The set of events covers the entire sample space
example:
A = aces; B = black cards;
C = diamonds; D = hearts
Events A, B, C and D are collectively exhaustive
(but
Mutually Exclusive Events
Mutually exclusive events
Events that cannot occur together
example:
A = queen of diamonds; B = queen of clubs
Events A and B are mutually exclusive
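The two card examples above can be checked with sets. This is a minimal sketch, assuming a standard 52-card deck; the helper names `collectively_exhaustive` and `mutually_exclusive` are illustrative and not taken from the course material.

```python
from itertools import product

# Assumed sample space: a standard 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))

# Events from the collectively exhaustive example.
aces = {card for card in deck if card[0] == "A"}
black = {card for card in deck if card[1] in ("clubs", "spades")}
diamonds = {card for card in deck if card[1] == "diamonds"}
hearts = {card for card in deck if card[1] == "hearts"}


def collectively_exhaustive(events, sample_space):
    """True if the union of the events covers the whole sample space."""
    return set().union(*events) == sample_space


def mutually_exclusive(events):
    """True if no outcome belongs to more than one event."""
    return sum(len(e) for e in events) == len(set().union(*events))


abcd = [aces, black, diamonds, hearts]
print(collectively_exhaustive(abcd, deck))  # True
print(mutually_exclusive(abcd))             # False: e.g. the ace of spades is in A and B

# Events from the mutually exclusive example.
queen_diamonds = {("Q", "diamonds")}
queen_clubs = {("Q", "clubs")}
print(mutually_exclusive([queen_diamonds, queen_clubs]))  # True
```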
Coefficient of Correlation
Measures the relative strength of the linear
relationship between two variables
Sample coefficient of correlation:
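$$r = \frac{\operatorname{cov}(X, Y)}{S_X S_Y}$$
where
$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$
$$S_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
$$S_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$$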
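As a rough illustration of the formulas above, the sketch below computes the sample covariance, the two sample standard deviations, and r directly from their definitions. The data values are hypothetical, and the function names are chosen only for this example.

```python
from math import sqrt


def sample_covariance(x, y):
    """Sum of cross-deviations from the means, divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def sample_std(values):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(x, y):
    """Sample coefficient of correlation r = cov(X, Y) / (S_X * S_Y)."""
    return sample_covariance(x, y) / (sample_std(x) * sample_std(y))


# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(x, y)
r = correlation(x, y)
print(cov_xy)       # 1.5
print(round(r, 4))  # 0.7746
```

Because r divides the covariance by both standard deviations, it has no units and always lies between -1 and 1, which is what makes it a relative measure.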
F
Assessing Probability
There are three approaches to assessing the probability
of an uncertain event:
1. a priori classical probability
$$\text{probability of occurrence} = \frac{X}{T} = \frac{\text{number of ways the event can occur}}{\text{total number of elementary outcomes}}$$
2. empirical clas
Interquartile Range
Can eliminate some outlier problems by using
the interquartile range
Eliminate some high- and low-valued
observations and calculate the range from the
remaining values
Interquartile range = 3rd quartile - 1st quartile
= Q3 - Q1
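A short sketch of the calculation, using Python's standard `statistics.quantiles`. The data set is hypothetical, and the `"inclusive"` interpolation method is an assumption: textbooks use several different quartile rules, so hand-calculated quartiles may differ slightly from these values.

```python
import statistics

# Hypothetical sample, for illustration only.
data = [11, 12, 13, 16, 16, 17, 18, 21, 22]

# n=4 splits the data into quartiles and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1
print(q1, q2, q3)  # 13.0 16.0 18.0
print(iqr)         # 5.0
```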
Interquarti
Five-Number Summary
Box Plot: A graphical display of data using the five-number summary:
Minimum - Q1 - Median - Q3 - Maximum
Example:
[Figure: box plot with Minimum, 1st Quartile, Median, 3rd Quartile and Maximum marked; each of the four segments holds 25% of the data]
Shape of Box P
Events
Simple event
An outcome from a sample space with one
characteristic
e.g., A red card from a deck of cards
Complement of an event A (denoted A')
All outcomes that are not part of event A
e.g., All cards that are not diamonds
Joint event
Involves two
Discrete Probability Distribution
Experiment: Toss 2 Coins. Let X = # heads.
4 possible outcomes: TT, TH, HT, HH
Probability Distribution
X Value    Probability
0          1/4 = 0.25
1          2/4 = 0.50
2          1/4 = 0.25
[Figure: bar chart of Probability against X, with bars of height 0.25, 0.50 and 0.25 at X = 0, 1 and 2]
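The distribution of X for the two-coin experiment can be rebuilt by listing the four equally likely outcomes and counting heads in each. A minimal sketch, assuming fair coins; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing two coins: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# X = number of heads in each outcome; count how often each value occurs.
counts = Counter(outcome.count("H") for outcome in outcomes)

# P(X = x) = (outcomes with x heads) / (total outcomes).
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p, float(p))
# 0 1/4 0.25
# 1 1/2 0.5
# 2 1/4 0.25
```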
Continuous Probability Distributions
A continuous random variable is a variable that
can assume any value on a continuum (can
assume an uncountable number of values)
thickness of an item
time required to complete a task
temperature of a solution
height,
Probability Distributions
[Diagram: Probability Distributions divide into Discrete Probability Distributions (Ch. 6: Binomial, Poisson) and Continuous Probability Distributions (Ch. 7: Normal)]
The Poisson Distribution
Apply the Poisson Distribution when:
You wish to count the number of times an event
occurs in a given area of opportunity
The probability that an event occurs in one area of
opportunity is the same for all areas of opportunity
Discrete Random Variable
Summary Measures
Expected Value (or mean) of a discrete
distribution (Weighted Average)
$$E(X) = \sum_{i=1}^{N} X_i \, P(X_i)$$
Example: Toss 2 coins,
X = # of heads,
compute expected value of X:
E(X) = (0 x 0.25) + (1 x 0.50) + (2 x 0.25)
= 1.0
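The same weighted average can be written as a small function. This sketch takes the two-coin distribution from the example; `expected_value` is a name chosen for illustration.

```python
def expected_value(distribution):
    """Weighted average: sum of each value times its probability."""
    return sum(x * p for x, p in distribution.items())


# Distribution of X = number of heads in two coin tosses.
two_coins = {0: 0.25, 1: 0.50, 2: 0.25}

mean = expected_value(two_coins)
print(mean)  # 1.0
```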
Rule of Combinations
The number of combinations of selecting X
objects out of n objects is
$$_nC_X = \frac{n!}{X!\,(n - X)!}$$
where:
n! = (n)(n - 1)(n - 2) . . . (2)(1)
X! = (X)(X - 1)(X - 2) . . . (2)(1)
0! = 1
(by definition)
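To see the rule in numbers, the sketch below evaluates the factorial formula and cross-checks it against Python's built-in `math.comb`. The function name `combinations` and the sample arguments are illustrative choices.

```python
from math import comb, factorial


def combinations(n, x):
    """Number of ways to select x objects out of n: n! / (x! (n - x)!)."""
    return factorial(n) // (factorial(x) * factorial(n - x))


print(combinations(5, 2))  # 10
print(combinations(4, 0))  # 1, because 0! = 1

# The standard library gives the same result.
print(comb(5, 2))          # 10
```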
Discrete Random Variables
Can only assume a countable number of values
Examples:
Roll a die twice
Let X be the number of times 4 comes up
(then X could be 0, 1, or 2 times)
Toss a coin 5 times.
Let X be the number of heads
(then X = 0, 1, 2, 3, 4, or 5
Quartiles
Quartiles split the ranked data into 4 segments with
an equal number of values per segment
[Figure: ranked data divided by Q1, Q2 and Q3 into four segments, each holding 25% of the values]
The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger
Q2 is the same as the media
The Sample Covariance
The sample covariance measures the strength of the
linear relationship between two variables (called
bivariate data)
The sample covariance:
$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$
Only concerned with the strength of the relationshi
Important Terms
Probability: the chance that an uncertain event
will occur (always between 0 and 1)
Event: each possible outcome of a variable
Simple Event: an event that can be described
by a single characteristic
Sample Space: the collection of all possible
Mean
The mean is the most common measure of
central tendency
For a sample of size n:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
where n is the sample size and X_1, ..., X_n are the observed values
Mean
(continued)
The most common measure of central tendency
Mean = sum of values divided by the number of values
Aff
Mode
A measure of central tendency
Value that occurs most often
Not affected by extreme values
Used for either numerical or categorical
(nominal) data
There may be no mode
There may be several modes
[Figure: dot plot on a 0-14 axis, Mode = 9]
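The three cases (one mode, several modes, no mode) can be illustrated with a short function. This is a sketch with hypothetical data; treating "every value occurs once" as "no mode" is the convention assumed here, and `modes` is an illustrative name.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest count, or [] if nothing repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs exactly once: no mode
    return [v for v, c in counts.items() if c == highest]


print(modes([9, 3, 9, 5, 9]))          # [9]
print(modes([1, 2, 2, 3, 3]))          # [2, 3]  (several modes)
print(modes([1, 2, 3]))                # []      (no mode)
print(modes(["red", "blue", "red"]))   # ['red'] (categorical data)
```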
0 1 2 3
Cross Tabulations
Contingency table (investment in thousands of dollars):
Investment Category    Investor A    Investor B    Investor C    Total
Stocks                 46.5          55            27.5          129
Bonds                  32            44            19            95
CD                     15.5          20            13.5          49
Savings                16            28            7             51
Total                  110           147           67            324
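The row and column totals in the contingency table above are simple sums over the cells. A minimal sketch that recomputes them from the cell values; the dictionary layout is just one possible representation.

```python
# Cell values from the contingency table, in thousands of dollars.
table = {
    "Stocks":  {"Investor A": 46.5, "Investor B": 55.0, "Investor C": 27.5},
    "Bonds":   {"Investor A": 32.0, "Investor B": 44.0, "Investor C": 19.0},
    "CD":      {"Investor A": 15.5, "Investor B": 20.0, "Investor C": 13.5},
    "Savings": {"Investor A": 16.0, "Investor B": 28.0, "Investor C": 7.0},
}
investors = ["Investor A", "Investor B", "Investor C"]

# Row totals: one sum per investment category.
row_totals = {category: sum(row.values()) for category, row in table.items()}

# Column totals: one sum per investor.
column_totals = {
    investor: sum(row[investor] for row in table.values())
    for investor in investors
}

grand_total = sum(row_totals.values())

print(row_totals)     # {'Stocks': 129.0, 'Bonds': 95.0, 'CD': 49.0, 'Savings': 51.0}
print(column_totals)  # {'Investor A': 110.0, 'Investor B': 147.0, 'Investor C': 67.0}
print(grand_total)    # 324.0
```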
Median
In an ordered array, the median is the middle
number (50% above, 50% below)
[Figure: two dot plots on a 0-10 axis, each with Median = 3]
Not affected by extreme values
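A small sketch of the idea, using hypothetical data. It sorts the values and takes the middle one; for an even number of values it averages the two middle values, which is the usual convention and is assumed here.

```python
def median(values):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical samples: the second replaces the largest value with an extreme one.
typical = [1, 2, 3, 4, 5]
with_extreme = [1, 2, 3, 4, 100]

print(median(typical))       # 3
print(median(with_extreme))  # 3, unchanged by the extreme value
print(median([1, 2, 3, 4]))  # 2.5
```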
Finding the Median
The location of the median:
n 1
Median position
posit