Probability – the Science of Uncertainty and Data
by Fabián Kozynski
Probability
Probability models and axioms
Definition (Sample space)
A sample space Ω is the set of all
possible outcomes. The set’s elements must be mutually exclusive,
collectively exhaustive and at the right granularity.
Definition (Event)
An event is a subset of the sample space.
Probability is assigned to events.
Definition (Probability axioms)
A probability law $P$ assigns probabilities to events and satisfies the following axioms:
Nonnegativity: $P(A) \ge 0$ for all events $A$.
Normalization: $P(\Omega) = 1$.
(Countable) additivity: For every sequence of events $A_1, A_2, \ldots$ such that $A_i \cap A_j = \emptyset$ for all $i \ne j$:
$$P\Big(\bigcup_i A_i\Big) = \sum_i P(A_i).$$
Corollaries (Consequences of the axioms)
• $P(\emptyset) = 0$.
• For any finite collection of disjoint events $A_1, \ldots, A_n$, $P\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{i=1}^{n} P(A_i)$.
• $P(A) + P(A^c) = 1$.
• $P(A) \le 1$.
• If $A \subset B$, then $P(A) \le P(B)$.
• $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
• $P(A \cup B) \le P(A) + P(B)$.
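The axioms and their consequences can be checked numerically on a small finite sample space. The sketch below is only an illustration: the fair six-sided die, the events `A`, `B`, `C`, and the helper `prob` are assumptions introduced here, not part of the notes.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die (an assumption for illustration).
omega = frozenset(range(1, 7))
weights = {outcome: Fraction(1, 6) for outcome in omega}

def prob(event):
    """P(event): sum the weights of the outcomes contained in the event."""
    return sum((weights[o] for o in event), Fraction(0))

A = frozenset({1, 2})       # roll is 1 or 2
B = frozenset({2, 4, 6})    # roll is even
C = frozenset({2})          # roll is 2, a subset of A

# Axioms: nonnegativity and normalization.
assert all(w >= 0 for w in weights.values())
assert prob(omega) == 1

# Consequences of the axioms.
assert prob(frozenset()) == 0
assert prob(A) + prob(omega - A) == 1                    # P(A) + P(A^c) = 1
assert prob(C) <= prob(A)                                # C subset of A
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)    # inclusion-exclusion
assert prob(A | B) <= prob(A) + prob(B)                  # union bound

print(prob(A | B))  # 2/3
```

Exact fractions are used instead of floats so that the equalities hold without rounding error.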
Example (Discrete uniform law)
Assume $\Omega$ is finite and consists of $n$ equally likely elements. Also, assume that $A \subset \Omega$ has $k$ elements. Then $P(A) = \frac{k}{n}$.
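Under the discrete uniform law, computing a probability reduces to counting. As a minimal sketch, assume two fair dice (a hypothetical setup, with `uniform_prob` a helper name chosen here) and count the outcomes whose sum is 7.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: two fair dice, all 36 ordered pairs equally likely.
omega = list(product(range(1, 7), repeat=2))

def uniform_prob(event, sample_space):
    """P(A) = k / n, where k = |A| and n = |sample space|."""
    return Fraction(len(event), len(sample_space))

# A = "the two dice sum to 7"
A = [(a, b) for (a, b) in omega if a + b == 7]

print(len(A), len(omega))      # 6 36
print(uniform_prob(A, omega))  # 1/6
```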
Conditioning and Bayes’ rule
Definition (Conditional probability)
Given that event $B$ has occurred and that $P(B) > 0$, the probability that $A$ occurs is
$$P(A \mid B) \triangleq \frac{P(A \cap B)}{P(B)}.$$
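The definition translates directly into code on a finite, equally likely sample space. The following is a sketch under assumed choices: two fair dice, $A$ = "the sum is 8", $B$ = "the first die is even"; the names `prob` and `cond_prob` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: two fair dice, 36 equally likely outcomes.
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under the discrete uniform law on omega."""
    return Fraction(len(event), len(omega))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B); only defined when P(B) > 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b) / prob(b)

A = {w for w in omega if sum(w) == 8}      # the sum is 8
B = {w for w in omega if w[0] % 2 == 0}    # the first die is even

print(prob(A))          # 5/36
print(cond_prob(A, B))  # 1/6
```

Here conditioning on $B$ changes the probability of $A$ from 5/36 to 1/6.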
Remark (Properties of conditional probabilities)
They are the same as those of ordinary probabilities. Assuming $P(B) > 0$:
• $P(A \mid B) \ge 0$.
• $P(\Omega \mid B) = 1$.
• $P(B \mid B) = 1$.
• If $A \cap C = \emptyset$, then $P(A \cup C \mid B) = P(A \mid B) + P(C \mid B)$.
Proposition (Multiplication rule)
$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1) \cdot P(A_2 \mid A_1) \cdots P(A_n \mid A_1 \cap A_2 \cap \cdots \cap A_{n-1}).$$
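A standard way to see the multiplication rule at work is sequential drawing without replacement. The sketch below assumes a well-shuffled 52-card deck and takes $A_i$ = "the $i$-th card drawn is an ace" (an example chosen here, not taken from the notes); the chain of conditional probabilities is cross-checked against a direct count.

```python
from fractions import Fraction
from math import comb

# A_i = "the i-th card drawn is an ace", drawing without replacement
# from a standard 52-card deck (hypothetical example).
p_a1 = Fraction(4, 52)               # P(A_1)
p_a2_given_a1 = Fraction(3, 51)      # P(A_2 | A_1)
p_a3_given_a1_a2 = Fraction(2, 50)   # P(A_3 | A_1 and A_2)

# Multiplication rule: P(A_1 and A_2 and A_3).
p_chain = p_a1 * p_a2_given_a1 * p_a3_given_a1_a2

# Cross-check by counting: 3-card hands made only of aces / all 3-card hands.
p_count = Fraction(comb(4, 3), comb(52, 3))

print(p_chain)             # 1/5525
print(p_chain == p_count)  # True
```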
Theorem (Total probability theorem)
Given a partition $\{A_1, A_2, \ldots\}$ of the sample space, meaning that $\bigcup_i A_i = \Omega$ and the events are disjoint, then for every event $B$ we have
$$P(B) = \sum_i P(A_i)\, P(B \mid A_i).$$
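The theorem is a weighted average of the conditional probabilities $P(B \mid A_i)$, with weights $P(A_i)$. A minimal sketch follows; the three-scenario numbers and the function name `total_probability` are made up for illustration.

```python
from fractions import Fraction

def total_probability(prior, likelihood):
    """P(B) = sum_i P(A_i) * P(B | A_i) for a partition {A_i}."""
    if sum(prior) != 1:
        raise ValueError("the P(A_i) of a partition must sum to 1")
    return sum((p * l for p, l in zip(prior, likelihood)), Fraction(0))

# Hypothetical numbers: three scenarios A_1, A_2, A_3 and an event B.
prior = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]              # P(A_i)
likelihood = [Fraction(1, 100), Fraction(2, 100), Fraction(5, 100)]    # P(B | A_i)

p_b = total_probability(prior, likelihood)
print(p_b)  # 21/1000
```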
Theorem (Bayes’ rule)
Given a partition $\{A_1, A_2, \ldots\}$ of the sample space, meaning that $\bigcup_i A_i = \Omega$ and the events are disjoint, and if $P(A_i) > 0$ for all $i$, then for every event $B$ with $P(B) > 0$, the conditional probabilities $P(A_i \mid B)$ can be obtained from the conditional probabilities $P(B \mid A_i)$ and the initial probabilities $P(A_i)$ as follows:
$$P(A_i \mid B) = \frac{P(A_i)\, P(B \mid A_i)}{\sum_j P(A_j)\, P(B \mid A_j)}.$$
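Bayes' rule can be sketched as a small function that turns priors and likelihoods into posteriors. The diagnostic-test numbers below (1% prevalence, 95% detection rate, 5% false-positive rate) and the name `bayes_posterior` are assumptions chosen for illustration only.

```python
from fractions import Fraction

def bayes_posterior(prior, likelihood):
    """Return [P(A_i | B)] from [P(A_i)] and [P(B | A_i)]."""
    joint = [p * l for p, l in zip(prior, likelihood)]   # P(A_i) P(B | A_i)
    evidence = sum(joint, Fraction(0))                   # P(B), by total probability
    if evidence == 0:
        raise ValueError("P(B) must be positive")
    return [j / evidence for j in joint]

# Hypothetical example: A_1 = "has the condition", A_2 = "does not",
# B = "test is positive".
prior = [Fraction(1, 100), Fraction(99, 100)]
likelihood = [Fraction(95, 100), Fraction(5, 100)]

posterior = bayes_posterior(prior, likelihood)
print(posterior[0])  # 19/118
```

Even with a fairly accurate test, the posterior $P(A_1 \mid B) = 19/118 \approx 0.16$ stays small because the prior $P(A_1)$ is small.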
Independence
Definition (Independence of events)
Two events are independent if the occurrence of one provides no information about the other. We say that $A$ and $B$ are independent if
$$P(A \cap B) = P(A)\, P(B).$$
Equivalently, as long as $P(A) > 0$ and $P(B) > 0$,
$$P(B \mid A) = P(B) \quad \text{and} \quad P(A \mid B) = P(A).$$
Remarks
• The definition of independence is symmetric with respect to $A$ and $B$.
• The product definition applies even if $P(A) = 0$ or $P(B) = 0$.
Corollary
If $A$ and $B$ are independent, then $A$ and $B^c$ are independent. Similarly for $A^c$ and $B$, or for $A^c$ and $B^c$.
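The product definition and the corollary about complements can both be checked by enumeration. In this sketch the two-dice setup and the events $A$ = "first die is even", $B$ = "sum is 7", $C$ = "sum is 8" are assumptions made for illustration, as is the helper `independent`.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: two fair dice, 36 equally likely outcomes.
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(omega))

def independent(a, b):
    """Product definition: P(A and B) == P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)

A = {w for w in omega if w[0] % 2 == 0}   # first die is even
B = {w for w in omega if sum(w) == 7}     # sum is 7
C = {w for w in omega if sum(w) == 8}     # sum is 8

print(independent(A, B))                   # True
print(independent(A, omega - B))           # True  (A and B^c)
print(independent(omega - A, omega - B))   # True  (A^c and B^c)
print(independent(A, C))                   # False
```

The last line shows that a seemingly similar event ("sum is 8") is not independent of $A$, since knowing the sum is 8 changes the chance that the first die is even.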
Definition (Conditional independence)
We say that $A$ and $B$ are independent conditioned on $C$, where $P(C) > 0$, if
$$P(A \cap B \mid C) = P(A \mid C)\, P(B \mid C).$$
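Conditional independence does not imply independence. The sketch below uses an assumed experiment: one of two biased coins (heads probability 9/10 or 1/10) is picked with equal probability and tossed twice. With $A$ = "first toss is heads", $B$ = "second toss is heads", and $C$ = "coin 1 was picked", the tosses are independent given $C$ but not unconditionally. All names and numbers are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical experiment: pick coin 1 or coin 2 with probability 1/2 each,
# then toss the chosen coin twice.
heads_prob = {1: Fraction(9, 10), 2: Fraction(1, 10)}

# Build the probability of each outcome (coin, first toss, second toss).
weights = {}
for coin, t1, t2 in product((1, 2), "HT", "HT"):
    p = heads_prob[coin]
    w = Fraction(1, 2)                  # probability of picking this coin
    w *= p if t1 == "H" else 1 - p      # first toss
    w *= p if t2 == "H" else 1 - p      # second toss
    weights[(coin, t1, t2)] = w

def prob(event):
    return sum((weights[o] for o in event), Fraction(0))

def cond_prob(a, b):
    return prob(a & b) / prob(b)

omega = set(weights)
A = {o for o in omega if o[1] == "H"}   # first toss is heads
B = {o for o in omega if o[2] == "H"}   # second toss is heads
C = {o for o in omega if o[0] == 1}     # coin 1 was picked

print(cond_prob(A & B, C) == cond_prob(A, C) * cond_prob(B, C))  # True
print(prob(A & B) == prob(A) * prob(B))                          # False
```

Unconditionally, a first heads makes coin 1 more likely, which in turn makes a second heads more likely, so $A$ and $B$ are dependent.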
Definition (Independence of a collection of events)
We say that events $A_1, A_2, \ldots, A_n$ are independent if for every collection of distinct indices $i_1, i_2, \ldots, i_k$, we have
$$P(A_{i_1} \cap \cdots \cap A_{i_k}) = P(A_{i_1}) \cdot P(A_{i_2}) \cdots P(A_{i_k}).$$
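The definition requires the product formula for every sub-collection, not just for pairs. A classic illustration, sketched here under an assumed setup of two fair coin tosses, takes $A$ = "first toss is heads", $B$ = "second toss is heads", $C$ = "both tosses agree": every pair is independent, yet the three events together are not. The function name `mutually_independent` is a choice made here.

```python
from fractions import Fraction
from functools import reduce
from itertools import combinations, product
from math import prod

# Hypothetical example: two fair coin tosses, 4 equally likely outcomes.
omega = set(product("HT", repeat=2))

def prob(event):
    return Fraction(len(event), len(omega))

def mutually_independent(events):
    """Check the product formula for every sub-collection of size >= 2."""
    for k in range(2, len(events) + 1):
        for subset in combinations(events, k):
            intersection = reduce(set.intersection, subset)
            if prob(intersection) != prod(prob(e) for e in subset):
                return False
    return True

A = {w for w in omega if w[0] == "H"}    # first toss is heads
B = {w for w in omega if w[1] == "H"}    # second toss is heads
C = {w for w in omega if w[0] == w[1]}   # both tosses agree

pairwise = all(mutually_independent([x, y]) for x, y in combinations([A, B, C], 2))
print(pairwise)                          # True
print(mutually_independent([A, B, C]))   # False
```

The failure comes from the triple intersection: $P(A \cap B \cap C) = 1/4$, while $P(A)\,P(B)\,P(C) = 1/8$.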

