1.3 Conditional Probability and Independence
All of the probabilities that we have dealt with thus far have been unconditional probabilities.
A sample space was defined and all probabilities were calculated with respect to that sample
space. In many instanc
Financial Time Series I and Methods of Statistical Prediction
Homework 2: Review on Basics
1. (a) We know that about 40% of the mathematicians in the United States are women.
So the probability that there are 3 or fewer women in a randomly selected grou
Chapter 6
Principle of Data Reduction
6.1 Introduction
An experimenter uses the information in a sample X1, . . . , Xn to make inferences about an unknown
parameter θ. If the sample size n is large, then the observed sample x1, . . . , xn is a long list
Advanced Statistical Inference I
Homework 1: Probability Theory
Due Date: October 7th
1. (Detect mixture distribution) Exercise 1.6.
2. (Countable additivity and Kolmogorov's Axiom) Exercise 1.12 and Exercise 1.35.
3. (Information and Conditioning) Exercis
Lecture 1: Set Theory
1 Set Theory
One of the main objectives of a statistician is to draw conclusions about a population of objects by
conducting an experiment. The first step in this endeavor is to identify the possible outcomes or, in
statistical terminol
Lecture 2 : Basics of Probability Theory
When an experiment is performed, the realization of the experiment is an outcome in the sample
space. If the experiment is performed a number of times, different outcomes may occur each time
or some outcomes may repe
1.2.3 Counting and Equally Likely Outcomes
Methods of counting are often used in order to construct probability assignments on finite
sample spaces, although they can be used to answer other questions also. The following
theorem is sometimes known as the Fu
1.4 Random Variable
Motivation example In an opinion poll, we might decide to ask 50 people whether they agree
or disagree with a certain issue. If we record a 1 for agree and 0 for disagree, the sample
space for this experiment has 2^50 elements. If we de
1.6. Density and Mass Functions
Definition 1.6.1 (Probability Mass Function) The probability mass function (pmf) of a discrete random variable X is given by fX(x) = P(X = x) for all x.
Example 1.6.2 (Geometric probabilities) For the geometric distribution
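Definition 1.6.1 can be checked numerically for the geometric case. A minimal sketch, assuming the parameterization fX(x) = p(1 − p)^(x−1), x = 1, 2, . . ., with p = 0.3 chosen arbitrarily:

```python
# Check that a pmf is nonnegative and sums to 1 for the geometric
# distribution f_X(x) = p * (1 - p)**(x - 1), x = 1, 2, ...
p = 0.3  # arbitrary success probability

def geom_pmf(x):
    return p * (1 - p) ** (x - 1)

xs = range(1, 200)
assert all(geom_pmf(x) >= 0 for x in xs)
total = sum(geom_pmf(x) for x in xs)  # partial sum; the tail past x = 200 is negligible
print(round(total, 6))  # → 1.0
```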
Transformations and Expectations
1 Distributions of Functions of a Random Variable
If X is a random variable with cdf FX(x), then any function of X, say g(X), is also a random
variable. Since Y = g(X) is a function of X, we can describe the probabil
However, if FX is constant on some interval, then FX^{-1} is not well defined by (2). The problem is
avoided by defining FX^{-1}(y) for 0 < y < 1 by

    FX^{-1}(y) = inf{x : FX(x) ≥ y}.   (3)

At the end point of the range of y, FX^{-1}(1) = ∞ if FX(x) < 1 for all x a
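The generalized inverse in (3) can be sketched on a finite grid of x values; the discrete cdf below, with a flat piece on [0, 1), is a made-up example:

```python
def quantile(xs, Fs, y):
    # inf{x : F_X(x) >= y}, evaluated over a finite grid of x values
    for x, Fx in zip(xs, Fs):
        if Fx >= y:
            return x
    return float("inf")  # F_X(x) < y at every grid point

xs = [0, 1, 2]
Fs = [0.3, 0.3, 1.0]  # F_X is constant on [0, 1), so the ordinary inverse fails at y = 0.3
assert quantile(xs, Fs, 0.3) == 0   # the infimum picks the left end of the flat piece
assert quantile(xs, Fs, 0.5) == 2
assert quantile(xs, Fs, 1.0) == 2   # F_X reaches 1 here, so the endpoint value is finite
```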
Theorem 2.1 Let X be a random variable and let a, b, and c be constants. Then for any functions
g1 (x) and g2 (x) whose expectations exist,
a. E(a g1(X) + b g2(X) + c) = aEg1(X) + bEg2(X) + c.
b. If g1(x) ≥ 0 for all x, then Eg1(X) ≥ 0.
c. If g1 (x)
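Parts a and b can be verified numerically on a small discrete pmf. A sketch; the pmf, the functions g1 and g2, and the constants are all arbitrary choices:

```python
# Verify Theorem 2.1(a) and (b) on a small discrete pmf (all values arbitrary).
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

def E(g):
    return sum(g(x) * px for x, px in pmf.items())

a, b, c = 2.0, -1.0, 5.0
g1 = lambda x: x * x      # nonnegative, so part (b) applies
g2 = lambda x: x

lhs = E(lambda x: a * g1(x) + b * g2(x) + c)
rhs = a * E(g1) + b * E(g2) + c
assert abs(lhs - rhs) < 1e-12   # part (a): expectation is linear
assert E(g1) >= 0               # part (b)
```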
2.3 Moment Generating Function
Theorem 2.3.11 Let FX(x) and FY(y) be two cdfs all of whose moments exist.
a. If X and Y have bounded supports, then FX(u) = FY(u) for all u if and only if EX^r = EY^r for all integers r = 0, 1, 2, . . . .
b. If the moment
3.2.3 Binomial Distribution
The binomial distribution is based on the idea of a Bernoulli trial. A Bernoulli trial is
an experiment with two, and only two, possible outcomes. A random variable X has a
Bernoulli(p) distribution if

    X = 1 with probability p,
        0 with probability 1 − p.
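A Bernoulli(p) draw is easy to simulate from a uniform variate. A minimal sketch, with p = 0.6 chosen arbitrarily:

```python
import random

random.seed(0)
p = 0.6          # arbitrary success probability
n = 100_000
draws = [1 if random.random() < p else 0 for _ in range(n)]  # Bernoulli(p) draws
freq = sum(draws) / n
assert abs(freq - p) < 0.01  # law of large numbers: frequency ≈ p
```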
3.2.4 Poisson Distribution
Definition Let X be the number of events per basic unit. For example: the number of rain drops in one minute; the number of cars passing by you in an hour; the number of chocolate particles in one ChoCoChip cookie; the number of typos in one pag
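Counts of this kind are modeled by the Poisson pmf. A numerical sketch, assuming the standard form f(x|λ) = e^(−λ) λ^x / x! with λ = 4 as an arbitrary rate:

```python
import math

lam = 4.0  # arbitrary rate, e.g. an average of 4 events per basic unit

def pois_pmf(x):
    return math.exp(-lam) * lam ** x / math.factorial(x)

support = range(100)  # mass beyond x = 100 is negligible for lam = 4
total = sum(pois_pmf(x) for x in support)
mean = sum(x * pois_pmf(x) for x in support)
assert abs(total - 1) < 1e-9   # pmf sums to 1
assert abs(mean - lam) < 1e-9  # EX = lam
```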
Financial Time Series I and Methods of Statistical Prediction
Homework 1: Review on Basics
1. (a) The null hypothesis might be "This person is innocent." It is better to release a
guilty person than to convict an innocent one. So we would rather make T
5.5.3 Convergence in Distribution
Definition 5.5.10 A sequence of random variables, X1, X2, . . ., converges in distribution to a random variable X if

    lim_{n→∞} FXn(x) = FX(x)

at all points x where FX(x) is continuous.
Example (Maximum of uniforms)
If X1 ,
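The maximum-of-uniforms example can be checked by simulation: for X1, . . . , Xn iid uniform(0, 1), the maximum X(n) has cdf x^n on [0, 1]. A sketch; n = 50 and the evaluation point x = 0.95 are arbitrary:

```python
import random

random.seed(1)
n, reps = 50, 20_000
maxima = [max(random.random() for _ in range(n)) for _ in range(reps)]

x = 0.95                                      # arbitrary evaluation point
empirical = sum(m <= x for m in maxima) / reps
exact = x ** n                                # F(x) = x^n for the max of n uniforms
assert abs(empirical - exact) < 0.01
```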
3.3 Continuous Distribution
3.3.1 Uniform Distribution
The continuous uniform distribution is defined by spreading mass uniformly over an interval [a, b]. Its pdf is given by

    f(x|a, b) = 1/(b − a)  if x ∈ [a, b],
                0          otherwise.

It is easy to check that ∫_a^b f(x) dx = 1.
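That the density integrates to 1 can also be confirmed by a crude midpoint sum. A sketch, with a = 2 and b = 5 chosen arbitrarily:

```python
a, b = 2.0, 5.0  # arbitrary interval

def f(x):
    return 1 / (b - a) if a <= x <= b else 0.0

N = 100_000
width = (b - a) / N
# midpoint rule over [a, b]
integral = sum(f(a + (i + 0.5) * width) * width for i in range(N))
assert abs(integral - 1) < 1e-9
```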
3.3.3 Normal Distribution
The normal distribution has several advantages over the other distributions.
a. The normal distribution and distributions associated with it are very tractable analytically.
b. The normal distribution has the familiar bell sh
3.3.4 Beta Distribution
The beta(α, β) pdf is

    f(x|α, β) = (1/B(α, β)) x^(α−1) (1 − x)^(β−1),   0 < x < 1,   α > 0,   β > 0,

where B(α, β) denotes the beta function,

    B(α, β) = ∫_0^1 x^(α−1) (1 − x)^(β−1) dx = Γ(α)Γ(β) / Γ(α + β).

For n > −α, we have

    E X^n = (1/B(α, β)) ∫_0^1 x^n x^(α−1) (1 − x)^(β−1) dx
          = B(α + n, β) / B(α, β)
          = Γ(α + n)Γ(α + β) / (Γ(α + β + n)Γ(α)).
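The moment formula can be cross-checked numerically, writing B(α, β) through the gamma-function identity. A sketch; α = 2.5, β = 3, n = 2 are arbitrary choices:

```python
import math

def B(a, b):
    # beta function via B(a, b) = Γ(a)Γ(b) / Γ(a + b)
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

alpha, beta_, n = 2.5, 3.0, 2   # arbitrary parameters
closed = B(alpha + n, beta_) / B(alpha, beta_)   # E X^n from the formula above

# cross-check by midpoint integration of x^n times the beta pdf on (0, 1)
N = 100_000
h = 1.0 / N
num = sum(
    ((i + 0.5) * h) ** (n + alpha - 1) * (1 - (i + 0.5) * h) ** (beta_ - 1)
    for i in range(N)
) * h / B(alpha, beta_)
assert abs(closed - num) < 1e-4
```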
3.4 Exponential Families
A family of pdfs or pmfs is called an exponential family if it can be expressed as

    f(x|θ) = h(x) c(θ) exp( Σ_{i=1}^k w_i(θ) t_i(x) ).   (1)

Here h(x) ≥ 0 and t_1(x), . . . , t_k(x) are real-valued functions of the observation x (they cannot
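As a concrete instance of (1), the Poisson(λ) pmf fits the form with k = 1, h(x) = 1/x!, c(λ) = e^(−λ), w(λ) = log λ, and t(x) = x. A numerical sketch, with λ = 3 chosen arbitrarily:

```python
import math

lam = 3.0                    # arbitrary rate
h = lambda x: 1 / math.factorial(x)
c = math.exp(-lam)           # c(lam) = e^(-lam)
w = math.log(lam)            # w(lam) = log lam
t = lambda x: x              # t(x) = x

for x in range(20):
    ef = h(x) * c * math.exp(w * t(x))                      # exponential-family form
    direct = math.exp(-lam) * lam ** x / math.factorial(x)  # ordinary Poisson pmf
    assert abs(ef - direct) < 1e-12
```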
3.6 Inequalities and Identities
Theorem 3.6.1 (Chebychev's Inequality) Let X be a random variable and let g(x) be a nonnegative function. Then, for any r > 0,

    P(g(X) ≥ r) ≤ Eg(X) / r.

Proof:

    Eg(X) = ∫_{−∞}^{∞} g(x) fX(x) dx ≥ ∫_{x: g(x) ≥ r} g(x) fX(x) dx   (g is nonnegative)
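The inequality can be sanity-checked by simulation. A sketch using g(x) = x² and standard normal draws; both choices are arbitrary:

```python
import random

random.seed(2)
draws = [random.gauss(0, 1) for _ in range(100_000)]  # standard normal X (arbitrary)

g = lambda x: x * x   # a nonnegative g
r = 4.0               # arbitrary threshold

Eg = sum(g(x) for x in draws) / len(draws)          # ≈ E X^2 = 1
tail = sum(g(x) >= r for x in draws) / len(draws)   # ≈ P(X^2 ≥ 4)
assert tail <= Eg / r  # the Chebychev bound holds on the sample
```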
4. Multiple Random Variables
4.1 Joint and Marginal Distributions
Definition 4.1.1 An n-dimensional random vector is a function from a sample space S into R^n, n-dimensional Euclidean space. Suppose, for example, that with each point in a sample space we as
4.2 Conditional Distributions and Independence
Definition 4.2.1 Let (X, Y) be a discrete bivariate random vector with joint pmf f(x, y) and marginal pmfs fX(x) and fY(y). For any x such that P(X = x) = fX(x) > 0, the conditional pmf of Y given that
3 Bivariate Transformations
Let (X, Y) be a bivariate random vector with a known probability distribution. Let U = g1(X, Y)
and V = g2(X, Y), where g1(x, y) and g2(x, y) are some specified functions. If B is any subset of
R^2, then (U, V) ∈ B if an
4 Hierarchical Models and Mixture Distributions
Example 4.1 (Binomial-Poisson hierarchy) Perhaps the most classic hierarchical model is the following. An insect lays a large number of eggs, each surviving with probability p. On the average, how many eggs
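In this hierarchy the number of survivors X satisfies X | N ~ binomial(N, p) with N ~ Poisson(λ), so EX = E(E(X|N)) = E(Np) = λp. A simulation sketch; λ = 10 and p = 0.4 are arbitrary, and the Poisson draw uses Knuth's multiplication method:

```python
import math
import random

random.seed(3)
lam, p = 10.0, 0.4  # arbitrary egg rate and survival probability

def poisson(lam):
    # Knuth's multiplication method for one Poisson(lam) draw
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= random.random()
        if prod < limit:
            return k
        k += 1

reps = 20_000
survivors = 0
for _ in range(reps):
    n_eggs = poisson(lam)                                      # N ~ Poisson(lam)
    survivors += sum(1 for _ in range(n_eggs) if random.random() < p)  # binomial(N, p)

avg = survivors / reps
assert abs(avg - lam * p) < 0.1  # E(survivors) = lam * p = 4
```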
4.5 Covariance and Correlation
In earlier sections, we have discussed the absence or presence of a relationship between two random variables, independence or nonindependence. But if there is a relationship, the relationship may be strong or weak. In this