Statistical Regression Analysis
July 27, 2016
Chapter 1
Probability Distributions, Estimation, and Testing

1.1 Introduction
Here we introduce probability distributions and basic estimation and testing methods. Random variables are outcomes of an experiment or data-generating process, where the outcome is not known in advance, although the set of possible outcomes is. Random variables can be discrete or continuous. Discrete random variables can take on only a finite or countably infinite set of possible outcomes; continuous random variables can take on values along a continuum. In many cases, variables of one type may be treated as, or reported as, the other type. In general, we will use uppercase letters (such as Y) to represent random variables and lowercase letters (such as y) to represent specific outcomes. Not all books (particularly applied statistics books) follow this convention.
1.1.1 Discrete Random Variables/Probability Distributions
In many applications, the result of the data-generating process is the count of events of some sort. In some cases, a fixed number of trials is conducted, and the outcome of each trial is observed as a "Success" or "Failure" (a binary outcome); the number of trials ending in Success is recorded. Alternatively, a series of trials may be conducted until a pre-selected number of Successes is observed. In other settings, the number of events of interest is counted in a fixed amount of time or space, without breaking the domain into a set of distinct trials.
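The three counting settings described above can be illustrated by direct simulation. The following is a minimal sketch, assuming a hypothetical success probability, success target, and event rate (none of these values come from the text; the Poisson-type setting is approximated by many rare-event micro-trials):

```python
import random

random.seed(1)
p = 0.3  # hypothetical success probability (illustrative only)

# Setting 1: fixed number of trials, count the Successes
n = 10
binomial_count = sum(random.random() < p for _ in range(n))

# Setting 2: run trials until a pre-selected number of Successes is reached,
# recording how many trials were needed
r = 3
trials = 0
successes = 0
while successes < r:
    trials += 1
    if random.random() < p:
        successes += 1

# Setting 3: events counted over a fixed window with no distinct trials,
# approximated here by m tiny sub-intervals each with a small event chance
rate = 4.0
m = 100_000
poisson_count = sum(random.random() < rate / m for _ in range(m))

print(binomial_count, trials, poisson_count)
```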
For discrete random variables, we will use p(y) to represent the probability that the random variable Y takes on the value y. We require that all such probabilities be bounded between 0 and 1 (inclusive) and that they sum to 1:

\[
P\{Y = y\} = p(y), \qquad 0 \le p(y) \le 1, \qquad \sum_y p(y) = 1
\]
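These two requirements are easy to verify for any concrete pmf. As a sketch, take a fair six-sided die (a hypothetical example, not from the text), where p(y) = 1/6 for y = 1, ..., 6; exact fractions avoid floating-point rounding in the sum:

```python
from fractions import Fraction

# pmf of a fair six-sided die: p(y) = 1/6 for y in 1..6
p = {y: Fraction(1, 6) for y in range(1, 7)}

# every probability lies in [0, 1] ...
assert all(0 <= prob <= 1 for prob in p.values())
# ... and the probabilities sum to exactly 1
assert sum(p.values()) == 1
```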
The cumulative distribution function is the probability that a random variable takes on a value less than or equal to a specific value y*. It is a nondecreasing function that starts at 0 and increases to 1, and we will denote it as F(y*). For discrete random variables it is a step function, taking a step at each point where p(y) > 0:

\[
F(y^*) = P(Y \le y^*) = \sum_{y \le y^*} p(y)
\]
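A minimal sketch of this step-function CDF, assuming a small hypothetical pmf (the support points and probabilities are chosen for illustration only):

```python
def cdf(p, y_star):
    """F(y*) = P(Y <= y*): sum of p(y) over support points y <= y*."""
    return sum(prob for y, prob in p.items() if y <= y_star)

# hypothetical pmf on {1, 2, 3}
p = {1: 0.2, 2: 0.5, 3: 0.3}

# F steps up only where p(y) > 0 and is flat in between
print(cdf(p, 0), cdf(p, 1), cdf(p, 2.5), cdf(p, 3))
```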
The mean or expected value (μ) of a random variable is its long-run average if the experiment were conducted repeatedly ad infinitum. The variance (σ²) is the average squared difference between the random variable and its mean, and measures the dispersion within the distribution. The standard deviation (σ) is the positive square root of the variance and is in the same units as the data.

\[
\mu_Y = E\{Y\} = \sum_y y\,p(y)
\]
\[
\sigma^2_Y = V\{Y\} = E\left\{(Y - \mu_Y)^2\right\} = \sum_y (y - \mu_Y)^2\,p(y)
\]
\[
\sigma_Y = +\sqrt{\sigma^2_Y}
\]
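All three quantities can be computed directly from a pmf. As a sketch, take the pmf of the number of heads in two fair coin flips (a standard example, not from the text):

```python
import math

# pmf of the number of heads in two fair coin flips
p = {0: 0.25, 1: 0.5, 2: 0.25}

mu = sum(y * prob for y, prob in p.items())               # mu_Y = sum of y p(y)
var = sum((y - mu) ** 2 * prob for y, prob in p.items())  # sigma^2_Y
sd = math.sqrt(var)                                       # sigma_Y = +sqrt(var)

print(mu, var, sd)  # → 1.0 0.5 0.7071067811865476
```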
Note that for any function of Y, the expected value and variance of the function are computed as follows:

\[
E\{g(Y)\} = \sum_y g(y)\,p(y) = \mu_{g(Y)}
\]
\[
V\{g(Y)\} = E\left\{\left(g(Y) - \mu_{g(Y)}\right)^2\right\} = \sum_y \left(g(y) - \mu_{g(Y)}\right)^2 p(y)
\]
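The same kind of direct computation works for a function of Y. The sketch below uses a hypothetical pmf (heads in two fair coin flips) and the arbitrary choice g(y) = y²; note the result is consistent with the identity V{Y} = E{Y²} − μ_Y²:

```python
# hypothetical pmf: number of heads in two fair coin flips
p = {0: 0.25, 1: 0.5, 2: 0.25}

def g(y):
    return y ** 2  # arbitrary function of Y, chosen for illustration

mu_g = sum(g(y) * prob for y, prob in p.items())                 # E{g(Y)}
var_g = sum((g(y) - mu_g) ** 2 * prob for y, prob in p.items())  # V{g(Y)}

print(mu_g, var_g)  # → 1.5 2.25
```

Here E{Y²} = 1.5 and μ_Y = 1, so E{Y²} − μ_Y² = 0.5, matching the variance of Y computed from its own definition.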
For any constants a and b, the mean and variance of the linear transformation a + bY are:

\[
E\{a + bY\} = a + b\,\mu_Y \qquad V\{a + bY\} = b^2\,\sigma^2_Y \qquad \sigma_{a+bY} = |b|\,\sigma_Y
\]
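The standard linear-transformation rules, E{a + bY} = a + b·μ_Y and V{a + bY} = b²·σ²_Y, can be checked numerically. A sketch with a hypothetical pmf and arbitrary constants (none of these values come from the text):

```python
# hypothetical pmf and arbitrary constants, for illustration only
p = {0: 0.25, 1: 0.5, 2: 0.25}
a, b = 3.0, -2.0

mu = sum(y * q for y, q in p.items())               # mu_Y
var = sum((y - mu) ** 2 * q for y, q in p.items())  # sigma^2_Y

# compute the mean and variance of a + bY directly from its own distribution
mu_lin = sum((a + b * y) * q for y, q in p.items())
var_lin = sum((a + b * y - mu_lin) ** 2 * q for y, q in p.items())

print(mu_lin, a + b * mu)     # both equal: E{a + bY} = a + b*mu_Y
print(var_lin, b ** 2 * var)  # both equal: V{a + bY} = b^2 * sigma^2_Y
```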