APPLIED STATISTICS
Inferential Tools for Simple Linear Regression
Dr Tao Zou
Research School of Finance, Actuarial Studies & Statistics
The Australian National University
Last Updated: Wed Aug 9 16:17:16 2017
Overview
Sampling Distribution of Estimation
Standard Error of Estimation
Hypothesis Testing
Confidence Intervals and Prediction Intervals
References
1. F.L. Ramsey and D.W. Schafer (2012). Chapter 7 of The Statistical Sleuth.
2. The slides are made by R Markdown.
Distinguish Parameters and Estimation
SLR model $\mu\{Y \mid X\} = \beta_0 + \beta_1 X$ and real data $(X_1, Y_1), \cdots, (X_n, Y_n)$.
| Parameters | Estimation (notation hat "ˆ") |
|---|---|
| $\beta_1$ | $\dfrac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$ (denoted by $\hat{\beta}_1$) |
| $\beta_0$ | $\bar{Y} - \hat{\beta}_1 \bar{X}$ (denoted by $\hat{\beta}_0$) |
| unknown for real data | can be computed based on real data |
| the value is unique | can be different for different datasets |
Hence, if we have another sample/dataset, e.g., $(X_{n+1}, Y_{n+1}), \cdots, (X_{n+n}, Y_{n+n})$, we will obtain different realisations of $\hat{\beta}_0$ and $\hat{\beta}_1$.
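The closed-form estimates in the table can be checked numerically. The following is a minimal sketch, assuming a small simulated dataset: the seed, the sample size, the range of `X`, and the true parameter values are illustrative choices, not taken from the slides. It computes $\hat{\beta}_1$ and $\hat{\beta}_0$ directly from the formulas and compares them with the coefficients returned by `lm()`.

```r
# Illustrative data: seed, sample size and parameter values are assumptions
set.seed(123)
n = 50
X = runif(n, 0, 10)
Y = 2 + 1 * X + rnorm(n)   # true beta0 = 2, beta1 = 1, sigma = 1

# Least squares estimates from the closed-form formulas
hatbeta1 = sum((X - mean(X)) * (Y - mean(Y))) / sum((X - mean(X))^2)
hatbeta0 = mean(Y) - hatbeta1 * mean(X)

# The same estimates from lm()
fit = lm(Y ~ X)

print(c(hatbeta0, hatbeta1))
print(coef(fit))
```

Changing the seed gives a different dataset and hence different realisations of the two estimates, while the parameters 2 and 1 stay fixed.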
Sampling Distributions of $\hat{\beta}_0$ and $\hat{\beta}_1$
The distributions of the realisations are the sampling distributions.
We consider the sampling distributions of $\hat{\beta}_0$ and $\hat{\beta}_1$ given the values of the explanatory variables.
It can be shown mathematically that the sampling distributions of $\hat{\beta}_0$ and $\hat{\beta}_1$ are both normal.
Sampling Distributions of $\hat{\beta}_0$ and $\hat{\beta}_1$ (Cont'd)
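Sampling distribution of $\hat{\beta}_1$:
$\hat{\beta}_1$ is normally distributed;
Mean $= E(\hat{\beta}_1) = \beta_1$;
Spread $= \mathrm{SD}(\hat{\beta}_1) = \sigma \sqrt{\dfrac{1}{(n-1) s_X^2}}$.
Sampling distribution of $\hat{\beta}_0$:
$\hat{\beta}_0$ is normally distributed;
Mean $= E(\hat{\beta}_0) = \beta_0$;
Spread $= \mathrm{SD}(\hat{\beta}_0) = \sigma \sqrt{\dfrac{1}{n} + \dfrac{\bar{X}^2}{(n-1) s_X^2}}$.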
Here, $s_X^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$.
Knowing the sampling distributions allows us to make inferences about $\beta_0$ and $\beta_1$.
Remark: SLR model assumptions 1 & 2 & 3 can be described by $Y = \beta_0 + \beta_1 X + \mathcal{E}$, where $\mathcal{E} \sim N(0, \sigma^2)$. It follows that, given $X$, $Y \sim N(\beta_0 + \beta_1 X, \sigma^2)$.
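As a numerical check of the two spread formulas, the sketch below evaluates them for an assumed design (`X = 1:100` and `sigma = 1`, chosen for illustration). It compares them with the square roots of the diagonal of $\sigma^2 (M^\top M)^{-1}$, the standard matrix expression for the variance of the least squares estimates, where $M$ is the design matrix with a column of ones and a column of $X$. The names `sd_beta0`, `sd_beta1`, `M`, and `sd_matrix` are introduced here and do not come from the slides.

```r
# Assumed design and error SD (illustrative values)
n = 100
X = 1:n
sigma = 1

s2X = var(X)   # sample variance of X, with divisor n - 1

# Spread formulas for the sampling distributions
sd_beta1 = sigma * sqrt(1 / ((n - 1) * s2X))
sd_beta0 = sigma * sqrt(1 / n + mean(X)^2 / ((n - 1) * s2X))

# Matrix form: Var(hatbeta) = sigma^2 * (M'M)^{-1}
M = cbind(1, X)
sd_matrix = sqrt(diag(sigma^2 * solve(t(M) %*% M)))

print(c(sd_beta0, sd_beta1))
print(unname(sd_matrix))
```

The two printed vectors should agree up to floating-point error. Both spreads are proportional to $\sigma$, which is unknown for real data.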
Example: Simulation for SLR
$\mu\{Y \mid X\} = \beta_0 + \beta_1 X$.
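1. Choose the values of the "unknown" parameters yourself; here $\beta_0 = 2$ and $\beta_1 = 1$.

```r
rm(list = ls())        # clear the workspace
beta0 = 2; beta1 = 1   # true parameter values used in the simulation
```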
2. Randomly generate $R = 1000$ repeated samples of $\{Y_i, X_i\}_{i=1}^{n}$. For each sample, obtain the statistics required in your analysis. Here we consider the least squares estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ as the statistics.
```r
n = 100
R = 1000
hatbeta0 = rep(0, R)   # space to store the different realisations of the estimates
hatbeta1 = rep(0, R)
set.seed(1)
for (r in 1:R) {
  X = 1:n                          # our X values
  errors = rnorm(n)                # generate a set of errors, which implies sigma = 1
  Y = beta0 + beta1 * X + errors   # generate a set of response values
  SLRfit = lm(Y ~ X)               # fit the SLR
  hatbeta0[r] = coef(SLRfit)[1]    # get the estimates
  hatbeta1[r] = coef(SLRfit)[2]
}
```
3. The sampling distribution of the statistics can be described by the $R = 1000$ different statistics values from the $R = 1000$ repeated samples.
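Step 3 can be sketched as follows. The block reruns the simulation so that it is self-contained, then compares the mean and standard deviation of the $R$ estimates with the values implied by the normal sampling distributions. The names `sigma`, `theory_sd0`, and `theory_sd1` are introduced here for illustration, and the agreement is only approximate because $R$ is finite.

```r
# Simulation settings, as in steps 1 and 2 (sigma = 1 is implied by rnorm(n))
beta0 = 2; beta1 = 1; sigma = 1
n = 100; R = 1000
X = 1:n
hatbeta0 = numeric(R)
hatbeta1 = numeric(R)

set.seed(1)
for (r in 1:R) {
  Y = beta0 + beta1 * X + rnorm(n, sd = sigma)
  est = coef(lm(Y ~ X))
  hatbeta0[r] = est[1]
  hatbeta1[r] = est[2]
}

# Spreads implied by the sampling distributions
theory_sd1 = sigma * sqrt(1 / ((n - 1) * var(X)))
theory_sd0 = sigma * sqrt(1 / n + mean(X)^2 / ((n - 1) * var(X)))

# Empirical versus theoretical mean and spread
print(c(mean(hatbeta0), beta0))
print(c(mean(hatbeta1), beta1))
print(c(sd(hatbeta0), theory_sd0))
print(c(sd(hatbeta1), theory_sd1))

# Histograms of the realisations (drawn only in an interactive session)
if (interactive()) {
  hist(hatbeta0, main = "Realisations of hatbeta0")
  hist(hatbeta1, main = "Realisations of hatbeta1")
}
```

Each printed pair should be close; increasing `R` would be expected to tighten the agreement.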
“set.seed()”
```r
set.seed(1)
head(rnorm(n))
## [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078 -0.8204684
set.seed(2)
head(rnorm(n))
## [1] -0.89691455  0.18484918  1.58784533 -1.13037567 -0.08025176  0.13242028
set.seed(1)
head(rnorm(n))
## [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078 -0.8204684
```
| seed=1 | seed=2 | ··· |
|---|---|---|
| -0.6264538 | -0.89691455 | ··· |
| 0.1836433 | 0.18484918 | ··· |
| -0.8356286 | 1.58784533 | ··· |
| 1.5952808 | -1.13037567 | ··· |
| 0.3295078 | -0.08025176 | ··· |
| -0.8204684 | 0.13242028 | ··· |
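The reproducibility illustrated above can also be checked programmatically. A minimal sketch, where the variable names `a`, `b`, and `a_again` are illustrative: drawing with the same seed twice should give identical vectors, while a different seed should give a different vector.

```r
n = 100

set.seed(1); a = rnorm(n)         # draws under seed 1
set.seed(2); b = rnorm(n)         # draws under seed 2
set.seed(1); a_again = rnorm(n)   # seed 1 again

print(identical(a, a_again))   # same seed: TRUE
print(identical(a, b))         # different seeds: FALSE
```

This is why the simulation calls `set.seed(1)` once before the loop: rerunning the whole script reproduces the same $R$ samples.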