STAT 331: Applied Linear Models
Tutorial 1: Solutions
version: 2015-09-24
·
07:53:12
Q1.
Install
R
on your computer, which can be downloaded from
here
.
Note:
the default
interface to
R
is extremely cumbersome and you should never use it. Instead, use
RStudio
as your interface.
Solution:
In addition to downloading
R
and
RStudio
, here are a couple of other pointers to
get started:
•
The official
Intro to
R
manual.
•
A
reference sheet
of commonly used functions.
•
Typing
?
before any function, e.g.,
?cov
, pulls up the help page on that function.
Q2.
Use
R
to generate
N
= 1000 random variables
x
1
, . . . , x
N
iid
∼ N
(0
,
5
2
).
Then, with
parameters
β
0
= 1,
β
1
= 2,
σ
= 3
.
5, generate data from the linear model
y
i
=
β
0
+
β
1
x
i
+
ǫ
i
,
ǫ
i
iid
∼ N
(0
, σ
2
)
.
(1)
Solution:
The function to simulate Normals is called
rnorm
:
args
(rnorm)
function
(n, mean = 0, sd = 1)
NULL
That is, the function generates
n
iid Normals with mean
mean
and standard deviation
sd
,
which by default take the values 0 and 1 respectively. The code to generate
x
1
, . . . , x
1000
iid
∼
N
(0
,
25) is thus
N
<-
1000
x
<-
rnorm(N, sd = 5)
Note that
R
functions accept arguments in many different ways. If you don’t name the
argument, the default order is assumed (i.e., the first argument was
n = N
). If you don’t
specify an argument, the default value is assumed (i.e.,
mean = 0
). Finally, you can spec-
ify arguments by name and then change the order.
For example an equivalent way of
generating the Normals is:
x
<-
rnorm(mean = 0, sd = 5, n = N)
1

Tutorial 1: Solutions
The vector
x
now contains 1000 iid Normals:
length(x)
# length of vector
[1]
1000
mean(x)
# sample mean
[1]
-0.00191465
sd(x)
# sample standard deviation
[1]
5.08595
head
(x)
# first 6 elements
[1]
1.992412
2.458632
2.366446 -1.733301 -3.962583
2.719608
Next, let’s simulate the response variables
y
i
ind
∼
N
(
β
0
+
β
1
x
i
, σ
2
), with (
β
0
, β
1
, σ
) =
(1
,
2
,
3
.
5). In many programming languages, you would do this using a for-loop:
# parameter values
beta0
<-
1
beta1
<-
2
sigma
<-
3.5
# generate response
y
<-
rep(
NA
, len = N)
# declare a vector of length N
for
(ii
in
1:N) {
y[ii]
<-
rnorm(n = 1, mean = beta0 + beta1 * x[ii], sd = sigma)
}
In
R
, however, many of the functions have built-in vectorization. That is, you could simply
do
# parameter values
beta0
<-
1
beta1
<-
2
sigma
<-
3.5
# generate response
y
<-
rnorm(N, mean = beta0 + beta1*x, sd = sigma)
The space for
y
is automatically allocated, an
R
knows to use each value of
beta0 + beta1*x
as the mean and the common value of
sigma
to generate each individual
y
i
. The latter
method is both shorter to code and actually computes faster than the former.
Use vector-
ization as much as possible
instead of for-loops in
R
for all-around better programming.
a. Calculate the MLE’s
ˆ
β
0
and
ˆ
β
1
.
Solution:
The MLE’s of
β
0
and
β
1
can be calculated as follows:
# MLE of beta1
xbar
<-
mean(x)
ybar
<-
mean(y)
Sxx
<-
sum((x-xbar)^2)
2

STAT 331: Applied Linear Models
Sxy
<-
sum((y-ybar)*(x-xbar))
beta1.hat
<-
Sxy/Sxx
# or equivalently
beta1.hat
<-
cov(x,y)/var(x)
# each of these just divides by N-1
beta1.hat - Sxy/Sxx
# check that they’re the same
[1]
-4.440892e-16
# MLE of beta0
beta0.hat
<-
mean(y) - beta1.hat * mean(x)
Here’s a trick to display the MLEs nicely which might come in handy for assignments.

#### You've reached the end of your free preview.

Want to read all 9 pages?

- Fall '08
- YuliaGel
- Linear Regression, Normal Distribution, Regression Analysis, Applied Linear Models