Chapter 1. Linear Regression with One Predictor Variable
1.1 Statistical Relation Between Two Variables
To motivate statistical relationships, let us consider a mathematical relation between two mathematical variables $x$ and $y$. This may be represented by a functional relation
$$y = f(x), \tag{1}$$
which says that, given a value of $x$, there is a unique value of $y$, which can be determined exactly.
For example, the relation between the number of hours ($x$) a car is driven and the distance ($y$) travelled may be given by $y = cx$, where $c$ is the constant speed. There are many examples of such relations in the physical and other sciences; they are known as deterministic or exact relationships.
• To define a statistical relationship, we replace the mathematical variables by random variables $X$ and $Y$ and add a random error component $\varepsilon$ representing the deviation from the true relation. The statistical relation is given by
$$y = f(x) + \varepsilon. \tag{2}$$
Here $(x, y)$ represents a typical value of the bivariate random variable $(X, Y)$.
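The contrast between the exact relation (1) and the statistical relation (2) can be sketched in a few lines of Python. This is only an illustration: the speed c = 60, the normal error distribution, and its standard deviation of 5 are assumptions made here, not values given in these notes.

```python
import random

def deterministic(x, c=60.0):
    """Exact relation y = c * x: each x gives exactly one y."""
    return c * x

def statistical(x, c=60.0, sigma=5.0, rng=random):
    """Relation y = c * x + error, with an assumed normal error term."""
    return c * x + rng.gauss(0.0, sigma)

rng = random.Random(1)  # fixed seed so the run is reproducible
hours = [1.0, 2.0, 3.0, 4.0]

exact = [deterministic(x) for x in hours]
noisy = [statistical(x, rng=rng) for x in hours]

for x, y_exact, y_noisy in zip(hours, exact, noisy):
    print(f"x = {x:.1f}  exact y = {y_exact:6.1f}  observed y = {y_noisy:6.1f}")
```

Repeating the call to `deterministic` with the same x always returns the same y, whereas repeated calls to `statistical` scatter around that value.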
• Such a relation is also known as a stochastic relation. It models a random phenomenon in which
(i) the $Y$ values tend to vary around a smooth function, and
(ii) there is a random scatter of points around this systematic component.
Figure 1.1 presents a plot of the heights and weights of 23 students enrolled in my 2001 class of STAT360 (the data are given in Table 1.1).
• This graph shows the tendency of the data to vary around a straight line. This tendency of weight to vary as a function of height is called a linear trend. Since the points do not fall exactly on a straight line, it may be suitable to use a statistical relationship, i.e.
$$y = \beta_0 + \beta_1 x + \varepsilon,$$
where $\beta_0$ and $\beta_1$ are unknown constants, $x$ represents height, $y$ represents weight, and $\varepsilon$ represents a random error.
• The subject matter of this course is the study of such relationships.
Figure 1.1 Scatter Plot of Height-Weight Data of STAT360 2001 Class
Table 1.1 Heights and Weights of 23 Students in STAT360 Class of 2001

Student ID   Height (cm)   Weight (kg)
4126548      183.00        77.09
4281675      177.80        90.70
4100212      172.72        81.63
4411919      167.64        49.88
5936748      162.56        45.35
5919460      162.56        54.42
5945267      172.72        72.56
4276051      177.80        74.83
4084489      172.72        54.42
4139615      185.42        92.97
5928281      180.34        81.63
5922763      172.72        80.72
3630137      180.34        70.29
4751612      158.00        55.00
4767098      163.00        50.00
4767209      158.00        42.00
4766733      182.00        72.00
4766164      166.00        60.00
4763661      168.00        62.00
4766970      163.00        55.00
4763734      170.00        65.00
3952312      172.72        95.23
5928389      162.56        72.56
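To see the linear trend in Table 1.1 numerically, one can fit a straight line to the 23 points. The sketch below uses ordinary least squares, which is an assumption made here because this section has not yet said how the unknown constants are to be estimated; the names `b0`, `b1` and `residuals` are illustrative only.

```python
# Heights (cm) and weights (kg) of the 23 students in Table 1.1.
heights = [183.00, 177.80, 172.72, 167.64, 162.56, 162.56, 172.72, 177.80,
           172.72, 185.42, 180.34, 172.72, 180.34, 158.00, 163.00, 158.00,
           182.00, 166.00, 168.00, 163.00, 170.00, 172.72, 162.56]
weights = [77.09, 90.70, 81.63, 49.88, 45.35, 54.42, 72.56, 74.83,
           54.42, 92.97, 81.63, 80.72, 70.29, 55.00, 50.00, 42.00,
           72.00, 60.00, 62.00, 55.00, 65.00, 95.23, 72.56]

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(weights) / n

# Sums of squares and cross-products about the means.
s_xx = sum((x - mean_x) ** 2 for x in heights)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights))

# Least-squares estimates of the slope and intercept (assumed method).
b1 = s_xy / s_xx
b0 = mean_y - b1 * mean_x

# Residuals: the scatter of the points around the fitted line.
residuals = [y - (b0 + b1 * x) for x, y in zip(heights, weights)]

print(f"n = {n}, slope b1 = {b1:.3f} kg/cm, intercept b0 = {b0:.2f} kg")
```

The fitted line plays the role of the systematic component, and the residuals play the role of the random scatter around it.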
1.2 Regression Models
• Terminology: Regression
The conditional expectation
$$m(x) = E(Y \mid X = x)$$
in a bivariate setting is called the regression of $Y$ on $X$. The term regression was used by Sir Francis Galton (1822-1911) in studying the heights of offspring as a function of the heights of their parents, in a paper entitled "Regression towards mediocrity in hereditary stature" (Nature, vol. 15, pp. 507-510).
• In this paper Galton reported his discovery that "the offsprings did not resemble their parents in size but tend to be always more mediocre [i.e. more average] than they - to be smaller than the parents if parents were large; to be larger than parents if they were very small..."
Thus the random variable $Y$ may be assumed to vary around its mean $m(x)$ as a function of $X$, and, denoting the random deviation $Y - m(x)$ by $\varepsilon$, we can write
$$Y = m(x) + \varepsilon. \tag{3}$$
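The idea that Y varies around its conditional mean m(x) can be checked by simulation. In the sketch below, the particular regression function m(x) = 85 + 0.5x, the normal error with standard deviation 6, and the chosen x values are all hypothetical choices for illustration; none of them come from Galton's data or from these notes.

```python
import random
from statistics import mean

def m(x):
    """Hypothetical regression function E(Y | X = x), chosen for illustration."""
    return 85.0 + 0.5 * x

rng = random.Random(2024)      # fixed seed for reproducibility
x_values = [160.0, 170.0, 180.0]
n_per_x = 5000                 # replicates of Y at each fixed x

cond_means = {}
for x in x_values:
    # Y = m(x) + error, with an assumed normal error of mean zero.
    ys = [m(x) + rng.gauss(0.0, 6.0) for _ in range(n_per_x)]
    cond_means[x] = mean(ys)

for x in x_values:
    print(f"x = {x:.0f}: m(x) = {m(x):.1f}, sample mean of Y = {cond_means[x]:.2f}")
```

At each fixed x the individual Y values scatter, but their average settles close to m(x), which is what it means for m(x) to be the regression of Y on X.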
•
Note that the probability distribution of