For a parametric model, the goal is to find a predefined parametric relationship between features and targets; once that is done, the model no longer needs the training data for prediction. The features, $x^*_i$, $i = 1, \ldots, n^*$, provide all the information we need to predict the $y^*_i$'s.
However, in the Gaussian Process Regression model, the predictions in eq. 2.31 and eq. 2.32 require information from the training set in order to obtain the posterior distribution of the test set.
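To illustrate this dependence on the training data, the following is a minimal sketch of the standard GP regression posterior (the general form behind eq. 2.31 and eq. 2.32), using an assumed squared-exponential kernel and illustrative data; the function names and numerical values are ours, not from the text:

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel k(a, b) = exp(-(a - b)^2 / (2 l^2))."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior mean and covariance of a GP at the test inputs.

    Note that both quantities involve the *entire* training set
    (x_train, y_train), unlike a parametric model, which discards
    the training data after fitting.
    """
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    alpha = np.linalg.solve(K, y_train)
    mean = K_s.T @ alpha                              # posterior mean
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)      # posterior covariance
    return mean, cov

# Illustrative data: three noisy observations of sin(x).
x_train = np.array([-1.0, 0.0, 1.0])
y_train = np.sin(x_train)
mean, cov = gp_posterior(x_train, y_train, np.array([0.5]))
```

The posterior variance `cov` is smaller than the prior variance at the test point, reflecting the information gained from the training set.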
2.3 Prediction from Gaussian Process Regression
From eq. 2.31, we know that the posterior distribution of the test data is Gaussian, with mean and variance given in eq. 2.32. But if one wants an explicit prediction for the target function rather than a distribution, what value should be used? One intuitive answer is the mean of the posterior distribution. In practice, this prediction is appropriate for most real-life tasks. Rasmussen and Williams provide an in-depth explanation in [13, sec. 2.4]; we present a summary of the rationale in the following.
To find an optimal prediction value, we need a way to measure our performance. Let us define a \emph{loss function}, $L(\hat{y}, y)$, which specifies the loss between the prediction, $\hat{y}$, and the true value, $y$. The loss function can be the \emph{mean squared error} or the \emph{mean absolute error}, which are both symmetric loss functions.
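Concretely, these two symmetric losses can be written as follows (a sketch; the function names are ours):

```python
import numpy as np

def squared_error(y_hat, y):
    """L(y_hat, y) = (y_hat - y)^2 -- symmetric in the sign of the error."""
    return (y_hat - y) ** 2

def absolute_error(y_hat, y):
    """L(y_hat, y) = |y_hat - y| -- also symmetric."""
    return np.abs(y_hat - y)

# Over- and under-predicting by the same amount costs the same:
assert squared_error(2.0, 3.0) == squared_error(4.0, 3.0) == 1.0
assert absolute_error(2.0, 3.0) == absolute_error(4.0, 3.0) == 1.0
```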
One wants to give a prediction value that produces the minimum possible loss. But how do we achieve that when we do not know the true value? According to [13, sec. 2.4], we can minimize the expected loss by averaging the loss with respect to the posterior distribution, which is
\[
\tilde{R}_L(\hat{y} \mid x_*) = \int L(y_*, \hat{y}) \, p(y_* \mid x_*) \, dy_* . \tag{2.33}
\]
Our best prediction is the one that minimizes this expected loss, i.e.
\[
y_{\mathrm{optimal}} \mid x_* = \operatorname*{argmin}_{\hat{y}} \tilde{R}_L(\hat{y} \mid x_*) . \tag{2.34}
\]
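Equations 2.33 and 2.34 can be checked numerically: under an assumed Gaussian posterior (the mean and standard deviation below are illustrative, not values from the text), a grid search over candidate predictions recovers the posterior mean as the minimizer of the expected squared loss:

```python
import numpy as np

# Assumed Gaussian posterior p(y* | x*); mu and sigma are illustrative.
mu, sigma = 1.3, 0.5
ys = np.linspace(mu - 6 * sigma, mu + 6 * sigma, 4001)
dy = ys[1] - ys[0]
p = np.exp(-0.5 * ((ys - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def expected_loss(y_hat):
    # Numerical version of eq. 2.33 with the squared-error loss.
    return np.sum((y_hat - ys) ** 2 * p) * dy

# Eq. 2.34: pick the candidate prediction with the smallest expected loss.
candidates = np.linspace(mu - 2.0, mu + 2.0, 2001)
risks = np.array([expected_loss(c) for c in candidates])
y_optimal = candidates[np.argmin(risks)]
```

The minimizer `y_optimal` lands on the posterior mean `mu`, and the minimum expected loss equals the posterior variance `sigma**2`, in agreement with the claim that the posterior mean is optimal under squared-error loss.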
For the \emph{mean squared error} loss function, the minimum occurs at the mean of $y_*$. For the \emph{mean