For a parametric model, the goal is to find the predefined parametric relationship between features and targets; once that is done, the model no longer needs the training data for prediction. The test features, x*_i, i = 1, ..., n*, provide all the information required to predict the targets y*_i. In a Gaussian process regression model, by contrast, the predictions in eq. 2.31 and eq. 2.32 require the training set in order to form the posterior distribution over the test set.

2.3 Prediction from Gaussian Process Regression

From eq. 2.31 we know that the posterior distribution of the test data is Gaussian, with mean and variance given in eq. 2.32. But if one wants an explicit prediction of the target function rather than a distribution, what value should be reported? An intuitive answer is the mean of the posterior distribution, and in practice this prediction is appropriate for most real-life tasks. Rasmussen and Williams provide an in-depth explanation in [13, sec. 2.4]; we summarize the rationale below.

To find an optimal prediction value, we need a way to measure performance. Let us define a loss function, L(y, ŷ), which specifies the loss incurred by predicting ŷ when the true value is y. The loss function can be, for example, the squared error or the absolute error, both of which are symmetric. One wants to give the prediction that incurs the minimum possible loss, but how can that be achieved when the true value is unknown? Following [13, sec. 2.4], we minimize the expected loss, obtained by averaging the loss with respect to the posterior distribution,

    \tilde{R}_L(\hat{y} \mid x_*) = \int L(y_*, \hat{y}) \, p(y_* \mid x_*) \, dy_* .    (2.33)

The best prediction is then the one that minimizes this expected loss, i.e.

    y_{\mathrm{optimal}} \mid x_* = \operatorname*{argmin}_{\hat{y}} \tilde{R}_L(\hat{y} \mid x_*) .    (2.34)
For the squared error loss, the minimum of eq. 2.33 occurs at the mean of y_*; for the absolute error loss, it occurs at the median of y_*. In our case the distribution of y_* is Gaussian, so the mean and median coincide. More generally, for any symmetric loss function paired with a symmetric posterior distribution, the optimal prediction occurs at the mean of the distribution. For asymmetric loss functions, the optimal prediction can be computed from eq. 2.33 and eq. 2.34. For more detail, see [14].
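Equations 2.33 and 2.34 can be checked numerically. The sketch below is illustrative and not part of the thesis: it approximates the expected loss by Monte Carlo sampling from a hypothetical Gaussian posterior (mean 2, standard deviation 1) and minimizes over a grid of candidate predictions. Squared error recovers the posterior mean, while an asymmetric loss shifts the optimum away from it.

```python
import numpy as np

# Hypothetical posterior p(y* | x*): Gaussian with illustrative mean 2.0, std 1.0.
mu, sigma = 2.0, 1.0
samples = np.random.default_rng(0).normal(mu, sigma, 20_000)

def expected_loss(y_hat, loss):
    """Monte Carlo estimate of eq. 2.33: average the loss over the posterior."""
    return loss(samples, y_hat).mean()

candidates = np.linspace(-1.0, 5.0, 401)

# Squared error loss: the minimizer of eq. 2.34 is the posterior mean.
sq = [expected_loss(c, lambda y, yh: (y - yh) ** 2) for c in candidates]
best_sq = candidates[np.argmin(sq)]

# Asymmetric loss penalizing under-prediction 3x more than over-prediction:
# the optimum moves above the mean (to the 0.75 quantile for this pinball loss).
asym = [expected_loss(c, lambda y, yh: np.where(y > yh, 3.0 * (y - yh), yh - y))
        for c in candidates]
best_asym = candidates[np.argmin(asym)]
```

Because the posterior here is symmetric, the squared-error and absolute-error optima coincide, as stated above; only the asymmetric loss separates them.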
3 Model Selection

When modelling the target function as a zero-mean Gaussian process, i.e. f(x) ~ GP(0, k(x, x')), the main task is to find a configuration of the covariance matrix, generated by a kernel function k and a set of hyperparameters, that gives the best performance. Model selection is thus a combination of choosing a good kernel function and optimizing the hyperparameters of that kernel. In this section, we discuss several aspects of model selection: the marginal likelihood, the covariance matrix, kernel functions, and hyperparameter optimization.
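As a preview of the marginal-likelihood criterion discussed in this chapter, the sketch below evaluates log p(y | X, θ) for a zero-mean GP. It assumes an RBF kernel and uses illustrative names; it is a minimal sketch, not the thesis implementation. Hyperparameter optimization then amounts to maximizing this quantity over the length scale, signal variance, and noise variance.

```python
import numpy as np

def log_marginal_likelihood(X, y, length_scale, variance, noise):
    """Log marginal likelihood of a zero-mean GP with an RBF kernel:

        log p(y | X) = -1/2 y^T K^{-1} y - 1/2 log|K| - n/2 log(2 pi),

    where K is the kernel matrix plus noise on the diagonal.
    """
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2.0 * X @ X.T
    K = variance * np.exp(-0.5 * sq / length_scale**2) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)                       # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # log|K| = 2 * sum(log diag(L)), so -1/2 log|K| = -sum(log diag(L)).
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(X) * np.log(2.0 * np.pi))
```

In practice one would pass this (or its gradient) to a generic optimizer and search over (length_scale, variance, noise), which is the hyperparameter-optimization step named above.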