Student-t Processes as Alternatives to Gaussian Processes

Amar Shah, Andrew Gordon Wilson, Zoubin Ghahramani
University of Cambridge

Abstract

We investigate the Student-t process as an alternative to the Gaussian process as a nonparametric prior over functions. We derive closed form expressions for the marginal likelihood and predictive distribution of a Student-t process, by integrating away an inverse Wishart process prior over the covariance kernel of a Gaussian process model. We show surprising equivalences between different hierarchical Gaussian process models leading to Student-t processes, and derive a new sampling scheme for the inverse Wishart process, which helps elucidate these equivalences. Overall, we show that a Student-t process can retain the attractive properties of a Gaussian process – a nonparametric representation, analytic marginal and predictive distributions, and easy model selection through covariance kernels – but has enhanced flexibility, and predictive covariances that, unlike a Gaussian process, explicitly depend on the values of training observations. We verify empirically that a Student-t process is especially useful in situations where there are changes in covariance structure, or in applications such as Bayesian optimization, where accurate predictive covariances are critical for good performance. These advantages come at no additional computational cost over Gaussian processes.

Appearing in Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS) 2014, Reykjavik, Iceland. JMLR: W&CP volume 33. Copyright 2014 by the authors.

1 INTRODUCTION

Gaussian processes are rich distributions over functions, which provide a Bayesian nonparametric approach to regression. Owing to their interpretability, non-parametric flexibility, large support, consistency,
simple exact learning and inference procedures, and impressive empirical performance [Rasmussen, 1996], Gaussian processes as kernel machines have steadily grown in popularity over the last decade.

At the heart of every Gaussian process (GP) is a parametrized covariance kernel, which determines the properties of likely functions under a GP. Typically simple parametric kernels, such as the Gaussian (squared exponential) kernel, are used, and their parameters are determined through marginal likelihood maximization, having analytically integrated away the Gaussian process. However, a fully Bayesian nonparametric treatment of regression would place a nonparametric prior over the Gaussian process covariance kernel, to represent uncertainty over the kernel function, and to reflect the natural intuition that the kernel does not have a simple parametric form.
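The marginal likelihood maximization described above can be illustrated with a minimal sketch. The following NumPy-only code (not from the paper; function names and hyperparameter values are illustrative) computes the GP log marginal likelihood under a squared exponential kernel with Gaussian observation noise, using the standard formula log p(y | X, θ) = −½ yᵀK⁻¹y − ½ log|K| − (n/2) log 2π, via a Cholesky factorization for numerical stability:

```python
import numpy as np

def sq_exp_kernel(X1, X2, lengthscale=1.0, signal_var=1.0):
    """Squared exponential (Gaussian) covariance kernel between row-vector inputs."""
    d2 = (np.sum(X1**2, axis=1)[:, None]
          + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return signal_var * np.exp(-0.5 * d2 / lengthscale**2)

def gp_log_marginal_likelihood(X, y, lengthscale=1.0, signal_var=1.0, noise_var=1e-2):
    """log p(y | X, theta) for a zero-mean GP with iid Gaussian noise."""
    n = y.shape[0]
    K = sq_exp_kernel(X, X, lengthscale, signal_var) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)                       # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))            # = 0.5 * log|K|
            - 0.5 * n * np.log(2.0 * np.pi))
```

Kernel parameter estimation then amounts to maximizing this quantity, e.g. by evaluating it over a grid of lengthscales (or passing its negative to a gradient-based optimizer) and selecting the best value.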