zbMATH — the first resource for mathematics

Exploiting Hessian matrix and trust-region algorithm in hyperparameters estimation of Gaussian process. (English) Zbl 1097.65019
A nonparametric Bayesian approach to Gaussian process regression is considered. The training data \((x_i,t_i)_{i=1}^N\) consist of outputs \(t_i\in\mathbb{R}\) and input vectors \(x_i\in \mathbb{R}^L\). Their distribution is described by \[ \mathbf{P}(t\mid x,\Theta)\propto \exp\left( -{1\over 2}t^TC^{-1}(\Theta)t \right), \] where \(C\) is a matrix with entries
\[ c(x_i,x_j,\Theta)= \alpha\exp \left( -{1\over 2}\sum_{l=1}^L (x_i^{(l)}-x_j^{(l)})^2d_l \right) +\nu\delta_{ij}, \]
\(\Theta=(\alpha,d_1,\dots,d_L,\nu)\) being the vector of hyperparameters. It is proposed to estimate \(\Theta\) by maximum likelihood, i.e., to minimize the negative log-likelihood \(L(\Theta)={1\over 2}\log\det C(\Theta)+{1\over 2}t^TC^{-1}(\Theta)t\). The authors derive the form of the Hessian matrix of \(L\) and propose a second-order trust-region algorithm for the minimization of \(L\). Results of numerical simulations are presented.
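The estimation scheme above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy data, the log-parametrisation of \(\Theta\), and the use of SciPy's trust-region solver with a quasi-Newton (BFGS) Hessian approximation are choices of this sketch, whereas the paper works with the exact Hessian. The gradient, however, is the standard analytic one, \(\partial L/\partial\theta_k = \tfrac12\,\mathrm{tr}(C^{-1}\partial_k C) - \tfrac12\,t^TC^{-1}(\partial_k C)C^{-1}t\).

```python
import numpy as np
from scipy.optimize import minimize, BFGS

# Toy 1-D data (L = 1): noisy samples of a smooth function (assumption of this sketch).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(30, 1))
t = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(30)
N, L = X.shape
diff2 = (X[:, None, :] - X[None, :, :]) ** 2      # (N, N, L) squared differences

def neg_log_lik_and_grad(theta):
    """Negative log-likelihood L(Theta) = 1/2 log det C + 1/2 t^T C^{-1} t
    and its gradient.  theta = (log alpha, log d_1, ..., log d_L, log nu):
    log-parametrised so the positivity constraints disappear."""
    alpha, nu = np.exp(theta[0]), np.exp(theta[-1])
    d = np.exp(theta[1:-1])
    K = alpha * np.exp(-0.5 * np.einsum('ijl,l->ij', diff2, d))
    C = K + nu * np.eye(N)
    Cinv = np.linalg.inv(C)
    a = Cinv @ t
    nll = 0.5 * np.linalg.slogdet(C)[1] + 0.5 * t @ a
    # dC/d(log alpha) = K,  dC/d(log d_l) = -1/2 d_l K * diff2_l,  dC/d(log nu) = nu I
    dCs = ([K]
           + [-0.5 * d[l] * K * diff2[:, :, l] for l in range(L)]
           + [nu * np.eye(N)])
    grad = np.array([0.5 * np.trace(Cinv @ dC) - 0.5 * a @ dC @ a for dC in dCs])
    return nll, grad

theta0 = np.zeros(L + 2)
res = minimize(neg_log_lik_and_grad, theta0, jac=True,
               method='trust-constr', hess=BFGS())
alpha_hat, nu_hat = np.exp(res.x[0]), np.exp(res.x[-1])
print(f'minimised NLL: {res.fun:.3f}  alpha={alpha_hat:.3f}  nu={nu_hat:.4f}')
```

The log-parametrisation is a common device: it keeps \(\alpha\), the \(d_l\), and \(\nu\) positive without constrained optimization, at the cost of a chain-rule factor in each derivative.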

65C60 Computational problems in statistics (MSC2010)
62G08 Nonparametric regression and quantile regression