Implementation of Similarity Based Kriging in Open Source Software and Application to Uncertainty Quantification and Reduction in Hydrogeological Inversion

Wednesday, 17 December 2014
Robert Komara and David Ginsbourger, University of Bern, Bern, Switzerland
We present the implementation of Similarity Based Kriging (SBK). This approach extends Gaussian process regression (GPR) methods, typically restricted to Euclidean spaces, to spaces that are non-Euclidean or perhaps even non-metric. SBK was inspired by problems in aquifer modeling, where the inputs of numerical simulations are typically curves and parameter fields, and where predicting scalar or vector outputs by Kriging with such very high-dimensional inputs may at first seem infeasible.

SBK combines ideas from the distance-based set-up of Scheidt and Caers (2009) with GPR, and allows Kriging predictions to be calculated from similarities between inputs alone rather than from their high-dimensional representation. Implemented as open source software, the proposed approach includes automated construction of SBK models and provides diagnostics to assess model quality in terms of both covariance fitting and internal/external prediction validation.
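To illustrate the idea of predicting from dissimilarities alone, the following is a minimal sketch and not the authors' software. It assumes a classical MDS embedding of the full dissimilarity matrix followed by simple kriging with a unit-variance Gaussian kernel; the function names, the kernel choice, the constant mean estimated by the training average, and the default parameter values are all assumptions made for illustration.

```python
import numpy as np

def classical_mds(D, dim):
    """Embed n items in `dim` dimensions from an n x n dissimilarity matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    w, V = np.linalg.eigh(B)                   # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]            # keep the `dim` largest
    w_top = np.clip(w[idx], 0.0, None)         # negative values arise when D is non-Euclidean
    return V[:, idx] * np.sqrt(w_top)

def gaussian_kernel(A, B, length_scale):
    """Unit-variance Gaussian covariance between rows of A and rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq / length_scale ** 2)

def sbk_predict(D, y_train, train_idx, test_idx, dim=2, length_scale=1.0, nugget=1e-8):
    """Kriging mean and variance at test items, using only the dissimilarities D.

    Assumption: all items (training and test) are embedded jointly by MDS,
    and the constant mean is estimated by the training average.
    """
    Z = classical_mds(D, dim)
    Zt, Zs = Z[train_idx], Z[test_idx]
    K = gaussian_kernel(Zt, Zt, length_scale) + nugget * np.eye(len(train_idx))
    k_star = gaussian_kernel(Zs, Zt, length_scale)
    mu = y_train.mean()
    alpha = np.linalg.solve(K, y_train - mu)
    mean = mu + k_star @ alpha
    var = 1.0 - np.einsum("ij,ji->i", k_star, np.linalg.solve(K, k_star.T))
    return mean, np.clip(var, 0.0, None)
```

Note that the inputs themselves never appear in `sbk_predict`; only the matrix `D` of pairwise dissimilarities is needed, which is what makes curve-valued or field-valued inputs tractable in this kind of set-up.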

Covariance hyperparameters can be estimated both by maximum likelihood and by leave-one-out cross-validation, relying in both cases on efficient formulas and on a hybrid genetic optimization algorithm that uses derivatives. The determination of the best embedding dimension for classical multidimensional scaling (MDS) and non-metric MDS of the data will be investigated. Applications of the software to real-life data examples in Euclidean and non-Euclidean (dis)similarity settings will be covered, touching on aquifer modeling, hydrogeological forecasting, and sequential inverse problem solving.
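As an illustration of what efficient leave-one-out formulas can look like, the sketch below uses the standard closed-form identities for a zero-mean Gaussian process with known covariance matrix K: the leave-one-out residual is [K^-1 y]_i / [K^-1]_ii and the leave-one-out variance is 1 / [K^-1]_ii. Whether the software uses exactly these expressions is an assumption; the function names are hypothetical, and the brute-force version is included only as a check.

```python
import numpy as np

def loo_fast(K, y):
    """Leave-one-out residuals and variances from a single matrix inverse."""
    K_inv = np.linalg.inv(K)
    d = np.diag(K_inv)
    residuals = (K_inv @ y) / d    # y_i minus the prediction made without point i
    variances = 1.0 / d            # predictive variance at point i without point i
    return residuals, variances

def loo_brute_force(K, y):
    """Same quantities, refitting n times; used here only to verify loo_fast."""
    n = len(y)
    residuals, variances = np.empty(n), np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        w = np.linalg.solve(K[np.ix_(keep, keep)], K[keep, i])  # kriging weights
        residuals[i] = y[i] - w @ y[keep]
        variances[i] = K[i, i] - w @ K[keep, i]
    return residuals, variances

def negative_log_likelihood(K, y):
    """Gaussian negative log marginal likelihood via a Cholesky factorization."""
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * len(y) * np.log(2 * np.pi)
```

Either `negative_log_likelihood` or a score built from `loo_fast` (for example the mean squared residual) could serve as the objective handed to an optimizer over the covariance hyperparameters; the fast version costs one inverse per evaluation instead of n separate solves.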

In the last case, a novel approach will be presented in which a variant of the expected improvement criterion is used for choosing several points at a time. Both this step and the covariance hyperparameter estimation described above parallelize naturally, and we demonstrate how to save computation time by optimally distributing function evaluations over multiple cores or processors.
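The abstract does not specify which variant of expected improvement is used. As a hedged illustration of a criterion that scores several candidate points jointly, the sketch below estimates the well-known multi-point expected improvement, E[max(0, best - min_j Y_j)] for minimization, by Monte Carlo. It assumes the Kriging model supplies a joint Gaussian prediction (mean vector and covariance matrix) at the batch; the function name and defaults are hypothetical.

```python
import numpy as np

def multipoint_ei_mc(mean, cov, best, n_samples=20000, seed=0):
    """Monte Carlo estimate of the multi-point expected improvement (minimization).

    mean : (q,) predictive means at the q candidate points
    cov  : (q, q) predictive covariance between them
    best : smallest objective value observed so far
    """
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    rng = np.random.default_rng(seed)
    # Small jitter keeps the Cholesky factorization stable for near-singular cov.
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(len(mean)))
    Y = mean + rng.standard_normal((n_samples, len(mean))) @ L.T   # joint samples
    improvement = np.maximum(best - Y.min(axis=1), 0.0)            # batch improvement
    return improvement.mean()
```

Because the samples are independent, both the Monte Carlo draws and the simulator runs at the selected batch of points can be spread across cores; for a single point (q = 1) the estimate should agree with the usual closed-form expected improvement up to sampling error.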