Bayesian estimation of observation error covariance matrix in the equatorial Pacific

Genta Ueno, The Inst of Statistical Math, Tokyo, Japan
Abstract:
We develop a Bayesian technique for estimating the parameters in the observation noise covariance matrix Rt for ensemble data assimilation. We design a posterior distribution by using the ensemble-approximated likelihood and a Wishart prior distribution and present an iterative algorithm for parameter estimation. The present algorithm is identified as the expectation-maximization (EM) algorithm for a Gaussian mixture model and can estimate a number of parameters in Rt. The algorithm is an extension of that by Ueno and Nakamura (2014) for maximum-likelihood estimation. An advantage of the proposed method is that Rt can be estimated online, and more importantly, the temporal smoothness of Rt can be controlled by adequately choosing two parameters of the prior distribution, the covariance matrix S and the number of degrees of freedom ν. The parameters S and ν may vary with the time at which Rt is estimated. The ν parameter can be objectively estimated by maximizing the marginal likelihood. The present formalism can handle cases in which the number of data points or the data positions varies with time, the former case of which is exemplified in the experiments. We present an application to a coupled atmosphere-ocean model under each of the following assumptions: Rt is a scalar multiple of a fixed matrix (RttΣ, where αt is the scalar parameter and Σ is the fixed matrix), Rt is a diagonal matrix, Rt has fixed eigenvectors, or Rt has no specific structure. We verify that the proposed algorithm works well and that only a limited number of iterations are necessary. When Rt has one of the structures mentioned above, by assuming the prior covariance matrix to be the previous estimate, namely S=\hat{R}t-1, we obtain the Bayesian estimate of Rt that varies smoothly in time compared to the maximum-likelihood estimate at each time. When Rt has no specific structure, we need to regularize S=\hat{R}t-1 to maintain the positive-definiteness of S. Through twin experiments, we find that the best estimate of Rt is, in general, obtained by a combination of structure-free Rt and tapered S by the decorrelation lengths of half the size of the model ocean basin. From experiments using real observations, we find that the estimates of the structured Rt (having fixed structure Σ and being diagonal), lead to overfitting of the data compared to the structure-free Rt.