RumEnKF: running very large Ensemble Kalman Filters by forgetting what you just did.

Wednesday, 17 December 2014
Rolf Hut, Delft University of Technology, Delft, Netherlands, Barnabas A. Amisigo, Council for Scientific and Industrial Research (CSIR), Water Research Institute (WRI), Accra, Ghana, Susan C Steele-Dunne, Delft University of Technology, Delft, 5612, Netherlands and Nick Van De Giesen, Delft University of Technology, Faculty of Civil Engineering and Geosciences, Delft, 5612, Netherlands
The eWaterCycle project works towards running an operational hyper-resolution hydrological global model, assimilating incoming satellite data in real time, and making 14-day predictions of floods and droughts.

A problem encountered in the eWaterCycle project is that the computer memory needed to store a single ensemble member becomes so large that storing enough ensemble members to run the EnKF is impossible, even when using mitigating strategies such as covariance inflation or localization.

Reduction of Used Memory Ensemble Kalman Filtering (RumEnKF) is introduced as a variant of the Ensemble Kalman Filter (EnKF). RumEnKF differs from the EnKF in that it does not store the entire ensemble, but only the first two moments of the ensemble distribution. In this way, the number of ensemble members that can be calculated depends less on available memory and mainly on available computing power (CPU). RumEnKF is designed to make optimal use of current-generation supercomputer architectures, where the number of available floating point operations (flops) increases more rapidly than the available memory, and where inter-node communication can quickly become a bottleneck. In this presentation, two simple models (auto-regressive and Lorenz) are used to show that RumEnKF performs similarly to the EnKF. Furthermore, it is shown that increasing the ensemble size has a similar impact on the estimation error of the two algorithms.
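The core idea of keeping only the first two moments can be sketched as a streaming (one-member-at-a-time) update of the ensemble mean and covariance, so that each member can be discarded as soon as it has been used. The sketch below uses a Welford-style incremental update; the function name and structure are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def streaming_moments(members, n_state):
    """Accumulate ensemble mean and covariance one member at a time.

    Illustrative sketch: memory use is O(n_state^2) for the moments,
    independent of the number of ensemble members, because each member
    is discarded after its contribution is added.
    """
    mean = np.zeros(n_state)
    m2 = np.zeros((n_state, n_state))  # running sum of outer-product deviations
    count = 0
    for x in members:  # `members` can be a generator; members need not coexist
        count += 1
        delta = x - mean
        mean += delta / count            # incremental mean update
        m2 += np.outer(delta, x - mean)  # Welford-style covariance update
    cov = m2 / (count - 1)               # unbiased sample covariance
    return mean, cov
```

The result matches the moments computed from a fully stored ensemble (e.g. `np.mean` and `np.cov`), without ever holding more than one member in memory at a time.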

In these preliminary results, RumEnKF reduces memory use compared to the EnKF when the number of ensemble members is greater than half the number of state variables. Future research will focus on strategies to further reduce the memory burden of running non-linear data assimilation on very large models.
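The crossover point quoted above follows from a rough float count, under the assumption that the EnKF stores the full ensemble (state size times members) while RumEnKF stores the mean plus a symmetric covariance matrix. The helper below is a back-of-the-envelope sketch of that comparison, not a measured memory profile.

```python
def memory_floats(n_state, n_members):
    """Rough float counts (illustrative assumption, not measured figures).

    EnKF:    n_state * n_members floats for the full ensemble.
    RumEnKF: n_state for the mean, plus n_state*(n_state+1)/2 for a
             symmetric covariance stored as its upper triangle.
    """
    enkf = n_state * n_members
    rumenkf = n_state + n_state * (n_state + 1) // 2
    return enkf, rumenkf

# Solving n*N > n + n*(n+1)/2 for the member count N gives
# N > 1 + (n+1)/2, i.e. roughly half the number of state variables,
# consistent with the crossover stated in the text.
```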