H11D-1369
A ‘Large Catchment Sample’ Investigation of the Performance, Reliability and Robustness of Two Conceptual Rainfall-Runoff Models.
Abstract:
This presentation reports a ‘large-sample’ (2050 watersheds from around the world) intercomparison of two conceptual rainfall-runoff (CRR) model structures (GRX and MRX, both used in many research and operational applications), using a multi-objective evaluation process and two-way split-sample testing. The watershed sample represents a diversity of hydro-meteorological and measurement contexts, thereby lending confidence and statistical robustness to the analyses and the inferences drawn from them (Gupta et al., 2014). Overall, our results indicate that:
(i) The GRX and MRX models provide similar levels of long-term (aggregate) performance during both calibration and evaluation (as assessed by the various metrics computed); hence, their simulations (and biases) are strongly correlated.
(ii) Both models suffer from a lack of robustness when simulating water balance and streamflow variability, although simulation of streamflow timing and rate of change is quite good (as indicated by the long-term linear correlation between observed and simulated time-series).
(iii) The MRX model tends to provide better and more robust reproduction of short-term processes than the GRX model (as indicated by the distributions of short-term linear correlations between observed and simulated time-series).
(iv) Variations in model performance from one period to another appear to be mainly due to temporal variations in the hydro-meteorological properties of each period.
(v) The use of the Kling-Gupta Efficiency (KGE; Gupta et al., 2009) as an objective function tends to reduce long-term model bias (on average), which is not the case with the Nash-Sutcliffe Efficiency (NSE; Nash & Sutcliffe, 1970).
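For reference, the two criteria compared in (v) are defined in the cited papers as follows (both are optimal at a value of 1):

\[ \mathrm{NSE} = 1 - \frac{\sum_{t=1}^{T}\left(Q_{\mathrm{obs},t} - Q_{\mathrm{sim},t}\right)^{2}}{\sum_{t=1}^{T}\left(Q_{\mathrm{obs},t} - \overline{Q}_{\mathrm{obs}}\right)^{2}}, \qquad \mathrm{KGE} = 1 - \sqrt{(r-1)^{2} + (\alpha-1)^{2} + (\beta-1)^{2}}, \]

where \(r\) is the linear correlation between simulated and observed streamflow, \(\alpha = \sigma_{\mathrm{sim}}/\sigma_{\mathrm{obs}}\) is the ratio of their standard deviations, and \(\beta = \mu_{\mathrm{sim}}/\mu_{\mathrm{obs}}\) is the ratio of their means.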
Further, our results clearly show that sub-period variability in model performance can be quite high (especially for the water balance), and that aggregate long-term (full-period) statistics tend to over-estimate the true predictive performance of a hydrologic model. Our preliminary results indicate that there may be value in computing and examining distributions of the various model performance metrics over sub-period samples, instead of relying on a single period-average deterministic value. This could greatly improve model diagnosis by helping to reveal situations involving model structural inadequacy, non-stationarity of hydro-meteorological processes, and/or problems with the measurement data.
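To make the sub-period idea concrete, a minimal sketch is given below. It assumes hypothetical daily series q_obs and q_sim for a single watershed (synthetic here, purely for illustration) and annual sub-periods; the variable names and the annual blocking are illustrative choices, not the authors' actual implementation. The sketch contrasts the single full-period KGE with the distribution of annual KGE values.

import numpy as np
import pandas as pd

def kge(obs, sim):
    """Kling-Gupta Efficiency (Gupta et al., 2009)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]      # linear correlation
    alpha = sim.std() / obs.std()        # variability ratio
    beta = sim.mean() / obs.mean()       # mean (bias) ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# Hypothetical daily series for one watershed (synthetic placeholder data).
dates = pd.date_range("1990-01-01", "2009-12-31", freq="D")
rng = np.random.default_rng(0)
q_obs = pd.Series(rng.gamma(2.0, 3.0, dates.size), index=dates)
q_sim = q_obs * 0.9 + rng.normal(0.0, 1.0, dates.size)  # imperfect "model"

# Single aggregate (full-period) score ...
kge_full = kge(q_obs, q_sim)

# ... versus the distribution of scores over annual sub-periods.
kge_by_year = pd.Series({
    year: kge(obs_year, q_sim.loc[obs_year.index])
    for year, obs_year in q_obs.groupby(q_obs.index.year)
})

print(f"full-period KGE: {kge_full:.2f}")
print(f"annual KGE: median {kge_by_year.median():.2f}, "
      f"range [{kge_by_year.min():.2f}, {kge_by_year.max():.2f}]")

Examining the spread of kge_by_year (rather than kge_full alone) is one simple way to expose the sub-period variability discussed above.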