Diagnosis of Insidious Data Disasters

Thursday, 18 December 2014
Jessica D Lundquist (1), Nicholas E Wayand (1), Adam Massmann (1), Martyn P Clark (2), Fred Lott (3) and Nicoleta C Cristea (1); (1) University of Washington, Seattle, WA, United States, (2) NCAR, Boulder, CO, United States, (3) King County Environmental Lab, Seattle, WA, United States
Measurements and modeling have gone hand in hand since hydrology began as a science. In most early work, the same person took the measurements and developed the model, iterating between them until all the information collectively made sense. Over time, research has become more specialized. Many people now use a model developed by someone else, compare its simulations to data collected by yet another person, pronounce success and proceed with research if the two match, and face a black hole of uncertainty if they don’t. In many cases, the model is calibrated to achieve a match. In perhaps many more cases, the work is shelved and the apparent failure is swept under the rug.

We present two case studies of apparent modeling failure in which all efforts at model calibration failed, traditional data quality-control measures detected no problems, and only extreme stubbornness and repeated iteration between modeling and observations led to discovery of the root of the problem. These two cases are by no means a complete sampling of the data disasters that have occurred or that may occur in the future, and they are probably outliers that will (hopefully) never occur again. The point is to show that such odd cases do occur and that, while the specific oddity varies widely, odd cases are likely much more common than we are aware of. To quote Arthur Conan Doyle’s Sherlock Holmes: “when you have eliminated the impossible, whatever remains, however improbable, must be the truth.”

The first case concerns the water balance in the snow-fed Tuolumne River, Sierra Nevada, California, modeled with the Distributed Hydrology Soil Vegetation Model (DHSVM, Wigmosta et al. 1994). The second case concerns the energy balance at Snoqualmie Pass, Washington, modeled with the Structure for Understanding Multiple Modeling Alternatives (SUMMA, Clark et al., submitted). The figure presents the fundamental problems: in the Tuolumne (case 1), streamflow in one year was off by a factor of two; at Snoqualmie (case 2), nighttime surface temperatures were biased by about 10°C. The reasons for and solutions to these problems will be presented, and they are not what you might first guess.