Hunting Solomonoff’s Swans: Exploring the Boundary Between Physics and Statistics in Hydrological Modeling

Tuesday, 16 December 2014
Grey S Nearing, Science Applications International Corporation Greenbelt, Greenbelt, MD, United States; NASA Goddard Space Flight Center, Greenbelt, MD, United States
Statistical models consistently outperform conceptual models in the short term; however, to account for a nonstationary future (or an unobserved past), scientists prefer to base predictions on unchanging and commutable properties of the universe – i.e., physics.

The problem with physically-based hydrology models is, of course, that they aren’t really based on physics – they are based on statistical approximations of physical interactions, and we almost uniformly lack an understanding of the entropy associated with these approximations. Thermodynamics is successful precisely because entropy statistics are computable for homogeneous (well-mixed) systems, and ergodic arguments explain the success of Newton’s laws in describing systems that are fundamentally quantum in nature. Unfortunately, similar arguments do not hold for systems like watersheds, which are heterogeneous across a wide range of scales.

Ray Solomonoff formalized the situation in 1968 by showing that given infinite evidence, simultaneously minimizing model complexity and entropy in predictions always leads to the best possible model. The open question in hydrology is about what happens when we don’t have infinite evidence – for example, when the future will not look like the past, or when one watershed does not behave like another. How do we isolate stationary and commutable components of watershed behavior?
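
As a rough illustration of what trading model complexity against entropy in predictions can look like in practice, the sketch below scores polynomial fits with a two-part code length: bits for the residuals under an assumed Gaussian error model, plus bits for the parameters. This is a computable, BIC-style stand-in, not Solomonoff's formulation (which is uncomputable) and not the analogue presented in this talk. The function name `description_length`, the synthetic data, and the penalty form are all assumptions made for illustration.

```python
import math
import numpy as np

def description_length(x, y, degree):
    """Return (fit_bits, model_bits) for a polynomial of the given degree.

    Hypothetical helper: a BIC-style two-part code length, defined only
    up to an additive constant, so fit_bits may be negative.
    """
    n = len(x)
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    rss = float(np.sum(residuals ** 2))
    # Entropy-like term: cost of encoding residuals under a Gaussian error model.
    fit_bits = 0.5 * n * math.log2(rss / n)
    # Complexity term: cost of encoding (degree + 1) parameters.
    model_bits = 0.5 * (degree + 1) * math.log2(n)
    return fit_bits, model_bits

# Synthetic, deterministic data (an assumption, not observations):
# a quadratic signal plus a small high-frequency perturbation.
x = np.linspace(-1.0, 1.0, 50)
y = x ** 2 + 0.1 * np.sin(37.0 * x)

scores = {d: description_length(x, y, d) for d in range(7)}
# The preferred model minimizes the sum of the two terms.
best_degree = min(scores, key=lambda d: sum(scores[d]))
print(best_degree)
```

Raising the degree never increases the residual term but always increases the parameter term, so the minimum of their sum marks the point where extra complexity stops paying for itself. With finite data, that point depends on the sample at hand, which is the difficulty the questions above raise.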

I propose that one possible answer to this dilemma lies in a formal combination of physics and statistics. In this talk I outline my recent analogue of Solomonoff’s idea (Solomonoff’s theorem was digital), which allows us to quantify the complexity/entropy tradeoff in a way that is intuitive to physical scientists. I show how to formally combine “physical” and statistical methods for model development in a way that allows us to derive the theoretically best possible model for any given physics approximation(s) and available observations. Finally, I apply an analogue of Solomonoff’s theorem to evaluate the tradeoff between model complexity and predictive power.