The Interface Between Data and Predictions through Machine Learning and Bayesian Networks

Monday, 15 December 2014: 9:15 AM
Michael N Fienen, USGS Wisconsin Water Science Center, Middleton, WI, United States and Bernard T Nolan, USGS Headquarters, Reston, VA, United States
The interface between models and data is often explored through parameter estimation and uncertainty quantification techniques. In the management decision-making context, it is often desirable to forge a path from data, through analysis, to predictions that guide decisions. This path benefits from the insights gained in process modeling and parameter estimation, but often must be completed on a shorter timeframe. Metamodeling (emulating the behavior of a process using a statistical model) provides the opportunity to make predictions in areas where data are sparse or a process model is not available. Statistical modeling or machine learning techniques can create a link between data and predictions based on correlations informed by process model results. The process model informs the correlations either by indicating which data points inform various predictions or, more explicitly, by serving as the target that the statistical model emulates.
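As a rough illustration of the metamodeling idea, the sketch below fits a simple statistical emulator to a handful of runs of a process model and then predicts at an unsampled location. Everything in it is an assumption made for illustration: `process_model` is a hypothetical stand-in for an expensive simulator, and ordinary least-squares regression is swapped in for the neural networks and Bayesian networks discussed in the abstract.

```python
import math


def process_model(x):
    """Hypothetical stand-in for an expensive process model (assumed, not from the abstract)."""
    return 2.0 * x + 1.0 + 0.1 * math.sin(x)


def fit_linear_emulator(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def emulate(params, x):
    """Evaluate the fitted emulator at x."""
    slope, intercept = params
    return slope * x + intercept


# Run the (hypothetical) process model at a few design points.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [process_model(x) for x in train_x]

# Train the statistical emulator on the process model output.
emulator = fit_linear_emulator(train_x, train_y)

# Predict at a location the process model was never run at.
prediction = emulate(emulator, 2.5)
emulation_error = abs(prediction - process_model(2.5))
```

Once fitted, the emulator is evaluated without rerunning the process model, which is what allows predictions on a shorter timeframe. A richer emulator would take its place in practice.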

As part of the USGS National Water Quality Assessment, several datasets, with and without related process models, are explored using Artificial Neural Networks, other machine learning techniques, and Bayesian Networks. The techniques vary in their ability to characterize the uncertainty of predictions and in the structure of correlative connections within the networks. Performance is evaluated for both calibration and prediction using cross-validation.
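The distinction between calibration and prediction performance can be sketched with k-fold cross-validation. This is a minimal example under stated assumptions, not the evaluation procedure used in the study: the data are synthetic, the model is a least-squares line, and the function names and the choice of four folds are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(params, xs, ys):
    """Root-mean-square error of the fitted line on the given points."""
    slope, intercept = params
    squared = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))


def k_fold_cross_validation(xs, ys, k):
    """Return mean calibration (training) and prediction (held-out) RMSE over k folds."""
    n = len(xs)
    calibration_errors = []
    prediction_errors = []
    for fold in range(k):
        # Hold out every k-th point, starting at index `fold`.
        held_out = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        test_x = [xs[i] for i in sorted(held_out)]
        test_y = [ys[i] for i in sorted(held_out)]
        params = fit_line(train_x, train_y)
        calibration_errors.append(rmse(params, train_x, train_y))
        prediction_errors.append(rmse(params, test_x, test_y))
    return sum(calibration_errors) / k, sum(prediction_errors) / k


# Synthetic data (assumed): a linear trend plus a deterministic perturbation.
xs = [float(i) for i in range(12)]
ys = [2.0 * x + 1.0 + 0.3 * math.sin(3.0 * x) for x in xs]

cal_rmse, pred_rmse = k_fold_cross_validation(xs, ys, k=4)
```

Comparing `cal_rmse` with `pred_rmse` indicates how much of the fit carries over to data the model did not see during calibration.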