H11N-04:
Model-free data analysis for source separation based on Non-Negative Matrix Factorization and k-means clustering (NMFk)

Monday, 15 December 2014: 8:45 AM
Velimir V Vesselinov and Boian Alexandrov, LANL, Santa Fe, NM, United States
Abstract:
The identification of the physical sources causing spatial and temporal fluctuations of state variables such as river stage levels and aquifer hydraulic heads is challenging. The fluctuations can be caused by variations in natural and anthropogenic sources such as precipitation events, infiltration, groundwater pumping, barometric pressures, etc. The source identification and separation can be crucial for conceptualization of the hydrological conditions and characterization of system properties. If the original signals that cause the observed state-variable transients can be successfully "unmixed", decoupled physics models may then be applied to analyze the propagation of each signal independently.

We propose a new model-free inverse analysis of transient data based on Non-negative Matrix Factorization (NMF) method for Blind Source Separation (BSS) coupled with k-means clustering algorithm, which we call NMFk. NMFk is capable of identifying a set of unique sources from a set of experimentally measured mixed signals, without any information about the sources, their transients, and the physical mechanisms and properties controlling the signal propagation through the system. A classical BSS conundrum is the so-called “cocktail-party” problem where several microphones are recording the sounds in a ballroom (music, conversations, noise, etc.). Each of the microphones is recording a mixture of the sounds. The goal of BSS is to “unmix'” and reconstruct the original sounds from the microphone records. Similarly to the “cocktail-party” problem, our model-freee analysis only requires information about state-variable transients at a number of observation points, m, where m > r, and r is the number of unknown unique sources causing the observed fluctuations.

We apply the analysis on a dataset from the Los Alamos National Laboratory (LANL) site. We identify and estimate the impact and sources are barometric pressure and water-supply pumping effects. We also estimate the location of the water-supply pumping wells based on the available data. The possible applications of the NMFk algorithm are not limited to hydrology problems; NMFk can be applied to any problem where temporal system behavior is observed at multiple locations and an unknown number of physical sources are causing these fluctuations.