A machine learning approach to anomaly detection and tide forecasting in a coastal sensor network

Lara Reichmann1, Isabel Houghton1, David Uminsky2, Connor Swanson2, Stanley H.I. Lio3 and Brian T Glazer3, (1)University of San Francisco, The Data Institute, San Francisco, CA, United States, (2)University of San Francisco, Data Institute, San Francisco, CA, United States, (3)University of Hawaii at Manoa, Honolulu, HI, United States
Abstract:
In-situ, wireless environmental sensors present a unique opportunity for real-time applications, ranging from adaptive sampling and decision making to early detection of natural disasters. These applications require quality assurance (QA) and quality control (QC) practices to ensure high-quality data. QC traditionally relies on expert domain knowledge and human evaluation during post-processing, which makes it incompatible with real-time detection in streaming data. One of the challenges of real-time QC is the risk of discarding good data that deviate from historical patterns when they actually correspond to an extreme natural event.

The goal of this project is to implement QC and forecasts of coastal environmental variables in real time for a network of affordable coastal watershed water level sensors growing from a focused pilot deployment in Hawaii (smartcoastlines.org). This sensor network is unique; it provides high spatial and temporal coverage of physical and biogeochemical ocean variables at a groundbreakingly low manufacturing cost. Our machine learning approach to anomaly detection complements traditional rules-based models, using spatial sensor redundancy to make predictions for neighboring sensors. A regression tree model predicts future values at each node and calculates the deviations between predicted and actual values to produce an anomaly score. The second goal of this project, water level (WL) prediction, is challenging due to the lack of long-term WL time series at each new low-cost node, which are installed incrementally farther away from federal instruments (e.g., NOAA). We apply Fourier decomposition to find the most important seasonality periods for each time series and use them to make WL predictions. Our approach has revealed novel tidal harmonics occurring in traditional Hawaiian fishponds. Moreover, this approach decreased the root mean square error of 3- and 7-day-ahead forecasts by 20-40% relative to other available models such as UTide. We built a user-friendly interface with forecast visualizations that allows stakeholders to interact with data in real time. This is yet another step toward the goal of democratization of ocean observing science and data interpretation.
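
The Fourier-based forecasting step can be illustrated with a minimal sketch. The abstract does not describe the implementation, so everything here is an assumption: the sketch picks the strongest peaks of an FFT amplitude spectrum as the "most important periods", fits sine and cosine terms at those periods by least squares, and extrapolates the fit. The function names (`dominant_periods`, `fit_harmonics`, `predict`) are hypothetical, and the data are synthetic with illustrative 12 h and 24 h periods rather than measured water levels or real tidal constituents.

```python
import numpy as np

def dominant_periods(series, dt_hours, n_periods=3):
    """Return the n_periods strongest periods (hours) in the FFT amplitude spectrum."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                                  # remove the mean level
    amp = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=dt_hours)       # cycles per hour
    top = np.argsort(amp[1:])[::-1][:n_periods] + 1   # skip the zero-frequency bin
    return 1.0 / freqs[top]

def _design(t_hours, periods):
    """Design matrix: a constant column plus a cos/sin pair for each period."""
    t = np.asarray(t_hours, dtype=float)
    cols = [np.ones_like(t)]
    for p in periods:
        w = 2.0 * np.pi / p
        cols += [np.cos(w * t), np.sin(w * t)]
    return np.column_stack(cols)

def fit_harmonics(t_hours, series, periods):
    """Least-squares fit of the harmonic model to the observations."""
    y = np.asarray(series, dtype=float)
    coef, *_ = np.linalg.lstsq(_design(t_hours, periods), y, rcond=None)
    return coef

def predict(t_hours, periods, coef):
    """Evaluate the fitted harmonic model at arbitrary (including future) times."""
    return _design(t_hours, periods) @ coef

def true_level(t):
    """Synthetic water level in metres (assumed signal, for illustration only)."""
    return 1.0 + 0.5 * np.sin(2 * np.pi * t / 12.0) + 0.2 * np.cos(2 * np.pi * t / 24.0)

# Ten days of hourly synthetic observations with a little measurement noise.
rng = np.random.default_rng(0)
t_obs = np.arange(240.0)
observed = true_level(t_obs) + rng.normal(0.0, 0.01, t_obs.size)

periods = dominant_periods(observed, dt_hours=1.0, n_periods=2)
coef = fit_harmonics(t_obs, observed, periods)

# Forecast the next three days and score it against the noise-free signal.
t_future = np.arange(240.0, 312.0)
forecast = predict(t_future, periods, coef)
rmse = float(np.sqrt(np.mean((forecast - true_level(t_future)) ** 2)))
print("periods (h):", np.sort(periods), "3-day RMSE (m):", rmse)
```

In this toy case the record length is an exact multiple of both periods, so the FFT peaks fall on exact frequency bins. With real, short records the peaks are smeared across neighboring bins, which is one reason a least-squares refit at the selected periods is used here instead of extrapolating the raw FFT coefficients.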