Echopype: Interoperable and Scalable Processing of Ocean Sonar Data
Wu-Jung Lee, Applied Physics Laboratory, University of Washington, Seattle, United States; Valentina Staneva, University of Washington, eScience Institute, Seattle, WA, United States; and Kavin Nguyen, University of Washington, Department of Physics, United States
Abstract:
The recent broader availability of autonomous ocean sonar systems has created a data deluge. Despite their potential to advance our understanding of the marine ecosystem, these new data remain significantly under-utilized. One root cause is the lack of an interoperable data format and of scalable analysis workflows that can be integrated with other oceanographic observations and adapt well to the rapidly increasing data volume. At present, most sonar data are stored in manufacturer-specific formats and analyzed using software packages that are mostly closed-source or written in proprietary languages. Many are GUI-based, which facilitates visual data exploration but hinders reproducibility. Furthermore, none of the existing packages supports parallel computation with random-access file formats. The sheer tedium of data wrangling has significantly diverted the research community's efforts away from answering scientific questions.
We address these challenges by developing an open-source software package, echopype, that leverages existing distributed computing libraries in the scientific Python stack, such as Xarray and Dask. Echopype provides tools to convert manufacturer-specific data files to netCDF, which enables explicit and scalable computation on sonar data in a labeled, multi-dimensional format that is familiar to the wider oceanography and earth science communities. In this presentation, we will demonstrate echopype functionalities by directly processing publicly available sonar data streams from the Ocean Observatories Initiative network in a cloud-hosted Jupyter environment. We envision the continued development of echopype as a catalyst for an open, community-driven effort to establish best practices for analyzing large ocean sonar data sets.
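To illustrate what computation in a labeled, multi-dimensional format can look like, the following is a minimal sketch using Xarray only. It does not use echopype's API, and it does not read real sonar files: the array is synthetic, and the variable name `Sv`, the dimension names (`frequency`, `ping_time`, `range_bin`), the helper `mean_sv_db`, and all values are assumptions made for illustration. The sketch averages volume backscattering strength over one-minute ping bins, doing the averaging in the linear domain before converting back to decibels.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for converted sonar data (hypothetical names and values).
rng = np.random.default_rng(0)
frequency = np.array([38e3, 120e3, 200e3])  # transducer frequencies in Hz
ping_time = np.datetime64("2020-01-01T00:00:00") + np.arange(600) * np.timedelta64(1, "s")
range_bin = np.arange(50)  # sample index along range

sv = xr.DataArray(
    rng.uniform(-90.0, -50.0, size=(frequency.size, ping_time.size, range_bin.size)),
    dims=("frequency", "ping_time", "range_bin"),
    coords={"frequency": frequency, "ping_time": ping_time, "range_bin": range_bin},
    name="Sv",
    attrs={"units": "dB re 1 m-1"},
)


def mean_sv_db(sv_db, ping_bin="1min"):
    """Average Sv over time bins in the linear domain, then return dB."""
    linear = 10 ** (sv_db / 10)                            # dB -> linear
    mean_linear = linear.resample(ping_time=ping_bin).mean()
    return 10 * np.log10(mean_linear)                      # linear -> dB


mvbs = mean_sv_db(sv)

# Select by coordinate label rather than by positional index.
mvbs_38k = mvbs.sel(frequency=38e3)
print(mvbs_38k.sizes)
```

Because operations are expressed against named dimensions and coordinates, the same code reads the same regardless of which instrument produced the data. If Dask is installed, calling `sv.chunk(...)` before the computation would make the same operations lazy and chunked; that step is left out here to keep the sketch dependency-light.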
The echopype code repository can be accessed at https://github.com/OSOceanAcoustics/echopype