PyEcholab: Processing Water Column Echosounder Data in the Cloud

Veronica Martinez, University of Colorado Boulder, Boulder, CO, United States; University of Colorado Boulder, CIRES, Boulder, CO, United States, Charles Anderson, Cooperative Institute for Research in Environmental Sciences, Boulder, CO, United States, Carrie Wall, University of Colorado at Boulder, NOAA NCEI, Boulder, United States, Rick Towler, NOAA Alaska Fisheries Science Center, Seattle, WA, United States, George Cutter, NOAA Southwest Fisheries Science Center, Antarctic Ecosystem Research Division, La Jolla, CA, United States and Josef Michael Jech, NOAA Northeast Fisheries Science Center, Woods Hole, United States
Abstract:
Veronica Martinez1,2*, Charles Anderson1,2, Carrie Wall1,2, Rick Towler3, Randy Cutter4, Michael Jech5

1 University of Colorado, Cooperative Institute for Research in Environmental Sciences (CIRES), Boulder, CO

2 NOAA National Centers for Environmental Information (NCEI), Boulder, CO

3 NMFS Alaska Fisheries Science Center, Seattle, WA

4 NMFS Southwest Fisheries Science Center, La Jolla, CA

5 NMFS Northeast Fisheries Science Center, Woods Hole, MA

* veronica.martinez@noaa.gov

Water column echosounder data are becoming increasingly available and are used for diverse research objectives. However, these data are large, complex, and recorded in instrument-specific binary file formats. Tools to process these data are limited to a few relatively expensive commercial applications and custom programs developed by individual researchers, which keeps potential users without sufficient resources or programming knowledge from using the data. To address this problem, NOAA Fisheries, the University of Colorado Cooperative Institute for Research in Environmental Sciences, and the NOAA National Centers for Environmental Information (NCEI) developed PyEcholab, an open-source, Python-based system for efficiently reading, processing, and visualizing water column echosounder files. This extensible and robust toolkit is currently available to the public via GitHub (https://github.com/CI-CMG/pyEcholab). To increase accessibility and big data processing capabilities, NCEI is refactoring PyEcholab to make it deployable in a cloud computing environment. With data processing in the cloud, users can analyze data from anywhere and can harness the cloud's dynamic resource scaling to process datasets of increasing size efficiently, without acquiring expensive, powerful computers. With over 100 TB of water column echosounder data in the NCEI archive, a forthcoming cloud-based centralized repository and cloud-hosted analytical tools will bring the processing to the data rather than the data to the processing. The result will be the first cloud-enabled processing tool for echosounder data that can be used from anywhere and on any computer. More importantly, PyEcholab provides a simple “plug and play” framework that is community driven, promotes the development of new processing techniques, and empowers researchers to investigate echosounder data in their research.