PyEcholab: Processing Water Column Echosounder Data in the Cloud
1 University of Colorado, Cooperative Institute for Research in Environmental Sciences (CIRES), Boulder, CO
2 NOAA National Centers for Environmental Information (NCEI), Boulder, CO
3 NMFS Alaska Fisheries Science Center, Seattle, WA
4 NMFS Southwest Fisheries Science Center, La Jolla, CA
5 NMFS Northeast Fisheries Science Center, Woods Hole, MA
* veronica.martinez@noaa.gov
Abstract:
Water column echosounder data are becoming increasingly available and are used for a wide range of research objectives. However, these data are large, complex, and recorded in instrument-specific binary file formats. Tools to process them are limited to a few relatively expensive commercial applications and custom programs developed by individual researchers, which keeps the data out of reach for potential users who lack sufficient resources or programming knowledge. To address this problem, NOAA Fisheries, the University of Colorado Cooperative Institute for Research in Environmental Sciences (CIRES), and the NOAA National Centers for Environmental Information (NCEI) developed PyEcholab, an open-source, Python-based system for efficiently reading, processing, and visualizing water column echosounder files. This extensible and robust toolkit is currently available to the public via GitHub (https://github.com/CI-CMG/pyEcholab). To increase accessibility and big data processing capabilities, NCEI is refactoring PyEcholab to make it deployable in a cloud computing environment. Processing data in the cloud means that users can analyze data from anywhere and can harness the cloud's dynamic resource scaling to process datasets of increasing size without acquiring expensive, powerful computers. The NCEI archive holds over 100 TB of water column echosounder data; a forthcoming cloud-based centralized repository, paired with cloud-hosted analytical tools, will bring the processing to the data rather than the data to the processing. The result will be the first cloud-enabled processing tool for echosounder data that can be used anywhere and from any computer. More importantly, PyEcholab provides a simple “plug and play” framework that is community driven, promotes the development of new processing techniques, and empowers researchers to use echosounder data in their research.
