Towards an open, underwater image repository (FathomNet) for automated detection and classification of midwater and benthic targets using machine learning

Kakani Katija1, Benjamin Woodward2, Brian Schlining1, Lonny Lundsten1, Kevin Barnard1,3 and Katherine Lynn Croff Bell4, (1)Monterey Bay Aquarium Research Institute, Moss Landing, CA, United States, (2)CVision AI Inc., Medford, MA, United States, (3)Colorado School of Mines, Golden, CO, United States, (4)Ocean Exploration Trust, Narragansett, RI, United States
Ocean-going platforms are integrating high-resolution, multi-camera feeds for observation and navigation, producing a deluge of visual data. The volume and rate of this data collection can rapidly outpace researchers’ ability to process and analyze it. Recent advances in machine learning enable fast, sophisticated analysis of visual data, but these methods have seen limited success in oceanography owing to a lack of dataset standardization, sparse annotation tools, and insufficient formatting and aggregation of existing, expertly curated imagery for use by data scientists. To address this need, we are building FathomNet, a public platform that makes use of existing (and future) expertly curated data. Initial efforts have leveraged MBARI’s Video Annotation and Reference System and its annotated deep-sea video database, which contains more than 6M annotations, 1M framegrabs, and 4k terms in the knowledgebase. FathomNet now holds over 60k images of midwater and benthic classes (resolved to the genus level); 3k of these images carry more than 23k bounding boxes and labels spanning 198 classes. In parallel, we have been evaluating the efficacy of weakly supervised localization, image-level labels, bounding-box algorithms, and hierarchical data structures for generating image training sets. We will show how these tools can be applied to other institutional video data (e.g., National Geographic Society’s DropCam and NOAA’s ROV Deep Discoverer) and can enable automated tracking of state-changing midwater animals. As FathomNet continues to develop and incorporate image data from other members of the oceanographic community, we hope this effort will ultimately enable scientists, explorers, policymakers, storytellers, and the public to understand and care for our ocean.
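As a minimal illustration of the kind of labeled data described above, the sketch below shows how per-image bounding-box annotations might be organized and audited for class balance when assembling a training set. The record schema and concept names here are assumptions for illustration, not FathomNet's actual data model.

```python
from collections import Counter

# Hypothetical, COCO-style annotation records: each entry localizes one
# concept in one image with a bounding box given as [x, y, width, height].
# Field names and concepts are illustrative assumptions.
annotations = [
    {"image_id": 1, "concept": "Aegina", "bbox": [120, 80, 64, 48]},
    {"image_id": 1, "concept": "Bathochordaeus", "bbox": [300, 210, 90, 70]},
    {"image_id": 2, "concept": "Aegina", "bbox": [40, 55, 72, 60]},
]

def class_counts(records):
    """Count annotations per concept, e.g. to check class balance
    before splitting a training set."""
    return Counter(r["concept"] for r in records)

print(class_counts(annotations))
```

A tally like this is a common first step when curating image training sets: heavily imbalanced concept counts often motivate the weakly supervised and hierarchical labeling strategies mentioned above.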