A Tale of Two Crowds
Abstract:
‘Big biological data’ sets are becoming common in oceanography with the advent of sampling technologies that can generate high-frequency observations for multiple data streams simultaneously. Identifying and implementing robust and efficient approaches to manage and analyze these big biological data (BBD) sets has become a primary challenge facing biological oceanographers and marine ecologists alike. Using a large plankton imagery dataset generated by the “Deep Focus Plankton Imager-2” (formerly the ISIIS-2) system as an example, we present two ‘crowd-sourcing’ approaches applied to the problem of efficiently classifying tens of millions of images of individual plankters. The first approach uses ‘crowd-sourcing’ in its typical format by asking members of the general public to identify groups of plankton via a web interface hosted by Zooniverse. The second approach engages members of the data science community via a partnership with Kaggle and Booz Allen Hamilton, two data science industry leaders. We discuss how the academic-industry partnerships were established, the questions we sought to answer via crowd-sourcing, and the successes, pitfalls, and surprising outcomes generated by each approach.