Data-Driven Modeling of the Distribution of Diazotrophs in the Global Ocean

Weiyi Tang, Princeton University, Department of Geosciences, Princeton, NJ, United States; Duke University, Nicholas School of the Environment, Durham, NC, United States
Nicolas Cassar, Duke University, Division of Earth & Climate Sciences, Nicholas School of the Environment, Durham, NC, United States; Institut Universitaire Européen de la Mer (IUEM), Laboratoire des Sciences de l'Environnement Marin (LEMAR), Brest, France

Diazotrophs play a critical role in the biogeochemical cycling of nitrogen, carbon and other elements in the global ocean. Despite this well-recognized role, the diversity, abundance and distribution of diazotrophs in the world’s ocean remain poorly characterized, largely because observations are limited. Here we update the database of marine diazotroph nifH gene abundances and assess how environmental factors may regulate diazotrophs at the global scale. Our meta-analysis more than doubles the number of observations in the previous database. Using linear and nonlinear regressions, we find that the abundances of Trichodesmium, UCYN-A, UCYN-B and Richelia relate differently to temperature, light and nutrients. These distinct ecophysiologies argue for parameterizing each diazotrophic group separately in model simulations. We further apply a machine learning algorithm, random forest, to estimate the global distributions of these diazotrophic groups and reveal their distinct niches. Our data-driven estimates agree well with independent field observations but show substantial discrepancies with prognostic models. This study highlights directions for future field work by identifying regions that merit further investigation because they may harbor diazotrophic hotspots (e.g. coastal waters; southern Indian Ocean), remain undersampled (e.g. South Atlantic, eastern tropical Pacific), or produce large discrepancies in model simulations (e.g. polar regions).