Rules of thumb for zooplankton taxonomic assignment using metabarcoding
Abstract:
Measuring planktonic communities is important given their critical role in nutrient cycling and ecosystem function, but most planktonic organisms are challenging to study using traditional taxonomy. Metabarcoding is an alternative way to assess biological communities: it sequences a specific portion of DNA and uses the differences between sequences to identify organisms by matching them against a reference database. The criteria used to assign taxonomy affect the results of metabarcoding, but researchers have not yet reached consensus on which criteria to use. We used a curated zooplankton database developed by Smithsonian researchers to assess similarity thresholds for taxonomic identification at different taxonomic levels and for two genetic markers (18S V1-3 and COI). The Smithsonian STREAMCODE database includes morphological vouchers for zooplankton samples from the Gulf Stream and individual sequences for about 1,300 samples. We compared the taxonomic match of each sequence against a reference database with the identification made by taxonomic experts, and calculated the proportion of successful identifications as a function of percent similarity match. Additionally, we analyzed how this proportion, as a function of percent match, varies with taxonomic group, genetic marker, and the overall representation of each taxonomic group in the reference database. We found that in well-studied groups such as copepods, phylum assignments were correct even with percent matches as low as 75%, but for other groups such as polychaetes and pteropods the percent match needed for a correct assignment was closer to 85%. Correct assignments at the class and order levels mirrored the phylum-level pattern in all groups, but family assignments showed more variability. Based on our results, we provide guidelines for assigning taxonomy based on percent match for different taxonomic groups and markers, which we hope can be expanded by other similar studies.
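The central calculation described above, the proportion of successful identifications as a function of percent match, can be illustrated with a minimal sketch. It is not the analysis code used in this study. The function name `success_by_identity`, the 5% bin width, the record layout, and the example records are all assumptions made up for illustration, not STREAMCODE data.

```python
from collections import defaultdict


def success_by_identity(records, bin_width=5.0):
    """Proportion of correct assignments per percent-match bin.

    records: iterable of (percent_match, assigned_taxon, expert_taxon).
    Returns a dict mapping the lower edge of each bin to the fraction
    of records in that bin where the database match agrees with the
    expert identification.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for pct, assigned, expert in records:
        # Lower edge of the bin this percent match falls into.
        lower = (pct // bin_width) * bin_width
        totals[lower] += 1
        if assigned == expert:
            correct[lower] += 1
    return {b: correct[b] / totals[b] for b in sorted(totals)}


# Hypothetical phylum-level records, invented for illustration only.
records = [
    (98.2, "Arthropoda", "Arthropoda"),
    (91.0, "Arthropoda", "Arthropoda"),
    (84.5, "Annelida", "Mollusca"),
    (82.0, "Mollusca", "Mollusca"),
    (76.3, "Arthropoda", "Arthropoda"),
]
result = success_by_identity(records)
for lower_edge, proportion in result.items():
    print(f"{lower_edge:.0f}-{lower_edge + 5:.0f}%: {proportion:.2f}")
```

Running the same function separately for each taxonomic group, marker, and taxonomic level would give the kind of per-group comparison the abstract describes.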