Visual Recognition Software for Binary Classification and its Application to Pollen Identification

Thursday, 18 December 2014
Surangi W. Punyasena, University of Illinois at Urbana Champaign, Department of Plant Biology, Urbana, IL, United States, David K Tcheng, University of Illinois at Urbana Champaign, Illinois Informatics Institute, National Center for Supercomputing Applications, Urbana, IL, United States and Ashwin Nayak, University of Illinois at Urbana Champaign, School of Integrative Biology, Urbana, IL, United States
An underappreciated source of uncertainty in paleoecology is the uncertainty of palynological identifications. The confidence of any given identification is not regularly reported in published results, so cannot be incorporated into subsequent meta-analyses. Automated identifications systems potentially provide a means of objectively measuring the confidence of a given count or single identification, as well as a mechanism for increasing sample sizes and throughput.

We developed the software ARLO (Automated Recognition with Layered Optimization) to tackle difficult visual classification problems such as pollen identification. ARLO applies pattern recognition and machine learning to the analysis of pollen images. The features that the system discovers are not the traditional features of pollen morphology. Instead, general purpose image features, such as pixel lines and grids of different dimensions, size, spacing, and resolution, are used. ARLO adapts to a given problem by searching for the most effective combination of feature representation and learning strategy. We present a two phase approach which uses our machine learning process to first segment pollen grains from the background and then classify pollen pixels and report species ratios. We conducted two separate experiments that utilized two distinct sets of algorithms and optimization procedures. The first analysis focused on reconstructing black and white spruce pollen ratios, training and testing our classification model at the slide level. This allowed us to directly compare our automated counts and expert counts to slides of known spruce ratios. Our second analysis focused on maximizing classification accuracy at the individual pollen grain level. Instead of predicting ratios of given slides, we predicted the species represented in a given image window. The resulting analysis was more scalable, as we were able to adapt the most efficient parts of the methodology from our first analysis.

ARLO was able to distinguish between the pollen of black and white spruce with an accuracy of ~83.61%. This compared favorably to human expert performance. At the writing of this abstract, we are also experimenting with experimenting with the analysis of higher diversity samples, including modern tropical pollen material collected from ground pollen traps.