Re-training a Joint U-Net-CNN Deep Learning Image Classification Pipeline for the Segmentation of Subsea Macrofauna
Abstract:
The presented segmentation pipeline consists of two separate deep learning networks: U-Net, originally designed for biomedical image segmentation, and a VGG16 CNN, a widely used convolutional neural network. Images from CamHD are fed to the U-Net model, which performs semantic segmentation; its output is then cascaded into the VGG16 CNN for a final yes/no classification.
When trained and tested on a subset of Mushroom scenes, this pipeline achieves an overall Average Precision of 0.671 at an Intersection over Union (IoU) threshold of 0.5 and a U-Net pixel-wise validation accuracy of 0.99. However, system accuracy was observed to decay as the test data became more temporally distant from the training data, likely due to the dynamic environment of the Mushroom scene. Such decay mandates regular network re-training, a tedious task given that each of the two sub-components must be trained independently of one another. This work demonstrates that the U-Net portion of the pipeline is more robust to time variation than the CNN, and that the required re-training frequency of the U-Net is much lower than that of the accompanying CNN. Furthermore, we show that substantial U-Net data augmentation typically has negligible impact on network performance, whereas CNN performance varies substantially with training data size.
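The cascade structure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `unet_segment` and `cnn_classify` are hypothetical placeholders standing in for the trained U-Net and VGG16 models, and the `iou` helper shows how the Intersection over Union threshold in the reported Average Precision would be computed.

```python
import numpy as np

def unet_segment(image):
    """Stand-in for the trained U-Net: returns a binary per-pixel mask.
    (Hypothetical placeholder -- a brightness threshold, not the real model.)"""
    return (image.mean(axis=-1) > 0.5).astype(np.uint8)

def cnn_classify(masked_image):
    """Stand-in for the VGG16 yes/no classifier head (hypothetical placeholder)."""
    return bool(masked_image.mean() > 0.1)

def iou(mask_a, mask_b):
    """Intersection over Union between two binary masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 0.0

def pipeline(image):
    """Cascade: the U-Net segmentation gates the input to the CNN classifier."""
    mask = unet_segment(image)
    masked = image * mask[..., None]  # zero out background pixels
    return cnn_classify(masked), mask
```

Because the two stages are independent models joined only by this data flow, each must be re-trained separately, which is the maintenance burden the abstract highlights.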