Classifying Short-finned Pilot Whale Acoustics with Deep Learning

Virginia Pan, Duke University, Electrical and Computer Engineering, Durham, NC, United States; Doug Nowacek, Duke University, Durham, NC, United States; and Nicola Quick, Duke University, Durham, NC, United States
Abstract:
Short-finned pilot whales are deep-diving cetaceans that use sound for navigation, hunting, and communication. Scientists use digital acoustic recording tags (DTAGs) to collect depth, movement, and audio data that provide insight into their behavior. However, manual review of DTAG audio is time-consuming and requires a trained ear. This study is the first to use deep learning to differentiate between short-finned pilot whale buzz and minibuzz vocalizations in DTAG recordings. A total of 2,784 buzz and minibuzz sound samples were identified from tag deployments on 25 short-finned pilot whales off Cape Hatteras, NC during 2008, 2010, and 2011. Audit files of events were created manually and used to extract clips containing either buzzes or minibuzzes. These clips were high-pass filtered to remove low-frequency noise, and a short-time Fourier transform was then applied to each sample; the resulting spectrograms served as the model input. Alternating convolution and rectified linear unit (ReLU) activation layers were applied to group sounds. A global max pooling layer checked for the presence of a sound sequence anywhere in the audio clip, which resolved the challenge of working with audio segments of differing lengths. Lastly, a fully connected layer, in which each cell from the max-pooled layer “cast a vote”, determined whether the clip was a buzz or a minibuzz. The model was trained on 532 examples, including an equal number of buzzes and minibuzzes, and then tested on 768 examples; it achieved 97.633% accuracy. Data preprocessing was performed locally, and deep learning was performed on Amazon Web Services (AWS) with Keras backed by TensorFlow. Future work includes expanding the deep learning model to classify other types of acoustic events, such as calls, and pairing the classifier with a detection scheme to find and isolate events as input for the classifier. The completed workflow will enable scientists to analyze DTAG audio more quickly, without having to aurally review entire records.
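The pipeline described above can be sketched in Python with SciPy and Keras. This is a minimal illustration only: the filter cutoff, sampling rate, STFT window, and layer sizes are assumptions chosen for the sketch, not the authors' actual settings. The key structural ideas from the abstract are preserved: a high-pass filter followed by an STFT for input, alternating convolution/ReLU layers, global max pooling to handle clips of differing lengths, and a fully connected layer for the buzz-versus-minibuzz decision.

```python
# Sketch of the described workflow (illustrative parameters, not the study's).
import numpy as np
from scipy import signal
from tensorflow import keras

def preprocess(clip, fs=96_000, cutoff=2_000):
    """High-pass filter a clip and return its STFT magnitude spectrogram."""
    # 4th-order Butterworth high-pass to remove low-frequency noise (assumed design).
    sos = signal.butter(4, cutoff, btype="highpass", fs=fs, output="sos")
    filtered = signal.sosfilt(sos, clip)
    # Short-time Fourier transform; nperseg=512 gives 257 frequency bins.
    _, _, Zxx = signal.stft(filtered, fs=fs, nperseg=512)
    return np.abs(Zxx)  # shape (freq_bins, time_frames); time axis varies per clip

def build_model(freq_bins=257):
    """Small CNN: conv/ReLU stacks, global max pool, dense 'voting' layer."""
    # None on the time axis admits variable-length spectrograms.
    inp = keras.Input(shape=(freq_bins, None, 1))
    x = keras.layers.Conv2D(16, 3, padding="same", activation="relu")(inp)
    x = keras.layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    # Global max pooling checks whether a pattern occurs anywhere in the clip,
    # collapsing the variable time dimension to a fixed-size vector.
    x = keras.layers.GlobalMaxPooling2D()(x)
    # Fully connected output: each pooled feature "casts a vote"; sigmoid
    # yields a buzz-vs-minibuzz probability.
    out = keras.layers.Dense(1, activation="sigmoid")(x)
    return keras.Model(inp, out)
```

Because the global max pooling layer discards the time dimension, the same weights can score a two-second clip and a ten-second clip, which is what allows the differing-length audio segments described in the abstract to share one classifier.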