Bird Audio Diarization with Faster R-CNN

Conference paper


Shrestha, R., Glackin, C., Wall, J. and Cannings, N. 2021. Bird Audio Diarization with Faster R-CNN. 30th International Conference on Artificial Neural Networks (ICANN). Online 14 - 17 Sep 2021 Springer. https://doi.org/10.1007/978-3-030-86362-3_34
Authors: Shrestha, R., Glackin, C., Wall, J. and Cannings, N.
Type: Conference paper
Abstract

Birds embody particular phonic and visual traits that distinguish them from 10,000 distinct bird species worldwide. Birds are also perceived to be indicators of biodiversity due to their propensity for responding to changes in their environment. An effective, automatic wildlife monitoring system based on bird bioacoustics, which can support manual classification, can be pivotal for the protection of the environment and endangered species. In modern machine learning, real-life bird audio classification is still considered as an esoteric challenge owing to the convoluted patterns present in bird song, and the complications that arise when numerous bird species are present in a common setting. Existing avian bioacoustic monitoring systems struggle when multiple bird species are present in an audio segment. To overcome these challenges, we propose a novel Faster Region-Based Convolutional Neural Network bird audio diarization system that incorporates object detection in the spectral domain and performs diarization of 50 bird species to effectively tackle the 'which bird spoke when?' problem. Benchmark results are presented using the Bird Songs from Europe dataset achieving a Diarization Error Rate of 21.81, Jaccard Error Rate of 20.94 and F1, precision and recall values of 0.85, 0.83 and 0.87 respectively.
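
As a quick consistency check on the reported scores, the sketch below recomputes F1 from the precision and recall values quoted in the abstract, using the standard harmonic-mean definition. The function name `f1_score` is a hypothetical helper for illustration and is not taken from the paper; whether the authors averaged per species or per segment is not stated here, so this only checks the aggregate figures against each other.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1)."""
    if precision + recall == 0:
        # Avoid division by zero when both scores are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract.
precision, recall = 0.83, 0.87

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2f}")  # compare with the reported value of 0.85
```

Rounded to two decimal places, the result agrees with the F1 value given in the abstract.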

Keywords: Deep Neural Networks; Audio Classification; Diarization; Automatic Wildlife Monitoring
Year: 2021
Conference: 30th International Conference on Artificial Neural Networks (ICANN)
Publisher: Springer
Accepted author manuscript
File access level: Anyone
Publication dates
Online: 07 Sep 2021
Publication process dates
Accepted: 15 Jun 2021
Deposited: 01 Jul 2021
Journal citation: pp. 415-426
ISSN: 0302-9743
Book title: Artificial Neural Networks and Machine Learning – ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part I
Book editors: Farkaš, I., Masulli, P., Otte, S. and Wermter, S.
ISBN: 978-3-030-86361-6
Digital Object Identifier (DOI): https://doi.org/10.1007/978-3-030-86362-3_34
Web address (URL): https://www.springer.com/gb/book/9783030863616
Copyright holder: © Springer Nature Switzerland AG 2021
Additional information

The final authenticated version is available online at https://doi.org/10.1007/978-3-030-86362-3_34

Permalink: https://repository.uel.ac.uk/item/8986x

