Bird Audio Diarization with Faster R-CNN

Conference paper

Shrestha, R., Glackin, C., Wall, J. and Cannings, N. 2021. Bird Audio Diarization with Faster R-CNN. 30th International Conference on Artificial Neural Networks (ICANN). Online 14 - 17 Sep 2021 Springer. https://doi.org/10.1007/978-3-030-86362-3_34

Publication dates
Authors	Shrestha, R., Glackin, C., Wall, J. and Cannings, N.
Type	Conference paper
Abstract	Birds exhibit distinctive vocal and visual traits that distinguish the some 10,000 bird species found worldwide. Birds are also regarded as indicators of biodiversity because they respond readily to changes in their environment. An effective, automatic wildlife monitoring system based on bird bioacoustics, which can support manual classification, could be pivotal for protecting the environment and endangered species. Real-life bird audio classification nevertheless remains a difficult problem in modern machine learning, owing to the complex patterns present in bird song and the complications that arise when numerous bird species share a common setting. Existing avian bioacoustic monitoring systems struggle when multiple bird species are present in the same audio segment. To overcome these challenges, we propose a novel bird audio diarization system based on a Faster Region-Based Convolutional Neural Network (Faster R-CNN), which performs object detection in the spectral domain and diarizes 50 bird species to tackle the 'which bird spoke when?' problem. Benchmark results on the Bird Songs from Europe dataset give a Diarization Error Rate of 21.81, a Jaccard Error Rate of 20.94, and F1, precision and recall values of 0.85, 0.83 and 0.87, respectively.
Keywords	Deep Neural Networks; Audio Classification; Diarization; Automatic Wildlife Monitoring
Year	2021
Conference	30th International Conference on Artificial Neural Networks (ICANN)
Publisher	Springer
Accepted author manuscript	ICANN2021_BirdAudioDiarization.pdf; License: Springer Nature Terms of Use for accepted manuscripts of subscription articles, books and chapters; File access level: Anyone
Online	07 Sep 2021
Publication process dates
Accepted	15 Jun 2021
Deposited	01 Jul 2021
Journal citation	pp. 415-426
ISSN	0302-9743
Book title	Artificial Neural Networks and Machine Learning – ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part I
Book editor	Farkaš, I.
	Masulli, P.
	Otte, S.
	Wermter, S.
ISBN	978-3-030-86361-6
Digital Object Identifier (DOI)	https://doi.org/10.1007/978-3-030-86362-3_34
Web address (URL)	https://www.springer.com/gb/book/9783030863616
Copyright holder	© Springer Nature Switzerland AG 2021
Additional information	The final authenticated version is available online at https://doi.org/10.1007/978-3-030-86362-3_34
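The abstract reports a Diarization Error Rate, a Jaccard Error Rate, and precision, recall and F1 values. As a hedged illustration only, the sketch below implements the textbook definitions of two of these: DER as (missed + false alarm + label confusion) divided by the total reference duration, expressed here as a percentage, and F1 as the harmonic mean of precision and recall. It is not the authors' evaluation code; the function names `diarization_error_rate` and `f1_score` are hypothetical, and the paper's exact scoring setup (collars, overlap handling, per-class averaging) is not stated on this page.

```python
def diarization_error_rate(missed: float, false_alarm: float,
                           confusion: float, total_reference: float) -> float:
    """Return DER as a percentage (textbook definition, assumed here).

    All arguments are durations in seconds:
      missed          - reference vocalization with no hypothesis label
      false_alarm     - hypothesis label where no reference vocalization exists
      confusion       - vocalization attributed to the wrong bird species
      total_reference - total duration of reference vocalization
    """
    if total_reference <= 0:
        raise ValueError("total_reference must be positive")
    return 100.0 * (missed + false_alarm + confusion) / total_reference


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical durations, for illustration only (not from the paper).
    print(diarization_error_rate(missed=3.0, false_alarm=2.0,
                                 confusion=5.0, total_reference=50.0))  # 20.0
    # Precision and recall as reported in the abstract.
    print(round(f1_score(0.83, 0.87), 2))  # 0.85
```

Plugging the reported precision (0.83) and recall (0.87) into the harmonic mean gives about 0.85, consistent with the reported F1; note that if F1 were averaged per class, it would not in general equal the harmonic mean of averaged precision and recall.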

Permalink	https://repository.uel.ac.uk/item/8986x

Related outputs

Robust Deepfake Speech Algorithm Recognition: Classifying Generative Algorithms via Speaker X-Vectors and Deep Learning

Maltby, H., Wall, J., Glackin, C., Moniri, M., Shrestha, R., Cannings, N. and Salami, I. 2025. Robust Deepfake Speech Algorithm Recognition: Classifying Generative Algorithms via Speaker X-Vectors and Deep Learning. IEEE International Joint Conference on Neural Networks (IJCNN). Rome, Italy 30 Jun - 05 Jul 2025 IEEE.

AI Investment Advisory: Examining Robo-Advisor Adoption Using Financial Literacy and Investment Experience Variables

Qadoos, A., AbouGrad, H., Wall, J. and Sharif, S. 2025. AI Investment Advisory: Examining Robo-Advisor Adoption Using Financial Literacy and Investment Experience Variables. AI and IoT for Next-Generation Smart Robotic Systems: Innovations, Challenges, and Opportunities – AISRS Workshop, 3rd International Conference on Mechatronics and Smart Systems – CONF-MSS 2025. University of East London 09 Dec 2024 EWA Publishing. https://doi.org/10.54254/2755-2721/2025.21287

A Frequency Bin Analysis of Distinctive Ranges Between Human and Deepfake Generated Voices

Maltby, H., Wall, J., Glackin, C., Moniri, M., Cannings, N. and Salami, I. 2024. A Frequency Bin Analysis of Distinctive Ranges Between Human and Deepfake Generated Voices. 2024 International Joint Conference on Neural Networks (IJCNN) - Neural Networks Models. Yokohama, Japan 30 Jun - 05 Jul 2024 IEEE. https://doi.org/10.1109/IJCNN60899.2024.10650554

A reinforcement learning recommender system using bi-clustering and Markov Decision Process

Iftikhar, A., Ghazanfar, M. A., Ayub, M., Alahmari, S. A., Qazi, N. and Wall, J. 2024. A reinforcement learning recommender system using bi-clustering and Markov Decision Process. Expert Systems with Applications. 237, Art. 121541. https://doi.org/10.1016/j.eswa.2023.121541

Analysis of Deep Neural Networks for Military Target Classification using Synthetic Aperture Radar Images

Jacob, S., Wall, J. and Sharif, S. 2023. Analysis of Deep Neural Networks for Military Target Classification using Synthetic Aperture Radar Images. 3ICT 2023: International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies. University of Bahrain, Bahrain 20 - 21 Nov 2023 IEEE. https://doi.org/10.1109/3ICT60104.2023.10391600

Enhancing Automatic Speech Recognition Quality with a Second-Stage Speech Enhancement Generative Adversarial Network

Nossier, S. A., Wall, J., Moniri, M., Glackin, C. and Cannings, N. 2023. Enhancing Automatic Speech Recognition Quality with a Second-Stage Speech Enhancement Generative Adversarial Network. The 35th IEEE International Conference on Tools with Artificial Intelligence (ICTAI). Atlanta, Georgia (USA) 06 - 08 Nov 2023 IEEE Computer Society. https://doi.org/10.1109/ICTAI59109.2023.00087

A Deep Learning Speech Enhancement Architecture Optimised for Speech Recognition and Hearing Aids

Nossier, S. A., Wall, J., Moniri, M., Glackin, C. and Cannings, N. 2023. A Deep Learning Speech Enhancement Architecture Optimised for Speech Recognition and Hearing Aids. The 35th IEEE International Conference on Tools with Artificial Intelligence (ICTAI). Atlanta, Georgia (USA) 06 - 08 Nov 2023 IEEE Computer Society. https://doi.org/10.1109/ICTAI59109.2023.00088

An Extended Reality Solution for Mitigating the Video Fatigue of Online Meetings

Glackin, C., Cannings, N., Poobalasingam, V., Wall, J., Sharif, S. and Moniri, M. 2023. An Extended Reality Solution for Mitigating the Video Fatigue of Online Meetings. in: Jung, T. and tom Dieck, M. C. (ed.) XR-Metaverse Cases: Business Application of AR, VR, XR and Metaverse Springer. pp. 45-54

Short Utterance Dialogue Act Classification Using a Transformer Ensemble

Maltby, H., Wall, J., Goodluck Constance, T., Moniri, M., Glackin, C., Rajwadi, M. and Cannings, N. 2023. Short Utterance Dialogue Act Classification Using a Transformer Ensemble. UA-DIGITAL 2023: UA Digital Theme Research Twinning. Online virtual conference 27 - 31 Mar 2023

An Innovative Approach Based on Machine Learning to Evaluate the Risk Factors Importance in Diagnosing Keratoconus

Zorto, A. D., Sharif, S., Wall, J., Brahma, A., Alzahrani, A. I. and Alalwan, N. 2023. An Innovative Approach Based on Machine Learning to Evaluate the Risk Factors Importance in Diagnosing Keratoconus. Informatics in Medicine Unlocked. 38, p. 101208. https://doi.org/10.1016/j.imu.2023.101208

Deception Detection in Conversations using the Proximity of Linguistic Markers

Bajaj, N., Rajwadi, M., Goodluck Constance, T., Wall, J., Moniri, M., Laird, T., Woodruff, C., Laird, J., Glackin, C. and Cannings, N. 2023. Deception Detection in Conversations using the Proximity of Linguistic Markers. Knowledge-Based Systems. 23 (Art. 110422). https://doi.org/10.1016/j.knosys.2023.110422

Improving data quality assessment of connected vehicles data with machine learning and statistical methods

Wall, J., Wondie, M. and Li, Y. 2022. Improving data quality assessment of connected vehicles data with machine learning and statistical methods. Pan African Conference on Artificial Intelligence 2022. 04 - 05 Oct 2022

A Machine Learning Approach to Identify the Preferred Representational System of a Person

Amirhosseini, M. and Wall, J. 2022. A Machine Learning Approach to Identify the Preferred Representational System of a Person. Multimodal Technologies and Interaction. 6 (12), p. 112. https://doi.org/10.3390/mti6120112

Speaker Recognition using Multiple X-Vector Speaker Representations with Two-Stage Clustering and Outlier Detection Refinement

Wall, J., Shrestha, R., Glackin, C., Cannings, N., Rajwadi, M., Kada, S., Laird, J., Laird, T. and Woodruff, C. 2022. Speaker Recognition using Multiple X-Vector Speaker Representations with Two-Stage Clustering and Outlier Detection Refinement. CyberSciTech 2022: IEEE Cyber Science and Technology Congress. Calabria, Italy 12 - 15 Sep 2022 IEEE. https://doi.org/10.1109/DASC/PiCom/CBDCom/Cy55231.2022.9927875

Convolutional Recurrent Smart Speech Enhancement Architecture for Hearing Aids

Abdallah Abdelhafiz Nossier, S., Wall, J., Moniri, M., Glackin, C. and Cannings, N. 2022. Convolutional Recurrent Smart Speech Enhancement Architecture for Hearing Aids. INTERSPEECH 2022. Incheon, Korea 18 - 22 Sep 2022

Two-Stage Deep Learning Approach for Speech Enhancement and Reconstruction in The Frequency and Time Domains

Nossier, S. A., Wall, J., Moniri, M., Glackin, C. and Cannings, N. 2022. Two-Stage Deep Learning Approach for Speech Enhancement and Reconstruction in The Frequency and Time Domains. WCCI 2022: IEEE World Congress on Computational Intelligence. Padua, Italy 23 May - 18 Jul 2022 IEEE. https://doi.org/10.1109/IJCNN55064.2022.9892355

A Mixed Reality Approach for dealing with the Video Fatigue of Online Meetings

Wall, J., Poobalasingam, V., Sharif, S., Moniri, M., Glackin, C. and Cannings, N. 2022. A Mixed Reality Approach for dealing with the Video Fatigue of Online Meetings. 7th International XR Conference. Lisbon, Portugal 27 - 29 Apr 2022

A Conversational AI Approach to Detecting Deception and Tackling Insurance Fraud

Wall, J. 2021. A Conversational AI Approach to Detecting Deception and Tackling Insurance Fraud. Tenth International Conference on Intelligent Computing and Information Systems (ICICIS). Cairo, Egypt 05 - 07 Dec 2021 IEEE. https://doi.org/10.1109/ICICIS52592.2021.9694118

Resolving Ambiguity in Hedge Detection by Automatic Generation of Linguistic Rules

Goodluck Constance, T., Bajaj, N., Rajwadi, M., Maltby, H., Wall, J., Moniri, M., Woodruff, C., Laird, T., Laird, J., Glackin, C. and Cannings, N. 2021. Resolving Ambiguity in Hedge Detection by Automatic Generation of Linguistic Rules. 30th International Conference on Artificial Neural Networks (ICANN). Online 14 - 17 Sep 2021 Springer. https://doi.org/10.1007/978-3-030-86383-8_30

An Experimental Analysis of Deep Learning Architectures for Supervised Speech Enhancement

Nossier, S. A., Wall, J., Moniri, M., Glackin, C. and Cannings, N. 2020. An Experimental Analysis of Deep Learning Architectures for Supervised Speech Enhancement. Electronics. 10 (1), Art. 17. https://doi.org/10.3390/electronics10010017

Mapping and Masking Targets Comparison using Different Deep Learning based Speech Enhancement Architectures

Abdallah Abdelhafiz Nossier, S., Wall, J., Moniri, M., Glackin, C. and Cannings, N. 2020. Mapping and Masking Targets Comparison using Different Deep Learning based Speech Enhancement Architectures. 2020 International Joint Conference on Neural Networks (IJCNN). Glasgow, UK 19 - 24 Jul 2020 IEEE. https://doi.org/10.1109/IJCNN48605.2020.9206623

A Comparative Study of Time and Frequency Domain Approaches to Deep Learning based Speech Enhancement

Abdallah Abdelhafiz Nossier, S., Wall, J., Moniri, M., Glackin, C. and Cannings, N. 2020. A Comparative Study of Time and Frequency Domain Approaches to Deep Learning based Speech Enhancement. 2020 International Joint Conference on Neural Networks (IJCNN). Glasgow, UK 19 - 24 Jul 2020 IEEE. https://doi.org/10.1109/IJCNN48605.2020.9206928

Fraud detection in telephone conversations for financial services using linguistic features

Bajaj, N., Goodluck Constance, T., Rajwadi, M., Wall, J., Moniri, M., Glackin, C., Cannings, N., Woodruff, C. and Laird, J. 2019. Fraud detection in telephone conversations for financial services using linguistic features. Neural Information Processing Systems - NeurIPS 2019. Vancouver, Canada 08 - 14 Dec 2019 NeurIPS.

A Framework for Augmented Reality Based Shared Experiences

Ali, A., Glackin, C., Cannings, N., Wall, J., Sharif, S. and Moniri, M. 2019. A Framework for Augmented Reality Based Shared Experiences. Immersive Learning Research Network - iLRN. London, UK 23 - 27 Jun 2019 Verlag der Technischen Universität Graz. https://doi.org/10.3217/978-3-85125-657-4-24

Smart Transcription

Wall, J., Glackin, C., Dugan, N. and Cannings, N. 2019. Smart Transcription. 31st European Conference on Cognitive Ergonomics. Belfast, UK 10 - 13 Sep 2019 Association for Computing Machinery (ACM). https://doi.org/10.1145/3335082.3335114

Explaining Sentiment Classification

Rajwadi, M., Glackin, C., Wall, J., Chollet, G. and Cannings, N. 2019. Explaining Sentiment Classification. Interspeech 2019. Graz, Austria 15 - 19 Sep 2019 International Speech Communication Association. https://doi.org/10.21437/Interspeech.2019-2743

Towards a More Representative Definition of Cyber Security

Schatz, D., Bashroush, R. and Wall, J. 2017. Towards a More Representative Definition of Cyber Security. Journal of Digital Forensics, Security and Law. 12 (2), pp. 53-74. https://doi.org/10.15394/jdfsl.2017.1476

Solving the Linearly Inseparable XOR Problem with Spiking Neural Networks

Wall, J. and Reljan-Delaney, M. 2017. Solving the Linearly Inseparable XOR Problem with Spiking Neural Networks. SAI Computing Conference 2017. London, UK 18 - 20 Jul 2017 IEEE. https://doi.org/10.1109/SAI.2017.8252173

Privacy preserving encrypted phonetic search of speech data

Wall, J., Glackin, C., Chollet, G., Dugan, N., Cannings, N., Tahir, S., Ghosh Ray, I. and Rajarajan, M. 2017. Privacy preserving encrypted phonetic search of speech data. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). New Orleans, Louisiana, USA 05 - 09 Mar 2017 IEEE. pp. 6414-6418 https://doi.org/10.1109/ICASSP.2017.7953391

Spiking neuron models of the medial and lateral superior olive for sound localisation

Wall, J., McDaid, L.J., Maguire, L.P. and McGinnity, T.M. 2008. Spiking neuron models of the medial and lateral superior olive for sound localisation. IEEE International Joint Conference on Neural Networks (IJCNN) (IEEE World Congress on Computational Intelligence). Hong Kong 01 - 08 Jun 2008 Hong Kong IEEE. pp. 2641-2647 https://doi.org/10.1109/IJCNN.2008.4634168

A comparison of sound localisation techniques using cross-correlation and spiking neural networks for mobile robotics

Wall, J., McGinnity, T.M. and Maguire, L.P. 2011. A comparison of sound localisation techniques using cross-correlation and spiking neural networks for mobile robotics. The 2011 International Joint Conference on Neural Networks (IJCNN). San Jose, CA 31 Jul - 05 Aug 2011 IEEE. pp. 1981-1987

A spiking neural network implementation of sound localisation

Wall, J., McDaid, L.J., Maguire, L.P. and McGinnity, T.M. 2007. A spiking neural network implementation of sound localisation. IET Irish Signals and Systems. Derry, UK 13 - 14 Sep 2007 Derry, UK pp. 1-5

Using the interaural time difference and cross-correlation to localise short-term complex noises

Wall, J., McGinnity, T.M. and Maguire, L.P. 2011. Using the interaural time difference and cross-correlation to localise short-term complex noises. Artificial Intelligence and Cognitive Science (AICS). Derry, UK 31 Aug - 02 Sep 2011 University of Ulster, Intelligent Systems Research Centre.

A Framework for Realistic 3D Tele-Immersion

Fechteler, P., Hilsmann, A., Eisert, P., Broeck, S.V., Stevens, C., Wall, J., Sanna, M., Mauro, D.A., Kuijk, F., Mekuria, R., Cesar, P., Monaghan, D., O'Connor, N.E., Daras, P., Alexiadis, D. and Zahariadis, T. 2013. A Framework for Realistic 3D Tele-Immersion. 6th International Conference on Computer Vision / Computer Graphics Collaboration Techniques and Applications. Berlin, Germany 2013 New York, NY, USA Association for Computing Machinery (ACM). pp. 1-8 https://doi.org/10.1145/2466715.2466718

A Roadmap for Privacy Preserving Speech Processing

Wall, J., Glackin, C., Chollet, G., Dugan, N., Cannings, N., Tahir, S., Ghosh Ray, I., Rajarajan, M., Falkner, R. and Badii, A. 2016. A Roadmap for Privacy Preserving Speech Processing. Preserving Privacy in an Age of Increased Surveillance – A Biometrics Perspective. London, UK 17 Oct 2016

Deep Laterally Recurrent Spiking Neural Networks for Speech Enhancement

Wall, J. 2016. Deep Laterally Recurrent Spiking Neural Networks for Speech Enhancement. UEL Computing & Engineering Showcase. London, UK 16 Jun 2016 University of East London.

Recurrent lateral inhibitory spiking networks for speech enhancement

Wall, J., Glackin, C., Cannings, N., Chollet, G. and Dugan, N. 2016. Recurrent lateral inhibitory spiking networks for speech enhancement. IEEE International Joint Conference on Neural Networks (IJCNN). Vancouver, Canada 24 - 29 Jul 2016 IEEE. pp. 1023-1028 https://doi.org/10.1109/IJCNN.2016.7727310

Fuzzy Ensembles for Embedding Adaptive Behaviours in Semi-Autonomous Avatars in 3D Virtual Worlds

Wall, J., Izquierdo, E. and Zhang, Q. 2013. Fuzzy Ensembles for Embedding Adaptive Behaviours in Semi-Autonomous Avatars in 3D Virtual Worlds. in: Proceedings 2013 18th International Conference on Digital Signal Processing (DSP) IEEE. pp. 1-6

Advancements and Challenges towards a Collaborative Framework for 3D Tele-Immersive Social Networking

Mauro, D.A., O'Connor, N.E., Monaghan, D., Gowing, M., Fechteler, P., Eisert, P., Wall, J., Izquierdo, E., Alexiadis, D.S., Daras, P., Mekuria, R. and Cesar, P. 2013. Advancements and Challenges towards a Collaborative Framework for 3D Tele-Immersive Social Networking. 4th IEEE International Workshop on Hot Topics in 3D (Hot3D). San Jose, CA, USA 15 Jul 2013 IEEE. pp. 1-2

A Framework for Human-like Behavior in an Immersive Virtual World

Kuijk, F., Van Broeck, S., Dareau, C., Ravenet, B., Ochs, M., Apostolakis, K., Daras, P., Monaghan, D., O'Connor, N.E., Wall, J. and Izquierdo, E. 2013. A Framework for Human-like Behavior in an Immersive Virtual World. in: Proceedings of 2013 18th International Conference on Digital Signal Processing (DSP) IEEE. pp. 1-7

REVERIE: Natural Human Interaction in Virtual Immersive Environments

Wall, J., Izquierdo, E., Argyriou, L., Monaghan, D.S., O'Connor, N.E., Poulakos, S., Smolic, A. and Mekuria, R. 2014. REVERIE: Natural Human Interaction in Virtual Immersive Environments. in: 2014 IEEE International Conference on Image Processing (ICIP) IEEE. pp. 2165-2167

Spiking neural network model of sound localisation using the interaural intensity difference

Wall, J., McDaid, L.J., Maguire, L.P. and McGinnity, T.M. 2012. Spiking neural network model of sound localisation using the interaural intensity difference. IEEE Transactions on Neural Networks. 23 (4), pp. 574-586.

Perception-based Modelling of System Behaviour

Wall, J. 2006. Perception-based Modelling of System Behaviour. Proc. of the IEEE Systems, Man and Cybernetics Society.

A Spiking Neural Network Model of the Medial Superior Olive using Spike Timing Dependent Plasticity for Sound Localisation

Glackin, B., Wall, J., McGinnity, T.M., Maguire, L.P. and McDaid, L.J. 2010. A Spiking Neural Network Model of the Medial Superior Olive using Spike Timing Dependent Plasticity for Sound Localisation. Frontiers in Computational Neuroscience. 4 (18), pp. 1-16.

A Methodological Approach to User Evaluation and Assessment of a Virtual Environment Hangout

Pasin, M., Frisiello, A., Wall, J., Poulakos, S. and Smolic, A. 2015. A Methodological Approach to User Evaluation and Assessment of a Virtual Environment Hangout. in: Sanna, A., Lamberti, F., Rokne, J. and Gatteschi, V. (ed.) Proceedings of the 7th International Conference on Intelligent Technologies for Interactive Entertainment EAI. pp. 1-5

Playing immersive games on the REVERIE platform

Doumanis, I., Wall, J. and Monaghan, D.S. 2015. Playing immersive games on the REVERIE platform. in: Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing (CIT/IUCC/DASC/PICOM) IEEE. pp. 1572-1577

Spiking neural network connectivity and its potential for temporal sensory processing and variable binding

Wall, J. and Glackin, C. 2013. Spiking neural network connectivity and its potential for temporal sensory processing and variable binding. Frontiers in Computational Neuroscience. 7 (182), pp. 1-2. https://doi.org/10.3389/fncom.2013.00182

Spiking Neural Network Connectivity and its Potential for Temporal Sensory Processing and Variable Binding

Wall, J. and Glackin, C. 2013. Spiking Neural Network Connectivity and its Potential for Temporal Sensory Processing and Variable Binding. Frontiers Media SA.
