Deep Learning-based Speech Enhancement for Real-life Applications

PhD Thesis


Abdallah Abdelhafiz Nossier, S. 2023. Deep Learning-based Speech Enhancement for Real-life Applications. PhD Thesis. University of East London, School of Architecture, Computing & Engineering. https://doi.org/10.15123/uel.8wv3q
Authors: Abdallah Abdelhafiz Nossier, S.
Type: PhD Thesis
Abstract

Speech enhancement is the process of improving speech quality and intelligibility by suppressing noise. Inspired by the outstanding performance of the deep learning approach for speech enhancement, this thesis aims to add to this research area through the following contributions. The thesis presents an experimental analysis of different deep neural networks for speech enhancement, to compare their performance and investigate factors and approaches that improve the performance. The outcomes of this analysis facilitate the development of better speech enhancement networks in this work.

Moreover, this thesis proposes a new deep convolutional denoising autoencoder-based speech enhancement architecture, in which strided and dilated convolutions are applied to improve performance while keeping network complexity to a minimum. Furthermore, a two-stage speech enhancement approach is proposed: a first stage performs speech denoising in the frequency domain, and a second stage reconstructs the speech in the time domain. This approach was shown to reduce speech distortion, leading to better overall quality of the processed speech in comparison to state-of-the-art speech enhancement models.

Finally, the work presents two deep neural network speech enhancement architectures for two real-world applications: hearing aids and automatic speech recognition. For hearing aids, a smart speech enhancement architecture is proposed that acts as an integrated hearing aid and alert system. This architecture enhances both speech and important emergency noise, and eliminates only undesired noise. The results show that this idea can improve the performance of hearing aids. The architecture proposed for automatic speech recognition, on the other hand, resolves the mismatch between speech enhancement and automatic speech recognition systems, leading to a significant reduction in the word error rate of a baseline automatic speech recognition system provided by Intelligent Voice for research purposes. In conclusion, the results presented in this thesis show promising performance for the proposed architectures in real-time speech enhancement applications.

Keywords: automatic speech recognition; deep learning; hearing aids; speech distortion; speech enhancement
Year: 2023
Publisher: University of East London
Digital Object Identifier (DOI): https://doi.org/10.15123/uel.8wv3q
Publication dates
Online: 05 Jul 2024
Publication process dates
Completed: 04 Oct 2023
Deposited: 05 Jul 2024
Copyright holder: © 2023, The Author
Permalink: https://repository.uel.ac.uk/item/8wv3q

Download files


File
2023_PhD_Abdallah Abdelhafiz Nossier.pdf
License: CC BY-NC-ND 4.0
File access level: Anyone


Related outputs

Nossier, S. A., Wall, J., Moniri, M., Glackin, C. and Cannings, N. 2023. Enhancing Automatic Speech Recognition Quality with a Second-Stage Speech Enhancement Generative Adversarial Network. The 35th IEEE International Conference on Tools with Artificial Intelligence (ICTAI). Atlanta, Georgia (USA) 06 - 08 Nov 2023 IEEE Computer Society. https://doi.org/10.1109/ICTAI59109.2023.00087
Nossier, S. A., Wall, J., Moniri, M., Glackin, C. and Cannings, N. 2023. A Deep Learning Speech Enhancement Architecture Optimised for Speech Recognition and Hearing Aids. The 35th IEEE International Conference on Tools with Artificial Intelligence (ICTAI). Atlanta, Georgia (USA) 06 - 08 Nov 2023 IEEE Computer Society. https://doi.org/10.1109/ICTAI59109.2023.00088
Abdallah Abdelhafiz Nossier, S., Wall, J., Moniri, M., Glackin, C. and Cannings, N. 2022. Convolutional Recurrent Smart Speech Enhancement Architecture for Hearing Aids. INTERSPEECH 2022. Incheon, Korea 18 - 22 Sep 2022
Nossier, S. A., Wall, J., Moniri, M., Glackin, C. and Cannings, N. 2022. Two-Stage Deep Learning Approach for Speech Enhancement and Reconstruction in The Frequency and Time Domains. WCCI 2022: IEEE World Congress on Computational Intelligence. Padua, Italy 23 May - 18 Jul 2022 IEEE. https://doi.org/10.1109/IJCNN55064.2022.9892355
Nossier, S. A., Wall, J., Moniri, M., Glackin, C. and Cannings, N. 2020. An Experimental Analysis of Deep Learning Architectures for Supervised Speech Enhancement. Electronics. 10 (Art. 17). https://doi.org/10.3390/electronics10010017
Abdallah Abdelhafiz Nossier, S., Wall, J., Moniri, M., Glackin, C. and Cannings, N. 2020. Mapping and Masking Targets Comparison using Different Deep Learning based Speech Enhancement Architectures. 2020 International Joint Conference on Neural Networks (IJCNN). Glasgow, UK 19 - 24 Jul 2020 IEEE. https://doi.org/10.1109/IJCNN48605.2020.9206623
Abdallah Abdelhafiz Nossier, S., Wall, J., Moniri, M., Glackin, C. and Cannings, N. 2020. A Comparative Study of Time and Frequency Domain Approaches to Deep Learning based Speech Enhancement. 2020 International Joint Conference on Neural Networks (IJCNN). Glasgow, UK 19 - 24 Jul 2020 IEEE. https://doi.org/10.1109/IJCNN48605.2020.9206928