Deep Learning-based Speech Enhancement for Real-life Applications
PhD Thesis
Abdallah Abdelhafiz Nossier, S. 2023. Deep Learning-based Speech Enhancement for Real-life Applications. PhD Thesis, University of East London, School of Architecture, Computing & Engineering. https://doi.org/10.15123/uel.8wv3q
Authors | Abdallah Abdelhafiz Nossier, S. |
---|---|
Type | PhD Thesis |
Abstract | Speech enhancement is the process of improving speech quality and intelligibility by suppressing noise. Inspired by the outstanding performance of deep learning approaches to speech enhancement, this thesis adds to this research area through the following contributions. First, it presents an experimental analysis of different deep neural networks for speech enhancement, comparing their performance and investigating the factors and approaches that improve it. The outcomes of this analysis inform the development of the speech enhancement networks proposed in the rest of the work. Second, the thesis proposes a new deep convolutional denoising autoencoder-based speech enhancement architecture, in which strided and dilated convolutions are applied to improve performance while keeping network complexity to a minimum. Third, a two-stage speech enhancement approach is proposed that performs speech denoising in the frequency domain in the first stage, followed by speech reconstruction in the time domain in the second stage. This approach was shown to reduce speech distortion, leading to better overall quality of the processed speech in comparison to state-of-the-art speech enhancement models. Finally, the work presents two deep neural network speech enhancement architectures for two real-world applications: hearing aids and automatic speech recognition. For hearing aids, a smart speech enhancement architecture is proposed that integrates a hearing aid with an alert system. This architecture enhances both speech and important emergency noise, and eliminates only undesired noise. The results show that this idea can be applied to improve the performance of hearing aids. The architecture proposed for automatic speech recognition, in turn, addresses the mismatch between speech enhancement and automatic speech recognition systems, leading to a significant reduction in the word error rate of a baseline automatic speech recognition system provided by Intelligent Voice for research purposes. In conclusion, the results presented in this thesis show promising performance for the proposed architectures in real-time speech enhancement applications. |
Keywords | automatic speech recognition; deep learning; hearing aids; speech distortion; speech enhancement |
Year | 2023 |
Publisher | University of East London |
Digital Object Identifier (DOI) | https://doi.org/10.15123/uel.8wv3q |
File | Access level: Anyone |
Publication dates | |
Online | 05 Jul 2024 |
Publication process dates | |
Completed | 04 Oct 2023 |
Deposited | 05 Jul 2024 |
Copyright holder | © 2023, The Author |
https://repository.uel.ac.uk/item/8wv3q