An Experimental Analysis of Deep Learning Architectures for Supervised Speech Enhancement
Article
Nossier, S. A., Wall, J., Moniri, M., Glackin, C. and Cannings, N. 2020. An Experimental Analysis of Deep Learning Architectures for Supervised Speech Enhancement. Electronics. 10 (Art. 17). https://doi.org/10.3390/electronics10010017
Authors | Nossier, S. A., Wall, J., Moniri, M., Glackin, C. and Cannings, N. |
---|---|
Abstract | Recent speech enhancement research has shown that deep learning techniques are very effective in removing background noise. Many deep neural networks are being proposed, showing promising results for improving overall speech perception. The Deep Multilayer Perceptron, Convolutional Neural Networks, and the Denoising Autoencoder are well-established architectures for speech enhancement; however, choosing between different deep learning models has been mainly empirical. Consequently, a comparative analysis of these three architecture types is needed to show the factors affecting their performance. In this paper, this analysis is presented by comparing seven deep learning models that belong to these three categories. The comparison evaluates the overall quality of the output speech, using five objective evaluation metrics and a subjective evaluation with 23 listeners; the ability to deal with challenging noise conditions; generalization ability; complexity; and processing time. Further analysis is then provided using two different approaches. The first approach investigates how the performance is affected by changing network hyperparameters and the structure of the data, including the Lombard effect. The second approach interprets the results by visualizing the spectrogram of the output layer of all the investigated models, and the spectrograms of the hidden layers of the convolutional neural network architecture. Finally, a general evaluation of supervised deep learning-based speech enhancement is performed using SWOC analysis, to discuss the technique’s Strengths, Weaknesses, Opportunities, and Challenges. The results of this paper contribute to the understanding of how different deep neural networks perform the speech enhancement task, highlight the strengths and weaknesses of each architecture, and provide recommendations for achieving better performance. This work facilitates the development of better deep neural networks for speech enhancement in the future. |
Journal | Electronics |
Journal citation | 10 (Art. 17) |
ISSN | 2079-9292 |
Year | 2020 |
Publisher | MDPI |
Publisher's version | File access level: Anyone |
Digital Object Identifier (DOI) | https://doi.org/10.3390/electronics10010017 |
Publication dates | |
Online | 24 Dec 2020 |
Publication process dates | |
Accepted | 17 Dec 2020 |
Deposited | 13 Jan 2021 |
Funder | University of East London; Intelligent Voice Ltd |
Copyright holder | © 2020 The Authors |
https://repository.uel.ac.uk/item/88x5z