A Comparative Study of Time and Frequency Domain Approaches to Deep Learning based Speech Enhancement
Conference paper
Abdallah Abdelhafiz Nossier, S., Wall, J., Moniri, M., Glackin, C. and Cannings, N. 2020. A Comparative Study of Time and Frequency Domain Approaches to Deep Learning based Speech Enhancement. 2020 International Joint Conference on Neural Networks (IJCNN). Glasgow, UK, 19-24 Jul 2020. IEEE. https://doi.org/10.1109/IJCNN48605.2020.9206928
Authors | Abdallah Abdelhafiz Nossier, S., Wall, J., Moniri, M., Glackin, C. and Cannings, N. |
---|---|
Type | Conference paper |
Abstract | Deep learning has recently made a breakthrough in speech enhancement. Some architectures are based on a time domain representation, while others operate in the frequency domain; however, a study comparing different networks working in the time and frequency domains has not been reported in the literature. This paper presents such a comparison between time and frequency domain learning for five Deep Neural Network (DNN) based speech enhancement architectures. The comparison covers the evaluation of the output speech using four objective evaluation metrics: PESQ, STOI, LSD, and SSNR increase. Furthermore, the complexity of the five networks is investigated by comparing the number of parameters and the processing time of each architecture. Finally, some of the factors that affect learning in the time and frequency domains are discussed. The primary results show that fully connected architectures generate speech with low overall perceptual quality when learning in the time domain. Convolutional designs, on the other hand, give acceptable performance in both the frequency and time domains, although their time domain implementations show inferior generalization ability. Frequency domain learning proved better than time domain learning when the complex spectrogram is used in training. Feature extraction also proved very effective in DNN based supervised speech enhancement, whether it is performed at the beginning or implicitly by bottleneck layer features. Finally, it is concluded that the choice of working domain is mainly restricted by the type and design of the architecture used. |
Keywords | Deep Learning; Speech Enhancement; Time Domain; Frequency Domain |
Year | 2020 |
Conference | 2020 International Joint Conference on Neural Networks (IJCNN) |
Publisher | IEEE |
Accepted author manuscript | License: All rights reserved; File access level: Anyone |
Publication dates | |
Online | 28 Sep 2020 |
Publication process dates | |
Accepted | 20 Mar 2020 |
Deposited | 12 Jan 2021 |
ISSN | 2161-4407 |
Book title | 2020 International Joint Conference on Neural Networks (IJCNN) |
ISBN | 978-1-7281-6926-2 |
Digital Object Identifier (DOI) | https://doi.org/10.1109/IJCNN48605.2020.9206928 |
Copyright holder | © 2020 IEEE |
Copyright information | Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
https://repository.uel.ac.uk/item/88x23
Download files
Accepted author manuscript
Time vs Freq_Final Version.pdf
License: All rights reserved
File access level: Anyone