Large-Scale Music Genre Analysis and Classification Using Machine Learning with Apache Spark

Article


Chaudhury, M., Karami, A. and Ghazanfar, M. A. 2022. Large-Scale Music Genre Analysis and Classification Using Machine Learning with Apache Spark. Electronics. 11 (16), p. 2567. https://doi.org/10.3390/electronics11162567
Authors: Chaudhury, M., Karami, A. and Ghazanfar, M. A.
Abstract

Online music listening has grown considerably over the past decade, driven by the number of tracks available online. The large music libraries provided by online content distribution vendors make streaming and downloading services more accessible to the end-user. To present similar songs conveniently, it is essential to classify them with an appropriate tag or index (genre). As online music listening continues to grow, developing machine learning models to classify music genres has become a main area of research. In this paper, the popular GTZAN music dataset, which contains ten music genres, is analysed to study various types of music features and audio signals. Multiple scalable machine learning algorithms supported by Apache Spark, including naïve Bayes, decision tree, logistic regression, and random forest, are investigated for the classification of music genres. The performance of these classifiers is compared, and random forest performs best. Apache Spark is used to reduce the computation time for machine learning predictions without additional computational cost, as it focuses on parallel computation. The present work also demonstrates that combining Apache Spark with machine learning algorithms reduces the scalability problem of computing machine learning predictions. Moreover, different hyperparameters of the random forest classifier are optimized to increase its performance in the domain of music genre classification. The experimental outcome shows that the developed random forest classifier achieves a high level of accuracy, especially for the mislabelled, distorted GTZAN dataset, and outperforms the other machine learning classifiers supported by Apache Spark in the present work. It achieves 90% accuracy for music genre classification, which is compared with other work in the same domain.

Keywords: music genre; Apache Spark; PySpark; machine learning; exploratory data analysis
Journal: Electronics
Journal citation: 11 (16), p. 2567
ISSN: 2079-9292
Year: 2022
Publisher: MDPI
Digital Object Identifier (DOI): https://doi.org/10.3390/electronics11162567
Web address (URL): https://www.mdpi.com/2079-9292/11/16/2567
Publication dates
Online: 17 Aug 2022
Publication process dates
Accepted: 14 Aug 2022
Deposited: 04 Jul 2023
Copyright holder: © 2022, The Author(s)
Permalink: https://repository.uel.ac.uk/item/8w3qz

Download files


Publisher's version
electronics-11-02567.pdf
License: CC BY 4.0
File access level: Anyone


Related outputs

Exploring the Ethical Implications of AI-Powered Personalization in Digital Marketing
Karami, A., Shemshaki, M. and Ghazanfar, M. 2024. Exploring the Ethical Implications of AI-Powered Personalization in Digital Marketing. Data Intelligence. In Press. https://doi.org/10.3724/2096-7004.di.2024.0055
Prediction of Depression Severity and Personalised Risk Factors Using Machine Learning on Multimodal Data
Amirhosseini, M. H., Ayodele, A. L. and Karami, A. 2024. Prediction of Depression Severity and Personalised Risk Factors Using Machine Learning on Multimodal Data. IS'24: 12th IEEE International Conference on Intelligent Systems. Varna, Bulgaria 29 - 31 Aug 2024 IEEE. https://doi.org/10.1109/IS61756.2024.10705185
A reinforcement learning recommender system using bi-clustering and Markov Decision Process
Iftikhar, A., Ghazanfar, M. A., Ayub, M., Alahmari, S. A., Qazi, N. and Wall, J. 2024. A reinforcement learning recommender system using bi-clustering and Markov Decision Process. Expert Systems with Applications. 237 (Art.), p. 121541. https://doi.org/10.1016/j.eswa.2023.121541
Shifting the Weight: Applications of AI in Olympic Weightlifting
Bolarinwa, D., Qazi, N. and Ghazanfar, M. 2023. Shifting the Weight: Applications of AI in Olympic Weightlifting. PRDC 2023: 28th IEEE Pacific Rim International Symposium on Dependable Computing. Singapore 24 - 27 Oct 2023 IEEE. https://doi.org/10.1109/PRDC59308.2023.00051
Sentiment-Driven Cryptocurrency Price Prediction: A Machine Learning Approach Utilizing Historical Data and Social Media Sentiment Analysis
Bhatt, S., Ghazanfar, M. and Amirhosseini, M. 2023. Sentiment-Driven Cryptocurrency Price Prediction: A Machine Learning Approach Utilizing Historical Data and Social Media Sentiment Analysis. Machine Learning and Applications: An International Journal (MLAIJ). 10 (2/3), pp. 1-15. https://doi.org/10.5121/mlaij.2023.10301
Machine Learning based Cryptocurrency Price Prediction using historical data and Social Media Sentiment
Bhatt, S., Ghazanfar, M. and Amirhosseini, M. 2023. Machine Learning based Cryptocurrency Price Prediction using historical data and Social Media Sentiment. 5th International Conference on Machine Learning & Applications (CMLA 2023). Sydney, Australia 17 - 18 Jun 2023 AIRCC Publishing Corporation.
A novel DeepMaskNet model for face mask detection and masked facial recognition
Ullah, N., Javed, A., Ghazanfar, M., Alsufyani, A. and Bourouis, S. 2022. A novel DeepMaskNet model for face mask detection and masked facial recognition. Journal of King Saud University - Computer and Information Sciences. 30 (10-B), pp. 9905-9914. https://doi.org/10.1016/j.jksuci.2021.12.017
Designing a Cost-Efficient Network for a Small Enterprise
Jafari, F., Karami, A. and Osemwengie, L. 2021. Designing a Cost-Efficient Network for a Small Enterprise. SAI Computing Conference 2021. Online 15 - 16 Jul 2021 Springer, Cham. https://doi.org/10.1007/978-3-030-80119-9_14
Asset Criticality and Risk Prediction for an Effective Cyber Security Risk Management of Cyber Physical System
Kure, H. I., Islam, S., Ghazanfar, M., Raza, A. and Pasha, M. 2021. Asset Criticality and Risk Prediction for an Effective Cyber Security Risk Management of Cyber Physical System. Neural Computing and Applications. 34, pp. 493–514. https://doi.org/10.1007/s00521-021-06400-0
Novel online Recommendation algorithm for Massive Open Online Courses (NoR-MOOCs)
Khalid, A., Lundqvist, K., Yates, A. and Ghazanfar, M. 2021. Novel online Recommendation algorithm for Massive Open Online Courses (NoR-MOOCs). PLoS ONE. 16 (Art. e0245485). https://doi.org/10.1371/journal.pone.0245485
Stock market prediction using machine learning classifiers and social media, news
Khan, W., Ghazanfar, M., Azam, M. A., Karami, A., Alyoubi, K. H. and Alfakeeh, A. S. 2020. Stock market prediction using machine learning classifiers and social media, news. Journal of Ambient Intelligence and Humanized Computing. 13, pp. 3433-3456. https://doi.org/10.1007/s12652-020-01839-w
A novel centroids initialisation for K-means clustering in the presence of benign outliers
Karami, A., Ur Rehman, S. and Ghazanfar, M. 2020. A novel centroids initialisation for K-means clustering in the presence of benign outliers. International Journal of Data Analysis Techniques and Strategies. 12 (4), pp. 287-298. https://doi.org/10.1504/IJDATS.2020.111498
Identifying Users with Wearable Sensors based on Activity Patterns
Ehatisham-ul-Haq, M., Malik, M. N., Azam, M. A., Naeem, U., Khalid, A. and Ghazanfar, M. 2020. Identifying Users with Wearable Sensors based on Activity Patterns. The 11th International Conference on Emerging Ubiquitous Systems and Pervasive Networks (EUSPN 2020). Madeira, Portugal 02 - 05 Nov 2020 Elsevier. https://doi.org/10.1016/j.procs.2020.10.005
Modeling user rating preference behavior to improve the performance of the collaborative filtering based recommender systems
Ayub, M., Ghazanfar, M., Mehmood, Z., Saba, T., Alharbey, R., Munshi, A. M. and Alrige, M. A. 2019. Modeling user rating preference behavior to improve the performance of the collaborative filtering based recommender systems. PLoS ONE. 14 (Art. e0220129). https://doi.org/10.1371/journal.pone.0220129
Kernel Context Recommender System (KCR): A Scalable Context-Aware Recommender System Algorithm
Iqbal, Misbah, Ghazanfar, M., Sattar, Asma, Maqsood, Muazzam, Khan, Salabat, Mehmood, Irfan and Baik, Sung Wook 2019. Kernel Context Recommender System (KCR): A Scalable Context-Aware Recommender System Algorithm. IEEE Access. 7, pp. 24719-24737. https://doi.org/10.1109/ACCESS.2019.2897003
An Anomaly-based Intrusion Detection System in Presence of Benign Outliers with Visualization Capabilities
Karami, A. 2018. An Anomaly-based Intrusion Detection System in Presence of Benign Outliers with Visualization Capabilities. Expert Systems with Applications. 108, pp. 36-60. https://doi.org/10.1016/j.eswa.2018.04.038
A Robust Regression-Based Stock Exchange Forecasting and Determination of Correlation between Stock Markets
Khan, U., Aadil, F., Ghazanfar, M., Khan, S., Metawa, N., Muhammad, K., Mehmood, I. and Nam, Y. 2018. A Robust Regression-Based Stock Exchange Forecasting and Determination of Correlation between Stock Markets. Sustainability. 10 (Art. 3702). https://doi.org/10.3390/su10103702
Functional Connectivity Evaluation for Infant EEG Signals based on Artificial Neural Network
Sharif, M., Naeem, U., Islam, S. and Karami, A. 2018. Functional Connectivity Evaluation for Infant EEG Signals based on Artificial Neural Network. Arai, Kohei, Kapoor, Supriya and Bhatia, Rahul (ed.) Intelligent Systems Conference (IntelliSys) 2018. London, UK 06 - 07 Sep 2018 Springer, Cham. https://doi.org/10.1007/978-3-030-01057-7_34
The Application of a Semantic-Based Process Mining Framework on a Learning Process Domain
Okoye, Kingsley, Islam, S., Naeem, U., Sharif, M., Azam, Muhammad Awais and Karami, A. 2018. The Application of a Semantic-Based Process Mining Framework on a Learning Process Domain. Arai, Kohei, Kapoor, Supriya and Bhatia, Rahul (ed.) Intelligent Systems Conference (IntelliSys) 2018. London, UK 06 - 07 Sep 2018 Springer, Cham. https://doi.org/10.1007/978-3-030-01054-6_96
A Framework for Uncertainty-Aware Visual Analytics in Big Data
Karami, A. 2015. A Framework for Uncertainty-Aware Visual Analytics in Big Data. CEUR Workshop Proceedings. 1510, pp. 146-155.
Utilization of multi attribute decision making techniques to integrate automatic and manual ranking of options
Karami, A. and Johansson, Ronnie 2013. Utilization of multi attribute decision making techniques to integrate automatic and manual ranking of options. Journal of Information Science and Engineering. 30 (2), pp. 519-534.
Choosing DBSCAN parameters automatically using differential evolution
Karami, A. and Johansson, Ronnie 2014. Choosing DBSCAN parameters automatically using differential evolution. International Journal of Computer Applications. 91 (7), pp. 1-11. https://doi.org/10.5120/15890-5059
A fuzzy anomaly detection system based on hybrid PSO-Kmeans algorithm in content-centric networks
Karami, A. and Guerrero-Zapata, Manel 2014. A fuzzy anomaly detection system based on hybrid PSO-Kmeans algorithm in content-centric networks. Neurocomputing. 149 (Part C), pp. 1253-1269. https://doi.org/10.1016/j.neucom.2014.08.070
A hybrid multiobjective RBF-PSO method for mitigating DoS attacks in Named Data Networking
Karami, A. and Guerrero-Zapata, Manel 2014. A hybrid multiobjective RBF-PSO method for mitigating DoS attacks in Named Data Networking. Neurocomputing. 151 (3), pp. 1262-1282. https://doi.org/10.1016/j.neucom.2014.11.003
An ANFIS-based cache replacement method for mitigating cache pollution attacks in Named Data Networking
Karami, A. and Guerrero-Zapata, Manel 2015. An ANFIS-based cache replacement method for mitigating cache pollution attacks in Named Data Networking. Computer Networks. 80 (April), pp. 51-65. https://doi.org/10.1016/j.comnet.2015.01.020
ACCPndn: Adaptive Congestion Control Protocol in Named Data Networking by learning capacities using optimized Time-Lagged Feedforward Neural Network
Karami, A. 2015. ACCPndn: Adaptive Congestion Control Protocol in Named Data Networking by learning capacities using optimized Time-Lagged Feedforward Neural Network. Journal of Network and Computer Applications. 56 (Oct.), pp. 1-18. https://doi.org/10.1016/j.jnca.2015.05.017
A Wormhole Attack Detection and Prevention Technique in Wireless Sensor Networks
Siddiqui, A., Karami, A. and Johnson, M. O. 2017. A Wormhole Attack Detection and Prevention Technique in Wireless Sensor Networks. International Journal of Computer Applications. 174 (Art. 4). https://doi.org/10.5120/ijca2017915376