Large-Scale Music Genre Analysis and Classification Using Machine Learning with Apache Spark

Article

Chaudhury, M., Karami, A. and Ghazanfar, M. A. 2022. Large-Scale Music Genre Analysis and Classification Using Machine Learning with Apache Spark. Electronics. 11 (16), p. 2567. https://doi.org/10.3390/electronics11162567

Authors	Chaudhury, M., Karami, A. and Ghazanfar, M. A.
Abstract	Online music listening has grown considerably over the past decade, driven by the increasing number of tracks available online. The large music libraries offered by online music distributors make streaming and downloading services more accessible to end users. To present similar songs conveniently, these services need to label each track with an appropriate tag or index, such as its genre. As online listening continues to grow, developing machine learning models to classify music genres has become a major area of research. In this paper, GTZAN, a popular dataset covering ten music genres, is analysed to study various types of music features and audio signals. Multiple scalable machine learning algorithms supported by Apache Spark, including naïve Bayes, decision tree, logistic regression, and random forest, are investigated for music genre classification. Their performance is compared, and random forest proves to be the best classifier for this task. Apache Spark is used to reduce the computation time of machine learning predictions without additional computational cost, since it is designed for parallel computation. The work also demonstrates that combining Apache Spark with machine learning algorithms mitigates the scalability problem of computing machine learning predictions. Moreover, several hyperparameters of the random forest classifier are optimized to improve its performance in music genre classification. The experimental results show that the developed random forest classifier achieves high accuracy, even on the GTZAN dataset, which contains mislabelled and distorted recordings. This classifier outperformed the other Apache Spark-supported machine learning classifiers evaluated in the present work.
The random forest classifier achieves 90% accuracy for music genre classification, which compares favourably with other work in the same domain.
Keywords	music genre; Apache Spark; PySpark; machine learning; exploratory data analysis
Journal	Electronics
Journal citation	11 (16), p. 2567
ISSN	2079-9292
Year	2022
Publisher	MDPI
Publisher's version	electronics-11-02567.pdf (License: CC BY 4.0; file access level: Anyone)
Digital Object Identifier (DOI)	https://doi.org/10.3390/electronics11162567
Web address (URL)	https://www.mdpi.com/2079-9292/11/16/2567
Publication dates
Online	17 Aug 2022
Publication process dates
Accepted	14 Aug 2022
Deposited	04 Jul 2023
Copyright holder	© 2022, The Author(s)

Permalink: https://repository.uel.ac.uk/item/8w3qz

Download files

Publisher's version

	electronics-11-02567.pdf
License: CC BY 4.0
File access level: Anyone

Related outputs

PSM: Proactive Spill Mitigation in PySpark

Karami, A. 2025. PSM: Proactive Spill Mitigation in PySpark. 12th IEEE International Conference on Data Science and Advanced Analytics. Birmingham, UK 09 - 12 Oct 2025 IEEE.

Adaptive Federated Learning for Anomaly Detection in Satellite Telemetry

Atefrad, A. and Karami, A. 2025. Adaptive Federated Learning for Anomaly Detection in Satellite Telemetry. 4th 2025 IEEE World Conference on Applied Intelligence and Computing (AIC 2025). 26 - 27 Jul 2025 Soft Computing Research Society (SCRS).

Edge computing in big data: challenges and benefits

Karami, A. and Karami, M. 2025. Edge computing in big data: challenges and benefits. International Journal of Data Science and Analytics. p. In press. https://doi.org/10.1007/s41060-025-00855-3

Enhancing Smart Contract Security: Static Heuristics and CodeBERT Embeddings

Soofiyan, S. and Karami, A. 2025. Enhancing Smart Contract Security: Static Heuristics and CodeBERT Embeddings. Applied Intelligence and Computing. The Institution of Electronics and Telecommunication Engineers (IETE), Delhi Centre, India 26 - 27 Jul 2025 IEEE.

WASPO: Workload-Aware Spark Performance Optimization Using NSGA-II

Karami, A. and Amirhosseini, M. 2025. WASPO: Workload-Aware Spark Performance Optimization Using NSGA-II. Cognitive Models and Artificial Intelligence Conference. Prague-Czech Republic 13 - 14 Jun 2025 IEEE. https://doi.org/10.1109/AICCONF64766.2025.11064004

Advancing Personality Type Prediction: Utilizing Enhanced Machine and Deep Learning Models with the Myers-Briggs Type Indicator

Amirhosseini, M., Karami, A. and Kalabi, F. 2025. Advancing Personality Type Prediction: Utilizing Enhanced Machine and Deep Learning Models with the Myers-Briggs Type Indicator. Cognitive Models and Artificial Intelligence Conference. Prague-Czech Republic 13 - 14 Jun 2025 IEEE. https://doi.org/10.1109/AICCONF64766.2025.11064294

Harnessing Social Media Sentiment for Predictive Insights into the Nigerian Presidential Election

Alao, J. O., Amirhosseini, M., Karami, A. and Ghorashi, S. A. 2025. Harnessing Social Media Sentiment for Predictive Insights into the Nigerian Presidential Election. Cognitive Models and Artificial Intelligence Conference. Prague-Czech Republic 13 - 14 Jun 2025 IEEE. https://doi.org/10.1109/AICCONF64766.2025.11064202

AI-Driven Mortality Prediction in COVID-19 Patients Using Advanced Feature Selection

Rajakaruna, I., Amirhosseini, M., Li, Y., Karami, A. and Arachchillage, D. J. 2025. AI-Driven Mortality Prediction in COVID-19 Patients Using Advanced Feature Selection. Cognitive Models and Artificial Intelligence Conference. Prague-Czech Republic 13 - 14 Jun 2025 IEEE. https://doi.org/10.1109/AICCONF64766.2025.11063762

Harmony in Federated Learning: A Comprehensive Review of Techniques to Tackle Heterogeneity and Non-IID Data

Karami, M. and Karami, A. 2025. Harmony in Federated Learning: A Comprehensive Review of Techniques to Tackle Heterogeneity and Non-IID Data. Cluster Computing. p. In press.

The impact of big data characteristics on credit risk assessment

Karami, A. and Igbokwe, C. 2025. The impact of big data characteristics on credit risk assessment. International Journal of Data Science and Analytics. p. In press. https://doi.org/10.1007/s41060-025-00753-8

Ethereum Smart Contracts: A Hierarchical Analysis of Vulnerability Challenges and Mitigation Strategies

Soofiyan, S. and Karami, A. 2025. Ethereum Smart Contracts: A Hierarchical Analysis of Vulnerability Challenges and Mitigation Strategies. Cluster Computing. p. In press.

Leveraging Big Data Characteristics for Enhanced Healthcare Fraud Detection

Karami, A. and Jafari, F. 2025. Leveraging Big Data Characteristics for Enhanced Healthcare Fraud Detection. Cluster Computing. 28 (Art. 349). https://doi.org/10.1007/s10586-024-05097-9

Breaking Down SEO Complexity: Bridging PCA and Bayesian-Optimized t-SNE

Karami, A., Ghasemabadi, S. F. and Amirhosseini, M. 2024. Breaking Down SEO Complexity: Bridging PCA and Bayesian-Optimized t-SNE. 2024 IEEE International Conference on Big Knowledge (ICBK). IEEE. https://doi.org/10.1109/ICKG63256.2024.00028

Exploring the Ethical Implications of AI-Powered Personalization in Digital Marketing

Karami, A., Shemshaki, M. and Ghazanfar, M. 2024. Exploring the Ethical Implications of AI-Powered Personalization in Digital Marketing. Data Intelligence. p. In press. https://doi.org/10.3724/2096-7004.di.2024.0055

Prediction of Depression Severity and Personalised Risk Factors Using Machine Learning on Multimodal Data

Amirhosseini, M. H., Ayodele, A. L. and Karami, A. 2024. Prediction of Depression Severity and Personalised Risk Factors Using Machine Learning on Multimodal Data. IS'24: 12th IEEE International Conference on Intelligent Systems. Varna, Bulgaria 29 - 31 Aug 2024 IEEE. https://doi.org/10.1109/IS61756.2024.10705185

A reinforcement learning recommender system using bi-clustering and Markov Decision Process

Iftikhar, A., Ghazanfar, M. A., Ayub, M., Alahmari, S. A., Qazi, N. and Wall, J. 2024. A reinforcement learning recommender system using bi-clustering and Markov Decision Process. Expert Systems with Applications. 237 (Art. 121541). https://doi.org/10.1016/j.eswa.2023.121541

Shifting the Weight: Applications of AI in Olympic Weightlifting

Bolarinwa, D., Qazi, N. and Ghazanfar, M. 2023. Shifting the Weight: Applications of AI in Olympic Weightlifting. PRDC 2023: 28th IEEE Pacific Rim International Symposium on Dependable Computing. Singapore 24 - 27 Oct 2023 IEEE. https://doi.org/10.1109/PRDC59308.2023.00051

Sentiment-Driven Cryptocurrency Price Prediction: A Machine Learning Approach Utilizing Historical Data and Social Media Sentiment Analysis

Bhatt, S., Ghazanfar, M. and Amirhosseini, M. 2023. Sentiment-Driven Cryptocurrency Price Prediction: A Machine Learning Approach Utilizing Historical Data and Social Media Sentiment Analysis. Machine Learning and Applications: An International Journal (MLAIJ). 10 (2/3), pp. 1-15. https://doi.org/10.5121/mlaij.2023.10301

Machine Learning based Cryptocurrency Price Prediction using historical data and Social Media Sentiment

Bhatt, S., Ghazanfar, M. and Amirhosseini, M. 2023. Machine Learning based Cryptocurrency Price Prediction using historical data and Social Media Sentiment. 5th International Conference on Machine Learning & Applications (CMLA 2023). Sydney, Australia 17 - 18 Jun 2023 AIRCC Publishing Corporation.

A novel DeepMaskNet model for face mask detection and masked facial recognition

Ullah, N., Javed, A., Ghazanfar, M., Alsufyani, A. and Bourouis, S. 2022. A novel DeepMaskNet model for face mask detection and masked facial recognition. Journal of King Saud University - Computer and Information Sciences. 30 (10-B), pp. 9905-9914. https://doi.org/10.1016/j.jksuci.2021.12.017

Designing a Cost-Efficient Network for a Small Enterprise

Jafari, F., Karami, A. and Osemwengie, L. 2021. Designing a Cost-Efficient Network for a Small Enterprise. SAI Computing Conference 2021. Online 15 - 16 Jul 2021 Springer, Cham. https://doi.org/10.1007/978-3-030-80119-9_14

Asset Criticality and Risk Prediction for an Effective Cyber Security Risk Management of Cyber Physical System

Kure, H. I., Islam, S., Ghazanfar, M., Raza, A. and Pasha, M. 2021. Asset Criticality and Risk Prediction for an Effective Cyber Security Risk Management of Cyber Physical System. Neural Computing and Applications. 34, pp. 493-514. https://doi.org/10.1007/s00521-021-06400-0

Novel online Recommendation algorithm for Massive Open Online Courses (NoR-MOOCs)

Khalid, A., Lundqvist, K., Yates, A. and Ghazanfar, M. 2021. Novel online Recommendation algorithm for Massive Open Online Courses (NoR-MOOCs). PLoS ONE. 16 (Art. e0245485). https://doi.org/10.1371/journal.pone.0245485

Stock market prediction using machine learning classifiers and social media, news

Khan, W., Ghazanfar, M., Azam, M. A., Karami, A., Alyoubi, K. H. and Alfakeeh, A. S. 2020. Stock market prediction using machine learning classifiers and social media, news. Journal of Ambient Intelligence and Humanized Computing. 13, pp. 3433-3456. https://doi.org/10.1007/s12652-020-01839-w

A novel centroids initialisation for K-means clustering in the presence of benign outliers

Karami, A., Ur Rehman, S. and Ghazanfar, M. 2020. A novel centroids initialisation for K-means clustering in the presence of benign outliers. International Journal of Data Analysis Techniques and Strategies. 12 (4), pp. 287-298. https://doi.org/10.1504/IJDATS.2020.111498

Identifying Users with Wearable Sensors based on Activity Patterns

Ehatisham-ul-Haq, M., Malik, M. N., Azam, M. A., Naeem, U., Khalid, A. and Ghazanfar, M. 2020. Identifying Users with Wearable Sensors based on Activity Patterns. The 11th International Conference on Emerging Ubiquitous Systems and Pervasive Networks (EUSPN 2020). Madeira, Portugal 02 - 05 Nov 2020 Elsevier. https://doi.org/10.1016/j.procs.2020.10.005

Modeling user rating preference behavior to improve the performance of the collaborative filtering based recommender systems

Ayub, M., Ghazanfar, M., Mehmood, Z., Saba, T., Alharbey, R., Munshi, A. M. and Alrige, M. A. 2019. Modeling user rating preference behavior to improve the performance of the collaborative filtering based recommender systems. PLoS ONE. 14 (Art. e0220129). https://doi.org/10.1371/journal.pone.0220129

Kernel Context Recommender System (KCR): A Scalable Context-Aware Recommender System Algorithm

Iqbal, M., Ghazanfar, M., Sattar, A., Maqsood, M., Khan, S., Mehmood, I. and Baik, S. W. 2019. Kernel Context Recommender System (KCR): A Scalable Context-Aware Recommender System Algorithm. IEEE Access. 7, pp. 24719-24737. https://doi.org/10.1109/ACCESS.2019.2897003

An Anomaly-based Intrusion Detection System in Presence of Benign Outliers with Visualization Capabilities

Karami, A. 2018. An Anomaly-based Intrusion Detection System in Presence of Benign Outliers with Visualization Capabilities. Expert Systems with Applications. 108, pp. 36-60. https://doi.org/10.1016/j.eswa.2018.04.038

A Robust Regression-Based Stock Exchange Forecasting and Determination of Correlation between Stock Markets

Khan, U., Aadil, F., Ghazanfar, M., Khan, S., Metawa, N., Muhammad, K., Mehmood, I. and Nam, Y. 2018. A Robust Regression-Based Stock Exchange Forecasting and Determination of Correlation between Stock Markets. Sustainability. 10 (Art. 3702). https://doi.org/10.3390/su10103702

Functional Connectivity Evaluation for Infant EEG Signals based on Artificial Neural Network

Sharif, M., Naeem, U., Islam, S. and Karami, A. 2018. Functional Connectivity Evaluation for Infant EEG Signals based on Artificial Neural Network. Arai, K., Kapoor, S. and Bhatia, R. (ed.) Intelligent Systems Conference (IntelliSys) 2018. London, UK 06 - 07 Sep 2018 Springer, Cham. https://doi.org/10.1007/978-3-030-01057-7_34

The Application of a Semantic-Based Process Mining Framework on a Learning Process Domain

Okoye, K., Islam, S., Naeem, U., Sharif, M., Azam, M. A. and Karami, A. 2018. The Application of a Semantic-Based Process Mining Framework on a Learning Process Domain. Arai, K., Kapoor, S. and Bhatia, R. (ed.) Intelligent Systems Conference (IntelliSys) 2018. London, UK 06 - 07 Sep 2018 Springer, Cham. https://doi.org/10.1007/978-3-030-01054-6_96

A Framework for Uncertainty-Aware Visual Analytics in Big Data

Karami, A. 2015. A Framework for Uncertainty-Aware Visual Analytics in Big Data. CEUR Workshop Proceedings. 1510, pp. 146-155.

Utilization of multi attribute decision making techniques to integrate automatic and manual ranking of options

Karami, A. and Johansson, R. 2013. Utilization of multi attribute decision making techniques to integrate automatic and manual ranking of options. Journal of Information Science and Engineering. 30 (2), pp. 519-534.

Choosing DBSCAN parameters automatically using differential evolution

Karami, A. and Johansson, R. 2014. Choosing DBSCAN parameters automatically using differential evolution. International Journal of Computer Applications. 91 (7), pp. 1-11. https://doi.org/10.5120/15890-5059

A fuzzy anomaly detection system based on hybrid PSO-Kmeans algorithm in content-centric networks

Karami, A. and Guerrero-Zapata, M. 2014. A fuzzy anomaly detection system based on hybrid PSO-Kmeans algorithm in content-centric networks. Neurocomputing. 149 (Part C), pp. 1253-1269. https://doi.org/10.1016/j.neucom.2014.08.070

A hybrid multiobjective RBF-PSO method for mitigating DoS attacks in Named Data Networking

Karami, A. and Guerrero-Zapata, M. 2014. A hybrid multiobjective RBF-PSO method for mitigating DoS attacks in Named Data Networking. Neurocomputing. 151 (3), pp. 1262-1282. https://doi.org/10.1016/j.neucom.2014.11.003

An ANFIS-based cache replacement method for mitigating cache pollution attacks in Named Data Networking

Karami, A. and Guerrero-Zapata, M. 2015. An ANFIS-based cache replacement method for mitigating cache pollution attacks in Named Data Networking. Computer Networks. 80 (April), pp. 51-65. https://doi.org/10.1016/j.comnet.2015.01.020

ACCPndn: Adaptive Congestion Control Protocol in Named Data Networking by learning capacities using optimized Time-Lagged Feedforward Neural Network

Karami, A. 2015. ACCPndn: Adaptive Congestion Control Protocol in Named Data Networking by learning capacities using optimized Time-Lagged Feedforward Neural Network. Journal of Network and Computer Applications. 56 (Oct.), pp. 1-18. https://doi.org/10.1016/j.jnca.2015.05.017

A Wormhole Attack Detection and Prevention Technique in Wireless Sensor Networks

Siddiqui, A., Karami, A. and Johnson, M. O. 2017. A Wormhole Attack Detection and Prevention Technique in Wireless Sensor Networks. International Journal of Computer Applications. 174 (Art. 4). https://doi.org/10.5120/ijca2017915376
