A novel centroids initialisation for K-means clustering in the presence of benign outliers

Article


Karami, A., Ur Rehman, S. and Ghazanfar, M. 2020. A novel centroids initialisation for K-means clustering in the presence of benign outliers. International Journal of Data Analysis Techniques and Strategies. 12 (4), pp. 287-298. https://doi.org/10.1504/IJDATS.2020.111498
Authors: Karami, A., Ur Rehman, S. and Ghazanfar, M.
Abstract

K-means is one of the most important and widely applied clustering algorithms in learning systems. However, it is sensitive to centroid initialisation, which makes the algorithm unstable. The performance and stability of K-means may degrade if benign outliers (i.e., long-term independence data points) appear in the data. In this paper, we develop a novel algorithm to optimise K-means performance in the presence of benign outliers. We first identify the benign outliers and run K-means across them; K-means then runs over all data points to relocate the clusters' centroids, providing high accuracy. Experimental results over several benchmark and synthetic datasets confirm that the proposed method significantly outperforms some existing approaches, with better accuracy on the applied performance metrics.
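
The two-stage idea in the abstract (cluster the outliers first, then reuse those centroids to start K-means on the full data) can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' algorithm: the abstract does not say how benign outliers are identified, so the hypothetical helper `flag_far_points` substitutes a simple distance-from-mean rule, and `farthest_first`, `lloyd`, `two_stage_kmeans` and the threshold `z` are names and choices introduced here.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def assign(points, centroids):
    # Index of the nearest centroid for every point.
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points]

def lloyd(points, centroids, iters=50):
    """Standard K-means (Lloyd) iterations from the given starting centroids."""
    centroids = list(centroids)
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j, c in enumerate(centroids):
            members = [p for p, l in zip(points, labels) if l == j]
            updated.append(mean(members) if members else c)  # keep empty clusters in place
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

def farthest_first(points, k):
    """Deterministic spread-out picks, used here only to start stage 1."""
    chosen = [points[0]]
    while len(chosen) < k:
        chosen.append(max(points, key=lambda p: min(dist(p, c) for c in chosen)))
    return chosen

def flag_far_points(points, z=1.0):
    """ASSUMPTION: stand-in detector, not the paper's definition of a benign outlier.
    Flags points whose distance from the global mean exceeds mean + z * std."""
    centre = mean(points)
    d = [dist(p, centre) for p in points]
    mu = sum(d) / len(d)
    sd = math.sqrt(sum((x - mu) ** 2 for x in d) / len(d))
    return [p for p, x in zip(points, d) if x > mu + z * sd]

def two_stage_kmeans(points, k):
    flagged = flag_far_points(points)
    # Fall back to all points if too few were flagged to seed k clusters.
    pool = flagged if len(flagged) >= k else points
    seeds, _ = lloyd(pool, farthest_first(pool, k))   # stage 1: K-means across flagged points
    return lloyd(points, seeds)                       # stage 2: K-means over all points

# Toy data: two tight groups plus one far point beyond each of them.
demo_points = [(0, 0), (0, 1), (1, 0), (1, 1),
               (10, 10), (10, 11), (11, 10), (11, 11),
               (-5, -5), (16, 16)]
```

Calling `two_stage_kmeans(demo_points, 2)` returns the final centroids and one cluster label per point. On this toy data the two far points seed the clusters, and the second stage pulls each centroid back towards the nearby dense group.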

Keywords: clustering; K-means; centroid initialisation; benign outlier
Journal: International Journal of Data Analysis Techniques and Strategies
Journal citation: 12 (4), pp. 287-298
ISSN: 1755-8050
Year: 2020
Publisher: Inderscience
Accepted author manuscript
License: CC BY-NC-ND
File access level: Anyone
Digital Object Identifier (DOI): https://doi.org/10.1504/IJDATS.2020.111498
Publication dates
Online: 25 Nov 2020
Publication process dates
Deposited: 04 Jul 2023
Copyright holder: © 2023, The Author
Permalink: https://repository.uel.ac.uk/item/8w3qw

