Stock market prediction using machine learning classifiers and social media, news

Article


Khan, W., Ghazanfar, M., Azam, M. A., Karami, A., Alyoubi, K. H. and Alfakeeh, A. S. 2020. Stock market prediction using machine learning classifiers and social media, news. Journal of Ambient Intelligence and Humanized Computing. 13, pp. 3433-3456. https://doi.org/10.1007/s12652-020-01839-w
Authors: Khan, W., Ghazanfar, M., Azam, M. A., Karami, A., Alyoubi, K. H. and Alfakeeh, A. S.
Abstract

Accurate stock market prediction is of great interest to investors. However, stock markets are driven by volatile factors such as microblogs and news, which make it hard to predict a stock market index from historical data alone. This high volatility underscores the need to assess the role of external factors in stock prediction. Stock markets can be predicted by applying machine learning algorithms to information contained in social media and financial news, since this information can change investors’ behavior. In this paper, we apply machine learning algorithms to social media and financial news data to measure the impact of this data on stock market prediction accuracy over ten subsequent days. To improve the performance and quality of predictions, feature selection and spam tweet reduction are performed on the data sets. Moreover, we perform experiments to identify stock markets that are difficult to predict and those that are more influenced by social media and financial news. We compare the results of different algorithms to find a consistent classifier. Finally, to achieve maximum prediction accuracy, deep learning is used and some classifiers are ensembled. Our experimental results show that the highest prediction accuracies of 80.53% and 75.16% are achieved using social media and financial news, respectively. We also show that the New York and Red Hat stock markets are hard to predict, that New York and IBM stocks are more influenced by social media, and that London and Microsoft stocks are more influenced by financial news. The random forest classifier is found to be consistent, and the highest accuracy of 83.22% is achieved by its ensemble.
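
The abstract mentions that some classifiers are ensembled and that results are reported as prediction accuracy. As a rough illustration only, the sketch below shows hard (majority) voting over three binary up/down classifiers and an accuracy calculation. The three rule-based classifiers, the feature layout, and the toy data are all hypothetical and are not taken from the paper, which does not specify its ensemble construction on this page.

```python
from collections import Counter
from typing import Callable, Sequence

# A classifier is any function mapping a feature vector to a label:
# 1 = index closes up, 0 = index closes down.
Classifier = Callable[[Sequence[float]], int]

# Hypothetical stand-ins for trained models. Assumed feature layout:
# x[0] = mean daily sentiment, x[1] = previous-day return,
# x[2] = trading volume relative to its average.
def sentiment_rule(x: Sequence[float]) -> int:
    return 1 if x[0] > 0.0 else 0

def momentum_rule(x: Sequence[float]) -> int:
    return 1 if x[1] > 0.0 else 0

def volume_rule(x: Sequence[float]) -> int:
    return 1 if x[2] > 1.0 else 0

def majority_vote(classifiers: Sequence[Classifier], x: Sequence[float]) -> int:
    """Return the label predicted by most classifiers (hard voting)."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Toy data, invented for illustration only.
X = [
    (0.4, 0.01, 1.2),
    (-0.3, 0.02, 0.8),
    (0.1, -0.01, 1.5),
    (-0.2, -0.03, 0.9),
]
y = [1, 1, 1, 0]

ensemble = [sentiment_rule, momentum_rule, volume_rule]
preds = [majority_vote(ensemble, x) for x in X]
print(preds, accuracy(y, preds))
```

Using an odd number of voters with binary labels avoids ties; a real ensemble would replace the rules with trained models such as a random forest.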

Keywords: Deep learning; Feature selection; Hybrid algorithm; Natural language processing; Predictive modeling; Sentiment analysis; Stock market prediction
Journal: Journal of Ambient Intelligence and Humanized Computing
Journal citation: 13, pp. 3433-3456
ISSN: 1868-5145
Year: 2020
Publisher: Springer
Digital Object Identifier (DOI): https://doi.org/10.1007/s12652-020-01839-w
Publication dates
Online: 14 Mar 2020
Print: Jul 2022
Publication process dates
Accepted: 25 Feb 2020
Deposited: 09 Oct 2023
Copyright holder: © 2020, The Authors
Additional information

This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: http://dx.doi.org/10.1007/s12652-020-01839-w

Permalink: https://repository.uel.ac.uk/item/8w3qx

Download files


Accepted author manuscript
Paper-AIHC-Stock Prediction using Social Media, News-Revised.pdf
License: Springer Nature Terms of Use for accepted manuscripts of subscription articles, books and chapters
File access level: Anyone

Related outputs

Exploring the Ethical Implications of AI-Powered Personalization in Digital Marketing
Karami, A., Shemshaki, M. and Ghazanfar, M. 2024. Exploring the Ethical Implications of AI-Powered Personalization in Digital Marketing. Data Intelligence. In Press. https://doi.org/10.3724/2096-7004.di.2024.0055
Prediction of Depression Severity and Personalised Risk Factors Using Machine Learning on Multimodal Data
Amirhosseini, M. H., Ayodele, A. L. and Karami, A. 2024. Prediction of Depression Severity and Personalised Risk Factors Using Machine Learning on Multimodal Data. IS'24: 12th IEEE International Conference on Intelligent Systems. Varna, Bulgaria 29 - 31 Aug 2024 IEEE. https://doi.org/10.1109/IS61756.2024.10705185
A reinforcement learning recommender system using bi-clustering and Markov Decision Process
Iftikhar, A., Ghazanfar, M. A., Ayub, M., Alahmari, S. A., Qazi, N. and Wall, J. 2024. A reinforcement learning recommender system using bi-clustering and Markov Decision Process. Expert Systems with Applications. 237 (Art.), p. 121541. https://doi.org/10.1016/j.eswa.2023.121541
Shifting the Weight: Applications of AI in Olympic Weightlifting
Bolarinwa, D., Qazi, N. and Ghazanfar, M. 2023. Shifting the Weight: Applications of AI in Olympic Weightlifting. PRDC 2023: 28th IEEE Pacific Rim International Symposium on Dependable Computing. Singapore 24 - 27 Oct 2023 IEEE. https://doi.org/10.1109/PRDC59308.2023.00051
Sentiment-Driven Cryptocurrency Price Prediction: A Machine Learning Approach Utilizing Historical Data and Social Media Sentiment Analysis
Bhatt, S., Ghazanfar, M. and Amirhosseini, M. 2023. Sentiment-Driven Cryptocurrency Price Prediction: A Machine Learning Approach Utilizing Historical Data and Social Media Sentiment Analysis. Machine Learning and Applications: An International Journal (MLAIJ). 10 (2/3), pp. 1-15. https://doi.org/10.5121/mlaij.2023.10301
Machine Learning based Cryptocurrency Price Prediction using historical data and Social Media Sentiment
Bhatt, S., Ghazanfar, M. and Amirhosseini, M. 2023. Machine Learning based Cryptocurrency Price Prediction using historical data and Social Media Sentiment. 5th International Conference on Machine Learning & Applications (CMLA 2023). Sydney, Australia 17 - 18 Jun 2023 AIRCC Publishing Corporation.
Large-Scale Music Genre Analysis and Classification Using Machine Learning with Apache Spark
Chaudhury, M., Karami, A. and Ghazanfar, M. A. 2022. Large-Scale Music Genre Analysis and Classification Using Machine Learning with Apache Spark. Electronics. 11 (16), p. 2567. https://doi.org/10.3390/electronics11162567
A novel DeepMaskNet model for face mask detection and masked facial recognition
Ullah, N., Javed, A., Ghazanfar, M., Alsufyani, A. and Bourouis, S. 2022. A novel DeepMaskNet model for face mask detection and masked facial recognition. Journal of King Saud University - Computer and Information Sciences. 30 (10-B), pp. 9905-9914. https://doi.org/10.1016/j.jksuci.2021.12.017
Designing a Cost-Efficient Network for a Small Enterprise
Jafari, F., Karami, A. and Osemwengie, L. 2021. Designing a Cost-Efficient Network for a Small Enterprise. SAI Computing Conference 2021. Online 15 - 16 Jul 2021 Springer, Cham. https://doi.org/10.1007/978-3-030-80119-9_14
Asset Criticality and Risk Prediction for an Effective Cyber Security Risk Management of Cyber Physical System
Kure, H. I., Islam, S., Ghazanfar, M., Raza, A. and Pasha, M. 2021. Asset Criticality and Risk Prediction for an Effective Cyber Security Risk Management of Cyber Physical System. Neural Computing and Applications. 34, pp. 493-514. https://doi.org/10.1007/s00521-021-06400-0
Novel online Recommendation algorithm for Massive Open Online Courses (NoR-MOOCs)
Khalid, A., Lundqvist, K., Yates, A. and Ghazanfar, M. 2021. Novel online Recommendation algorithm for Massive Open Online Courses (NoR-MOOCs). PLoS ONE. 16 (Art. e0245485). https://doi.org/10.1371/journal.pone.0245485
A novel centroids initialisation for K-means clustering in the presence of benign outliers
Karami, A., Ur Rehman, S. and Ghazanfar, M. 2020. A novel centroids initialisation for K-means clustering in the presence of benign outliers. International Journal of Data Analysis Techniques and Strategies. 12 (4), pp. 287-298. https://doi.org/10.1504/IJDATS.2020.111498
Identifying Users with Wearable Sensors based on Activity Patterns
Ehatisham-ul-Haq, M., Malik, M. N., Azam, M. A., Naeem, U., Khalid, A. and Ghazanfar, M. 2020. Identifying Users with Wearable Sensors based on Activity Patterns. The 11th International Conference on Emerging Ubiquitous Systems and Pervasive Networks (EUSPN 2020). Madeira, Portugal 02 - 05 Nov 2020 Elsevier. https://doi.org/10.1016/j.procs.2020.10.005
Modeling user rating preference behavior to improve the performance of the collaborative filtering based recommender systems
Ayub, M., Ghazanfar, M., Mehmood, Z., Saba, T., Alharbey, R., Munshi, A. M. and Alrige, M. A. 2019. Modeling user rating preference behavior to improve the performance of the collaborative filtering based recommender systems. PLoS ONE. 14 (Art. e0220129). https://doi.org/10.1371/journal.pone.0220129
Kernel Context Recommender System (KCR): A Scalable Context-Aware Recommender System Algorithm
Iqbal, Misbah, Ghazanfar, M., Sattar, Asma, Maqsood, Muazzam, Khan, Salabat, Mehmood, Irfan and Baik, Sung Wook 2019. Kernel Context Recommender System (KCR): A Scalable Context-Aware Recommender System Algorithm. IEEE Access. 7, pp. 24719-24737. https://doi.org/10.1109/ACCESS.2019.2897003
An Anomaly-based Intrusion Detection System in Presence of Benign Outliers with Visualization Capabilities
Karami, A. 2018. An Anomaly-based Intrusion Detection System in Presence of Benign Outliers with Visualization Capabilities. Expert Systems with Applications. 108, pp. 36-60. https://doi.org/10.1016/j.eswa.2018.04.038
A Robust Regression-Based Stock Exchange Forecasting and Determination of Correlation between Stock Markets
Khan, U., Aadil, F., Ghazanfar, M., Khan, S., Metawa, N., Muhammad, K., Mehmood, I. and Nam, Y. 2018. A Robust Regression-Based Stock Exchange Forecasting and Determination of Correlation between Stock Markets. Sustainability. 10 (Art. 3702). https://doi.org/10.3390/su10103702
Functional Connectivity Evaluation for Infant EEG Signals based on Artificial Neural Network
Sharif, M., Naeem, U., Islam, S. and Karami, A. 2018. Functional Connectivity Evaluation for Infant EEG Signals based on Artificial Neural Network. Arai, Kohei, Kapoor, Supriya and Bhatia, Rahul (ed.) Intelligent Systems Conference (IntelliSys) 2018. London, UK 06 - 07 Sep 2018 Springer, Cham. https://doi.org/10.1007/978-3-030-01057-7_34
The Application of a Semantic-Based Process Mining Framework on a Learning Process Domain
Okoye, Kingsley, Islam, S., Naeem, U., Sharif, M., Azam, Muhammad Awais and Karami, A. 2018. The Application of a Semantic-Based Process Mining Framework on a Learning Process Domain. Arai, Kohei, Kapoor, Supriya and Bhatia, Rahul (ed.) Intelligent Systems Conference (IntelliSys) 2018. London, UK 06 - 07 Sep 2018 Springer, Cham. https://doi.org/10.1007/978-3-030-01054-6_96
A Framework for Uncertainty-Aware Visual Analytics in Big Data
Karami, A. 2015. A Framework for Uncertainty-Aware Visual Analytics in Big Data. CEUR Workshop Proceedings. 1510, pp. 146-155.
Utilization of multi attribute decision making techniques to integrate automatic and manual ranking of options
Karami, A. and Johansson, Ronnie 2013. Utilization of multi attribute decision making techniques to integrate automatic and manual ranking of options. Journal of Information Science and Engineering. 30 (2), pp. 519-534.
Choosing DBSCAN parameters automatically using differential evolution
Karami, A. and Johansson, Ronnie 2014. Choosing DBSCAN parameters automatically using differential evolution. International Journal of Computer Applications. 91 (7), pp. 1-11. https://doi.org/10.5120/15890-5059
A fuzzy anomaly detection system based on hybrid PSO-Kmeans algorithm in content-centric networks
Karami, A. and Guerrero-Zapata, Manel 2014. A fuzzy anomaly detection system based on hybrid PSO-Kmeans algorithm in content-centric networks. Neurocomputing. 149 (Part C), pp. 1253-1269. https://doi.org/10.1016/j.neucom.2014.08.070
A hybrid multiobjective RBF-PSO method for mitigating DoS attacks in Named Data Networking
Karami, A. and Guerrero-Zapata, Manel 2014. A hybrid multiobjective RBF-PSO method for mitigating DoS attacks in Named Data Networking. Neurocomputing. 151 (3), pp. 1262-1282. https://doi.org/10.1016/j.neucom.2014.11.003
An ANFIS-based cache replacement method for mitigating cache pollution attacks in Named Data Networking
Karami, A. and Guerrero-Zapata, Manel 2015. An ANFIS-based cache replacement method for mitigating cache pollution attacks in Named Data Networking. Computer Networks. 80 (April), pp. 51-65. https://doi.org/10.1016/j.comnet.2015.01.020
ACCPndn: Adaptive Congestion Control Protocol in Named Data Networking by learning capacities using optimized Time-Lagged Feedforward Neural Network
Karami, A. 2015. ACCPndn: Adaptive Congestion Control Protocol in Named Data Networking by learning capacities using optimized Time-Lagged Feedforward Neural Network. Journal of Network and Computer Applications. 56 (Oct.), pp. 1-18. https://doi.org/10.1016/j.jnca.2015.05.017
A Wormhole Attack Detection and Prevention Technique in Wireless Sensor Networks
Siddiqui, A., Karami, A. and Johnson, M. O. 2017. A Wormhole Attack Detection and Prevention Technique in Wireless Sensor Networks. International Journal of Computer Applications. 174 (Art. 4). https://doi.org/10.5120/ijca2017915376