Stock market prediction using machine learning classifiers and social media, news

Article

Khan, W., Ghazanfar, M., Azam, M. A., Karami, A., Alyoubi, K. H. and Alfakeeh, A. S. 2020. Stock market prediction using machine learning classifiers and social media, news. Journal of Ambient Intelligence and Humanized Computing. 13, pp. 3433-3456. https://doi.org/10.1007/s12652-020-01839-w

Publication dates
Authors	Khan, W., Ghazanfar, M., Azam, M. A., Karami, A., Alyoubi, K. H. and Alfakeeh, A. S.
Abstract	Accurate stock market prediction is of great interest to investors; however, stock markets are driven by volatile factors such as microblogs and news, which make it hard to predict stock market indices based merely on historical data. The enormous volatility of stock markets emphasizes the need to effectively assess the role of external factors in stock prediction. Stock markets can be predicted using machine learning algorithms on information contained in social media and financial news, as this data can change investors’ behavior. In this paper, we apply machine learning algorithms to social media and financial news data to measure the impact of this data on stock market prediction accuracy over ten subsequent days. To improve the performance and quality of predictions, feature selection and spam tweet reduction are performed on the data sets. Moreover, we perform experiments to identify stock markets that are difficult to predict and those that are more influenced by social media and financial news. We compare the results of different algorithms to find a consistent classifier. Finally, to achieve maximum prediction accuracy, deep learning is used and some classifiers are combined into ensembles. Our experimental results show that the highest prediction accuracies of 80.53% and 75.16% are achieved using social media and financial news, respectively. We also show that the New York and Red Hat stock markets are hard to predict, that New York and IBM stocks are more influenced by social media, and that London and Microsoft stocks are more influenced by financial news. The random forest classifier is found to be consistent, and the highest accuracy of 83.22% is achieved by its ensemble.
Keywords	Deep learning; Feature selection; Hybrid algorithm; Natural language processing; Predictive modeling; Sentiment analysis; Stock market prediction
Journal	Journal of Ambient Intelligence and Humanized Computing
Journal citation	13, pp. 3433-3456
ISSN	1868-5145
Year	2020
Publisher	Springer
Accepted author manuscript	Paper-AIHC-Stock Prediction using Social Media, News-Revised.pdf (License: Springer Nature Terms of Use for accepted manuscripts of subscription articles, books and chapters; File access level: Anyone)
Digital Object Identifier (DOI)	https://doi.org/10.1007/s12652-020-01839-w
Online	14 Mar 2020
Print	Jul 2022
Publication process dates
Accepted	25 Feb 2020
Deposited	09 Oct 2023
Copyright holder	© 2020, The Authors
Additional information	This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: http://dx.doi.org/10.1007/s12652-020-01839-w

Permalink	https://repository.uel.ac.uk/item/8w3qx

