Variance Ranking Attributes Selection Techniques for Binary Classification Problem in Imbalance Data
Ebenuwa, S., Sharif, M., Alazab, Mamoun and Al-Nemrat, A. 2019. Variance Ranking Attributes Selection Techniques for Binary Classification Problem in Imbalance Data. IEEE Access. 7, pp. 24649-24666. https://doi.org/10.1109/ACCESS.2019.2899578
|Ebenuwa, S., Sharif, M., Alazab, Mamoun and Al-Nemrat, A.
Data are being generated and used to support all aspects of healthcare provision, from policy formation to the delivery of primary care services. Particularly, with the change of emphasis from curative to preventive medicine, the importance of data-based research such as data mining and machine learning has emphasized the issues of class distributions in datasets. In typical predictive modeling, the inability to effectively address a class imbalance in a real-life dataset is an important shortcoming of the existing machine learning algorithms. Most algorithms assume a balanced class in their design, resulting in poor performance in predicting the minority target class. Ironically, the minority target class is usually the focus in predicting processes. The misclassification of the minority target class has resulted in serious consequences in detecting chronic diseases and detecting fraud and intrusion where positive cases are erroneously predicted as not positive. This paper presents a new attribute selection technique called variance ranking for handling imbalance class problems in a dataset. The results obtained were compared to two well-known attribute selection techniques: the Pearson correlation and information gain technique. This paper uses a novel similarity measurement technique ranked order similarity-ROS to evaluate the variance ranking attribute selection compared to the Pearson correlations and information gain. Further validation was carried out using three binary classifications: logistic regression, support vector machine, and decision tree. The proposed variance ranking and ranked order similarity techniques showed better results than the benchmarks. The ROS technique provided an excellent means of grading and measuring the similarities where other similarity measurement techniques were inadequate or not applicable.
|7, pp. 24649-24666
File Access Level
|Digital Object Identifier (DOI)
|Web address (URL)
|25 Feb 2019
|Publication process dates
|28 Mar 2019
|21 Jan 2019
|21 Jan 2019
|© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
|All rights reserved
7views this month
0downloads this month