Handling Imbalanced Classes: Feature Based Variance Ranking Techniques for Classification

PhD Thesis


Ebenuwa, S. 2019. Handling Imbalanced Classes: Feature Based Variance Ranking Techniques for Classification. PhD Thesis, University of East London, School of Architecture, Computing and Engineering.
Authors: Ebenuwa, S.
Type: PhD Thesis
Abstract

Obtaining good predictions in the presence of imbalanced classes poses a significant challenge for the data science community. Imbalanced class data describes a situation in which the classes or groups in a dataset contain unequal numbers of instances. In most real-life datasets, one class outnumbers the others and is called the majority class, while the smaller classes are called minority classes. During classification, even when overall accuracy is very high, the number of correctly classified minority instances is usually small compared with the total number of minority instances in the dataset, and more often than not the minority classes are what is being sought. This work is specifically concerned with providing techniques that improve classification performance by eliminating or reducing the negative effects of class imbalance. Real-life datasets have been found to contain different types of error in combination with class imbalance. While these errors are easily corrected, solutions to class imbalance have remained elusive.
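The gap between overall accuracy and minority-class performance can be made concrete with a small sketch. The counts below (990 majority and 10 minority instances) and the always-majority classifier are hypothetical, chosen only for illustration and not taken from the thesis; the helper names `accuracy` and `minority_recall` are likewise illustrative.

```python
# Hypothetical imbalanced dataset: 990 majority (label 0) and 10 minority (label 1) instances.
# These counts are illustrative only; they are not taken from the thesis.
y_true = [0] * 990 + [1] * 10

# A trivial classifier that always predicts the majority class.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of all instances that are classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def minority_recall(y_true, y_pred, minority_label=1):
    """Fraction of true minority instances that are predicted as minority."""
    minority_pairs = [(t, p) for t, p in zip(y_true, y_pred) if t == minority_label]
    hits = sum(p == minority_label for _, p in minority_pairs)
    return hits / len(minority_pairs)


print(f"accuracy        = {accuracy(y_true, y_pred):.2%}")         # 99.00%
print(f"minority recall = {minority_recall(y_true, y_pred):.2%}")  # 0.00%
```

Accuracy alone reports 99% here even though not a single minority instance is found, which is why minority-focused measures such as recall are needed when evaluating imbalanced data.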
Previously, machine learning (ML) techniques have been used to address class imbalance, but notable shortcomings have been identified with this approach. It mostly involves fine-tuning and changing the parameters of the algorithms, a process that is not standardised because of the countless number of algorithms and parameters. In general, the results obtained from these unstandardised ML techniques are very inconsistent and cannot be replicated with similar datasets and algorithms.
We present a novel technique for dealing with imbalanced classes, called variance ranking feature selection, that enables machine learning algorithms to classify more of the minority class instances, thereby reducing the negative effects of class imbalance. Our approach uses an intrinsic property of the dataset: the variance, a measure of dispersion that describes how the data items are spread around their mean within the dataset's vector space. We demonstrate feature selection at different levels of performance threshold, which allows performance and feature significance to be assessed and correlated at different levels of prediction. In the evaluation, we compared our feature selections with some of the best-known feature selection techniques using proximity distance comparison, and verified all the results on different datasets, both binary and multi-class, with varying degrees of class imbalance. In all the experiments, the results showed a significant improvement compared with previous work on class imbalance.
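To illustrate the general idea of ranking features by variance, here is a minimal sketch under stated assumptions; it is not the thesis's implementation. It assumes each feature is min-max normalised, its variance is computed separately within the minority and majority classes, and features are ranked by the absolute gap between those two variances (the intuition being that a feature whose spread differs strongly between classes may help separate them). The ranking criterion, the function names `min_max_normalise`, `variance_rank` and `select_top_k`, and the toy data are all hypothetical.

```python
from statistics import pvariance


def min_max_normalise(column):
    """Scale one feature column to [0, 1] so variances are comparable across features."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant feature: no spread at all
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


def variance_rank(X, y, minority_label=1):
    """Return feature indices ordered by |var(minority) - var(majority)|, largest first.

    ASSUMPTION: this per-class variance-gap criterion is an illustrative choice,
    not necessarily the exact ranking rule used in the thesis.
    """
    scores = []
    for j in range(len(X[0])):
        column = min_max_normalise([row[j] for row in X])
        minority = [v for v, label in zip(column, y) if label == minority_label]
        majority = [v for v, label in zip(column, y) if label != minority_label]
        gap = abs(pvariance(minority) - pvariance(majority))
        scores.append((gap, j))
    # Largest gap first; ties broken by the lower feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores]


def select_top_k(X, ranking, k):
    """Keep only the k highest-ranked feature columns of X."""
    keep = ranking[:k]
    return [[row[j] for j in keep] for row in X]


# Hypothetical toy data: four majority rows (label 0), two minority rows (label 1).
X = [
    [1.0, 5.0, 2.0],
    [1.1, 5.2, 2.1],
    [0.9, 4.8, 1.9],
    [1.0, 5.1, 2.0],
    [1.0, 1.0, 2.0],
    [1.0, 9.0, 2.2],
]
y = [0, 0, 0, 0, 1, 1]

ranking = variance_rank(X, y)
print(ranking)                         # [1, 0, 2]
print(select_top_k(X, ranking, 2)[0])  # [5.0, 1.0]
```

Varying `k` (for example, keeping the top 1, 2 or 3 features and re-training a classifier each time) is one simple way to mimic selecting features at different performance thresholds; how the thesis itself defines those thresholds is not stated in this abstract.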

Year: 2019
Publisher: University of East London
Digital Object Identifier (DOI): doi:10.15123/uel.88183
File Access Level: Anyone
Print publication date: Sep 2019
Deposited: 12 Jun 2020
Permalink: https://repository.uel.ac.uk/item/88183

Files

File: 2019_PhD_Ebenuwa.pdf
License: CC BY-NC-ND 4.0
File access level: Anyone

Related outputs

Variance Ranking for Multi-Classed Imbalanced Datasets: A Case Study of One-Versus-All
Ebenuwa, S., Sharif, S., Al-Nemrat, A., Al-Bayatti, A. H., Alalwan, N., Alzahrani, A. I. and Alfarraj, O. 2019. Variance Ranking for Multi-Classed Imbalanced Datasets: A Case Study of One-Versus-All. Symmetry. 11 (Art. 1504).
Variance Ranking Attributes Selection Techniques for Binary Classification Problem in Imbalance Data
Ebenuwa, S., Sharif, M., Alazab, M. and Al-Nemrat, A. 2019. Variance Ranking Attributes Selection Techniques for Binary Classification Problem in Imbalance Data. IEEE Access. 7, pp. 24649-24666.