Machine Learning Enabled Data Quality (DQ) Assessment Framework for connected vehicles data

Prof Doc Thesis


Wondie, M. 2025. Machine Learning Enabled Data Quality (DQ) Assessment Framework for connected vehicles data. Prof Doc Thesis University of East London Architecture, Computing & Engineering https://doi.org/10.15123/uel.8z326
Authors: Wondie, M.
Type: Prof Doc Thesis
Abstract

Connected vehicles leverage innovations in sensors, IoT, cloud computing, AI, and 4G/5G to produce real-time vehicle data. This data enhances applications in navigation, fleet management, diagnostics, and maintenance, improving cost-efficiency, revenue, customer satisfaction, and safety.

However, maintaining data quality in connected vehicles is challenging. Classical data quality assessment frameworks are inadequate for the complexity of connected vehicles, necessitating improved methods for assessing data quality in this domain.

This research integrates machine learning and statistical methods with classical frameworks to enhance data quality assessment. A literature review identifies data quality challenges, existing frameworks, and their strengths and limitations. Implementing a classical framework with real-world connected vehicle data uncovers issues such as missing, delayed, and invalid data but fails to answer some data quality requirements. These gaps lead to the development of three scenarios that leverage machine learning. Scenario I uses logistic regression to detect non-communicating vehicles, addressing delayed and missing data issues. Scenario II forecasts missing mileage using a time series method. Scenario III assesses data accuracy using Light Gradient-Boosting Machine and Random Forest.
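To make the three issue types named above concrete, the following is a minimal Python sketch of rule-based checks in the spirit of a classical framework. It is an illustration only, not the thesis's implementation: the field names (`vehicle_id`, `event_time`, `received_time`, `mileage_km`), the one-hour delay threshold, and the sample records are all assumptions.

```python
from datetime import datetime, timedelta

# Assumed field names and threshold; none of these come from the thesis.
REQUIRED_FIELDS = ("vehicle_id", "event_time", "received_time", "mileage_km")
MAX_DELAY = timedelta(hours=1)


def check_record(record):
    """Return the rule-based issue labels that apply to one record."""
    issues = []
    # Missing: any required field is absent or None.
    if any(record.get(field) is None for field in REQUIRED_FIELDS):
        issues.append("missing")
    # Delayed: the record arrived too long after it was generated.
    event, received = record.get("event_time"), record.get("received_time")
    if event is not None and received is not None and received - event > MAX_DELAY:
        issues.append("delayed")
    # Invalid: an odometer reading cannot be negative.
    mileage = record.get("mileage_km")
    if mileage is not None and mileage < 0:
        issues.append("invalid")
    return issues


# Made-up sample records, one per case.
t0 = datetime(2022, 1, 1, 12, 0)
records = [
    {"vehicle_id": "V1", "event_time": t0, "received_time": t0 + timedelta(minutes=5), "mileage_km": 15200.0},
    {"vehicle_id": "V2", "event_time": t0, "received_time": t0 + timedelta(hours=3), "mileage_km": 8300.0},
    {"vehicle_id": "V3", "event_time": t0, "received_time": t0 + timedelta(minutes=1), "mileage_km": None},
    {"vehicle_id": "V4", "event_time": t0, "received_time": t0 + timedelta(minutes=2), "mileage_km": -4.0},
]
for record in records:
    print(record["vehicle_id"], check_record(record))
```

Rules of this kind flag individual records against fixed conditions. The requirements they leave unanswered are the gaps that the three machine learning scenarios target.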

The implementation of these scenarios provided promising results. Scenario I detects non-communicating vehicles with an F1-score of 0.85. Scenario II forecasts missing mileage with a lower RMSE than state-of-the-art methods. Scenario III detects inaccurate fuel consumption with 97% accuracy and an F1-score of 0.78, outperforming Isolation Forest.
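For readers unfamiliar with the metrics reported above, the following minimal Python sketch computes accuracy, F1-score, and RMSE from their standard definitions. The function names and the toy labels and mileage values are assumptions chosen for illustration; they are not data or code from the thesis.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rmse(actual, predicted):
    """Root mean squared error between actual and forecast values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Toy classification labels (1 = flagged record, 0 = normal record).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))

# Toy mileage values and forecasts.
print(rmse([100.0, 110.0, 120.0], [102.0, 108.0, 120.0]))
```

Accuracy and F1-score can differ considerably when the positive class is rare, because accuracy also rewards correct predictions on the majority class. This is one general reason the two figures are often reported together.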

In conclusion, implementing a classical data quality assessment framework with real-life vehicle data highlights various data quality issues and reveals certain limitations. Machine learning and statistical methods help to address these limitations. Therefore, a new framework that integrates classical data quality assessment with machine learning for connected vehicles data is proposed.

Year: 2025
Publisher: University of East London
Digital Object Identifier (DOI): https://doi.org/10.15123/uel.8z326
Publication dates
Online: 18 Mar 2025
Publication process dates
Completed: 20 Feb 2025
Deposited: 18 Mar 2025
Copyright holder: © 2023 The Author. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) Licence (https://creativecommons.org/licenses/by-nc-nd/4.0). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms.
Permalink: https://repository.uel.ac.uk/item/8z326

Download files


File
2023_D.DataSc_Wondie.pdf
License: CC BY-NC-ND 4.0
File access level: Anyone


Related outputs

Improving data quality assessment of connected vehicles data with machine learning and statistical methods
Wall, J., Wondie, M. and Li, Y. 2022. Improving data quality assessment of connected vehicles data with machine learning and statistical methods. Pan African Conference on Artificial Intelligence 2022. 04 - 05 Oct 2022