Machine Learning Enabled Data Quality (DQ) Assessment Framework for connected vehicles data
Prof Doc Thesis
Wondie, M. 2025. Machine Learning Enabled Data Quality (DQ) Assessment Framework for connected vehicles data. Prof Doc Thesis. University of East London, Architecture, Computing & Engineering. https://doi.org/10.15123/uel.8z326
Authors | Wondie, M. |
---|---|
Type | Prof Doc Thesis |
Abstract | Connected vehicles leverage innovations in sensors, IoT, cloud computing, AI, and 4G/5G to produce real-time vehicle data, enhancing applications in navigation, fleet management, diagnostics, and maintenance, and improving cost-efficiency, revenue, customer satisfaction, and safety. However, maintaining data quality in connected vehicles is challenging. Classical data quality assessment frameworks are inadequate for the complexity of connected vehicles, necessitating improved methods for assessing data quality in this domain. This research integrates machine learning and statistical methods with classical frameworks to enhance data quality assessment. A literature review identifies data quality challenges, existing frameworks, and their strengths and limitations. Implementing a classical framework with real-world connected vehicle data uncovers issues such as missing, delayed, and invalid data but fails to answer some data quality requirements. These gaps lead to the development of three scenarios that leverage machine learning. Scenario I uses logistic regression to detect non-communicating vehicles, addressing delayed and missing data issues. Scenario II forecasts missing mileage using a time series method. Scenario III assesses data accuracy using Light Gradient-Boosting Machine and Random Forest. The implementation of these scenarios provided promising results. Scenario I detects non-communicating vehicles with an F1-score of 0.85. Scenario II forecasts missing mileage with a lower RMSE than state-of-the-art methods. Scenario III detects inaccurate fuel consumption with 97% accuracy and an F1-score of 0.78, outperforming Isolation Forest. In conclusion, implementing a classical data quality assessment framework with real-life vehicle data highlights various data quality issues and reveals certain limitations. Machine learning and statistical methods help to address these limitations.
Therefore, a new framework that integrates classical data quality assessment with machine learning for connected vehicles is proposed. |
Year | 2025 |
Publisher | University of East London |
Digital Object Identifier (DOI) | https://doi.org/10.15123/uel.8z326 |
File | File access level: Anyone |
Publication dates | |
Online | 18 Mar 2025 |
Publication process dates | |
Completed | 20 Feb 2025 |
Deposited | 18 Mar 2025 |
Copyright holder | © 2023 The Author. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) Licence (https://creativecommons.org/licenses/by-nc-nd/4.0). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. |
https://repository.uel.ac.uk/item/8z326