Improving data quality assessment of connected vehicles data with machine learning and statistical methods
Conference paper
Wall, J., Wondie, M. and Li, Y. 2022. Improving data quality assessment of connected vehicles data with machine learning and statistical methods. Pan African Conference on Artificial Intelligence 2022. 04-05 Oct 2022
Authors | Wall, J., Wondie, M. and Li, Y. |
---|---|
Type | Conference paper |
Abstract | The connected vehicle is a fast-growing phenomenon that enables enterprises to generate new revenue streams, reduce costs and increase safety by using the data collected. High-quality data is a fundamental prerequisite for extracting this value, so developing data quality assessment methods is important. Classical data quality assessment methods are not adequate for connected vehicle data because of the complex nature of these systems, such as their spatio-temporal aspects. In this research, machine learning and statistical methods for data quality assessment are investigated in two scenarios. The first is to classify whether a vehicle that is not communicating (not sending data) is experiencing a real issue or is simply parked properly with the power off. If issues are detected earlier, measures can be taken to avoid delays or data loss; these correspond to the timeliness and completeness data quality dimensions. Using real-life data, a new feature is constructed using DBSCAN, and a logistic regression model is trained to distinguish real issues from false alarms, achieving an F-score of 0.76. The second scenario is to detect inaccurate fuel consumption values (the accuracy data quality dimension). Using a public dataset, a machine learning model is first trained to predict fuel consumption, and the difference between the actual and predicted values is calculated. A control chart is applied to this difference, and values that fall outside the control limits are marked as inaccurate. This method correctly detects 85% of the inaccurate values in the test dataset. |
Keywords | Connected Vehicles; Data Quality; Machine Learning; Statistical Methods; Data Quality Dimensions; DBSCAN; Logistic Regression; Control Chart |
Year | 2022 |
Conference | Pan African Conference on Artificial Intelligence 2022 |
Accepted author manuscript | File access level: Anyone |
Publication process dates | |
Accepted | 20 Sep 2022 |
Completed | 05 Oct 2022 |
Deposited | 07 Mar 2023 |
Copyright holder | © 2022, The Authors |
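The first scenario in the abstract combines a DBSCAN-derived feature with logistic regression, but the record does not say how that feature is defined. The sketch below assumes one plausible reading: cluster a vehicle's historical ignition-off positions with DBSCAN, then flag whether the last position before the vehicle went silent lies inside a habitual parking cluster. The second feature (`moving`, whether the vehicle was moving when it went silent), the toy data, `eps`, `min_pts` and the gradient-descent settings are all hypothetical, and the pure-Python DBSCAN and logistic regression stand in for whatever implementation the authors used.

```python
"""Sketch: DBSCAN-derived feature + logistic regression for silent vehicles.
All data, feature definitions and hyperparameters are hypothetical."""
import math


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns a cluster id per point, -1 for noise."""
    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None = not yet visited
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:  # j is a core point: keep expanding
                queue.extend(more)
        cluster += 1
    return labels


def at_parking_spot(pos, parking, labels, eps):
    """Hypothetical feature: 1 if pos is within eps of a clustered (non-noise) parking point."""
    return int(any(lab != -1 and math.dist(pos, p) <= eps
                   for p, lab in zip(parking, labels)))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; w[0] is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wk * xk for wk, xk in zip(w[1:], xi))) - yi
            grad[0] += err
            for k, xk in enumerate(xi, start=1):
                grad[k] += err * xk
        w = [wk - lr * g / len(X) for wk, g in zip(w, grad)]
    return w


def predict(w, xi):
    """Class 1 ('real issue') when the linear score is >= 0, i.e. probability >= 0.5."""
    return int(w[0] + sum(wk * xk for wk, xk in zip(w[1:], xi)) >= 0.0)


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for the positive class 'real issue'."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Hypothetical ignition-off positions (km on a local planar grid) for one vehicle.
parking = [(0, 0), (0.05, 0.02), (0.02, 0.06), (0.04, 0.04),
           (5, 5), (5.03, 4.98), (4.97, 5.04), (2.5, 9)]
labels = dbscan(parking, eps=0.1, min_pts=3)

# Hypothetical silent-vehicle events: (last position, was moving, 1 = real issue).
events = [((0.01, 0.01), 0, 0), ((0.03, 0.05), 0, 0), ((5.01, 5.0), 0, 0),
          ((4.99, 5.02), 0, 0), ((0.02, 0.02), 0, 1), ((3.0, 1.0), 1, 1),
          ((7.0, 2.0), 1, 1), ((1.0, 4.0), 1, 1), ((2.5, 9.0), 0, 1),
          ((8.0, 8.0), 0, 1), ((6.0, 0.5), 0, 0)]
X = [[at_parking_spot(pos, parking, labels, 0.1), moving] for pos, moving, _ in events]
y = [label for _, _, label in events]
w = train_logreg(X, y)
print(labels, f1_score(y, [predict(w, xi) for xi in X]))
```

The isolated point at (2.5, 9) is labelled noise, so an event at that exact position is not treated as a habitual parking spot; this is the kind of distinction a clustered feature adds over raw position history. The F1 value here is computed in-sample on eleven toy events purely to show the metric; it does not reproduce the 0.76 reported for the authors' real data.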
https://repository.uel.ac.uk/item/8vqwq
Accepted author manuscript: panafricon-ai-2022-Mulluken submit.pdf
License: All rights reserved
File access level: Anyone
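The second scenario in the abstract flags inaccurate fuel consumption by charting the gap between actual and predicted values. The record names neither the prediction model nor the chart type, so this sketch assumes a one-variable least-squares line (distance driven to litres) as a stand-in for the trained model, and a Shewhart-style chart whose limits are the mean plus or minus three standard deviations of residuals on reference data. The data, the distance feature and `k = 3.0` are hypothetical.

```python
"""Sketch: flag inaccurate fuel-consumption values via prediction residuals
and a control chart. Model choice, data and limits are hypothetical."""
from statistics import mean, stdev


def fit_line(xs, ys):
    """Ordinary least squares y = a + b*x; a stand-in for the paper's ML model."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def predict(model, x):
    intercept, slope = model
    return intercept + slope * x


def control_limits(residuals, k=3.0):
    """Lower/upper limits at mean +/- k standard deviations of reference residuals."""
    centre, sigma = mean(residuals), stdev(residuals)
    return centre - k * sigma, centre + k * sigma


def flag_inaccurate(model, limits, xs, ys):
    """Indices whose residual (actual - predicted) lies outside the control limits."""
    lower, upper = limits
    return [i for i, (x, y) in enumerate(zip(xs, ys))
            if not lower <= y - predict(model, x) <= upper]


# Hypothetical reference trips: distance driven (km) and logged fuel use (litres).
train_km = [10, 20, 30, 40, 50, 60, 70, 80]
train_fuel = [0.8, 1.5, 2.2, 2.8, 3.6, 4.2, 5.0, 5.6]
model = fit_line(train_km, train_fuel)
limits = control_limits([y - predict(model, x) for x, y in zip(train_km, train_fuel)])

# New readings to check; the second one is implausibly high for 45 km.
print(flag_inaccurate(model, limits, [25, 45, 65], [1.85, 4.9, 4.6]))  # [1]
```

Charting the residual rather than the raw reading means the limits do not have to widen for long trips: the model absorbs the expected dependence on distance, and the chart only has to judge whether the remaining error is unusually large.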