Data Quality Management in Large-Scale Cyber-Physical Systems

PhD Thesis


Alwan, A. 2021. Data Quality Management in Large-Scale Cyber-Physical Systems. PhD Thesis, University of East London, School of Architecture, Computing and Engineering. https://doi.org/10.15123/uel.8990y
Authors: Alwan, A.
Type: PhD Thesis
Abstract

Cyber-Physical Systems (CPSs) are cross-domain, multi-model, advanced information systems that play a significant role in many large-scale infrastructure sectors of smart cities' public services, such as traffic control, smart transportation control, and environmental and noise monitoring systems. Such systems typically involve a substantial number of sensor nodes and other devices that stream and exchange data in real time, and they are usually deployed in broad, uncontrolled environments.
Thus, unexpected measurements may occur due to several internal and external factors, including noise, communication errors, and hardware failures. These can compromise the quality of these systems' data and raise serious concerns related to safety, reliability, performance, and security. In all cases, such unexpected measurements need to be carefully interpreted and managed based on domain knowledge and computational models.
Therefore, in this research, data quality challenges were investigated, and a comprehensive proof-of-concept data quality management system was developed to tackle unaddressed data quality challenges in large-scale CPSs. The system was designed to detect three kinds of problems: sensor nodes' measurement errors, sensor nodes' hardware failures, and mismatches in the spatial and temporal contextual attributes of sensor nodes. The detection of sensor nodes' measurement errors associated with the primary data quality dimensions of accuracy, timeliness, completeness, and consistency in large-scale CPSs was investigated using predictive and anomaly analysis models built on statistical and machine-learning techniques. Time-series clustering techniques were investigated as a feasible means of detecting long-segmental outliers, which indicate sensor nodes' continuous halting and incipient hardware failures. Furthermore, the quality of the spatial and temporal contextual attributes of sensor nodes' observations was investigated using timestamp analysis techniques.
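The pairing of a predictive model with an anomaly analysis stage can be illustrated with a minimal sketch. It assumes a one-step forecaster whose residuals (observed minus predicted) are clustered with DBSCAN, and readings whose residual falls outside every dense cluster are flagged as suspect. The abstract reports using an LSTM predictor with DBSCAN; here a moving median over the previous few readings is swapped in for the LSTM purely to keep the example dependency-free, and the names `window`, `eps` and `min_pts` and their default values are hypothetical, not the thesis's settings.

```python
from statistics import median


def one_step_residuals(series, window=3):
    """Observed minus forecast, where the forecast is the median of the previous
    `window` readings (a simple stand-in for the LSTM predictor)."""
    return [series[i] - median(series[i - window:i]) for i in range(window, len(series))]


def dbscan_1d(points, eps, min_pts):
    """Minimal DBSCAN over scalar points; returns one label per point, -1 for noise."""
    n = len(points)
    labels = [None] * n  # None = not yet visited
    cluster = -1

    def neighbours(i):
        # Includes i itself, as in the standard DBSCAN definition.
        return [j for j in range(n) if abs(points[i] - points[j]) <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbours(j)
            if len(j_seeds) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(j_seeds)
    return labels


def flag_anomalies(series, window=3, eps=0.5, min_pts=3):
    """Indices of readings whose forecast residual DBSCAN labels as noise."""
    residuals = one_step_residuals(series, window)
    labels = dbscan_1d(residuals, eps, min_pts)
    # Residual k corresponds to reading k + window in the original series.
    return [k + window for k, label in enumerate(labels) if label == -1]


readings = [20.0, 20.1, 20.2, 20.1, 20.0, 20.1, 35.0, 20.2, 20.1, 20.0, 20.1, 20.2]
print(flag_anomalies(readings))  # → [6]
```

A median forecaster is used rather than a mean so that a single spike does not drag the next few forecasts off and create a spurious second cluster of large residuals; a trained LSTM would play the same role with far better forecasts on seasonal temperature data.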
The different components of the data quality management system were tested and calibrated using benchmark time series collected from a high-quality temperature sensor network deployed at the University of East London. Furthermore, the effectiveness of the proposed data quality management system was evaluated using a real-world, large-scale environmental monitoring network of more than 200 temperature sensor nodes distributed around London.
The data quality management system achieved a high detection accuracy using the LSTM predictive analysis technique combined with DBSCAN-based anomaly detection. It successfully identified timeliness and completeness errors in sensor nodes' measurements using periodicity analysis combined with a rule engine. It achieved up to 100% accuracy in detecting potentially failed sensor nodes using the characteristic-based time-series clustering technique when applied to time-series windows of two days or longer. Timestamp analysis proved effective for evaluating the quality of the temporal and spatial contextual attributes of sensor nodes' observations, but only within CPS applications in which gateway modules can be used.
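The combination of periodicity analysis and a rule engine for timeliness and completeness can be sketched along the following lines. This is an assumption-laden illustration, not the thesis's implementation: the nominal sampling period is estimated as the median gap between consecutive timestamps, and two hypothetical rules (a gap longer than 1.5 nominal periods, and fewer than 90% of the expected readings in the window) stand in for whatever rule base the thesis actually uses. The function names and thresholds are illustrative only.

```python
from statistics import median


def estimate_period(timestamps):
    """Nominal sampling period: median gap between consecutive distinct timestamps."""
    ts = sorted(set(timestamps))
    if len(ts) < 2:
        raise ValueError("need at least two distinct timestamps")
    return median([b - a for a, b in zip(ts, ts[1:])])


def derive_facts(timestamps, window_seconds, period):
    """Facts about one sensor's observation window that the rules are evaluated against."""
    ts = sorted(set(timestamps))
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return {
        "period": period,
        "max_gap": max(gaps, default=window_seconds),
        "completeness": len(ts) / (window_seconds / period),
    }


# Hypothetical rule base: (description, predicate over the facts). Thresholds are illustrative.
RULES = [
    ("timeliness: a gap exceeds 1.5x the nominal period",
     lambda f: f["max_gap"] > 1.5 * f["period"]),
    ("completeness: fewer than 90% of the expected readings",
     lambda f: f["completeness"] < 0.9),
]


def evaluate(timestamps, window_seconds, period=None):
    """Return the description of every rule that fires for one sensor's window."""
    if period is None:
        period = estimate_period(timestamps)
    facts = derive_facts(timestamps, window_seconds, period)
    return [desc for desc, fires in RULES if fires(facts)]


on_time = list(range(0, 600, 60))         # one reading per minute for 10 minutes
gappy = [0, 60, 120, 180, 420, 480, 540]  # silent between 180 s and 420 s
print(evaluate(on_time, 600))  # → []
print(evaluate(gappy, 600))
```

For the gappy sensor both rules fire: the largest gap is 240 s against a 60 s nominal period, and only 7 of the 10 expected readings arrive. Keeping the rules as data separate from the fact derivation is one way to let domain experts adjust thresholds per deployment without touching the periodicity analysis.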

Year: 2021
Publisher: University of East London
Digital Object Identifier (DOI): https://doi.org/10.15123/uel.8990y
Publication dates
Online: 19 Jul 2021
Publication process dates
Submitted: May 2021
Deposited: 20 Jul 2021
Copyright holder: © The Author
Permalink: https://repository.uel.ac.uk/item/8990y

Download files


File: 2021_PhD_Alwan.pdf
License: CC BY-NC-ND 4.0
File access level: Anyone



Related outputs

HADES: a Hybrid Anomaly Detection System for Large-Scale Cyber-Physical Systems
Alwan, A., Baravalle, A., Ciupala, A. and Falcarin, P. 2020. HADES: a Hybrid Anomaly Detection System for Large-Scale Cyber-Physical Systems. FMEC2020: The Fifth International Conference on Fog and Mobile Edge Computing. Paris, FR 30 Jun - 03 Jul 2020 IEEE. https://doi.org/10.1109/FMEC49853.2020.9144751