Anomaly detection through spatio-temporal data mining, with application to near real-time outlying sensor identification
Galarus, Douglas Edward
There is a need for robust solutions to the challenges of near real-time spatio-temporal outlier and anomaly detection. In this dissertation, we define and demonstrate quality measures for evaluation and comparison of overlapping, real-time, spatio-temporal data providers and for assessment and optimization of data acquisition, system operation and data redistribution. Our measures are tested on real-world data and applications, and our results show the need and potential to develop our own mechanisms for outlier and anomaly detection. We then develop a representative, near real-time solution for the identification of outlying sensors that far outperforms state-of-the-art methods in terms of accuracy and is computationally efficient. When applied to a real-world meteorological data set, our method identifies numerous problematic sites that have not otherwise been flagged as bad, sites for which metadata is incorrect, and observations that have been mislabeled by provider quality control processes. We also demonstrate that our method outperforms enhanced versions of state-of-the-art methods for assessment of accuracy, using comparable or less computation time. There are many quality-related problems with real data sets and, in the absence of an approach like ours, these problems may have largely gone unidentified. Our approach is novel in the simple but effective way that it accounts for spatial and temporal variation, and in that it addresses more than just accuracy. Collectively, these contributions form an overarching data-mining framework and example that can be used and extended for data-mining method development, model building and evaluation of spatio-temporal outlier and anomaly detection processes.