Date of Award

2025

Type

Thesis

Major

Computer Science - Applied Computing Track

Degree Type

Master of Science in Applied Computer Science

Department

TSYS School of Computer Science

First Advisor

Dr. Mohamed Riduan Abid

Second Advisor

Dr. Yesem Kurt Perker

Third Advisor

Dr. Rania Hodhod

Abstract

Building energy load fault detection is a critical challenge in energy usage analysis. It helps uncover energy waste, machinery or appliance degradation and inefficiency, and failures or faults in buildings’ HVAC (heating, ventilation, and air conditioning) systems. Early identification of machinery failure and of energy waste caused by neglected operational maintenance is indispensable for achieving energy efficiency in large sites such as campus buildings. It is also crucial for saving patrol effort and minimizing the response time needed to restore building appliances or systems to their optimal state.

Advances in state-of-the-art AI/ML data-driven algorithms and techniques have made it possible to build accurate, efficient, and scalable fault detection systems with research-backed results. This study leverages ML techniques to present a framework for anomaly detection in building energy consumption, explicitly designed to operate effectively in unlabeled environments where ground-truth fault data is unavailable. Prominent approaches for fault detection include XGBoost (Extreme Gradient Boosting) forecasting with anomalies detected from forecast error, unsupervised clustering techniques, neural network algorithms based on forecasting (such as long short-term memory) or on reconstruction error (using transformers and spectral-residual-based convolutional neural networks), and hybrid solutions combining supervised and unsupervised learning methods.

This research studies and compares fault detection solutions based on statistical and unsupervised methods that do not require fault labels for training or deployment in a production environment. We examine the performance of regression-integrated fault detection, probabilistic regression, and matrix profile threshold approaches to determine a reliable, scalable, accurate, and efficient solution for building energy fault detection systems. We benchmark these algorithms on the publicly available Large-scale Energy Anomaly Detection (LEAD) dataset. The study also emphasizes the importance of optimizing thresholding strategies, such as choosing between global and building-specific thresholds.

Our energy load prediction experiments showed that a building-wise trained XGBoost model with lag features ranked as the best prediction model by R². This demonstrates that ensemble machine learning offers the strongest accuracy. However, in keeping with the “no one size fits all” principle, no single model generalizes effectively across the entire dataset: the building-wise modeling approach consistently outperformed the global model. Our study found that regression-integrated fault detection with statistical methods such as z-score (building-wise threshold) and IQR (global threshold) provided superior performance, with F1-scores of 0.52 and 0.46, respectively. These methods outperformed the probabilistic regression and matrix profile threshold methods, both of which yielded F1-scores below 0.1 when evaluated against the complete LEAD dataset. The matrix profile approach struggled to pinpoint anomalies at the hourly level, which resulted in numerous false positives. Similarly, probabilistic regression requires further optimization to reduce false positives.
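
The two statistical thresholding schemes named above can be illustrated with a minimal sketch. It assumes that residuals (actual minus predicted load) have already been computed per building by some regression model; the function names, the z-score cutoff of 3, and the IQR multiplier of 1.5 are illustrative assumptions, not values or code reported in the thesis.

```python
from statistics import mean, pstdev, quantiles


def zscore_flags_per_building(residuals_by_building, z_cutoff=3.0):
    """Building-wise threshold: z-scores use each building's own mean and std.

    z_cutoff=3.0 is an assumed, conventional default.
    """
    flags = {}
    for building, residuals in residuals_by_building.items():
        mu = mean(residuals)
        sigma = pstdev(residuals)
        if sigma == 0:
            # Constant residuals: nothing can be flagged as an outlier.
            flags[building] = [False] * len(residuals)
            continue
        flags[building] = [abs(r - mu) / sigma > z_cutoff for r in residuals]
    return flags


def iqr_flags_global(residuals_by_building, k=1.5):
    """Global threshold: one pair of IQR fences from residuals pooled over all buildings.

    k=1.5 is the usual Tukey multiplier, assumed here for illustration.
    """
    pooled = [r for rs in residuals_by_building.values() for r in rs]
    q1, _, q3 = quantiles(pooled, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {
        building: [r < low or r > high for r in rs]
        for building, rs in residuals_by_building.items()
    }


# Hypothetical residuals for two buildings; building "A" ends with one large error.
example = {
    "A": [0.1, -0.2, 0.0, 0.2, -0.1] * 4 + [10.0],
    "B": [0.0, 0.1, -0.1, 0.2, -0.2] * 4,
}
z_flags = zscore_flags_per_building(example)
iqr_flags = iqr_flags_global(example)
```

In this sketch the building-wise variant adapts to each building's own residual scale, while the global variant applies the same fences everywhere; which of the two works better on real data is an empirical question of the kind the study examines.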