Fault detection for air conditioning system using machine learning

ABSTRACT


INTRODUCTION
Buildings account for about 40%-41% of total energy consumption, which is more than the energy used in the transportation sector and the industrial sector [1]. The heating, ventilation, and air-conditioning (HVAC) system is one of the leading energy consumers in a building and uses up to 50% of the total building energy consumption [1][2]. In Malaysia, office buildings use the most energy for air conditioning compared to other buildings such as hotels or shopping complexes [3][4]. In practice, the coefficient of performance (COP) is used as an index to measure the performance of the air-conditioning system [4]. The COP is the ratio of the rate of net heat removal to the rate of total energy input. A higher COP equates to lower operating costs, which means that the system uses less energy. Faults in the operation of the air-conditioning system, such as a faulty cooling tower fan, compressor failure, or a stuck damper, could lead to a lower COP and hence to wasted energy. It has been proposed that about 20% to 30% of energy can be saved by fixing HVAC faults. Therefore, fault detection and diagnostics (FDD) techniques can be used to observe the operation of air-conditioning systems and, at the same time, to detect abnormalities or faults. Extensive research on FDD for HVAC has been done in past years [5]. There are three types of FDD techniques for building systems: model-based, rule-based, and data-driven [6]. The model-based method uses first principles and simplified lumped-parameter models to model the HVAC system mathematically. This method requires detailed physical knowledge of the system. Its drawbacks are that it is complex to design and that the developed system and the fault modeling are limited to that specific system only [7][8].
Meanwhile, in the rule-based method, expert knowledge is used to develop rules describing the system behaviour. Hence, this method can restrict the portrayal of the system performance to certain faults only. Therefore, rule-based and data-driven methods have been combined to detect and diagnose faults in HVAC systems, and the results show that the combined method has a much greater ability to identify and diagnose faults [9][10]. Lastly, the most popular method in previous research is the data-driven method [6]. This technique requires neither physical nor expert knowledge to model the system. It uses historical data to train models, thus reducing the modeling complexity. A data-driven method has been successfully implemented to detect and diagnose faults in the air handling unit (AHU) [11][12][13].
Although there is a large body of previous research on FDD in HVAC, especially for chiller and AHU systems, no analysis to date considers faults across the entire system [14]. A data-driven method using principal component analysis (PCA) has been successfully implemented to detect and diagnose faults in a chiller system [15]. Model-based and non-model-based diagnostic algorithms for the AHU have been compared using a Bayesian network diagnostic model [16]. Since recent data-driven methods require long computation times, [13] combined PCA and the support vector machine (SVM) to reduce the learning time for HVAC FDD, and the proposed method was successfully tested on a commercial AHU system. A Wavelet-PCA method has been introduced to detect faults in the AHU by eliminating the effect of changing weather conditions [11].
Several data-driven machine learning methods, such as the artificial neural network (ANN) and the support vector machine (SVM), are widely used in FDD. The SVM is very efficient and widely used as a classifier [17]. In WEKA, the SVM classifier is named SMO after Sequential Minimal Optimization, a fast and straightforward method to train an SVM [18]. Meanwhile, the multilayer perceptron (MLP) is a supervised feed-forward ANN classifier trained with back-propagation, and it is the most frequently used classifier in pattern recognition [17].
Deep learning has been proposed for fault detection and classification in the Tennessee Eastman (TE) process. In that work, 20 faults were introduced in the TE process and the performance of six classifiers was compared; the results show that the deep learning method outperforms the other five methods [19]. An FDD method using PCA has been proposed to identify abnormalities from normal operation and to isolate variables related to faults in a chiller; the method successfully identified four faults in the chiller [15]. The performances of decision tree, MLP, Naïve Bayes, SMO, and instance-based K-nearest neighbour (KNN) classifiers in detecting breast cancer have been compared, and the results show that SMO has the highest accuracy as a single classifier [17]. An MLP has been successfully implemented to detect high impedance faults in distribution networks [20]. The performances of Naïve Bayes, Random Forest, Logistic Regression, MLP, and KNN in predicting breast cancer have been compared using WEKA; the result shows that KNN is the most accurate classifier, followed by MLP as the second most accurate [21]. A comparison of several algorithms for detecting breast cancer shows that SVM has the highest accuracy among them [18,22].
This paper aims to investigate the impact of different faults on the COP and to analyse the performance of machine learning algorithms in detecting faults across a centralised chilled water air-conditioning system. The performance of three classifiers was investigated on six condition classes (one normal class and five fault classes). The research methodology used in this paper is explained in Section 2, which covers the data collection, data classification, and data pre-processing procedures. The results are discussed and analysed in Section 3. Finally, Section 4 concludes the overall findings of this paper.

RESEARCH METHODOLOGY
This section describes the structure of the system and the research methodology involved in this work. The processes of data collection, data classification, and data pre-processing are explained in this section.

Data collection
A lab-scale chilled water system, as described in [23][24][25], is used in this paper. It is a prototype of a chiller system that consists of a cooling tower, a chiller, an AHU, and two test rooms. The chiller system has a chilled water tank to supply chilled water to the cooling coil of the AHU. The cooling tower is designed as a [...]. Four types of sensors were used to collect data from the prototype: temperature sensors, air flow rate sensors, water flow rate sensors, and a current sensor. A total of 14 sensors were installed in the system, and their distribution is shown in Figure 1.

Data classification
Five types of faults were generated in this paper, and the collected data were divided into six condition classes. Class 1 was the normal condition, in which every element works properly. Class 2 was a faulty cooling tower fan, Class 3 compressor failure, Class 4 a stuck supply air damper, Class 5 clogging of the supplied chilled water, and Class 6 leakage of the supply air ducting. The details of all classes and the locations of the faults are tabulated in Table 1. The fault locations are spread across the whole system. All tested faults were shortlisted from previous studies and from surveys conducted among air-conditioning system contractors in Johor Bahru. The faults consist of abrupt and soft faults. An abrupt fault is easy to identify, whereas a soft fault is challenging to detect unless the degradation of performance becomes noticeable in terms of thermal comfort, equipment failure, or excessive power consumption. Among the faults listed in Table 1, three were soft (degradation) faults, whereas two were abrupt faults.
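The six condition classes described above can be encoded as integer labels for the classifiers. The following is a minimal sketch; the identifier names (`Condition`, `is_fault`, `FAULT_CLASSES`) are illustrative assumptions and do not come from the paper.

```python
from enum import IntEnum


class Condition(IntEnum):
    """Condition classes as numbered in the text (names are illustrative)."""
    NORMAL = 1
    COOLING_TOWER_FAN_FAULTY = 2
    COMPRESSOR_FAILURE = 3
    SUPPLY_AIR_DAMPER_STUCK = 4
    CHILLED_WATER_CLOGGING = 5
    SUPPLY_AIR_DUCT_LEAKAGE = 6


def is_fault(label: int) -> bool:
    """Return True for any class other than the normal condition."""
    return Condition(label) is not Condition.NORMAL


# The five fault classes, i.e. every class except Class 1.
FAULT_CLASSES = [c for c in Condition if is_fault(c)]
```

Keeping the labels in one place makes it harder for the class numbering used in the confusion matrices to drift from the numbering used when the data are labelled.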

Data pre-processing
The outputs of all 14 sensors installed in the lab-scale system were used as the inputs to the machine learning models. First, all data were normalised using the min-max feature scaling method, so that features with larger value ranges do not have a disproportionate influence on the training result. Data normalisation equalizes the data range as well as the variability of the data. The normalised data were then segmented into mean values over every 1 min interval. A total of 75180 data points for each class were combined into a dataset with a dimension of 14.
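The two pre-processing steps, min-max scaling followed by averaging over 1 min intervals, can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the sample readings are hypothetical, the sensor is assumed to be sampled every 20 s (the paper does not state the sampling rate), and the function names are invented for this example.

```python
from statistics import fmean


def min_max_scale(column):
    """Rescale one sensor channel to [0, 1] using its own minimum and maximum."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant channel: avoid division by zero (assumed convention).
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def segment_means(values, samples_per_window):
    """Average consecutive, non-overlapping windows; a trailing partial window is dropped."""
    n_windows = len(values) // samples_per_window
    return [
        fmean(values[i * samples_per_window:(i + 1) * samples_per_window])
        for i in range(n_windows)
    ]


# Hypothetical temperature readings, assumed to be sampled every 20 s,
# so three samples make up one 1 min window.
raw = [20.0, 22.0, 24.0, 26.0, 28.0, 30.0]
scaled = min_max_scale(raw)
minute_means = segment_means(scaled, samples_per_window=3)
```

Each of the 14 sensor channels would be scaled independently in this way, so that a channel measured in amperes and a channel measured in degrees Celsius end up on the same [0, 1] range.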

Simulation setup
The deep learning, support vector machine (SVM), and multi-layer perceptron (MLP) models were built using the WEKA toolkit [26]. For the deep learning model, the optimization algorithm was stochastic gradient descent (SGD), and the activation function was sigmoid for the hidden layers and softmax for the output layer. For the SVM, the kernel function was set to PolyKernel, which is reported to be the best for HVAC FDD [13]. Lastly, for the MLP, the activation function was sigmoid, the weight was set to 0.3, and the training time was 500 epochs. The number of hidden nodes used in this paper was ten, as formulated in (1). Table 2 shows the parameter settings for the simulation.
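Equation (1) is not shown in this text. One rule that yields ten hidden nodes for this dataset is the default wildcard `a` of WEKA's MultilayerPerceptron, which sets the hidden layer size to (attributes + classes) / 2. That this is the rule behind (1) is an assumption; the sketch below only checks the arithmetic, and the function name is illustrative.

```python
def hidden_nodes(n_attributes: int, n_classes: int) -> int:
    """Hidden layer size under WEKA's 'a' wildcard: (attributes + classes) / 2."""
    return (n_attributes + n_classes) // 2


N_SENSORS = 14  # input attributes, one per installed sensor
N_CLASSES = 6   # condition classes (one normal, five faults)

n_hidden = hidden_nodes(N_SENSORS, N_CLASSES)
```

With 14 inputs and 6 classes the rule gives (14 + 6) / 2 = 10, which matches the number of hidden nodes stated above.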

RESULTS AND ANALYSIS
The first part of this section presents the energy consumption and COP of the system, followed by the performance analysis of the three classifier models. A total of 3760 instances were used to train the classifiers, 1610 instances to evaluate the models, and 1344 instances to validate them. The accuracy and precision of each model are explained and analysed in detail. Accuracy is the proportion of instances that are classified correctly; for a single class, it is the number of relevant instances that have been retrieved over the total number of relevant instances. Precision is the number of relevant instances among the retrieved instances.
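In confusion-matrix terms, these measures can be written as follows. The notation is an assumption introduced here for illustration: $C_{ij}$ is the number of instances of actual class $i$ that are classified as class $j$, $K = 6$ is the number of classes, and $N$ is the total number of instances.

```latex
\begin{align*}
\text{Accuracy} &= \frac{\sum_{k=1}^{K} C_{kk}}{N},
  \qquad N = \sum_{i=1}^{K}\sum_{j=1}^{K} C_{ij} \\
\text{Per-class accuracy (recall)}_k &= \frac{C_{kk}}{\sum_{j=1}^{K} C_{kj}} \\
\text{Precision}_k &= \frac{C_{kk}}{\sum_{i=1}^{K} C_{ik}}
\end{align*}
```

The per-class accuracy divides by a row total (all instances that truly belong to class $k$), whereas precision divides by a column total (all instances assigned to class $k$), so a class can score high on one and lower on the other.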

Energy consumption and COP
Previous researchers have identified that faults in the air-conditioning system could lead to energy wastage. During the experiments, the energy consumption of the prototype system was logged to measure the performance of the system. The performance of an air-conditioning system is measured by the coefficient of performance (COP) [4], the ratio of the rate of net heat removal to the rate of total energy input, which can be calculated as in (2). Table 3 shows the energy consumption and the COP of the system recorded over an hour. The COP of the system decreased when faults were injected into the system. Under the normal condition, the COP of the system was 3.38, but the performance degraded when the system had faults. This will lead to energy wastage in the long run if no proper action is taken. Since the air-conditioning system is a complex system, it is crucial to have an FDD system to monitor any abnormalities in operation.

Deep learning
Figure 2 shows the overall performance of the deep learning model. The trained model was able to classify all classes with an accuracy of 94% and a precision of 94.1%. Meanwhile, the evaluation and the validation of the model both achieved more than 93% for accuracy and precision. Tables 4-6 show the confusion matrices for training, testing, and validating the model. The confusion matrix is widely used to represent the accuracy of a classifier [17]. It indicates the correspondence between the predicted classes and the expected classes. Tables 4-6 show that Classes 2 and 3 had the lowest accuracy for this model.

Support vector machine
Figure 3 shows the results of the SVM. The overall accuracy of the trained model increased compared to the deep learning model, reaching up to 97% accuracy with a precision of 97.1%. This shows that the SVM performs better than deep learning in classifying the faults of the system. The detailed accuracy of the SVM is tabulated in Tables 7-9. The accuracy for Classes 2 and 3 also increased considerably compared to the deep learning model.
The classification performance for Classes 2 and 3 was 5%-6% higher than with the deep learning model.

Table 8. Confusion matrix for the testing dataset
Class     1     2     3     4     5     6
1       291     0     0     3     7     1
2         5   274     7     2    14     0
3         1     4   189     0     7     0
4         0     0     0   200     0     1
5         0     0     0     0   302     0
6         1     0     0     3     6   292

Table 9. Confusion matrix for validating dataset of the SVM method
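As a worked check, the accuracy and precision can be recomputed directly from the Table 8 entries. The sketch below assumes that rows are the actual classes and columns the predicted classes, which is the layout WEKA prints; the paper does not state the orientation, and the function names are illustrative.

```python
# Table 8 (SVM, testing dataset). Rows are assumed to be the actual classes
# and columns the predicted classes.
TABLE_8 = [
    [291,   0,   0,   3,   7,   1],
    [  5, 274,   7,   2,  14,   0],
    [  1,   4, 189,   0,   7,   0],
    [  0,   0,   0, 200,   0,   1],
    [  0,   0,   0,   0, 302,   0],
    [  1,   0,   0,   3,   6, 292],
]


def overall_accuracy(cm):
    """Correctly classified instances (diagonal) over all instances."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total


def per_class_recall(cm):
    """Diagonal entry over its row total: share of each actual class retrieved."""
    return [cm[k][k] / sum(cm[k]) for k in range(len(cm))]


def per_class_precision(cm):
    """Diagonal entry over its column total: share of each predicted class that is right."""
    n = len(cm)
    column_totals = [sum(cm[i][k] for i in range(n)) for k in range(n)]
    return [cm[k][k] / column_totals[k] for k in range(n)]


total_instances = sum(sum(row) for row in TABLE_8)
accuracy = overall_accuracy(TABLE_8)
recall = per_class_recall(TABLE_8)
precision = per_class_precision(TABLE_8)
```

The entries sum to 1610, which matches the number of evaluation instances, and the diagonal gives 1548/1610, about 96.1% overall accuracy. Under the assumed orientation, Class 2 has the lowest per-class accuracy (274/302, about 90.7%), while Class 5 is always retrieved but also attracts 34 instances from other classes, so its precision is 302/336, about 89.9%.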

Multi-layer perceptron
As shown in Figure 4, the accuracy and precision of the MLP model were 99.4%, the highest among the three models. Furthermore, unlike in the previous models, the accuracy did not differ much between classes. The accuracy of each class is shown in Tables 10-12.

Table 11. Confusion matrix for 30% testing data

Figure 4. Percentage of accuracy and precision of training and testing data using the MLP model

Figure 5 shows the overall performance of all the classifier models used in this paper. It clearly shows that the MLP has the best accuracy and precision in classifying all six classes introduced in this paper.

CONCLUSION
This paper showed that the system's COP degrades to different values when different faults occur in the air-conditioning system. The performance of machine learning algorithms in detecting and classifying different faults was also investigated. Three algorithms were applied to the dataset from the lab-scale prototype air-conditioning system. The simulation results show that the MLP has the best accuracy and precision, up to 99.4%, ahead of SVM and deep learning. The second most accurate classifier was the SVM, which correctly classified up to 97% of the data.