Improvement of transformer dissolved gas analysis interpretation using J48 decision tree model

Dissolved gas analysis (DGA) is widely accepted as an effective method for detecting incipient faults within power transformers. Gases such as hydrogen, methane, acetylene, ethylene, and ethane are normally used to identify transformer fault conditions. Several techniques have been developed to interpret DGA results, such as the key gas method, the Doernenburg, Rogers, and International Electrotechnical Commission (IEC) ratio-based methods, the Duval triangles, and the more recent Duval pentagon methods. However, each of these approaches depends on experts' shared knowledge and experience rather than on quantitative scientific methods, so different diagnoses may be reported for the same oil sample. To overcome these shortcomings, this paper proposes the use of a decision tree method to interpret the transformer health condition based on DGA results. The proposed decision tree model employs three main fault gases (methane, acetylene, and ethylene) as inputs and classifies the transformer into eight fault conditions. The J48 algorithm is used to train and develop the decision tree model. The performance of the proposed model is validated against transformers with pre-known conditions and compared with the Duval triangle method (DTM). Results show that the proposed model delivers better precision and accuracy in predicting transformer fault conditions than the DTM, at 81% and 69%, respectively.


INTRODUCTION
Dissolved gas analysis (DGA) has been used extensively to assess the condition of power transformers. High thermal and electrical stress on the transformer insulation system decomposes the paper and oil, producing gases that dissolve in the oil and decrease its dielectric strength [1]. Paper decomposition yields carbon monoxide (CO) and carbon dioxide (CO2). Hydrogen (H2), methane (CH4), acetylene (C2H2), ethylene (C2H4), and ethane (C2H6), on the other hand, are produced by oil decomposition and the formation of faults [2], [3]. Every fault produces characteristic gases that can be used to identify the fault and measure its severity [4], [5]. A low-energy fault such as partial discharge produces H2 and CH4, while a high-energy fault such as arcing can produce all of the gases, including C2H2. A high-temperature thermal fault, in turn, produces C2H4 and C2H6 [1]. Thus, the nature of the fault can be determined from the type of gas generated. However, the analysis is not always straightforward because more than one fault may occur at the same time [6].
Int J Artif Intell ISSN: 2252-8938  Improvement of transformer dissolved gas analysis interpretation using … (Norazhar Abu Bakar)
On the basis of DGA findings, various interpretation methods such as the key gas method (KGM), Doernenburg ratio method (DRM), Rogers ratio method (RRM), International Electrotechnical Commission (IEC) ratio method (IRM), Duval triangle method (DTM) [7], and Duval pentagon method (DPM) [8] have been established to determine the transformer's state. However, most of these approaches rely on practitioners' experience with DGA test data to give knowledge-based diagnostic recommendations, while others are based on thermodynamic theory, so they may not reach the same conclusion for the same oil sample [9].
To overcome these limitations, several soft computing methods have been introduced, including artificial neural networks (ANN) [10]-[12], support vector machines (SVM) [13], [14], and fuzzy logic (FL) [15]-[17]. However, each of these methods has limitations of its own. An ANN needs a large number of data samples and a long training time to achieve consistent performance, and it is prone to overfitting [18]. Developing fuzzy rules and membership functions is tedious, and fuzzy outputs can be interpreted in a variety of ways, which complicates the analysis. In addition to being computationally expensive and complex, the key issue with SVM is the selection of the right kernel function [19], since different kernel functions give different results. This paper proposes another machine learning method, the J48 decision tree, to interpret DGA findings; it offers a relatively quicker and less complex algorithm than SVM. Additionally, the structure of a J48 decision tree is more comprehensible than an ANN architecture.

DECISION TREE
Decision trees are one of the most effective methods in data mining for building classification systems based on multiple covariates or for designing predictive algorithms for a target variable. The method is used in numerous applications because it is user-friendly, straightforward, and robust even when values are missing. In the decision tree method, a population is classified into branch-like segments that form an inverted tree with a root node, internal nodes, and leaf nodes, as shown in Figure 1 [20]. The root node, also called a decision node, represents a decision that subdivides all records into two or more mutually exclusive subsets. Outcomes of those decisions that have no further branches are known as leaf nodes, and each leaf node carries the label of a particular class. The decision nodes that lie between the root node and the leaf nodes are called internal nodes.
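The root, internal, and leaf nodes described above can be sketched as a small data structure. This is a minimal illustration only: the class names `Leaf` and `Decision`, the attributes `x1` and `x2`, the thresholds, and the class labels are all hypothetical placeholders, and binary threshold tests are assumed for simplicity.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Terminal node: carries the class label assigned to a record."""
    label: str


@dataclass
class Decision:
    """Root or internal node: tests one attribute against a threshold."""
    attribute: str
    threshold: float
    low: Union["Decision", Leaf]   # branch taken when value <= threshold
    high: Union["Decision", Leaf]  # branch taken when value > threshold


def classify(node, record):
    """Walk from the root down to a leaf and return that leaf's label."""
    while isinstance(node, Decision):
        if record[node.attribute] <= node.threshold:
            node = node.low
        else:
            node = node.high
    return node.label


# Hypothetical two-level tree: one root, one internal node, three leaves.
toy_tree = Decision(
    "x1", 10.0,
    low=Leaf("class_a"),
    high=Decision("x2", 5.0, low=Leaf("class_b"), high=Leaf("class_c")),
)

print(classify(toy_tree, {"x1": 12.0, "x2": 3.0}))  # prints "class_b"
```

Every record follows exactly one path from the root to a leaf, which is why the subsets produced at each node are mutually exclusive.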
Several decision tree algorithms, such as ID3, J48, CART, C5.0, SLIQ, SPRINT, random forest, and random tree, have been developed for classification [21]. The ID3, J48, and C5.0 algorithms implement the top-down decision tree construction concept to obtain the output, while the CART algorithm is based on binary decision tree construction [22]. In this work, the J48 decision tree algorithm is chosen to classify the fault types of the transformer.
The J48 decision tree, or C4.5 algorithm, developed by Ross Quinlan, is an extension of the ID3 algorithm that allows the target value of new test data to be decided from the attribute values of the training data [20]. It improves on ID3 by handling both continuous and discrete attributes, handling missing values, and pruning trees after construction. The J48 algorithm performs a top-down greedy search through the given sets, testing each attribute at every tree node [23]. As a supervised learning algorithm, it requires a set of example data relating input objects to the desired output value in order to develop the J48 decision tree model [24]. This dataset is used for training. J48 decision tree induction begins with a root node representing the entire dataset and recursively separates the data into smaller subsets by testing a given attribute at each node. The root node is the attribute with the highest gain value among all attributes, and each split is chosen by considering how much it improves the degree of 'purity' of the dataset. This process is repeated until the subsets are "pure", that is, all instances in a subset fall within the same class, at which point tree growth terminates. Where the stopping rules do not work well, pruning is carried out to decrease the classification errors [25]. Pruning removes unnecessary nodes from the tree in order to obtain the optimal decision tree and to prevent overfitting or underfitting rules from being developed. The process of decision tree development using the J48 algorithm is summarized in Figure 2.
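The 'purity' and gain calculations that drive the split selection can be illustrated with a short sketch. It assumes entropy as the purity measure and scores a binary threshold split on one continuous attribute by both information gain and gain ratio (C4.5 is usually described as ranking splits by gain ratio). The function names, the midpoint candidate thresholds, and the toy data are simplifications introduced here, not details taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of class labels; zero means 'pure'."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def split_scores(values, labels, threshold):
    """Return (information gain, gain ratio) for the split value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0, 0.0  # a split that separates nothing has no value
    total = len(labels)
    weights = (len(left) / total, len(right) / total)
    gain = entropy(labels) - (weights[0] * entropy(left)
                              + weights[1] * entropy(right))
    split_info = -sum(w * math.log2(w) for w in weights)
    return gain, gain / split_info


def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; keep the best gain ratio."""
    points = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    if not candidates:
        return None  # attribute has a single value, so it cannot be split
    return max(candidates, key=lambda t: split_scores(values, labels, t)[1])


# Toy data (illustrative only): one attribute, two classes.
toy_values = [1.0, 2.0, 8.0, 9.0]
toy_labels = ["a", "a", "b", "b"]
print(best_threshold(toy_values, toy_labels))  # prints 5.0
```

In the toy data the threshold 5.0 separates the two classes completely, so both resulting subsets are pure and tree growth would stop there.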

J48 DECISION TREE MODEL
To establish a DGA interpretation model, a total of 500 data samples collected from various operating transformers of different operating conditions, ages, and health conditions were used to train the J48 decision tree algorithm. Instead of using all fault gases, the proposed model concentrates on only three main gases, CH4, C2H2, and C2H4, as input attributes to interpret the transformer condition. These are the same three gases used in the DTM. The output variable of the model, which represents the transformer health condition, is classified into eight (8) categories as listed in Table 1.
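For reference, the DTM locates a sample using the relative percentages of the same three gases. The text does not state whether the proposed model is fed raw concentrations or these percentages, so the sketch below shows only the DTM-style normalization. The function name `duval_percentages` and the sample concentrations are hypothetical.

```python
def duval_percentages(ch4, c2h2, c2h4):
    """Relative share (in percent) of each gas in the three-gas total."""
    total = ch4 + c2h2 + c2h4
    if total <= 0:
        raise ValueError("at least one gas concentration must be positive")
    return {
        "CH4": 100.0 * ch4 / total,
        "C2H2": 100.0 * c2h2 / total,
        "C2H4": 100.0 * c2h4 / total,
    }


# Hypothetical concentrations in ppm, for illustration only.
sample = duval_percentages(ch4=50.0, c2h2=25.0, c2h4=25.0)
print(sample)  # prints {'CH4': 50.0, 'C2H2': 25.0, 'C2H4': 25.0}
```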
The process of developing the DGA interpretation model is summarized in Figure 3. The process began by training the J48 algorithm on a set of 500 transformer data samples with known fault conditions to obtain the decision tree model. These 500 samples cover all fault categories stated in Table 1. Training was performed using 10-fold cross-validation to increase the effectiveness of the proposed interpretation model. The results are given in Table 2. It can be seen that the proposed interpretation model classified each transformer fault type correctly at a rate of more than 80% on average, except for D1, which is slightly lower. After the best possible decision tree model had been obtained, an additional 100 samples of known transformer fault types were used to evaluate the performance of the proposed model further. The proposed interpretation model must achieve at least 80% accuracy in classifying the overall transformer fault types before it is ready to be used. Otherwise, the model is modified and the training process is repeated until it achieves 80% prediction accuracy. Of the 100 samples, the proposed model correctly classified 81 transformer faults, as shown in Table 3, which is equivalent to 81% accuracy and hence surpasses the agreed minimum requirement.
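The 10-fold procedure and the 80% acceptance rule can be sketched as follows. This is a simplified illustration under stated assumptions: the folds are plain shuffled folds (no stratification), the number of samples is at least the number of folds, and `majority_trainer` is a placeholder learner standing in for J48. All function names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(records, labels, train, k=10):
    """Average hold-out accuracy over k folds.

    `train(records, labels)` must return a function mapping a record to a label.
    """
    scores = []
    for held_out in k_fold_indices(len(records), k):
        held = set(held_out)
        train_idx = [i for i in range(len(records)) if i not in held]
        predict = train([records[i] for i in train_idx],
                        [labels[i] for i in train_idx])
        hits = sum(predict(records[i]) == labels[i] for i in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / len(scores)


def majority_trainer(records, labels):
    """Placeholder learner (not J48): always predicts the most common label."""
    most_common = max(set(labels), key=labels.count)
    return lambda record: most_common


def meets_target(correct, total, target=0.80):
    """Acceptance rule: the share of correct predictions must reach the target."""
    return correct / total >= target


folds = k_fold_indices(500, k=10)
print(len(folds), len(folds[0]))  # prints "10 50"
print(meets_target(81, 100))      # prints "True"
```

With 500 samples, each fold holds out 50 samples while the remaining 450 are used for training, and the 81 correct predictions out of 100 reported above pass the 80% rule.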

RESULTS AND DISCUSSION
In this section, the performance of the proposed interpretation model is compared with the DTM (shown in Figure 4), which is recognized by industry as the best interpretation technique so far. Although a more recent improvement of the DTM, the DPM, is available, it works only as a complement to the existing DTM and does not replace it [8]. To evaluate the performance of both methods, another set of data from 65 transformers with known fault conditions was used to examine the prediction accuracy. The confusion matrix is employed to analyze the performance of both methods in classifying the fault types. The confusion matrix is a table that reports the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), which allows the classification accuracy and the performance of the method to be visualized. These terms are defined as follows: i) TP: cases correctly predicted as Yes; ii) TN: cases correctly predicted as No; iii) FP: cases predicted as Yes that are actually No; iv) FN: cases predicted as No that are actually Yes.
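For a multi-class problem such as this one, the TP, TN, FP, and FN counts are obtained per fault class by treating that class as "Yes" and every other class as "No". The sketch below shows one way to do this; the function names and the six example labels are made up for illustration and are not taken from Tables 4 or 5.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count (actual, predicted) label pairs: the confusion matrix in sparse form."""
    return Counter(zip(actual, predicted))


def one_vs_rest(matrix, positive):
    """Collapse the multi-class matrix into TP, TN, FP, FN for one fault class."""
    tp = tn = fp = fn = 0
    for (truth, guess), n in matrix.items():
        if truth == positive and guess == positive:
            tp += n  # predicted Yes, actually Yes
        elif truth == positive:
            fn += n  # predicted No, actually Yes
        elif guess == positive:
            fp += n  # predicted Yes, actually No
        else:
            tn += n  # predicted No, actually No
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Made-up labels for illustration only.
actual = ["T1", "T1", "T2", "T3", "T3", "NF"]
predicted = ["T1", "T2", "T2", "T3", "T2", "NF"]
matrix = confusion_counts(actual, predicted)
print(one_vs_rest(matrix, "T2"))  # prints {'TP': 1, 'TN': 3, 'FP': 2, 'FN': 0}
```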
The precision, recall, and F-measure are used to examine the classification performance. Precision quantifies the proportion of positive class predictions that actually belong to the positive class, while recall quantifies the proportion of all positive examples in the dataset that are predicted as positive. The F-measure provides a single score that balances precision and recall. The accuracy of a classifier refers to the probability that the method correctly predicts the actual fault of the transformer. The precision, recall, F-measure, and accuracy are computed from the confusion matrix counts.
Table 4 and Table 5 show the confusion matrices obtained for the DTM and the J48 decision tree model, respectively, for the 65 transformer samples. According to Table 4, the DTM correctly diagnosed 42 out of 65 cases, while the remaining cases were wrongly classified. The J48 model gives a better prediction, with 53 out of 65 cases correctly classified, as shown in Table 5. Further analysis is shown in Table 6, in which the precision, recall, F-measure, and accuracy for each fault class are analyzed. Based on Table 4, the DTM correctly classified every actual T2 fault (6 out of 6). However, it also frequently misinterprets other faults as T2, hence reducing the recall and accuracy of the DTM in classifying the T2 fault. For T1, by contrast, although the DTM correctly classified only 6 out of 11 cases, there is only 1 case in which the DTM prediction was wrong. Therefore, the accuracy of the DTM in classifying the T1 fault is higher than for T2. The results also show that the fault most reliably classified by the DTM is T3, with an F-measure and accuracy of 0.79 and 0.65, respectively. The overall accuracy of the DTM in classifying the fault types is only 40%.
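The per-class metrics can be sketched from the TP, FP, and FN counts. Precision, recall, and F-measure follow their standard definitions. The per-class accuracy is taken here as TP / (TP + FP + FN), which is an assumption: that form is consistent with the T3 pair reported above, since 2(0.65) / (1 + 0.65) ≈ 0.79, but the paper's own formula may differ. The function name and the example counts are hypothetical.

```python
def class_metrics(tp, fp, fn):
    """Precision, recall, F-measure, and an assumed per-class accuracy."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    # Assumption: accuracy for one class is TP / (TP + FP + FN).
    accuracy = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}


# Hypothetical counts for one class, for illustration only.
metrics = class_metrics(tp=6, fp=2, fn=1)
print(round(metrics["precision"], 2), round(metrics["f_measure"], 2))  # prints "0.75 0.8"
```

Under this assumed definition, the F-measure and the accuracy are tied by F = 2A / (1 + A), so a class can never have an accuracy above its F-measure.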
The proposed J48 decision tree model, on the other hand, has an average precision of 81% across the fault types. The class predicted most precisely by the J48 model is NF at 100% (4 out of 4 correct), followed by T3 at 92% (11 out of 12). Unlike the DTM, the J48 model generates more consistent interpretation results, with an average recall of about 83%. The lowest recall is for T2, where two cases that should have been T1 and T3 were wrongly classified as T2. The proposed J48 model also shows better accuracy than the DTM, at 69%.

CONCLUSION
This paper proposes a J48 decision tree model to interpret transformer fault types based on dissolved gas analysis data. The proposed model was developed using a set of historical transformer data with pre-known health conditions. Three fault gases, CH4, C2H4, and C2H2, are selected as inputs to the model, which classifies the transformer into eight fault categories. The performance of the proposed model is evaluated using another sixty-five samples and compared with the Duval triangle method. Although the proposed model shows superior performance to the DTM, its accuracy can be improved further by considering more DGA samples during the training phase. Adding other fault gases such as H2 and C2H6 also has the potential to enhance the model accuracy. However, doing so may also increase the tree size and introduce overfitting if not handled carefully.