Development of Option C measurement and verification model using hybrid artificial neural network-cross validation technique to quantify savings

Received Nov 1, 2019; Revised Jan 10, 2020; Accepted Jan 23, 2020
This paper aims to develop a hybrid artificial neural network for an Option C measurement and verification (M&V) model to predict monthly building energy consumption. In this work, the baseline energy model was developed using an artificial neural network embedded with artificial bee colony (ABC) optimization and a cross-validation technique suited to a small dataset. ABC optimization, with the coefficient of correlation as the fitness function, was used to optimize the neural network training process by selecting the optimal initial weights and biases. Working days, class days and cooling degree days were used as inputs, while monthly electricity consumption was the output of the artificial neural network. The results indicated that this hybrid artificial neural network model provided better predictions than the artificial neural network model without ABC optimization. The model with the highest coefficient of correlation was selected as the baseline model and used to determine the savings.


INTRODUCTION
In recent years, measurement and verification (M&V) has become a popular method for determining energy savings. M&V is the process of using measurements to reliably determine the actual savings achieved relative to the baseline energy use, which makes it essential for quantifying savings. The M&V process involves modeling, metering and sampling activities, all of which introduce uncertainty into reported energy savings. It is therefore important to consider accuracy carefully and to develop an accurate M&V methodology [1][2]. In M&V, developing a baseline energy model, which describes the relationship between energy consumption and the independent input variables, is one of the most important steps. The baseline energy model is then used to estimate the adjusted baseline energy pattern and hence to determine the savings.
There are several established protocols and guidelines for performing M&V of energy savings. Among them, the International Performance Measurement and Verification Protocol (IPMVP) is the most prominent and widely used. IPMVP is a supporting document that describes common practice in measuring, computing and reporting the savings achieved by energy or water efficiency projects at end-user facilities. It presents M&V principles, the IPMVP framework and explanations of common M&V issues. According to their area of application, IPMVP provides four measurement options for evaluating savings [21]: Option A, Key Parameter Measurement; Option B, All Parameter Measurement; Option C, Whole Facility; and Option D, Calibrated Simulation [3].
Recently, M&V studies have been reported around the world. In South Africa, one study examined the energy and demand impact of a steam feed pump refurbishment and a high-pressure turbine re-blade at a coal-fired power station, aimed at increasing efficiency [4]. In Brazil, the M&V concept was applied to evaluate an energy efficiency project that replaced electric showers with a solar water heating system [5]. In China, [6] reported energy savings generated by a power line transformation project. Recent studies [7][8] examined a group of buildings in the United States to assess the impact of building retrofit projects on energy consumption and energy savings. Several authors in Malaysia have applied M&V to study energy savings in commercial and educational buildings [9][10]. The most widely reported M&V applications are lighting retrofit projects that improve lighting system efficiency and reduce energy consumption [11][12][13][14]. According to IPMVP, baseline energy development is one of the most important and crucial steps in M&V, and the key aspects of developing a baseline for reporting energy savings are accuracy and uncertainty [3]. To date, linear regression is the most common method for formulating the baseline energy model [10][15]. However, linear regression is only suitable for linear relationships and may produce large errors for nonlinear data.
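For comparison, a conventional linear-regression baseline of the kind referred to above can be sketched in a few lines. This is a minimal illustration, not the method of [10] or [15]: it assumes a multivariate ordinary least-squares model with an intercept and the three inputs used later in this study (working days, class days and CDD), and the sample values are synthetic. The function names are hypothetical.

```python
import numpy as np

def fit_linear_baseline(X, y):
    """Ordinary least squares: E = b0 + b1*x1 + ... + bk*xk."""
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef

def predict_linear_baseline(coef, X):
    """Baseline (or adjusted baseline) energy for new independent-variable values."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]

# Synthetic example: columns are working days, class days, CDD (values are made up)
X = [[20, 15, 250], [22, 18, 270], [18, 0, 240], [21, 16, 265], [19, 10, 255]]
true_coef = np.array([1000.0, 30.0, 20.0, 5.0])
y = true_coef[0] + np.asarray(X, dtype=float) @ true_coef[1:]
coef = fit_linear_baseline(X, y)
```

Because the fitted model is a weighted sum of the inputs, it can only capture linear relationships between the independent variables and energy use, which is the limitation noted above.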
Artificial neural networks (ANN) have been applied in several works to replace the linear regression method. ANN is a popular forecasting technique that imitates the operation of the human brain, and it has been used to solve various engineering problems [16][17][18][19]. To increase the performance of ANN models, some researchers have hybridized ANN with optimization techniques that find the optimum ANN parameters automatically rather than by trial and error. This improves the accuracy of the ANN and saves experimentation time. Such optimization techniques are artificial intelligence methods that have been widely used by researchers [20][21]. Because an ANN learns from examples, a large training dataset is needed to obtain the best results. When large datasets are available, they can readily be split into training, validation, and testing sets. However, in some situations the measured data are very limited, expensive or difficult to obtain. In such cases, allocating the available data to training, validation, and testing is the main challenge in building an accurate ANN prediction model. The ANN model is sensitive to its input and output data, and fewer input-output samples may reduce its accuracy. A dataset that is too small may be insufficient to train the network properly or to evaluate its performance accurately [22].
Option C data are derived from monthly utility bills, so usually only a small dataset is available, which may be insufficient to train the network and predict energy consumption. In view of this, several sampling techniques have been studied to improve accuracy with small datasets, the most common of which is cross-validation (CV) [22][23][24]. This study focuses on developing an Option C baseline energy model by hybridizing ANN with artificial bee colony (ABC) optimization. CV is integrated with this hybrid artificial neural network (HANN) to improve the accuracy of the ANN prediction and to help avoid overfitting. Overfitting causes the network to memorize the training patterns, so that it cannot generalize well to new data (the testing set) and gives poor accuracy. This paper is organised as follows: Section 2 briefly explains the proposed Option C M&V HANN model, including baseline model development and saving calculations. Section 3 discusses the results of the proposed method. Finally, Section 4 provides the conclusion.

RESEARCH METHOD
Because savings cannot be measured directly, M&V determines them by comparing the measured energy use before and after implementation of the energy conservation measure (ECM). Figure 1 shows the energy use during the baseline period and the post-retrofit period. The baseline period is the time before the retrofit installation, while the post-retrofit period is the interval after the ECM has been installed. According to IPMVP, to calculate savings properly, a baseline energy model is first developed, using regression analysis, to determine the relationship between energy use and the independent variables. An independent variable is a parameter that is expected to change regularly and to affect energy use. To compare energy use fairly before and after ECM implementation, conditions in the baseline and post-retrofit periods must be comparable. The baseline energy model is therefore used to adjust the baseline energy to the conditions of the post-retrofit period. In other words, the baseline energy model estimates how much energy would have been used had the retrofit not been implemented. This estimate is the adjusted baseline energy in the post-retrofit phase, which is compared with the energy consumed in the post-retrofit period to determine the savings.
The development of the Option C M&V HANN model was divided into two phases: 1) the M&V baseline energy development phase and 2) the post-retrofit saving calculation phase. In a previous study, the baseline energy model for Option C was developed using ANN with the CV resampling technique [25]. In this study, an improved baseline model was developed using the best configurations identified in [25], namely 6, 9 and 20 neurons in the hidden layer.

M&V Baseline Model Development
In the M&V baseline energy model phase, the baseline energy model was developed using CV resampling techniques because only limited data were available for this study. The CV resampling techniques were introduced to improve the prediction accuracy of Option C. They also mitigate overfitting and allow the model's robustness and generalisation ability to be checked for a small-data application in predicting Option C energy consumption. ABC optimisation was applied to develop the HANN baseline energy model: ABC was embedded with CV to train the network, optimise the synaptic weights and biases, and predict the baseline energy consumption. A step-by-step flowchart of the HANN-CV model development is shown in Figure 2. Once all settings had been determined, the CV resampling techniques were applied to split the dataset into training, validation, and testing sets. The ABC optimisation technique was then executed, and the fitness of each dataset was evaluated by calling the ANN programme, in which the ANN was trained to maximise the fitness, the coefficient of correlation (R).
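The fold rotation can be sketched as follows. The paper does not state exactly how the five folds were assigned to the training, validation and testing roles, so the scheme below is an assumption: in rotation i, fold i is used for testing, the next fold for validation, and the remaining folds for training. The function name `cv_train_valid_test_splits` is hypothetical.

```python
import numpy as np

def cv_train_valid_test_splits(n_samples, k=5, seed=None):
    """Yield (train, valid, test) index arrays for k rotations.

    Assumed scheme: in rotation i, fold i is the test set, fold (i + 1) % k
    is the validation set, and the remaining k - 2 folds form the training set.
    """
    idx = np.arange(n_samples)
    if seed is not None:
        idx = np.random.default_rng(seed).permutation(n_samples)  # optional shuffle
    folds = np.array_split(idx, k)  # near-equal folds when n_samples % k != 0
    for i in range(k):
        test = folds[i]
        valid = folds[(i + 1) % k]
        train = np.concatenate(
            [folds[j] for j in range(k) if j not in (i, (i + 1) % k)]
        )
        yield train, valid, test

# 23 baseline months, 5 folds
splits = list(cv_train_valid_test_splits(23, k=5, seed=0))
for train, valid, test in splits:
    print(len(train), len(valid), len(test))
```

With 23 baseline months, `np.array_split` gives folds of 5, 5, 5, 4 and 4 months, so each month is used exactly once for testing across the five runs.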
ABC is a swarm-based optimisation technique proposed by [26], inspired by the foraging behaviour of bees searching for the optimal solution. ABC optimisation generally consists of four main phases: the initialisation, employed bee, onlooker bee, and scout bee phases. The proposed HANN-CV technique was implemented using the following steps:
1. In the initialisation phase, the ABC control parameters were prescribed. There are three control parameters: the colony size, the number of food sources and the maximum cycle number. The number of parameters to be optimised, D, depended on the number of neurons in the hidden layer; in this work, the optimised parameters are the synaptic weights and biases.
2. ABC randomly generated the initial population (food sources), i.e. the initial weights and biases, using (1). The initial population was evaluated by calling the ANN programme, in which the ANN was trained to maximise the fitness, the coefficient of correlation, and the fitness values were calculated as in (2).
$$x_{ij} = x_j^{\min} + \mathrm{rand}(0,1)\left(x_j^{\max} - x_j^{\min}\right) \quad (1)$$
where $x_{ij}$ is the initial population or initial food (current candidate solution), $j$ indexes the parameters to be optimised, $x_j^{\min}$ is the lower bound of the parameter, $x_j^{\max}$ is the upper bound of the parameter, and $\mathrm{rand}(0,1)$ is a uniformly distributed random number between 0 and 1.
3. In the employed bee phase, each employed bee produced a new candidate solution in the neighbourhood of its current food source using (3):
$$v_{ij} = x_{ij} + \phi_{ij}\left(x_{ij} - x_{kj}\right) \quad (3)$$
where $v_{ij}$ is the new candidate solution, $x_{ij}$ is the current candidate solution, $x_{kj}$ is the neighbouring candidate solution ($k \neq i$), and $\phi_{ij}$ is a random number between −1 and 1.
4. The fitness of each food source was then evaluated by calling the ANN programme, and greedy selection was applied between the current and the new candidate solution. If the new candidate solution found by the employed bee had better fitness than the current candidate solution, it replaced the current solution and the trial counter was reset; otherwise, the trial counter was incremented.
5. In the onlooker bee phase, new candidate solutions were produced according to (3), with food sources chosen according to the probability in (4) using a roulette-wheel selection mechanism. Greedy selection was again applied between the new candidate and the current candidate to retain the better solution.
6. The position of the best food source was memorised and recorded. If a food source could not be improved within a predefined limit of trials, it was abandoned.
7. In the scout bee phase, the abandoned food source was replaced with a new solution generated randomly using (1). The ANN was then called to evaluate the new solution, and the best fitness value and food source were recorded.
8. The process continued until the maximum cycle number was reached, and the iterations stopped once all the datasets had been trained and evaluated. Because five CV folds were created, the network was run five times. The average values of all performance functions were then calculated and saved.
The R performance of HANN-CV and ANN-CV [25] was compared for all baseline energy models. The model with the highest R, indicating the strongest correlation between the targeted and predicted outputs, was selected as the best model and used to predict the adjusted baseline in the post-retrofit saving calculation phase and to determine the energy savings.
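The ABC loop in steps 1 to 8 can be sketched in Python with NumPy. This is a hypothetical, simplified sketch rather than the authors' programme. It assumes a fully connected 3-input, single-hidden-layer network with tanh hidden units, a linear output and one bias per neuron, so that D = 5h + 1 (31, 46 and 101 for 6, 9 and 20 hidden neurons). It evaluates R directly on the candidate weights instead of training the ANN further, uses the standard ABC forms of (1) and (3), and shifts R before computing roulette-wheel probabilities because R can be negative. All function names, bounds and control-parameter values are illustrative.

```python
import numpy as np

def n_params(n_in, n_hidden):
    """D: hidden weights + hidden biases + output weights + one output bias."""
    return n_hidden * n_in + n_hidden + n_hidden + 1

def ann_forward(theta, X, n_hidden):
    """Assumed 1-hidden-layer network: tanh hidden units, linear output."""
    n_in = X.shape[1]
    a = n_hidden * n_in
    w1 = theta[:a].reshape(n_hidden, n_in)
    b1 = theta[a:a + n_hidden]
    w2 = theta[a + n_hidden:a + 2 * n_hidden]
    b2 = theta[-1]
    return np.tanh(X @ w1.T + b1) @ w2 + b2

def fitness_r(theta, X, y, n_hidden):
    """Fitness = coefficient of correlation R between predicted and targeted output."""
    pred = ann_forward(theta, X, n_hidden)
    if np.std(pred) == 0.0:
        return -1.0  # R is undefined for a constant prediction; treat as worst case
    return float(np.corrcoef(pred, y)[0, 1])

def abc_optimise(fitness, dim, lb, ub, n_food=10, max_cycles=50, limit=20, seed=0):
    rng = np.random.default_rng(seed)
    foods = lb + rng.random((n_food, dim)) * (ub - lb)  # initial food sources, as in (1)
    fit = np.array([fitness(f) for f in foods])
    trials = np.zeros(n_food, dtype=int)

    def try_neighbour(i):
        k = rng.choice([m for m in range(n_food) if m != i])  # neighbour food source k != i
        j = rng.integers(dim)                                 # one randomly chosen parameter
        v = foods[i].copy()
        step = rng.uniform(-1.0, 1.0) * (foods[i, j] - foods[k, j])  # phi * (x_ij - x_kj)
        v[j] = np.clip(foods[i, j] + step, lb, ub)                   # as in (3), kept in bounds
        fv = fitness(v)
        if fv > fit[i]:  # greedy selection: keep the better solution, reset its counter
            foods[i] = v
            fit[i] = fv
            trials[i] = 0
        else:
            trials[i] += 1

    i0 = int(np.argmax(fit))
    best, best_fit = foods[i0].copy(), fit[i0]
    for _ in range(max_cycles):
        for i in range(n_food):                     # employed bee phase
            try_neighbour(i)
        weights = fit - fit.min() + 1e-12           # shift so weights are non-negative
        p = weights / weights.sum()                 # roulette-wheel probabilities
        for _ in range(n_food):                     # onlooker bee phase
            try_neighbour(int(rng.choice(n_food, p=p)))
        i_best = int(np.argmax(fit))
        if fit[i_best] > best_fit:                  # memorise the best food source
            best, best_fit = foods[i_best].copy(), fit[i_best]
        worst = int(np.argmax(trials))
        if trials[worst] > limit:                   # scout bee phase: re-seed as in (1)
            foods[worst] = lb + rng.random(dim) * (ub - lb)
            fit[worst] = fitness(foods[worst])
            trials[worst] = 0
    return best, float(best_fit)

# Synthetic, already-scaled data standing in for 23 baseline months (not the study's data)
rng = np.random.default_rng(1)
X = rng.random((23, 3))
y = X @ np.array([0.5, 0.3, 0.2]) + 0.05 * rng.standard_normal(23)
h = 6
dim = n_params(3, h)
best, best_r = abc_optimise(lambda th: fitness_r(th, X, y, h), dim, -1.0, 1.0)
```

In a full HANN-CV run, this optimisation would be repeated for each of the five CV rotations and the resulting R values averaged; that outer loop is omitted here for brevity.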

Applying HANN model for determining energy savings in post-retrofit
In this phase, the post-retrofit data were used to determine the adjusted baseline and quantify the savings. In principle, M&V quantifies energy savings by comparing energy consumption before and after the retrofit. The energy consumption after the retrofit is known as the post-retrofit energy consumption. The post-retrofit input data were loaded into the HANN-CV baseline energy model to predict the output, which is known as the adjusted baseline energy. Savings, in terms of energy avoided, were determined from the difference between the adjusted baseline energy and the post-retrofit energy consumption, as in (5).
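$$E_{\text{avoided}} = E_{\text{adj}} - E_{\text{post}} \quad (5)$$
where $E_{\text{avoided}}$ is the energy avoided, $E_{\text{adj}}$ is the adjusted baseline energy, and $E_{\text{post}}$ is the post-retrofit energy consumption.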
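Equation (5) is applied month by month over the reporting period and the monthly results are summed. The sketch below assumes that the percentage saving is expressed relative to the total adjusted baseline energy; the paper does not state the denominator explicitly. The function name and the monthly figures are hypothetical.

```python
import numpy as np

def energy_savings(adjusted_baseline_kwh, post_retrofit_kwh):
    """Energy avoided per month (adjusted baseline minus metered use) and in total."""
    adj = np.asarray(adjusted_baseline_kwh, dtype=float)
    post = np.asarray(post_retrofit_kwh, dtype=float)
    monthly = adj - post
    total = monthly.sum()
    pct = 100.0 * total / adj.sum()  # assumed: % of total adjusted baseline
    return monthly, total, pct

adj = [100_000, 120_000, 110_000]  # hypothetical adjusted baseline (model output), kWh
post = [60_000, 75_000, 70_000]    # hypothetical metered post-retrofit use, kWh
monthly, total, pct = energy_savings(adj, post)
```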

Data collection
For Option C, the baseline and post-retrofit data, except the cooling degree days (CDD), were obtained on a monthly basis from the Facility Management Office, Universiti Teknologi Mara (UiTM), Shah Alam, Selangor, Malaysia. The CDD data were obtained from the Malaysian Meteorological Department. The whole dataset for the Faculty of Electrical Engineering, UiTM, is summarised in Table 1 with its minimum, maximum, and mean values.
The data were divided into two types: 1) 23 months of baseline energy and independent-variable data from 2012 to 2014 and 2) 20 months of post-retrofit data from 2014 to 2016. Three input variables were measured to develop the baseline energy model: working days, class days and cooling degree days. These parameters were assigned as the ANN inputs, and the monthly electricity consumption was the targeted output of the baseline model.
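Cooling degree days are conventionally obtained by summing, over the days of a month, the amount by which the daily mean temperature exceeds a base temperature. The paper does not state the base temperature behind the CDD data, so it is left as a required parameter in this sketch; the function name and the temperatures are hypothetical.

```python
def monthly_cdd(daily_mean_temps_c, base_temp_c):
    """Sum of daily cooling degree days: max(0, T_mean - T_base) for each day."""
    return sum(max(0.0, t - base_temp_c) for t in daily_mean_temps_c)

# Hypothetical four-day excerpt with an assumed base temperature of 24 degC
cdd = monthly_cdd([27.5, 28.0, 26.0, 23.5], base_temp_c=24.0)

# One ANN input row for a month: [working days, class days, CDD] (values illustrative)
x_month = [22, 18, cdd]
```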

RESULTS AND DISCUSSION
This section presents the results of the baseline energy model developed using CV resampling techniques combined with ANN and ABC. The HANN-CV models are compared with the ANN-CV method from the previous paper [25] at the end of this section. The agreement between the targeted and predicted outputs was evaluated and compared to find the most accurate method. The model with the highest R was selected as the most accurate baseline energy model and used in the post-retrofit phase to calculate the adjusted baseline energy and hence determine the energy savings.

Baseline energy model development results
The network configurations with 6, 9, and 20 neurons in the hidden layer were applied to HANN-CV. In this study, ABC was implemented in the ANN and evaluated together with the CV method to avoid trial and error and to increase the accuracy of the energy baseline model. For each hidden-layer size, five iterations of training, validation, and testing were performed, and the performance of each fold was measured using the performance evaluation functions. The average Rtest and Rall values for each subsample are tabulated in Table 2. As Table 2 shows, the average Rtest and Rall values for all subsamples were above 0.93, indicating a close match between the measured and predicted energy consumption in the testing and overall training processes.
To show the overall performance of ANN-CV and HANN-CV with 6, 9, and 20 neurons in the hidden layer, these models are compared in Table 3. Compared with the ANN-CV method, HANN-CV obtained better average R values in terms of accuracy and robustness. This research finds that the resampling technique with HANN was able to predict energy consumption more accurately than ANN-CV. All average coefficients of correlation for the HANN method were greater than 0.86. These results indicate that, even with limited data, the HANN method avoids network overfitting and produces a model with very high prediction accuracy. On average, the HANN-CV model with 6 neurons in the hidden layer had higher accuracy than the other two models. Although the Rvalid values were lower than the other values, they were still acceptable and met the IPMVP requirement. The acceptable performance on the training, validation, and testing sets supports the capability of the ANN to learn and predict.

Applying hybrid ANN model for determining energy savings in post-retrofit
To quantify the energy savings, the adjusted baseline must first be developed. Therefore, the HANN-CV model with 20 neurons in the hidden layer was applied to the post-retrofit data for Option C to develop the adjusted baseline. The M&V timeline for Option C is shown graphically in Figure 3, which plots the actual baseline and post-retrofit energy consumption together with the HANN-CV predicted values for both periods. The predicted values for the post-retrofit period are the adjusted baseline. The figure shows a large gap between the adjusted baseline and the actual consumption in the post-retrofit period; this gap represents the energy savings, i.e. the difference between the adjusted baseline and the post-retrofit energy consumption. The energy savings obtained over the 20 months were 1,149,491.56 kWh ± 0.48% at the 95% confidence level, corresponding to energy savings of 39.95%. The relative precision was computed using a t value from the Student's t-distribution table. The uncertainty of the savings complied with the IPMVP requirement that the savings exceed twice the standard error of the baseline value.
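The precision statement can be sketched as follows, assuming the standard error of the total adjusted baseline over the reporting period is known. The helper that scales a monthly standard error by the square root of the number of post-retrofit months assumes independent, equal monthly errors, which is a simplification; the paper does not give its exact uncertainty formula. Function names and the numbers in the example are hypothetical.

```python
import math

def total_standard_error(se_month, n_months):
    """SE of a sum of n monthly estimates, assuming independent, equal monthly errors."""
    return se_month * math.sqrt(n_months)

def savings_precision(savings, se_total, t_value):
    """Absolute and relative precision of savings, plus a 2 x SE acceptance check."""
    abs_precision = t_value * se_total
    rel_precision = abs_precision / savings
    meets_2se = savings > 2.0 * se_total  # savings should exceed twice the standard error
    return abs_precision, rel_precision, meets_2se

se_total = total_standard_error(1500.0, 4)  # hypothetical monthly SE, 4 months
abs_p, rel_p, ok = savings_precision(125_000.0, se_total, t_value=1.96)
```

The t value is passed in rather than computed, mirroring the table look-up described above; it depends on the chosen confidence level and the degrees of freedom.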

CONCLUSION
This study presented the application of HANN-CV to modelling the baseline energy for a small dataset, with the aim of improving learning accuracy for small-dataset problems in predicting Option C energy consumption. ABC optimisation was embedded in this method to find the optimum values of the weights and biases and thereby enhance the prediction of the baseline energy model.
The results presented in the previous section show that the resampling techniques can train the neural network even with limited data. The results also show that the baseline model with HANN performed better than ANN in terms of predictive ability and accuracy. The values predicted by the HANN model corresponded closely to the measured values (targeted output), with satisfactory correlation: the average R exceeded 84% for the training, validation, and testing sets. The most appropriate and accurate method for Option C is the HANN-CV model with 20 neurons, for which Rall reached 97.85%, indicating a very high correlation.
This approach is a promising alternative for Option C baseline modelling, and the proposed method can significantly improve learning performance when working with small datasets.