Neural network-based pH and coagulation adjustment system in water treatment

ABSTRACT


INTRODUCTION
Drinking water treatment systems in aqueducts involve a chemical dosing process that controls the key parameters guaranteeing the quality of treated water. The variables of this process are correlated with one another, which makes classical control strategies very difficult to set up. Proportional, integral and derivative (PID) controllers and their variants, which are part of classical control, present many advantages at an industrial level, as explained by Yu et al. [1] and Abdullah and Ali [2]. However, they apply best to linear systems whose models are feasible to establish. For systems that do not meet this condition, there are control strategies based on artificial intelligence [3], [4], called intelligent control techniques [5].
Intelligent control can also replicate PID strategies, as in [6] and [7]; however, it is more complex to implement. Therefore, it is used with non-linear plants, as in the prediction systems presented by Mohamed et al. [8] or in satellite dynamic altitude control systems [9]. Intelligent control applications cover vast and diverse fields, from academic settings to industrial processes such as those in [10]-[16]. Neural controllers have even been developed for security management in residential homes [17]. This has allowed systems as complex and delicate as water treatment plants to adopt intelligent control techniques as well [18]-[20]. Recent reviews of the state of the art [21], [22] highlight neural networks as an essential technique in water treatment, where one of the critical process parameters is the control of pH, as stated in [23]-[25].

At the Ariari Regional Aqueduct (ARA) in Meta, Colombia, the goal is to adjust the chemical dosing process in the treatment plant. pH is a critical factor in meeting national regulations, and it is affected by the high correlation between the intervening variables. Intelligent control based on neural networks can help automate this process, which, due to its complexity, has so far been carried out manually with the technique known as jar tests [26]. Each jar test lasts between 20 and 30 min, which makes timely adjustment difficult.
This article is composed of four sections. Section 1 is the present introduction. Section 2 describes the methodology, including the dataset and the network architecture. Section 3 presents the analysis of results. Finally, section 4 presents the conclusions drawn.

METHOD
In drinking water treatment fed by surface water catchments, the physicochemical parameters of the raw water can change suddenly, which affects water quality if the chemical dosage is not adjusted properly. Figure 1 shows the main stages in the development of the intelligent control model, which are discussed in the following subsections.

Data collection and pre-processing
In the drinking water treatment process, it is essential to adjust the pH stabilizer and the coagulant immediately when variations occur in order to guarantee water quality. For data collection, the analysis of chemical dosing identified the following essential input variables: color, turbidity, pH, amount of lime dosed, and amount of aluminum sulfate type A dosed. The output variables are color, turbidity, and pH. Empresa de Servicios Públicos del Meta (EDESA S.A. E.S.P.) at ARA provided a set of 720 jar analysis reports that comprised 488 data vectors.
After the data collection stage, data processing began with format assignment in the Excel database, where the number format was selected to facilitate loading the data into the modeling process later. The preprocessing phase first identified the amount of lost or missing data. These incomplete or empty records were eliminated from the general dataset, since such "empty" data affect the behavior of machine learning models.
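The removal of incomplete records described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the field names and the sample rows are hypothetical, and the source does not state which tool performed this step.

```python
import math

# Hypothetical field names; the source only lists the measured variables.
FIELDS = ("color", "turbidity", "ph", "lime", "sulfate")


def is_complete(record):
    """Return True when every expected field holds a finite number."""
    for name in FIELDS:
        value = record.get(name)
        if value is None or isinstance(value, bool):
            return False
        if not isinstance(value, (int, float)) or not math.isfinite(value):
            return False
    return True


def drop_incomplete(records):
    """Keep only the records with no missing or empty values."""
    return [record for record in records if is_complete(record)]


# Made-up jar test rows, for illustration only.
sample = [
    {"color": 35.0, "turbidity": 12.4, "ph": 6.9, "lime": 4.0, "sulfate": 20.0},
    {"color": None, "turbidity": 8.1, "ph": 7.1, "lime": 3.0, "sulfate": 18.0},
    {"color": 20.0, "turbidity": float("nan"), "ph": 6.8, "lime": 2.5, "sulfate": 15.0},
]
clean = drop_incomplete(sample)
```

Only the first made-up row survives, because the other two contain an empty or non-numeric entry.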
The data collected for turbidity, color, pH, lime, and aluminum sulfate type A span wide ranges that are not uniform with each other, which is inherent to the process. During training of the learning model, however, this dispersion increases the convergence time of the algorithm and sometimes gives poor results. Therefore, the data are scaled to a uniform range between 0 and 1 using the maximum and minimum values (1), which preserves the proportionality of each value without otherwise altering the dataset. As an example, Figure 2 shows the preprocessing of the input variable turbidity, with the initial purified data on the left and the normalized data on the right. For training of the neural networks, the percentage distribution of the total data sample (488) was defined as in Table 1. The database is distributed in 3 sets for training and validation. Each sample corresponds to the tabulated information of the manual jar testing process.
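As an illustration of the scaling step, the sketch below assumes that the normalization by maximums and minimums takes the usual min-max form, x_norm = (x - x_min) / (x_max - x_min). The function name and the turbidity values are made up for the example.

```python
def min_max_scale(values):
    """Scale a list of numbers to the range [0, 1] by min-max normalization."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant column carries no spread; map it to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    span = highest - lowest
    return [(value - lowest) / span for value in values]


# Made-up turbidity readings in NTU, for illustration only.
turbidity = [2.0, 5.0, 11.0, 20.0]
scaled = min_max_scale(turbidity)
```

The smallest reading maps to 0, the largest to 1, and intermediate readings keep their relative spacing, which is the proportionality property mentioned above.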

Neural architecture
Chemical dosing is a stage for which values of both the input variables and the output variables of interest are available, so the problem suits a supervised machine learning model such as an artificial neural network. It is essential to clarify that the chemical dosing model is carried out in two stages. In the first stage, pH, turbidity, and color are measured to determine the initial levels of lime and aluminum sulfate. The second-stage network takes turbidity, color, pH, lime, and aluminum sulfate as input data and produces turbidity, color, and pH as output variables. The neural architecture of the model follows the same dosing procedure, that is, two neural architectures, one model for each of the stages shown in Figure 3. Figure 4 shows a general representation of the proposed models with the internal architectures of the neural networks of models 1 (left) and 2 (right), respectively. Neural network model 1 has 20 neurons in the hidden layer, twice as many as model 2, due to the amount and complexity of the data used for training. The outputs of model 1 become inputs for model 2, in line with the manual dosing process. The number of neurons in the hidden layer of each model was selected by iterating in steps of five until the best possible approximation was obtained. Although more neurons in the hidden layer mean more time for training, validation, and testing of the model, no high computational load was observed. A fixed number of epochs was used for each training trial, set at 1000 epochs in both neural network models. Figures 5 and 6 show the training, validation and testing of each neural network model. The value of R indicates the relationship between the output data and the target values, where R = 1 indicates a perfect fit. For the case under study, the value of R in model 1 was 0.81325, which is a high value.
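The two-stage arrangement and the layer sizes described above can be sketched as a forward pass through two small networks. This is only a structural sketch under stated assumptions: the weights are random and untrained, the tanh hidden activation and linear output layer are assumptions (the source does not name the activation functions), and all function names are hypothetical.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation=None):
    """Apply one dense layer, optionally followed by an activation."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def make_mlp(n_in, n_hidden, n_out, rng):
    """One hidden layer followed by an output layer."""
    return make_layer(n_in, n_hidden, rng), make_layer(n_hidden, n_out, rng)


def forward(mlp, x):
    hidden_layer, output_layer = mlp
    # Assumed activations: tanh in the hidden layer, linear at the output.
    return dense(dense(x, hidden_layer, math.tanh), output_layer)


rng = random.Random(0)
# Model 1: (pH, turbidity, color) -> (lime, sulfate), 20 hidden neurons.
model_1 = make_mlp(3, 20, 2, rng)
# Model 2: (turbidity, color, pH, lime, sulfate) -> (turbidity, color, pH), 10 hidden neurons.
model_2 = make_mlp(5, 10, 3, rng)


def predict(ph, turbidity, color):
    """Chain the two stages: the doses from model 1 feed model 2."""
    lime, sulfate = forward(model_1, [ph, turbidity, color])
    treated = forward(model_2, [turbidity, color, ph, lime, sulfate])
    return (lime, sulfate), tuple(treated)


# Normalized, made-up raw water values, for illustration only.
dose, treated = predict(0.4, 0.6, 0.5)
```

Because the weights are untrained, the numbers returned here are arbitrary; the sketch only shows how the outputs of the first network are passed to the second.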
In model 2, the value of R was 0.96158, which also shows an excellent relationship between the output values and the targets. For both models (model 1 and model 2), a dataset equivalent to 15% of the total available data was reserved. In Figure 7, the error is plotted against each training cycle of model 1. The best performance was obtained at epoch 394 with an error of 0.022209; the smaller the mean square error, the closer the predicted values are to the observed values.
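The two fit measures discussed above can be computed as in the following sketch. It assumes that R is the linear (Pearson) correlation coefficient between network outputs and targets, which the source does not state explicitly. The data and the error history are made up, and the helper names are hypothetical.

```python
import math


def mean_squared_error(targets, outputs):
    """Average of the squared differences between targets and outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


def pearson_r(targets, outputs):
    """Linear correlation coefficient between targets and outputs."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_o = sum(outputs) / n
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(targets, outputs))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_o = sum((o - mean_o) ** 2 for o in outputs)
    return cov / math.sqrt(var_t * var_o)


def best_epoch(errors):
    """1-based index of the epoch with the lowest recorded error."""
    return min(range(len(errors)), key=errors.__getitem__) + 1


# Made-up normalized values, for illustration only.
targets = [0.10, 0.40, 0.55, 0.90]
outputs = [0.12, 0.35, 0.60, 0.85]
mse = mean_squared_error(targets, outputs)
r = pearson_r(targets, outputs)

# Made-up error per epoch; the lowest value marks the best performance.
history = [0.30, 0.12, 0.05, 0.07, 0.09]
best = best_epoch(history)
```

An R close to 1 together with a small mean square error indicates that the outputs track the targets closely, which is how the reported values for the two models are read.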

RESULTS AND DISCUSSION
The algorithm delivers the results shown in Figure 9, obtained from the jar test data with the machine learning model developed to calculate the dosage of lime and sulfate for treating drinking water at ARA. In other words, it is possible to determine the amount of lime and sulfate required to adjust the pH level of the treated water. The neural model seeks to replicate the appropriate dosage levels according to the standard and to the parameterization of the network. The results are shown in Table 2. According to Colombian regulation, the parameters of water for human consumption are a color of 0-15 platinum cobalt units (PCU), a turbidity of 0-2 nephelometric turbidity units (NTU), and a pH between 6.5 and 9.0. Analysis of Table 2 shows that the dosages of lime and sulfate obtained through the neural network are lower than those applied by the operators, yet they generate color and turbidity outputs within the norm and equal or improve the pH value. However, the pH remains below the values of the norm, an aspect that is adjusted by adding an offset to the minimum value predicted by the neural model.
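The comparison against the regulatory ranges can be sketched as a simple range check. The limits are those quoted above; the predicted values, the offset value, and the function names are hypothetical, since the source does not state the offset actually applied.

```python
# Regulatory ranges quoted in the text: color in PCU, turbidity in NTU, pH unitless.
LIMITS = {
    "color": (0.0, 15.0),
    "turbidity": (0.0, 2.0),
    "ph": (6.5, 9.0),
}

# Made-up value; the source does not state the offset that was used.
PH_OFFSET = 0.5


def out_of_range(sample):
    """Return the names of the parameters outside their regulatory range."""
    return [
        name
        for name, (low, high) in LIMITS.items()
        if not low <= sample[name] <= high
    ]


def apply_ph_offset(sample, offset):
    """Return a copy of the sample with the offset added to its pH."""
    adjusted = dict(sample)
    adjusted["ph"] = sample["ph"] + offset
    return adjusted


# Made-up model prediction, for illustration only.
predicted = {"color": 8.0, "turbidity": 1.2, "ph": 6.0}
violations = out_of_range(predicted)
corrected = apply_ph_offset(predicted, PH_OFFSET)
remaining = out_of_range(corrected)
```

In this made-up case only the pH falls outside its range before the offset, and no parameter does after it.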

CONCLUSION
The performance of artificial intelligence algorithms in industry-oriented solutions was demonstrated with the coagulant and pH control system for water treatment developed through a neural model. The optimal dosage of sulfate and lime obtained by this method generated an output pH lower than 7.5 and an output turbidity lower than 8 NTU. The output of the treatment plant presents low pH problems, as evidenced by the jar test data, whose pH values are below the range suitable for human consumption. The predictive model created from these data optimizes and standardizes the chemistry of the process, but it is still necessary to correct the pH in the jar tests so that the database can be updated and the model retrained to correct the output error. It is therefore concluded that more effective characterization data must be obtained in order to build more efficient models that meet the requirements of drinking water treatment.