River classification and change detection from Landsat images using a river classification toolbox

ABSTRACT


INTRODUCTION
Information on water bodies is essential for both geographical and remote sensing applications in several respects. For example, it can be used for city planning and administration at specific sites [1], coastal erosion detection [2], water encroachment detection [3], river change detection [4], and flood analysis [5]. In the past, acquiring data on water bodies or rivers required field staff to carry out site surveys, which inevitably caused long delays, wasted budget, and made data collection and processing very slow. However, with continuous advances in remote sensing and computing technologies, these hydrological data and the relevant management processes can now be handled much more quickly, with virtually no need for human intervention. According to recent literature on remotely sensed data acquisition and processing, in particular studies based on image analysis, it was established that Landsat images were the

RESEARCH METHOD
The main contribution of this paper is the development of an effective toolbox for river classification and change detection. The toolbox consists of six modular steps, i.e., data collection and preparation, water index calculation, ML-based river classification, accuracy assessment, river change detection, and report presentation, as shown in Figure 2. Each step is described in more detail in the following sub-sections. The river change detection algorithm used in this paper is shown in Table 1.

Figure 2. Diagram of the proposed methodology and the developed toolbox. It consists of data preparation, water index calculation, ML-based river classification, accuracy assessment, change detection, and data reporting. Those highlighted in grey indicate the novelties of this study.

Data collection and preparation
The imaging data analyzed in this research were Landsat 8 and Sentinel-2 images obtained from the USGS website. Radiometric and geometric errors in the original images were first corrected. The images were then fed into the Geoprocessing module, in which the locations and extents of the study areas were specified. Before the images were processed by the developed toolbox, ground truths of the rivers were prepared for accuracy assessment. To this end, the corresponding LULC data were obtained from the Land Development Department of Thailand.

Water index calculation
Subsequently, water features were required for training and evaluating the ML classification modules. Following previous studies, NDWI and MNDWI were adopted as two of the most suitable water indices for river extraction; empirical results reported in the literature have demonstrated accurate and successful use cases of both indices. In this study, they were calculated from Landsat 8 images using a toolbox in ArcGIS Desktop. This step was essentially preprocessing prior to the subsequent classification steps. The expressions and references of these indices are summarized in Table 2.
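A minimal NumPy sketch of the two indices follows, assuming the widely used definitions NDWI = (Green − NIR)/(Green + NIR) and MNDWI = (Green − SWIR1)/(Green + SWIR1), which may differ in detail from the expressions in Table 2. On Landsat 8 OLI these bands are band 3 (Green), band 5 (NIR) and band 6 (SWIR1). The function names and the sample reflectance values are hypothetical; the study itself computed the indices with an ArcGIS Desktop toolbox.

```python
import numpy as np

def normalized_difference(a, b):
    """Return (a - b) / (a + b) element-wise, with 0 where a + b == 0."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = a + b
    out = np.zeros_like(denom)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def ndwi(green, nir):
    # NDWI (McFeeters): Green vs NIR, i.e. Landsat 8 OLI bands 3 and 5
    return normalized_difference(green, nir)

def mndwi(green, swir1):
    # MNDWI (Xu): Green vs SWIR1, i.e. Landsat 8 OLI bands 3 and 6
    return normalized_difference(green, swir1)

# Hypothetical 2x2 reflectance tiles: left column water-like, right column land-like
green = np.array([[0.10, 0.08], [0.09, 0.07]])
nir = np.array([[0.03, 0.30], [0.02, 0.28]])
swir1 = np.array([[0.01, 0.20], [0.01, 0.22]])
print(ndwi(green, nir))
print(mndwi(green, swir1))
```

Water pixels come out positive and land pixels negative for both indices, which is what makes them usable as classification features.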

River classification using machine learning
Classification based on machine learning (ML) has been widely adopted in several similar studies. ML classifiers may be categorized into unsupervised and supervised strategies. To determine the appropriate strategy for the present context, both categories were benchmarked. For unsupervised classification, K-Means and ISODATA were used, while for supervised classification, MLC and SVM were considered. In this two-class classification, each pixel in an image was labelled as either river or non-river, depending on its water indices. No training data were needed for the unsupervised techniques, whereas 1,000 points per class were prepared to train the supervised ML models. Each ML algorithm is described in detail as follows.
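The paper does not state how the 1,000 training points per class were drawn. One plausible approach, shown here only as a sketch, is stratified random sampling from a ground-truth raster: for each class label, pick distinct pixel locations at random. The function `sample_training_points`, the raster layout and the seed are hypothetical.

```python
import numpy as np

def sample_training_points(ground_truth, n_per_class=1000, seed=0):
    """Stratified random sampling: draw up to n_per_class distinct pixel
    locations (row, col) for every class label in a ground-truth raster."""
    rng = np.random.default_rng(seed)
    samples = {}
    for label in np.unique(ground_truth):
        rows, cols = np.nonzero(ground_truth == label)
        n = min(n_per_class, rows.size)  # a class may have fewer pixels than requested
        picked = rng.choice(rows.size, size=n, replace=False)
        samples[int(label)] = np.column_stack((rows[picked], cols[picked]))
    return samples

# Hypothetical ground truth: a 20-pixel-wide river (1) across a 100x100 scene (0 elsewhere)
truth = np.zeros((100, 100), dtype=np.uint8)
truth[:, 40:60] = 1
points = sample_training_points(truth)
print({label: pts.shape for label, pts in points.items()})
```

Sampling each class separately keeps the training set balanced even when river pixels are a small fraction of the scene.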

K-Means
K-Means is an unsupervised learning algorithm. It partitions objects into K clusters, each of which is represented by its mean, or centroid; the distance between each data point and the centroids determines cluster membership. As the first step, the number of expected clusters (K) and their initial means must be specified. Next, each data point is assigned to the cluster whose centroid is closest. Subsequently, the centroid of each cluster is recalculated from its updated membership. If any centroid moves as a result, memberships are reassigned based on the nearest centroids. These alternating updates are iterated until none of the K centroids changes. The K-Means parameters used in the proposed model are listed in Table 3.
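The assignment and update steps above can be sketched as Lloyd's K-Means in NumPy. This is an illustration, not the ArcGIS implementation used in the study: K = 2, random initialization from the data, and the rule "the cluster with the highest mean MNDWI is river" are all assumptions, since the actual settings are in Table 3.

```python
import numpy as np

def kmeans(features, k=2, max_iter=100, seed=0):
    """Lloyd's K-Means on an (n_samples, n_features) array.
    Returns (labels, centroids)."""
    x = np.asarray(features, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Initialise the K centroids with K distinct samples picked at random
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    labels = np.full(len(x), -1)
    for _ in range(max_iter):
        # Assignment step: every sample joins its nearest centroid (Euclidean)
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # memberships stable, so centroids no longer move
        labels = new_labels
        # Update step: move each centroid to the mean of its current members
        for j in range(k):
            members = x[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels, centroids

def river_mask_from_kmeans(mndwi, k=2, seed=0):
    """Cluster MNDWI values and call the cluster with the highest mean 'river'."""
    img = np.asarray(mndwi, dtype=np.float64)
    labels, centroids = kmeans(img.reshape(-1, 1), k=k, seed=seed)
    return (labels == centroids[:, 0].argmax()).reshape(img.shape)

mndwi_img = np.array([[0.6, 0.55, -0.3], [-0.25, 0.5, -0.4]])
print(river_mask_from_kmeans(mndwi_img))
```

Because K-Means produces anonymous clusters, some labelling rule is needed afterwards; using the cluster mean of a water index is one simple choice.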

ISODATA
ISODATA is another unsupervised learning technique, mostly employed in applications involving satellite or aerial images. It classifies pixels by their spectral distance to cluster means over repeated iterations, each of which consists of statistical recalculation and reclassification. Once the user has specified the number of classes and their initial means, each pixel is assigned to the cluster at the shortest distance. Unlike K-Means, the cluster statistics are then re-examined: clusters may be split, merged, or discarded, and the new centroids move toward the statistically rearranged data, which relaxes the K-Means limitation of a fixed number of clusters. This process is repeated until convergence is reached. The ISODATA parameters used in the proposed model are listed in Table 4.
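The split, merge and discard steps that distinguish ISODATA from K-Means can be sketched as follows. This is a simplified variant (one merge per iteration, unweighted merging, a fixed number of iterations), not the ArcGIS implementation; the thresholds `k_max`, `min_size`, `split_std` and `merge_dist` are hypothetical stand-ins for the parameters in Table 4.

```python
import numpy as np

def _nearest(x, centroids):
    # Index of the closest centroid (Euclidean distance in feature space)
    d = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def isodata(features, k_init=1, k_max=4, min_size=2, split_std=0.3,
            merge_dist=0.2, n_iter=10, seed=0):
    """Simplified ISODATA on an (n_samples, n_features) array.
    Keep merge_dist < 2 * split_std so a fresh split is not merged back at once."""
    x = np.asarray(features, dtype=np.float64)
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k_init, replace=False)].copy()
    for _ in range(n_iter):
        labels = _nearest(x, centroids)
        # Recompute means, discarding clusters with fewer than min_size members
        members = [x[labels == j] for j in range(len(centroids))]
        kept = [m for m in members if len(m) >= min_size]
        if not kept:
            break
        centroids = np.array([m.mean(axis=0) for m in kept])
        stds = [m.std(axis=0) for m in kept]
        # Split widely spread clusters along their most variable feature
        result = list(centroids)
        for i, s in enumerate(stds):
            if len(result) < k_max and s.max() > split_std:
                offset = np.zeros_like(s)
                offset[s.argmax()] = s.max()
                result[i] = centroids[i] + offset
                result.append(centroids[i] - offset)
        centroids = np.array(result)
        # Merge the closest pair of centroids if they are nearly identical
        if len(centroids) > 1:
            d = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
            np.fill_diagonal(d, np.inf)
            a, b = np.unravel_index(d.argmin(), d.shape)
            if d[a, b] < merge_dist:
                merged = (centroids[a] + centroids[b]) / 2
                centroids = np.vstack([np.delete(centroids, [a, b], axis=0), merged])
    return _nearest(x, centroids), centroids

# Hypothetical MNDWI values: starts from one cluster, which is split into two
values = np.array([-0.5, -0.45, -0.55, 0.5, 0.55, 0.45]).reshape(-1, 1)
labels, centroids = isodata(values)
print(labels, centroids.ravel())
```

Starting from a single cluster, the spread of the data triggers a split, and the two resulting clusters settle on the water-like and land-like values without K being fixed in advance.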

MLC
Maximum likelihood classification (MLC) is a supervised learning technique widely used in processing remote sensing data. With this technique, the probability that a data point belongs to each class is determined by the class mean and covariance. More specifically, given the probability distributions of all classes, a point is assigned to the class with the highest probability, by comparing the respective class probabilities in a multi-class mixture model. Accordingly, in multispectral remotely sensed images, each pixel is classified jointly by its probability across the spectral bands. The MLC parameters used in the proposed model are listed in Table 5.
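A minimal sketch of Gaussian maximum likelihood classification: each class is modelled by the mean vector and covariance matrix of its training pixels, and a pixel goes to the class with the highest log-likelihood. Equal prior probabilities, the (NDWI, MNDWI) feature pair, and the synthetic training data are assumptions for illustration; the parameters actually used are in Table 5.

```python
import numpy as np

def fit_mlc(samples_by_class):
    """Estimate a mean vector and covariance matrix for each class.
    samples_by_class: dict mapping label -> (n_samples, n_features) array."""
    params = {}
    for label, s in samples_by_class.items():
        s = np.asarray(s, dtype=np.float64)
        mean = s.mean(axis=0)
        # atleast_2d handles the single-feature case; the small ridge keeps cov invertible
        cov = np.atleast_2d(np.cov(s, rowvar=False)) + 1e-9 * np.eye(s.shape[1])
        params[label] = (mean, cov)
    return params

def predict_mlc(params, pixels):
    """Assign each pixel (row of features) to the class with the highest
    Gaussian log-likelihood, assuming equal prior probabilities."""
    x = np.asarray(pixels, dtype=np.float64)
    labels = list(params)
    scores = []
    for label in labels:
        mean, cov = params[label]
        diff = x - mean
        _, logdet = np.linalg.slogdet(cov)
        # Squared Mahalanobis distance of every pixel to the class mean
        maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        # The shared constant -n_features/2 * log(2*pi) is dropped
        scores.append(-0.5 * (logdet + maha))
    return np.array(labels)[np.argmax(scores, axis=0)]

rng = np.random.default_rng(0)
# Hypothetical training pixels with features (NDWI, MNDWI); 1 = river, 0 = non-river
train = {
    1: rng.normal([0.4, 0.6], 0.05, size=(1000, 2)),
    0: rng.normal([-0.3, -0.4], 0.05, size=(1000, 2)),
}
model = fit_mlc(train)
print(predict_mlc(model, [[0.35, 0.55], [-0.2, -0.35]]))
```

Using the covariance, not just the mean, lets MLC account for classes whose spectral spread differs, which a plain nearest-mean rule ignores.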

SVM
Support vector machine (SVM) is a supervised ML technique specially designed to build highly generalized classifiers, which perform well on unseen data of both low and high dimensionality. When the classes can be separated in the input space, a simple linear discriminant is used; otherwise, a more flexible mapping, called a kernel function, implicitly projects the data into a higher-dimensional feature space in which they become separable. In either case, the result is one or more optimal (maximum-margin) hyperplanes that separate samples of different classes. SVM is therefore considered transparent and highly scalable with respect to the complexity of the input data. In addition, unlike some other ML techniques, SVM does not require the data to be normally distributed, nor does it require a large number of training samples. Compared with ML techniques of similar computational complexity, SVM has been reported to be more efficient and accurate in many applications. The SVM parameters used in the proposed model are listed in Table 6.
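To make the maximum-margin idea concrete, the sketch below trains a linear soft-margin SVM (no kernel) by full-batch subgradient descent on the regularized hinge loss. It is a stand-in for illustration only: the study's kernel and parameters (Table 6) are not reproduced, and in practice a library classifier such as scikit-learn's `sklearn.svm.SVC` or the ArcGIS SVM classifier would be used. The hyperparameters `lam`, `lr`, `epochs` and the synthetic data are hypothetical.

```python
import numpy as np

def train_linear_svm(x, y, lam=0.01, lr=0.5, epochs=300):
    """Soft-margin linear SVM trained by full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (x @ w + b))).
    y holds +1 for river and -1 for non-river."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (x @ w + b)
        active = margins < 1  # samples inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * x[active]).sum(axis=0) / len(x)
        grad_b = -y[active].sum() / len(x)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_svm(w, b, x):
    # The sign of the decision function gives the side of the hyperplane
    return np.where(np.asarray(x, dtype=np.float64) @ w + b >= 0, 1, -1)

rng = np.random.default_rng(0)
# Hypothetical (NDWI, MNDWI) training pixels: 1,000 river and 1,000 non-river
x = np.vstack([rng.normal([0.4, 0.6], 0.05, size=(1000, 2)),
               rng.normal([-0.3, -0.4], 0.05, size=(1000, 2))])
y = np.concatenate([np.ones(1000), -np.ones(1000)])
w, b = train_linear_svm(x, y)
print(predict_svm(w, b, [[0.35, 0.55], [-0.2, -0.35]]))
```

Only samples inside the margin contribute to each update, which mirrors how the final SVM boundary depends on the support vectors rather than on the whole class distribution.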

Accuracy assessments
In this paper, the accuracy of river classification by the above-mentioned ML techniques was assessed by comparing the extracted rivers against the manually digitized LULC data provided by the Land Development Department of Thailand. To this end, four evaluation metrics, i.e., Accuracy, Recall, Precision, and F-measure, were calculated. The technique with the highest scores was then chosen for the river classification toolbox. The rivers extracted for each annual period were then forwarded to the change detection module, described in the next sub-section. All other relevant accuracy assessment metrics are listed in Table 7.
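The four metrics can be computed from the confusion counts of a predicted river mask against a reference mask, with river as the positive class. The sketch below uses the standard definitions; the masks are hypothetical toy data, and the exact conventions behind Table 7 may differ.

```python
import numpy as np

def f_measure(precision, recall):
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def binary_metrics(predicted, reference):
    """Accuracy, precision, recall and F-measure of a predicted river mask
    against a reference (ground-truth) mask; True / 1 = river (positive class)."""
    p = np.asarray(predicted, dtype=bool)
    r = np.asarray(reference, dtype=bool)
    tp = np.count_nonzero(p & r)    # river correctly detected
    tn = np.count_nonzero(~p & ~r)  # non-river correctly rejected
    fp = np.count_nonzero(p & ~r)   # non-river labelled as river
    fn = np.count_nonzero(~p & r)   # river missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / p.size,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure(precision, recall),
    }

predicted = [[1, 1, 0], [0, 1, 0]]
reference = [[1, 0, 0], [0, 1, 1]]
print(binary_metrics(predicted, reference))
```

In this toy case there are two true positives, two true negatives, one false positive and one false negative, so all four metrics equal 2/3.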

River change detection
To detect river changes, the developed module was validated on Landsat imaging data acquired in 1999, 2006, 2011, and 2017. The changes between consecutive annual periods were determined by comparing the rivers extracted in the respective years. This comparison was made using raster operations on their differences, which were then implemented in the toolbox. The changed areas, i.e., transitions, were characterized and reported on a map as either changes from river to non-river areas or vice versa.
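The raster comparison can be sketched as a per-pixel encoding of two co-registered boolean river masks, followed by an area tally per transition class. The encoding `2 * before + after`, the class names, and the 30 m pixel size (the Landsat multispectral resolution) are assumptions for illustration; the toolbox's actual procedure is given in Table 1.

```python
import numpy as np

# Hypothetical transition codes: code = 2 * river_before + river_after
STABLE_NON_RIVER, GAINED, LOST, STABLE_RIVER = 0, 1, 2, 3
NAMES = {STABLE_NON_RIVER: "non-river (unchanged)", GAINED: "non-river to river",
         LOST: "river to non-river", STABLE_RIVER: "river (unchanged)"}

def detect_changes(river_before, river_after):
    """Per-pixel transition map between two co-registered boolean river masks."""
    before = np.asarray(river_before, dtype=bool).astype(np.uint8)
    after = np.asarray(river_after, dtype=bool).astype(np.uint8)
    return 2 * before + after

def transition_areas(codes, pixel_size_m=30.0):
    """Area (hectares) of each transition class, given square pixels."""
    counts = np.bincount(np.asarray(codes).ravel(), minlength=4)
    return {NAMES[c]: counts[c] * pixel_size_m ** 2 / 10_000 for c in NAMES}

river_t0 = [[1, 1, 0], [0, 1, 0]]
river_t1 = [[1, 0, 0], [1, 1, 0]]
codes = detect_changes(river_t0, river_t1)
print(codes)
print(transition_areas(codes))
```

Encoding both dates into one small integer keeps the whole comparison to a single raster operation, and the same codes can be mapped directly to colours on the change map.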

RESULTS AND DISCUSSION
The experimental results reported in this paper are divided into three main parts: i) toolbox development for river classification and change detection, ii) river classification, and iii) river change detection. For the first part, the toolbox was developed using ModelBuilder; screenshots are shown in Figure 3. The toolbox consists of three functions, i.e., water index calculation, river classification, and river change detection. The first function calculates the NDWI and MNDWI water indices. The river classification function provides four ML algorithms, i.e., MLC, ISODATA, SVM, and K-Means. Finally, the change detection function simply exploits built-in functionalities. This toolbox could also be extended and applied to other areas. Because the workflow is packaged as a toolbox rather than run directly in regular ArcGIS Desktop, a user is not required to perform these functions by clicking and typing step-by-step commands from start to end. Last but not least, the toolbox was straightforward to use, and its processing of the study areas was very fast.

The results of the second part, i.e., river classification based on water indices, are reported in Figure 4, where the classifications produced by the supervised and unsupervised ML techniques are compared for four annual periods. Visual assessment shows that MNDWI yielded more accurate results than NDWI with both supervised and unsupervised techniques. MNDWI was therefore chosen, and its metrics were calculated for each technique, as presented in Table 8 and Figure 5. According to these objective evaluations, SVM performed best on all four metrics (Accuracy, Recall, Precision, and F-measure), with accuracy = 96.89%, precision = 98.61%, recall = 96.59%, and F-measure = 97.59%. It was followed by MLC, ISODATA, and K-Means, respectively.
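As a quick consistency check on the reported SVM scores, the F-measure is the harmonic mean of precision (P) and recall (R); substituting the reported values reproduces the stated 97.59%:

```latex
F = \frac{2PR}{P + R}
  = \frac{2 \times 0.9861 \times 0.9659}{0.9861 + 0.9659}
  = \frac{1.9049}{1.9520}
  \approx 0.9759
```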
Therefore, this research employed MNDWI and the SVM algorithm for river classification and developed the river change detection toolbox accordingly.

…water bodies extracted in the years 1999, 2006, 2011, and 2017 AD, respectively. It can be clearly seen from the figures that river changes in all considered areas followed a similar pattern. Specifically, all the rivers and the canal underwent progressive erosion and became narrower. This was much more pronounced for the Phunphin canal than for the other areas.

In summary, the developed toolbox for detecting river changes from satellite images was found beneficial in various respects. Not only was it easy to learn and use, but it could also reduce the manpower and time required for onsite surveys. As such, it could offer government agencies sufficient information, and an efficient means, for planning, following up, and monitoring changes, so that they can prepare for upcoming geographical changes. Data obtained from the change analysis are also useful for predicting the likelihood of other relevant changes, such as flooding and landslides. Compared with previous river classification and change detection work [22], the strength of this research is that changes in river regions can be investigated without step-by-step commands in ArcGIS, and the processing time is reduced.
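The narrowing trend described above could be quantified by tabulating river area per year and the net change between consecutive periods. The sketch below assumes boolean river masks that are already co-registered and a 30 m pixel size; the masks (a channel whose width shrinks over time) are hypothetical and do not reproduce the study's results.

```python
import numpy as np

def river_area_series(masks_by_year, pixel_size_m=30.0):
    """River area (ha) for each year and net change between consecutive years."""
    years = sorted(masks_by_year)
    area = {y: np.count_nonzero(masks_by_year[y]) * pixel_size_m ** 2 / 10_000
            for y in years}
    net = {(y0, y1): area[y1] - area[y0] for y0, y1 in zip(years, years[1:])}
    return area, net

# Hypothetical masks: a channel 10 rows long whose width shrinks over time
widths = {1999: 5, 2006: 4, 2011: 4, 2017: 3}
masks = {}
for year, width in widths.items():
    m = np.zeros((10, 10), dtype=bool)
    m[:, :width] = True
    masks[year] = m
area, net = river_area_series(masks)
print(area)
print(net)
```

A negative net change between consecutive periods corresponds to a shrinking water surface, which is the pattern the maps show for the studied rivers and canal.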

CONCLUSION
This research presented river change detection using Landsat image processing, together with a toolbox developed for that purpose. Both supervised and unsupervised ML classifiers were applied to water indices calculated from Landsat 8 images. The experimental results showed that rivers could be classified with high accuracy. In the study areas, the best performing algorithm was SVM, with an accuracy of 96.89%, followed by MLC and the others. River changes could also be detected for each period from 1999 to 2017 AD. All the considered rivers and the canal had become narrower than in the past, owing to riverbank erosion. Thanks to the developed toolbox, no manual intervention was required from the user at any step of the process. It is also worth noting that, thanks to the versatility of the ArcGIS platform, the toolbox can be extended and applied to other areas, subject to data availability. Since no other assumptions were imposed on the toolbox development, satellite images of higher resolution, for example, could be employed to obtain finer classification results. The change detection proposed herein, which compares extracted water bodies from aligned data at different periods, is straightforward and efficient, and it can greatly reduce the cost of onsite surveys. Thus far, the current study remains limited to data from the NIR band. Future directions include predicting the likelihood of other related hydrological changes.