Comparison between fuzzy kernel k-medoids using radial basis function kernel and polynomial kernel function in hepatitis classification

Received Feb 20, 2020; Revised Dec 15, 2020; Accepted Jan 21, 2021

This paper compares fuzzy kernel k-medoids using the radial basis function (RBF) and the polynomial kernel function in hepatitis classification. These two kernel functions were chosen because of their popularity in kernel-based machine learning methods for classification tasks. A hepatitis dataset was then used to evaluate the performance of both methods, which are expected to provide an accurate diagnosis so that patients can obtain treatment at an early phase. The data were obtained from two hospitals in Indonesia and consist of 89 hepatitis-B and 31 hepatitis-C samples. The data were analyzed using several cases of k-fold cross-validation, and the performances were compared according to accuracy, sensitivity, precision, F1-Score, and running time. From the experiments, it was concluded that fuzzy kernel k-medoids using the RBF kernel function is better than using the polynomial kernel function, with a 6% increase in accuracy, a 13% increase in sensitivity, and a 5% improvement in F1-Score. On the other hand, the precision of fuzzy kernel k-medoids using the polynomial kernel function is 2% higher than using the RBF kernel function. According to these results, the choice between the RBF and the polynomial kernel function in fuzzy kernel k-medoids can be made according to the primary goal of the classification.


INTRODUCTION
Hepatitis is a severe health problem and one of the leading causes of death across the globe. According to the global hepatitis report 2017 [1], approximately 257 million people were living with hepatitis B and 71 million with hepatitis C in 2015. In Indonesia, the prevalence of clinical hepatitis was estimated at 0.6% in 2007 [2]. These kinds of viral hepatitis tend to become chronic, thereby causing more deaths. The prevention of viral hepatitis, as stated by Hou et al. [3], consists of behavior modification, passive immunoprophylaxis, and active immunization. Early prevention of viral hepatitis is also supported by various machine learning techniques, which are expected to help patients receive treatment in an earlier phase of the infection, thereby stopping the virus from being amplified [4].
Some researchers have published work on the use of machine learning in hepatitis classification [4][5][6][7]. In this paper, fuzzy kernel k-medoids is used to develop a hepatitis classifier that provides a more accurate diagnosis. The kernel technique, which was introduced by Vapnik [8] and later developed by Scholkopf et al. [9] and Cristianini [10], is used in fuzzy kernel k-medoids to handle the possibility that the dataset is not linearly separable. Fuzzy kernel k-medoids has previously been applied to anomaly detection [11] and to multiple datasets such as Breast Cancer Wisconsin, diabetes, image segmentation, iris, and others [12]. Furthermore, kernel-based machine learning methods have previously been used in diagnosing several diseases and delivered excellent accuracy [13][14][15][16][17]. The kernel function helps to avoid misclassification when the data have a spherical shape, which a purely linear function cannot separate.

ISSN: 2252-8938 Comparison between fuzzy kernel k-medoids using radial basis… (Glori Stephani Saragih) 61

RESEARCH METHOD
2.1. Material
The hepatitis dataset, which was also used by Kurniawan and Rustam [18], was obtained from Tangerang and Mitra Keluarga Kelapa Gading Hospitals, consisting of 89 hepatitis B and 31 hepatitis C samples. Each sample is described by features such as gender, serum glutamic oxaloacetic transaminase (SGOT), serum glutamic pyruvic transaminase (SGPT), anti-HCV, HBsAg, urea, and creatinine. All of these features will be used in the process of classification.
The membership value is updated using the formula in (2), and the medoid is calculated using the formula in (3).
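The two alternating updates can be illustrated with a minimal Python sketch. It does not reproduce the formulas in (2) and (3); it assumes a fuzzy c-means-style membership update and a medoid update that picks the data point with the smallest membership-weighted distance sum, both computed from kernel-induced distances. The function names, the `init_medoids` argument, and the fuzzifier value m = 2 are hypothetical choices made for illustration.

```python
import numpy as np

def kernel_sq_dist(K):
    """Pairwise squared distances in feature space, from a Gram matrix K."""
    diag = np.diag(K)
    # ||phi(a) - phi(b)||^2 = K(a,a) - 2 K(a,b) + K(b,b); clip round-off negatives
    return np.maximum(diag[:, None] - 2.0 * K + diag[None, :], 0.0)

def memberships(D, medoids, m, eps=1e-12):
    """Assumed FCM-style update: a closer medoid gets a larger membership."""
    inv = (D[:, medoids] + eps) ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)  # each row sums to 1

def fuzzy_kernel_k_medoids(K, init_medoids, m=2.0, max_iter=100):
    """Alternate membership and medoid updates until the medoids stop changing.

    K            : (n, n) Gram matrix of the training points
    init_medoids : starting medoid indices, one per cluster (hypothetical argument)
    m            : fuzzifier (> 1), assumed to be 2 here
    """
    D = kernel_sq_dist(np.asarray(K, dtype=float))
    medoids = np.asarray(init_medoids, dtype=int)
    for _ in range(max_iter):
        U = memberships(D, medoids, m)
        # Assumed medoid update: for each cluster, choose the data point q that
        # minimises sum_i U[i, j]^m * D[i, q]
        cost = (U ** m).T @ D  # shape (clusters, n)
        new_medoids = cost.argmin(axis=1)
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, memberships(D, medoids, m)

# Toy data: two well-separated groups, linear kernel used only for illustration
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
medoids, U = fuzzy_kernel_k_medoids(X @ X.T, init_medoids=[0, 3])
print(medoids, U.argmax(axis=1))
```

Because the routine only needs a Gram matrix, the same code can be run with an RBF or a polynomial kernel by changing how `K` is built. Nothing in this sketch prevents two clusters from selecting the same medoid in degenerate cases.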
The algorithm of fuzzy kernel k-medoids [11] is given in Figure 1. This method uses the RBF and polynomial kernel functions. The RBF kernel is widely used because of its simplicity: it has fewer hyperparameters, and the number of hyperparameters in a kernel usually influences the complexity of model selection [21]. The polynomial kernel is also commonly used, mainly with lower polynomial degrees, because a polynomial of infinite degree takes the same form as the Gaussian RBF kernel [22]; the polynomial kernel also has more hyperparameters than the RBF kernel. The formulas [23] for the RBF kernel function and the polynomial kernel function are shown in (4) and (5), respectively.
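As an illustration of the two kernel families, the following Python sketch uses common textbook parameterizations, which may differ from the exact forms in (4) and (5): an RBF kernel exp(-||x - y||^2 / (2σ^2)) with a single hyperparameter σ, and a polynomial kernel (x·y + c)^d with a degree d and an offset c. The function names and the default offset `coef0 = 1` are assumptions.

```python
import numpy as np

def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel, assumed form exp(-||x - y||^2 / (2 * sigma^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

def polynomial_kernel(x, y, degree, coef0=1.0):
    """Polynomial kernel, assumed form (x . y + coef0) ** degree."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float((np.dot(x, y) + coef0) ** degree)

print(rbf_kernel([0, 0], [1, 0], sigma=1.0))       # exp(-0.5)
print(polynomial_kernel([1, 2], [3, 4], degree=2))  # (11 + 1)^2 = 144.0
```

The signatures make the difference in hyperparameter count visible: one value (`sigma`) for the RBF kernel against two (`degree`, `coef0`) for the polynomial kernel. Other RBF parameterizations exist, for example exp(-γ||x - y||^2), so the meaning of a reported kernel parameter value depends on which form is used.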

Research methodology
The k-fold cross-validation [24] is used in this paper to evaluate the fuzzy kernel k-medoids algorithm. For example, with 3-fold cross-validation, the data of each class are divided into three folds, which gives the number of points in every fold shown in Table 1. Using k-fold cross-validation with fuzzy kernel k-medoids for classification may be unfamiliar, because the method is commonly used for clustering, that is, as an unsupervised learning [25] method. In this fuzzy kernel k-medoids, one fold is used to obtain the centroids of the clusters according to the algorithm in Figure 1, whereas the remaining k−1 folds are used to evaluate the method by assigning every data point to the class of its nearest centroid. Let the data labeled hepatitis B belong to class 1 and the data labeled hepatitis C belong to class 2. If a data point is nearer to the centroid of class 1, its predicted class is hepatitis B; if it is nearer to the centroid of class 2, its predicted class is hepatitis C.
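The per-class fold split and the nearest-centroid rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the splitting rule (`np.array_split` on shuffled per-class indices), the function names, and the kernel-induced distance K(x,x) − 2K(x,v) + K(v,v) are choices made here, so the resulting fold sizes need not match Table 1.

```python
import numpy as np

def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds, class by class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    folds = [[] for _ in range(k)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        # array_split allows class sizes that are not divisible by k
        for fold, part in zip(folds, np.array_split(idx, k)):
            fold.extend(part.tolist())
    return [np.array(sorted(f)) for f in folds]

def predict_nearest(x, centroids, kernel):
    """Index of the centroid nearest to x in kernel feature space."""
    d2 = [kernel(x, x) - 2.0 * kernel(x, v) + kernel(v, v) for v in centroids]
    return int(np.argmin(d2))

# 89 hepatitis B samples (class 1) and 31 hepatitis C samples (class 2)
labels = np.array([1] * 89 + [2] * 31)
folds = stratified_folds(labels, k=3)
for i, fit_fold in enumerate(folds):
    # As described in the text: one fold yields the centroids,
    # the remaining k-1 folds are used for evaluation
    eval_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    n_b = int((labels[fit_fold] == 1).sum())
    n_c = int((labels[fit_fold] == 2).sum())
    print(f"fold {i + 1}: {n_b} B + {n_c} C for centroids, {len(eval_idx)} for evaluation")
```

With this particular splitting rule, the 89 hepatitis B samples fall into folds of 30, 30, and 29 points and the 31 hepatitis C samples into folds of 11, 10, and 10 points. `predict_nearest` returns 0 for the class 1 centroid and 1 for the class 2 centroid when the centroids are passed in that order.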

RESULTS AND ANALYSIS
The performance of fuzzy kernel k-medoids is evaluated using k-fold cross-validation with k = 3, 5, 7, 10. Both the RBF and the polynomial kernel function are examined, with several kernel parameters and polynomial degrees. The performance of fuzzy kernel k-medoids using the RBF kernel function is shown in Table 2.
According to Table 2, the kernel parameter σ = 0.0001 performs excellently on every performance measure in each cross-validation setting. The highest accuracy, sensitivity, precision, and F1-Score for this kernel parameter are obtained when 7-fold cross-validation is used. The performance of fuzzy kernel k-medoids using the polynomial kernel function is shown in Table 3.

Table 3 shows that the tenth polynomial degree achieves the best performance in almost every cross-validation setting. The results are more complicated in the 7-fold cross-validation because the highest value of each performance measure is obtained at a different polynomial degree. However, further analysis of the values across the measures shows that the fourth polynomial degree gives the best performance. Therefore, fuzzy kernel k-medoids using the RBF kernel function with σ=0.0001 and the fourth-degree polynomial kernel function are compared, as shown in Figure 2. Comparing the highest values in Tables 2-3, we can conclude that fuzzy kernel k-medoids using the RBF kernel function is better than using the polynomial kernel function, with a 6% increase in accuracy, a 13% increase in sensitivity, and a 5% improvement in F1-Score. On the other hand, the precision of fuzzy kernel k-medoids using the polynomial kernel function is 2% higher than using the RBF kernel function. Based on Figure 2, it is concluded that fuzzy kernel k-medoids performs better with the RBF than with the polynomial kernel function: the RBF kernel is better in accuracy, sensitivity, and F1-Score, whereas the polynomial kernel is better only in precision. The RBF kernel function is also better in running time. As shown in Table 4, fuzzy kernel k-medoids using the RBF kernel function runs faster than with the polynomial kernel function in every evaluation method.
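For reference, the four measures compared above can be computed from true and predicted labels with their standard definitions, as in the following Python sketch. The function name is hypothetical, and treating hepatitis B (class 1) as the positive class is an assumption, since the text does not state which class is taken as positive.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall), precision, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a denominator is empty
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }

# Toy labels: 1 = hepatitis B (assumed positive), 2 = hepatitis C
print(classification_metrics([1, 1, 1, 2, 2], [1, 1, 2, 1, 2]))
```

Sensitivity penalizes missed positive cases and precision penalizes false alarms, which is why one kernel can lead on sensitivity while the other leads on precision.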

CONCLUSION
Early detection of hepatitis is expected to help patients obtain proper treatment, given that this disease is one of the major causes of death worldwide. There are several types of hepatitis; however, the most commonly found cases are hepatitis B and hepatitis C. Therefore, this paper proposed the use of fuzzy kernel k-medoids with the RBF and polynomial kernel functions for hepatitis classification. Data were obtained from two hospitals in Indonesia and consist of 89 hepatitis-B and 31 hepatitis-C samples. According to the experiments, it is concluded that the RBF kernel with σ=0.0001 delivers better performance than the fourth-degree polynomial kernel function in fuzzy kernel k-medoids. The comparison shows that the RBF kernel improves the performance of fuzzy kernel k-medoids in accuracy, sensitivity, and F1-Score, whereas the polynomial kernel makes fuzzy kernel k-medoids better in precision. Even though the proposed method already delivered excellent performance, other methods combined with techniques for balancing the data could be explored in future work to obtain a better, more accurate, and more precise diagnosis.