An efficient convolutional neural network-based classifier for an imbalanced oral squamous carcinoma cell dataset

ABSTRACT


INTRODUCTION
With the growing availability of the large-scale unstructured and complex data required for prediction and classification functions, extracting summarised information to support decision making has become a critical task. Data analysis tools and knowledge discovery techniques have exhibited tremendous success in several real-world applications such as recommendation systems, financial market analysis, customer review analysis and many more. Despite this history of success, some data groups fail to address predictive analytical problems.
One of the reasons behind such failures in decision making is a class-imbalanced dataset. A model trained on such data is tuned more towards the majority samples; hence, processing such skewed data often produces biased results. This has been reported in the literature [1], [2] as a crucial factor in training on imbalanced data. Most classifiers assume an equal distribution of individual class instances. Hence, when these algorithms are presented with imbalanced datasets, they lack generalization and exhibit poor performance metrics. Past studies highlight the implications of binary imbalanced datasets in biomedical applications [3]. Most often, real-time data collected in the health sector suffer from this problem. Due to the significant difference in the number of instances of individual classes, machine learning (ML) algorithms tend to exhibit inappropriate results [4]. Sometimes, the performance measures of the classifiers lead to misleading conclusions about the model behaviour. For example, consider a dataset with a 20%:80% class distribution: out of every 100 samples, 80 belong to one class (the positive class) and 20 to the other (the negative class). Even if the classification model achieves 90% accuracy, it cannot be considered good, because negative class instances are projected as positive, which inflates the false positive count of the model. Though the accuracy figure looks acceptable, this is an undesired consequence [5], [6].
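A short illustration (not from the paper) of the point above: on an 80:20 dataset, a naive classifier that always predicts the majority class reaches 80% accuracy while misclassifying every true negative, so its false positive rate is 1.0. The labels and predictor below are hypothetical.

```python
# Why accuracy misleads on imbalanced data: a majority-class predictor
# looks "accurate" yet labels every negative sample as positive.

def evaluate(y_true, y_pred):
    """Return (accuracy, false_positive_rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    accuracy = (tp + tn) / len(y_true)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr

# 80 positive and 20 negative samples (the 20%:80% split above)
y_true = [1] * 80 + [0] * 20
y_pred = [1] * 100          # naive predictor: always the majority class

acc, fpr = evaluate(y_true, y_pred)
print(acc, fpr)             # 0.8 1.0
```

The 80% accuracy hides the fact that not a single negative instance is detected, which is exactly why precision, recall and FPR are reported alongside accuracy later in the paper.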
Skewness in class samples is also very pervasive in many data mining applications, namely text classification [7], risk management, detection of oil spills in satellite radar images of ocean surfaces, medical diagnosis, detection of fraudulent calls, and spam mail recognition. Class imbalance problems are addressed by many techniques, of which two are most often reported in the literature [8]: one is to undersample the majority class instances [9], [10], and the other is to generate synthetic data from minority class tuples. In [9], the synthetic minority oversampling technique (SMOTE) is proposed, which generates new samples from existing samples of the minority class. The major contributions of the research article are as follows: i) employ oversampling to reduce the difference in class frequencies of data samples; ii) set up a model by properly setting the hyperparameters for effective binary image classification; iii) evaluate the model using performance measures like precision, recall, and area under curve; iv) apply the model to two different imbalanced medical image datasets; v) confirm the statistical significance of the classification model using McNemar's test.
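As an illustration of the oversampling idea, the sketch below generates synthetic minority samples by interpolating between pairs of existing minority samples. Note this is a simplified SMOTE-style sketch, not the actual SMOTE algorithm (which interpolates only between a sample and its k nearest neighbours); the feature vectors are made up.

```python
import random

# SMOTE-style oversampling sketch: a new minority sample lies on the
# line segment between two randomly chosen existing minority samples.

def smote_like(minority, n_new, seed=0):
    """Generate n_new synthetic points from a list of feature vectors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        b = rng.choice(minority)
        lam = rng.random()   # interpolation factor in [0, 1)
        synthetic.append([ai + lam * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic

minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0]]  # hypothetical minority class
new_points = smote_like(minority, n_new=5)
print(len(new_points))  # 5
```

Because each synthetic point is a convex combination of two real samples, it stays inside the region spanned by the minority class rather than duplicating existing tuples exactly.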
The remaining part of the paper comprises six more sections. Section 2 describes related work collected from the existing literature. The objectives of the work are stated in section 3. Section 4 deals with the basics of the convolutional neural network (CNN) and the proposed methodology. Data collection and processing are presented in section 5. Results and discussions are elaborated in section 6. At last, section 7 concludes the study with possible future scope.

RELATED WORK
For this study, different research article databases, namely Science Direct, IEEE Xplore, Springer and Web of Science, have been searched. Specifically, browsing is based on keywords like 'Classification for oral squamous cell carcinoma (OSCC) dataset', 'Data augmentation for image', and 'machine learning for imbalanced image dataset'. The current study focuses on recently published research articles based upon machine learning algorithms for imbalanced medical image datasets. Other cited documents have been referred to in order to discuss the efficiency of machine learning tools in various domains, performance measures of the classifiers, and data sampling applications for imbalanced datasets. A summary of all the referred papers that employ some form of deep learning for imbalanced datasets is framed in Table 1 [1]-[35] (see Appendix). It provides the literature summary table that includes a synopsis of all the related works considered in this study.

OBJECTIVES
The summary table of related works points out the application of several deep learning and data augmentation techniques adopted for imbalanced medical image datasets. However, the efficiency of those models is limited to 92% in terms of F1 score and 95% in terms of area under curve (AUC) respectively. The main objective of our study is to minimize the failure rate in classification for a class-imbalanced dataset. By careful hyperparameter tuning, the proposed binary classifier reduces both the false positive and false negative rates to nearly 0. In this work, a customized convolutional neural network is presented to classify OSCC images with 99% accuracy. The performance of the model is confirmed with a statistical McNemar's test. The data collected for the study suffer from a disproportionate class sample distribution, which has been overcome by data augmentation techniques. The outcome of the proposed model may assist health experts in the detection of oral squamous cell carcinoma. The proposed model exhibits promising classification results compared to the existing state-of-the-art models.

METHODOLOGY
Advancements in computing power and algorithmic efforts have led to the tremendous ability of deep learning techniques in analysing medical images [25]-[28]. These computer-assisted findings can be used by healthcare professionals as an alternative cross-verification tool for pathology tests. Deep learning [30], [31] methods have been adopted in different domains for the tasks of object detection, image segmentation, image classification and so on. In contrast to traditional machine learning algorithms, in which features are extracted computationally, a CNN helps the data analyst by extracting those features automatically. Moreover, the feature map is also reduced significantly. The standard process of a CNN is depicted in Figure 1. The second layer comprises a convolution with 64 filters. In both of these layers, the pool size has been taken as 2×2 and the activation function is the rectified linear unit (ReLU). In Algorithm 1, the procedure for predicting oral cancer is presented.
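To make the two per-layer operations named above concrete, the sketch below implements ReLU activation followed by 2×2 max pooling in NumPy, showing how pooling halves each spatial dimension of a feature map. The feature-map values are invented for illustration; this is not the paper's trained network.

```python
import numpy as np

# ReLU + 2x2 max pooling (stride 2), the per-layer operations used in
# both convolution blocks of the proposed model.

def relu(x):
    return np.maximum(x, 0)

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on a (H, W) feature map."""
    h, w = fmap.shape
    return fmap[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[ 1.0, -2.0,  3.0,  0.5],
                 [-1.0,  4.0, -3.0,  2.0],
                 [ 0.0,  1.0,  5.0, -6.0],
                 [ 2.0, -1.0, -2.0,  1.0]])

pooled = max_pool_2x2(relu(fmap))
print(pooled.shape)   # (2, 2): the 4x4 map is reduced to 2x2
print(pooled)         # [[4. 3.] [2. 5.]]
```

This shape reduction is why only two convolution-pooling blocks already shrink the feature map substantially before flattening.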

Algorithm 1: Oral cancer prediction technique using CNN
1. Augment the images /* so that the gap in the number of positive and negative image samples becomes negligible */
2. Resize the images of size (x×y×z) to (x'×y'×z), where x' < x
3. Normalize both sets of images
5. Initialize the parameters of the proposed model
6. Train the model with the training data
7. If …

Usually, amongst the many activation functions, researchers prefer ReLU because it does not perform expensive computations and, in practice, shows better convergence. After the CNN operations, the processed image pixels are flattened and fed as the input to the ANN layer. Finally, in the output layer, the sigmoid activation function is used to classify the image into either the normal class or the carcinoma class.
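The final output step described above can be sketched as follows: the sigmoid squashes the network's raw output to (0, 1), and a decision threshold maps it to one of the two classes. The 0.5 threshold and the example logits are assumptions for illustration, not values from the paper.

```python
import math

# Sigmoid output layer of a binary classifier: score > threshold
# means "carcinoma", otherwise "normal".

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def classify(logit, threshold=0.5):
    return "carcinoma" if sigmoid(logit) >= threshold else "normal"

print(classify(2.3))    # sigmoid(2.3) ~ 0.909 -> carcinoma
print(classify(-1.7))   # sigmoid(-1.7) ~ 0.154 -> normal
```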

RESULTS AND DISCUSSIONS
This section presents the results obtained from simulation and their interpretations. The data mining literature mentions different performance metrics for classifiers, such as accuracy, sensitivity (recall), specificity, precision, F1-score, the confusion matrix, receiver operating characteristic (ROC)-AUC, log-loss and so on. Mathematical formulas for some of these performance metrics are provided in (1) to (7). Some common terminologies used in classification are listed in Table 2. Depending on the problem statement, the meaning of positive is decided. For example, in the given problem, the detection of carcinoma cells is considered positive.
TP means an image originally has carcinoma and is also predicted as carcinoma. Similarly, if an image does not have carcinoma and is predicted as non-carcinoma, then it is treated as TN. On the contrary, if an image has carcinoma but is not predicted as carcinomatous, then it is counted as FN. FP means the actual image does not have carcinoma but is predicted as carcinomatous. Hence, accuracy is computed as the ratio of the total number of correct predictions to the total number of predictions.
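The metric definitions above can be computed directly from the four confusion-matrix counts. The sketch below implements the standard formulas; the counts passed in at the end are hypothetical, chosen only to exercise the functions.

```python
# Classification metrics from confusion-matrix counts (TP, TN, FP, FN).

def metrics(tp, tn, fp, fn):
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    precision   = tp / (tp + fp)
    recall      = tp / (tp + fn)    # sensitivity / TPR
    specificity = tn / (tn + fp)    # TNR
    f1          = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, specificity, f1

# Hypothetical counts for illustration
acc, prec, rec, spec, f1 = metrics(tp=90, tn=85, fp=5, fn=10)
print(round(acc, 3), round(rec, 3))   # 0.921 0.9
```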
The recall does not include information about the FP cases. It only finds the ratio of TP to the total number of actual positives; it indicates how good the model is at detecting all the TP cases. It is also referred to as sensitivity (the same as TPR). Specificity (the true negative rate, equal to 1 − FPR) is defined as the proportion of actual negatives that are predicted as negative. Another important metric is the ROC curve. Basically, it is used for inspecting the output quality of a binary classifier at different threshold settings. This curve is plotted against two parameters: TPR (shown on the Y-axis) and FPR (shown on the X-axis). In some literature, it is also suggested to take other parameters along the X-axis. An example of ROC curves is shown in Figure 6. As observed from the figure, there is a huge gap between the training accuracy and the validation accuracy, though the rate of loss on the training data is satisfactory and the accuracy is nearly 0.9. Due to the presence of data imbalance, the classification performance is poor in terms of FPR. This poor classification performance can be observed from the ROC plot depicted in Figure 9(a). The confusion matrix representing the FP, TP, FN, and TN is presented in the form of a heatmap in Figure 9(b).
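The ROC construction described above amounts to sweeping a threshold over the classifier's scores and recording (FPR, TPR) at each setting. The sketch below does exactly that; the labels, scores and thresholds are made up for illustration.

```python
# Build ROC points by sweeping a decision threshold over predicted scores.

def roc_points(y_true, scores, thresholds):
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        points.append((fp / neg, tp / pos))   # (FPR, TPR)
    return points

y_true = [1, 1, 1, 0, 0, 1, 0, 0]                       # hypothetical labels
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.55, 0.3, 0.2]      # hypothetical scores
print(roc_points(y_true, scores, thresholds=[0.0, 0.5, 1.0]))
# [(1.0, 1.0), (0.25, 1.0), (0.0, 0.0)]
```

A threshold of 0 classifies everything as positive (top-right corner of the curve), while a threshold above all scores classifies everything as negative (bottom-left corner); intermediate thresholds trace the curve whose area is the AUC.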
To overcome the model overfitting, an augmentation technique has been applied using the ImageDataGenerator class available in the Python Keras library. The class expands the datasets by transforming images through various transformation techniques. After the image augmentation, the model could classify the subjects with full accuracy, which can be validated from Figure 10. Owing to the data augmentation technique, the proposed model is trained with a balanced dataset, which removes the biased outcome of the model. Figure 10(a) demonstrates the ROC plot drawn for the augmented dataset 1 using the FPR and TPR of the proposed classifier; the AUC for this ROC is ideal compared to that of the un-augmented data. The confusion matrix for augmented dataset 1 is outlined in Figure 10(b). Similarly, for un-augmented dataset 2, the ROC and confusion matrix are portrayed in Figures 11(a) and 11(b) respectively, while the ROC and confusion matrix for augmented dataset 2 are delineated in Figures 12(a) and 12(b) respectively. Furthermore, the results are depicted through bar plots for clarity of visualization. Figure 13(a) showcases the accuracy, FPR, and FNR comparison for both the unaugmented and augmented data in dataset 1. Similarly, Figure 13(b) illustrates the accuracy, FPR, and FNR comparison for both the unaugmented and augmented data in dataset 2.
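The paper applies augmentation via Keras' ImageDataGenerator; the sketch below only illustrates the underlying idea with hand-rolled NumPy transforms (flips and rotations), generating several new samples per original image. It is an illustrative stand-in, not the pipeline actually used.

```python
import numpy as np

# Minimal image augmentation sketch: each input image yields several
# geometrically transformed copies, expanding the minority class.

def augment(image):
    """Return transformed copies of a (H, W) image array."""
    return [np.fliplr(image),        # horizontal flip
            np.flipud(image),        # vertical flip
            np.rot90(image),         # 90-degree rotation
            np.rot90(image, 2)]      # 180-degree rotation

img = np.arange(16, dtype=float).reshape(4, 4)   # stand-in for a cell image
augmented = augment(img)
print(len(augmented))   # 4 new samples per original image
```

Because each transform preserves the image content while changing its geometry, the class label carries over unchanged, which is what lets augmentation balance the dataset without fabricating pathology.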
It can be observed from both plots that even if there is a thin difference between the corresponding accuracies, a remarkable gap is present in the FPR and FNR. In the healthcare decision-making process, false predictions are very hazardous. The statistical significance of the proposed model is confirmed by McNemar's test. This test is applied on a 2×2 contingency table; in this article, the contingency table is the confusion matrix, which stores the discordant pairs. In the health domain, the rate of false predictions is as decisive as that of true predictions. A model may not be considered worthy even if it gives 90% correct predictions, because a significant number of false predictions could be fatal. McNemar's test is applied to determine the probability of a difference between the false positive and false negative predictions. The chi-square distribution is used to compute whether the row and column marginal frequencies are equal for paired samples. For the test, the null (H0) and alternate (H1) hypotheses are defined as follows. H0: there is no significant difference between the marginal proportions of the discordant pairs. H1: there is a significant difference between the marginal proportions of the discordant pairs. McNemar's test statistic is computed as χ² = (b − c)² / (b + c), where b and c are the discordant pairs from the confusion matrix. The degrees of freedom are computed as (2−1)×(2−1) = 1. The study considers a 5% significance level for the test after reference to the literature. The p-value for the test is obtained as 0.012, which is less than 0.05. Hence, the null hypothesis is rejected. The inference from the statistical test is that there is a significant difference between the marginal proportions of the discordant pairs.
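The McNemar computation above can be sketched in a few lines. For a chi-square statistic x with 1 degree of freedom, the p-value is erfc(sqrt(x/2)), so no external statistics package is needed. The discordant counts b = 12 and c = 3 below are hypothetical, not the paper's actual confusion-matrix entries.

```python
import math

# McNemar's test on the discordant pairs b (e.g. FP) and c (e.g. FN).

def mcnemar(b, c):
    """Return (chi-square statistic, p-value) for 1 degree of freedom."""
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # survival function of chi2(1)
    return chi2, p_value

chi2, p = mcnemar(b=12, c=3)            # hypothetical discordant counts
print(round(chi2, 2), p < 0.05)         # 5.4 True: significant at the 5% level
```

Note that some texts apply a continuity correction, (|b − c| − 1)² / (b + c), which is slightly more conservative for small counts.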

CONCLUSION
The proposed classifier implements a convolutional neural network to categorise medical images into one of two classes: diseased and normal. The output of the classifier extends an additional affirmation regarding the presence of carcinoma cells in the oral cavity. The datasets containing the images are skewed because the number of samples of diseased subjects is a multiple of that of normal subjects. Hence, the performance of the classifier was impoverished, which is demonstrated through the ROC and confusion matrix plots. To bring diversity and quality into the data, different transformations such as rotation, scaling, shifting and flipping are applied to oversample the minority class instances. After application of the transformations, the modified datasets are again applied to the model for training, which markedly enhances the classification performance. Moreover, the suggested CNN model is less complex and more time efficient, as only two layers of convolution and pooling have been employed before flattening the input. The model also eliminates the need for cloud services delivered as software as a service. The study has not performed any experiment on the augmentation technique for multiclass/non-binary problems; this objective may be explored in a future extension of the current research work.

Figure 5 .
Figure 5. Samples of augmented images

The F1-score is the harmonic mean of precision and recall. The confusion matrix is a two-dimensional array in which the cells indicate the TP, TN, FP, and FN cases. It helps in estimating in what way the model is correct or wrong. The confusion matrix for the binary class problem is depicted in Table 3.

Table 2.
List of acronyms used for classification performance metrics

However, accuracy is not an appropriate performance indicator for a model trained with imbalanced data. Precision is another metric, which determines, out of all predicted positive cases, how many are actually positive. It is useful in problems where FP cases are to be reduced.

Table 3 .
Confusion matrix for binary class problem

Table 4 .
Description of dataset

Table 1 .
Summary table of related work

Table 1 .
Summary table of related work (continued)