Machine learning classifiers for detection of glaucoma

ABSTRACT


INTRODUCTION
Glaucoma is one of the major causes of blindness. It damages the optic nerve due to abnormally high pressure in the eye; glaucoma may occur at any age but generally affects older adults. It must be treated at an early stage because lost vision cannot be regained. An essential pathological characteristic in the progression of glaucoma is the increase in the size of the optic cup with respect to the optic disc. The optic disc, or optic nerve head, is the region in the retina where the optic nerve fibers, i.e. the ganglion cell axons, aggregate to form the optic nerve, which is connected to the brain. Within this optic disc is a depression known as the optic cup.
During glaucomatous progression, the death of retinal ganglion cells leads to increased excavation of the optic cup and a corresponding increase in the cup-to-disc ratio (CDR). The CDR is thus a vital indicator of glaucomatous neuropathy, as it reflects actual pathological changes in the retina during glaucoma. The most common way to detect glaucoma is by computing the CDR, and optic disc detection is therefore the first step in developing automated diagnosis systems.
A normal eye has an average CDR of 0.3 to 0.5, and the CDR indicates the stage of glaucoma in an eye. The proposed system uses artificial intelligence to detect the presence of glaucoma in a human eye through analysis of the CDR. The first part involves building machine learning models using a support vector machine (SVM) and a K-means classifier, and the second part involves building a convolutional neural network for the same task. Machine learning models and convolutional neural networks at the backend of the designed web application make glaucoma detection simpler [1]-[3] than traditional methods that rely on manual inspection. The proposed autonomous system forms a complete decision support system for early glaucoma detection. Building on [4]-[6], this work provides the analysis, design, implementation, and testing of novel algorithms for a complete decision support system for glaucoma detection with powerful feature selection strategies. We have also performed extensive experiments to evaluate the performance of the proposed glaucoma detection system on real datasets, compared it with existing state-of-the-art techniques, and provided a screening process that is easy for everyone to access.

METHODOLOGY
The overall methodology implemented is shown in Figure 1. The fundus image obtained from optical coherence tomography (OCT) [7] is split into red, green and blue (RGB) channels. A contrast limited adaptive histogram equalization (CLAHE) [8] filter is applied to increase the contrast adaptively, which makes cup and disc extraction easier. The mean and standard deviation (SD) of the pixel values are obtained. The red channel is used for extraction of the disc and the green channel for extraction of the cup; the threshold values are determined from the mean and SD of the pixels, and histogram analysis is carried out to perform thresholding. Repeated morphological opening and closing operations are then performed to reduce noise in the image. A convex hull is adopted to join the contours, which is found to be more effective. The contours of the resulting binary image are detected, and an ellipse is fitted around the largest contour. A bounding rectangle is drawn around the ellipse to obtain the major axis of the ellipse, which gives the disc diameter. A similar procedure is followed to obtain the diameter of the cup. The CDR is obtained by dividing the length of the cup axis by that of the disc axis.
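As a rough illustration of this pipeline, the sketch below (assuming OpenCV and NumPy, with the cup and disc segmentation steps stubbed out as hypothetical helpers segment_disc and segment_cup that are detailed in the following subsections) shows how the channel split, contrast enhancement and final CDR computation fit together:

```python
import cv2
import numpy as np

def compute_cdr(fundus_bgr, segment_disc, segment_cup):
    """Sketch of the overall pipeline: split channels, enhance contrast,
    segment disc and cup, and return the cup-to-disc ratio (CDR)."""
    # OpenCV loads images in BGR order; the red channel is used for the
    # disc and the green channel for the cup.
    blue, green, red = cv2.split(fundus_bgr)

    # Adaptive contrast enhancement (CLAHE) on each working channel.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    red_eq, green_eq = clahe.apply(red), clahe.apply(green)

    # Hypothetical helpers: each returns the major-axis length (in pixels)
    # of the ellipse fitted to the segmented region.
    disc_diameter = segment_disc(red_eq)
    cup_diameter = segment_cup(green_eq)

    return cup_diameter / disc_diameter

# Illustrative usage (file name assumed):
# image = cv2.imread("drishtiGS_033.png")
# cdr = compute_cdr(image, segment_disc, segment_cup)
```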

Non-adaptive histogram equalization
Histogram equalization normally improves the contrast of the image. Another important property is that, even if the input image is dark, after equalization we obtain almost the same result. As a consequence, it is used as a "reference tool" to bring all images to the same lighting condition.
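A minimal example of this global (non-adaptive) equalization with OpenCV, assuming an 8-bit grayscale input and an illustrative file name:

```python
import cv2

# Load a fundus image as 8-bit grayscale (file name is illustrative).
gray = cv2.imread("fundus.png", cv2.IMREAD_GRAYSCALE)

# Global histogram equalization: the same transform is applied to the
# whole image, so dark and bright versions of the same scene end up
# with a similar overall appearance.
equalized = cv2.equalizeHist(gray)
```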

Contrast limited adaptive histogram equalization
Histogram equalization [9] considers the global contrast of the image, which in many cases is not efficient: because the histogram is not confined to a particular region, over-brightening causes most of the information to be lost. To overcome this issue, adaptive histogram equalization is used. In this method the image is divided into small blocks called "tiles" and each block is histogram equalized, so the histogram is confined to a small region (unless there is noise). If noise is present, it will be amplified. To avoid this, contrast limiting is applied: if any histogram bin exceeds the specified contrast limit, those pixels are clipped and distributed uniformly to the other bins before equalization. After equalization, bilinear interpolation is applied to remove artifacts at the tile borders.
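A small sketch of CLAHE with OpenCV is shown below; the clip limit and tile size used here are common defaults assumed for illustration, not values specified in the text:

```python
import cv2

# Pre-processed green channel (file name is illustrative).
green = cv2.imread("fundus_green.png", cv2.IMREAD_GRAYSCALE)

# Contrast limited adaptive histogram equalization: the image is split
# into 8x8 tiles, each tile is equalized separately, and histogram bins
# above the clip limit are redistributed to limit noise amplification.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
green_clahe = clahe.apply(green)
```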

a. Thresholding
From a grayscale image, thresholding can be used to create binary images. A threshold value of pixel intensity is chosen, and when thresholding [10], [11] is applied, all pixels with values greater than this threshold are turned white and the remaining pixels are blacked out. The OCT fundus image drishtiGS_033 is depicted in Figure 2. To make the process adaptive to different images, the threshold values were arrived at by multiple trials: the thresholds T1 and T2, depicted in Figure 3 and Figure 4 respectively, are computed by (1) and (2), where T1 is the threshold for segmentation of the optic disc, T2 is the threshold for segmentation of the optic cup, m is the size of the Gaussian window, σG is the standard deviation of the Gaussian window, σRI is the standard deviation of the pre-processed red channel, σGI is the standard deviation of the pre-processed green channel, and μGI is the mean of the pre-processed green channel.

b. Morphological transformation
Morphological transformation is normally performed on binary images. It needs two inputs: the original image and a structuring element or kernel, which decides the nature of the operation. The two basic morphological operators are erosion and dilation; their variant forms, such as opening, closing and gradient, can also be considered. Opening is used to remove noise, and closing is useful for closing small holes inside foreground objects, as depicted in Figure 5.
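As a rough sketch of these two steps with OpenCV, assuming a pre-processed 8-bit channel as input; the stand-in threshold below is computed from the channel mean and standard deviation purely for illustration, since the exact expressions (1) and (2) for T1 and T2 are not reproduced here:

```python
import cv2
import numpy as np

def binarize_and_clean(channel, k=1.0, kernel_size=5, iterations=3):
    """Threshold a pre-processed channel and clean up the binary mask
    with repeated morphological opening and closing."""
    # Stand-in threshold from channel statistics; the actual thresholds
    # T1 and T2 come from equations (1) and (2) in the text.
    threshold = float(channel.mean()) + k * float(channel.std())
    _, binary = cv2.threshold(channel, threshold, 255, cv2.THRESH_BINARY)

    # Elliptical structuring element used for opening (noise removal)
    # and closing (filling small holes inside the foreground).
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                       (kernel_size, kernel_size))
    binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel,
                              iterations=iterations)
    binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel,
                              iterations=iterations)
    return binary
```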
In this methodology a convex hull based on the Sklansky algorithm is implemented [12]-[14]. Given a set of points in the plane, the convex hull of the set is the smallest convex polygon that contains all of the points. The convex hull of a binary image is the set of pixels included in the smallest convex polygon that surrounds all white pixels in the input; OpenCV provides a convex hull implementation based on the Sklansky algorithm. The convex hull is capable of joining nearby contours, which may have been broken apart by unwanted noise or disturbance, into a single contour blob; this approach not only joins the contours successfully but, unlike other morphological operations [15], does not alter the size of the original blob, as depicted in Figure 6 and Figure 7. Instead of the morphological operations [16], one of the methods adopted to improve the image processing mechanism is the convex hull approach [17] on the thresholded image. The idea is to join nearby contours to form a blob of the size of the optic cup; the methodology adopted is as follows (see the sketch after this list):
− Define a function to measure the distance between the two contours passed as its arguments.
− Define a function with nested loops that compares the distances between each pair of contours.
− If the distance is less than the defined threshold, a convex hull is used to join the contours; otherwise, the smaller contour is ignored.
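A minimal sketch of this contour-merging idea, assuming OpenCV contours and a hypothetical gap threshold max_gap in pixels; the distance measure used here (the minimum point-to-point distance between two contours) is one reasonable choice and not necessarily the exact definition used in the implementation:

```python
import cv2
import numpy as np

def contour_distance(c1, c2):
    """Minimum Euclidean distance between any point of c1 and any point of c2."""
    p1 = c1.reshape(-1, 2).astype(np.float32)
    p2 = c2.reshape(-1, 2).astype(np.float32)
    # Pairwise distances between all boundary points of the two contours.
    diffs = p1[:, None, :] - p2[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(axis=2)).min())

def merge_nearby_contours(contours, max_gap=20.0):
    """Join contours closer than max_gap into one blob via a convex hull."""
    contours = sorted(contours, key=cv2.contourArea, reverse=True)
    if not contours:
        return None
    merged_points = contours[0]
    for c in contours[1:]:
        if contour_distance(merged_points, c) < max_gap:
            merged_points = np.vstack([merged_points, c])
        # Otherwise the smaller, distant contour is ignored.
    return cv2.convexHull(merged_points)
```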

c. Contour detection
A contour is a curve joining all the continuous points along a boundary that share the same color or intensity, as depicted in Figure 8. Contours are a useful tool for shape analysis and for object detection and recognition. In OpenCV, finding contours is like finding white objects on a black background. The cv.findContours() function takes three arguments: the source image, the contour retrieval mode, and the contour approximation method; it outputs the contours and their hierarchy. Contours [18] are returned as a Python list of all the contours in the image, where each individual contour is a NumPy array of the (x, y) coordinates of the boundary points of the object.
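For instance, selecting the largest contour of the binary cup or disc mask could be written as follows (a sketch assuming OpenCV 4.x, where findContours returns two values, and a hypothetical mask file name):

```python
import cv2

# Binary mask produced by the thresholding/morphology step
# (file name is illustrative).
binary = cv2.imread("cup_mask.png", cv2.IMREAD_GRAYSCALE)

# OpenCV 4.x returns (contours, hierarchy); each contour is a NumPy
# array of (x, y) boundary points.
contours, hierarchy = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)

# The cup or disc region is taken as the largest detected contour.
largest = max(contours, key=cv2.contourArea)
```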

d. Ellipse fitting and bounding rectangle
A bounding rectangle is drawn around the contour and rotated so that its area is minimum; an ellipse is then drawn inside the rotated bounding rectangle, touching the midpoints of each side, as depicted in Figure 9. The major and minor axes can be obtained as the two sides of the rotated rectangle.
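In OpenCV this corresponds to fitting a minimum-area rotated rectangle or an ellipse to the detected contour; a brief sketch of how the diameter could be extracted is given below (the helper name contour_diameter is illustrative):

```python
import cv2

def contour_diameter(contour):
    """Return the major-axis length of the ellipse fitted to a contour."""
    # Rotated rectangle of minimum area around the contour; an ellipse
    # drawn inside it touches the midpoints of each side, so the two
    # side lengths are the ellipse's major and minor axes.
    (cx, cy), (w, h), angle = cv2.minAreaRect(contour)

    # Direct ellipse fit (needs at least five contour points); the axes
    # are returned as the side lengths of the bounding rotated rectangle.
    if len(contour) >= 5:
        (_, _), axes, _ = cv2.fitEllipse(contour)
        return max(axes)
    return max(w, h)
```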

Design of the classifier
After processing the data, the ground truth of the images is stored in a CSV file. The CDR values of the training set are determined and stored along with their labels in a CSV file. The SVM [19] and K-means clustering [14], [20], [21] classifiers are trained with the training data. The test data are passed through the classifier to predict the label of each test sample from its calculated CDR value. The errors in the calculation of the CDR are determined by subtracting the obtained values from the ground truth values provided by the dataset. The web application architecture developed is depicted in Figure 10. Users can access the front end of the web application with any browser, such as Chrome, Mozilla Firefox, Opera or Safari.
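A compact sketch of this classifier stage using scikit-learn is given below, assuming the CDR values and glaucoma labels have already been written to CSV files; the file names and column names are illustrative rather than those of the actual project:

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score

# Hypothetical CSV layout: one CDR value and one label per image
# (1 = glaucoma, 0 = normal).
train = pd.read_csv("train_cdr.csv")
test = pd.read_csv("test_cdr.csv")
X_train, y_train = train[["cdr"]].values, train["label"].values
X_test, y_test = test[["cdr"]].values, test["label"].values

# Supervised SVM classifier on the one-dimensional CDR feature.
svm = SVC(kernel="rbf")
svm.fit(X_train, y_train)
print("SVM accuracy:", accuracy_score(y_test, svm.predict(X_test)))

# Unsupervised K-means with K=2 (glaucoma vs non-glaucoma clusters).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)

# Map each cluster id to the majority training label in that cluster,
# since K-means does not know which cluster corresponds to glaucoma.
cluster_to_label = {c: int(np.round(y_train[kmeans.labels_ == c].mean()))
                    for c in (0, 1)}
km_pred = np.array([cluster_to_label[c] for c in kmeans.predict(X_test)])
print("K-means accuracy:", accuracy_score(y_test, km_pred))
```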

Cup detection
The adopted convex hull approach proved to perform better than the less effective morphological opening and closing [22], [23]. For instance, consider the image drishtiGS_6, depicted in Figure 11, which has an experimental CDR value of 0.78, as shown in Figures 11(a) to 11(c). Morphological opening generally dilates the image beyond the cup boundary while joining the contours, which results in a larger cup-to-disc ratio than the original; the computation of the CDR value is depicted in Figure 12, and the convex hull, as per Figures 12(a) to 12(c), yields a lower CDR value. Consider another image, drishtiGS_70, represented in Figure 13, where the original CDR is 0.82. As there was a larger distance between the separated contours, the morphological erosion had a greater impact than the dilation, as depicted in Figures 13(a) to 13(c). Figure 14 shows the comparison of the CDR values when these small contours were removed by repeated erosion; this resulted in a smaller CDR [24]-[26] than expected, as depicted in Figures 14(a) to 14(c).

Disc detection
On observing an image of an eye, as represented in Figure 15, it can be noticed that the diameter of the ellipse fitted onto the disc contour remains unchanged; only the noise inside it is cleared, which does not really contribute to the CDR, as per Figures 15(a) to 15(c). Thus, keeping time constraints in mind, the convex hull can be skipped for the disc: it is a robust but time-consuming method, with a time complexity of O(n²), where n is the number of contours in the thresholded image.
Figure 15. Image of an eye: (a) region of interest, (b) disc with repeated morphology, and (c) disc with convex hull

Calculation of CDR, mean errors and standard deviation
The CDR values calculated by the model are compared with the ground truth values provided by the dataset. The dataset provides CDR values obtained from four different experts. The difference between the obtained CDR value and each of those expert values is taken, and then the average and standard deviation of the errors over all the images are calculated. The images are shuffled and split into training and test sets, and the CDR for both is calculated and compared with the four expert values provided by the Drishti dataset; from these differences the mean errors and standard deviation are obtained. The CDR values of the training and test images were calculated and stored in two separate CSV files. The CDR values of the training set, along with the labels in the ground truth, were used to train the model. The CDR values of the test images were passed through the model and the predicted labels were matched against the ground truth labels. The SVM classifier [27]-[29] was used first, and an accuracy of 0.8539 was obtained.
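As an illustration of this error computation, assuming the model's CDR predictions and the four expert CDR values are available as arrays (the numeric values below are placeholders, not results from the dataset):

```python
import numpy as np

# predicted_cdr: model CDR per image, shape (n_images,)
# expert_cdr: expert CDR values, shape (n_images, 4) -- placeholder data
predicted_cdr = np.array([0.42, 0.78, 0.55])
expert_cdr = np.array([[0.40, 0.45, 0.41, 0.43],
                       [0.80, 0.76, 0.79, 0.82],
                       [0.50, 0.57, 0.53, 0.55]])

# Difference between the computed CDR and each expert value, then the
# mean error and standard deviation over all images and experts.
errors = predicted_cdr[:, None] - expert_cdr
print(f"mean error = {errors.mean():.4f}, SD = {errors.std():.4f}")
```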
Then the same pair of CDR values of the training set and their labels were passed through the K-means [24]-[26] classifier with K=2, i.e. two clusters representing the glaucoma and non-glaucoma sets. When the CDR values of the test data were passed through it, an accuracy of 0.7077 was obtained, as depicted in Table 1. The model thus proved to perform better when it was supervised; the Drishti dataset provides complete details for each of the 100 images, which serves supervised learning well. The above accuracies are obtained for a two-dimensional classifier. The web application developed depends on logical rules based on the CDR values obtained after processing the images. After entering the local address in the address bar of a web browser, the homepage is displayed as depicted in Figure 16, and the user uploads the OCT image obtained from the hospital through tomography. The website prompts the user to enter patient details; upon submission, the report is displayed as represented in Figure 17, with the OCT features and a report summary stating whether the patient has glaucoma.

CONCLUSION AND FUTURE SCOPE
The implemented methodology compares the performance of SVM against a K-means clustering algorithm using the CDR obtained by the individual methods. SVM proved to provide better accuracy than K-means clustering. Furthermore, the results imply a smaller error range for the SVM method, thus ensuring better consistency in CDR determination. The convex hull approach was used to diagnose from the fundus image and classify it accurately; thus, the severity of glaucoma is identified and detection is done at a very early stage. An alternative inexpensive and user-friendly web application was designed and developed for the screening process. In spite of being a cheap and effective way to detect glaucoma, there is always scope for improvement. One drawback is the slow working of the convex hull algorithm used to join the contours. The application and the algorithms used to develop it depend on OCT images that are photographed by well-trained professionals using the tomography equipment available in hospitals. By making use of proper lenses (preferably concave) with a native camera, it is still possible to photograph the optic disc. However, the clarity and precision of such photos may be off the mark, which would require a much more robust method for processing these images to obtain the optic disc and cup.