Deep learning based object detection in nailfold capillary images

ABSTRACT


INTRODUCTION
Locating the occurrences of objects in an image or a video is typically performed through machine learning or deep learning algorithms to provide satisfactory performance. Deep learning is used in various classification problems, which involve data collection, pre-processing, data augmentation, feature extraction and matching. Humans are able to identify objects of interest in a matter of moments when they look at images or video, often by looking at their qualitative features. Object detection aims to realize this capability automatically through a computer. In this work, the quantitative features of capillaries are measured, and these features coupled with other qualitative features are used to differentiate between diseased and healthy images.
Recent methods acquire high-resolution images using a nailfold videocapillaroscope, whereas processing and interpreting low-quality images procured by economical hardware remains a challenge. Devices that have been employed to assess the capillaries at the nailfold region include stereo microscopes, ophthalmoscopes, dermatoscopes, video capillaroscopes and digital traditional microscopes [1]. Ophthalmoscopes are economical options that are generally available and can be utilized with nominal training. Dermatoscopes are portable, moderately priced devices that are also used for nailfold capillary viewing. Stereo microscopes have the advantage of moderate cost and ease of use, but can be very challenging to use on subjects having joint contractures. A videocapillaroscope is very expensive and is available only at large clinics, but is accepted as the gold standard for nailfold capillary imaging. The proposed algorithm employs a universal serial bus (USB) microscope to capture capillary images with low-cost hardware and measures quantitative features of capillaries, which humans would not be able to do with the naked eye. These quantitative features coupled with other qualitative features are used to differentiate between diseased and healthy images, through deep learning-based architectures.
The paper is arranged in the following manner: section 2 deals with nailfold capillaroscopy (NFC) and its applications. Section 3 elaborates on object detection algorithms and section 4 explains the proposed object detection algorithm for feature extraction in NFC images. Section 5 presents the results and discussion, while the conclusion is presented in section 6.

NAILFOLD CAPILLAROSCOPY
NFC is a very safe and convenient approach for measuring the morphology of capillaries lying close to the skin surface at the proximal nailfold (PNF), and helps to evaluate microvasculature-related abnormalities. The normal PNF capillary density is 7.63±1.12 capillaries/mm in healthy Indian adults [1]. Capillary changes such as meandering capillaries, tortuosity, and microhemorrhages are observed in a considerable number of healthy individuals. Table 1 shows the different morphological changes observed in the capillaries. This provides a standardized approach to measure parameters and morphological abnormalities in the capillaries.

Table 1. Morphological changes observed in the capillaries
Morphology | Description
Bushy capillary | Small, multiple buds originating from the distal loop
Diffuse micro hemorrhages | Multiple micro-petechiae present in groups
Avascular area | Two or more adjacent capillaries absent
Bizarre capillary | Morphology not conforming to predefined morphologies

Jee et al. [2] extract qualitative and quantitative parameters of nailfold capillary images acquired with a smartphone-dermatoscope in connective tissue disease-interstitial lung disease (CTD-ILD). In addition, they evaluate the relation of nailfold parameters with clinical variables in ILD and CTD diagnosis. The appearance of giant capillaries indicates scleroderma spectrum disorders [3]. Micro-hemorrhages caused by the collapse of giant loops, along with bushy capillaries and capillary loss, are predominantly observed. Sivasankari et al. [4] present the microvascular abnormalities and disease severity in psoriasis, which is essential for the therapeutic study of treatment. Paper [5] focuses on measuring capillary dimensions in fingers and toes.
USB digital microscopy used for onychoscopy [6] performs similarly to nailfold videocapillaroscopy (NVC) in detecting microcirculation abnormalities in CTDs. In that work, various abnormalities of capillaries are examined and their association with diseases such as systemic sclerosis (SSc), dermatomyositis (DM) and systemic lupus erythematosus (SLE) is explored. Elongated capillaries are featured more in SLE than in SSc, while a prominent sub-papillary plexus favored SLE over DM and SSc.
As per the findings of Cutolo et al. [7], NFC is considered very reliable for rheumatology and, although it is suitable for daily use, care should be exercised in interpretation. Raynaud's phenomenon (RP) is a significant feature of many autoimmune rheumatic diseases and is associated with many coronary diseases [8]. A normal nailfold capillaroscopic pattern shows a uniform disposition of the capillary loops, while one or more alterations are observed in secondary RP patients. Architectural disorganization, giant capillaries, hemorrhages, avascular areas, loss of capillaries, and angiogenesis are observed in more than 95% of SSc patients.
Roon et al. [9] consider RP patients to understand patterns in capillary morphology and their correlation with abnormal pulmonary function tests. Patterns were classified as normal, SSc and non-specific. Among these subjects, 17% were found to have a normal capillaroscopic pattern, 48% had an SSc pattern and 35% had a non-specific pattern. Paper [10] evaluates nailfold capillary variations in SLE patients and determines the mapping of nailfold capillary changes to disease activity. Capillaroscopy is increasingly used by physicians internationally to discriminate between primary and secondary RP, as shown in [11]. It is widely utilized to predict the progression of disease and monitor the effects of treatment. Thus, standardizing the acquisition process and analysis of capillaroscopic images is crucial. This paper offers algorithms for acquisition and analysis of capillary images along with various capillaroscopic approaches, characteristics and scoring systems.

OBJECT DETECTION
Humans are able to identify objects of interest when looking at video or images, often by looking at their qualitative features. The aim of object detection is to attain this intelligence in a computer and program it to identify objects automatically. In Nitkunanantharajah et al. [12], nailfold capillaries of normal controls and SSc patients are imaged and compared through optoacoustic imaging (OAI). The vascular volumes differ between the two cohorts. An artificial neural network was trained to determine how sensitive OAI is to the anatomical differences that occur in the capillaries. The model employs transfer learning to classify images and achieves an area of 0.897 under the receiver operating characteristic (ROC) curve. Sensitivity and specificity of 0.783 and 0.895 respectively were obtained, and the capability of raster-scanning optoacoustic mesoscopy (RSOM) as an imaging tool for SSc is established. This also proves the use of the proposed algorithm for in-depth study of disease progression.
Paper [13] provides a non-invasive approach centred on nailfold capillary images which exploits the optical characteristics of white blood cells (WBCs) and the proposed automated algorithm is not cost effective for detecting and counting WBCs. In that work, a deep learning-based capillary segmentation algorithm is proposed along with video stabilization and WBC event detection. The proposed algorithm provides better performance in WBC event counting compared to conventional approaches.
In Suma and Rao [14], capillary density computation based on connected component labeling is proposed for identifying rarefaction of capillaries. The avascular region is detected by measuring the distance between the peaks of the capillaries. Hariyani et al. [15] propose a nailfold capillary segmentation approach based on a dual attention U-net architecture (DA-CapNet). This integrates a dual attention module for capturing better feature maps from input images. The proposed algorithm outperforms the adaptive Gaussian algorithm, SegNet and the original U-net in terms of intersection over union (IoU), precision and recall.
In CapillaryNet [16], the capillary density is quantified and red blood cell velocity is computed from videos obtained from a handheld microscope. It also measures several novel microvascular parameters like capillary hematocrit and intra-capillary flow velocity heterogeneity. The system analyzes skin microcirculation videos from COVID-19, acute heart disease, and pancreatitis patient groups. The system improves on existing capillary detection systems as it merges the accuracy of convolutional neural networks (CNNs) with the speed of traditional computer vision algorithms.
Paper [17] aims to compute the velocity of red blood cells by employing two methods. The first method analyzes pixel intensities to detect objects and to compute the velocity and displacement of red blood cells in a capillary. The second method employs artificial neural networks and follows a stochastic approach. U-net is utilized to detect capillaries, while GoogLeNet or ResNet is used to extract features. Red blood cell velocity is approximated using a long short-term memory network. An accuracy of 0.96 was obtained for mean velocity approximation.
Berks et al. [18] propose a system to extract quantitative biomarkers from NFC through a layered machine learning approach. This method gives statistically significant differences between patients with potentially life-threatening SSc and those with benign primary RP. After decades of arduous research work, artificial intelligence (AI) has now reached significant breakthroughs, permitting computers to outperform humans in interpreting medical images in very specific areas. In a review by Stoel [19], a brief explanation of various AI approaches is provided and their usage for rheumatological imaging is demonstrated, specifically with rheumatoid arthritis (RA) and SSc as examples.
U-net [20] is proposed for segmentation of medical images obtained through magnetic resonance imaging (MRI), computerized tomography (CT) scan, microscopy and X-rays. Paper [21] proposes a novel model for detecting potholes and vehicles using deep learning. This model employs Inceptionv2 and faster region-based CNN (R-CNN) and performs better than single shot detector (SSD) and you only look once (YOLO), with an improvement of 5%.
In the analytical study of [22], a comparison of various models such as MobileNetv2, ResNet50 and faster R-CNN for detecting accident vehicles is provided. The ResNet50 model fared better than all the other models considered. In Jabir and Falih [23], a CNN is employed to identify weeds in wheat fields and an intelligent system is proposed for spraying herbicides locally. The system operates in real time on a Raspberry Pi and employs object detection. Identification of leukocytes through CAD3, YOLOv2 and CNN is proposed by Abas et al. [24]. CAD3 is found to be the most efficient in leukemia cases, with a higher accuracy of 94.3%.

OBJECT DETECTION AND FEATURE EXTRACTION
In the proposed algorithm, localization of capillary loops at the nailbed is performed through object detection, and the loops are classified into 5 classes, namely "normal", "wide", "elongated", "tortuosity", and "hemorrhages". Through exhaustive research on relevant literature and discussion with clinicians, a chart is prepared to define the criteria for classification of capillaries wherein the quantification for each class is mentioned. The chart is presented in Table 2.
The "normal" class is not specified in Table 2; this new class was added by taking the other classes into consideration. As the "Rarefaction" and "Avascularity" classes do not describe the capillaries but indicate the lack of capillaries, these classes are combined into a new class named "normal". This class together with the remaining four classes describes the total number of capillaries in an image. Capillary density is determined through the classes "normal", "wide", "elongated", "tortuosity", and "hemorrhages" alone. For the "Avascularity" class, the distance between consecutive capillaries is measured and compared with 500 µm as in Table 2. Deep learning is employed effectively for object detection through scoring fusion and shape indexing [19], [20]. The steps involved in the proposed object detection and feature extraction algorithm are as follows: i) create training and test dataset for object detection; ii) create object detection network; iii) train detector and evaluate performance; iv) perform inferencing on-chip; and v) statistical model for combining features.

Create training and test dataset for object detection
A custom dataset is created for object detection, in which the data was collected from M S Ramaiah Medical College, Bangalore with the required approval of the ethics committee. The dataset comprises 600 training images (split among healthy, diabetic and hypertensive subjects) from subjects in the age group of 18-70 years. The distinction between healthy, diabetic and hypertensive images does not matter in the training process as the dataset is not used for classification, but for generating scores as explained in step 5. No preference is necessary with respect to gender as the capillary features vary only with disease progression and not across genders. A USB digital microscope with the following specifications is used to capture images of the nailbed: AG-Ptek iT33 make, with magnification up to 200x and resolution ranging from 640×480 to 1,600×1,200. Care is taken during acquisition not to allow light reflections to cover the capillaries or hemorrhages. Training images were manually annotated following the rules specified in Table 2 to obtain ground truth labels and bounding boxes for all five classes of capillaries. For the testing phase, a dataset of 205 images was considered. The size of all images is 640×480 [25].
The dataset size is increased through data augmentation of the training set by using the following transformations: i) modification of saturation, hue and contrast; and ii) random horizontal and vertical flips. The original dimensions of the image were retained and images were not scaled. This preserves the crucial information of capillary dimensions, which in turn helps in classification of capillaries. Data augmentation improves the performance of object detection by providing the network with random images obtained by transforming ground truth data during training. The bounding boxes obtained through the annotation process also need to be transformed together with the images. For an impartial evaluation with no biases, the test set is not augmented. The test images are representative of the ground truth data and are left unmodified. Figure 1(a) presents a sample image with the bounding boxes around capillaries while Figure 1(b) depicts data augmentation of the image in Figure 1(a).
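The requirement that bounding boxes follow the image through each flip can be illustrated with a minimal sketch. It assumes boxes are stored as [x_min, y_min, x_max, y_max] in pixel coordinates; the function names `flip_with_boxes` and `jitter_contrast` are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np

def flip_with_boxes(image, boxes, horizontal=True, vertical=False):
    """Flip an H x W x C image and its [x_min, y_min, x_max, y_max] boxes together.

    The image size is unchanged, so capillary dimensions in pixels are preserved.
    """
    height, width = image.shape[:2]
    boxes = np.asarray(boxes, dtype=float).copy()
    if horizontal:
        image = image[:, ::-1]
        x_min = width - boxes[:, 2]   # old right edge becomes new left edge
        x_max = width - boxes[:, 0]
        boxes[:, 0], boxes[:, 2] = x_min, x_max
    if vertical:
        image = image[::-1]
        y_min = height - boxes[:, 3]  # old bottom edge becomes new top edge
        y_max = height - boxes[:, 1]
        boxes[:, 1], boxes[:, 3] = y_min, y_max
    return image, boxes

def jitter_contrast(image, factor):
    """Scale contrast about the mean intensity; box coordinates are unaffected."""
    mean = image.mean()
    out = (image.astype(float) - mean) * factor + mean
    return np.clip(out, 0, 255).astype(np.uint8)

# Example on a 640x480 frame with one annotated capillary.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
flipped, flipped_boxes = flip_with_boxes(frame, [[100, 50, 200, 150]])
```

Photometric changes (saturation, hue, contrast) leave the boxes untouched, whereas geometric changes such as flips must be applied to image and boxes as a pair.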

Create object detection network
YOLOv3 network [26] is considered for object detection and SqueezeNet [27] for feature extraction. SqueezeNet was chosen because of its deep compression feature resulting in a small model size of about 0.5 MB, which makes it easier to deploy the model on hardware. Another advantage is the low bandwidth requirement when the model is to be exported from the cloud onto the hardware.
Given the small dataset of 600 images, cloud-based training is performed as and when new data is acquired and added to the dataset which makes the model more robust. Two detection sub-networks (or two detection heads) are considered to predict bounding boxes at two different scales. The second sub-network is twice the size of the first sub-network and this enables detecting smaller objects namely shortened capillaries and micro-hemorrhages.

Train the detector and evaluate performance
Before training the object detector, the augmented training dataset is pre-processed to suit the network architecture. The images and their bounding boxes were resized to 227×227×3 and pixel values were normalized to the range [0, 1]. The YOLOv3 detector, as shown in Figure 2, takes into account anchor boxes estimated from the boxes annotated in the ground truth images. This enables accurate prediction of the bounding boxes through better initial priors of the dataset [28].
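The pre-processing step can be sketched as follows. This is only an illustration: it assumes 8-bit input images, boxes stored as [x_min, y_min, x_max, y_max], and hypothetical helper names; the image resize itself would be delegated to an image library and is not shown. The same box-scaling helper also maps predictions back to the original 640×480 frame so that capillary measurements are expressed in original pixels.

```python
import numpy as np

ORIGINAL_SIZE = (480, 640)  # (height, width) of the captured images
NETWORK_SIZE = (227, 227)   # (height, width) expected by the network input

def scale_boxes(boxes, from_size, to_size):
    """Rescale [x_min, y_min, x_max, y_max] boxes between two (height, width) sizes."""
    boxes = np.asarray(boxes, dtype=float)
    sy = to_size[0] / from_size[0]
    sx = to_size[1] / from_size[1]
    return boxes * np.array([sx, sy, sx, sy])

def normalize_pixels(image):
    """Map 8-bit pixel values to floating point values in [0, 1]."""
    return image.astype(np.float32) / 255.0

# Forward (for training) and inverse (for measurement) mapping of one box.
network_boxes = scale_boxes([[64, 48, 128, 96]], ORIGINAL_SIZE, NETWORK_SIZE)
restored_boxes = scale_boxes(network_boxes, NETWORK_SIZE, ORIGINAL_SIZE)
```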
In each cell, 6 anchor boxes as in Figure 3 are estimated with a common centre using the k-means clustering algorithm, as this accommodates a variety of capillary shapes and sizes and achieves a good tradeoff between mean IoU and number of anchors. A mean IoU higher than 0.5 is preferable. YOLOv3 uses these anchors to create offsets and predict bounding boxes through the governing equation in (1) that determines the offsets. Here, tx, ty, tw and th are the outputs of the neural network; tx and ty are passed through a sigmoid function so as to obtain the centre coordinates of the predicted bounding box. Pw and Ph are the dimensions of the anchor box, and the dimensions of the predicted bounding box are obtained by multiplying them with an exponential term in tw and th respectively.
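The two ideas in this paragraph, anchor estimation by k-means and decoding of the network outputs, can be sketched as below. This is a minimal illustration rather than the authors' code: it uses the IoU of width/height pairs sharing a common centre as the similarity measure, and the decoding follows the standard YOLOv3 parameterization, in which the grid-cell offset `cell_xy` is an assumed symbol that does not appear in the text. All function names are hypothetical.

```python
import numpy as np

def wh_iou(boxes_wh, anchors_wh):
    """IoU between boxes and anchors that share a common centre (N x K matrix)."""
    inter = (np.minimum(boxes_wh[:, None, 0], anchors_wh[None, :, 0]) *
             np.minimum(boxes_wh[:, None, 1], anchors_wh[None, :, 1]))
    area_b = boxes_wh[:, 0] * boxes_wh[:, 1]
    area_a = anchors_wh[:, 0] * anchors_wh[:, 1]
    return inter / (area_b[:, None] + area_a[None, :] - inter)

def estimate_anchors(boxes_wh, k, iterations=50, seed=0):
    """k-means on (width, height) pairs; returns the anchors and their mean IoU."""
    boxes_wh = np.asarray(boxes_wh, dtype=float)
    rng = np.random.default_rng(seed)
    anchors = boxes_wh[rng.choice(len(boxes_wh), size=k, replace=False)]
    for _ in range(iterations):
        # Assign each box to the anchor it overlaps most, then re-centre anchors.
        assign = wh_iou(boxes_wh, anchors).argmax(axis=1)
        for j in range(k):
            members = boxes_wh[assign == j]
            if len(members):
                anchors[j] = members.mean(axis=0)
    mean_iou = wh_iou(boxes_wh, anchors).max(axis=1).mean()
    return anchors, mean_iou

def decode_box(t, cell_xy, anchor_wh):
    """Turn network outputs (tx, ty, tw, th) into a box centre and size."""
    tx, ty, tw, th = t
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    bx = cell_xy[0] + sigmoid(tx)   # centre, in grid-cell units
    by = cell_xy[1] + sigmoid(ty)
    bw = anchor_wh[0] * np.exp(tw)  # anchor width Pw scaled by exp(tw)
    bh = anchor_wh[1] * np.exp(th)  # anchor height Ph scaled by exp(th)
    return bx, by, bw, bh
```

Increasing `k` raises the mean IoU but adds anchors to every cell, which is the tradeoff the paragraph refers to.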
After training and object detection, the resized images and the correspondingly modified capillary measurements are scaled back to the original dimensions. Training requires several hyper-parameters, namely learning rate, epochs, mini-batch size, optimizer and regularization, to be defined. A small mini-batch size of about 8 or 16 is considered with a large number of epochs (100 to 200) to train a model with better performance. In addition, the other hyper-parameters are set as follows: learning rate 0.001, optimizer stochastic gradient descent with momentum (SGDM), and L2 regularization 0.0001. Figure 4 depicts the training progression of the model, showing the change in total loss and learning rate with respect to iterations. For medical applications, standard performance metrics such as recall (sensitivity) and precision (positive predictive value) are employed. A plot of precision vs recall is depicted for the five classes of capillaries individually, from the performance obtained on the test set. This curve provides information on precision at a specific level of recall. It is preferred to have a high recall so that all capillaries are detected, and the operating point is set at the highest possible value of recall for which the corresponding precision is not small. This provides a reasonable trade-off between recall and precision.
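The choice of operating point described above can be sketched as follows, assuming that each detection of a class has already been marked as a true or false positive against the ground truth. The precision floor `min_precision` is a hypothetical parameter standing in for "not small"; the paper does not state a numerical value.

```python
import numpy as np

def precision_recall_curve(scores, is_true_positive, num_ground_truth):
    """Precision and recall after each detection, taken in decreasing score order."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    hits = np.asarray(is_true_positive, dtype=bool)[order]
    tp = np.cumsum(hits)
    fp = np.cumsum(~hits)
    precision = tp / (tp + fp)
    recall = tp / num_ground_truth
    return precision, recall

def pick_operating_point(precision, recall, min_precision):
    """Highest recall whose precision stays at or above the chosen floor."""
    ok = precision >= min_precision
    if not ok.any():
        return None
    idx = np.flatnonzero(ok)
    best = idx[np.argmax(recall[idx])]
    return float(precision[best]), float(recall[best])

# Four detections of one class, three of which match the four annotated capillaries.
p, r = precision_recall_curve([0.9, 0.8, 0.7, 0.6], [True, True, False, True], 4)
operating_point = pick_operating_point(p, r, min_precision=0.7)
```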

Detect objects using inferencing on-chip
The next step is to test the model on images from a real-time clinical scenario, which are unseen by the model. The images are captured through a USB digital microscope interfaced with an NVIDIA Jetson Nano GPU module for real-time inferencing, with an additional light emitting diode (LED) display and user interface. The Azure cloud platform is utilized to provide cloud connectivity to the device. On-chip storage is provided to retain images and other important patient details.

Statistical model for grading diseases
Clinicians and researchers are interested in grading the severity of disease progression. The outputs from the object detector (the scores for each class mentioned in Table 2) are treated as input variables to a mathematical relation mapping them to an output U as in (2), which is employed for grading of various diseases.
The input variables R, A, T, H, E and W denote the classes pertaining to rarefaction, avascularity, tortuosity, haemorrhages, elongation and wide capillaries respectively, with the output Ψ(·) ∈ {normal, diabetic, hypertensive}. A generalized exponential curve is fit to the input variables based on the scores obtained from ground truth data. Based on Table 2, a set of scores {0, 1, 2, 3} is assigned to each of the six features. As a value of 0 is not supported by the exponential expression, the scores are incremented by 1, which results in Ψ(·) ∈ {1, 2, 3, 4} for all the features. Equation (3) gives the exponential relation between the input and output variables.
In (3), a single coefficient scales the expression and each of the six input variables is raised to its own exponent. The values of the coefficient and the six exponents are obtained by framing 7 equations which are derived from ground truth data. With the derived relation, if U is at most 10 then the subject is considered healthy, if U ∈ (10, 20] then it is diabetic and if U ∈ (20, 30] then it is hypertensive. Grading the severity of diabetes is performed based on where U lies in the above-mentioned intervals. There are two advantages to the proposed approach. Firstly, it is disease-independent and can be used for any other disease with vascular involvement (such as arthritis and systemic sclerosis). Secondly, grading of a number of diseases can be done using the same device with just a few adjustments to the exponential relation given in (3), without the need to train the model repeatedly for different disease classes. Here, the grading of two diseases, i.e. diabetes and hypertension, is demonstrated.
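The grading step can be sketched under one explicit assumption: that the relation is a product of powers, U = c · R^a · A^b · T^d · H^e · E^f · W^g, which matches the description of one coefficient and six exponents but is not confirmed by the text. The coefficient and exponent values below are placeholders chosen only to make the example run; they are not the values derived by the authors. The lower bound of the healthy interval is also assumed.

```python
FEATURES = ("R", "A", "T", "H", "E", "W")

def grading_score(scores, coefficient, exponents):
    """Assumed form: U = coefficient * product of (score + 1) ** exponent.

    Raw scores lie in {0, 1, 2, 3}; adding 1 maps them to {1, 2, 3, 4} so that
    a zero score does not collapse the product.
    """
    u = coefficient
    for name in FEATURES:
        u *= (scores[name] + 1) ** exponents[name]
    return u

def grade(u):
    """Map U onto the intervals given in the text (first bound assumed)."""
    if u <= 10:
        return "normal"
    if u <= 20:
        return "diabetic"
    if u <= 30:
        return "hypertensive"
    return "out of range"

# Placeholder parameters for illustration only.
example_exponents = {name: 1.0 for name in FEATURES}
example_scores = {"R": 3, "A": 0, "T": 0, "H": 0, "E": 0, "W": 0}
label = grade(grading_score(example_scores, 5.0, example_exponents))
```

Under this assumed form, taking logarithms turns the relation into a linear equation in the log-coefficient and the six exponents, which is one way seven ground-truth equations could determine the seven unknowns.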

RESULTS AND DISCUSSION
The training images were pre-processed and seven anchor boxes were estimated with a mean IoU of 0.71 before training. The estimated anchor boxes were considered in YOLOv2 and YOLOv3 architectures and, after training, the models are evaluated initially on the training dataset, followed by a manual evaluation on the test dataset. The performance of YOLOv3 and YOLOv2 architectures in terms of accuracy for the five classes is tabulated in Table 3. Figures 5(a) to 5(e) show the variation of precision with respect to recall for the 5 classes of "wide", "normal", "elongated", "tortuosity" and "hemorrhages" respectively. These plots are for the object detection performance on the training dataset and for the best performing model, which is YOLOv3. Object detection performance of a class is better if the area under this curve is large. The "tortuosity" class is the easiest to detect and has a mean precision of 0.99, as the features of this class are quite distinct compared to other classes.
The operating point of the model is where the recall of the model increases with decrease in precision (the "normal" class has an area under the curve of 50%). In the "normal" class, recall is considered more significant than precision, as a high recall value indicates accurate estimation of linear density and avascularity. The mean precision values for the other classes are relatively higher. Since manual evaluation is performed on the test dataset, a sample of the output of object detection is presented in Figure 6. Figure 6(a) shows a sample raw image while Figure 6(b) depicts the detected labels, followed by Figure 6(c) giving the confidence scores. Figure 6(d) and Figure 6(e) show the computed capillary width and length respectively. All capillaries are detected accurately; detection capability depends on recall, while accuracy depends on precision.
This example depicts the detection of two qualitative features, namely "hemorrhages" and "tortuosity", while the other capillaries are detected to be "normal". The quantitative features are extracted from the distances between capillaries and the dimensions of the bounding boxes. This example has a linear density of 6 capillaries per mm and an avascularity score of 0, as the distance between two consecutive capillaries does not exceed 500 µm. Other quantitative features such as length and width of capillaries are shown in Figure 6(e). The physical length of one pixel as seen by the USB microscope is 2 µm.
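The conversion from bounding boxes to physical measurements can be sketched as below, using the stated scale of 2 µm per pixel and the 500 µm avascularity threshold. Several choices here are assumptions made for illustration: box width and height stand in for capillary width and length, gaps are measured between box centres, and linear density is taken as the capillary count divided by the physical width of the field of view. The function name is hypothetical.

```python
MICRONS_PER_PIXEL = 2.0    # physical length of one pixel, from the text
AVASCULAR_GAP_UM = 500.0   # gap threshold between consecutive capillaries

def capillary_features(boxes, image_width_px=640):
    """boxes: (x_min, y_min, x_max, y_max) in original-image pixels, one per capillary."""
    boxes = sorted(boxes, key=lambda b: b[0])  # left-to-right along the nailfold
    widths_um = [(b[2] - b[0]) * MICRONS_PER_PIXEL for b in boxes]
    lengths_um = [(b[3] - b[1]) * MICRONS_PER_PIXEL for b in boxes]
    centres = [(b[0] + b[2]) / 2 for b in boxes]
    gaps_um = [(c2 - c1) * MICRONS_PER_PIXEL for c1, c2 in zip(centres, centres[1:])]
    field_mm = image_width_px * MICRONS_PER_PIXEL / 1000.0
    return {
        "widths_um": widths_um,
        "lengths_um": lengths_um,
        "gaps_um": gaps_um,
        "density_per_mm": len(boxes) / field_mm,
        "avascular": any(g > AVASCULAR_GAP_UM for g in gaps_um),
    }

features = capillary_features([(100, 50, 120, 200), (200, 60, 230, 220), (600, 40, 620, 180)])
```

Because the measurements are taken in pixels, the boxes must be expressed in the original 640×480 frame rather than the resized network input.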
A small but significant number of images have poor contrast and very poor capillary visibility. The performance of the object detector is adversely affected by these images, and bounding boxes are not predicted as accurately as in the example shown. Poor contrast ensues from a combination of two factors: the capability of sensors in a cheap and simple USB microscope, and the hard, thickened skin that some individuals have in the nailbed region. Although high-end microscopes are expensive, they would help acquire better quality images with good contrast.
The performance of the proposed statistical model obtained through the testing phase of the algorithm for 205 images is tabulated in Table 4. Binary classification between healthy and diabetic subjects was considered because of the small sample size of hypertensive images (30 train and 10 test images). The statistical model in (2) and (3) is used for computing the score and images are classified as either 'Normal' or 'Diabetic'. The score is also used to grade the severity of the disease through further experimentation and validation.

Pranav Nanda is an undergraduate student in the Department of Electronics and Communication Engineering at Ramaiah Institute of Technology, Bengaluru. His fields of interest are image processing and artificial intelligence. He can be contacted at email: pranavnanda9@gmail.com.

Manisha Shetty is an undergraduate student in the Department of Electronics and Communication Engineering at Ramaiah Institute of Technology, Bengaluru. She is keenly interested in image processing and deep learning. She can be contacted at email: manishamshetty14@gmail.com.

Vikas Mallikarjuna Swamy is an undergraduate student in the Department of Electronics and Communication Engineering at Ramaiah Institute of Technology, Bengaluru. He is keenly interested in embedded system design and artificial intelligence. He can be contacted at email: snmvikas@gmail.com.

Kushagra Awasthi is an undergraduate student in the Department of Electronics and Communication Engineering at Ramaiah Institute of Technology, Bengaluru. His areas of interest include machine learning and artificial intelligence. He can be contacted at email: kushagra.awasthi33@gmail.com.