Segmentation atrioventricular septal defect by using convolutional neural networks based on U-NET architecture

ABSTRACT


INTRODUCTION
Congenital heart disease (CHD) occurs in about 10 of 1,000 births in nearly all countries [1]. In Indonesia, where the birth rate is 4 million per year, CHD has been estimated at 32-40 thousand cases per year. Venous pole defects can cause several abnormalities, such as atrioventricular septal defect (AVSD), atrial septal defect (ASD), and ventricular septal defect (VSD) [2]. AVSD is a relatively common congenital heart defect, caused by a hole that allows communication between all four chambers [3]. However, detecting the hole in AVSD is still a challenging task because of the small size of the heart or unintentional fetal movements [4]. A false detection carries a moderate to high risk for both the mother and her fetus.
CHD can be examined using ultrasound imaging views. Ultrasound is an imaging modality that is often applied because it is non-invasive compared with other imaging modalities [5]. Two-dimensional (2D) fetal ultrasound imaging helps the clinician monitor the gestational age, size, and weight of the fetus and, specifically, the structure and function of the heart [6]. However, sonographers have different levels of training, and many lack the expertise to assess the fetal heart. Experts generally delineate objects manually to define the ground truth used in the segmentation process [6]. Heart hole detection can be addressed by precise semantic segmentation. To improve automated segmentation, new technologies standardize measurements for optimal assessment of the fetal heart [4].
ISSN: 2252-8938 Int J Artif Intell, Vol. 10, No. 3, September 2021: 553-562
In computer science, machine learning (ML) can be applied to segmentation tasks. Nguyen et al. [7] discussed surface extraction using support vector machine (SVM) based texture classification for fetal ultrasound imaging. Gupta et al. [8] used the conditional random field method, exploiting context information, to segment 2D fetal ultrasound images. Rahmatullah et al. [9] proposed an automated image analysis method based on a machine learning algorithm for detecting the important anatomical landmarks used in manual scoring of ultrasound images of the fetal abdomen. These studies show that machine learning can perform segmentation, particularly of the fetus. However, the results obtained are not yet optimal for fetal segmentation. In addition, the limited feature representation of ML can affect the segmentation result.
To date, deep learning (DL), a widespread family of algorithms within ML that generalizes the layered structure of artificial neural networks (ANN), has been applied in medical applications [10], [11]. Deep learning has developed into sophisticated algorithms for many aspects of image and video processing, including segmentation, object detection, and classification [12]. Yu et al. [13] segmented the fetal left ventricle in echocardiographic sequences using a dynamic convolutional neural network. Kaluva et al. [14] identified the standard fetal cardiac planes from 2D ultrasound data. Sundaresan et al. [15] performed automated characterization of the fetal heart in ultrasound images using fully convolutional neural networks (CNNs). Baumgartner et al. [16] carried out real-time detection and localization of fetal standard scan planes using CNNs. According to this literature, CNNs are the latest generation of models for segmentation problems, especially because they extract features automatically.
This research proposes a deep learning approach with a CNN architecture for semantic segmentation of ultrasound images in congenital heart disease caused by septal defects. CNNs have been found to be more suitable for ultrasound images than the traditional contextual model-based approach applied to all images in the training data [16]. Specifically, this study proposes a U-NET-based CNN method for semantic segmentation of ultrasound image data. The contribution of this research is therefore a semantic segmentation process based on the U-NET architecture.

RESEARCH METHOD
Convolutional neural networks (CNNs) with the U-NET architecture are the method used in this study. Figure 1 shows the steps of the research.

Data preparation
This study uses normal and abnormal fetal heart ultrasound videos acquired in the four-chamber view. The videos are in mp4 format with a duration of 0-20 seconds. The abnormal case is atrioventricular septal defect. Figure 2 shows screenshots of the videos used in this study: Figure 2(a) shows the atrioventricular septal defect and Figure 2(b) shows the normal heart. The data consist of two videos of normal and abnormal patients. Details of the raw video data used in this study are given in Table 1.

Preprocessing
The video data obtained are then passed to the preprocessing stage. This stage consists of four steps, which are summarized in Table 2 and discussed in detail here: video input, image resize, ground truth, and data augmentation.
a. Input video. At this stage, the ultrasound video data are split into individual image frames. Framing is carried out with the OpenCV tool in the Python programming language. The resulting frames are passed to the next step.
b. Image resize. The frames obtained previously are resized to 400x300 pixels so that all frames used in later stages have the same size. As with framing, resizing is done with the OpenCV tool in Python.
c. Ground truth. This step prepares the manual segmentation of the heart in each image. Manual segmentation is performed using Adobe Photoshop CS 6 software. Table 3 gives the detailed results of manual segmentation.
d. Data augmentation. This technique increases the amount of data in order to produce the best model without losing the essence of the data. It is done by applying random transformations to the data. The augmentation used here is flipping. The augmented images keep the predetermined size of 400x300 pixels, so the output of the previous steps is not changed.
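The framing, resizing, and flipping steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `extract_frames` and `random_flip` are hypothetical, reading 400x300 as width x height is an assumption, and so is the choice of flipping each axis with probability 0.5. The image and its ground-truth mask are flipped together so that they stay aligned.

```python
import numpy as np


def extract_frames(video_path, size=(400, 300)):
    """Read every frame of a video and resize it to size = (width, height).

    Hypothetical helper; the (width, height) reading of 400x300 is assumed.
    """
    import cv2  # imported here so random_flip below works without OpenCV

    capture = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = capture.read()
        if not ok:  # end of the video, or the file could not be read
            break
        frames.append(cv2.resize(frame, size, interpolation=cv2.INTER_AREA))
    capture.release()
    return frames


def random_flip(image, mask, rng):
    """Flip an image and its ground-truth mask together.

    Each axis is flipped with probability 0.5 (an assumed setting).
    Flipping never changes the array shape, so the frame size is kept.
    """
    if rng.random() < 0.5:  # horizontal flip (left-right)
        image, mask = np.fliplr(image), np.fliplr(mask)
    if rng.random() < 0.5:  # vertical flip (up-down)
        image, mask = np.flipud(image), np.flipud(mask)
    return image, mask
```

A typical call would pass a seeded generator, for example `random_flip(frame, mask, np.random.default_rng(0))`, so that the augmentation is reproducible.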

Convolutional neural networks (CNN)
The CNN architecture is a deep learning technique that is a variation of the multilayer perceptron (MLP). CNNs have many successful applications, such as hand-written character recognition, object detection, and classification, where they have significantly outperformed traditional methods based on handcrafted features or other learning-based approaches [17]. The core of a CNN consists of nodes connected to each other through the weights of the neural network. Its main building blocks are the convolution layer, the pooling layer, and the fully connected layer. The CNN input is a volume with height, width, and depth, for example an image made of three 2D arrays. CNNs, introduced by LeCun et al. [18], are a class of biologically inspired neural networks built from convolutional filters and simple non-linearities [19]. A general illustration of the CNN architecture is shown in Figure 3. CNNs have a hierarchical architecture. Each convolution stage has three essential components: the convolution layer, the max-pooling layer, and the activation function, as shown in Figure 3. In the convolutional layer, each output channel is computed by convolving the input channels with learned filters, adding a bias, and applying the activation function. The pooling operation slides a two-dimensional filter over each channel of the feature map and summarizes the features lying within the region covered by the filter. A CNN is usually composed of convolution layers, pooling layers, and fully connected (FC) layers. For the last FC layer, as is usual in classification problems, we use the sigmoid function (3).
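The convolution, pooling, and sigmoid operations described above can be written in their standard textbook form as follows. This is a sketch under assumed notation and may differ from the authors' own: $x_i$ is the $i$-th input channel, $k_{ij}$ the filter linking input channel $i$ to output channel $j$, $b_j$ a bias, $f$ the activation function, $*$ the 2D convolution, and $s$ the pooling window size and stride ($s = 2$ for 2x2 max pooling). Only the number (3) for the sigmoid comes from the text; the numbers (1) and (2) are assumed.

```latex
\begin{align}
  y_j &= f\Big(b_j + \sum_{i} k_{ij} * x_i\Big) \tag{1} \\
  y_j(m, n) &= \max_{0 \le p,\, q < s} x_j(ms + p,\; ns + q) \tag{2} \\
  \sigma(z) &= \frac{1}{1 + e^{-z}} \tag{3}
\end{align}
```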

Segmentation atrioventricular septal defect by using convolutional neural… (Ade Iriani Sapitri)
The output of each layer is often followed by a nonlinear activation function, such as sigmoid or ReLU. In this research, we adopt the basic CNN architecture U-NET, shown in Figure 4. CNNs are applied to the main categories of image recognition, namely segmentation, classification, and object detection. Segmentation is an area where CNNs are widely used [20]. U-NET is a CNN architecture for semantic segmentation that is often used on medical data. U-NET has two main parts, the contraction path and the expansion path, and the dice coefficient is used to improve accuracy. The network architecture, illustrated in Figure 4, consists of the contraction path and the expansion path. The contraction path follows a typical convolutional network architecture [21]. It consists of repeated 3x3 convolutions without padding, each followed by a ReLU, and a 2x2 max pooling for downsampling. At every downsampling step, the number of feature channels is doubled. Every step in the expansion path consists of upsampling the feature map followed by a 2x2 convolution (up-convolution) that halves the number of feature channels, a concatenation with the cropped feature map from the contraction path, and 3x3 convolutions with ReLU. At the final layer, a 1x1 convolution maps each 64-component feature vector to the desired number of classes.
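The size arithmetic of the two paths can be traced with a short sketch. It is an illustration only and makes several assumptions: two unpadded 3x3 convolutions per stage, four downsampling steps, 64 channels in the first stage, and a 572-pixel input as in the original U-NET design of [21]. The layer configuration actually used in this study is the one in Table 4, which may differ. The function name `unet_feature_maps` is hypothetical.

```python
def unet_feature_maps(input_size=572, depth=4, base_channels=64):
    """Trace (stage, spatial size, channels) through an unpadded U-NET.

    Assumes two unpadded 3x3 convolutions per stage, so each stage trims
    4 pixels from the spatial size.
    """
    stages = []
    size, channels = input_size, base_channels

    # Contraction path: convolutions, then 2x2 max pooling halves the size
    # and the next stage doubles the number of feature channels.
    for level in range(1, depth + 1):
        size -= 4  # two unpadded 3x3 convolutions
        stages.append((f"down{level}", size, channels))
        if size % 2:
            raise ValueError(f"odd size {size} cannot be pooled 2x2 cleanly")
        size //= 2
        channels *= 2

    # Bottom of the "U": convolutions only.
    size -= 4
    stages.append(("bottleneck", size, channels))

    # Expansion path: the 2x2 up-convolution doubles the size and halves the
    # channels; the skip feature map is cropped to this size and concatenated
    # before two more unpadded 3x3 convolutions.
    for level in range(depth, 0, -1):
        size *= 2
        channels //= 2
        size -= 4
        stages.append((f"up{level}", size, channels))

    return stages


if __name__ == "__main__":
    for name, size, channels in unet_feature_maps():
        print(f"{name:>10}: {size}x{size}, {channels} channels")
```

The last stage has 64 channels, which is the 64-component feature vector that the final 1x1 convolution maps to the number of classes.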

RESULTS AND DISCUSSION
This section presents the segmentation results obtained after training the U-NET model on ultrasound images. The U-NET is configured with the ReLU activation function, the sigmoid activation function, and a loss based on the dice coefficient [22]. The fetal heart segmentation uses ultrasound images of hearts with septal defects and of normal hearts. The technique used in this study is a convolutional neural network based on the U-NET architecture. The U-NET architecture used is summarized in Table 4.
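A loss based on the dice coefficient can be sketched in NumPy as follows. This is a common formulation and an assumption here, not necessarily the exact form used in [22]: the smoothing term is added to both numerator and denominator to avoid division by zero, and its default of 1e-5 follows the smooth value reported with the training parameters. The function names are hypothetical.

```python
import numpy as np


def dice_coefficient(y_true, y_pred, smooth=1e-5):
    """Dice coefficient between a ground-truth mask and a predicted mask.

    Both inputs hold values in [0, 1]; `smooth` keeps the ratio defined
    when both masks are empty (assumed formulation).
    """
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)


def dice_loss(y_true, y_pred, smooth=1e-5):
    """Loss to minimize: 0 for a perfect overlap, close to 1 for none."""
    return 1.0 - dice_coefficient(y_true, y_pred, smooth)
```

Because the coefficient is computed from products of mask values rather than from thresholded labels, the same expression can be applied to the sigmoid outputs of the network during training.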
Training uses the preprocessed dataset of 372 images, consisting of original images and their ground truth. The data contain septal defect and normal images. The data information can be seen in Table 5. Segmentation, especially in the analysis of medical data, requires a very long training process because of the image computation involved. We use a learning rate of 1e-5 with the Adam optimizer, a smooth loss of 1e-5, and a threshold of 0.5. During training, data augmentation is carried out by the data generator that feeds the U-NET model used in this research.
b. Post-processing: The prediction from a model sometimes contains several regions with different labels, unlike the ground truth segmentation prepared beforehand. This study uses post-processing with a thresholding algorithm to help obtain the best results. Thresholding is the process of dividing an image into two or more classes of pixels, in this case "foreground" and "background". Thresholding helps remove noise and can increase accuracy [23]. In the next stage, the prediction images produced by the U-NET architecture are processed with fixed thresholding. This is followed by validation and evaluation to see how well the CNN method with the U-NET architecture works on ultrasound image data affected by septal defects.
c. Metrics evaluation: Segmentation is an important pre-processing step in image analysis applications [24]. From the segmentation result, it is then possible to identify the area of interest and the object where an event occurs, which is very useful in subsequent image analysis [25]. We test the semantic segmentation using the evaluation metrics pixel accuracy, mean accuracy, mean IoU, precision, recall, and F1 score. The results of the segmentation methods described above can be seen in Table 6. We trained the U-NET model on random data of 594 picture frames from two patient objects and tested it on 75 picture frames from two objects. Table 6 shows a sample of segmentation results from U-NET and V-NET.
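The fixed thresholding and the six evaluation metrics can be sketched for the two-class (foreground and background) case as follows. The definitions are the commonly used ones and are assumptions here, since the formulas are not spelled out in the text: mean accuracy and mean IoU are averaged over the two classes, and a pixel counts as foreground when its predicted probability is at least 0.5. The function names are hypothetical.

```python
import numpy as np


def threshold_mask(probabilities, threshold=0.5):
    """Fixed thresholding: 1 (foreground) where probability >= threshold."""
    return (np.asarray(probabilities) >= threshold).astype(np.uint8)


def _ratio(numerator, denominator):
    """Division that returns 0.0 instead of failing on an empty class."""
    return float(numerator) / float(denominator) if denominator else 0.0


def segmentation_metrics(y_true, y_pred):
    """Binary segmentation metrics from a ground-truth and a predicted mask."""
    y_true = np.asarray(y_true).astype(bool).ravel()
    y_pred = np.asarray(y_pred).astype(bool).ravel()

    tp = int(np.sum(y_true & y_pred))    # foreground predicted as foreground
    tn = int(np.sum(~y_true & ~y_pred))  # background predicted as background
    fp = int(np.sum(~y_true & y_pred))   # background predicted as foreground
    fn = int(np.sum(y_true & ~y_pred))   # foreground predicted as background

    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "pixel_accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        # per-class accuracy, averaged over foreground and background
        "mean_accuracy": (recall + _ratio(tn, tn + fp)) / 2.0,
        # per-class intersection over union, averaged over both classes
        "mean_iou": (_ratio(tp, tp + fp + fn) + _ratio(tn, tn + fp + fn)) / 2.0,
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2.0 * precision * recall, precision + recall),
    }
```

In use, the sigmoid output of the network would first pass through `threshold_mask` and the resulting mask would then be compared with the ground truth by `segmentation_metrics`.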
The training and test results of U-NET and V-NET as feature extractors, plotted as accuracy curves over 1000 epochs, are shown in Figures 5 and 6. After the predictions of the U-NET model are obtained, the next step is to test the model with the evaluation metrics for the segmentation case. The segmentation performance can be seen in Table 7. This study also summarizes the results of previous studies and compares them for the segmentation case. The comparison with previous studies can be seen in Table 8.

CONCLUSION
The results show the segmentation predictions and performance of the U-NET model used as a feature extractor. After training, the model produces a good segmentation, with pixel accuracy of 97.79%, mean accuracy of 97.82%, mean IoU of 96.10%, precision of 96.41%, recall of 95.72%, and F1 score of 96.02%.

Figure 1. The workflow of fetal echocardiography

Figure 2. Video of (a) abnormal atrioventricular septal defect and (b) normal

Figure 3. Architecture of a convolutional neural network

Figure 4. Architecture for segmentation using U-NET
a. Training parameters: We trained the U-NET-based convolutional neural network model for 1000 epochs with a batch size of 64. Training for 1000 epochs was chosen to get the right results for segmentation.

Figure 5. The result of (a) accuracy and (b) loss of the pretrained U-NET model

Table 1. Video data information

Table 4. Summaries of architecture layers of U-NET

Table 5. Split data training and testing

Table 7. Comparison performance measures for the segmentation results

Table 8. Comparison of CNNs-based U-NET architecture performance in dice coefficient