Intra-class deep learning object detection on embedded computer system

ABSTRACT


INTRODUCTION
Single-board architectures (SBAs), comprising single-board computers (SBCs) and single-board microcontrollers (SBMs), have become popular solutions over the last decade owing to their low cost, low power consumption, small size, and great flexibility, which make them an alternative in many applications [1], [2]. They integrate numerous onboard sensors and state-of-the-art communication technologies, and are increasingly used for do-it-yourself (DIY) projects and internet of things (IoT) devices in science, technology, engineering, education, and academic projects [3]-[5]. Recently, SBAs have developed significantly toward on-device artificial intelligence (AI) and cluster computing [6]-[8]. This is possibly due to the significant improvement in the capabilities of the advanced reduced instruction set computing (RISC) machine (ARM) CPUs used in SBAs. Advances in SBC applications translate into high energy efficiency for ARM SBC clusters, measured in giga floating point operations per second per watt (GFLOPS/W), as well as value for money (GFLOPS/$) and space utilization (GFLOPS/m2). For example, an inexpensive, green 16-node cloud host built from ARM SBCs reaches 60 GFLOPS for computational modeling while consuming only 80 W [1]. Furthermore, the chip industry has developed embedded systems with powerful onboard graphics processing units (GPUs) to provide high processing capabilities. This feature transforms an ARM SBC into a tensor processing unit (TPU) device with fast object recognition capability.
AI, here deployed on ARM SBCs, is a field of computer science that aims to make hardware and software able to think like humans. AI is widely used to solve problems in business, natural language, perception, diagnosis, engineering, analysis, finance, science, and reasoning [9]-[12]. Machine learning (ML) can be defined as a class of computer algorithms that learn from data and produce predictions about future data. The learning process goes through three stages, namely training, testing, and validation. ML is concerned with how to build computer programs that provide solutions based on experience. A common ML topic is classification based on deep learning (DL). DL is known as a revolutionary method in computer vision, a field that has grown because most applications today use cameras. A classical ML algorithm solves a problem by dividing it into several tasks and combining the results at the final stage, whereas DL solves all tasks in one algorithm. DL learns high-level data features incrementally and eliminates the need for hand-crafted feature extraction and domain expertise. DL can thus learn without predetermined features, but it requires large resources [13]-[15].
Int J Artif Intell ISSN: 2252-8938  Intra-class deep learning object detection on embedded computer system (Putri Alit Widyastuti Santiary)
AI and ML implementations are moving toward portable, mobile, and embedded (EBD) system devices. EBD systems are small-form-factor, special-purpose computers with limited capacity. The goals of an EBD application are high efficiency and/or high detection accuracy. To that end, the algorithm implemented on an EBD system must be efficient enough to run with low computing capability, yet achieve high detection accuracy for objects subject to intra-class variation, illumination changes, and environmental disturbances. Furthermore, the algorithm must achieve high efficiency within the limited memory, speed, and computing capabilities of low-end mobile devices [16]-[18].
The object detected in this research is a flower with the Latin name Plumeria (after Charles Plumier, 1646-1706, a French botanist). The plant originates from Central America. The flowers have a distinctive fragrance and five petals ranging from white to purplish red. Because Plumeria is easy to cultivate, many varieties have emerged that differ in the shape and color of the petals, which raises problems in naming new varieties. In addition, the varieties have not been recorded in detail in a physical identification database, and the dimensions and colors of the petals are difficult to classify manually [13]. This study presents the performance of intra-class DL detection of Plumeria flowers on an EBD. It aims to make the average precision (AP) and the detection speed in frames per second (FPS) as high as possible through an optimal raw-input dataset. With this method, the resulting network model weights become lighter and the model runs faster on the EBD.

METHOD
This study used a Raspberry Pi 4 with 8 GB of RAM as the EBD. An EBD is a special-purpose computer system in which all the necessary parts are integrated into the device. The word embedded indicates that the system is a complete set including mechanical and electrical systems. An EBD has certain preset capabilities and tasks and, unlike general-purpose PCs, has limited resources. The EBD system has a small form factor (credit-card size), making it easy to embed. An EBD implementation includes a microcontroller with several general-purpose input-output (GPIO) pins. Such systems are used in medical instrumentation, process control, automated vehicle control, and communication devices [19]-[21].
DL is a development of neural network learning. It is a specialized field of ML that focuses on the representation of data and adds successive learning layers to improve the representation of the input data. DL requires large resources. One problem in DL-based classification is labeling objects that have high intra-class variation. Figure 1 shows the block diagram of the classification process using the DL method for high intra-class variation [13].
Previous research has tried to run DL methods on EBD systems. DL is known to require large computing and storage resources, so running it well on an EBD system is a challenge. One group combined an offline-trained predictive model with the Learnet model to select the DL model for a new input. Others evaluated several combinations of hardware architectures for traffic sign detection and concluded that the tensor processing unit (TPU) gave faster results than the graphics processing unit (GPU). Others reduced the computational load and memory requirements by compromising on DL detection accuracy. Another study concentrated on the image dimensions, using down- and up-scaling with the DL Learnet [20], [21].
The novelty of this research approach is to minimize the size of the trained DL weights by minimizing the dimensions of the image dataset under the supervision of an expert (operator) (bold box in Figure 2). The original image dataset is scaled down using the operator-supervised bicubic method. An operator is used here on the assumption that the operator acts as a perfect AI engine, compared with a distance-median algorithm, for checking the image quality after downscaling [22]-[25]. Figure 3 shows the research design of the Plumeria L. classification. There are four block stages in the process, so that the output can classify Plumeria L. flowers with a very good level of precision when the system runs on the EBD-TPU. The first [27], [28]. Once training succeeds, the model weights are exported to the TensorFlow Lite format for the EBD system. The confusion-matrix results of intra-class Plumeria L. flower detection are compared with and without optimization of the raw-input dataset, and with and without tensor processing unit (TPU) acceleration [24].
The original dataset of Plumeria L. flowers was collected at a dimension of 300 by 300 pixels. The images are then downscaled with the bicubic interpolation method to find the optimal image dimension. Bicubic interpolation uses a 4-by-4 pixel kernel, where each output pixel is interpolated as a weighted sum of the pixels in that neighborhood, with four coefficients (weights) along each axis. Bicubic interpolation produces a sharp image and balances processing time against output quality [29]-[31].
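To make the 4-by-4 kernel concrete, the following is a minimal pure-Python sketch of bicubic resampling. It assumes the Keys cubic convolution kernel with a = -0.5, a common choice; the paper does not state which kernel or library was used. The names `cubic_weight`, `bicubic_sample`, and `downscale` are illustrative only, images are plain nested lists of grey values, and no anti-aliasing pre-filter is applied.

```python
import math

def cubic_weight(t, a=-0.5):
    """Keys cubic convolution kernel; a = -0.5 is an assumed, common setting."""
    t = abs(t)
    if t <= 1.0:
        return (a + 2.0) * t**3 - (a + 3.0) * t**2 + 1.0
    if t < 2.0:
        return a * t**3 - 5.0 * a * t**2 + 8.0 * a * t - 4.0 * a
    return 0.0

def bicubic_sample(img, y, x):
    """Interpolate img at fractional position (y, x) from a 4x4 neighbourhood."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    for j in range(-1, 3):                      # four rows around y
        wy = cubic_weight(y - (y0 + j))
        yy = min(max(y0 + j, 0), h - 1)         # clamp at the image border
        for i in range(-1, 3):                  # four columns around x
            wx = cubic_weight(x - (x0 + i))
            xx = min(max(x0 + i, 0), w - 1)
            total += wy * wx * img[yy][xx]
    return total

def downscale(img, scale):
    """Resample img to round(size * scale) pixels per side (scale in 0..1)."""
    h, w = len(img), len(img[0])
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    out = []
    for r in range(new_h):
        src_y = (r + 0.5) * h / new_h - 0.5     # centre-aligned source row
        out.append([bicubic_sample(img, src_y, (c + 0.5) * w / new_w - 0.5)
                    for c in range(new_w)])
    return out
```

With a 300-by-300 input and `scale=0.16`, `downscale` returns a 48-by-48 grid. In practice an image library's bicubic resize would normally be used instead of a hand-written loop.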
The dataset preparation produces two datasets for the comparative study, one with the original and one with the optimal image dimensions. Each dataset is split randomly into train, test, and validation folders holding about 80, 10, and 10 percent of the total, respectively. The images are annotated in the Pascal Visual Object Classes (VOC) format, stored as XML files. All datasets are uploaded to Google Drive and mounted. In Google Colaboratory, the TensorFlow 2 machine learning framework is used to train a custom model. Parameters are set to obtain high average precision, the lowest loss, and minimum model weight. If the results are unsatisfactory, the process returns to the optimal image sizing step. The minimum-weight model is then exported to the TensorFlow Lite format to run inference on the EBD [27], [28].
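The random 80/10/10 split described above can be sketched as follows. This is an illustrative sketch, not the authors' script: the function name `split_dataset`, the fixed seed, and the file names are assumptions.

```python
import random

def split_dataset(items, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle items and split them into train, test, and validation parts."""
    items = list(items)
    random.Random(seed).shuffle(items)          # reproducible shuffle
    n_train = round(len(items) * fractions[0])
    n_test = round(len(items) * fractions[1])
    return {
        "train": items[:n_train],
        "test": items[n_train:n_train + n_test],
        "validation": items[n_train + n_test:], # remainder goes to validation
    }

# Hypothetical file names for one class with 200 sample images
files = [f"0204_{i:03d}.jpg" for i in range(200)]
parts = split_dataset(files)
print({name: len(part) for name, part in parts.items()})
# {'train': 160, 'test': 20, 'validation': 20}
```

Splitting per class rather than over the pooled dataset keeps the class proportions the same in every folder.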
Performance is measured by comparing all process results with and without optimization of the raw-input dataset, i.e., the total file size, the dataset upload time, the outputs of the Google Colaboratory training (APs, losses, training time, and model file size), and the detection results in the confusion matrix. The confusion matrix is used to determine the performance of the classification model. It differs from classification accuracy, which shows the ratio, in percent, of correct predictions to total predictions made. The misclassification or error rate is given in (1), and the classification accuracy can be obtained as its complement.
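For reference, the standard definition of the misclassification (error) rate, assumed here to correspond to (1), and its relation to accuracy can be written in terms of the true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN) counts:

```latex
\mathrm{Error\ rate} = \frac{FP + FN}{TP + FP + FN + TN} = 1 - \mathrm{Accuracy}
```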
Classification accuracy alone can be misleading and may cause problems in practice because it hides detail needed to understand the performance of the classification model. A common problem with multiclass classification accuracy is that a high score does not reflect true accuracy because one or more classes can be neglected by the model, or because the model achieves its score by always predicting the most common class (imbalanced datasets).
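A small made-up example illustrates the point. The labels and counts below are hypothetical and are not taken from the study: a model that always predicts the most common class scores high accuracy while never finding the rare class.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def recall_for(label, actual, predicted):
    """Fraction of the actual `label` samples that were predicted as `label`."""
    hits = sum(a == label and p == label for a, p in zip(actual, predicted))
    return hits / sum(a == label for a in actual)

# Hypothetical imbalanced dataset: 95 common samples, 5 rare samples
actual = ["common"] * 95 + ["rare"] * 5
predicted = ["common"] * 100        # the model ignores the rare class

print(accuracy(actual, predicted))            # 0.95
print(recall_for("rare", actual, predicted))  # 0.0
```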
The performance of a classification algorithm is summarized with a confusion matrix [32]. Calculating a confusion matrix gives an overview of what types of errors the model is making. In a multiclass confusion matrix there is no single positive or negative actual and predicted class. Instead, in a square class matrix, each predicted row (P row) contains one true-positive (TP) cell, where the predicted and actual class agree, and the other cells in that row are false positives (FP) for that predicted class. In each actual column (A column), the cells of the other predicted classes are false negatives (FN). The remaining cells are true negatives (TN). For clarity, Figure 4 shows an illustration of a multiclass confusion matrix of three classes [33]-[36]. The recall or sensitivity describes how successful the model is in retrieving information, as in (4). The F1 score or F-measure is the harmonic mean of precision and recall, in other words a statistical measure of model accuracy, as in (5).

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{2}$$

The EBD-TPU system consists of an HD webcam mounted on a light-emitting diode (LED) ring with a tripod stand, a Raspberry Pi 4 (8 GB) in a metal case with a DC power supply, and a Google Coral USB TPU accelerator. The Raspberry Pi runs the Bullseye OS, Python with OpenCV 2 and the TensorFlow Lite interpreter library, and other dependencies. For convenience, a VNC remote desktop also runs on the EBD system. The EBD-TPU system is shown in Figure 5. This study demonstrates only five of the 28 classes of Plumeria L. flowers. The multiclass confusion matrix is therefore a five-by-five class matrix, analogous to Figure 4. The performance of the classification model is computed from (2) through (5) after the results are obtained.
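The per-class computation can be sketched as below, following the convention in the text that rows are predicted classes and columns are actual classes. Accuracy follows (2); precision, recall, and F1 use their standard definitions, which are assumed to match (3) through (5). The function names and the three-class matrix are illustrative, not data from the study.

```python
def per_class_counts(matrix, k):
    """Return (TP, FP, FN, TN) for class k; rows = predicted, columns = actual."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fp = sum(matrix[k]) - tp                    # rest of predicted row k
    fn = sum(row[k] for row in matrix) - tp     # rest of actual column k
    tn = total - tp - fp - fn                   # every remaining cell
    return tp, fp, fn, tn

def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from the four counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical three-class confusion matrix (not results from the paper)
example = [
    [8, 1, 0],
    [2, 9, 1],
    [0, 0, 9],
]
counts = per_class_counts(example, 0)
print(counts)            # (8, 1, 2, 19)
print(metrics(*counts))
```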

Optimal of raw-input dataset
Table 1 shows the results of downscaling the raw images to a lower dimension than the original. The scale runs from 80 down to 5 percent, and the resulting image becomes smaller in dimension and in file size (kilobytes). The downscaling method is described in the Method section. Supervision is recorded in the marking column: when the resulting image is too blurred to be used, the downscaling process stops. The result is the optimal image resolution (marked as optimal), beyond which further downscaling is not possible. For the results in Table 1, the optimal downscale is 16%, with a file size of 6.8 kB and an image dimension of 48 by 48 pixels. The processing and resource efficiency gained is about 84 percent relative to the original image. This process is repeated for the 28 other classes of Plumeria L. flowers.
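The stopping rule can be sketched as a simple loop. The intermediate scale steps and the `operator_accepts` stand-in for the human operator's judgement are assumptions made for illustration; only the 300-pixel original size, the 80-to-5 percent range, and the 16 percent optimum come from the text.

```python
ORIGINAL_SIDE = 300  # pixels per side of the original images

def find_optimal_scale(scales_percent, is_usable):
    """Walk the scales from large to small; keep the last one still usable."""
    optimal = None
    for percent in sorted(scales_percent, reverse=True):
        side = round(ORIGINAL_SIDE * percent / 100)
        if not is_usable(side):     # image too blurred: stop downscaling
            break
        optimal = (percent, side)
    return optimal

def operator_accepts(side):
    """Hypothetical stand-in for the operator's visual check."""
    return side >= 48

# Hypothetical scale steps between 80 and 5 percent
scales = [80, 64, 48, 32, 16, 8, 5]
print(find_optimal_scale(scales, operator_accepts))  # (16, 48)
```

A 16 percent scale leaves 100 - 16 = 84 percent of each side removed, which appears to correspond to the efficiency figure quoted above.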

Evaluation of training processes dataset
Figure 6 compares the training loss and validation loss between the original image dataset (ORI) and the optimal-size image dataset (OPT). Figure 6(a) shows the training loss and Figure 6(b) the validation loss for both datasets. In general, the training and validation losses for the original-size images are lower than for the optimal-size images, but the gap is not significant. For both datasets, the validation loss is lower than the training loss. Figure 7 shows the average time in seconds per training step for both datasets. Here, the optimal-size raw-input data makes the training process four times faster.

Average precision (AP)
Table 2 shows the AP results for various test modes of the TensorFlow (TF) and TensorFlow Lite (TF Lite) versions for the original image dataset (ORI) and the optimal image dataset (OPT). The AP column contains the general AP, the AP for 50% or 70% of test data, the AP for each class (AP_/), and the AP for any other specific hardware architecture. In general, the AP values improve significantly for the optimal dataset, as do the AP for each class and for any other specific hardware architecture. There is a trade-off between TF and TF Lite: AP decreases, though not significantly, in exchange for a framework lighter than TF that can run on the EBD system.

Object Detection on EBD system
Object detection of several variants of Plumeria L. flowers was carried out with the setup in Figure 5. For this preliminary study, only five of the 28 classes have been prepared, i.e., 0204, 0604, 0704, 0805, and 3003. The class naming (labeling) of this dataset collection follows the result of [13]. Two identical object detection processes are run, one using the original dataset and one using the optimal dataset; only the custom model weights are changed according to the dataset. The class of interest is chosen randomly from the five actual class objects and placed on a green screen with LED lighting. Each class is detected 25 times (replications), which is statistically adequate to draw reliable conclusions. The EBD system then predicts the class (predicted class object) in real time. The camera output shows the bounding box of the detected object and the predicted class label, along with the average precision (%) and the frame rate in frames per second (FPS).
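The frame rate reported on screen can be measured along the following lines. This is a sketch under assumptions: `process_frame` is a hypothetical stand-in for one capture-and-inference step, and the paper does not describe how its FPS value was computed.

```python
import time

def average_fps(timestamps):
    """Average frames per second from a list of per-frame timestamps (seconds)."""
    if len(timestamps) < 2:
        return 0.0
    elapsed = timestamps[-1] - timestamps[0]
    return (len(timestamps) - 1) / elapsed if elapsed > 0 else 0.0

def time_frames(process_frame, n_frames):
    """Call process_frame n_frames times and return the measured average FPS."""
    stamps = [time.perf_counter()]
    for _ in range(n_frames):
        process_frame()                 # hypothetical capture + inference step
        stamps.append(time.perf_counter())
    return average_fps(stamps)

# Five timestamps spaced 0.5 s apart correspond to 2 frames per second
print(average_fps([0.0, 0.5, 1.0, 1.5, 2.0]))  # 2.0
```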
Figure 8 shows the detection results for the Plumeria L. flower classes running on the EBD system for both datasets. The summarized confusion matrix results are compared in Table 3 for both datasets. Table 3(a) shows the result for the original dataset, while Table 3(b) shows the result for the optimal dataset. In general, all actual classes (column headers) are perfectly predicted as the matching row-header classes for both datasets. The value 25 is the number of repeated detections of one class. A zero value means that no other class was predicted. In this result, the optimization of the raw-input dataset makes no difference, except for a very small difference in average FPS and accuracy. All replications (25 repeated detections of one class) agree between the predicted and actual class. The true-negative (TN) number for the actual class 0204 is 100 (the total of 25 repeated detections for each of the other four classes). Hence, the false-positive (FP) and false-negative (FN) numbers are zero. The other classes (the remaining rows in Table 4) give the same result as class 0204. The detection performance is computed using (2) through (5), and the results are also shown in Table 4. All detection performance values are one because every replicated detection is a correct prediction (TP). In Table 4 the precision and recall for all classes are one, which is the goal of object detection. Unfortunately, a Coral USB TPU accelerator was hard to obtain because of the chip shortage. With it, the FPS is expected to double.
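The counts quoted for class 0204 can be reproduced from the experimental design of five classes with 25 replications each. The sketch below assumes every trial is predicted correctly, as reported; the `trials` list is reconstructed from that description rather than taken from recorded data, and the helper name is illustrative.

```python
from collections import Counter

CLASSES = ["0204", "0604", "0704", "0805", "3003"]
REPLICATIONS = 25

# One (actual, predicted) pair per trial; perfect detection as reported
trials = [(c, c) for c in CLASSES for _ in range(REPLICATIONS)]

def confusion_matrix(trials, classes):
    """Build a matrix with predicted classes as rows and actual classes as columns."""
    counts = Counter(trials)            # (actual, predicted) -> number of trials
    return [[counts[(a, p)] for a in classes] for p in classes]

matrix = confusion_matrix(trials, CLASSES)
k = CLASSES.index("0204")
tp = matrix[k][k]
fp = sum(matrix[k]) - tp                    # other cells in predicted row
fn = sum(row[k] for row in matrix) - tp     # other cells in actual column
tn = len(trials) - tp - fp - fn
print(tp, fp, fn, tn)  # 25 0 0 100
```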

CONCLUSION
Detection of intra-class Plumeria L. flowers, which have high variation between classes, has been carried out with good results. Optimizing the size of the raw-input dataset makes resource use and processing more efficient, and the average precision obtained improved dramatically. The optimization is most useful in the training process, where training timesteps are shorter and the loss converges faster. This can avoid failures of long-running training processes. Object detection performance on the EBD system was achieved perfectly. This shows that artificial intelligence with deep learning methods can recognize intra-class differences not only from object structure but also from specific object features, such as color. The proposed method is novel and can be used to improve resource efficiency and detection results.


ISSN: 2252-8938 Int J Artif Intell, Vol. 13, No. 1, March 2024: 430-439

The first block (1) is the optimization stage of the sample pixel size, followed by (2) the dataset preparation process (there are about 34 classes of Plumeria L. flowers, and each class has 200 sample images). The next block (3) is the training process in Google Colaboratory, with a limiting average precision (AP) of 0.86 (high enough), a difference (error) between training loss and validation loss of 0.1 (low enough), and a minimum network weight to be achieved [26].

Figure 3. Algorithm of raw-input data optimization

Figure 5. EBD-TPU system used in this research

Figure 6. Training loss and validation loss comparison between the original and optimal image datasets, (a) training loss and (b) validation loss

Table 1. Optimal raw-input dataset for class 0104
* Indicates the optimal percentage of downscaling

Table 2. Comparison of AP between the original and optimal image datasets

Table 3. Comparison of summarized confusion matrix results between both datasets

Table 4 shows part of the confusion matrix and performance results for both datasets. The true-positive (TP) number for the actual class 0204 (the first row in Table 4) is 25.

Table 4. Part of the confusion matrix and performance results for both datasets