Proposing WPOD-NET combining SVM system for detecting car number plate

Received Oct 8, 2020; Revised Jun 22, 2021; Accepted Jun 25, 2021

Nowadays, many smart parking lots use plate detection systems to control vehicles entering and leaving. However, these systems have several disadvantages: they work only in a fixed environment, they need manual labor, and they require checkpoints at the entrances. To solve these problems, a novel algorithm for wide-angle car number plate detection is proposed, using the warped planar object detection network (WPOD-NET) and a modified support vector machine (SVM) system. Compared with other models, the proposal improves not only the range of detection angles but also the detection accuracy in shady conditions. The results show that the accuracy of the proposed model is up to 95.1% on 1000 testing images in various scenarios.


PROPOSAL SYSTEM
3.1. Overview
Detecting car registration numbers is a small branch of object detection. It employs automated methods to verify or recognize the presence of a car license plate based on the characteristics of its region and shape. By detecting a rectangular region that contains a group of digits and characters, the system is able to find the object of interest (the car number plate). The system uses a pre-trained model and provides information for two main sectors, namely law enforcement and commercial applications. Car plate detection is used to manage the transportation infrastructure and to reduce the damage that congestion causes to the national economy. A detection system plays an important role in measuring the daily routes of vehicles, which helps to find solutions for traffic management. In big cities (Hanoi or Ho Chi Minh City), police departments have to maintain traffic safety and order, so street cameras are set up to supervise moving vehicles and report them. An algorithm for detecting car registration numbers consists of three major phases, namely locating the bounding box of the car plate, detecting the characters and digits, and recognizing them, as shown in Figure 1. Algorithms containing all the listed phases are considered fully automatic systems and output the license plate together with the text of its digits and characters, as shown in Figure 2. Figure 2(a) gives the result of detecting a license plate from the front view.

Proposal system
The proposed method includes three steps, as shown in Figure 3. Figure 3(a) divides the method into three steps: detecting the license plate, extracting the characters and digits of the plate, and recognizing them. Figure 3(b) shows the steps in more detail. When an input image is received, the first module (WPOD-NET) finds the area that has the highest confidence of being the car plate. Since the red, green, blue (RGB) input image of WPOD-NET is set up with dimensions from 288 to 608 pixels, images with different sizes are rescaled according to the designed configuration. After rescaling, the module warps the detected object to the frontal view, following [1], using the T matrix to adjust the angle of the characters and digits without losing their features.
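The rescaling step can be illustrated with a minimal sketch. The policy below is an assumption, not the configuration used in the paper or in [1]: it brings the shorter side to 288 pixels, caps the longer side at 608 pixels, and rounds both sides to a multiple of 16 (the downsampling factor of the network). The names `target_size` and `round_to_stride` are hypothetical.

```python
NET_STRIDE = 16   # total downsampling factor of the network (4 poolings of stride 2)
MIN_SIDE = 288    # lower bound of the input dimension range given in the text
MAX_SIDE = 608    # upper bound of the input dimension range given in the text


def round_to_stride(value, stride=NET_STRIDE):
    """Round a length to the nearest positive multiple of the network stride."""
    return max(stride, int(round(value / stride)) * stride)


def target_size(width, height):
    """Return (width, height) for the resized image under the assumed policy."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Assumption: scale so that the shorter side becomes MIN_SIDE.
    scale = MIN_SIDE / min(width, height)
    # Assumption: if the longer side would exceed MAX_SIDE, cap it instead.
    if max(width, height) * scale > MAX_SIDE:
        scale = MAX_SIDE / max(width, height)
    return round_to_stride(width * scale), round_to_stride(height * scale)
```

Under this assumed policy, a 1920x1080 frame would be resized to 512x288 before being passed to the detector.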
In the second step, several obstacles appear on the plate after it is converted to a grayscale image. Their values are close to those of the numbers and characters on the plate, and thus they are difficult to process (a problem that is not addressed in the aforementioned paper). The proposed solution is therefore to use a Gaussian mask with sigma equal to 10 combined with filters and a threshold. Each area is set to 1 (white) or 0 (black) depending on whether it surpasses the threshold. Inside the bounding box (predicted box), OpenCV and other libraries normalize the object and eliminate the noise and obstacles that create problems for the final step. The positive outputs of the second step are rescaled to the size 30x60 and passed to the SVM model for prediction.
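The smoothing and thresholding idea can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the paper does not give the exact filter chain, the threshold value, or the polarity of the binarization, so the global threshold of 128, the 3-sigma kernel radius, the replicated border, and the function names are all hypothetical. In practice a library routine (for example from OpenCV) would replace the hand-written blur.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian weights, truncated at 3 sigma (assumed radius)."""
    radius = int(3 * sigma)
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Convolve one row or column with the kernel, replicating the border."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index to the image
            acc += w * values[j]
        out.append(acc)
    return out


def gaussian_blur(image, sigma):
    """Separable Gaussian blur of a grayscale image given as a list of rows."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in image]        # horizontal pass
    cols = [blur_1d(col, kernel) for col in zip(*rows)]   # vertical pass
    return [list(row) for row in zip(*cols)]              # transpose back


def binarize(image, sigma=10.0, threshold=128.0):
    """Smooth with a Gaussian mask, then map each pixel to 1 (white) or 0 (black)."""
    smoothed = gaussian_blur(image, sigma)
    return [[1 if value > threshold else 0 for value in row] for row in smoothed]
```

The default `sigma=10.0` follows the value stated in the text; the threshold should be tuned on real plate images.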
The difference between the proposal and [1] is that an SVM model is used as the recognition model, as shown in the first path in Figure 3. The purpose of changing the model is to improve the accuracy and processing time of the system. Normally, a detection system is built with neural networks (NNs). In contrast, the experiment uses a hybrid system combining two heterogeneous models for the "smart" parking application, for the following reasons:
- An SVM requires a smaller data source than an NN.
- Parameter optimization is simple (an SVM requires just 2-3 parameters).
- An SVM is more interpretable than NNs.
- A commercial product needs a low price without sacrificing the accuracy of the system, which makes an SVM the best choice.
Figure 3. Three processing steps of the proposal system: (a) detecting the license plate, extracting the characters and digits of the plate, and recognizing them, (b) detailed steps to implement the blocks

Recognizing car plate
Detecting the license plate is the first phase; it plays an important role for the following steps and directly affects the output result. Most automated systems used to detect car registration numbers operate in a fixed environment. Besides, the diversity of plate shapes and of the places where the plate is mounted creates challenges, as shown in Figures 4 and 5, for the system to produce the optimal output. Manual work is still needed from the beginning to collect the required information, instead of being totally replaced by the computer.
In this paper, we choose the WPOD-NET model to detect the plate. WPOD-NET consists of 21 convolutional layers, 14 of which are inside residual blocks. All internal filters have size 3x3, and ReLU activations are used throughout the network except in the detection block. There are 4 max pooling layers (size 2x2) with stride 2, which reduce the input dimensions by a factor of 16. In the final block, there are two parallel layers: one infers the probability using the softmax function, and the other uses a linear function. More details can be found in [1].
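As a small worked example of the downsampling arithmetic, the sketch below computes the spatial size of the final feature map from the input size. It assumes that only the four stride-2 poolings change the resolution and that the input sides are multiples of 16; the function name is hypothetical.

```python
def feature_map_size(height, width, num_poolings=4, pool_stride=2):
    """Spatial size after the pooling layers (2 ** 4 = 16 by default)."""
    factor = pool_stride ** num_poolings
    return height // factor, width // factor
```

For an input of 288x608 pixels, this gives a feature map of 18x38 cells.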

Detecting characters and digits
In the second phase, the automated system uses an algorithm to detect the characters and digits inside the object found in the previous phase. Python and libraries such as TensorFlow, the open source computer vision library (OpenCV), numerical Python (NumPy), or PIL are used to process the images before detection starts, and the regions with the highest probability of being a digit or character are then selected. The results are shown in Figure 6. Figure 6(a) is an example of detecting the characters and digits of a license plate using TensorFlow. Figure 6(b) shows the input images with different viewing angles. They are used for identification and classification in the next steps.
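One common way to obtain candidate character regions from a binarized plate is connected-component analysis. The sketch below is an assumption for illustration only, since the paper does not state which segmentation routine is used: it finds 4-connected blobs of foreground pixels (value 1), discards blobs smaller than a hypothetical `min_area`, and returns bounding boxes sorted from left to right.

```python
from collections import deque


def find_character_boxes(binary, min_area=2):
    """Return (x, y, width, height) boxes of foreground blobs, left to right."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if binary[y0][x0] != 1 or seen[y0][x0]:
                continue
            # Breadth-first flood fill over one connected blob.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            min_x = max_x = x0
            min_y = max_y = y0
            area = 0
            while queue:
                x, y = queue.popleft()
                area += 1
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if area >= min_area:  # drop tiny blobs, which are likely noise
                boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return sorted(boxes, key=lambda box: box[0])
```

Sorting by the x coordinate alone suits single-row plates; a two-row plate would first need its boxes grouped by row.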

Recognizing characters and digits
There are many methods to recognize characters, and the most well-known tools are SVM [16], [17] and the Tesseract optical character recognition (OCR) engine [18], [19].
- Tesseract OCR engine: This is one of the leading OCR engines in the world. It is distributed under the open-source Apache 2.0 license, recognizes characters in images, and extracts them into plain text, HTML, PDF, or TSV. Its functions can be used through an API. Tesseract OCR is an open-source project started by Hewlett-Packard. The stable version 4.0.0, released in 2018, is based on long short-term memory (LSTM). LSTM is a well-known form of recurrent neural network (RNN) and is used to handle text of arbitrary length. Furthermore, Tesseract supports many image formats, and support for a large number of languages is gradually being added.
- SVM: This model analyzes data for classification and regression. SVMs are considered to have the highest classification accuracy among binary classifiers [20]-[22]. It is a learning technique that is considered effective for general purposes because of its high performance without additional prior knowledge. First, the SVM finds the hyperplanes (decision boundaries) that classify the data. It separates the largest possible fraction of points of the same class on one side while maximizing the distance from either class to the hyperplane. This hyperplane is called the optimal separating hyperplane (OSH); it minimizes the risk of misclassifying not only the examples in the training dataset but also unseen examples. There are several advantages of the SVM model: i) it is a very good algorithm for unknown data, ii) it is appropriate for specific application domains such as text classification, iii) it scales well to high-dimensional data.
Because each region in the world uses a different font for the characters and digits on the plate, we use a dedicated dataset for Vietnamese car plates. The recognition module is a support vector machine model. The primary reason for choosing an SVM model is that it requires a smaller data source than a neural network. Besides that, an SVM model requires only three parameters to set up. SVM is popular in text classification tasks, where high-dimensional spaces are the norm. In this paper, one type of SVM, the C-SVM, is used for the OCR module. The 36 classes of characters and digits, in binary format, are separated by hyperplanes with the penalty multiplier C equal to 1 for outliers.
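For reference, the standard soft-margin (C-SVM) training problem for a single binary classifier can be written as follows. The notation is not taken from the paper and is assumed here: x_i are the feature vectors, y_i in {-1, +1} are the class labels, w and b define the hyperplane, xi_i are the slack variables that allow outliers, and n is the number of training samples.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,n
```

With C = 1, as stated above, the margin term and the total slack of the outliers are weighted equally; a larger C would penalize outliers more strongly.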
Poor image quality is an obstacle that affects the detection results. Once the features are extracted, the discrimination functions between each pair of classes are learned by SVMs. Therefore, a binary tree structure for recognizing the testing samples is proposed in the paper. To recognize characters and digits, a multi-class SVM is used to assign labels to instances. This approach turns the problem into multiple binary classifications. The common method is to distinguish one class from all the others. Following [17], the class whose classifier gives the highest output function is selected.
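The one-against-all decision rule can be sketched as follows. This is a minimal illustration and not the trained model from the paper: it assumes linear classifiers stored as hypothetical (weights, bias) pairs per class, and it selects the class with the highest decision value.

```python
def decision_value(weights, bias, features):
    """Signed output of one linear binary classifier: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def predict_one_vs_rest(classifiers, features):
    """Return the label whose classifier gives the highest output function."""
    if not classifiers:
        raise ValueError("at least one classifier is required")
    best_label = None
    best_score = float("-inf")
    for label, (weights, bias) in classifiers.items():
        score = decision_value(weights, bias, features)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

For the 36 classes of characters and digits, `classifiers` would hold 36 entries, one per class, and each feature vector would be the flattened 30x60 binary character image.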

Setup
WPOD-NET combined with the SVM algorithm is used to detect car registration numbers in this paper. The testing data consists of 1000 images of vehicles in real scenarios. In our experiment, each object has three views of the license plate: from the left, from the right, and from the front. There are four cases of bad conditions, namely in the evening, in the shadow of a tree, lack of brightness, and faded numbers on the plate. Several of the images are slightly blurred or distorted. The distance from the camera to the plate is a variable to consider, and it is reflected in the plate and the original image. Most of the plates are in good condition with a clear view. We divided the license plates into two main groups, namely one of random objects and one with different angles of each car. The algorithm is run on an LG Gram with an Intel® Core™ i5-7200 CPU @ 2.50GHz 2.71GHz, 7.86 GB of RAM, and 64-bit Windows 10. The ratio between the training and testing datasets is 4:1.
A few cases show that the system makes mistakes in determining whether a plate is square or rectangular. As a result, the output results are not correct. These shape misclassifications are due to the different angles of the objects.

Results
There are several salient instances, as shown in Table 2. These are images of vehicles that we captured ourselves in challenging scenarios. We also use the filter to check the quality of the images when applying the algorithm. Without the filter, the algorithm fails to identify the characters and digits on the plate, as shown in Figure 4. On the other hand, with the filter, the characters and digits are fully identified in several cases.
Tables 2 and 3 contain several cases of plates in complex scenarios. In the first three cases, the Gaussian mask is applied to improve the output of the detection step, and all characters and digits are recognized. No. 4 and 5 do not apply the noise filter, and thus the result of the recognition step is not good. No. 6 is a case in which the image is affected by streetlight. As a result, no. 4 is missed in the prediction step. No. 7 is an example of our system operating in a shady condition with uneven brightness. With the filter, the output is recognized 100% correctly. The goal of our experiment is to recognize the whole string of characters and numbers on the plate correctly. The final result contains 951 correct images with the full sequence of numbers and characters of the objects. On the other hand, 49 incorrect results give outputs with mistaken numbers and characters or a missing object due to environmental conditions. As shown in Tables 2 and 3, most of the incorrect results (42 cases) occur because the recognition system has limitations in detecting characters and digits on plates that are transformed from oblique views. Besides, seven objects photographed from frontal views have incorrect outputs because of common errors such as confusing B with 8, or because of the brightness of the environment. Figures 5 and 7 illustrate intuitive scenarios of smart parking. The processing time to detect the plate is from 0.7 to 1.2 seconds (depending on the change of environment and the quality of the images). However, the result of the proposal is not good in several cases because of the weather condition, the blurring problem, or the shadow of an object covering the plate. We will therefore improve the system to solve these problems and combine it with other advanced networks [23]-[26] to improve the accuracy.
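The accuracy criterion described above, in which a plate counts as correct only if the whole string matches, can be expressed in a short sketch. The function name is hypothetical, and the evaluation script of the paper is not available, so this only illustrates the metric.

```python
def full_plate_accuracy(predicted, expected):
    """Fraction of plates whose predicted string matches the label exactly."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one plate is required")
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / len(expected)
```

With the reported counts, 42 oblique-view errors plus 7 frontal-view errors give 49 incorrect plates, and (1000 - 49) / 1000 = 951 / 1000 = 95.1%, which agrees with the stated accuracy.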

CONCLUSION
This paper proposes a car number detection algorithm for parking systems. The system still needs improvement in raising the quality of the images in which objects are detected. In real applications, the environment plays an important role in detection. The factors include a low-quality camera, bad weather conditions, and blurring or the shadow of an object covering the plate. All of these problems decrease the resolution of the plate detection system. A shadow creates uneven brightness on the plate, which makes the system unable to normalize the images for further steps. Therefore, we will create an adaptive wavelet filter to optimize the pre-processing module and combine it with other networks to improve the accuracy of the proposed system.

Cuong Vu Quoc is a student of Electronics and Telecommunications at Hanoi University of Science and Technology (HUST), Vietnam. Currently, he is working in the Future Network lab at HUST. His main duty is developing smart products related to digital images, video processing, machine learning, and the internet of things (IoT).