Sophisticated face mask dataset: a novel dataset for effective coronavirus disease surveillance

ABSTRACT


INTRODUCTION
The coronavirus disease (COVID-19) pandemic has caused a global health crisis, leading to significant morbidity and mortality worldwide. Health authorities have recommended face masks as a critical measure to prevent the transmission of COVID-19 [1]. Face masks are essential when physical distancing is impossible, such as in crowded public spaces and on public transportation [2]. However, ensuring compliance with face mask mandates and guidelines is challenging, as it requires reliable detection of individuals wearing masks.
Computer vision and deep learning techniques have shown significant potential in automating face mask detection for enhanced COVID-19 surveillance and control [3], [4]. Recent research presents diverse models for this purpose. In [5], MobileNetV2 and YOLOv3 achieved 99% accuracy for mask detection and 94% for social distancing. In [6], hybrid approaches combining eigenfaces and neural networks attained test accuracies of 0.87, 0.987, and 0.989 for varying numbers of components. Utilizing MobileNetV2, Hassan et al. [7] developed a real-time mask recognition system on embedded devices with a recognition rate of 99%. In [8], a machine learning model accurately inferred emotions both with and without masks using Haar feature-based cascade classifiers. Hassan et al. [9] employed a Jetson Nano, an infrared temperature sensor, an AMG8833, and a C920e camera to achieve 99% and 100% accuracy during training and testing, respectively. The authors of [10] introduced a portable IoT device for COVID-19 guideline enforcement, encompassing mask detection, social distance alerting, crowd analysis, health screening, and assessment. A real-time face recognition system for attendance with mask detection was proposed in [11], investigating eigenfaces and local binary pattern histograms. MobileNet-V2-based models demonstrated 95% accuracy and a 0.96 F1 score [12]. The authors of [13] utilized YOLOv3 trained on the CelebA and WIDER FACE databases to achieve 93.9% accuracy for mask detection on the face detection data set and benchmark (FDDB) [14]. However, the accuracy and effectiveness of such methods depend on the quality and diversity of the training data used. Currently, there is a shortage of high-quality, annotated datasets of individuals wearing face masks, which limits the ability to develop robust and accurate detection models.
In this study, we introduce the sophisticated face mask dataset (SFMD), a new collection of high-quality face mask images annotated with detailed information on mask type, fit, and wearing behavior. We compare our dataset with two existing datasets, the real-world masked face dataset (RMFD) [15] and the masked face recognition dataset (MFRD) [16], using state-of-the-art deep learning models including EfficientNet-B2 [17], ResNet50 [18], and MobileNet-V2 [19]. The results show that the proposed dataset outperforms both RMFD and MFRD on all three models in terms of accuracy, precision, recall, and F1 score.
The contributions of this study are twofold. First, we present a new dataset of high-quality face mask images that can serve as a valuable resource for researchers working on COVID-19 surveillance and control. Second, we demonstrate that our dataset can be used to train deep learning models that are robust to challenging conditions such as occlusion due to facial hair or a hand. Overall, our findings suggest that the SFMD has the potential to improve the accuracy and reliability of face mask detection and contribute to efforts to control the spread of COVID-19.

METHOD
The proposed dataset is compared with two benchmark datasets, RMFD and MFRD. To assess the performance of the proposed dataset against these benchmarks, three state-of-the-art models, namely EfficientNet-B2, ResNet-50, and MobileNet-V2, were employed. These models were selected because they are particularly suited to resource-constrained devices, such as face mask surveillance systems, and can provide quick responses with high accuracy.

Dataset description
The sophisticated face mask dataset (SFMD) is a publicly available dataset that contains images of faces with masks, without masks, and with incorrectly worn masks. The dataset is diverse and unbiased to ensure its effectiveness across various computer vision problems. Each category is further subdivided based on its properties, which can be useful for other computer vision problems. For example, the without-mask subcategory can be used for simple face detection problems, while the complex without-mask category can be used for face occlusion detection. Sample images from the dataset are displayed in Figure 1, and summary information is given in Table 1 and Table 2. The RMFD encompasses 5,000 images portraying individuals both with and without masks, evenly split into 2,500 images each. Bounding-box annotations around faces are provided, facilitating the evaluation of both masked face and general face detection algorithms. The MFRD serves as a benchmark containing 3,000 images of 600 individuals, with 5 images per individual: 2 with masks and 3 without. Diverse mask types, including medical, cloth, and respirator masks, are represented. Table 3 compares the proposed dataset to established benchmarks for face mask identification.
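A class-per-directory layout makes such a dataset easy to consume with standard loaders. The sketch below is illustrative only: the actual SFMD folder and subcategory names are not given in the text, so the names here are assumptions. It builds a tiny placeholder tree and counts images per top-level class.

```python
from pathlib import Path
import tempfile

# Hypothetical SFMD layout: three top-level classes, each with
# property-based subcategories (all names here are illustrative).
LAYOUT = {
    "with_mask": ["simple", "complex"],
    "without_mask": ["simple", "complex"],
    "incorrect_mask": ["nose_exposed", "chin_only"],
}

def build_mock_dataset(root: Path, images_per_subdir: int = 2) -> None:
    """Create an empty placeholder tree mirroring the assumed layout."""
    for cls, subs in LAYOUT.items():
        for sub in subs:
            d = root / cls / sub
            d.mkdir(parents=True, exist_ok=True)
            for i in range(images_per_subdir):
                (d / f"img_{i}.jpg").touch()

def count_images_per_class(root: Path) -> dict:
    """Count images under each top-level class directory, recursively."""
    return {cls.name: sum(1 for _ in cls.rglob("*.jpg"))
            for cls in sorted(root.iterdir()) if cls.is_dir()}

root = Path(tempfile.mkdtemp()) / "sfmd"
build_mock_dataset(root)
counts = count_images_per_class(root)
print(counts)  # each class: 2 subdirectories x 2 images = 4
```

With this layout, Keras utilities such as `ImageDataGenerator.flow_from_directory` can infer class labels directly from the top-level folder names.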

Fine tuning of models
We used three state-of-the-art deep learning models, namely EfficientNet-B2, ResNet50, and MobileNet-V2, for face mask detection. Each model was trained using the SFMD, RMFD, and MFRD datasets, and their performances were evaluated and compared. The EfficientNet-B2 architecture is part of a family of convolutional neural network (CNN) architectures that combine convolutional layers, squeeze-and-excitation (SE) blocks, and mobile inverted bottleneck (MBConv) blocks. EfficientNet-B2 contains 19 layers and 8.1 million parameters, starting with a 7×7 convolutional layer, followed by batch normalization, Swish activation, and max pooling layers. The architecture also includes repeated convolutional, SE, and MBConv blocks; a convolutional layer with 1,280 filters; batch normalization and Swish activation layers; global average pooling; a dropout layer with rate 0.3; and a dense layer with 3 output nodes and softmax activation. The ResNet50 architecture, a CNN that uses residual blocks to prevent vanishing gradients, has 50 layers and 25.6 million parameters, beginning with a 7×7 convolutional layer, followed by batch normalization, rectified linear unit (ReLU) activation, and max pooling layers. It also includes repeated convolutional blocks with residual connections, global average pooling, and a dense layer with 3 output nodes and softmax activation. MobileNet-V2 belongs to a family of CNN architectures that reduce computation and memory requirements using depthwise separable convolutions. MobileNet-V2 contains 16 layers and 3.4 million parameters, starting with a 3×3 convolutional layer, followed by batch normalization, ReLU activation, and repeated inverted residual blocks with depthwise and pointwise convolutions. It also includes a convolutional layer with 1,280 filters, batch normalization and ReLU activation layers, global average pooling, and a dense layer with 3 output nodes and softmax activation.
All three models were pre-trained on the ImageNet dataset and fine-tuned on our SFMD using transfer learning. We used the Keras deep learning library with the TensorFlow backend to implement and train the models. The models were evaluated using common metrics such as accuracy, precision, recall, and F1 score.
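The transfer-learning setup described above can be sketched in Keras as follows. This is a minimal illustration, not the authors' exact code: it freezes an EfficientNet-B2 backbone and attaches the head described in the text (global average pooling, dropout 0.3, and a 3-way softmax layer). The input size of 260 is EfficientNet-B2's default, assumed here.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_finetune_model(num_classes: int = 3, img_size: int = 260,
                         weights="imagenet") -> tf.keras.Model:
    """EfficientNet-B2 backbone with a fresh classification head:
    global average pooling, dropout 0.3, 3-way softmax."""
    base = tf.keras.applications.EfficientNetB2(
        include_top=False, weights=weights,
        input_shape=(img_size, img_size, 3))
    base.trainable = False  # freeze the backbone for the first fine-tuning stage
    return models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),
    ])

# weights=None here to stay offline; pass weights="imagenet" in practice
model = build_finetune_model(weights=None)
print(model.output_shape)  # (None, 3)
```

The same pattern applies to ResNet50 and MobileNet-V2 by swapping the backbone constructor.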

Model training and evaluation
Each model was trained on the SFMD, RMFD, and MFRD datasets for 30 epochs using the Adam optimizer with a learning rate of 0.001 and a batch size of 32. We utilized 93% of the data for training and the remainder for testing. The models' performances were evaluated using accuracy, precision, recall, and F1 score on a test set of face mask images. The results showed that the SFMD outperformed both RMFD and MFRD on all three models in terms of accuracy, precision, recall, and F1 score. The hyperparameters used during training are outlined in Table 4.

RESULTS AND DISCUSSION
This study used Google Colaboratory as the platform for training the models. A Tesla T4 graphics processing unit (GPU) with 16 GB of GDDR6 memory was allocated for training. The implementation utilized various application programming interfaces (APIs), including Keras and TensorFlow for neural network design, scikit-learn for data analysis, Matplotlib for plotting learning curves, and NumPy. Model performance was evaluated using recall, precision, F1 score, accuracy, macro average, and weighted average, calculated using the classification_report method from the scikit-learn package.
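The metric computation via `classification_report` works as in the toy example below; the labels and predictions are fabricated for illustration, not the study's actual outputs. `output_dict=True` exposes the same per-class, macro-average, and weighted-average figures that appear in the printed report.

```python
from sklearn.metrics import classification_report

# Toy predictions for the three classes; values are illustrative only.
CLASSES = ["with_mask", "without_mask", "incorrect_mask"]
y_true = [0, 0, 1, 1, 2, 2, 2, 0]
y_pred = [0, 0, 1, 2, 2, 2, 2, 0]

report = classification_report(
    y_true, y_pred, target_names=CLASSES, output_dict=True)
print(round(report["accuracy"], 3))  # 7 of 8 correct -> 0.875
```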
The results of this study demonstrate the effectiveness of using quality data for image classification tasks. The study evaluated three models, namely MobileNet-V2, ResNet-50, and EfficientNet-B2, on three datasets, RMFD, MFRD, and SFMD, using performance metrics including recall, precision, F1 score, accuracy, macro average, and weighted average. The models were trained for 30 epochs, and their learning curves on each dataset are shown in Figure 4. For RMFD, all models significantly improved in accuracy over the epochs, peaking at 0.94, 0.95, and 0.93 for MobileNet-V2, ResNet-50, and EfficientNet-B2, respectively; ResNet-50 outperformed the other two models in the later epochs with a score of 0.95. Similarly, for MFRD, all models showed an increase in accuracy over the epochs, reaching peak accuracies of 0.95, 0.94, and 0.96 for MobileNet-V2, ResNet-50, and EfficientNet-B2, respectively, with EfficientNet-B2 achieving the highest score. For SFMD, EfficientNet-B2 had the highest accuracy score of 0.99, while ResNet-50 and MobileNet-V2 showed a similar trend of improvement and achieved peak accuracies of 0.97 and 0.98, respectively. Overall, all three models exhibited a significant increase in accuracy over the epochs on SFMD, with EfficientNet-B2 achieving the highest score of 0.99. The performance of the models on the SFMD dataset is shown in Table 5.

Output
Figure 5 shows the output of the EfficientNet-B2 model after being trained on the SFMD. The model generates a colored rectangular frame around the face: a red frame means the face is unmasked, green indicates the person is wearing a mask correctly, and blue indicates the mask is worn incorrectly. The model also displays the predicted class and its probability above the rectangular frame. In future work, we would like to explore the vulnerabilities of video surveillance systems to adversarial attacks [27].
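The color-coded annotation step can be sketched as below. This is a simplified stand-in for the actual pipeline: a production system would call `cv2.rectangle` and `cv2.putText`, whereas here the border is written directly into a NumPy array so the example stays dependency-light. Class names and the box format are assumptions.

```python
import numpy as np

# BGR frame colors per predicted class, as described for Figure 5
# (red = no mask, green = correct mask, blue = incorrect mask).
FRAME_COLORS = {
    "without_mask": (0, 0, 255),    # red
    "with_mask": (0, 255, 0),       # green
    "incorrect_mask": (255, 0, 0),  # blue
}

def annotate(image: np.ndarray, box, label: str, prob: float,
             thickness: int = 2) -> str:
    """Draw a colored border around box = (x, y, w, h) on a BGR image
    array and return the label text shown above the frame."""
    x, y, w, h = box
    color = FRAME_COLORS[label]
    image[y:y + thickness, x:x + w] = color          # top edge
    image[y + h - thickness:y + h, x:x + w] = color  # bottom edge
    image[y:y + h, x:x + thickness] = color          # left edge
    image[y:y + h, x + w - thickness:x + w] = color  # right edge
    return f"{label}: {prob:.2%}"

img = np.zeros((100, 100, 3), dtype=np.uint8)
text = annotate(img, (10, 10, 60, 40), "with_mask", 0.9987)
print(text)  # with_mask: 99.87%
```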

CONCLUSION
This study demonstrates the effectiveness of using high-quality data and appropriate machine learning models for achieving accurate image classification results. The three models evaluated, namely MobileNet-V2, ResNet-50, and EfficientNet-B2, exhibited significant improvement in accuracy over the epochs on all three datasets. EfficientNet-B2 was the most effective model, achieving the highest accuracy scores on two of the three datasets (MFRD and SFMD). ResNet-50 also performed well, especially on the RMFD dataset. Future research could explore the performance of other machine learning models, the optimal number of training epochs, and methods for optimizing model performance. Overall, these findings highlight the importance of high-quality data and well-chosen models for accurate image classification.

Figure 2. Distribution of the dataset, with four subfigures: Figure 2(a) illustrates the distribution of each class, Figure 2(b) the distribution of incorrectly masked images, Figure 2(c) the distribution of images with masks, and Figure 2(d) the distribution of images without masks

Figure 1. Example of images in the database

Figure 3. Augmented images of one of the data samples using the ImageDataGenerator method

Figure 4. The models' learning curves (accuracy) on all three datasets, in three subfigures: (a) accuracy of the models on RMFD, (b) accuracy on MFRD, and (c) accuracy on the proposed SFMD dataset

Figure 5. The model's output across distinct scenarios: case 1 demonstrates accurate mask identification with nearly 100% accuracy; in case 2, the model detects an obstruction, such as a hand, and classifies the face as no-mask with 99.99% accuracy; case 3 showcases precise differentiation between correctly and incorrectly worn masks; and case 4 successfully categorizes faces without masks

Table 3. Comparison of various standard face mask datasets with the proposed dataset

Table 4. Hyperparameter settings for all the models

Table 5. Performance of the models on SFMD