Facial emotion recognition using deep convolutional neural network and smoothing, mixture filters applied during preprocessing stage

Received Oct 20, 2020; Revised Sep 18, 2021; Accepted Oct 1, 2021

Facial emotion recognition by machine is a challenging task. For decades, researchers have applied different methods to classify facial emotions into different classes. The expansion of artificial intelligence in the form of the deep convolutional neural network (CNN) changed the direction of the research. Facial emotion recognition using a deep CNN is powerful in that it can take bulk input images for processing and classify them with high accuracy. It has been noticed that in a few cases the classification model does not assign facial images to the appropriate classes because of the influence of noise. It is therefore highly recommended to supply noiseless images to the facial emotion recognition model for classification. We adopted a mechanism and proposed a model for classifying a facial image into one of seven classes with high accuracy. The images are smoothed by different smoothing processes before being applied to the model, as part of image preprocessing. We claim that facial emotion recognition with image smoothing by different filters, or by a mixture of filters, is more robust than recognition without preprocessing. The details are explained in the subsequent sections.


INTRODUCTION
An image is a set of pixels represented by a function $f(x, y)$, where $x$ and $y$ are the pixel coordinates within the image dimensions and the scalar value of $f$ is equivalent to the amount of energy radiated from the place where the image is taken. Often, the image cannot be analyzed properly because of its poor quality and the amount of noise present [1]. The corrupted image is presented as (2):

$$g(x, y) = f(x, y) + \eta(x, y) \tag{2}$$

where $f(x, y)$ is the noiseless image and $\eta(x, y)$ is the noise present in the image. Noise corrupts the image partially or regularly across different portions of it. As a result, the knowledge extracted from the image may not be reliable. To recover image quality from a noisy image, filtering is used. According to [2], several filters such as average, median, Gaussian, and bilateral are used to smooth the image. Here convolution is used: the operator ⊛ applied to $g(x, y)$ with the impulse response $w(x, y)$ creates the smoothed image $h(x, y)$, as in (3):

$$h(x, y) = g(x, y) \circledast w(x, y) \tag{3}$$
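The relationship between the noise model and convolution-based smoothing can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names (`add_noise`, `box_kernel`, `convolve2d`), the zero-mean Gaussian noise, the averaging kernel, and the edge-replicating border handling are all assumptions made for the example.

```python
import random


def add_noise(image, sigma, seed=0):
    """Corrupt a clean image by adding noise to every pixel.

    The noise is assumed to be zero-mean Gaussian (hypothetical choice).
    """
    rng = random.Random(seed)
    return [[pixel + rng.gauss(0.0, sigma) for pixel in row] for row in image]


def box_kernel(size):
    """Averaging impulse response: every weight equals 1 / size^2."""
    weight = 1.0 / (size * size)
    return [[weight] * size for _ in range(size)]


def convolve2d(image, kernel):
    """Convolve an image with a square, odd-sized kernel.

    Border pixels are handled by replicating the nearest edge pixel
    (an assumption; the source does not specify border handling).
    """
    rows, cols = len(image), len(image[0])
    k = len(kernel) // 2
    smoothed = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(-k, k + 1):
                for j in range(-k, k + 1):
                    rr = min(max(r - i, 0), rows - 1)
                    cc = min(max(c - j, 0), cols - 1)
                    acc += kernel[i + k][j + k] * image[rr][cc]
            smoothed[r][c] = acc
    return smoothed


clean = [[100.0] * 8 for _ in range(8)]      # flat synthetic "image"
noisy = add_noise(clean, sigma=10.0)         # corrupted image
smooth = convolve2d(noisy, box_kernel(3))    # smoothed image
```

Swapping `box_kernel` for another impulse response changes the smoothing behavior without touching the convolution itself.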
The human face conveys meaningful information that changes from time to time [3] under external or internal influence. In this article we demonstrate a facial emotion recognition model based on artificial intelligence. The input to this model is filtered by different filters as part of image preprocessing, which leads to higher accuracy than without smoothing. Research on facial emotion recognition dates back to Darwin; according to [4], there are 40 human expression curves that a face poses after perceiving inputs from the environment. The action units [5], [6] of the face are the fundamental units of expression and contain sensitive information about the expression. A convolutional neural network (CNN), which consists of convolutional layers, pooling layers, and a fully connected network [7], is a compelling tool that produces promising results [8] for high-level scientific computation [9], [10]. Convolutional neural networks are used not only for facial emotion recognition, as in the research described here, but also for several other classification tasks such as human disease classification [11], [12] and plant disease classification [13]. Before deep CNNs became popular, image classification relied on different machine learning algorithms and methods in applications such as brain tumor [14], [15], plant disease [16], [17], and others [18], [19].
We have adopted a deep CNN in our research. The input to the architecture is a preprocessed facial image that has been filtered by various filters [20], which enhances the quality of the image. Each filter smooths the image differently, removing impulse noise according to the function it uses. The convolutional neural network accepts the smoothed images and trains an artificial intelligence model for facial emotion recognition, in which each image is classified as happy, sad, fear, disgust, neutral, surprise, or angry. In general, as the number of classes increases, the complexity of the model increases and its accuracy decreases, which makes the task more challenging. We claim that our model performs well across a wide variety of emotion classes with high accuracy.
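To make the convolution, pooling, and fully connected stages concrete, the following is a minimal forward-pass sketch in plain Python. It is not the authors' architecture, which is not specified here: the single 3×3 convolution, the 2×2 max pooling, the random untrained weights, and all function names are assumptions, so the predicted label is meaningless and only the data flow is illustrated.

```python
import math
import random

EMOTIONS = ["happy", "sad", "fear", "disgust", "neutral", "surprise", "angry"]


def conv2d_valid(image, kernel):
    """Sliding-window filter without padding (cross-correlation, as is
    conventional in deep learning frameworks)."""
    kr, kc = len(kernel), len(kernel[0])
    rows, cols = len(image) - kr + 1, len(image[0]) - kc + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kr) for j in range(kc))
             for c in range(cols)] for r in range(rows)]


def relu(feature_map):
    """Element-wise rectified linear activation."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; trailing rows/columns are dropped."""
    rows, cols = len(feature_map) // size, len(feature_map[0]) // size
    return [[max(feature_map[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(cols)] for r in range(rows)]


def dense(vector, weights, biases):
    """Fully connected layer: one weight row and one bias per class."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Numerically stable softmax over the class scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(image, kernel, weights, biases):
    """Convolution -> ReLU -> pooling -> flatten -> dense -> softmax."""
    pooled = max_pool(relu(conv2d_valid(image, kernel)))
    flat = [v for row in pooled for v in row]
    return softmax(dense(flat, weights, biases))


rng = random.Random(0)
image = [[rng.random() for _ in range(48)] for _ in range(48)]    # stand-in 48x48 input
kernel = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
# 48x48 input -> 46x46 after the 3x3 convolution -> 23x23 after pooling
weights = [[rng.uniform(-0.01, 0.01) for _ in range(23 * 23)] for _ in EMOTIONS]
biases = [0.0] * len(EMOTIONS)
probabilities = forward(image, kernel, weights, biases)
predicted = EMOTIONS[probabilities.index(max(probabilities))]
```

A practical model would stack several such layers and learn the kernel and weights from the smoothed training images.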
The primary input to a facial emotion recognition model is an image. Training of the model is influenced by the amount of noise in the images. It is believed that recognition on smoothed images is more robust than on unsmoothed ones. The filters that smooth images are average, median, Gaussian, and bilateral; each filter has its own pros and cons. However, most of them cannot recover a heavily corrupted image with noise density above 70% while preserving the detailed information of the image [21]. The median filter and its variants are extensively used [22] to reduce impulse noise in grayscale images and improve performance. Averaging the pixel intensities over the size of the filter is a common method for smoothing the image, but fuzzy averaging [23] reduces impulses to a greater extent. Identifying the pixels that belong to borders, then applying reduced smoothing to them and more intense smoothing to the remaining pixels, produced a good result [24] in an ultrasound imaging application.
Median filtering is a good choice for noise reduction. An improved median filtering algorithm [25] uses the correlation of the image to process the features of the filtering mask over the image. It is based on the combined features of different images, which consist of joint conditional probability density functions, with principal component analysis used to reduce the dimension, and it is evaluated on uncompressed image datasets. A newly proposed method [26] uses a median filter with prior information to capture natural pixels for restoration; this method restores images corrupted with up to a 99% level of salt-and-pepper impulse noise. Switching between the median and the mean [27] based on a detection step is a proven method of smoothing.
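The way a median filter suppresses salt-and-pepper impulses can be seen in a small plain-Python sketch. This is the basic median filter, not any of the improved variants cited above; the function name `median_filter`, the 3×3 window, and the edge-replicating border handling are assumptions made for illustration.

```python
from statistics import median


def median_filter(image, size=3):
    """Replace each pixel by the median of its size x size neighborhood.

    Border pixels reuse the nearest edge pixel (assumed border policy).
    """
    rows, cols = len(image), len(image[0])
    k = size // 2
    filtered = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[min(max(r + i, 0), rows - 1)][min(max(c + j, 0), cols - 1)]
                for i in range(-k, k + 1)
                for j in range(-k, k + 1)
            ]
            filtered[r][c] = median(window)
    return filtered


image = [[100] * 5 for _ in range(5)]
image[2][2] = 255   # "salt" impulse
image[0][4] = 0     # "pepper" impulse
restored = median_filter(image)   # both impulses are replaced by 100
```

Because an isolated impulse never reaches the middle of the sorted window, it is discarded outright, whereas an averaging filter would spread it over the neighboring pixels.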
The Gaussian function used for Gaussian blur [28] is a kind of normal distribution. The original (center) pixel receives the largest Gaussian weight, and neighboring pixels receive proportionally smaller weights as their distance from it increases. The review article [29] is a good collection of Gaussian filters used in different applications and explains the advantages of this filter over others. Noise reduction with preservation of edge information [30] is achieved using the bilateral filter [31]. Here the intensity of each pixel is replaced by a weighted average of the intensities of nearby pixels. A framework for image denoising [32] and the suppression of mixed noise in color images [33] are a few advanced examples that use the bilateral filter. The remainder of the paper is organized as follows: section 2 covers the research method, section 3 the results and discussion, and section 4 the conclusion.
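The difference between Gaussian and bilateral weighting can be illustrated with a short plain-Python sketch. It follows the standard textbook definitions rather than any implementation from the cited works; the function names and the parameters `sigma_s` (spatial spread) and `sigma_r` (intensity spread) are assumptions chosen for the example.

```python
import math


def gaussian_kernel(size, sigma):
    """Normalized 2D Gaussian weights; the center weight is the largest."""
    k = size // 2
    weights = [[math.exp(-(i * i + j * j) / (2.0 * sigma * sigma))
                for j in range(-k, k + 1)] for i in range(-k, k + 1)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def bilateral_filter(image, size=5, sigma_s=2.0, sigma_r=25.0):
    """Weighted average in which each neighbor's weight depends on both
    its distance from the center pixel and its intensity difference."""
    rows, cols = len(image), len(image[0])
    k = size // 2
    filtered = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            center = image[r][c]
            numerator = denominator = 0.0
            for i in range(-k, k + 1):
                for j in range(-k, k + 1):
                    rr, cc = r + i, c + j
                    if not (0 <= rr < rows and 0 <= cc < cols):
                        continue  # neighbors outside the image are skipped
                    value = image[rr][cc]
                    weight = math.exp(
                        -(i * i + j * j) / (2.0 * sigma_s ** 2)
                        - (value - center) ** 2 / (2.0 * sigma_r ** 2)
                    )
                    numerator += weight * value
                    denominator += weight
            filtered[r][c] = numerator / denominator
    return filtered


kernel = gaussian_kernel(3, sigma=1.0)
# A sharp vertical edge: dark left half, bright right half.
edge = [[0.0] * 3 + [100.0] * 3 for _ in range(6)]
preserved = bilateral_filter(edge, size=5, sigma_s=2.0, sigma_r=10.0)
```

With a small `sigma_r`, pixels on the far side of the edge get almost no weight, so the edge survives; a plain Gaussian blur of the same image would smear it.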

RESEARCH METHOD

Dataset description
The well-known FER2013 and CK48+ datasets are used for experimentation with the proposed model. The CK48+ and FER2013 datasets consist of 3540 and 35887 images, respectively, covering seven facial expressions: happy, angry, sad, surprise, neutral, disgust, and fear. All images are normalized and standardized, and all are resized to a fixed dimension of 48×48 to maintain uniformity.
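The preprocessing described above can be sketched as follows. The exact techniques are not stated, so this example assumes nearest-neighbor resizing, scaling 8-bit intensities into [0, 1], and per-image zero-mean, unit-variance standardization; the function names are hypothetical.

```python
from statistics import fmean, pstdev


def resize_nearest(image, out_rows=48, out_cols=48):
    """Resize by nearest-neighbor sampling (assumed interpolation method)."""
    rows, cols = len(image), len(image[0])
    return [[image[r * rows // out_rows][c * cols // out_cols]
             for c in range(out_cols)] for r in range(out_rows)]


def normalize(image, max_value=255.0):
    """Scale intensities into [0, 1], assuming 8-bit input."""
    return [[pixel / max_value for pixel in row] for row in image]


def standardize(image, eps=1e-8):
    """Shift to zero mean and scale to unit variance, per image."""
    flat = [pixel for row in image for pixel in row]
    mean, std = fmean(flat), pstdev(flat)
    return [[(pixel - mean) / (std + eps) for pixel in row] for row in image]


# Synthetic 96x96 gradient standing in for a face image.
raw = [[(r + c) % 256 for c in range(96)] for r in range(96)]
small = resize_nearest(raw)
scaled = normalize(small)
prepared = standardize(scaled)
```

Each step is kept separate so that a different interpolation or scaling scheme can be substituted without affecting the others.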

Filter description
The main focus of our research is to observe facial emotion classification and the accuracy achieved for smoothed input images. The images undergo different smoothing processes, and the observations are tabulated in the experimental section. For smoothing the images, a hybrid smoothing filter is proposed, formed by combining the average, median, Gaussian, and bilateral filters, and their performances are compared. The equations used for the filters are: average filtering in (5), median filtering in (6), Gaussian in (7) for 1D and in (8) for 2D, and bilateral in (9).
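For reference, the standard textbook forms of these filters can be written as follows, in the same order (average, median, 1D Gaussian, 2D Gaussian, bilateral). The notation is an assumption and may differ from the authors' own: $g$ is the input image, $h$ the smoothed output, $S_{xy}$ an $m \times n$ window centered at $(x, y)$, $\sigma$ the Gaussian standard deviation, and $\sigma_s$, $\sigma_r$ the spatial and intensity spreads of the bilateral filter.

```latex
% Average (mean) filter over the m x n window S_{xy}
h(x, y) = \frac{1}{mn} \sum_{(s, t) \in S_{xy}} g(s, t)

% Median filter over the same window
h(x, y) = \operatorname*{median}_{(s, t) \in S_{xy}} \{ g(s, t) \}

% Gaussian function in one dimension
G(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{x^{2}}{2\sigma^{2}} \right)

% Gaussian function in two dimensions
G(x, y) = \frac{1}{2\pi\sigma^{2}} \exp\!\left( -\frac{x^{2} + y^{2}}{2\sigma^{2}} \right)

% Bilateral filter at pixel p, with neighbors q in the window S
h(p) = \frac{1}{W_p} \sum_{q \in S}
       \exp\!\left( -\frac{\lVert p - q \rVert^{2}}{2\sigma_s^{2}} \right)
       \exp\!\left( -\frac{\lvert g(p) - g(q) \rvert^{2}}{2\sigma_r^{2}} \right) g(q),
\qquad
W_p = \sum_{q \in S}
       \exp\!\left( -\frac{\lVert p - q \rVert^{2}}{2\sigma_s^{2}} \right)
       \exp\!\left( -\frac{\lvert g(p) - g(q) \rvert^{2}}{2\sigma_r^{2}} \right)
```

The bilateral filter reduces to a Gaussian blur when $\sigma_r$ is very large, since the intensity term then approaches 1 for every neighbor.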

Model description
In the devised model, a facial emotion recognition image dataset is taken and converted to a hybrid image set by applying various smoothing techniques.
Step 1: Initially, n random images are selected from the image set using the function proposed in the algorithm.
Step 2: Average filtering is applied to the randomly selected images and the resulting images are stored in the hybrid image set; the randomly selected images are removed from the original image set.
Step 3: The same process is repeated using the median, Gaussian, and bilateral filters, so that the hybrid image set is formed from differently filtered images.
Step 4: Assign labels to the resulting hybrid image set.
Step 5: Divide the hybrid image set in the ratio of 80:15 for training and testing purposes.
Step 6: Train the proposed CNN model on the images selected for training and evaluate it on the images selected for testing.

Algorithm
The stages of the algorithm illustrate the process of mapping an input face image to an emotion class. The algorithm uses three functions: hybrid filtering, randSelect, and FacEmoRec. The hybrid filtering function produces images that are filtered using the average, median, bilateral, and Gaussian methods. The randSelect function randomly picks images from the original dataset on which filtering is to be performed, and FacEmoRec classifies the images by emotion.
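The steps of the model description, together with the random selection and hybrid filtering functions named above, can be sketched as follows. This is an illustrative reading rather than the authors' algorithm: the snake_case names, the equal share of images per filter, the way labels travel with each image, and the `train_fraction` parameter are assumptions, and the value 0.8 is used only for the demonstration.

```python
import math
import random


def rand_select(image_set, n, rng):
    """Remove and return up to n randomly chosen (image, label) pairs."""
    n = min(n, len(image_set))
    chosen = set(rng.sample(range(len(image_set)), n))
    selected = [item for i, item in enumerate(image_set) if i in chosen]
    image_set[:] = [item for i, item in enumerate(image_set) if i not in chosen]
    return selected


def hybrid_filtering(image_set, filters, seed=0):
    """Build the hybrid set: each filter smooths its own random share."""
    rng = random.Random(seed)
    remaining = list(image_set)          # leave the caller's list untouched
    share = math.ceil(len(remaining) / len(filters))   # assumed equal shares
    hybrid = []
    for smooth in filters:
        for image, label in rand_select(remaining, share, rng):
            hybrid.append((smooth(image), label))   # label stays attached
    return hybrid


def split_dataset(hybrid, train_fraction, seed=0):
    """Shuffle and split the hybrid set into training and testing parts."""
    rng = random.Random(seed)
    shuffled = hybrid[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in filters that only tag the image; real code would pass the
# average, median, Gaussian, and bilateral smoothing functions instead.
filters = [lambda img, tag=name: (tag, img)
           for name in ("average", "median", "gaussian", "bilateral")]
dataset = [(f"image_{i}", i % 7) for i in range(10)]
hybrid = hybrid_filtering(dataset, filters)
train, test = split_dataset(hybrid, train_fraction=0.8)
```

Removing the selected images from the working copy guarantees that every image is smoothed by exactly one filter, which is what makes the resulting set a mixture.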

RESULTS AND DISCUSSION
High computation speed in terms of graphical processing unit (GPU), central processing unit (CPU), and memory is required to build the hybrid image filter algorithm and the CNN model used to evaluate the hybrid image filter dataset. We used the Google Colab cloud service to develop the above-mentioned models. The configuration of the cloud service was: CPU frequency: 2.30 GHz; GPU: NVIDIA (12 GB); disk space: 25 GB; editor: Jupyter Notebook.

The CK48+ and FER2013 datasets, which consist of 3540 and 35887 images, respectively, covering seven facial expressions (happy, angry, sad, surprise, neutral, disgust, and fear), are considered for experimentation. The average, median, Gaussian, bilateral, and proposed hybrid filters are used to filter the datasets, and the resulting images are given to a CNN model for evaluation. It is observed that images given to the CNN model after filtering produced better results than images to which no filtering was applied.

Figure 2 represents the accuracy and loss comparisons obtained from the model without filtering and with average and median filtering applied to the CK48+ dataset. Figure 2(a) represents the train and test loss comparison without filtering, Figure 2(b) the train and test loss comparison when average filtering is applied, and Figure 2(c) the train and test loss when median filtering is applied to the CK48+ dataset. Figure 3 represents the accuracy and loss comparisons obtained from the model with Gaussian, bilateral, and the proposed hybrid filtering applied to the CK48+ dataset. Figure 3(a) represents the train and test loss comparison for Gaussian filtering, Figure 3(b) the train and test loss comparison when bilateral filtering is applied, and Figure 3(c) the train and test loss when hybrid filtering is applied to the CK48+ dataset. Figure 5(a) represents the train and test loss comparison for Gaussian filtering, Figure 5(b) the train and test loss comparison when bilateral filtering is applied, and Figure 5(c) the train and test loss when hybrid filtering is applied to the FER2013 dataset.

Tables 1 and 2 present the comparative performance analysis of train and test accuracy, loss, and time taken per epoch for the model with and without filtering, compared with the proposed hybrid filtering technique, applied to the CK48+ dataset. Figure 6 is a bar chart of the accuracy levels obtained from the model with and without filtering, compared with the proposed hybrid filtering technique, on the CK48+ and FER2013 datasets. Figure 7 is a bar chart of the loss levels obtained from the model with and without filtering, compared with the proposed hybrid filtering technique, on the CK48+ and FER2013 datasets.
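The accuracy and loss values compared in these figures and tables can be computed from model outputs as sketched below. The loss function is not named in this section, so categorical cross-entropy is an assumption here, as are the function names and the toy three-class predictions.

```python
import math


def accuracy(probabilities, labels):
    """Fraction of samples whose highest-probability class is the label."""
    correct = sum(
        1 for probs, label in zip(probabilities, labels)
        if max(range(len(probs)), key=probs.__getitem__) == label
    )
    return correct / len(labels)


def cross_entropy(probabilities, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    total = sum(math.log(max(probs[label], eps))
                for probs, label in zip(probabilities, labels))
    return -total / len(labels)


# Toy predictions over three classes for three samples (illustrative only).
predictions = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
true_labels = [0, 1, 0]
acc = accuracy(predictions, true_labels)
loss = cross_entropy(predictions, true_labels)
```

Evaluating both quantities on the training and testing splits after each epoch yields curves of the kind the figures compare.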

CONCLUSION
The research described in this article presents a robust deep convolutional neural network (CNN) model for classifying facial emotion into one of seven classes. In the proposed model, the input is a mixture of smoothed images produced by different smoothing filters. The model achieved reasonable performance in terms of accuracy and loss on the test dataset when trained on the CK48+ and FER2013 mixed smoothed images. This work can be extended to find the most suitable filter for each image, which may further increase the accuracy.