Anomaly detection using deep learning based model with feature attention

ABSTRACT


INTRODUCTION
Anomaly detection is the process of differentiating abnormal patterns from normal or known ones. An anomaly is defined here as a pattern, data point, or image that deviates from the natural order of things. Abnormal object detection is a very difficult problem because of the diversity of the dataset, the closeness of normal and abnormal data, and the presence of noise in the dataset. Reconstruction-based methods are popular and widely used in this application because they work efficiently with unlabelled datasets. The autoencoder architecture is widely used in a variety of applications, including classification, compression, and target recognition [1]. The model has two parts: an encoder and a decoder. The encoder compresses the input into a latent space, and the decoder uses the latent vector to recreate the original data. Autoencoders come in a number of different forms [1]; the variational autoencoder [2] and the adversarial autoencoder [3] are two popular architectures from the same family. Because these models regenerate data, reconstruction is a critical part of the model. With these models, lossless regeneration is extremely difficult to achieve, and research on the subject has received little attention [4]. A generative model such as the generative adversarial network (GAN) [5] can, after training, regenerate data from random noise. The GAN is inspired by game theory: the generator and the discriminator compete against each other [5], [6].
Int J Artif Intell, Vol. 13, No. 1, March 2024: 383-390, ISSN: 2252-8938
This study focuses on the detection of anomalies using autoencoder models and their variants. To represent data with sparse features, probabilistic models can transform sparse vectors into various probability distributions. An adversarial autoencoder (AAE), whose architecture is depicted in Figure 1, can learn the probability distribution of the latent vector z from the noise sample p(z); it is used here as the base model. With the addition of a second discriminator that operates on images, designated discriminator 2, the network must generate an image, using a feature-wise loss together with a simple autoencoder reconstruction loss, that attempts to fool discriminator 2. We examined a variety of loss functions to ensure that different classes of images come from different distributions while the original images are regenerated.
We also considered reconstruction error as a performance parameter, and the images regenerated by the proposed model had the lowest error. We evaluated the model's ability to detect anomalies and found that it outperformed the other methods compared. In this manuscript, we address the detection of abnormalities in videos and images. The term "anomaly" refers to an object that is not normal. By learning the feature representation of the normal class, we are able to identify anomalies that do not belong to the known class. We used the University of California San Diego (UCSD) dataset and the Modified National Institute of Standards and Technology (MNIST) handwritten digits dataset to test the algorithm. The rest of the paper covers related work, the proposed architecture, experimental results, and the conclusion in sections 2 to 5, respectively.

RELATED WORK
Many researchers have conducted comprehensive surveys on anomaly detection. Ruff et al. [7] discuss anomaly detection using machine learning and deep learning in detail. In the manufacturing industry, anomaly detection can be extended to defect detection. Saad et al. [8] described one such method, which detects defects using a grey level co-occurrence matrix for the beverage manufacturing industry. Anomaly classification is the process of classifying and detecting abnormal patterns that deviate from the rest of the data [9]. Generative models are widely used as unsupervised learning models for anomaly detection. The idea is to train the network on known, normal data. The network can then classify unknown or unseen data as anomalies, because it will either fail to regenerate them or regenerate them with a loss greater than that for known data.
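The reconstruction-based idea described above can be illustrated with a minimal sketch. It assumes images are flattened into lists of pixel values and that a trained model has already produced the reconstructions. The helper names and the mean-plus-k-standard-deviations threshold rule are hypothetical choices for illustration, not the authors' procedure.

```python
from statistics import mean, pstdev


def mse(original, reconstructed):
    """Mean squared error between two equal-length sequences of pixel values."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same length")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def fit_threshold(normal_errors, k=3.0):
    """Hypothetical rule: mean + k standard deviations of errors seen on normal data."""
    return mean(normal_errors) + k * pstdev(normal_errors)


def is_anomaly(original, reconstructed, threshold):
    """Flag a sample whose reconstruction error exceeds the threshold."""
    return mse(original, reconstructed) > threshold


# Toy values: reconstruction errors measured on held-out normal samples.
normal_errors = [0.010, 0.012, 0.009, 0.011, 0.013]
threshold = fit_threshold(normal_errors)

# A well-reconstructed (known) sample and a poorly reconstructed (unknown) one.
known_flag = is_anomaly([0.0, 0.5, 1.0], [0.05, 0.45, 0.95], threshold)
unknown_flag = is_anomaly([0.0, 0.5, 1.0], [0.9, 0.1, 0.2], threshold)
```

In practice the threshold is usually swept over a range of values rather than fixed, which is what an ROC-based evaluation does.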
Because of its superiority over traditional methods, deep learning is widely used for anomaly detection [10], [11]. Sharipuddin et al. [12] used a deep learning-based method to detect intrusions in the internet of things (IoT). In anomaly detection with generative adversarial networks (AnoGAN) [13], a GAN is used to calculate the anomaly score from two losses: residual loss and discriminator loss. The authors improved their algorithm in [14] by including an encoder that generates latent vectors for the input image. They also replaced the GAN with a Wasserstein GAN. However, [15] demonstrates that the discriminator is unsuitable for measuring anomalies. In [16], the authors discarded the discriminator during testing because it did not improve the anomaly score. There is a lack of work on anomaly detection using multiple views [17]. In [18], the author detects facial micro-expressions to detect anomalies in the dataset given in [19]. In [20], the author proposed a rough set-based outlier detection method for large-scale datasets.
Anomaly detection using deep learning based model with feature attention (Rikin J. Nayak)
Autoencoders are widely used for anomaly detection. An autoencoder is a network whose goal is to regenerate its input data with the least possible error. The autoencoder was first introduced in [21]. The encoder represents the input data as a latent vector; the decoder decodes the vector and regenerates the original data with minimal loss. This concept is used to detect anomalies by training the model on known data, which results in a very high reconstruction loss for unknown or abnormal data. Variational autoencoders, adversarial autoencoders, and other types of autoencoders have been proposed and used for a variety of applications. The author of [22] used an autoencoder for anomaly detection. An adversarial autoencoder is trained like an autoencoder, with the additional constraint that the latent space, which is the encoder's output, is forced to have the same distribution as the prior.
Generative adversarial networks (GANs) are popular in computer vision [23], [24] and anomaly detection [25] because they can generate data and handle complex data distributions effectively. GANs have many benefits, but they are hard to train [26]. In contrast to an autoencoder, a GAN generates images by considering feature-wise errors rather than element-wise errors. The variational autoencoder-generative adversarial network (VAE-GAN) is a hybrid network proposed in [27] that combines a variational autoencoder with a generative adversarial network. Different models have different features that can be used to solve specific problems. Anomaly detection is one such problem, and the proposed model is designed to address it. Two conditions define the best network: the model must be able to generate data effectively, and the data must be classifiable using the Euclidean distances between classes. In some cases, a network may regenerate data or patterns on which it has not been trained, which makes anomaly classification more difficult. This happens when data from different classes with similar structures merge. In such cases, different loss functions could be considered in addition to a simple reconstruction loss. The proposed model employs mean squared error (MSE) and Kullback-Leibler (KL) divergence as loss functions. In the present article, we propose a custom model with a loss function that improves the performance of the generative model for anomaly detection applications. The proposed model is described in the following section.

PROPOSED MODEL
The general architecture of the proposed model is shown in Figure 2. It includes two networks: an adversarial autoencoder and a discriminator. In a variational autoencoder, the distribution of the latent vector is normal because of the KL divergence term in the loss function. Equation (1) gives the loss function of the variational autoencoder (VAE).
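For reference, the VAE loss in its commonly used form [2] is sketched below. The notation is an assumption and may differ from the authors' own equation: $q(z \mid x)$ is the encoder's distribution over the latent vector, $p(x \mid z)$ is the decoder's likelihood, and $p(z)$ is the prior, typically a standard normal.

```latex
\mathcal{L}_{\mathrm{VAE}}(x)
  = -\,\mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z)\big]
  + D_{\mathrm{KL}}\big(q(z \mid x)\,\|\,p(z)\big)
```

The first term is the reconstruction loss, and the second is the KL divergence term that pulls the latent distribution toward the normal prior.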
In an adversarial autoencoder, regardless of the reconstruction loss, the latent space can come from any distribution, which depends on the noise vector p(z), as shown in Figure 1. The AAE uses an adversarial concept in which the encoder itself works as a generator, so the distribution of the latent vector q(z) is adjustable, unlike in a VAE. We took this advantage into account when selecting an AAE as the base model for the proposed network. The second network in our model is the discriminator, which discriminates between the input image and the image generated by the AAE. The two networks are trained jointly, which results in better reconstruction.
The proposed model has four components. First, the encoder encodes the input image and generates the latent vector, which is a compressed representation of the input data. Second, the decoder takes this latent vector as input and regenerates the input data by minimizing the reconstruction loss between the generated image and the original image. Third, discriminator 1 takes two inputs, one from a vector with a known distribution P and one from the latent vector Q, and forces the distribution of Q to be close to the known distribution P. This allows a user to impose any desired distribution on Q. Fourth, discriminator 2 discriminates between the input image and the generated image. It acts as the discriminator of a generative adversarial network and is trained jointly with the encoder-decoder part so that the input data is regenerated with better reconstruction, as the simulation results in the next section show. In the proposed network, discriminator 2 considers feature-wise error rather than element-wise error, which adds the advantage of a GAN over an autoencoder to the proposed model.
Here, the KL divergence term forces the network to generate a sample whose distribution is as close as possible to that of the input image. The discriminator 1 network functions as in an AAE, producing the same latent space distribution as the prior distribution. Using this layer, we can select the required distribution for the output vector (attention weights) of the encoder. The discriminator 2 network works as in a generative adversarial network: the generator tries to produce images that fool the discriminator, while the discriminator classifies the original image as true and the generated image as fake. The loss function for this network is the GAN loss function [5], except that both inputs are images. Here, D2 denotes discriminator 2, D the decoder, and E the encoder. One input of discriminator 2 is the original image, and the other is the image generated by the decoder. The network is trained in the same way as a GAN: the discriminator tries to increase the loss function, while the decoder tries to reduce it by generating an image that is as close to the original as possible.
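As a sketch of the adversarial objective described above, the standard GAN minimax loss [5] can be written with both discriminator inputs being images. The symbols $D_2$, $D$, and $E$ follow the definitions in the text. The notation $p_{\mathrm{data}}$ for the distribution of input images is an assumption, and the authors' exact formulation may differ.

```latex
\min_{E,\,D}\;\max_{D_2}\;
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D_2(x)\big]
  + \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log\big(1 - D_2\big(D(E(x))\big)\big)\big]
```

Discriminator 2 maximizes this quantity by scoring original images near 1 and reconstructions $D(E(x))$ near 0, while the encoder-decoder pair minimizes it by producing reconstructions that discriminator 2 cannot tell apart from the originals.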
Figure 2. Proposed model

RESULTS AND DISCUSSION
This section describes the experiments performed on image and video data. The discussion of results is divided into two parts. The first part discusses image-based anomaly detection. The second part uses a video dataset for abnormality detection.

MNIST dataset
There are ten classes in the MNIST handwritten dataset. Each image is 28×28 in size. We follow the same protocol as in the literature. First, we combined the training and test data. Each time, one digit is considered an outlier and is removed from the training dataset. Of the normal data, 80% is kept for training and 20% for testing [28]. As an additional experiment, we also used the MNIST fashion dataset. The experimental setup is similar to [28], [29], with the training data containing 80% of the normal data and the testing data containing the remaining normal data and all abnormal data. The images are normalized between 0 and 1. To detect anomalies in images, the loss function for the autoencoder considers how likely it is that an image resembles the generated one, as well as how far the original image's distribution is from that of the generated image. Because the network is trained to generate the most likely sample for a given prior input, maximizing the likelihood is equivalent to minimizing the negative log-likelihood. Figure 3 compares the reconstruction and KL divergence losses for normal and abnormal classes.
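The data protocol described above (one class held out as the anomaly, an 80/20 split of the normal data, and pixel normalization to the range 0 to 1) can be sketched as follows. The function names, the fixed random seed, and the assumption of 8-bit pixel values are illustrative choices, not details taken from the paper.

```python
import random


def one_class_out_split(labels, anomaly_class, train_fraction=0.8, seed=0):
    """Return (train_idx, test_idx) index lists for one held-out anomaly class.

    Normal samples are shuffled and split by train_fraction; every sample of
    anomaly_class is placed in the test set only.
    """
    normal = [i for i, y in enumerate(labels) if y != anomaly_class]
    abnormal = [i for i, y in enumerate(labels) if y == anomaly_class]
    rng = random.Random(seed)
    rng.shuffle(normal)
    cut = int(train_fraction * len(normal))
    return normal[:cut], normal[cut:] + abnormal


def normalize(pixels, max_value=255):
    """Scale raw pixel intensities (assumed 0..max_value) into [0, 1]."""
    return [p / max_value for p in pixels]


# Toy label list standing in for the combined training and test labels:
# 100 samples, 10 per class.
labels = [i % 10 for i in range(100)]
train_idx, test_idx = one_class_out_split(labels, anomaly_class=3)
```

Repeating the split once per class, with a different `anomaly_class` each time, gives one experiment per digit.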

Attention mechanism
The encoder section generates a 16-dimensional vector that serves as the attention weights for each class. To calculate the loss, we compare the attention weights of the input image with the attention weights of the regenerated image, both obtained from the same network. For known images on which the network has been trained, the two vectors will be nearly identical, whereas for unknown images, the distance between the vectors will be greater. This distance can therefore be used to discover unknown or abnormal images. Figure 4 shows the attention feature for a specific input class image.
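The comparison of attention vectors described above can be sketched as follows. The Euclidean distance is an assumed choice of metric, and `encode` and `decode` are hypothetical stand-ins for the trained encoder and decoder; the toy versions below only imitate a model that cannot reproduce bright pixels.

```python
import math


def attention_distance(w, w1):
    """Euclidean distance between two attention (latent) vectors."""
    if len(w) != len(w1):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w, w1)))


def attention_score(image, encode, decode):
    """Encode the image, re-encode its reconstruction, and return the distance."""
    w = encode(image)
    w1 = encode(decode(w))
    return attention_distance(w, w1)


# Toy stand-ins (assumptions): a 16-pixel "image" is its own 16-dimensional
# code, and the decoder can only produce pixel values up to 0.5.
def toy_encode(image):
    return list(image)


def toy_decode(code):
    return [min(v, 0.5) for v in code]


known_score = attention_score([0.2] * 16, toy_encode, toy_decode)
unknown_score = attention_score([0.9] * 16, toy_encode, toy_decode)
```

With these toy functions the known image is reproduced exactly, so its score is zero, while the unknown image yields a clearly larger distance.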
To calculate the loss, the input image is encoded and the latent vector W is generated. In the same way, the regenerated image from the network is passed through the encoder again, and a new latent vector, W1, is generated. The loss is calculated from the distance between the two latent vectors.
In this section, we compare our results to [28]. We evaluate the receiver operating characteristic (ROC) curve at various thresholds. The area under the curve (AUC) scores obtained using image reconstruction and attention features are shown in Table 1. We calculated the final anomaly score in Table 2 by averaging them. We used the results from [28] to compare with other algorithms. We also ran our model on the MNIST fashion dataset. There are ten classes in the MNIST fashion dataset, and each class in turn is considered an anomaly. Table 3 displays the results for the fashion dataset.
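The evaluation described above can be sketched in a few lines. The AUC is computed from its rank interpretation: the probability that a randomly chosen anomaly scores higher than a randomly chosen normal sample. The averaging function reflects one reading of the averaging step, namely that the reconstruction error and the attention distance are averaged per image and are on comparable scales; this is an assumption.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve; labels are 1 for anomaly, 0 for normal.

    Counts the fraction of (anomaly, normal) pairs in which the anomaly has
    the higher score; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def combined_score(reconstruction_error, attention_dist):
    """Average of the two per-image scores (assumed combination rule)."""
    return 0.5 * (reconstruction_error + attention_dist)


# Toy scores for five test images; the last two are anomalies.
rec_errors = [0.10, 0.20, 0.15, 0.80, 0.70]
att_dists = [0.00, 0.10, 0.30, 0.90, 0.20]
test_labels = [0, 0, 0, 1, 1]

final_scores = [combined_score(r, a) for r, a in zip(rec_errors, att_dists)]
auc_combined = roc_auc(final_scores, test_labels)
```

Because the AUC depends only on the ordering of the scores, it summarizes performance over all thresholds at once.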

UCSD dataset
The UCSD anomaly detection dataset contains a variety of videos captured by stationary cameras. The dataset includes training and testing videos, as well as ground truth about the anomalies found in the videos. There are 34 training samples and 36 testing samples. Table 4 displays the results recorded using the proposed model. To measure performance, we used the area under the curve (AUC) and the equal error rate (EER). Figures 5 and 6 depict the regularity score for the testing videos. The results show that the model can detect anomalies in the videos. The reconstruction error for abnormal frames is high, so their regularity score is lower than that of normal frames.
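The relationship between frame error and regularity score, and the EER metric, can be sketched as follows. The min-max form of the regularity score is a commonly used definition and is assumed here; the paper does not state its exact formula. The EER routine is a simple approximation that scans the observed scores as thresholds.

```python
def regularity_scores(frame_errors):
    """Map per-frame reconstruction errors to [0, 1]; high error gives low regularity."""
    lo, hi = min(frame_errors), max(frame_errors)
    if hi == lo:
        return [1.0 for _ in frame_errors]
    return [1.0 - (e - lo) / (hi - lo) for e in frame_errors]


def equal_error_rate(anomaly_scores, labels):
    """Approximate EER: the error rate where false-positive and false-negative rates meet.

    labels are 1 for abnormal frames and 0 for normal frames.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one frame of each class")
    best_gap, eer = None, None
    for t in sorted(set(anomaly_scores)):
        fp = sum(1 for s, y in zip(anomaly_scores, labels) if y == 0 and s >= t)
        fn = sum(1 for s, y in zip(anomaly_scores, labels) if y == 1 and s < t)
        fpr, fnr = fp / n_neg, fn / n_pos
        gap = abs(fpr - fnr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer


# Toy per-frame errors for one video; frames 2 and 3 are abnormal.
frame_errors = [0.20, 0.25, 0.90, 1.00, 0.30, 0.22]
frame_labels = [0, 0, 1, 1, 0, 0]

regularity = regularity_scores(frame_errors)
frame_anomaly_scores = [1.0 - r for r in regularity]
video_eer = equal_error_rate(frame_anomaly_scores, frame_labels)
```

A lower EER is better; an EER of 0.25 means that, at the balancing threshold, about a quarter of normal frames are flagged and about a quarter of abnormal frames are missed.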

CONCLUSION
In this paper, we combined a GAN with an adversarial autoencoder to improve reconstruction performance. The results confirm the applicability of the hybrid model qualitatively. Because the model is intended to detect anomalies, we devised an anomaly score function that combines the distance between attention features and the reconstruction error between the original and reconstructed images. This score function gives better results for image-based anomaly detection and outperformed the other models compared. The proposed model achieves an AUC score of 0.872 on the MNIST dataset. We extended the same model to video-based anomaly detection using the UCSD dataset. The recorded results show that the proposed model can detect anomalies in videos with an AUC score of 0.75 and an EER of 0.25. In summary, the applicability of the hybrid model to anomaly detection was demonstrated experimentally and can be explored further for better results. This study makes two major contributions. First, the proposed model for anomaly detection was successfully demonstrated. Second, the unsupervised model is useful even when labeled data is not available, and it gives satisfactory results for both images and videos.

Figure 3. Comparison of reconstruction and KL divergence loss for normal and abnormal classes
Figure 4.

Table 2. Comparison of AUC score to detect each MNIST class as abnormal

Table 3. AUC score for each class of the MNIST fashion dataset

Table 4. AUC score for UCSD dataset anomaly detection