IAES International Journal of Artificial Intelligence (IJ-AI)

Mohammed Razzok, Abdelmajid Badri, Ilham EL Mourabit, Yassine Ruichek, Aïcha Sahel
Laboratory of Electronics, Energy, Automation, and Information Processing, Faculty of Sciences and Techniques Mohammedia, University Hassan II Casablanca, Mohammedia, Morocco
Laboratory Connaissances et Intelligence Artificielle Distribuées (CIAD), University of Technology of Belfort-Montbéliard (UTBM), Belfort, France


INTRODUCTION
Road incidents are among the most common causes of death worldwide today. As a result, pedestrian detection algorithms, which locate all pedestrians in an image, have gained popularity in the computer vision and artificial intelligence communities. Common pedestrian detection systems (PDS) are built for clear weather [1]-[9]. However, a practical PDS must also perform well in rainy or snowy conditions.
Rain is one of the most common dynamic weather phenomena. Images taken in rainy conditions frequently suffer from local degradations [10], [11] such as low visibility and distortion, which directly impair visual perception quality and make them unfit for sharing and use. Furthermore, rain-induced artifacts can drastically degrade the performance of many machine vision solutions, such as smart driving and video monitoring systems.
The machine learning (ML) field focuses on the development of computer algorithms that exploit data to learn patterns, make predictions, and improve their performance as more data becomes available. Lately, building on the invention of convolutional neural networks [12]-[15], and particularly on the pix2pix [16] network architecture and the adversarial training strategy, single image deraining has made notable progress. By training a rainy-to-clean image translation model on synthetic rain streak or raindrop datasets, a rainy image can be effectively repaired by eliminating the artifacts, even when rain streaks or raindrops vary in scale, shape, and thickness. In this paper, we investigate the impact of images taken in rainy conditions on the pedestrian classification task using the mAP measurement. In addition, we assess the effectiveness of our proposed PDS, based on Pix2Pix and you only look once (YOLO) v3, in comparison to other models based on noise-removal masks. The rest of this article is organized as follows: Section 2 describes in detail the algorithms that will be employed. Section 3 introduces our proposed pedestrian detection system for combating adverse weather conditions. In the next section, we present and discuss the results of our research. The last section addresses the paper's conclusion.

METHODS AND TOOLS

YOLO v3 algorithm
YOLO [17] is an open-source object detection and classification algorithm based on convolutional neural networks (CNN). It predicts which objects are present in an image and where they are located in a single pass. The primary benefit of this approach is that the whole image is evaluated by a single neural network. The network can process images in real time at 45 frames per second (FPS) on an Nvidia Titan X, and a simplified version, Fast YOLO, can reach 155 FPS with better results than most real-time detectors.
YOLO starts detecting objects by dividing the input image into an S×S grid; each grid cell predicts C class probabilities, B bounding boxes, and confidence scores. Each bounding box includes five variables: x, y, w, h, and a box confidence score. The confidence score represents how likely the bounding box is to include an object and how precise the box is. x and y are offsets relative to the corresponding cell. The bounding box width w and height h are normalized by the width and height of the image. Each cell also predicts C conditional class probabilities. The final output of YOLO has a shape of (S, S, B×5 + C). The structure of the YOLO v3 algorithm is presented in Figure 1.
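The (S, S, B×5 + C) output layout described above follows the original YOLO formulation; as a quick sanity check, a minimal sketch with the grid configuration from the original YOLO paper (S=7, B=2, C=20 for PASCAL VOC):

```python
# Shape of the YOLO prediction tensor: each of the S x S grid cells
# predicts B boxes (x, y, w, h, confidence) plus C class probabilities.
def yolo_output_shape(S, B, C):
    return (S, S, B * 5 + C)

# Values from the original YOLO paper: 7x7 grid, 2 boxes, 20 VOC classes.
print(yolo_output_shape(7, 2, 20))  # (7, 7, 30)
```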

Average filter
The average filter [18] operates by passing over the image pixel by pixel. At each location, the central element is replaced with the mean of all pixel values under the kernel region. The 3 by 3 and 5 by 5 filters are shown in (1) and (2), respectively:

K = (1/9) [1 1 1; 1 1 1; 1 1 1]    (1)

K = (1/25) [1 1 1 1 1; 1 1 1 1 1; 1 1 1 1 1; 1 1 1 1 1; 1 1 1 1 1]    (2)
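A minimal numpy sketch of the averaging operation (border pixels are handled here by replicating the image edge; OpenCV's cv2.blur applies the same kernel more efficiently):

```python
import numpy as np

def average_filter(img, k):
    """Replace each pixel with the mean of the k x k window around it."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

# A single bright pixel is spread over the whole 3x3 window.
img = np.array([[0, 0, 0],
                [0, 9, 0],
                [0, 0, 0]], dtype=float)
print(average_filter(img, 3)[1, 1])  # 1.0  (9 spread over 9 pixels)
```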

Gaussian filter
The Gaussian filter [19] is a linear filter commonly used in image processing for blurring and removing details and noise from images. Its kernel reflects the shape of the Gaussian (bell-shaped) hump. An image I filtered by Gaussian convolution is given by (3), where σ is the standard deviation of the distribution, p denotes the central pixel of the kernel, q represents the positions of its neighbors, and G_σ denotes the 2D Gaussian kernel (4).

GC[I]_p = Σ_{q∈S} G_σ(‖p − q‖) I_q    (3)

G_σ(x) = (1 / (2πσ²)) exp(−x² / (2σ²))    (4)
The operation of the Gaussian convolution is not affected by the image content: the influence of one pixel on another is defined only by their distance in the image, not by the image values themselves. The Gaussian filters (kernel 3×3, σ=0.8) and (kernel 5×5, σ=1.1) are depicted in (5) and (6), respectively.
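Such kernels can be constructed directly from (4) and normalized so their weights sum to 1; a sketch for the 3×3, σ=0.8 case mentioned above (OpenCV's cv2.getGaussianKernel builds the 1D equivalent):

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Build a normalized 2D Gaussian kernel of the given odd size."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()  # normalize so the weights sum to 1

k = gaussian_kernel(3, 0.8)
print(np.round(k, 3))  # symmetric, peaked at the centre
```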

Median filter
The median filter [20] is a nonlinear filter that replaces the central pixel with the median of the pixels under the kernel area. The central element is thus always replaced by one of the pixel values actually present under the kernel, which is not the case with average and Gaussian filtering. As a result, the median filter is less vulnerable to extreme values (called outliers) than the average filter.
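The outlier robustness can be seen in a small numpy sketch: a single salt-noise pixel in a flat region is removed entirely, whereas a 3×3 average filter would leave a residue of (255 + 8·10)/9 ≈ 37:

```python
import numpy as np

def median_filter(img, k):
    """Replace each pixel with the median of its k x k neighborhood."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out

img = np.full((3, 3), 10.0)
img[1, 1] = 255.0  # salt-noise outlier
print(median_filter(img, 3)[1, 1])  # 10.0  (outlier fully suppressed)
```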

Bilateral filter
The bilateral filter [21] is a technique for smoothing images and reducing noise without blurring large, sharp edges. It has the same definition as Gaussian convolution, except that it also considers the value differences between neighbors. It is abbreviated BF[I] and defined as:

BF[I]_p = (1/W_p) Σ_{q∈S} G_σs(‖p − q‖) G_σr(|I_p − I_q|) I_q

where the normalization factor W_p = Σ_{q∈S} G_σs(‖p − q‖) G_σr(|I_p − I_q|) ensures pixel weights sum to 1.0. Parameters σs and σr specify the amount of filtering for the image: G_σs is a spatial Gaussian weighting that decreases the influence of distant pixels, and G_σr is a range Gaussian that decreases the influence of pixels q whose intensity values differ from I_p. Bilateral filters are generated by the bilateralFilter(src, d, sigmaColor, sigmaSpace) function [22], which accepts the following parameters:
− src: the source image.
− d: the diameter of the pixel neighborhood.
− sigmaColor: the filter sigma in the color space.
− sigmaSpace: the filter sigma in the coordinate space.
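A grayscale numpy sketch of this weighting (OpenCV's cv2.bilateralFilter implements the color version efficiently) shows the edge-preserving behavior: each side of a step edge is smoothed without mixing the two sides, because the range Gaussian gives near-zero weight to neighbors with very different intensities:

```python
import numpy as np

def bilateral_filter(img, k, sigma_s, sigma_r):
    """Grayscale bilateral filter: each neighbor q is weighted by a spatial
    Gaussian on its distance to p and a range Gaussian on |I_p - I_q|."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    ax = np.arange(k) - pad
    xx, yy = np.meshgrid(ax, ax)
    spatial = np.exp(-(xx**2 + yy**2) / (2 * sigma_s**2))
    out = np.zeros(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            patch = padded[i:i + k, j:j + k]
            rng = np.exp(-(patch - img[i, j])**2 / (2 * sigma_r**2))
            w = spatial * rng  # combined weight, normalized below by W_p
            out[i, j] = (w * patch).sum() / w.sum()
    return out

# A 0/100 step edge stays sharp: pixels just left of the edge stay near 0,
# pixels just right of it stay near 100.
img = np.hstack([np.zeros((4, 4)), np.full((4, 4), 100.0)])
print(bilateral_filter(img, 3, 1.0, 10.0)[2, 3], bilateral_filter(img, 3, 1.0, 10.0)[2, 4])
```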

Non-local means filter
The non-local means method [23] fills the pixel's value with an average of the values of a distribution of other pixels: small blocks centered on other pixels are compared to the block centered on the pixel of interest, and the average is only conducted over pixels whose blocks are similar to the current block. As a consequence, this approach is capable of restoring textures that would be blurred by other denoising algorithms. Non-local means filters are generated by the fastNlMeansDenoisingColored(src, h, hColor, templateWindowSize, searchWindowSize) function [22], which accepts the following parameters:
− h: a parameter that controls the filter strength for the luminance component. A larger h value removes all noise but also all image details; a smaller h value preserves details but also some noise.
− hColor: identical to h, but for color images.
− templateWindowSize: the template block's size in pixels, used to calculate weights.
− searchWindowSize: the window size in pixels used to calculate the weighted mean for a particular pixel.
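The patch-comparison idea can be sketched in a few lines of numpy (a simplified grayscale version; the OpenCV function above is the practical, optimized implementation for color images):

```python
import numpy as np

def nl_means(img, patch=3, search=7, h=10.0):
    """Simplified grayscale non-local means: each pixel is replaced by a
    weighted average over a search window, where each weight depends on
    how similar the patch around the candidate is to the patch around p."""
    pp, ps = patch // 2, search // 2
    padded = np.pad(img.astype(float), pp + ps, mode="reflect")
    out = np.zeros(img.shape, dtype=float)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            ic, jc = i + pp + ps, j + pp + ps  # centre in padded coords
            ref = padded[ic - pp:ic + pp + 1, jc - pp:jc + pp + 1]
            wsum = vsum = 0.0
            for di in range(-ps, ps + 1):
                for dj in range(-ps, ps + 1):
                    cand = padded[ic + di - pp:ic + di + pp + 1,
                                  jc + dj - pp:jc + dj + pp + 1]
                    # Patch similarity -> weight (larger h keeps more pixels)
                    w = np.exp(-((ref - cand) ** 2).mean() / h**2)
                    wsum += w
                    vsum += w * padded[ic + di, jc + dj]
            out[i, j] = vsum / wsum
    return out

img = np.full((5, 5), 50.0)
print(nl_means(img)[2, 2])  # 50.0  (a flat image is left unchanged)
```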

Adversarial network
Generative adversarial networks (GAN) [16] are a class of machine learning frameworks designed by Ian Goodfellow et al. A GAN is composed of two parts: i) the generator learns to produce realistic data; the generated instances serve as negative training examples for the discriminator; ii) the discriminator learns to distinguish the generator's fake data from real data. When the generator produces implausible results, it is penalized by the discriminator.
When training starts, the generator produces fake data that the discriminator quickly recognizes. As training advances, the generator comes closer to producing output that fools the discriminator. Finally, if generator training is effective, the discriminator becomes less capable of distinguishing between real and fabricated images: it begins to mistakenly classify fake data as real, and its accuracy decreases as a result.
Both models are based on neural networks. The discriminator's input is connected directly to the generator's output. The discriminator's classification is used by the generator to update its weights via backpropagation. As a result, both models are trained concurrently in an adversarial process in which the generator tries to trick the discriminator while the discriminator attempts to spot the fake pictures. The GAN framework is presented in Figure 2.
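The adversarial objective described above can be sketched with the standard losses from the GAN literature (a minimal numerical illustration with hypothetical discriminator scores, not the paper's implementation):

```python
import numpy as np

# Standard GAN losses: the discriminator is rewarded for scoring real data
# near 1 and fakes near 0; the (non-saturating) generator is rewarded when
# the discriminator scores its fakes near 1.
def discriminator_loss(d_real, d_fake):
    return -np.log(d_real) - np.log(1.0 - d_fake)

def generator_loss(d_fake):
    return -np.log(d_fake)

# Early in training the discriminator spots fakes easily (d_fake near 0),
# so the generator loss is large; as the generator improves and d_fake
# rises toward 0.5, the generator loss shrinks and the discriminator loss
# grows -- the balance described in the text.
print(generator_loss(0.05), generator_loss(0.5))
```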

PROPOSED METHOD
To eliminate the rain effect from images, we attempt to turn a rainy image directly into an un-rainy image pixel by pixel. This study was inspired by the effectiveness of Pix2Pix GANs in translating one image into another. The Pix2Pix technique uses a conditional GAN (cGAN), in which the output image is produced conditioned on an input, in our particular scenario a source image.
The discriminator is an image classification model based on a deep CNN. Specifically, it performs conditional image classification: it takes both the source image (e.g., a rainy image) and the target image (e.g., an un-rainy image) as input, and then predicts the probability that the target image is a real or fake translation of the source image. The discriminator uses a PatchGAN model, whose design is based on the size of the model's receptive field, which defines the relationship between one output of the model and the number of input pixels it covers. The model is designed so that each output prediction maps to a 70×70 patch of the input image. The advantage of this design is that it can handle images of different sizes, such as those larger or smaller than 256×256 pixels. During training, the model generates a patch of predictions from the two concatenated input images. To optimize the model, it uses log loss and applies a weighting factor of 0.5 to the discriminator updates, a technique recommended by the Pix2Pix authors. This weighting slows down changes to the discriminator model relative to the generator model, which helps stabilize the overall training process. The flowchart of our proposed discriminator is presented in Figure 3.
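The 70×70 figure can be checked with a standard receptive-field calculation over the discriminator's convolutional stack (the layer sizes here follow the published Pix2Pix PatchGAN: five 4×4 convolutions with strides 2, 2, 2, 1, 1):

```python
# Receptive field of a stack of convolutions, computed back to front:
# starting from rf = 1, each layer expands it as rf*stride + (kernel - stride).
def receptive_field(layers):
    rf = 1
    for kernel, stride in reversed(layers):
        rf = rf * stride + (kernel - stride)
    return rf

# Pix2Pix 70x70 PatchGAN: 4x4 convolutions with strides 2, 2, 2, 1, 1.
layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(layers))  # 70
```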
In comparison to the discriminator, the generator is more complicated. The generator employs a U-Net architecture as an encoder-decoder model. It generates a target image (un-rainy image) from a source image (rainy image). To achieve this, the input image is first downscaled or encoded to a bottleneck layer, and the condensed representation is then upscaled or decoded to the output image size. These models are illustrated in Figures 4 to 6. The discriminator model is trained directly on both real and fake images, whereas the generator model is not. Instead, the generator is trained through the discriminator and updated to reduce the discriminator's loss on generated images labeled as "real"; in this way, it is encouraged to produce more realistic images. The discriminator's weights are reused in this composite model, but they are marked as untrainable because the discriminator is updated separately. The composite model is updated with two targets: one asserting that the produced images are authentic (cross-entropy loss), which forces the generator to make large weight updates toward more realistic images, and the actual real translation of the image, which is compared to the generator model's output (L1 loss).
GAN models hardly ever converge; instead, a balance is established between the generator and discriminator models. As a result, it is difficult to decide when to stop training. During training, we can save the model regularly and use it to generate sample image-to-image translations; for example, after every 10 training epochs, we examine the generated images and choose a final model based on the image quality. In this study, we propose a pedestrian detection system capable of detecting pedestrians in two scenarios, rainy and un-rainy, as presented in Figure 8. The first step of our system is to determine whether or not it is raining in the input image, using our proposed rainy detector based on deep convolutional neural networks as described in Figure 9. If the image is rainy, we use the generator from the generative adversarial network to transform the rainy image into an un-rainy image; if not, it passes directly to YOLO v3 to detect the positions of pedestrians in the image.

EXPERIMENTAL RESULTS & DISCUSSION
To build our GAN model, we downsized the VOC2014 dataset [24] to 256×256 and divided it into two folders, one for training and one for testing. The train folder contains 1,000 images, numbered 000001 to 001979, whereas the testing dataset contains 3,952 images, numbered 001983 to 009963. Each image includes a pair: a rainy image on the left and an un-rainy image on the right. Rainy images are generated using the add_rain function from the Automold source code (add_rain(clean, slant=-20, drop_length=20, drop_width=1, rain_type='heavy')). Additionally, we trained our proposed rainy detector CNN using 600 images (from 000001 to 001190) for training and 400 images (from 001193 to 001979) for validation.
In our work we:
− Load and prepare the rainy-affected images from the original image dataset.

− Develop a Rainy Detector model to determine whether or not it is raining in the input images.
The average precision (AP) metric, which measures the area under the precision-recall curve, was used to evaluate our models. It is a widely used metric for evaluating the accuracy of object detectors. We computed the average precision in this work using the Cartucho source code [25]. The results presented in Table 1 show that the average, Gaussian, median, bilateral, and non-local means filters do not help the images with raindrops, and instead make the detection results worse. On the other hand, our proposed system succeeded in restoring rainy images to un-rainy images and achieved better pedestrian detection performance. The only limitation of our work is that our system achieves about 10 FPS for non-rainy images and 6 FPS for rainy images, due to hardware and software limitations.
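As a sketch of the metric itself, AP can be computed from a precision-recall curve with the all-point interpolation used by PASCAL VOC-style evaluation tools such as [25] (a simplified illustration, not the evaluation code used in this work):

```python
import numpy as np

def average_precision(recall, precision):
    """Area under the precision-recall curve, with the precision envelope
    made monotonically decreasing (PASCAL VOC all-point interpolation)."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # Replace each precision value with the maximum to its right.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangle areas where recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# A perfect detector (precision 1 at every recall level) scores AP = 1.0.
print(average_precision(np.array([0.5, 1.0]), np.array([1.0, 1.0])))  # 1.0
```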

CONCLUSION
Advanced driving assistance systems are becoming more capable thanks to in-vehicle infrastructures. However, on rainy days, the detection rate remains poor: rain streaks accumulate and obstruct the camera's view, and most pedestrians wear raincoats or hold umbrellas on rainy days, resulting in a high number of occlusions. Given the difficulty of detecting pedestrians in the rain, this study proposed a new PDS that includes a de-raining subsystem to detect pedestrians in both rainy and non-rainy conditions. Our proposed PDS outperforms both the existing YOLO v3 method and the traditional noise-removal algorithms. Developing a neural network that excels in one area but fails in others is not a viable strategy for self-driving vehicles. Our long-term goal is to develop deep-learning architectures and solutions that can detect objects in a variety of environments. We are also interested in using thermal imaging cameras to detect pedestrians because of their ability to see in complete darkness, light fog, light rain, and snow.

Figure 4. The flowchart of the encoder model

Moreover, to demonstrate the effectiveness of our proposed PDS model against weather degradation, we compared it with multiple PDS models based on noise-removal filters, as presented in Figures 10 to 14.

−
ISSN: 2252-8938 Int J Artif Intell, Vol. 12, No. 4, December 2023: 1557-1568 1566 Build a Pix2Pix model to transform rainy images to un-rainy images.− Use the final Pix2Pix generator model to transform rainy images to un-rainy images.− Use the pre-trained YOLO v3 model to detect pedestrians in images.

Table 1. Performance of the proposed pedestrian detection systems