Photo-realistic photo synthesis using improved conditional generative adversarial networks

ABSTRACT


INTRODUCTION
Automatically converting face sketches into photo-realistic images has many applications in electronic entertainment, the arts, security services, and several other domains. In law enforcement, a face photograph of poor identification quality and a sketch drawn by a police or forensic artist from an eyewitness's description are two very different things. Producing a realistic colour picture by hand requires an experienced team with drawing and painting expertise in addition to solid investigative abilities. Face recognition is a well-studied topic in several application sectors. Nonetheless, matching drawings to digital facial photos is a crucial law enforcement application that has received comparatively little attention.

Int J Artif Intell, ISSN: 2252-8938. Photo-realistic photo synthesis using improved … (Raghavendra Shetty Mandara Kirimanjeshwara)

Forensic drawings are based on the memory of an eyewitness and the skill of a sketch artist. Because the description supplied by the eyewitness is limited, incomplete, or approximate, forensic drawings contain many inaccuracies, as seen in Figure 1. Typically, forensic drawings are compared manually with a database of digitised face photos of identified people. Existing state-of-the-art face recognition algorithms cannot be used directly and need extra processing to account for the non-linear variations between face sketches and digital face photos. An automated system that matches a drawing to a digital facial picture could be a great help to law enforcement organisations by making the identification process faster and more accurate.
Figure 1. Forensic art that exaggerates face features

Automatic synthesis of realistic face portraits from sketches using a generative adversarial network (GAN) [1] may increase the likelihood of recognition and boost the case-solving efficiency of security agencies. For this reason, considerable work in artificial intelligence (AI) has addressed how to turn drawings into portraits [2], [3]. One common technique for generating images is to use convolutional neural networks (CNNs) to transfer artistic effects from one image to another. However, CNN-based image generation requires both the source image and the destination image to be given. A photorealistic picture can also be generated with standard graphics theory by recreating the curves, skin tone, and lighting of a realistic one [4]-[6]. In practice, typical graphics algorithms work well, but constructing and modifying the visual context is costly and tedious, and each detail of the region needs to be defined precisely. Pix2pix can produce portraits using trained models, but the wide variation in test results indicates that it is still far from perfect: image contours are sometimes ambiguous, and some results lack intricate details and image texture. In this research, we incorporate fixes for these problems into the pix2pix generative model. The new model extracts edge information, adjusts the image contour, and constrains the convergence of portrait generation. Our experiments show that the modified version of pix2pix effectively fixes the edge-blurring problem in face image synthesis and can also serve as a reference for similar image-generation applications.
Several researchers have used adversarial learning to transform one image into another. Training data consists of pairs of input and output pictures; the input photographs are converted into the desired target images by following the examples provided. Recent advances in image generation may be attributed to the rapid growth of deep learning, particularly the introduction of generative adversarial networks (GANs) [7]. The purpose of this research is to refine the model and derive edge information from the dataset itself. The model's image-generation capability is constrained by the need for high similarity between test and training imagery, and using a boundary map further complicates the already challenging task of building a dataset. Super-resolution is a well-established technique in computer vision [8], [9].
To study dynamic changes in facial expression, the author of [10] used GANs to create static facial expression photos from a neutral (expressionless) image. Experimental findings indicate that the method yields superior facial-expression images.

In [11], the discriminator receives, for verification purposes, a composite of the image produced by the generator network and the edge map produced by the subsequent edge network. Experiments show that this approach generates colour portraits from drawings more successfully than pre-existing techniques, and that its outputs have more distinct and convincing edges than those generated by a pix2pix model. The average structural similarity index measure (SSIM) is 82.78% with that approach, compared with 42.99% for pix2pix and 78.60% for alternative techniques.

The author of [12] investigates image generation guided by hand-drawn sketches. Because of the strict requirements imposed by image-to-image translation, the output follows the input edges even when the input sketch is poorly drawn. Instead, the sketch is used as a weak constraint, in which the output edges are not required to match the input edges. The issue is addressed with a novel joint image completion method in which the sketch provides the visual context for image completion or generation.

The author of [13] demonstrated a novel generative adversarial network (GAN) technique that generates realistic pictures from 50 categories, including motorbikes, horses, and sofas. A fully automated data augmentation approach for drawings is presented, and the augmented data is shown to benefit the task. A novel network building block, suitable for both the generator and the discriminator, injects the input picture at different scales. The method creates more realistic visuals and obtains much higher Inception scores than state-of-the-art image translation techniques.

Using convolutional neural network (CNN)-based feature extraction from the Modified National Institute of Standards and Technology (MNIST) dataset and algebraic fusion of several classifiers trained on varied feature sets (obtained via feature selection applied to the CNN-extracted feature set), the author of [14] described a system capable of recognising a wide range of images. The author of [15] designed a neural algorithm that can separate and recombine the content and the artistic style of natural photos. By combining the subject matter of any given photograph with the aesthetics of well-known works of art, the algorithm enables the generation of new images of high perceptual quality. The results shed new light on the deep image representations that convolutional neural networks learn and show how these networks may be used for sophisticated image creation and manipulation.
Normalized direction-preserving Adam (ND-Adam), proposed in [16], improves generalisation performance by enabling finer control over both the direction and the step size of weight vector updates. Following similar reasoning, the authors further improve generalisation in classification problems by regularising the softmax logits. The aim is not only to close the gap between stochastic gradient descent (SGD) and Adam, but also to shed light on why some optimization methods generalise better than others.
Pix2pix, a specialised GAN model, defines image translation as a mapping between input and output pixels. The discriminator uses a convolutional classifier called "PatchGAN", while the generator uses a U-Net structure. Unlike CNNs and other GANs, pix2pix offers a general-purpose solution to the image translation problem. Extensive conditional training is employed to learn the loss function of the image translation problem automatically, and this learned loss then constrains the direction of image translation and convergence. When translating a sketch into a photo-realistic image or altering the style of real images, for example, fine details and realistic textures are often lost during the transformation because the structures of the semantic image and the target image are so different [17], [18]. To combat edge blur in the converted photos, Wang manually added edge information to every label image during model training.
Existing face-sketch synthesis approaches fall into four main types: subspace learning, sparse representation, Bayesian inference, and deep-learning-based techniques. The first three are data-driven, while the last is model-driven [19]. In the deep-learning-based techniques, the mapping function is learned beforehand and then used to perform the transformation. Past works and their approaches are summarised in Table 1.

Table 1. Summary of past works and their approaches
Reference | Task | Approach | Evaluation metrics
- | - | Generative adversarial networks, SketchyGAN | Semantic accuracy, fooling rate, Inception score
[5], [11] | High-resolution image synthesis | Conditional GAN | Pixel accuracy, mean intersection-over-union, peak signal-to-noise ratio (PSNR) and SSIM
[6] | Image-to-image translation | Cycle-consistent adversarial networks | FCN score
[12] | Face sketch-photo synthesis | Contextual GAN | SSIM, verification accuracy
[17] | Sketch-photo synthesis | Sparse representation | Verification accuracy
[18] | Face | - | -

According to some sketch artists, making a drawing is not a well-understood psychological phenomenon, but a sketch artist usually focuses on facial characteristics and texture, which he or she attempts to capture in the sketch through a combination of soft and prominent edges. Local descriptors can therefore express facial patterns effectively in both drawings and digital face photographs, which inspired the proposed technique. This study uses an improved pix2pix cGAN model to match drawings with digital facial pictures automatically. It also introduces a preprocessing method to improve forensic sketch-digital picture pairings; pre-processing the forensic sketches boosts performance by 4-5%. We use three datasets: the Chinese University of Hong Kong (CUHK) student face sketches and the Indraprastha Institute of Information Technology Delhi (IIIT-D) face sketches, each with corresponding digital images and available online [23], [24], and a self-generated dataset of students at Canara Engineering College, Mangalore.

METHOD
While examining current sketch-photo synthesis methods, we identified a potential drawback that might hinder face image retrieval: because the number of nearest neighbours is fixed, the pseudo images produced by these approaches have poor resolution, as can be seen in [25]. We therefore investigate GANs in a conditional setting. Just as generative adversarial networks (GANs) learn a generative model from data, conditional GANs (cGANs) learn a conditional generative model [26]. The ability to condition on an input image and produce a corresponding output image makes cGANs well suited to "image-to-image translation" applications.

Pix2pix cGAN architecture
Pix2pix is a type of conditional GAN (cGAN) in which generation of the target picture is conditioned on the source image. The network comprises a generator and a discriminator. The generator produces a picture from the input. The discriminator compares the supplied picture with a candidate image and guesses whether the candidate was generated. The generator is updated to minimise the loss that the discriminator predicts for the generated images. To prevent overfitting, the generator is guided only indirectly by the loss functions during training and is never presented with the training dataset directly. In addition, dropout layers are used in both the training and testing phases, which also provides a source of randomness for the generator [7], [27]. The discriminator model can be updated directly, but the generator model can only be updated indirectly. To this end, a composite model is constructed in which the discriminator takes the output of the generator as an input; stacking the generator on top of the discriminator creates this combined architecture. The sharpness loss term was proposed in Sharp-GAN to produce nuclei with distinct borders [7]. It imposes heavy penalties on contour pixels that show little variation from their neighbours, which addresses the blurry-boundary problem of GANs. The total loss and its component terms L1 and L2 are defined in (1)-(3): matched against a discriminator D that seeks to maximise the loss, the generator G seeks to minimise the loss L. The generator is also updated to reduce the L1 loss, commonly known as the mean absolute error, between the generated image and the target image. The generator is thus updated using a weighted sum of the adversarial loss obtained from the discriminator's output and the L1 loss. Figure 2 illustrates the procedural diagram of the pix2pix model for sketch-to-image synthesis.
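For reference, the objective of the original pix2pix formulation can be written as below. This is a sketch under the assumption that the model follows that standard formulation; it is not a reconstruction of equations (1)-(3), and it omits any sharpness term. The symbols x (input sketch), y (target photo), z (generator noise), and the weight \lambda are introduced here for illustration.

```latex
\begin{align*}
\mathcal{L}_{cGAN}(G, D) &= \mathbb{E}_{x,y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big] \\
\mathcal{L}_{L1}(G) &= \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z) \rVert_{1}\big] \\
G^{*} &= \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G, D) + \lambda\, \mathcal{L}_{L1}(G)
\end{align*}
```

The first term rewards the discriminator for separating real pairs from generated ones, the second keeps the generated image close to the target pixel by pixel, and \lambda sets the balance between the two in the generator update.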

Pix2pix cGAN training
Two sets of identical pix2pix cGAN models were trained using all available images of each class. Each class's pre-processed images were loaded as random pairs of source and target images. The loaded image pairs are scaled so that pixel values range from -1 to +1 rather than from 0 to 255. GAN models typically settle into an equilibrium between the generator and the discriminator rather than converging, so it is difficult to decide when to stop training. We therefore saved the model and its weights at regular intervals during training and generated sample images for quality evaluation. Model weights are initialised from a random Gaussian distribution with a mean of 0.01 and a standard deviation of 0.02. Because the discriminator trains faster than the generator, the discriminator loss is weighted by 50% at every model update to slow its training. The proposed approach is shown in Figure 3.
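The preprocessing and weighting steps described above can be sketched in plain Python. This is an illustrative sketch, not the training code used in the experiments: the function names are hypothetical, images are represented as flat lists of pixel values, and binary cross-entropy is assumed as the discriminator loss.

```python
import math
import random


def scale_pixels(pixels):
    """Map 8-bit pixel values in [0, 255] to floats in [-1, 1]."""
    return [(p - 127.5) / 127.5 for p in pixels]


def unscale_pixels(values):
    """Inverse mapping, from [-1, 1] back to integers in [0, 255]."""
    return [int(round(v * 127.5 + 127.5)) for v in values]


def init_weights(n, mean=0.01, stddev=0.02, seed=0):
    """Draw n initial weights from a Gaussian (mean and stddev as stated in the text)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, stddev) for _ in range(n)]


def binary_cross_entropy(prob, label, eps=1e-12):
    """Cross-entropy between a predicted probability and a 0/1 label (assumed loss)."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def discriminator_loss(d_real, d_fake, weight=0.5):
    """Loss on one real pair and one generated pair, scaled by `weight`.

    A weight of 0.5 halves the discriminator's update signal relative to
    the generator, which slows discriminator training.
    """
    loss_real = binary_cross_entropy(d_real, 1)
    loss_fake = binary_cross_entropy(d_fake, 0)
    return weight * (loss_real + loss_fake)
```

For example, `scale_pixels([0, 255])` maps the two ends of the 8-bit range to the ends of the [-1, 1] interval, which matches the output range of a tanh-activated generator if one is used.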

Database used
Because collecting face drawings is difficult, few datasets of human-drawn sketches with matching face photographs exist. To evaluate our method against existing ones, we use 88 subjects from the CUHK student dataset as the training set and the remaining 518 subjects as the testing set, which consists of 123 images from the AR face dataset, 295 images from the XM2VTS dataset, and the remaining 100 images from the CUHK student dataset. In addition to the CUHK dataset, we use a mixture of the IIIT-D dataset and data generated in-house. The IIIT-D database contains 238 sketch-digital picture pairs, whose sketches were drawn from digital photographs from various sources: 72 pairs from the IIIT-D student and staff database, 99 from the Labeled Faces in the Wild (LFW) database, and 67 from the Face and Gesture Recognition Network (FG-NET) ageing database. The custom dataset consists of face sketches of students at Canara College of Engineering, Mangalore, together with the corresponding digital photographs captured under a variety of lighting conditions.
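As a quick consistency check on the counts quoted above, the split sizes can be tallied as follows. The variable names are illustrative; only the numbers come from the text.

```python
# Subject counts as stated in the text.
cuhk_train = 88
test_sources = {"AR": 123, "XM2VTS": 295, "CUHK student": 100}
iiitd_sources = {"IIIT-D student and staff": 72, "LFW": 99, "FG-NET": 67}

test_total = sum(test_sources.values())    # size of the testing set
iiitd_total = sum(iiitd_sources.values())  # sketch-photo pairs in IIIT-D
all_subjects = cuhk_train + test_total     # training plus testing subjects
train_fraction = cuhk_train / all_subjects
```

The tallies agree with the stated totals of 518 test subjects and 238 IIIT-D pairs, and they show that the training set is roughly 15% of the 606 subjects in this split.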

RESULTS AND DISCUSSION
To design and implement the proposed model, we used TensorFlow, an open-source framework, and the Python programming language. The experiment was executed on Windows 11. Because the PSNR index has its own limitations and is not sufficient to describe the quality and visual characteristics of the generated photo [28], the SSIM index is used for further comparison. In the SSIM definition, $\mu_x$ and $\mu_y$ are the mean values of the real and generated images, $\sigma_x^2$ and $\sigma_y^2$ are their variances, and $\sigma_{xy}$ is their covariance. The relative performance comparison is shown in Table 2.
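A minimal sketch of the two indices is given below, assuming the standard definitions of PSNR and SSIM. It is not the evaluation code used for Table 2: images are flat lists of pixel values, SSIM is computed over a single global window rather than the usual sliding window, population (not sample) variance is used, and the constants k1 = 0.01 and k2 = 0.03 are the commonly used defaults.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def ssim_global(x, y, max_value=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM computed over whole images (no sliding window)."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (k1 * max_value) ** 2  # stabilises the luminance term
    c2 = (k2 * max_value) ** 2  # stabilises the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

PSNR depends only on the mean squared error, so two images with the same error energy but very different structure receive the same score; SSIM also compares means, variances, and covariance, which is why it is the more informative index for judging generated faces.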
Different generators are compared with all other components fixed. In particular, we evaluate our generator against the state-of-the-art U-Net and cascaded refinement network (CRN) generator designs, as shown in Table 3. Both semantic segmentation scores and findings from human perceptual studies are considered in our performance evaluation. As seen in Figure 4, early in the run all three losses exhibit considerable randomness before levelling out between epochs 175 and 200. After that point, the losses are stable, although their variability grows. Figure 5 shows the visual output of sketch-to-image synthesis.

CONCLUSION
Using the pix2pix generative model, we investigated the problem of photo-sketch synthesis. The proposed approach was designed to help GANs produce high-resolution photorealistic pictures from drawings. Three datasets are used for analysis, and the outcomes are compared with those of the most recent generative methods available. The results indicate that the proposed strategy significantly enhances visual quality, and extensive experiments on paired data show that it outperforms the alternatives we investigated. The proposed method achieved a pixel accuracy of 82.7% in our experiments. As future work, hyperparameter optimisation is suggested to see whether performance can be improved further by selecting the best possible feature subsets.


Int J Artif Intell, Vol. 13, No. 1, March 2024: 516-523, ISSN: 2252-8938
completion or generation. Using joint pictures, we train a contextual generative adversarial network (GAN) to learn the joint distribution of a drawing and its associated image.

Figure 2. Procedural diagram of pix2pix model for image synthesis
Figure 4(a) shows the training loss curve for D_fake, Figure 4(b) shows the training loss for D_real, and Figure 4(c) shows the training loss for G_GAN over 200 training epochs.

Figure 4. Training loss curve of the proposed model; (a) D_fake training loss, (b) D_real training loss, and (c) G_GAN training loss
The hardware was an Intel i7 12700k 12-core CPU with a 4 GB GPU. Training ran for 200 epochs with a batch size of 64 and a learning rate of 0.0001, using the Adam optimizer. To characterise the colour rendering quality of different models more objectively, we employ two primary indices: the peak signal-to-noise ratio (PSNR) in (4) and the structural similarity index measure (SSIM) in (6).

Table 2. PSNR and SSIM comparison with related work

Table 3. Generator design comparison