Dataset for classification of computer graphic images and photographic images

ABSTRACT


INTRODUCTION
Rapid advancement in computer graphic (CG) image rendering techniques has given rise to applications such as animation, cartoons, gaming, photo-realism, virtual reality and many more [1]. Sophisticated CG software tools allow users to produce synthetic images that are so close to reality that it is difficult to tell whether an image is camera captured or computer generated [2], [3]. If such images are used illegally in a court of law, journalism, criminal investigation, or political propaganda, they may pose a serious threat to society [4]. In such cases, verifying the authenticity of images is a major challenge in digital image forensics.
To classify photo-realistic computer graphics (PRCG) and photographic images (PG), three benchmark datasets, shown in Table 1, are used in the literature to evaluate the performance of classification models: i) the Columbia Photographic Images and Photorealistic Computer Graphics Dataset (Columbia dataset); ii) the DSTok dataset; and iii) a dataset created by Rahmouni et al. The DSTok dataset is the largest in the literature, with 9,700 samples. These datasets lack diversity in image content and sample size and, more importantly, their images were produced or captured using older versions of CG software and older camera models. Advances in digital imaging and rendering techniques have made it easy for users to capture high-quality photographs and to produce photograph-like images that far surpass the graphical content produced with older CG rendering techniques. Hence, the datasets also need to be upgraded in order to evaluate recent innovations in CG and PG image classification. In this paper, we propose two new datasets, namely the "JSSSTU CG and PG image dataset" and the "JSSSTU PRCG image dataset". The former is created with the intention of having diversified image content across the CG and PG image categories and comprises 14,000 samples. The latter is created with the intention of containing only photo-realistic computer graphics that are hard to distinguish with the naked eye and consists of 2,000 samples. Our new datasets are challenging and will help researchers working on the problem of "classification of computer graphic images and photographic images" to develop more efficient or improved classification models. Researchers have addressed this problem from different perspectives based on conventional machine learning and deep learning approaches.

a. Conventional machine learning
Significant improvement has been made in recent years in classifying CG and PG images. Existing conventional machine learning techniques can be grouped into three categories based on the features selected for classification: i) camera-characteristic based approaches [4], [8]-[12]; ii) spatial feature based approaches [13]-[20]; and iii) geometric feature based approaches [21]-[28].

− Camera-characteristic based approaches: The techniques used to generate CG and PG images follow different pipeline architectures. Since PG images are acquired using digital cameras, they exhibit distinct intrinsic properties that are not present in CG images. Based on this fact, several identification approaches have been described in [8]-[10]. Dehnie et al. [8] employed the pattern noise caused by defects in camera sensors to classify CG and PG images. Dirik et al. [9] proposed features that detect traces of the color filter array (CFA) and chromatic aberration to distinguish CG and PG images. Khanna et al. [10] described a method based on residual pattern noise to distinguish scanner, CG and PG images. Photo response non-uniformity (PRNU) noise is used as a digital fingerprint to identify the source camera in digital forensics, and this is exploited in [4], [11], [12]. Peng et al. [11] proposed a method based on the theory of multifractal spectra, in which multifractal spectrum features of the PRNU are extracted from an image to distinguish PRCG and PG images. Peng and Zhou [4] examined the changes in PRNU correlations and used histogram features extracted from variance histograms of the PRNU to identify CG and PG images. Long et al. [12] proposed a method based on binary measures computed from PRNU noise in the RGB channels to depict the differences between CG and PG images.
− Spatial feature based approaches: Pan et al. [13] showed that the perceptual difference between CG and PG images is generally present in color and coarseness. The former is represented using fractal dimensions and the latter is described using generalised dimensions. Wu et al. [14] computed the difference in the histograms of images and used several of the higher histogram bins as features for classification. Local binary pattern (LBP) [15] is a texture descriptor widely used in image texture analysis, and it is employed in [16], [17] to classify CG and PG images. Peng et al. [18] proposed a method based on statistical and textural features. Tan et al. [19] presented a scheme using local ternary count (LTC), which produces 54-dimensional features from normalized histograms. Peng et al. [20] proposed a hybrid feature by analysing the differences in the textures of the residuals of CG and PG images.
− Geometric feature based approaches: In [22], an alpha-stable model is built to describe the wavelet decomposition coefficients of PG images, and the wavelet domain is used to extract fractional lower order moments of images. Zhang and Wang [23] found that the imaging features and visual features of images produced by different image acquisition processes reveal different statistical regularities in the wavelet domain. Based on this principle, statistical features and the cross correlation of wavelet coefficients extracted from each sub-band are used as features. Guo and Wang [24] presented a method based on multiwavelets which extracts features in the wavelet sub-bands. Fan et al. [25] proposed a scheme based on a modified image contour transform in HSV color space to classify CG and PG images; statistics such as the average value, variance, skewness and kurtosis are computed in the wavelet domain. Birajdar and Mankar [26] used the discrete wavelet transform to extract binary statistical image features by decomposing an image into sub-bands, and then employed a fuzzy entropy measure to select the relevant features. The quaternion wavelet transform is presented in [27] and [28], where it is used to extract statistical features to classify CG and PG images.

b. Deep learning approaches
Rahmouni et al. [7] proposed a scheme that combines statistical feature extraction with a convolutional neural network (CNN) architecture; the class label of the entire image is then predicted using a weighted voting scheme that aggregates the local estimates of the class probabilities. Nguyen et al. [29] customized the VGG-19 architecture to extract generic features in the first three convolutional layers, followed by a statistical pooling layer constructed as proposed in [7]. Pre-trained CNN models are employed in [30]-[33] and fine-tuned through transfer learning for binary classification. Chawla et al. [34] proposed a five-layer CNN architecture that introduces a special layer applying prediction error filters at the first convolutional layer to capture the correlation between pixels in PG and CG images. To predict the label of the original picture, two methods are used, namely a weighted voting scheme and a majority voting scheme. The former labels the image by aggregating the class probabilities, and the latter assigns the label that appears in the majority of the image patches. Yao et al. [35] employed three kinds of high-pass filters to extract sensor noise residuals, which are then fed into the proposed five-layer CNN framework. Quan et al. [36] proposed a new CNN framework with two cascaded convolutional layers at the end of the network. He et al. [37] described a deep learning approach that combines a CNN and a recurrent neural network (RNN). Thereafter, He et al. [38] proposed an attention-based dual-branch CNN to extract features from combined color components. Meena and Tyagi [39] proposed an ensemble model that combines the features produced by a pre-trained VGG-19 CNN with noise features produced using high-pass filters to discriminate CG and PG images.
From the above study, even though much progress has been made in the classification of CG and PG images, the existing techniques and the datasets used to evaluate their performance still have the following limitations: i) in the existing datasets, sample size and image content are limited and do not reflect the high-quality image content made possible by advances in image rendering techniques; ii) in prior works, the accuracy of the classification model depends on the choice of feature descriptor used for classification.
In the task of distinguishing CG and PG images, the contributions of the paper are outlined as follows:
− Due to the non-availability of a large, heterogeneous dataset containing CG and PG images, the 'JSSSTU CG and PG image dataset' is created. The 'JSSSTU PRCG image dataset', which exhibits high photo-realism, is also created.
− The effectiveness of existing texture based feature descriptors and CNN based deep learning techniques is investigated on our new datasets and benchmark datasets.
The remainder of this paper is organised as follows: section 2 presents a description of the state-of-the-art techniques based on conventional machine learning and deep learning. Section 3 demonstrates the performance of the techniques on our new datasets and benchmark datasets through experimental results. The conclusion is given in section 4.

RESEARCH METHOD
In this section, state-of-the-art conventional machine learning and CNN based deep learning techniques are implemented for the task of classifying CG and PG images. Handcrafted textural features are considered for conventional machine learning, and pre-trained CNN models of the VGG family are used for deep learning. They are described in the following sections.

Conventional machine learning techniques
Texture describes the surface characteristics of an image. The widely used texture descriptors gray level co-occurrence matrix (GLCM) [40] and LBP [15] are employed to analyse the surface texture of CG and PG images. The textured surfaces of CG images appear smoother than those of PG images, which is a basic difference between them. Hence, these texture descriptors are used in our work.

GLCM descriptor
GLCM is a statistical method which counts the occurrences of pairs of pixels (gray levels) at a particular orientation over an image or image region. It is parameterized by (Θ, d), where 'Θ' represents the orientation and 'd' is the distance between the two picture elements. The GLCM descriptor allows rotational invariance, and it is defined by 8 orientations separated by π/4 radians. Haralick et al. [40] defined 14 statistical properties computed from the normalized GLCM matrix. In this work, we employ four of these properties: contrast, correlation, energy and homogeneity.
− Contrast: contrast computes the local intensity variation between a picture element and its neighbor over the entire image, as given in (2). Its range is calculated using (1). Here, GLCM represents the normalized matrix, Imn denotes the (m, n)th entry of the normalized GLCM, and the variables m and n in (2)-(5) index that entry.
− Correlation: correlation computes the correlation of a picture element with its neighbor over the entire image, as given in (3). It returns a value between -1 and 1, with the extremes reached for a perfectly negatively or positively correlated image; for a constant image the value is undefined. Here, 'µ' and 'σ' indicate the mean and standard deviation of the marginal distributions associated with Imn/R, and R is a normalizing constant.
− Energy: energy, also termed the angular second moment, computes the sum of squared elements of the GLCM, as given in (4). It returns a value between 0 and 1, and equals 1 for a constant image.
− Homogeneity: homogeneity measures how close the distribution of the GLCM elements is to the GLCM diagonal, as given in (5). It returns 1 for a diagonal GLCM and a value between 0 and 1 otherwise.
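The four properties can be illustrated with a minimal sketch in plain Python. It is not the implementation used in this work. It assumes the commonly used definitions, namely contrast as the sum of (m-n)^2 Imn, correlation as the sum of (m-µm)(n-µn) Imn divided by σm σn, energy as the sum of Imn^2, and homogeneity as the sum of Imn / (1+|m-n|); the exact forms of (2)-(5) may differ. The function names `glcm` and `glcm_properties` and the single pixel offset are hypothetical choices made for the example.

```python
import math


def glcm(image, dx, dy, levels):
    """Normalized co-occurrence matrix of a 2-D list of gray levels.

    (dx, dy) is the column/row offset between the two pixels of a pair,
    e.g. (1, 0) for orientation 0 and distance d = 1 (assumed convention).
    Assumes the image contains at least one valid pixel pair.
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1.0
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_properties(p):
    """Contrast, correlation, energy and homogeneity of a normalized GLCM."""
    idx = range(len(p))
    cells = [(m, n, p[m][n]) for m in idx for n in idx]
    mu_m = sum(m * v for m, n, v in cells)
    mu_n = sum(n * v for m, n, v in cells)
    var_m = sum((m - mu_m) ** 2 * v for m, n, v in cells)
    var_n = sum((n - mu_n) ** 2 * v for m, n, v in cells)
    denom = math.sqrt(var_m * var_n)
    cov = sum((m - mu_m) * (n - mu_n) * v for m, n, v in cells)
    return {
        "contrast": sum((m - n) ** 2 * v for m, n, v in cells),
        # Undefined (NaN) when a marginal has zero variance, e.g. constant image.
        "correlation": cov / denom if denom > 0 else float("nan"),
        "energy": sum(v ** 2 for m, n, v in cells),
        "homogeneity": sum(v / (1 + abs(m - n)) for m, n, v in cells),
    }
```

For a constant image this sketch gives an energy and a homogeneity of 1, a contrast of 0 and an undefined correlation, which matches the limiting cases described above. A four-dimensional feature vector could then be obtained, for instance, by averaging each property over the 8 orientations, although that aggregation step is an assumption here.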

LBP descriptor
LBP is a texture descriptor proposed by Ojala [15], which encodes each pixel of an image by comparing its neighboring pixels with the center pixel value. If the intensity of a neighboring pixel is greater than or equal to the intensity of the center pixel, the neighboring pixel is marked as 1; otherwise, it is marked as 0, which results in a binary sequence. This bit vector is then converted into a decimal number, which replaces the center pixel value. The LBP descriptor of every pixel in an image is computed using (6), and f(s) is given in (7). Here, In and Ic indicate the intensity of the neighboring and the current (center) pixel, respectively, and N represents the number of neighbors chosen at a radius of R. In this work, we choose N=8 neighbors with a radius R=1.
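The encoding described above can be sketched as follows for 8 neighbors at radius 1. This is an illustrative sketch only: it assumes that (6) has the usual form, a sum over the neighbors of f(In - Ic) 2^n with f(s)=1 for s≥0 and 0 otherwise, and the neighbor ordering is an arbitrary choice. The mapping of codes to 59 bins via "uniform" patterns (at most two 0/1 transitions around the circle) is also an assumption; it is consistent with a 59-dimensional LBP feature but is not confirmed by the text. All names are hypothetical.

```python
# Offsets (row, col) of the 8 neighbors at radius 1, clockwise from top-left.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """LBP code of the interior pixel (r, c) of a 2-D list of intensities."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBORS):
        if image[r + dr][c + dc] >= center:  # f(s) = 1 when s >= 0
            code |= 1 << bit
    return code


def is_uniform(code, bits=8):
    """True if the circular bit pattern has at most two 0/1 transitions."""
    rotated = ((code << 1) | (code >> (bits - 1))) & ((1 << bits) - 1)
    return bin(code ^ rotated).count("1") <= 2


UNIFORM_CODES = [code for code in range(256) if is_uniform(code)]
BIN_OF = {code: i for i, code in enumerate(UNIFORM_CODES)}


def lbp_histogram(image):
    """Normalized histogram: one bin per uniform code plus one shared bin."""
    hist = [0.0] * (len(UNIFORM_CODES) + 1)
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            code = lbp_code(image, r, c)
            hist[BIN_OF.get(code, len(UNIFORM_CODES))] += 1.0
    total = sum(hist)
    return [h / total for h in hist]
```

Border pixels are skipped in this sketch because they lack a full neighborhood; other border policies (padding, mirroring) are equally plausible.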

Support vector machines (SVM) classifier
SVM [41] is among the most widely used and effective supervised machine learning algorithms for classification problems. It can be used to perform linear and non-linear classification. In this work, we perform non-linear classification, since the feature vectors could not be separated linearly. The radial basis function (RBF) kernel is chosen in the experimentation, and it is described in (8). Here, ||u1-u2|| in (8) represents the Euclidean distance between the two feature vectors u1 and u2, and 'σ²' represents the variance.
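As a small worked example of the kernel, the sketch below evaluates an RBF kernel in its common form K(u1, u2) = exp(-||u1-u2||² / (2σ²)). Whether (8) uses exactly this scaling of the variance is an assumption, and the function name `rbf_kernel` is hypothetical.

```python
import math


def rbf_kernel(u1, u2, sigma=1.0):
    """RBF kernel value for two equal-length feature vectors.

    Assumed form: exp(-||u1 - u2||^2 / (2 * sigma^2)).
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(u1, u2))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))
```

The kernel equals 1 for identical feature vectors and decays towards 0 as the Euclidean distance grows, with σ controlling how quickly the similarity falls off.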

CNN based deep learning techniques
Among deep neural networks, CNN based deep learning techniques have shown their effectiveness by automatically learning features, from general to specific, based on the image content. Pre-trained CNN models such as AlexNet [42], VGG (VGG16 and VGG19) [43], GoogLeNet [44], and ResNet [45] have shown great performance in classifying images into 1000 object categories such as pencil, keyboard and mouse, and have become standard models for classification tasks. These models are trained on millions of images and have learnt rich feature representations for a wide range of PG images in the ImageNet database.
Training a deep ConvNet model from scratch on a large dataset can take several days, weeks or even months. A pre-trained neural network model is a better choice for solving similar kinds of problems on a smaller dataset. In this work, two pre-trained VGG variants, VGG16 and VGG19, are adopted because these models have fixed-size kernels, which take less time to process and easily capture small patterns. Transfer learning is applied to perform classification on a new dataset which contains CG and PG images. It can be carried out in two ways: feature extraction and fine-tuning. The latter is adopted in our work: the last three layers of the pre-trained neural network models are replaced, and these layers are fine-tuned for classification of CG and PG images.

Visual geometry group (VGG) architecture
The VGG architecture can be viewed as an input layer, feature extraction layers and classification layers. In the input layer, a color image of fixed size 227×227 is fed to the architecture during training. The image is pre-processed by subtracting the mean RGB value, computed on the training set, from each pixel. During feature extraction, the image is passed through a series of convolutional layers with filters of fixed size 3×3. Spatial padding and stride are fixed to 1 pixel, which preserves the spatial dimension after convolution. The depth of the convolutional layers begins at 64 in the first layer and increases by a factor of 2 after every max pooling layer until it reaches 512. The spatial dimension of the image is reduced by the max pooling layers, which use a filter of size 2 and a stride of 2. Five max pooling layers are used in the architecture, each following a group of convolutional layers. The classification layers consist of three fully connected layers: the first two have 1024 neurons each and the third contains two neurons; at the end, a sigmoid activation function is used to perform binary classification, producing a value in the range 0 to 1, as described in (9). The VGG variant CNN architectures are presented in Table 2 (the parameters of the convolutional layers are denoted as conv(block)-(number of filters)_(layer number within the block); ReLU is not shown for brevity).
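The arithmetic behind these statements can be traced with a short sketch. It assumes the standard output-size formula, out = floor((in + 2·padding - kernel) / stride) + 1, and pooling without padding (so odd sizes are rounded down); the output volumes listed in Table 2 may follow a different rounding convention. The function name `vgg_feature_shapes` is hypothetical.

```python
def vgg_feature_shapes(input_size=227, first_depth=64, max_depth=512, num_blocks=5):
    """Return (block, depth, spatial size after pooling) for each block."""
    shapes = []
    size, depth = input_size, first_depth
    for block in range(1, num_blocks + 1):
        # 3x3 convolution, padding 1, stride 1: spatial size is preserved.
        conv_size = (size + 2 * 1 - 3) // 1 + 1
        # 2x2 max pooling, stride 2, no padding: spatial size is halved.
        size = (conv_size - 2) // 2 + 1
        shapes.append((block, depth, size))
        # Depth doubles after every pooling layer until it reaches max_depth.
        depth = min(depth * 2, max_depth)
    return shapes
```

Under these assumptions a 227×227 input shrinks to 113, 56, 28, 14 and finally 7 pixels per side, while the depth grows as 64, 128, 256, 512 and 512, so the fully connected layers would receive a 7×7×512 volume.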
For training, the binary cross-entropy loss function is used, as given in (10), where M is the number of categories.
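For illustration, the sigmoid activation and the binary cross-entropy loss can be sketched as below. The sketch assumes the standard forms, sigmoid(x) = 1 / (1 + e^(-x)) and loss = -(y log p + (1 - y) log(1 - p)) averaged over samples, with p taken as the predicted probability of class PG and y=1 for a PG sample; the exact notation of (9) and (10) may differ. The function names and the clipping constant `eps` are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function with output in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy over samples.

    y_true holds 0/1 labels and p_pred the predicted probabilities of
    label 1; probabilities are clipped to avoid log(0).
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)
```

A completely uninformed prediction of 0.5 for every sample yields a loss of log 2 ≈ 0.693, and the loss approaches 0 as the predicted probabilities approach the true labels.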

Dataset collection
The 'JSSSTU CG and PG image dataset' consists of two image categories, CG and PG images, with 7,000 samples in each class and diversified content. CG images are collected from various reliable computer graphics websites, and PG images are captured using different camera models (standalone cameras and in-built mobile cameras), since the camera specifications of each model vary in terms of megapixel count, image quality, sensor type and so on. To improve the diversity of PG image content, PG images are also collected from other sources: INRIA [46], ICCV09 [47], and the McGill calibrated colour image database [48]. The contents of the CG image class include 3D models, architecture, cartoons, digital art, non-PRCG images, objects, people, PRCG images, textures, trademarks, vector maps and video gaming. The PG image class covers a wide range of image content: animals, buildings, man-made objects, indoor scenes, outdoor scenes, nature, vehicles and so on. The 'JSSSTU PRCG image dataset' contains 2,000 samples which demonstrate high photo-realism.
The online sources used to create the CG image dataset are given in Table 3. The camera models and other sources used to create the PG image dataset are given in Tables 4 and 5. The online sources used to create the PRCG image dataset are given in Table 6. The aforementioned datasets are made publicly available to the research community at the following link: https://sites.google.com/view/hrchennamma/researchactivities/jssstu-data-sets.

Table 3. Online sources used to create CG image dataset
In addition to these datasets, the existing benchmark datasets presented in [5]-[7] are used for the experimentation. The sample sizes pertaining to each class and each dataset are shown in Table 7.

Experiment setup for pre-trained neural network models
The VGG variants (VGG16 and VGG19) are implemented using Keras on the Google Colaboratory platform with the free 'Tesla K80 GPU' and 25 GB RAM. Regularization techniques such as early stopping (monitor=validation loss and patience=10), data augmentation and dropout (dropout probability 0.5) [49] are used to prevent the pre-trained neural network models from overfitting. The stochastic gradient descent (SGD) optimizer with the default momentum value of 0.9 and a maximum of 100 epochs are used for all datasets during training.

Experiment results
The average classification accuracies obtained using handcrafted texture features and pre-trained CNNs on our new datasets are tabulated in Table 8. As shown in Table 8, the pre-trained CNN techniques outperformed the conventional SVM-based classifier in classification accuracy. VGG19 attained better classification results than both the handcrafted texture features and VGG16.

Comparative analysis of benchmark datasets used to evaluate classification models
Existing methods based on conventional machine learning and deep learning are compared on the existing benchmark datasets. The average accuracies attained are listed in Table 9, where the methods are ordered from the highest to the lowest performance. As seen from Table 9, the pre-trained VGG19 CNN achieved 100% classification accuracy on the Columbia dataset. The feature fusion method based on conventional machine learning proposed by Tokuda et al. obtained better identification accuracy on the DSTok dataset. Further, the techniques presented in [29], [34], [35] attained 100% accuracy on the Rahmouni et al. dataset.

Performance metrics
Precision, recall and f-score [50] are used to assess the performance of the pre-trained VGG19 model, as it yields the best classification accuracy compared with the handcrafted features and the pre-trained VGG16 model on the existing and proposed datasets. The macro average of these metrics is computed over the two classes. Table 10 shows the evaluation metrics obtained for the pre-trained VGG19 model on the existing and proposed datasets.
As seen from Table 10, a low f-score is obtained on the JSSSTU PRCG image dataset when compared to the other datasets. The difference between the f-scores of the JSSSTU CG and PG image dataset and the DSTok dataset is only 0.3. Hence, we conclude that our new datasets are very challenging, and that the DSTok dataset is as good as the JSSSTU CG and PG image dataset but is smaller and contains only a limited number of PRCG images and of images produced using recent rendering technology.
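The macro-averaged metrics can be illustrated with the following sketch, which computes precision, recall and f-score per class and then takes their unweighted mean over the two classes. It assumes that the macro f-score is the mean of the per-class f-scores, which is one common convention; the function name `macro_scores` and the class labels "CG" and "PG" are used here only for illustration.

```python
def macro_scores(y_true, y_pred, labels=("CG", "PG")):
    """Return macro-averaged (precision, recall, f-score) over the labels."""
    precisions, recalls, f_scores = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # A metric with an empty denominator is reported as 0.0 here.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f_score = 2 * precision * recall / (precision + recall)
        else:
            f_score = 0.0
        precisions.append(precision)
        recalls.append(recall)
        f_scores.append(f_score)
    count = len(labels)
    return sum(precisions) / count, sum(recalls) / count, sum(f_scores) / count
```

Because each class contributes equally to the macro average, a model that performs poorly on one class is penalized even when the other class is classified well.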

CONCLUSION
This work is aimed at creating two new datasets, namely the 'JSSSTU CG and PG image dataset', a heterogeneous dataset which comprises 14,000 samples, and the 'JSSSTU PRCG image dataset', which exhibits photo-realism and comprises 2,000 samples. Further, we implemented state-of-the-art techniques based on handcrafted texture features and deep learning. The performance of these techniques is evaluated on our new datasets and benchmark datasets. Experimental results show that the pre-trained CNN techniques outperformed the conventional SVM-based classifier in classification accuracy. Further, we found that the handcrafted features chosen for classification achieved better results on the Columbia dataset than on the other benchmark datasets and our new datasets. The pre-trained VGG19 technique attained significant results on the 'JSSSTU CG and PG image dataset', but the accuracy can still be improved. On the other hand, it achieved a low detection rate on the 'JSSSTU PRCG image dataset' due to the highly realistic images present in the dataset. Hence, an efficient and robust technique is needed to solve this problem, and our new datasets will help researchers working on the problem of "classification of computer graphic images and photographic images" to evaluate their classification models. To the best of our knowledge, datasets of this kind do not exist in the literature.

Dataset for classification of computer graphic images and … (Halaguru Basavarajappa Basanth Kumar) 141

Figure 1. Image samples from JSSSTU CG and PG image dataset and JSSSTU PRCG image dataset (a) CG images, (b) PG images, and (c) PRCG images

Table 1. Comparison of existing CG and PG image datasets

Table 2. VGG variant CNN configuration: output volume and parameters for VGG16 and VGG19 architecture
(1 − p) is the probability of class CG, p is the probability of class PG, and y is the binary indicator (0 or 1) of whether the category label is the correct classification for the sample. The rectified linear unit (ReLU), a non-linear activation function, i.e. f(x)=max(0, x), is used in all hidden layers of the VGG variants.
ISSN: 2252-8938 Int J Artif Intell, Vol. 11, No. 1, March 2022: 137-147

Image samples from the JSSSTU CG and PG image dataset and the JSSSTU PRCG image dataset are shown in Figures 1(a)-(c), respectively. All the images are resized to a dimension of 227×227 pixels. Textual information present in some of the computer graphic images is cropped out. The datasets are randomly partitioned into 80% for training (70% for training and 10% for validation in the case of the pre-trained neural network models) and 20% for testing.
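The partitioning can be sketched as a simple random split. This is a minimal illustration assuming one shuffle over the whole dataset with a fixed seed; whether the split was stratified by class is not stated, and the function name `split_dataset` and its parameters are hypothetical.

```python
import random


def split_dataset(samples, train_pct=70, val_pct=10, seed=0):
    """Randomly split samples into train, validation and test lists.

    The test set receives whatever remains after the training and
    validation shares (20% with the default percentages).
    """
    items = list(samples)
    random.Random(seed).shuffle(items)  # Seeded for a reproducible split.
    n_train = len(items) * train_pct // 100
    n_val = len(items) * val_pct // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test
```

For a dataset of 14,000 samples, these percentages give 9,800 training, 1,400 validation and 2,800 test samples. For the handcrafted features, the training and validation parts would simply be merged to form the 80% training set.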

Table 4. Camera models used to create PG image dataset

Table 5. PG image sources used to create PG image dataset

Table 6. Online sources used to create JSSSTU PRCG image dataset

Table 7. Datasets used in the experiment
GB RAM. Texture features such as GLCM and LBP are extracted independently from each image, with feature dimensions of 4 and 59, respectively. An SVM classifier is used in the experimentation.
With the SGD momentum of 0.9 and the maximum of 100 epochs fixed for all datasets during training, the other hyper-parameters, such as batch size, initial learning rate and learning rate drop factor, used for the different datasets are given as follows:
− Columbia dataset, DSTok dataset, Rahmouni et al. dataset and JSSSTU CG and PG image dataset, VGG variants (VGG16 and VGG19): a batch size of 32 images, an initial learning rate of 1e-4, and a learning rate drop factor (monitor=validation loss, factor=0.1, and patience=5) are used.
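To make the interaction of these settings concrete, the sketch below simulates how the learning rate would evolve for a given sequence of validation losses. It is a simplified model, not the training code: it assumes that the learning rate is multiplied by the drop factor after 5 consecutive epochs without improvement in validation loss, and that training stops after 10 such epochs (the early stopping patience reported for the pre-trained models). These settings resemble the Keras `ReduceLROnPlateau` and `EarlyStopping` callbacks, which offer further options that are ignored here. The function name `simulate_training` is hypothetical.

```python
def simulate_training(val_losses, lr=1e-4, factor=0.1, lr_patience=5, stop_patience=10):
    """Return a list of (epoch, learning rate) pairs until training stops."""
    best = float("inf")
    lr_wait = 0    # Epochs without improvement since the last LR drop.
    stop_wait = 0  # Epochs without improvement since the best loss.
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            lr_wait = 0
            stop_wait = 0
        else:
            lr_wait += 1
            stop_wait += 1
            if lr_wait >= lr_patience:
                lr *= factor  # Drop the learning rate on a plateau.
                lr_wait = 0
        history.append((epoch, lr))
        if stop_wait >= stop_patience:
            break  # Early stopping.
    return history
```

If the validation loss stops improving after the first epoch, this model drops the learning rate from 1e-4 to 1e-5 at epoch 6 and to 1e-6 at epoch 11, at which point early stopping ends the run well before the 100-epoch limit.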

Table 8. Average classification accuracies of handcrafted texture features and pre-trained CNN on our new datasets

Table 9. Comparative analysis of benchmark datasets used to evaluate classification models
The techniques presented in [29], [34], [35] have attained 100% accuracy on the Rahmouni et al. dataset.

Table 10. Performance metrics used to assess the performance of VGG19 pre-trained neural network model on different datasets