Expert role in image classification using CNN for hard-to-identify objects: distinguishing batik and its imitation

Received Mar 20, 2020 Revised Dec 21, 2020 Accepted Feb 1, 2021 In this research we try to solve the recognition problem of differentiating between batik and its imitation. Batik is an Indonesian heritage process for making traditional textile products that is now endangered by the existence of imitation products. We compare two popular CNN models for classifying batik products into five classes: tulis, cap, print warna, print malam, and cabut warna. Tulis and cap are genuine batik; the other three are imitations. We realize that this problem goes beyond fine-grained image recognition: it is a hard-to-identify image problem, because even batik experts have a hard time telling batik from its imitation based only on a picture. The two CNN models, InceptionV3 and MobileNetV2, were trained on three types of images. One type is a freely taken image; the other two were taken based on the experts' suggestions. The accuracy scores show that the models trained with the suggestion-based pictures perform better than the one trained with the random pictures.


INTRODUCTION
One of the most blooming research areas in computer science nowadays is undoubtedly computer vision, with the vision of enabling computer-based machines and robots to see and understand the surrounding world. This technology is expected to produce results from both static and dynamic scenes; of course, the static-scene application is the basis for the dynamic one. Image recognition is the earliest form of this technology's application. To recognize an image, a system classifies it into a specific domain [1]; thus this kind of system is also known as a classification system.
From solving image classification problems, the applications of computer vision have spread into many fields. It can be used to recognize handwritten letters, for access control, camera surveillance, human detection, human tracking, distinguishing textile products, classifying animals, and even for military purposes [2][3][4][5].
K-nearest neighbor (k-NN), support vector machine (SVM), and machine learning (ML) are examples of popular techniques for solving image recognition problems. The simplest of the three is k-NN, which performs classification by searching for the most similar images in the dataset [6][7][8][9][10][11]. SVM essentially projects the input data into a feature space, resulting in a linear classifier of the data. Applications of SVM in the image recognition field include handwriting recognition and satellite image analysis. In machine learning, as in any recognition and classification application, feature extraction plays an important role [12].
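The k-NN idea described above can be sketched in a few lines. This is a toy illustration, not part of the authors' method: the "images" are synthetic 8x8 arrays, and a query is classified by finding its nearest labelled neighbors under pixel-wise Euclidean distance.

```python
import numpy as np

# Toy k-NN image classifier: classify a query by majority vote over
# the k most similar images (smallest pixel-wise distance) in a
# labelled dataset. Images here are synthetic 8x8 grayscale arrays.
rng = np.random.default_rng(0)
dataset = rng.random((20, 8, 8))                 # 20 reference "images"
labels = np.array([i % 2 for i in range(20)])    # two classes, alternating

def knn_predict(query, images, labels, k=1):
    diffs = (images - query).reshape(len(images), -1)
    dists = np.linalg.norm(diffs, axis=1)        # Euclidean distance per image
    nearest = np.argsort(dists)[:k]              # indices of k closest images
    return np.bincount(labels[nearest]).argmax() # majority vote

query = dataset[3] + rng.normal(0, 0.01, (8, 8)) # near-duplicate of image 3
print(knn_predict(query, dataset, labels, k=1))  # prints 1 (class of image 3)
```

With k=1 this reduces to "return the label of the single most similar image", which is exactly the behavior described in the paragraph above.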
Arguably, machine learning is the leading technology in image recognition right now, with deep learning being the most successful of its subsets [13]. Deep learning (DL) is a neural network with multiple hidden layers. The multi-layer perceptron (MLP) is the traditional type of DL, in which every element of the previous layer is connected to every element of the next layer. In the image recognition field there are three deep learning techniques that are notably superior: the convolutional neural network (CNN), restricted Boltzmann machine (RBM) based models, and stacked denoising autoencoders (SDA). CNN alone is of the supervised type, while RBM and SDA use unsupervised approaches. CNN excels in automatic feature learning, robustness to scale, rotation, and translation, and generalization that avoids overfitting.
The growth of deep learning technology in recent years has been boosted by three factors: 1. the availability of plentiful datasets that can be accessed publicly, 2. the rapid increase in GPU-based computing power, and 3. the rise of new machine learning platforms such as TensorFlow, Keras, and Theano, released as open source [9]. For the datasets, there are various online sources that provide open data that can be downloaded freely. One is the University of California Irvine (UCI), which provides a machine learning repository loaded with a large number of datasets. From 2007 to 2019 this repository accumulated 494 datasets and is still growing. Besides that, the internet giant Google recently released a new service called Dataset Search, a search tool over 25 million datasets. This tremendous amount of freely accessible data, combined with freely available, powerful tools and cheaper computing power, has resulted in a boost in machine learning research.
Recognizing objects in images is an easy task for humans, but for decades it was a daunting task for machines. Thanks to CNN, training a computer to classify an image into several categories has now become much easier. Popular image recognition problems arise from several open datasets such as ImageNet, MNIST, CIFAR, and Pascal. Taking the ImageNet challenge as an example, the CNN technique has proved successful in achieving 95% accuracy.
A more challenging problem is recognizing fine-grained images, that is, images of objects that look similar across different classes, such as recognizing dog, bird, or cat breeds from images. This is of course more difficult than classifying an image as horse or bird. An experiment using a GoogleNet model trained on the ImageNet dataset and then retrained on a fine-grained fashion dataset yielded 62% accuracy [14]. Thus fine-grained image recognition remains challenging.
Identifying batik and its imitation is an image recognition problem that has been left only slightly explored. Batik is a traditional textile of Indonesia produced by a hot-wax resist dyeing technique. There are two ways to put hot wax on a fabric in the batik-making process: the first uses the canting tulis, the second the canting cap. Both canting tulis and canting cap are traditional batik-making tools. Batik made wholly with the canting tulis is called batik tulis, or handwritten batik, and batik made wholly with the canting cap is called batik cap, or stamped batik. Batik that combines both tools and techniques is called batik kombinasi, or combination batik.
Handwritten batik, stamped batik, and combination batik are the three kinds known as true batik. The process of making batik is as follows. First, a pattern is drawn onto the fabric with a pencil; this job can take days for a complex pattern. The work then continues with the wax-sticking job. For handwritten batik this stage is done with the canting tulis, and for stamped batik with the canting cap. This job may take longer than the pattern drawing, especially for handwritten batik. After the whole pattern is covered in wax, the work moves to the first coloring. If multiple colors are desired on a piece of fabric, the first colored area must be sealed with wax before the next coloring can occur. Detaching the wax requires boiling: the wax sticking to the fabric is removed while the fabric is boiled in water. Any batik-like product produced using other techniques is considered imitation batik. Imitation batik can be made faster and cheaper than true batik. Imitation-making processes include color printing, color removal, cold wax printing, and combinations of these. The imitation can indeed be made very similar to the original, resulting in cheaper products that fool ordinary people.
The Center for Crafts and Batik in Yogyakarta, Indonesia has long been fighting imitation batik products. Now we want to try machine learning to help identify batik and its imitation. Identifying batik and its imitation is a hard task; even a batik expert has difficulty distinguishing a good imitation from the real one, because batik forgery techniques are now so advanced that they produce very good imitations. There are traits that can be used to differentiate batik from its imitation, but they are very subtle and encompass more than the visual aspect. When identifying batik products, a batik evaluator must consider these traits and then their instinct to deduce whether the product is genuine or fake. Usually they work in a group and then discuss their conclusions, which may differ from one another; it is common for batik evaluators to reach different conclusions. It would not be excessive to say that the batik/non-batik identification problem goes beyond the fine-grained image recognition problem. For a fine-grained image recognition problem, the image itself is sufficient for an expert to do the classification manually. But for the batik/non-batik problem, even an expert cannot draw a confident conclusion from the real object, let alone from an image alone [15][16][17]. In batik classification, the CNN method has proved to be the best choice [15][16]. However, all previous work focused on identifying the batik motif or the shape of the batik pattern. Research on identifying the authenticity of batik through machine learning has not yet been found. However, batik authenticity can be identified by manually observing visual, physical, and chemical traits, as stated by Masiswo et al. [18]. This research tries to find an automated solution for identifying batik authenticity. Identifying batik authenticity is more difficult than classifying batik patterns, a task that even the experts cannot do easily.

Batik expert group discussion
Preparing the dataset was the busiest part of the research. A ready-to-use labeled dataset for classification between batik and non-batik was not yet available, so it had to be compiled first. From the beginning we realized that this project would need particular images; we doubted that randomly taken images of batik and non-batik samples would suffice for classification. Thus a batik expert group discussion was held to decide which kind of batik/non-batik image would work for classification.
The results of the batik expert group discussion are:
- An image of a batik/non-batik sample taken with a digital microscope at 60 times magnification, resulting in image type 1, depicted in Figure 1.
- An image of a batik/non-batik sample taken with a digital microscope at 20 times magnification, resulting in image type 2, depicted in Figure 2.
- An image of a batik/non-batik sample taken with a regular smartphone camera at default settings from a distance of roughly 25 cm.
The experts say that in the first and second image types the wax trait can be seen. The wax trait, especially from the first wax-sticking job, is important for identifying the authenticity of a batik product. The type 3 images are intended for mobile applications in case a digital microscope is not available. Print malam dingin (PM), cold wax printing, also falls into the non-batik class. Samples were gathered in various forms, some as sheets of textile and some as clothes. To decide which class a sample belongs to, an examination by a group of batik experts was carried out. Each sample that passed the examination step was then assigned to the suggested class, ready for image capture in the next step.

Image capture
The first image type (60 times magnification) and the second image type (20 times magnification) were taken with a digital microscope. The images were taken in a room with sufficient daylight. The digital microscope also has its own lighting located at the tip near its lens; the lighting setting was adjusted so the image was not washed out by the bright light from its lamp. The microscope settings were arranged so the output image would be focused and sharp, with every detail captured as much as possible. The third image type was taken with a regular smartphone camera at default settings, using an external light source to help take the pictures. The image files were then divided, labeled into classes, and organized in folders.

Inception
As CNN grew to become the leading computer vision algorithm, many CNN models emerged. One of those models is the Inception model. On the ImageNet dataset, which holds millions of images across 1000 classes, Inception models were trained and scored a 3.5% error rate [19]. InceptionV3 is based on its predecessors, InceptionV1 and InceptionV2, with modifications to the initial structure. The InceptionV3 architecture was accommodated in the GoogleNet model, which in 2014 was recognized as the state of the art in image recognition. The basic idea is that instead of deciding which convolution size to use, the model performs all the convolutions and lets training decide which is best. This allows the model to find both local features and more abstract features by utilizing small and large convolutions along the way.
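The "do all the convolutions in parallel" idea can be sketched as follows. This is an illustrative toy, not the actual InceptionV3 block: a naive numpy convolution with random weights is used, the branch sizes (16, 16, 8 output channels) are arbitrary choices, and the pooling branch of the real module is omitted.

```python
import numpy as np

def conv2d_same(x, kernel_size, out_channels, rng):
    """Naive 'same'-padded 2D convolution with random weights.
    x: feature map of shape (H, W, C_in)."""
    h, w, c_in = x.shape
    k = kernel_size
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    weights = rng.standard_normal((k, k, c_in, out_channels)) * 0.01
    out = np.zeros((h, w, out_channels))
    for i in range(h):
        for j in range(w):
            patch = xp[i:i + k, j:j + k, :]           # (k, k, C_in) window
            out[i, j] = np.tensordot(patch, weights, axes=3)
    return out

def inception_block(x, rng):
    """Toy Inception-style block: run 1x1, 3x3, and 5x5 convolutions
    in parallel and concatenate their outputs along the channel axis."""
    b1 = conv2d_same(x, 1, 16, rng)   # 1x1 branch: cheap, very local
    b3 = conv2d_same(x, 3, 16, rng)   # 3x3 branch: medium receptive field
    b5 = conv2d_same(x, 5, 8, rng)    # 5x5 branch: larger receptive field
    return np.concatenate([b1, b3, b5], axis=-1)      # (H, W, 16+16+8)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))
y = inception_block(x, rng)
print(y.shape)  # (8, 8, 40)
```

Because all branches produce maps of the same spatial size, the network can stack such blocks and let the learned weights of later layers decide which branch's features matter.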

Mobilenet
MobileNet is a streamlined architecture that uses depthwise separable convolutions. The result is a lightweight CNN model that is efficient for mobile and embedded device applications [1], [20][21][22]. MobileNet uses 3x3 depthwise separable convolutions, reducing the computation to 8 to 9 times less than standard convolutions at the price of only a small reduction in accuracy. Surprisingly, on a fine-grained recognition problem like the Stanford Dogs dataset, the tiny MobileNet model achieves only slightly lower accuracy at greatly reduced computation and size compared to the state-of-the-art result (83.3% vs 84%).
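The "8 to 9 times less" figure follows from the MobileNet cost model: a standard convolution with a D_K x D_K kernel, M input channels, N output channels, and a D_F x D_F output map costs D_K*D_K*M*N*D_F*D_F multiply-adds, while the depthwise-plus-pointwise factorization costs D_K*D_K*M*D_F*D_F + M*N*D_F*D_F. The concrete layer sizes below are illustrative, not taken from the paper.

```python
# Multiply-add cost of a standard vs. a depthwise separable convolution.
def standard_conv_cost(d_k, m, n, d_f):
    return d_k * d_k * m * n * d_f * d_f

def separable_conv_cost(d_k, m, n, d_f):
    depthwise = d_k * d_k * m * d_f * d_f   # one spatial filter per input channel
    pointwise = m * n * d_f * d_f           # 1x1 conv mixes channels
    return depthwise + pointwise

d_k, m, n, d_f = 3, 64, 128, 56             # an illustrative mid-network layer
ratio = standard_conv_cost(d_k, m, n, d_f) / separable_conv_cost(d_k, m, n, d_f)
print(round(ratio, 1))  # 8.4
```

The ratio simplifies to 1/N + 1/D_K^2, so for a 3x3 kernel it approaches 9x savings as the number of output channels N grows, matching the 8-9x claim above.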

Retraining the model
Transfer learning is a method in machine learning that takes the knowledge learned in one training task and uses it to solve a new problem that is different from, but related to, the old one. By comparison, in real life we can apply the knowledge gathered building a device around a microcontroller to build a similar device around a single-board computer, with some adaptation. In machine learning, we can take a model trained on a bigger, more complex dataset and then train it a little to solve a simpler dataset. That is what transfer learning does.
The transfer learning process is:
- Train a model on a big, complex dataset.
- Keep the model and replace the last (output) layer with the desired output layer for the new dataset.
- Retrain the model on the new dataset, modifying only the connections between the last layer and the previous one.
Thus the retraining process only modifies the weight parameters of the last layer connected to the output defined by the label vector [19], [23][24][25]. After that we have a new model whose output layer matches the classes of the new dataset, which can be used to solve the new problem.
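The last step above can be sketched without any deep learning framework: with the base network frozen, retraining the final layer reduces to fitting a softmax classifier on fixed "bottleneck" features. Everything below is a synthetic stand-in (random class means instead of a real frozen network, five classes echoing the batik task), not the paper's actual pipeline.

```python
import numpy as np

# Stand-in for frozen-network bottleneck features: points clustered
# around per-class means, so the last layer has something to learn.
rng = np.random.default_rng(1)
n_samples, n_features, n_classes = 200, 32, 5     # e.g. five batik classes
means = rng.standard_normal((n_classes, n_features)) * 2.0
labels = rng.integers(0, n_classes, n_samples)
features = means[labels] + rng.standard_normal((n_samples, n_features))
one_hot = np.eye(n_classes)[labels]

w = np.zeros((n_features, n_classes))  # only these weights get trained
b = np.zeros(n_classes)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)           # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

lr = 0.5
for _ in range(300):                               # plain gradient descent
    probs = softmax(features @ w + b)
    grad = probs - one_hot                         # cross-entropy gradient
    w -= lr * features.T @ grad / n_samples
    b -= lr * grad.mean(axis=0)

acc = (softmax(features @ w + b).argmax(axis=1) == labels).mean()
print(acc)
```

In a real retraining run, `features` would come from pushing each image through the pretrained InceptionV3 or MobileNetV2 up to its penultimate layer; only the small matrix `w` and bias `b` are then updated, which is why retraining is fast even without a GPU.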
Luckily, nowadays we can obtain pretrained models from online repositories. TensorFlow, a machine learning platform released by Google that runs under the Python language, makes a number of pretrained models available on its website. We can obtain pretrained Inception, MobileNet, VGG, and other models ready to use in our TensorFlow environment. For this research we use ready-to-use InceptionV3 and MobileNetV2 models pretrained on the ImageNet dataset. By utilizing pretrained models we no longer need to train the models from scratch; we can skip the first step of the transfer learning process.
This research is aimed in general at identifying batik and non-batik products; in a more detailed task, the identification is narrowed to classification into the five smaller classes mentioned in part 2.1. For both problems we use two models, InceptionV3 and MobileNetV2. Since three types of images are used for classification, there are 12 models to train.
The environment we set up for the retraining process is TensorFlow 1.12 with Python 3.6, run on a custom-built PC with an Intel Core i7 processor and 16 GB of memory. No GPU-accelerated processing was used.

The expert judgement
From the data gathering step, we succeeded in assembling about 1000 samples and taking about 12000 pictures of those samples. The recapitulation of the gathered samples and pictures is shown in Table 1. From the expert group discussion on which part of a sample is best for identifying batik and non-batik, the experts concluded that there are two strong visual traits for identifying batik authenticity. Unfortunately, those two traits are somewhat abstract. When the experts tried to explain them to the non-experts on the team, we could not grasp the concept; even after a long explanation and question-and-answer session, we still could not apply what they said to differentiate between the classes in this batik/non-batik problem. On the other hand, the experts seemed capable of identifying batik and non-batik products even into the five categories, which is harder to do. Figures 3-4 show images that the experts claim show one of the two visual traits. To us non-experts the two images look similar, in that we cannot conclude which one is batik and which is not. Fortunately, most of the experts we tested with the images answered correctly; some got it wrong, answering print malam instead of the correct answer, batik cap. The print malam imitation is the one considerably difficult to identify, because the technique produces products similar to authentic batik. All the experts agree that depending solely on visual traits to identify the authenticity of a batik product leads to uncertain conclusions. Even using all the physical and visual traits, an expert sometimes cannot be sure of batik authenticity. A standard labeling process for authentic batik products requires the evaluators to examine the production process to make sure the factory truly produces authentic batik products.

Table 1. Dataset recapitulation
Class | 60x (image/sample) | 20x (image/sample) | Frame (image/sample) | Total (image/sample)
CW    | 591/119            | 590/118            | 238/119              | 1419/356
BC    | 1351/271           | 1351/271           | 542/271              | 3244/813
PM    | 442/89             | 440/88             | 175/88               | 1057/265
PW    | 1491/299           | 1491/299           | 598/299              | 3580/897
BT    | 1266/254           | 1265/253           | 506/253              | 3037/760

Expected result
The differing results given by the expert judgement show that there is no certain method for identifying batik authenticity. The current best method, expert judgement, still leaves errors. Therefore we do not set 100% accuracy as our aim; instead, we set the expert judgement accuracy as our target. The resulting models are expected to reach the experts' accuracy. Thus we label the expert judgement accuracy as the 100% target we want to achieve.

Expert suggestion
Because there is no well-written procedure for manually identifying batik authenticity that can be followed narratively, and because the experts cannot clearly explain the method they use to find the traits, we conclude that those traits are abstract to some degree. We then moved to simply follow the experts' suggestion that in certain pictures of a sample there are traits that can be seen, even if some expertise is needed to notice them. We took pictures of those parts and used them to train a CNN model to solve the authentic-batik identification problem. The experts suggested that in pictures of a sample taken with a digital microscope at 60x and at 20x magnification there are traits to be found. One of the experts said those traits are the main traits for spotting the print malam product, the hardest imitation to identify. Besides the 60x and 20x magnifications, we also include the regular photo from the smartphone camera, which we call the full-frame image; this third image type we also try to identify through the CNN retraining process. For each sample we took 5 pictures at 60x magnification, 5 pictures at 20x magnification, and 2 full-frame images, a total of 12 pictures per sample.

Model accuracies
The model retraining process, the transfer learning, was performed on pretrained models: the InceptionV3 and MobileNetV2 models that had been trained on the ImageNet dataset, copies of which we downloaded from the TensorFlow website. For the batik/non-batik problem, the two-class problem, we try both models, just as we do for the five-class problem. First we retrain an InceptionV3 model on the dataset of 60x magnification images for the two-class problem; the result is 74.9% accuracy. Then we retrain another copy on the 20x magnification image dataset for the two-class problem, resulting in 75.9% accuracy. For the full-frame dataset of the two-class problem, the InceptionV3 model achieves 70.6% accuracy. The full model accuracy results are shown in Table 2. The average time needed to fully retrain a model on the prepared datasets is less than one hour. Overall, the results are as expected: the InceptionV3 models gain higher accuracy than the MobileNetV2 models. The MobileNet models are designed to run in a mobile environment, where minimizing model size and run time is a design priority. Through simplification, the MobileNet models end up with a smaller size, simpler calculation, and faster run time at the price of an acceptable accuracy loss. The MobileNetV2 models we trained are approximately 20 MB in size, four times smaller than the 80 MB InceptionV3 models.
The models for the two-class problem also perform better than the models for the five-class problem. The two-class problem is clearly the simpler task, so the higher accuracy was expected.

CONCLUSION
The hard-to-identify object is a new sector in the image recognition world. Going a step further than the fine-grained recognition problem, a hard-to-identify object problem tries to identify an object that even an expert has difficulty identifying. Take, for example, the identification of batik and non-batik. A batik expert can try to identify batik authenticity with the help of the Indonesian national standard for batik (SNI Batik) as guidance, or with the abstract knowledge and experience they have as long-time dedicated experts, but the identification results may vary among the experts themselves. Thus the identification of batik and its imitation is a hard-to-identify object problem, because even an expert will have difficulty with the task. We try to solve this problem with the help of CNNs. We retrain InceptionV3 and MobileNetV2 models formerly pretrained on the ImageNet dataset. Overall, the InceptionV3-based models perform better than the MobileNetV2-based models, models with fewer class categories have better accuracy, and models trained with the type of image suggested by the experts also gain better accuracy than models trained with randomly taken images.