Early detection of tomato leaf diseases based on deep learning techniques

ABSTRACT


INTRODUCTION
The agricultural sector is considered the most productive sector in most countries and plays a significant role in national and global economies [1], [2]. After the oil sector, the Iraqi agricultural sector is the second largest contributor to the country's gross domestic product (GDP), employing more than 20% of the country's workforce [3]. The tomato is one of the most widely cultivated crops in fields and farms [4]; nine out of every ten farmers grow tomatoes [5]. According to the Food and Agriculture Organization (FAO), tomatoes are the sixth-most abundant vegetable in the world, with an average production of more than 180 million tons per year worldwide [6], [7] and a per capita consumption of more than 20 kg per year [8], which represents about 15% of total vegetable consumption [9].
Tomato is the most common vegetable in Iraq, with about 771,000 tons produced across the country and high production in Karbala [10], Basrah [11], and Najaf [12]. Pests and diseases are the main reason for declining crop production and rising losses, as tons of crops are lost annually [13]. Early detection of these diseases is therefore urgent to avoid massive losses and increase yields. Traditional methods of detecting plant diseases need diagnostic experts [14], are expensive in terms of time and effort, require special tools, have a high diagnostic error rate, and cannot be performed with the naked eye [15], [16]. Recently, advances in computer technology, particularly artificial intelligence techniques, have made it possible to detect plant diseases early by using images to differentiate between diseases, which automates the process for fields and farms [17], [18]. This research article presents a new model that can detect and classify the most common diseases in tomato leaves without needing diagnostic experts, thus helping farmers improve production and increase profits. The presented model is based on convolutional neural networks (CNN), a high-performance deep learning network commonly used in crop disease identification [19], [20].
Int J Artif Intell, Vol. 13, No. 1, March 2024: 509-515, ISSN: 2252-8938
The paper is structured as follows. Section 2 reviews the literature on current approaches. Section 3 covers the proposed technique, the model, and the measures used to achieve the desired outcomes. Section 4 discusses the results, analyzes the proposed methodology, and compares it with previous models. Section 5 contains the paper's conclusion and the scope for future work.

LITERATURE SURVEY
Numerous researchers have proposed solutions based on machine learning and image processing to the problem of tomato leaf disease detection, aiming at accurate automated classification. More recently, researchers have begun to use deep learning techniques to produce more accurate results. Several of the most relevant deep learning-based methods for detecting tomato leaf disease are covered in this section. Tm et al. [21] proposed a CNN-based LeNet model to detect and identify ten diseases in tomato crops in 2018. The work addresses tomato leaf disease identification with minimal computational resources and the simplest architecture (LeNet). The images in the PlantVillage dataset are scaled down to a resolution of 60×60 to expedite training and make it computationally feasible. The accuracy of the system ranges from 94% to 95% on average.
Kumar et al. [22] analyzed visual geometry group (VGG) Net, LeNet, ResNet50, and Xception to classify nine different tomato leaf diseases using a pair of datasets (PlantVillage and ImageNet) in 2019. The findings demonstrate that, of all the investigated architectures, the fine-tuned VGGNet achieves the highest accuracy and lowest loss, with a test accuracy of 99.25%. The key drawback of the approach is that training takes a lot of time and requires expensive hardware.
Ashok et al. [15] introduced a new CNN-based model for detecting tomato plant leaf disease, developed using open-source algorithms together with segmentation and clustering-based image processing techniques to improve the image. A Gaussian filter is used during preprocessing to remove blur and reduce noise. Features are extracted using the discrete wavelet transform (DWT) and the grey-level co-occurrence matrix (GLCM). The approach attained an accuracy of 98%.
Zhao et al. [9] developed a deep convolutional neural network with multi-domain feature extraction to diagnose tomato leaf diseases. The model is trained to distinguish between healthy and diseased tomato leaf images by integrating attention density and residual mass depth. The experimental findings showed that the model achieves an average identification accuracy of 96.81% on the tomato leaf diseases dataset, outperforming prior deep learning experiments that used the widely adopted PlantVillage dataset.
Benoso et al. [23] developed applications and computational systems for disease detection in tomato plants in 2020. The authors used texture and color statistical descriptors to extract features and several classifiers to identify diseases. Three diseases on tomato leaves can be detected with the proposed methods: mosaic virus, late blight, and septoria leaf spot. The PlantVillage image dataset provides 160 images for each disease, which were classified by K-nearest neighbors (K-NN), random forest, support vector machine (SVM), artificial neural network (ANN), and naive Bayes. Random forest has the highest accuracy, at 90.7%.
Agarwal et al. [5] created a CNN model with three convolution layers, three max-pooling layers, and two fully connected layers. The model's effectiveness was evaluated using various criteria, including training, validation, and test accuracy and the number of trainable parameters. The PlantVillage dataset used has nine disease classes and one class of healthy images. The proposed model has a 91.2% average accuracy. In addition, four CNN architectures (VGG-19, ResNet, VGG-16, and Inception V3) were applied by Ahmad et al. [24] in 2020 to recognize and categorize tomato leaf diseases. The two datasets used in that study were self-collected field data and a laboratory-based dataset taken from PlantVillage, using only four tomato leaf disease classes (2,364 images). The models do not perform as well on the field-based dataset as on the laboratory-based dataset. On both datasets, Inception V3 is found to have the best performance. The highest accuracy, 93.40%, was achieved on the laboratory-based dataset.
Based on the recently created EfficientNet CNN model, Chowdhury et al. [25] built a deep convolutional neural network model to segment leaf images from the background. The authors used two segmentation models, a U-net and a modified U-net. In addition, different classes of healthy and diseased leaves were used for classification. The findings demonstrated that the model performed better than several contemporary deep learning techniques, with an accuracy of 99.89%.
Two convolutional neural networks (CNNs), GoogLeNet and VGG16, were used by Kibriya et al. [26] to classify tomato leaf diseases. Using a deep learning strategy, the work seeks the best solution to the problem of tomato leaf disease detection. On the PlantVillage dataset of 10,735 leaf images, VGG16 achieved an accuracy of 98.8% and GoogLeNet 99.23%. In that study, only three tomato leaf diseases were identified and categorized.

The images of tomato leaf diseases used in this work were collected from the publicly available PlantVillage [27] database on Kaggle, which contains 11,000 images in ten classes: nine disease classes and one healthy class. The database was divided into 10,000 training images (1,000 per class) and 1,000 testing images (100 per class). All images are 256×256 pixels in JPG format. The disease classes are the most common tomato leaf diseases: mosaic virus, target spot, late blight, septoria leaf spot, yellow leaf curl virus, leaf mold, two-spotted spider mite, early blight, and bacterial spot, as shown in Figure 1. The dataset contains noise-free images, so noise removal was not required as a preprocessing step. The images were downsized to a resolution of 128×128 pixels to expedite training.
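The per-class split described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the function name `split_per_class`, the fixed random seed, and the synthetic file names are hypothetical, and only the counts (ten classes, 1,000 training and 100 testing images per class) come from the text.

```python
import random

def split_per_class(files_by_class, n_train=1000, n_test=100, seed=0):
    """Shuffle each class independently and cut it into train/test lists."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    train, test = {}, {}
    for label, files in files_by_class.items():
        if len(files) < n_train + n_test:
            raise ValueError(f"class {label!r} has too few images")
        shuffled = list(files)
        rng.shuffle(shuffled)
        train[label] = shuffled[:n_train]
        test[label] = shuffled[n_train:n_train + n_test]
    return train, test

# Synthetic stand-in for the dataset: 10 classes x 1,100 placeholder names.
files_by_class = {
    f"class_{c}": [f"class_{c}/img_{i}.jpg" for i in range(1100)]
    for c in range(10)
}
train, test = split_per_class(files_by_class)
n_train_total = sum(len(v) for v in train.values())
n_test_total = sum(len(v) for v in test.values())
print(n_train_total, n_test_total)  # 10000 1000
```

Splitting each class separately keeps the training and testing sets balanced across the ten classes, which matches the equal per-class counts reported in the text.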

Model evaluation
To evaluate the model, accuracy was measured, as accuracy determines the classifier's capacity to make a correct diagnosis. The accuracy equation is shown in (1). Precision (Pre), recall (Rec), and F-score were also used to measure the performance of the presented model, as these are the most popular and efficient measures for evaluating models. These measures are given in (2) to (4):

\[ \text{Accuracy} = \frac{P_1 + N_1}{P_1 + N_1 + F + N_2} \tag{1} \]

\[ \text{Pre} = \frac{P_1}{P_1 + F} \tag{2} \]

\[ \text{Rec} = \frac{P_1}{P_1 + N_2} \tag{3} \]

\[ \text{F-score} = \frac{2 \times \text{Pre} \times \text{Rec}}{\text{Pre} + \text{Rec}} \tag{4} \]

where $P_1$ is the number of true positives (the outcome is positive and was predicted to be positive), $N_2$ is the number of false negatives (the outcome is positive despite a negative prediction), $F$ is the number of false positives (the outcome is negative despite a positive prediction), and $N_1$ is the number of true negatives (the outcome is negative and was predicted to be negative).
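As a small worked example, the four measures can be computed directly from the counts P1, N1, F, and N2 named above. The function name `classification_metrics` and the counts passed to it are hypothetical and are not results from the paper.

```python
def classification_metrics(p1, n1, f, n2):
    """p1: true positives, n1: true negatives,
    f: false positives, n2: false negatives."""
    accuracy = (p1 + n1) / (p1 + n1 + f + n2)
    precision = p1 / (p1 + f) if (p1 + f) else 0.0
    recall = p1 / (p1 + n2) if (p1 + n2) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f_score

# Hypothetical counts for one class, chosen only for illustration.
acc, pre, rec, f_score = classification_metrics(p1=90, n1=880, f=10, n2=20)
print(round(acc, 4), round(pre, 4), round(rec, 4), round(f_score, 4))
# 0.97 0.9 0.8182 0.8571
```

The guards against zero denominators are a design choice of this sketch: they return 0.0 when a class is never predicted or never present, rather than raising an error.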

Model
The presented CNN model was implemented in Colab, an integrated software environment. The model consists of two parts: feature extraction and classification. For feature extraction, three convolution layers are followed by three max-pooling layers; the first convolution layer contains 64 filters, the second 32 filters, and the third 16 filters. All filters are of size 3×3. The rectified linear unit (ReLU) activation function is used in these three layers, and the max-pooling layers have size 2×2. The second part of the model is the classification part. Our model used a fully connected layer [...]. The optimization was carried out with the Adam optimizer, and the loss function was categorical cross-entropy, which is shown in (5):

\[ L = -\sum_{i=1}^{m} y_i \log(p_i) \tag{5} \]

where $p$ is the predicted probability, $y$ is the target probability, and $m$ is the number of classes. Dropout (0.5) was used to avoid overfitting, together with data augmentation techniques such as image zooming, vertical cropping, horizontal flip, and rescaling. The model was trained for 40 epochs with a batch size of 64. The hyperparameters for the model are described in Table 1.
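To make the feature-extraction stage and the loss concrete, the sketch below traces the tensor shape through three convolution and pooling stages and evaluates the categorical cross-entropy for one sample. Several details are assumptions that the text does not state: a 3-channel (RGB) 128×128 input, "valid" padding with stride 1, and each convolution being followed directly by its own pooling layer. The helper names `conv_block` and `categorical_cross_entropy` are hypothetical.

```python
import math

def conv_block(h, w, c_in, filters, k=3, pool=2):
    """Shape and parameter count of one 'valid' k x k convolution (stride 1)
    followed by non-overlapping pool x pool max pooling."""
    h, w = h - k + 1, w - k + 1              # valid convolution shrinks by k-1
    params = (k * k * c_in + 1) * filters    # weights plus one bias per filter
    h, w = h // pool, w // pool              # max pooling has no parameters
    return h, w, filters, params

def categorical_cross_entropy(y, p, eps=1e-12):
    """Loss for one sample: -sum_i y_i * log(p_i) over the m classes."""
    return -sum(yi * math.log(max(pi, eps)) for yi, pi in zip(y, p))

# Assumed input: 128 x 128 image with 3 channels (RGB is an assumption).
h, w, c = 128, 128, 3
total_params = 0
for filters in (64, 32, 16):                 # filter counts from the text
    h, w, c, params = conv_block(h, w, c, filters)
    total_params += params
    print(f"{filters:>2} filters -> {h}x{w}x{c}, {params} parameters")

flattened = h * w * c                        # features passed to the classifier
print(flattened, total_params)               # 3136 24880

# One-hot target for the third of m = 3 classes and a hypothetical prediction.
loss = categorical_cross_entropy([0, 0, 1], [0.1, 0.2, 0.7])
print(round(loss, 4))                        # 0.3567
```

Under these assumptions the convolutional part holds only a few tens of thousands of parameters, so most of the model's weights would sit in the classification part. With a one-hot target, the loss reduces to the negative log of the probability assigned to the correct class.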

RESULTS AND DISCUSSION
The results demonstrate the feasibility and efficiency of the presented CNN-based model in detecting and classifying tomato leaf diseases. Training the model on the PlantVillage dataset for 40 epochs gave the highest training accuracy, with an average of 96% and a loss of 0.1. The test accuracy was 92%. The precision was 91%, the recall was 92%, and the F-score was 92%. Table 2 shows the most important results obtained. The proposed model is distinguished from traditional models, such as VGG16 and ResNet, by its speed and by not requiring ample storage space. The accuracy of the training and testing processes as a function of the epoch number is presented in Figure 3. The loss values for the training and testing processes are displayed in Figure 4. The proposed model is compared with previous CNN models that use conventional structures, such as LeNet, ResNet, and AlexNet, all on the same database. The results showed that our model is superior in training and testing accuracy. Our model uses fewer layers (4 layers only) than traditional CNN structures, which may reach a hundred layers. Hence, the proposed model is faster in implementation and does not need large storage space, as it has a size of 23 MB, while the traditional structures reach more than 100 MB. Table 3 summarizes the comparison of our presented model with some previous models. Figure 5 illustrates the confusion matrix for the test data.
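For a multi-class problem, accuracy, precision, and recall can be read off a confusion matrix like the one in Figure 5. The sketch below shows one common way to do this, macro-averaging over classes. The averaging scheme is an assumption, since the text does not say how the per-class values were combined, and the 3-class matrix is hypothetical rather than taken from Figure 5.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precisions, recalls = [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        precisions.append(tp / predicted_k if predicted_k else 0.0)
        recalls.append(tp / actual_k if actual_k else 0.0)
    macro_precision = sum(precisions) / n
    macro_recall = sum(recalls) / n
    return accuracy, macro_precision, macro_recall

# Hypothetical 3-class matrix with 100 test images per class.
cm = [
    [90, 6, 4],
    [3, 95, 2],
    [7, 3, 90],
]
accuracy, macro_precision, macro_recall = per_class_metrics(cm)
print(round(accuracy, 4), round(macro_precision, 4), round(macro_recall, 4))
```

When every class has the same number of test images, as in the balanced split used here, macro-averaged recall equals overall accuracy, which is consistent with the reported recall and test accuracy both being 92%.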

CONCLUSION
The tomato is one of the most abundant and popular vegetables in most countries. Tomato leaf diseases cause considerable losses in production, as millions of tons are lost annually. The proposed CNN-based model offers the possibility of early detection of tomato leaf diseases, thus contributing to maintaining production and increasing yield. The most common diseases were classified using the PlantVillage database, which contains 11,000 images in ten categories. The presented model also compares favorably with the traditional methods, as it obtained a training accuracy of 96% and a test accuracy of 92%. Future work will improve the model's accuracy, test it on other crops, and convert it into a mobile application that can be used easily.

Early detection of tomato leaf diseases based on deep learning techniques (Mohammed Hussein Najim)

Figure 1. Sample images of the dataset classes

Figure 2. The structure of tomato leaf disease diagnosis

Figure 5. Confusion matrix on the test data

Table 1. Hyperparameters for our model

Table 2. Model results

Table 3. Comparison with previous models
Model | No. of layers | Dataset | No. of images | No. of classes | Accuracy
Prajwala et al. [21]