Insights on assessing image processing approaches towards health status of plant leaf using machine learning

ABSTRACT


INTRODUCTION
With a growing population, there is an increasing demand for nutritious, high-quality food. This demand can only be met when crops are cultivated in a healthy environment, because healthy crops yield higher-quality grain. However, plant diseases are inevitable and can degrade grain quality. There is therefore considerable concern about diseases that affect plants and reduce production and yield. Such diseases can affect any part of a plant, including roots, stems, branches, buds, flowers, and leaves. There can be multiple causes, e.g., adverse climatic conditions, degraded soil quality, poor irrigation, inferior farming practices, and reliance on conventional methods. With the increasing use of technology in agriculture and cultivation, the chances of higher and better-quality yield improve [1]-[5]. Sensors can be deployed over cultivated fields to collect various data associated with the plants [6], [7]. Such sensors can capture images of plants and transmit them to a remote end, where an image processing algorithm can be executed to determine the plant's health status in real time. The majority of diseases that have a negative impact on crop yield are highly visible, and this visibility can serve as an identifier for image processing algorithms. Hence, an algorithm can be built on formulated identifiers that match specific information about each disease. A human can also assess such visual information to identify disease in plants; however, a human cannot monitor such abnormalities across a large crop field. Therefore, there is a need for an automated approach that removes the need for human involvement in this identification task.
As most disease-related problems are visible, this information can be checked autonomously by a machine using image processing. However, this process involves various challenges. The first challenge is extracting features from images of many crops; hence, feature extraction is an essential operation [8]. Various feature extraction mechanisms exist, based on features such as shape, energy, entropy, contrast, texture, fractal dimension, variation in red-green-blue (RGB) color, descriptors, and grey-level histograms. Extensive work has been carried out on this in existing systems. The extracted features are then used for further processing to carry out identification. Two further processes follow: identification and classification. Identification generally attempts to match the defined data of a disease with the input image, while classification generally applies a machine learning approach [9], [10] to categorize the type of disease. These approaches are used not only for disease detection in plant leaf images but also for other purposes. This paper offers a snapshot of the effectiveness of existing approaches that apply computer vision to identifying abnormalities in crop cultivation. The paper is organized as follows: section 2 discusses diseases in plant leaves, and section 3 discusses existing approaches, highlighting the strengths and weaknesses of each. Section 4 briefs the existing research trend, section 5 discusses open-end issues in existing studies, and section 6 summarizes the paper.
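Several of the features listed above (energy, entropy, contrast, texture) are commonly computed from a grey-level co-occurrence matrix (GLCM). The sketch below assumes a uint8 greyscale leaf image, quantization to 8 grey levels, and a single non-negative pixel offset; these choices and the function names `glcm` and `texture_features` are illustrative assumptions, not the method of any cited study. Energy is taken here as the angular second moment; some libraries report its square root instead.

```python
import numpy as np


def glcm(gray, levels=8, offset=(0, 1)):
    """Normalised grey-level co-occurrence matrix for one non-negative offset (dr, dc)."""
    dr, dc = offset
    if dr < 0 or dc < 0:
        raise ValueError("this sketch only supports non-negative offsets")
    q = (np.asarray(gray).astype(int) * levels) // 256  # quantise 0..255 into 0..levels-1
    h, w = q.shape
    first = q[: h - dr, : w - dc].ravel()   # reference pixels
    second = q[dr:, dc:].ravel()            # neighbours at the given offset
    counts = np.zeros((levels, levels))
    np.add.at(counts, (first, second), 1)   # accumulate co-occurring pairs
    return counts / counts.sum()


def texture_features(p):
    """Haralick-style statistics of a normalised GLCM p."""
    i, j = np.indices(p.shape)
    nonzero = p[p > 0]
    return {
        "contrast": float((p * (i - j) ** 2).sum()),
        "energy": float((p ** 2).sum()),  # angular second moment
        "homogeneity": float((p / (1.0 + np.abs(i - j))).sum()),
        "entropy": float(-(nonzero * np.log2(nonzero)).sum()),
    }
```

For a single image, `texture_features(glcm(leaf_gray))` returns one feature dictionary; averaging over several offsets such as (0, 1), (1, 0), and (1, 1) is a common way to reduce sensitivity to direction.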

DISEASES IN PLANT LEAVES
Before applying image processing to images of plant leaves, it is essential to understand the logical information about the diseases associated with plants. Table 1 highlights some of the frequently studied plant diseases observed from leaves, with their disease representation, corresponding pathogens, color space, and the classifier used to categorize them. Infection in plants can occur by various means, and the leaf is one of the prominent sites for its diagnosis. The information in Table 1 shows various ways to analyze a disease and identify its pathogen. Understanding this is essential to develop a sufficient identification and classification approach. Such diseases can occur for various reasons, e.g., nutrient deficiency, fungi, insects, and bacteria, as shown in Figure 1. This information assists in developing a better classification process.

Table 1. Frequently studied leaf diseases
| Ref. | Crop | Disease representation | Pathogen / cause | Color space | Classifier |
|---|---|---|---|---|---|
| [11] | | Changes in leaf color | Nutrient deficiency | HSI | Spatial (Euclidean) |
| [12] | Maize | Leaf color, changes in morphology | Sheath blight, leaf blight | YCbCr | Neural network (backpropagation) |
| [13] | Rice | Same as [12] | Nutrient deficiency | RGB | Multilayer perceptron |
| [14] | Grapes | Leaf color, pigmentation | Pest | RGB | Neural network |
| [15] | Cotton | Streaks, stains, spots | Bacteria, bug | Hue, saturation, value (HSV), RGB | Support vector machine |
| [16] | Soybean | Spot on leaf | Fungus | HSI | Ratio of lesion spot area to leaf area |
| [17] | Maize | Chlorotic area | Maize streak virus | Greyscale | Pixel thresholding |
| [18] | Maize | Holes in leaf | | | |

Bacterial infection is identified from various external visuals, e.g., brown spot, soft rot, blight, Myrothecium, and mould. Fungal infection can be identified from leaf spot, red spot, stripe, rust, black spot, and scab. Nutrient deficiency is different: its effect may be visible in leaves, but sometimes it is internal.
Nutrient deficiency affects the whole plant and is more severe than other diseases observed from leaves. Diseases caused by insects can be visually identified on leaves. All this information is technically required to formulate the criteria for detection and classification in plant leaves. Among the different approaches, existing work mainly adopts machine learning, especially deep learning techniques. There are specific reasons for this: i) deep learning can handle data effectively in the presence of a high degree of noise; ii) deep learning achieves higher accuracy on test images and tends to reduce dependence on a large number of test images; iii) complex identification criteria can be formulated well in a deep learning approach; and iv) deep learning has also enabled the evolution of predictive approaches. Some studies have used a statistical inference approach for disease identification on plant leaves.
Therefore, various approaches are presently used to identify diseases from plant leaves. Some address one specific technique, while others combine multiple existing approaches to achieve disease identification from plant leaves. The next section discusses researchers' recent contributions in applying digital image processing to plant leaves in more detail.
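Table 1 shows that several studies analyze leaf color in the HSI (hue, saturation, intensity) space rather than RGB. A minimal vectorized RGB-to-HSI conversion is sketched below, assuming RGB values scaled to [0, 1] and the common textbook definition with hue in degrees. Grey pixels, whose hue is undefined, are mapped to 90 degrees and black pixels get saturation 0; both are arbitrary choices of this sketch, and `rgb_to_hsi` is a hypothetical helper name.

```python
import numpy as np


def rgb_to_hsi(rgb):
    """Convert an (..., 3) RGB array in [0, 1] to HSI (hue in degrees, S and I in [0, 1])."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    intensity = (r + g + b) / 3.0
    min_rgb = np.minimum(np.minimum(r, g), b)
    # S = 1 - min(R, G, B) / I; black pixels (I = 0) get S = 0
    ratio = np.divide(min_rgb, intensity, out=np.ones_like(intensity), where=intensity > 0)
    saturation = 1.0 - ratio
    num = 0.5 * ((r - g) + (r - b))
    # the radicand is mathematically >= 0; clamp to guard against rounding
    den = np.sqrt(np.maximum((r - g) ** 2 + (r - b) * (g - b), 0.0))
    # grey pixels (den = 0) have no defined hue; cos_theta = 0 maps them to 90 degrees
    cos_theta = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    theta = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
    hue = np.where(b <= g, theta, 360.0 - theta)
    return np.stack([hue, saturation, intensity], axis=-1)
```

Separating intensity from hue in this way is one reason HSI is attractive for leaf images: a lesion's color shift shows up mainly in hue and saturation, while illumination changes mostly affect the intensity channel.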

EXISTING APPROACHES
At present, various works investigate specific problems associated with plant leaves using image processing. For this purpose, various reputed journals were thoroughly checked to find the significant processing techniques applied to plant leaves, as shown in Table 1. Such investigations were found to be carried out for two purposes: disease identification and other, application-specific purposes. All papers published in these journals within the last decade were considered for the review. The prime target is to understand the strengths and weaknesses of image processing approaches, with plant leaves as a case study. The approaches are briefed as follows.
The first set of approaches relates to segmentation, which differentiates the foreground object from the background scene and is used to localize the object in a given scene. A unique automated segmentation approach is introduced by Janssens et al. [19], where the system segments leaves from plants and obtains information about the leaf's line of symmetry, from which disease factors can be identified. The contribution of this work is a unique feature extraction in which image moments are extracted from contours. The technique supports parallel processing of multiple images at the same time. Another essential segmentation approach is introduced by Sun et al. [20], which uses multiple linear regression.
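The segmentation and symmetry-line ideas above can be illustrated with a minimal sketch. It assumes an RGB image in [0, 1], separates leaf from background with the Excess Green index (ExG = 2g - r - b on chromatic coordinates) and a fixed threshold, and estimates the leaf's major axis from second-order central moments of the resulting mask; for an elongated leaf, this axis is a rough proxy for the symmetry line. This is a swapped-in, region-based technique, not the contour-based moments of Janssens et al. [19], and the threshold and function names are hypothetical.

```python
import numpy as np


def excess_green_mask(rgb, threshold=0.05):
    """Boolean leaf mask from the Excess Green index; the threshold is illustrative."""
    rgb = np.asarray(rgb, dtype=float)
    total = rgb.sum(axis=2) + 1e-8                     # avoid division by zero on black pixels
    r, g, b = (rgb[..., k] / total for k in range(3))  # chromatic coordinates
    return (2.0 * g - r - b) > threshold


def principal_axis(mask):
    """Centroid (x, y) and major-axis angle in radians from central image moments."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        raise ValueError("empty mask")
    cx, cy = xs.mean(), ys.mean()                      # first-order moments give the centroid
    dx, dy = xs - cx, ys - cy
    mu20, mu02, mu11 = (dx ** 2).mean(), (dy ** 2).mean(), (dx * dy).mean()
    theta = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)  # orientation w.r.t. the x-axis
    return (cx, cy), theta
```

Because each image is processed independently, calls like these can be mapped over many images in parallel, which matches the parallel-processing strength noted for [19].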
The second approach relates to feature extraction, which obtains numerical features from the raw image, making it suitable for further processing without losing significant information. Another benefit of feature extraction is dimensionality reduction. Feature extraction has been applied to plant leaves to identify essential disease spots. Li et al. [21] discussed histogram-based segmentation using an evolutionary approach, where the grey portion of the leaves represents the disease spot. The study used statistical and visual features to locate the lesion. The technique uses a genetic algorithm to carry out segmentation; however, increasing the number of images degrades the search efficiency.
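Li et al. [21] search for a histogram-based segmentation threshold with a genetic algorithm. The sketch below shows one way such a search could look, under stated assumptions: the fitness is Otsu's between-class variance of a 256-bin grey-level histogram, and the GA uses tournament selection, averaging crossover, random-shift mutation, and elitism. None of these operators or parameter values are taken from [21].

```python
import numpy as np


def between_class_variance(hist, t):
    """Otsu criterion for splitting a grey-level histogram into levels < t and >= t."""
    p = hist / hist.sum()
    levels = np.arange(hist.size)
    w0, w1 = p[:t].sum(), p[t:].sum()
    if w0 == 0 or w1 == 0:
        return 0.0
    mu0 = (levels[:t] * p[:t]).sum() / w0
    mu1 = (levels[t:] * p[t:]).sum() / w1
    return w0 * w1 * (mu0 - mu1) ** 2


def ga_threshold(hist, pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Evolve an integer threshold that maximises the between-class variance."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(1, hist.size, size=pop_size)  # candidate thresholds
    for _ in range(generations):
        fitness = np.array([between_class_variance(hist, t) for t in pop])
        # Tournament selection: the fitter of two random candidates becomes a parent
        i1, i2 = rng.integers(0, pop_size, size=(2, pop_size))
        parents = np.where(fitness[i1] >= fitness[i2], pop[i1], pop[i2])
        # Averaging crossover between neighbouring parents
        children = (parents + np.roll(parents, 1)) // 2
        # Mutation: small random shifts of some children
        mutate = rng.random(pop_size) < mutation_rate
        children[mutate] += rng.integers(-10, 11, size=mutate.sum())
        children = np.clip(children, 1, hist.size - 1)
        # Elitism: keep the best candidate of this generation
        children[0] = pop[fitness.argmax()]
        pop = children
    fitness = np.array([between_class_variance(hist, t) for t in pop])
    return int(pop[fitness.argmax()])
```

For a single threshold, exhaustive search over 255 candidates is cheaper than a GA; an evolutionary search only pays off when several thresholds (multi-level segmentation) are optimized jointly and the search space grows combinatorially.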
Moreover, the approach of Li et al. [21] does not support persistent feature extraction when the environment is dynamic. This problem is addressed by Lv et al. [22], where maize features are extracted in a complex environment using a neural network, and overfitting of the network is addressed using batch normalization. A similar direction is followed by Sun et al. [23], where deep learning is combined with the fusion of multi-scale features. This mechanism also preprocesses the dataset using a Retinex-based approach. The study fuses low- and high-level features to achieve better accuracy.
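The Retinex-based preprocessing mentioned for Sun et al. [23] can be illustrated with single-scale Retinex, which estimates illumination with a Gaussian blur and removes it in the log domain: R = log(I) - log(G_sigma * I). The sketch below assumes one greyscale channel, sigma = 10 pixels, reflected borders, and a small epsilon to avoid log(0); the source does not specify which Retinex variant [23] uses, so treat this only as an illustration.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel truncated at three standard deviations."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img, sigma):
    """Separable Gaussian blur of a 2-D array with reflected borders."""
    k = gaussian_kernel(sigma)
    r = k.size // 2
    padded = np.pad(np.asarray(img, dtype=float), r, mode="reflect")

    def conv(v):
        return np.convolve(v, k, mode="valid")

    rows = np.apply_along_axis(conv, 1, padded)  # blur along each row
    return np.apply_along_axis(conv, 0, rows)    # then along each column


def single_scale_retinex(channel, sigma=10.0, eps=1e-6):
    """Log-domain reflectance estimate: log(I) minus log of the blurred illumination."""
    channel = np.asarray(channel, dtype=float)
    illumination = gaussian_blur(channel, sigma)
    return np.log(channel + eps) - np.log(illumination + eps)
```

The useful property is that a global change of illumination, i.e., multiplying the image by a constant, cancels in the log difference, so leaf images taken under brighter or dimmer light map to nearly the same output.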
The third, and most frequently used, approach is classification for identifying disease in a plant leaf. Machine learning is the dominant method for addressing classification-related issues. Ashourloo et al. [24] applied machine learning to identify leaf rust disease. The technique evaluates the spectra of leaf images in the electromagnetic region, and multiple machine learning approaches are assessed, viz. Gaussian process regression, support vector regression, and partial least squares regression. The outcome shows the robustness of the machine learning approach. Although the learning mechanism is robust, it induces higher memory consumption, since training grows over different forms of images. Hence, better optimization is required to further improve classification performance. Chouhan et al. [25] pursued this goal by integrating a bacterial foraging algorithm with the frequently used radial basis function (RBF) neural network, where bacterial foraging assigns enhanced weights to the RBF network. This is reported to increase accuracy. The study considered the identification and classification of fungal disease on leaves, with high accuracy noted. However, the study is carried out on pre-trained images, so the relationship between its performance and image quality cannot be identified. This issue is considered by Dai et al. [26], who state that a fusion-based approach with a generative network could yield better results. This study also aims to obtain higher-resolution images. The technique's positive point is its ability to detect multiple diseases, but the texture information on which it depends is not found within the dataset.
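Of the regressors assessed by Ashourloo et al. [24], Gaussian process regression is the easiest to write down compactly. The sketch below computes the GP posterior mean and variance with a squared-exponential (RBF) kernel via a Cholesky factorization. The kernel, length scale, and noise level are assumptions, and the function names are hypothetical; in the intended setting, rows of `X_train` would be spectral features and `y_train` a disease-severity target, but the sketch does not reproduce [24].

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * sq_dist / length_scale ** 2)


def gp_predict(X_train, y_train, X_test, length_scale=1.0, noise=1e-2):
    """GP regression posterior mean and latent variance (unit signal variance)."""
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    L = np.linalg.cholesky(K)                                  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # alpha = K^{-1} y
    K_s = rbf_kernel(X_test, X_train, length_scale)
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = 1.0 - (v ** 2).sum(axis=0)                           # k(x, x) = 1 for this kernel
    return mean, var
```

The Cholesky route avoids forming an explicit inverse, but it still needs an n x n kernel matrix and O(n^3) time in the number of training samples, which is consistent with the memory cost of training noted above.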
The prime reason for this limitation of Dai et al. [26] is the non-inclusion of vegetation indices. Even when such indices are included, their computational capability to identify feature variations across different diseases is limited. This problem is addressed by Huang et al. [27], where vegetation indices are replaced by spectral indices obtained from hyperspectral images of wheat. The study obtains weight information from highly correlated wavelength bands to generate the spectral indices, and the outcome shows better detection performance. Hyperspectral imaging for disease detection is also adopted by Moriya et al. [28] in a case study of sugarcane plants. The study develops a library of hyperspectral images of both healthy and unhealthy plants. The technique further uses a radiometer block to obtain specific reflectance features for generating a mosaic for identification, and kappa statistics are then used for the classification process. Jiang et al. [29] considered the identification of disease in apple leaves in real time using an enhanced convolutional neural network.
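Two quantities from the paragraph above are easy to make concrete. A normalized-difference index (R_a - R_b) / (R_a + R_b) between two bands is the generic form behind indices such as NDVI (near-infrared versus red); Huang et al. [27] derive their spectral indices from correlated wavelength bands, which this sketch does not attempt, so the band positions are placeholders. Cohen's kappa, the agreement statistic behind kappa-based evaluation as in Moriya et al. [28], compares observed agreement p_o with chance agreement p_e as kappa = (p_o - p_e) / (1 - p_e). The function names are hypothetical.

```python
import numpy as np


def normalized_difference(cube, band_a, band_b):
    """Per-pixel (R_a - R_b) / (R_a + R_b) for a cube of shape (H, W, bands)."""
    ra = cube[..., band_a].astype(float)
    rb = cube[..., band_b].astype(float)
    return (ra - rb) / (ra + rb + 1e-12)  # epsilon guards against 0 / 0


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa between two label sequences of equal length."""
    labels = np.union1d(y_true, y_pred)
    index = {lab: i for i, lab in enumerate(labels)}
    cm = np.zeros((labels.size, labels.size))  # confusion matrix
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    n = cm.sum()
    p_o = np.trace(cm) / n                          # observed agreement
    p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2  # agreement expected by chance
    if p_e == 1.0:
        return float("nan")                         # undefined when only one class occurs
    return (p_o - p_e) / (1.0 - p_e)
```

Kappa is more informative than raw agreement when classes are unbalanced: labelling at random in proportion to the class frequencies already yields a high p_o, but kappa stays near 0.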
To some extent, a rule-based approach can help handle such issues. Work in this direction is reported by Kaur et al. [30], where a rule-based approach performs classification in a case study of leaf disease in the soybean plant. Using the k-means algorithm, the study uses multiple forms of features (e.g., texture and color), and training is carried out using a support vector machine. Pham et al. [31] studied early classification, emphasizing a heuristic-based solution using a hybrid approach. The idea is to implement a feed-forward network to identify small disease symptoms in plants.
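Kaur et al. [30] use k-means before training a support vector machine. A common way to apply k-means in this setting is to cluster pixel colors so that healthy tissue, lesions, and background fall into separate clusters. The numpy sketch below assumes pixels reshaped to an (N, 3) array, uses farthest-point seeding for reproducibility, and omits the SVM stage; it illustrates the clustering idea, not the exact pipeline of [30].

```python
import numpy as np


def kmeans(X, k, iters=50, seed=0):
    """Cluster the rows of X (N, d) into k groups; returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Farthest-point seeding: start from a random pixel, then repeatedly add the
    # point farthest from all centres chosen so far.
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[dist.argmax()])
    centers = np.array(centers)

    def assign(c):
        # index of the nearest centre for every row of X
        return ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

    for _ in range(iters):
        labels = assign(centers)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(centers), centers
```

For an image `img` of shape (H, W, 3), `kmeans(img.reshape(-1, 3), k=3)` followed by `labels.reshape(H, W)` gives a cluster map; which cluster corresponds to the lesion still has to be decided, for example from the cluster's mean color, before features are passed to a classifier.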
The study by Pham et al. [31] also performs contrast enhancement as a preprocessing step, followed by blob segmentation. The neural network takes the features as input for training and later for classification. Deep learning is also adopted by Wang et al. [32], where an automated process assesses the severity of disease in plants; this approach uses threshold-based segmentation and feature engineering. A similar deep learning approach to severity detection is used by Zeng et al. [33]. The strengths and weaknesses of these approaches are compared in Table 2.

Table 2. Strengths and weaknesses of existing approaches
| Author | Approach | Strength | Weakness |
|---|---|---|---|
| Janssens et al. [19] | Segmentation | Parallel processing | Requires a high-end processor to obtain results |
| Sun et al. [20] | Segmentation | Reliability, higher precision | Accuracy can be further optimized; requires a high-end processor to obtain results |
| Li et al. [21] | Feature extraction | Enhances search efficiency | Does not support scalable performance |
| Lv et al. [22] | Feature extraction | Optimal learning process | Disease-specific |
| Sun et al. [23] | Feature extraction | Higher precision | Disease-specific |
| Ashourloo et al. [24] | Classification | Robust learning method | Could induce computational complexity |
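The contrast-enhancement and threshold-based severity steps described for Pham et al. [31] and Wang et al. [32] can be sketched as follows. This assumes a uint8 greyscale image, global histogram equalization as the contrast-enhancement method, a precomputed leaf mask, and the hypothetical rule that lesion pixels are darker than a fixed threshold. Severity is then the lesion-to-leaf area ratio, the same quantity listed for [16] in Table 1. None of these choices is taken from the cited studies.

```python
import numpy as np


def equalize_histogram(gray):
    """Global histogram equalisation of a uint8 greyscale image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]               # cdf value of the darkest occupied level
    denom = max(cdf[-1] - cdf_min, 1)       # avoid division by zero on constant images
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255), 0, 255).astype(np.uint8)
    return lut[gray]                        # map every pixel through the lookup table


def severity_ratio(leaf_mask, gray, lesion_threshold):
    """Fraction of leaf pixels darker than lesion_threshold (hypothetical lesion rule)."""
    leaf_area = leaf_mask.sum()
    lesion = leaf_mask & (gray < lesion_threshold)
    return lesion.sum() / leaf_area if leaf_area else 0.0
```

Equalization spreads the occupied grey levels over the full 0 to 255 range, which makes a fixed lesion threshold less sensitive to overall exposure; a learned model such as those in [32], [33] would replace the fixed rule.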

RESEARCH TRENDS
Understanding the research trend is essential to visualize the direction of ongoing research on analyzing disease from plant leaves. For this purpose, research papers published during 2010-2020 were collected from reputed publications and reviewed. The analysis shows several explicit trends, which are discussed in this section.

ANALYSIS OF EXISTING METHODS
At present, many methods have been developed to address the issues associated with applying digital image processing to plant leaves. Figure 2 shows that the majority of work has been carried out on the feature extraction process. The emphasis is on feature extraction because the information obtained from the extracted features contributes strongly to detection and classification. The adoption of deep learning, neural networks, and support vector machines is the next most frequent trend, which means that substantial work is carried out on classification using machine learning. However, segmentation, despite being an important part of image processing, has received less attention, and a similar trend is found for regression. Rule-based methods such as fuzzy logic are also rarely adopted in existing approaches. Hence, feature extraction methods and machine learning are the most frequently adopted research topics at present.

ANALYSIS OF ACCURACY ACHIEVED
Accuracy is one of the essential parameters for both identification and classification. Exploring the research trend in accuracy is essential to understand the level of achievement and hence the strength of existing approaches. Figure 3 shows the mean accuracy achieved by research work published between 2011 and 2020. A closer look shows a steep fall in accuracy from 2011 to 2014, when researchers focused on different sets of problems and treated accuracy as a secondary performance parameter. These problems include testing with multiple forms of images, checking visual quality, and considering the training parameters and run time of the algorithm. However, accuracy started increasing from 2015 onwards, indicating that researchers prioritized accuracy as an essential parameter of their experiments. The trend also shows that the achieved accuracy remains nearly linear during 2018-2020.

ANALYSIS OF SCALABILITY
Scalability is assessed by evaluating accuracy over an increasing number of images in the dataset. It shows how consistently a uniform algorithm maintains its accuracy when exposed to different forms of images. Figure 4 shows 55% consistency in accuracy when approaches are exposed to an increasing number of different forms of images. Therefore, existing approaches need more work on this scalability factor and are not yet ready for commercial applications, which demand much better consistency in accuracy. Hence, more experiments and modelling are needed to evolve new solutions for the scalability of accuracy.

OPEN END RESEARCH PROBLEMS
Very little work has been carried out on preprocessing input plant leaf images. Preprocessing for computer-vision analysis of plant disease faces some unique challenges compared with preprocessing in other image processing tasks. These challenges include: i) occlusion of leaves in the original images, where overlapping leaves constitute noise; ii) contrast mismatch among the fruit, its leaves, and the background, whose balance requires efficient adjustment; iii) highly varying lighting conditions in real-time scenarios, as weather and sun variation bring their own challenges and the density of the orchard produces a low-intensity effect on the input image data; and iv) imaging modalities that capture various wavelength bands produce a large number of correlated but redundant features.
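Challenge iv) concerns correlated, redundant spectral bands. Principal component analysis (PCA) is one standard way to compress such bands; it is not proposed by the source, so treat the following as an illustrative sketch. It assumes a hyperspectral cube of shape (H, W, bands), centers each band, and projects onto the leading right singular vectors; `pca_reduce` is a hypothetical name.

```python
import numpy as np


def pca_reduce(cube, n_components):
    """Project an (H, W, bands) cube onto its leading principal components."""
    h, w, bands = cube.shape
    X = cube.reshape(-1, bands).astype(float)  # one row per pixel
    X -= X.mean(axis=0)                        # centre each band
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    explained = s ** 2 / (s ** 2).sum()        # variance ratio per component
    scores = X @ vt[:n_components].T
    return scores.reshape(h, w, n_components), explained[:n_components]
```

When neighbouring bands are strongly correlated, the first few components typically carry almost all of the variance, so the explained-variance ratios returned here give a simple criterion for how many components to keep.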
Although studies have been carried out on segmentation, they have not addressed its fundamental challenges. Separating the disease-affected portion from the image background involves the following challenges: i) handling the dynamics of disease-associated color during segmentation based on color features; ii) handling the dimensional overhead due to the large color features of the original color of the fruits; and iii) handling variations in lighting conditions, in the dimension or spread of the disease, and in the volume of the fruits, which are essential considerations when designing effective segmentation algorithms. Another important challenge is that the popular region-growing algorithm consumes considerable time, so its time overhead makes it unsuitable for real-time processing. In addition, localizing the disease-affected area and capturing its geometric features and textures require effective descriptors and extractors to obtain a better feature set for the learning model. Therefore, the open-end research problems can be summarized as follows:
a. Sensors and advanced imaging systems pose enormous deployment complexities, as they are not cost-effective.
b. Many methods consider only accuracy as a performance parameter and do not consider metrics such as the F1-score, which handles the trade-off between precision and recall.
c. Much of the work on disease detection focuses on classification, while significantly less work addresses an integrated framework covering preprocessing, segmentation, feature extraction, and the learning model, to provide better overall sensitivity and specificity along with trade-offs in computation and time complexity.
d. For model validation, most models benchmark their work or optimize the model only for accuracy. Apart from specificity and sensitivity analysis, the focus should also be on minimizing computational and time complexity to move towards real-time implementations.
e. The literature lacks significant inclusion of optimization methods.
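Two points above can be made concrete: region growing visits pixels one at a time from a seed, which makes naive implementations slow for real-time use, and accuracy alone can hide poor lesion segmentation when lesions cover a small fraction of the image. The sketch below assumes a greyscale image, a single seed inside the lesion, 4-connectivity, and a fixed intensity tolerance (all hypothetical choices). It then evaluates a predicted mask against a ground-truth mask with pixel-level precision, recall (sensitivity), specificity, F1-score, and accuracy.

```python
from collections import deque

import numpy as np


def region_grow(gray, seed, tol):
    """4-connected region growing from seed=(row, col); accepts pixels within tol of the seed value."""
    h, w = gray.shape
    ref = float(gray[seed])
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:  # breadth-first traversal, one pixel at a time
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w and not mask[nr, nc]
                    and abs(float(gray[nr, nc]) - ref) <= tol):
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask


def pixel_metrics(pred, truth):
    """Pixel-level metrics for boolean masks pred and truth."""
    tp = int(np.sum(pred & truth))
    tn = int(np.sum(~pred & ~truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "specificity": specificity,
            "f1": f1, "accuracy": (tp + tn) / pred.size}
```

For instance, if a 5 x 5 lesion in a 20 x 20 image is predicted one column to the right, 20 of the 25 lesion pixels overlap, giving precision, recall, and F1 of 0.8 but an accuracy of 0.975, because the 370 correctly rejected background pixels dominate.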

CONCLUSION
This paper has discussed essential approaches used to identify abnormalities in crop cultivation from images of plant leaves. Although existing image processing techniques have certain strengths, a significant number of challenges remain. Hence, our future work will focus on addressing the identified challenges and open-end research problems, and will be directed toward developing an evaluation framework that offers better performance.