MedicPlant: A mobile application for the recognition of medicinal plants from the Republic of Mauritius using deep learning in real-time

Received Feb 3, 2020; Revised Jun 15, 2021; Accepted Aug 12, 2021

To facilitate the recognition and classification of medicinal plants commonly used by Mauritians, a mobile application that can recognise seventy different medicinal plants has been developed. A convolutional neural network (CNN) built with the TensorFlow framework was used to create the classification model. The system has a recognition accuracy of more than 90%. Once a plant is recognised, several pieces of useful information are displayed to the user, including the common name of the plant, its English name and its scientific name. The plant is also classified as either exotic or endemic, followed by its medicinal applications and a short description. Unlike similar systems, the application does not require an internet connection to work. Also, there are no pre-processing steps, the images can be taken in broad daylight, and any part of the plant can be photographed. It is a fast and non-intrusive method to identify medicinal plants. This mobile application will help the Mauritian population become more familiar with medicinal plants, help taxonomists experiment with new ways of identifying plant species, and contribute to the protection of endangered plant species.


INTRODUCTION
There are thousands of medicinal plant species in the world, and millions of patients use them routinely to meet their primary health care needs. Despite the availability of modern pharmaceuticals, natural herbal remedies are still popular in many cultures. Countries such as India, China, Egypt, Greece and Iran have a very rich history of natural therapies based on medicinal plants. Ayurvedic medicine from India and traditional Chinese medicine (TCM) are two of the best-known alternative medicinal systems in the world. According to the World Health Organization (WHO), 88% of its member states (170 countries) have developed national policies and regulations for the use of traditional medicine [1]. The number of countries using herbal medicines is actually higher, however, because not all countries responded to the WHO survey. The WHO has also developed the WHO Traditional Medicine Strategy 2014-2023 in order to reinforce the importance of traditional medicines [2]. Traditional medicines are no longer seen as an alternative system but rather as a complementary one, which can be integrated and used alongside conventional medicine to provide a safe, cheap and people-centred health service. Medicinal plants also have a variety of other uses, including perfumes, food flavourings and cosmetics. The use of natural products has increased in recent years, and this trend is expected to continue. A number of clinics offering Ayurvedic treatment and TCM have sprung up in many places in the Republic of Mauritius in the last few years. The unwanted and harmful side-effects of allopathic (conventional) medicines have contributed to a rise in the demand for herbal medicines [3]. However, to use medicinal plants wisely, it is important to identify and classify them correctly: the first step in the quality control of herbal remedies is ensuring the correct identity of the plant.
This is not an easy task for the layperson. Often, the only way to identify a plant is to rely on the experience and knowledge of a botanist, a process that can be biased and is often inconclusive. Among the several methods available for plant identification, microscopic identification is the simplest and least expensive, but it is still laborious, inconvenient and time-consuming. Moreover, given challenges such as the loss of biodiversity, climate change and the difficulty of monitoring plants in the field, taxonomists are looking for a cheaper, more convenient and more practical method to identify plant species [4]. This explains researchers' interest in automating the recognition process based on the morphological and visible characteristics of plants. The majority of existing systems rely on the shape of leaves, but the stem, petals, flowers and seeds may also be used [4]-[7].
In this work, we propose a smartphone app to identify medicinal plants from pictures of the whole plant or of any part of it. The images can be taken from any angle, provided the plant is within a reasonable distance. A deep learning architecture based on a convolutional neural network (CNN) is trained to create the model, which is then integrated into a mobile app for the identification of medicinal plants. The majority of current systems use hand-crafted features with traditional machine learning classifiers, which are not robust to small disturbances and changes in the background. Often, they require the picture of a single leaf on a white background to perform acceptably. Such methods have been found to be of little practical value, as they do not scale to larger numbers of classes or cope with changes in the environment. CNN-based recognition systems have been found to provide much higher accuracy, to be more robust and to scale well. An automatic plant recognition application of this kind will enable non-experts and other stakeholders to identify medicinal plants quickly and effortlessly. Our proposed identification method is simpler, cheaper, faster and non-destructive compared to analytical techniques. Furthermore, our study is in line with the third Sustainable Development Goal of the United Nations, which is to ensure healthy lives and promote well-being for all [8].
This paper proceeds as follows. In the next section, we give an overview of work on the recognition of medicinal plants, in particular work using machine learning and deep learning techniques. The methodology and dataset are described in Section 3. Section 4 describes the implementation and results. Finally, Section 5 concludes the paper with a note on future work.

RELATED WORKS
Plants are the factories of life. They produce oxygen and food which are required by all living organisms. A large amount of research is being done on the automatic identification of plants as current laboratory methods are very expensive and impractical to use during field trips. The automatic recognition of medicinal plants has also attracted much attention within the research community as medicinal herbs have been used by people since ancient times and are still being used by millions of people across the world. Thus, in this section, we provide an overview of research that is being done in these fields.
One of the first complete studies on the automatic identification of medicinal plants was carried out by Gao and Lin [9], who described an end-to-end system for recognising medicinal plants from images of their leaves, using an artificial neural network classifier. However, the authors did not provide adequate information on how much data was used for training and testing, and the system was not evaluated properly. In 2013, Arun et al. [10] used machine learning techniques to identify five medicinal plants using texture information only, in contrast to other researchers who primarily used shape and colour information. Fifty images were taken of each type of plant using a high-resolution digital camera. Seventy percent of the images were used for training and the remaining thirty percent for testing. The best accuracy of 94.7% was obtained using the stochastic gradient descent optimisation algorithm.
In 2014, Sainin et al. [11] used computer vision and image processing techniques to identify medicinal plants from Malaysia. They created their own dataset of 65 leaf images of five medicinal plants from villages in the state of Perlis. Nine images of each species were used for training and the rest for testing. Standard classifiers from the Weka workbench were used for classification, and the best result of 65% was obtained with an ensemble approach. Given that there were only five classes in this study, however, an accuracy of 65% is not a good result. The researchers used the Prewitt edge detection algorithm to extract the shape of the leaf in the early stages of processing. This is a highly lossy operation, and we believe that it contributed to the low accuracy.
The practice of Ayurveda, which originated in India, is based on the use of natural remedies such as medicinal plants to treat pain and illness. In 2015, Kumar and Talasila [12] developed an automated method for identifying ten different medicinal plants. To increase the robustness of the system and to cater for environmental variation, the plants were collected from different regions. Fifty images were captured for each category of leaf using a very high-resolution digital camera at a distance of 10 cm, with the leaves placed on a white background in natural daylight. Shape, colour and texture information was then extracted using image processing algorithms. However, only simple statistical measures such as the mean, standard deviation and variance were then used to identify the plants, with moderate success.
Deep-Plant, developed by Lee et al. [13] in 2015, was the first application of deep learning and convolutional neural networks to plant recognition. The researchers created their own dataset (MalayaKew), which contains leaf images of 44 plant species with an average of 8 images per species. Data augmentation was then performed by rotating each image into seven different orientations. Of the 2,816 resulting images, 2,288 were randomly selected for training and the remaining 528 were used for testing. The authors report an accuracy of 98.1%, much better than state-of-the-art solutions based on hand-crafted features. They also created leaf patches of different sizes and augmented them in the same way to produce a total of 43,472 sub-images, in order to determine whether a plant can be identified from part of a leaf rather than the whole leaf. Of these patches, 34,672 were randomly selected for training and the remaining 8,800 were used for testing. With this approach, the authors report a slightly higher accuracy of 99.6%. They therefore concluded that venation structure plays a major role in identifying plants from leaf images.
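The dataset sizes quoted above are consistent with each other if each original image is kept alongside its seven rotated copies and the average of 8 images per species is taken as exact. Both are our reading of the description, not figures stated by Lee et al.:

```latex
\begin{aligned}
44 \text{ species} \times 8 \text{ images} \times (1 + 7) \text{ orientations} &= 2816 \text{ images} \\
2288 + 528 &= 2816 \quad (81.25\% \text{ train},\ 18.75\% \text{ test}) \\
34{,}672 + 8{,}800 &= 43{,}472 \text{ patches} \quad (\approx 79.8\% \text{ train})
\end{aligned}
```

Counting only the seven rotations (44 x 8 x 7 = 2,464) would not reproduce the reported total, which is why the original image is assumed to be retained.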
In 2017, Sabu et al. [14] created their own dataset of ayurvedic medicinal plants consisting of 20 species. For each species, ten very high-resolution photographs were taken of different leaves from different plants. Five images from each class were used for training and the remaining five for testing. The researchers used a combination of speeded up robust features (SURF) and histogram of oriented gradients (HOG) features with the k-nearest neighbour (kNN) classifier to assign the 100 test leaves to their respective classes. The authors report near-perfect accuracy on all 20 ayurvedic plants.
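To make the classification step concrete, the following is a minimal k-nearest-neighbour sketch in Python. It assumes each leaf has already been reduced to a fixed-length feature vector (in Sabu et al.'s case, combined SURF and HOG descriptors, whose extraction is not shown). The value of k, the Euclidean distance, and the toy vectors and labels are illustrative assumptions, not details taken from their paper.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Majority label among the k training vectors nearest to `query`.

    On a tied vote, the tied label whose member is nearest wins, because
    Counter.most_common keeps first-encountered order for equal counts.
    """
    ranked = sorted(zip(train_x, train_y), key=lambda pair: euclidean(pair[0], query))
    nearest = [label for _, label in ranked[:k]]
    return Counter(nearest).most_common(1)[0][0]


# Toy 2-D vectors standing in for (much longer) SURF + HOG descriptors.
train_x = [[0.10, 0.20], [0.00, 0.10], [0.90, 0.80], [1.00, 0.90]]
train_y = ["species_a", "species_a", "species_b", "species_b"]
print(knn_predict(train_x, train_y, [0.05, 0.15]))  # species_a
```

kNN has no training phase beyond storing the labelled vectors, which suits a small dataset of five training images per class.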
In 2017, Sulc and Matas [15] used deep learning methods to identify plants from the texture of their bark and leaves. The authors used the Middle European Woods (MEW) dataset, which contains 153 plant species from Central Europe [16]. They achieved an accuracy of 99.5%, compared with the 84.9% achieved by the creators of the dataset. They also tested their algorithm, known as Ffirst, on other popular leaf datasets such as Flavia [17], Foliage [18] and Leafsnap [19], obtaining accuracies above 99% on Flavia and Foliage but only around 84% on Leafsnap.
The PlantCLEF 2017 dataset consists of a clean set of 256,288 images and a noisy set of 1,432,162 images. It contains samples of 10,000 different plants, with a very high imbalance between the classes. Using deep learning architectures such as ResNet50, DenseNet201 and Inception v3, Haupt et al. [20] achieved an impressive accuracy of 77%. Their results were compared with those of human experts: they were better than some of the experts and on par with others, although some experts achieved an accuracy of over 90%. The researchers noticed that using the noisy data improved classification accuracy, as did image augmentation techniques such as flipping, zooming, shearing, shifting and rotation. The DenseNet201 architecture also achieved better results than ResNet50 and Inception v3.
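Flipping and rotation, two of the augmentations mentioned above, can be sketched in a few lines of Python on an image represented as a list of pixel rows. This is an illustrative sketch restricted to 90-degree rotations; zooming, shearing, shifting and arbitrary-angle rotation require interpolation and are usually delegated to an image library. The exact augmentation pipeline of Haupt et al. is not reproduced here.

```python
def flip_horizontal(image):
    """Mirror an image left-to-right; `image` is a list of pixel rows."""
    return [list(reversed(row)) for row in image]


def flip_vertical(image):
    """Mirror an image top-to-bottom."""
    return [list(row) for row in reversed(image)]


def rotate_90_clockwise(image):
    """Rotate an image by 90 degrees clockwise (height and width swap)."""
    return [list(row) for row in zip(*reversed(image))]


def augment(image):
    """Return the original, both flips, and the 90/180/270-degree rotations."""
    variants = [image, flip_horizontal(image), flip_vertical(image)]
    rotated = image
    for _ in range(3):
        rotated = rotate_90_clockwise(rotated)
        variants.append(rotated)
    return variants


tiny = [[1, 2, 3],
        [4, 5, 6]]  # a 2x3 "image" of pixel values
for variant in augment(tiny):
    print(variant)
```

Each augmented copy keeps the label of the original image, so the training set grows without any new photographs being taken.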
In 2017, Begue et al. [21] created their own dataset of twenty-four medicinal plants that are commonly available in the Republic of Mauritius. The leaves were plucked from the plants and photographed with a smartphone camera in a computer lab under loosely controlled lighting conditions. Thirty images of different leaves were taken for each plant. Eighteen features were then extracted from these images and fed to five different machine learning classifiers. The random forest classifier with 100 trees produced the best accuracy of 90.1%, followed closely by the ANN at 88.2% and the SVM at 87.4%. The naïve Bayes and kNN classifiers performed less well. A new measure called lobidity was also proposed. The dataset can be obtained by contacting the corresponding author.
A convolutional neural network (CNN) based on the GoogLeNet deep learning architecture was used by Jeon and Rhee [22] to classify images from the Flavia dataset into eight categories based on their leaf type: lanceolate, oval, acicular, linear, oblong, reniform, cordate and palmate. One hundred images were used for testing; however, these images were first discoloured by 5-60%. Nevertheless,  [24]. The VGG-16 deep learning model [25] is pre-trained on the ImageNet dataset [26]. The ICL dataset contains 17,000 leaf images from 220 different plant species, with at least 30 images per plant type. By first transforming their images to a device-independent colour space and adding a principal component analysis (PCA) step to minimise data redundancy, the authors achieved a state-of-the-art classification accuracy of 98.2% on the ICL dataset.
The MobileNet deep learning architecture was used by Beikmohammadi and Faez [27] to classify images from the Flavia and Leafsnap datasets. The researchers report state-of-the-art accuracies of 99.6% on the Flavia dataset with 32 classes and 90.5% on the Leafsnap dataset with 184 classes. In 2019, Xue et al. [28] used two completely different methods to recognise twenty Chinese medicinal plants. In the first approach, a scanner was used to scan the leaves, from which various morphological features were extracted. A large number of machine learning classifiers were tried on the MATLAB platform, and the artificial neural network (ANN) provided the best accuracy of 98.3%. The authors also used visible (VIS)/near-infrared (NIR) spectroscopy for plant identification. This is a rapid and non-destructive approach, as the measurements are taken directly on the plant without plucking the leaves. Using this approach coupled with an ANN, they reached an accuracy of 92.5%.
One of the studies most similar to ours was carried out by Vo et al. [29], who created their own dataset of 10 medicinal plant species from Vietnam. The images were taken in the natural environment using a smartphone, and the whole plant was photographed from various angles and distances. An additional class containing 978 irrelevant images was also added, giving a final dataset of 10,279 images, with between 860 and 1,000 images per class. A VGG-16 deep learning model combined with the light gradient boosting machine (LightGBM) produced the best accuracy of 93.6%. All training, testing and evaluation were done on a desktop machine using the Keras deep learning library with a TensorFlow backend and the Python scikit-learn machine learning library.
A thorough and recent review of automatic techniques for classifying and identifying plants from their leaves was carried out by Azlah et al. [7]. The paper covers pre-processing with image processing algorithms, feature extraction, popular machine learning classifiers, convolutional neural networks (deep learning) and pattern recognition methods. The specificities, difficulties and challenges of the different approaches are also outlined.

METHODS
A database of medicinal plants available on the tropical island of Mauritius has been created. Using different smartphones, 7,000 images of seventy medicinal plant species were collected, with 100 images for each species. The resolution of each image is 1836x3264 or 3264x1836, as the pictures were captured in both portrait and landscape formats, just as ordinary users would do with their smartphones. All images are stored in the standard JPEG format. Previous studies have used digital cameras under tightly controlled laboratory conditions to take the images, which were then pre-processed to remove noise and resized to specific dimensions. Unfortunately, this way of capturing images is not suitable for recognition with a mobile app. In this work, our intention is to develop a mobile app that can perform recognition in real time on pictures taken under ordinary field conditions. All the steps in the development of this application are shown in Figure 1.
As Figure 1 shows, once the images are captured, they are resized to 299x299 pixels. The images are then separated into three sets: the largest is used for training, and the two smaller ones for validating and testing the model. Next, a deep learning architecture is chosen for training. In this case, we used the Inception-v3 image recognition model because of its high prediction accuracy [30]. Training and validation are carried out to find the optimal model, which is then tested on unseen data, after which the best model is integrated into a mobile application to make predictions in real time without the need for an internet connection. Sample images of medicinal plants are provided in Figures 2 and 3. The full set of seventy medicinal plants, together with their scientific and common names, is listed in Appendix 1.
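The three-way split described above can be sketched as a per-class (stratified) split in Python. The 80/10/10 proportions are those reported later in this paper for training; splitting per class, the fixed random seed, the helper name `stratified_split` and the hypothetical `species_NN/img_NNN.jpg` file names are assumptions for illustration. Resizing to 299x299 is not shown, since it would need an image library such as Pillow.

```python
import random


def stratified_split(paths_by_class, train=0.8, val=0.1, seed=42):
    """Split each class's image paths into train/validation/test lists.

    Splitting per class keeps every species represented in all three sets
    in the same proportions. Returns lists of (path, label) pairs.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible split
    splits = {"train": [], "val": [], "test": []}
    for label, paths in sorted(paths_by_class.items()):
        shuffled = list(paths)
        rng.shuffle(shuffled)
        n_train = round(len(shuffled) * train)
        n_val = round(len(shuffled) * val)
        splits["train"] += [(p, label) for p in shuffled[:n_train]]
        splits["val"] += [(p, label) for p in shuffled[n_train:n_train + n_val]]
        splits["test"] += [(p, label) for p in shuffled[n_train + n_val:]]
    return splits


# Hypothetical layout: 70 species x 100 images each.
dataset = {
    f"species_{i:02d}": [f"species_{i:02d}/img_{j:03d}.jpg" for j in range(100)]
    for i in range(70)
}
splits = stratified_split(dataset)
print({name: len(items) for name, items in splits.items()})
```

Splitting per class guarantees that every one of the seventy species appears in the validation and test sets in the same 80/10/10 proportions, which a single global shuffle would only approximate.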
Unlike the majority of previous approaches, our proposed system requires no pre-processing to enhance the images or prepare them for the recognition task. In those approaches, the images had to be filtered to remove noise or cropped, or complex image processing algorithms had to be used to segment the relevant parts. Many existing systems also perform shadow removal and thresholding, after which the leaf is enclosed in a bounding box in order to extract basic attributes such as the width, length, area, perimeter, area of white space, area of the bounding box, area of the hull and perimeter of the hull. Many other attributes are then calculated from these basic attributes, for example the aspect ratio, circularity, solidity, convexity, white area ratio and hull ratio. Traditional machine learning classifiers such as decision trees, random forests, naïve Bayes, k-nearest neighbours, support vector machines and neural networks, from platforms such as MATLAB, scikit-learn and Weka, are then used for the identification. Such systems based on hand-crafted features have been found to be neither very reliable nor robust. We therefore propose a deep learning approach to recognise a large number of medicinal plants in real time using a mobile app.
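For contrast with the proposed approach, the following Python sketch shows how a few of the derived shape attributes listed above are typically computed from the basic measurements. The formulas are common textbook definitions and are our assumption; individual papers define these ratios (and the white area and hull ratios, omitted here) in slightly different ways. The inputs are assumed to come from an earlier segmentation step.

```python
import math


def shape_features(width, length, area, perimeter, hull_area, hull_perimeter):
    """Derive classic hand-crafted leaf-shape ratios from basic measurements.

    Inputs are assumed to be in consistent units (e.g. pixels and pixels
    squared) and to describe a single segmented leaf.
    """
    return {
        "aspect_ratio": width / length,
        # 1.0 for a perfect circle, smaller for elongated or jagged outlines
        "circularity": 4 * math.pi * area / perimeter ** 2,
        # fraction of the convex hull actually covered by the leaf
        "solidity": area / hull_area,
        # hull perimeter over leaf perimeter; 1.0 for a convex outline
        "convexity": hull_perimeter / perimeter,
    }


# A 10x10 square "leaf": convex, so solidity and convexity are both 1.0,
# and circularity is pi/4 (about 0.785).
print(shape_features(10, 10, 100, 40, 100, 40))
```

Every ratio depends on a clean leaf outline, which is why such features are sensitive to the segmentation and background conditions discussed above.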

IMPLEMENTATION AND RESULTS
The first step is to create a deep learning model. Starting with the Inception-v3 pre-trained model developed at Google [30], we added a new final layer to recognise the medicinal plants. Inception-v3 is a 42-layer deep network that has been trained on a subset of the ImageNet dataset consisting of 1000 classes with 1000 images in each category. Reusing a network trained on one task as the starting point for another in this way is known as transfer learning, and it avoids the very high computational cost of training from scratch. To further reduce the computational cost, the images are resized to 299x299, which is also the input resolution expected by Inception-v3. All three colour channels (red, green and blue) were used for training.
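Transfer learning with a new final layer can be illustrated with the following minimal, dependency-free Python sketch, which trains only a softmax classification layer on top of fixed feature vectors, using stochastic gradient descent on the cross-entropy loss. In the setting described above, each feature vector would be the 2048-dimensional output of Inception-v3's final pooling layer for one 299x299 image; here tiny hand-made 2-D vectors and three classes stand in for them. The learning rate, epoch count and function names are assumptions, and this is not a reproduction of the authors' TensorFlow training code.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(W, b, x):
    """Class scores W @ x + b for one feature vector."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def train_final_layer(features, labels, num_classes, epochs=200, lr=0.1, seed=0):
    """Learn weights W and biases b of a new softmax layer; features stay fixed."""
    dim = len(features[0])
    rng = random.Random(seed)
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], labels[i]
            probs = softmax(logits_for(W, b, x))
            for c in range(num_classes):
                # d(cross-entropy)/d(logit_c) = p_c - 1[c == y]
                grad = probs[c] - (1.0 if c == y else 0.0)
                for d in range(dim):
                    W[c][d] -= lr * grad * x[d]
                b[c] -= lr * grad
    return W, b


def predict(W, b, x):
    """Return (most likely class index, its probability)."""
    probs = softmax(logits_for(W, b, x))
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


# Toy stand-ins for frozen feature vectors: three classes, two examples each.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, -1.0], [-0.9, -1.1]]
labels = [0, 0, 1, 1, 2, 2]
W, b = train_final_layer(features, labels, num_classes=3)
print(predict(W, b, [0.95, 0.05]))
```

Because the pre-trained layers stay frozen, only num_classes x (feature dimension + 1) parameters are learned, which is what makes this approach much cheaper than training the whole network.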
The training was performed on a powerful desktop computer. Of the images, 80% were used for training, 10% for validation and the remaining 10% for testing. An accuracy of about 95% was obtained on both the validation and test sets. This was deemed a satisfactory model, and it was therefore integrated into the mobile app using TensorFlow Mobile. The model was saved as a graph.pb file, and the labels were stored in a simple text file. The app has been developed only for the Android platform, using the Java programming language, and is available for free download from the Google Play Store under the name MedicPlant. Once the app is installed on the smartphone, no internet connection is required to use it. The size of the app is 64 MB.
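The labels text file is what connects the model's numeric output to plant names: the i-th label names the class of the i-th output unit. A minimal Python sketch of that mapping follows. The one-label-per-line format, the hypothetical `labels.txt` file name and the label strings other than Carica papaya are assumptions; on Android the same logic would be written in Java against the output array returned by TensorFlow Mobile.

```python
def load_labels(text):
    """Parse labels-file contents: one class name per line, blank lines ignored.

    Line order must match the order of the model's output units.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def top_k(probabilities, labels, k=3):
    """Pair each output probability with its label and return the k most likely."""
    if len(probabilities) != len(labels):
        raise ValueError("model output size does not match the number of labels")
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical file contents; in practice: open("labels.txt").read()
labels = load_labels("carica_papaya\nplant_b\n\nplant_c\n")
print(top_k([0.77, 0.15, 0.08], labels, k=2))
# [('carica_papaya', 0.77), ('plant_b', 0.15)]
```

Checking that the output size matches the number of labels catches the common mistake of shipping a labels file that is out of sync with the retrained graph.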
The app has two main functionalities. The first is the identification of medicinal plants, for which there are three ways of providing the image to the prediction model: the user can take a new picture, select an existing one from the gallery, or use the real-time detection feature, as shown in Figure 4. Real-time detection is the default mode. In this mode, the user simply points the mobile phone at the plant and the identification is done instantaneously, as shown in Figure 5, where a Carica papaya (pawpaw) has been identified correctly with a confidence of 77%. If the plant cannot be identified, the information box at the top of the screen remains empty, as shown in Figure 6. In the take-picture mode, the user can take a picture and, if a match is obtained, the result is displayed as shown in Figure 7. In this case, a full record about the medicinal plant is displayed, containing its scientific name, English name, Mauritian (common) name, whether the plant is an endemic or exotic species, a short general description of the plant, the types of illnesses that can be treated with it (purpose) and the source of this information. If the plant cannot be identified, an "unknown plant" message (toast) is displayed to the user. As shown in Figure 8, a user can also use an existing picture from the smartphone's gallery. The second functionality is the ability to browse the full dataset, as shown in Figure 9. This is a list of all plants, each with a small picture together with its scientific name, English name and Mauritian (common) name. If the user taps on the thumbnail record of a plant, the full record is displayed as shown in Figure 7. The user can also use the search button to retrieve the record of a specific plant.
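The behaviour described above, where a low-confidence prediction leaves the information box empty or triggers an "unknown plant" toast, implies some confidence cut-off, and the search button implies a lookup over the stored names. Both can be sketched in Python as follows. The threshold of 0.5, the `PlantRecord` fields and the placeholder Mauritian name are hypothetical; the paper does not state the actual threshold or data layout.

```python
from dataclasses import dataclass


@dataclass
class PlantRecord:
    scientific_name: str
    english_name: str
    mauritian_name: str


def identify(label, confidence, records, threshold=0.5):
    """Return the record for a confident prediction, else None.

    None corresponds to the app's "unknown plant" case. The 0.5 default is a
    hypothetical cut-off, not a value reported for MedicPlant.
    """
    if confidence < threshold:
        return None
    return records.get(label)


def search(records, query):
    """Case-insensitive substring search over all three names of each record."""
    q = query.casefold()
    return [
        record for record in records.values()
        if any(q in name.casefold()
               for name in (record.scientific_name, record.english_name,
                            record.mauritian_name))
    ]


# One illustrative record; the Mauritian name is a placeholder.
records = {"carica_papaya": PlantRecord("Carica papaya", "Pawpaw", "<Mauritian name>")}
print(identify("carica_papaya", 0.77, records))  # confident enough: full record
print(identify("carica_papaya", 0.30, records))  # None -> "unknown plant"
```

Searching all three names lets users find a plant whether they know it by its scientific, English or Mauritian name.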

CONCLUSION
In this work, we have developed a mobile application named MedicPlant, which uses an artificial intelligence (AI) model based on a deep learning architecture to accurately recognise seventy different medicinal plants from the Republic of Mauritius. The app is available for free download from the Google Play Store. The majority of existing work on the recognition of plants or medicinal plants has been developed on the MATLAB platform and is often limited to laboratory settings; as a result, it tends to perform poorly under normal environmental conditions. MedicPlant is the first app for the recognition of medicinal plants that does not require an internet connection to work, which matters because most field work is done in mountainous regions, forests and other remote locations where internet access is often unavailable. Our study also uses the largest number of classes among studies on medicinal plants and, contrary to the many studies in which specialised equipment is used to capture images of leaves in specific settings, we used only smartphones to capture images of the plants in their natural environment. This research could be extended in three main ways. Firstly, more images could be gathered for each plant in the current list to build a more accurate recognition model. Secondly, the dataset could be extended to cover more than one hundred medicinal plants. Thirdly, additional information about each plant could be provided to the user. These extensions would enhance the usability of the mobile application.

APPENDIX 1
Seventy medicinal plants with scientific name and common name