Classification of adult autistic spectrum disorder using machine learning approach

Received Oct 7, 2020; Revised May 2, 2021; Accepted May 16, 2021

Autism spectrum disorder (ASD) is a neurological disorder. Patients with ASD have poor social interaction and a lack of communication that lead to restricted activities. Thus, early diagnosis with a reliable system is crucial, as the symptoms may affect the patient's entire lifetime. Machine learning approaches are an effective and efficient method for the prediction of ASD. This study mainly aims to evaluate the accuracy of ASD classification using a variety of machine learning approaches. The dataset comprises 16 selected attributes covering 703 patients and non-patients. The experiments are performed and analyzed in the Waikato environment for knowledge analysis (WEKA) platform. Linear support vector machine (SVM), k-nearest neighbours (k-NN), J48, Bagging, Stacking, AdaBoost, and naïve bayes are used to predict ASD status using 3, 5, and 10-fold cross validation. The accuracy, sensitivity, and specificity of the proposed methods are then evaluated. The comparison between the machine learning approaches shows that linear SVM, J48, Bagging, Stacking, and naïve bayes produce the highest accuracy of 100% with the lowest error rate.


INTRODUCTION
Autism spectrum disorder, commonly known as ASD, is a neurological disorder that affects both children and adults. Poor social interaction and communication have become more prevalent due to this neurological condition [1]. ASD is clinically diagnosed based on three psychological aspects: speech and language, mutual communication, and restricted activities. ASD can be identified at any point in the lifespan, although it is regarded as a psychological disorder whose symptoms occur during the first two years of life [2]. Commonly, ASD symptoms begin during childhood and persist throughout the entire lifetime.
Furthermore, the potential factors for ASD are biological and environmental. Numerous diagnostic approaches have been applied to ASD, such as the autism diagnostic observation schedule-revised (ADOS-R) and autism diagnostic interview (ADI) [3], [4], the autism quotient trait (AQ) [5], and the social communication questionnaire (SCQ) [6]. Most of these approaches employ mathematical formulas to compute the diagnosis. Thus, reliable clinical methods that enhance diagnostic accuracy and shorten the time to diagnose the disease are in high demand [7].
Nevertheless, recent studies of ASD using machine learning did not fully address the conceptual, implementation, analysis, and validation challenges. The dataset and methods are described in the subsequent sections in more detail. Last of all, the classification methods used in this study are described concisely.

ASD for adult dataset
In this study, we used the ASD screening data for adults obtained from the UCI machine learning repository [16]. The data were collected using a mobile application called ASD test developed by Thabtah [8]. The dataset consists of 703 subjects with 21 features of adults' screening data for autism. The response class is categorized into two classes: adults with ASD (189 subjects) and adults without ASD (515 subjects). Ten behavioral features, proven to be effective and reliable in differentiating ASD cases from the controls, are included along with 10 individual characteristic features. The features used in this study, along with their types and descriptions, are given in Table 1.
The app enables users to analyze ASD behaviors using four modules [17]. One of the early autism screening tools intended to help adults identify autistic symptoms through a personality questionnaire is the autism spectrum quotient (AQ) [18]. Initially, the AQ test comprised fifty questions concerning the cognitive traits of autism. Each question has four response options, definitely agree, slightly agree, slightly disagree, and definitely disagree, and each question is scored [17]. A shortened AQ test, named AQ-10-Adult and consisting of 10 questions, was proposed by Allison et al. [19]. During the screening process, users choose among the same four options for each question in the AQ-10-Adult test, and the score is computed using diagnostic rules. An individual is classified as autistic if the total score is more than six. Each question contributes either 0 or 1 point: a point is added when the answer to questions 1, 7, 8, and 10 is either "Slightly Agree" or "Definitely Agree", and when the answer to questions 2-6 and 9 is either "Slightly Disagree" or "Definitely Disagree" [17].
Table 1. Features of the ASD screening dataset
Feature            Type                  Description
AQ-1 to AQ-10      Binary (0, 1)         The response recorded during the screening process
Age description    Text                  Age category
Screening score    Integer               The total score determined using the screening algorithm
Class/ASD          Boolean (yes or no)   The result shown after the test

Methods
The process of the classification system in this study is illustrated in Figure 1 to present a better understanding of its implementation. Each task is explained in the subsequent sections.

Software platform
The Waikato environment for knowledge analysis (WEKA) was used in this study to perform data pre-processing and classification of the ASD for adult dataset [20]. WEKA is open-source machine learning software written in the Java programming language. Many researchers use the software as it supports numerous data mining functions such as classification, clustering, association, data pre-processing, feature selection, and regression.

k-Fold cross validation
For k-fold cross validation, the ASD for adult dataset is divided into k subsets. In general, (k-1)/k of the data is used for training and 1/k for testing. The process is then repeated k times, and the mean of the k validation results is taken as the final estimate. In this study, performance is measured with 3-, 5-, and 10-fold cross validation, corresponding to training-testing ratios of 67:33, 80:20, and 90:10, respectively.
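The fold partitioning described above can be sketched in plain Python. This is a minimal illustration of the k-fold idea, not the WEKA implementation used in the study:

```python
def kfold_splits(n, k):
    """Partition n sample indices into k folds; yield (train, test) index lists."""
    # Distribute any remainder so fold sizes differ by at most one
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    indices = list(range(n))
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]          # 1/k of the data for testing
        train = indices[:start] + indices[start + size:]  # remaining (k-1)/k for training
        yield train, test
        start += size

# With the 703-subject dataset and k = 10, each test fold holds roughly 10% of the data
splits = list(kfold_splits(703, 10))
```

Averaging the per-fold evaluation results then gives the final performance estimate.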

Data pre-processing
In this study, all missing values in the nominal and numerical data were substituted to tackle the issues of inadequate and incompatible data. Furthermore, numerical features were filtered into nominal features using discretization to generate stronger results across the variety of numerical values in the data. The discretization maps a value x to bin ⌊(x - x_min)/∆⌋ as in (1), where ∆ is known as the step size.
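The two pre-processing steps can be sketched as follows. This is an assumed re-implementation that mirrors the behaviour of WEKA's ReplaceMissingValues (mean for numeric, mode for nominal) and equal-width Discretize filters, not the authors' exact configuration:

```python
def replace_missing(values):
    """Fill None entries: mean for numeric columns, mode for nominal ones."""
    present = [v for v in values if v is not None]
    if all(isinstance(v, (int, float)) for v in present):
        fill = sum(present) / len(present)          # numeric: mean
    else:
        fill = max(set(present), key=present.count)  # nominal: mode
    return [fill if v is None else v for v in values]

def discretize(values, k):
    """Equal-width discretization: map x to bin floor((x - min) / delta),
    where delta = (max - min) / k is the step size."""
    lo, hi = min(values), max(values)
    delta = (hi - lo) / k
    return [min(int((v - lo) / delta), k - 1) for v in values]
```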

Performance evaluation
The performance of the classification model is measured on the test data using a confusion matrix of correctly and incorrectly predicted instances. Accuracy, sensitivity, and specificity are then calculated from the confusion matrix. The confusion matrix of ASD and No-ASD is shown in Table 2.
- True positive (TP): a patient correctly identified as having ASD.
- False positive (FP): a non-patient incorrectly identified as having ASD.
- False negative (FN): a patient incorrectly identified as not having ASD.
- True negative (TN): a non-patient correctly identified as not having ASD.
Accuracy, sensitivity, and specificity are computed from the confusion matrix as:

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (2)
Sensitivity = TP / (TP + FN)                 (3)
Specificity = TN / (TN + FP)                 (4)
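The three measures can be computed directly from the four confusion-matrix counts; the counts below are illustrative values, not results from the study:

```python
def evaluate(tp, fp, fn, tn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true positive rate: fraction of ASD patients detected
    specificity = tn / (tn + fp)   # true negative rate: fraction of non-patients cleared
    return accuracy, sensitivity, specificity

acc, sen, spe = evaluate(tp=8, fp=1, fn=2, tn=9)
```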

Classification methods

k-Nearest neighbours
k-Nearest Neighbours, also known as k-NN, is a supervised machine learning technique for classification and regression [21]. The input parameter k is a small positive integer specifying the number of neighbours to consider, and an input instance is assigned the class held by the majority of its k nearest neighbours. The k-NN algorithm needs to be run several times with different values of k to choose the one that reduces the number of errors while maintaining the prediction accuracy; in this case, the input parameter k for the ASD dataset is 3. A brute-force search is implemented using the Euclidean distance function for the nearest-neighbour search as in (5):

d(x, y) = sqrt(sum_i (x_i - y_i)^2)   (5)

The Euclidean distance function computes the distance between instances and is well suited to numeric data on the same scale.
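The brute-force k-NN classification described above can be sketched in plain Python on a small hypothetical two-feature dataset (illustrative values, not the ASD data):

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two numeric feature vectors, as in (5)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Brute-force k-NN: rank every training point by distance, vote among the k nearest."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: euclidean(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Two well-separated toy clusters (hypothetical)
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["No-ASD", "No-ASD", "No-ASD", "ASD", "ASD", "ASD"]
```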

Linear support vector machine
Support vector machine, or SVM, is a supervised learning technique used for classification [22] and regression. Principally, SVM classifies outcomes by mapping input vectors into a high-dimensional feature space. A linear SVM aims to maximize the margin, the distance between the decision hyperplane and the nearest data points [23]. In this study, John Platt's sequential minimal optimization (SMO) algorithm was used to train the support vector classifier as in (6). SVM is used to obtain the performance accuracy for the ASD screening models. Linear regression with the M5 selection method is selected as the calibrator in the SVM classifier, and PolyKernel is chosen as the kernel function.
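An analogous linear SVM can be set up in scikit-learn, whose libsvm backend solves the dual problem with an SMO-style algorithm. This is an assumed stand-in for the authors' WEKA SMO configuration, trained on a hypothetical two-feature toy set rather than the 16-feature ASD records:

```python
from sklearn.svm import SVC

# Toy, linearly separable data (hypothetical values)
X = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
y = [0, 0, 0, 1, 1, 1]  # 0 = No-ASD, 1 = ASD

clf = SVC(kernel="linear", C=1.0)  # linear kernel; SMO-style dual solver
clf.fit(X, y)
preds = clf.predict([[0.5, 0.5], [3.5, 3.5]])
```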

Naïve bayes
Naïve bayes is a supervised machine learning approach from the family of Bayesian algorithms with a simple probability model [24]. Its main principle is the assumption of feature independence, which results in less training time compared with the SVM approach. Furthermore, naïve bayes uses numeric estimator precision values chosen based on analysis of the training data. The naïve bayes rule is stated in (7):

P(c | x) = P(c) ∏_i P(x_i | c) / P(x)   (7)
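The rule in (7) can be implemented by counting, as sketched below for binary screening answers. The dataset is hypothetical and Laplace smoothing is added so unseen feature values do not zero out the product:

```python
from collections import Counter, defaultdict

def nb_fit(X, y):
    """Count class priors P(c) and per-feature conditionals P(x_i | c)."""
    priors = Counter(y)
    cond = defaultdict(Counter)  # (feature index, class) -> value counts
    for xs, c in zip(X, y):
        for i, v in enumerate(xs):
            cond[(i, c)][v] += 1
    return priors, cond

def nb_predict(model, xs, alpha=1.0):
    """Pick the class maximising P(c) * prod_i P(x_i | c); P(x) is a common factor."""
    priors, cond = model
    n = sum(priors.values())
    scores = {}
    for c in priors:
        p = priors[c] / n
        for i, v in enumerate(xs):
            counts = cond[(i, c)]
            # Laplace smoothing with alpha, assuming binary-valued features
            p *= (counts[v] + alpha) / (sum(counts.values()) + alpha * 2)
        scores[c] = p
    return max(scores, key=scores.get)

# Toy binary answers: the class simply follows the first answer (hypothetical data)
model = nb_fit([(1, 0), (1, 1), (0, 0), (0, 1)], [1, 1, 0, 0])
```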

J48 decision tree
The J48 decision tree is a comprehensive machine learning approach [25] that is widely used. Generally, J48 builds a classification tree as a hierarchical structure in which internal nodes test attributes and terminal nodes carry the decision outcomes. The visual classification of the J48 approach is effective and efficient; nevertheless, J48 is vulnerable to noise in the data [26]. A variety of decision tree algorithms are used for classification, such as classification and regression tree (CART), chi-square automatic interaction detector (CHAID), ID3, and C4.5, of which J48 is an implementation. J48 is therefore implemented in this study as one of the classification approaches.
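scikit-learn does not ship C4.5/J48 itself; its DecisionTreeClassifier implements CART, but with the entropy criterion it performs information-gain splits in the same spirit. A hedged sketch on a hypothetical toy set:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy data: the class depends only on the first attribute (hypothetical)
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]

# CART with entropy splits, standing in for J48/C4.5's information gain
tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(X, y)
preds = tree.predict([[0, 1], [1, 0]])
```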

AdaBoost
Adaptive boosting, known as AdaBoost, was developed by Freund and Schapire [27]. AdaBoost is a supervised ensemble learning algorithm. Its core idea is to fit a sequence of weak learners, each only slightly more effective than random guessing. Each instance in the training dataset is re-weighted according to whether it was classified correctly or incorrectly. A decision stump is used as the weak classifier for the AdaBoost models; its primary purpose is to boost the AdaBoost M1 nominal classifier, which can only tackle nominal class problems. The final prediction is obtained by combining the weak models through a weighted majority vote (classification) or a weighted sum (regression), as in (8):

H(x) = sign(sum_t alpha_t * h_t(x))   (8)

where h_t is the t-th weak learner and alpha_t its weight.
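A boosted ensemble of decision stumps can be sketched with scikit-learn's AdaBoostClassifier, whose default weak learner is a depth-1 tree, i.e. a decision stump. The one-dimensional data below are hypothetical, not the ASD records:

```python
from sklearn.ensemble import AdaBoostClassifier

# Toy data separable by a single threshold (hypothetical)
X = [[0], [1], [2], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

# Default weak learner: a depth-1 decision tree (decision stump)
boost = AdaBoostClassifier(n_estimators=50, random_state=0)
boost.fit(X, y)
preds = boost.predict([[1], [11]])
```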

Bagging
Bagging, also known as bootstrap aggregation, is one of the most popular ensemble techniques and is the earliest and simplest such algorithm, developed by Breiman [28]. This method reduces the variance of high-variance algorithms such as decision trees. In this study, bagging is used to predict ASD disease; the bagging prediction is stated in (9):

ŷ(x) = argmax_c sum_b I(h_b(x) = c)   (9)

where h_b is the b-th base classifier. The fast decision tree learner algorithm is used as the default base classifier to enhance the classification accuracy. The algorithm generates a decision tree and prunes it with reduced-error pruning and backfitting; missing values are handled by splitting the corresponding instances into fractions. The final decision is obtained by combining all base classifiers and taking the class with the maximum votes.
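The bootstrap-and-vote scheme can be sketched with scikit-learn's BaggingClassifier, whose default base learner is a decision tree (a stand-in for WEKA's fast decision tree learner, REPTree, which scikit-learn does not provide). The data are hypothetical:

```python
from sklearn.ensemble import BaggingClassifier

# Toy data (hypothetical); each base tree is fit on a bootstrap resample
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]

bag = BaggingClassifier(n_estimators=10, random_state=0)  # default base learner: a decision tree
bag.fit(X, y)
preds = bag.predict([[0.5, 0.5], [5.5, 5.5]])
```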

Stacking
Stacking is an ensemble machine learning approach that integrates diverse classification or regression models through a meta-classifier. The predictions of the base-level models, produced on a properly constructed training set, become the features on which the meta-level model is trained; stacking is thus a layered approach. In this study, stacking is employed for ASD disease prediction. The base classifiers implemented in stacking are 0-R, naïve bayes, logistic regression, sequential minimal optimization (SMO), k-NN (k=3), PART, the fast decision tree learner, and the J48 decision tree, and the PART decision list is selected as the meta-classifier.
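The layered idea can be sketched with scikit-learn's StackingClassifier. This is a simplified analogue with a reduced set of base learners and a logistic-regression meta-classifier, not the exact WEKA classifier list above; the data are hypothetical and cv=2 is used only because the toy set is tiny:

```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Two well-separated toy clusters (hypothetical)
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0, 2], [2, 0],
     [5, 5], [5, 6], [6, 5], [6, 6], [5, 7], [7, 5]]
y = [0] * 6 + [1] * 6

base = [
    ("nb", GaussianNB()),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
    ("tree", DecisionTreeClassifier(random_state=0)),
]
# Base-learner predictions become features for the meta-classifier
stack = StackingClassifier(estimators=base, final_estimator=LogisticRegression(), cv=2)
stack.fit(X, y)
preds = stack.predict([[0.5, 0.5], [5.5, 5.5]])
```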

RESULTS AND DISCUSSION
Data pre-processing is the initial stage performed prior to the simulation for all models. In the ASD for adult dataset, a few features contain missing values, and one value in the gender feature contains an irrational number, causing an inconsistent value. Thus, data with missing values were omitted. Several features that do not contribute to autism were also omitted to enhance the classification accuracy of the ASD data, such as ethnicity, country of residence, used app before, age description, and relation. Hence, the number of features used in this study was reduced to 16: age, gender, jaundice, autism, screening score, the 10 questions related to autism behavioral features, and class/ASD. Once feature selection was executed, the numerical features were filtered into nominal features using discretization applied to all attribute indices. The classification process was then implemented using k-NN, linear SVM, naïve bayes, J48, AdaBoost, Bagging, and Stacking.
The confusion matrix was computed for each model to obtain the predicted class. Confusion matrices for the seven machine learning approaches, obtained with 10-fold cross validation, are tabulated in Table 3. The table shows that several of the approaches used in this study produced the highest number of correctly predicted ASD and non-ASD patients. The machine learning approaches that achieved the best accuracy on the ASD class are linear SVM, naïve bayes, J48, Bagging, and Stacking.
The accuracy, sensitivity, and specificity of the classification methods under the different k-fold cross validations are tabulated in Table 4. Discretization was applied for all values of k during pre-processing. The classification accuracy of each approach increased as k was escalated; AdaBoost reported the same result, 98.3%, for 3-fold and 10-fold cross validation. Nevertheless, 10-fold cross validation presented better performance with the lowest error rate compared with the smaller values of k. Several of the proposed machine learning approaches produced the best classification accuracy of 100%. Furthermore, the classification accuracies of the Stacking and k-NN (k=3) methods rose from 99.7% to 100% and from 98.6% to 99.2%, respectively, as k increased.
The classification accuracy of all machine learning approaches with 10-fold cross validation is demonstrated in Figure 2. Stacking, Bagging, J48, and linear SVM produced 100% accuracy with no error, while naïve bayes also produced 100% accuracy with a minimal error rate of 0.0028. Since these approaches already performed well at k = 3 and k = 5, 3-fold cross validation is sufficient for performance testing.

Figure 2. Classification accuracy of machine learning approaches for 10-fold cross validation

Altay and Ulas [12] applied the k-NN method to the ASD for child dataset for comparative analysis, using 70% of the data for training and 30% for testing. Based on Table 5, our proposed k-NN method achieved a higher accuracy of 99.1% on the adult dataset compared with the result reported for the child dataset, owing to the different cross validation approaches and the different numbers of subjects in the datasets. The implementations of SVM reported in the literature have presented a variety of outcomes, including our proposed approach. The comparative results for the SVM method in Table 6 show that our linear SVM produced the highest accuracy among the reported methods. Li et al. [9] collected data from 16 autism spectrum condition (ASC) adults and 16 healthy adults; the study used 40 means and standard deviations over eight situations and three questions, and implemented two types of SVM, RBF and linear. Their result showed that the linear SVM achieved higher accuracy than the RBF SVM; the different procedures and approaches thus influenced the resulting accuracy. Vaishali and Sasikala [10] used the ASD for child dataset with 10-fold cross validation; however, the authors applied a binary firefly algorithm for feature selection to optimize the performance. The differences in subjects and feature selection led to a classification accuracy of 97.8%, lower than our proposed approach. Moreover, comparing the linear SVM [10] and k-NN [9] results, as both studies used a similar dataset, the linear SVM showed better performance using the selected features.

Table 6. Comparison of the classification accuracy using the SVM approach
Author(s)                            Accuracy (%)
B. Li et al. [9]                     86.7
R. Vaishali and R. Sasikala [10]     97.95
M. Duda et al. [11]                  96.5
This study                           100

A comparative analysis of the classification accuracy between the proposed methods and those in the literature is presented in Figure 3. The methods used in this study surpassed almost all of the methods proposed in the literature. Linear SVM, naïve bayes, J48, Bagging, and Stacking matched the 100% accuracy reported in [13]. The approaches thus enhanced the accuracy rate from 3-fold to 10-fold cross validation.

CONCLUSION
ASD is a cognitive disorder that hinders verbal and speech development as well as analytical and social skills. The potential factors for ASD are biological and environmental. A significant challenge for autism is improving the diagnostic forms in current screening tools so as to effectively minimize diagnosis time without affecting the validity or sensitivity of the test. This study adopted the ASD screening data for adults to build classification models that distinguish adult ASD patients from non-patients. Cross validation was implemented with 3, 5, and 10 folds; to compare the classification accuracy with other methods in the literature, only 10-fold cross validation was used. The data pre-processing stage replaced missing values in the dataset and then applied discretization, and a few features with no significant value for the classification process were omitted. In addition to ASD studies, machine learning approaches have shown strong findings in various applications. Machine learning approaches, namely Bagging, Stacking, AdaBoost, linear SVM, naïve bayes, J48, and k-NN, were therefore proposed to classify the data correctly. According to the results, Bagging, Stacking, linear SVM, naïve bayes, and J48 achieved a significant accuracy of 100%. The accuracy results in this study were compared with previous works that used a variety of ASD repositories. Besides accuracy, specificity and sensitivity were also computed to determine the numbers of patients with and without ASD disease. The machine learning methods used in this study can therefore contribute to diagnosing ASD-related cases and to reducing the features of current ASD screening methods without affecting the specificity and accuracy of the test.