A systematic literature review of machine learning methods in predicting court decisions

Nur Aqilah Khadijah Rosili, Rohayanti Hassan, Noor Hidayah Zakaria, Shahreen Kasim, Farid Zamani Che Rose, Tole Sutikno
School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, Johor, Malaysia
Faculty of Computer Science and Information System, Universiti Tun Hussein Onn, Johor, Malaysia
School of Mathematical Sciences, Universiti Sains Malaysia, Penang, Malaysia
Department of Electrical Engineering, Universitas Ahmad Dahlan, Yogyakarta, Indonesia


INTRODUCTION
The globalised world today demands speedy and efficient handling of every action [1]-[3]. Fast-moving actions are essential to ensure that services keep pace with the rapid development of technology and information, including in the legal system [4]-[20]. Judges and lawyers generally handle legal cases, but technological assistance is essential given the massive number of cases handled daily. 'Delay in justice' may lead to various consequences, such as witness hostility, unfitness of the plaintiff or accused and other adverse impacts [21].
Legal professionals currently focus on artificial intelligence [22]. Predicting judicial decisions from historical datasets is an established and widely practised approach in legal systems worldwide. Machine learning is an emerging scientific study of algorithms and statistical models, and a part of artificial intelligence, that enables systems to learn automatically and improve from experience with data [23]-[30].
Advancing the legal system through machine learning algorithms is crucial in reducing the workload of legal professionals and saving the time needed to settle cases pending during the Covid-19 pandemic [21], [31]-[33]. Therefore, this study aimed to investigate the existing machine learning methods developed to predict judicial decisions. The types of cases that used this approach were identified, and the methods' performance was examined to assess their effectiveness.

METHOD

The review protocol-ROSES
The current research was guided by the ROSES review protocol. The ROSES protocol was developed for systematic reviews and systematic maps in the field of environmental management [34]-[45]. Additionally, the protocol encourages researchers to ensure that they provide the correct information in explicit detail. The researchers began the systematic literature review (SLR) by formulating research questions according to the review protocol [46]-[48]. Subsequently, the researchers described the systematic searching strategy, which consists of three processes, namely identification, screening and eligibility. The researchers then performed a quality appraisal of the selected articles. Lastly, the authors elaborated on the outcomes generated from the selected primary articles.

Formulation of research questions
The research questions for this study were formulated according to the elements of Population or Problem (P), Interest (I) and Context (Co), or PICo. The PICo is a tool that helps researchers to construct research questions for a review. In this research, the PICo elements are: i) Population: Machine Learning, ii) Interest: Prediction, and iii) Context: Judicial Decision. The formulated research questions were: 1) What types of judicial decisions have been predicted using the machine learning method? 2) What are the machine learning methods used to predict judicial decisions? 3) How was the performance of the machine learning method used to predict judicial decisions?

Systematic searching strategies
The searching process in an SLR comprises three main steps: i) identification, ii) screening, and iii) eligibility [7]. The whole process is summarised in the flow diagram depicted in Figure 1 and explained in the following sections.

Identification
The purpose of the identification process is to maximise the number of keywords to be searched in databases. The keywords are developed based on the research questions formulated. The variation of keywords relies on an online thesaurus to identify synonyms and related terms, keywords used in previous studies and suggested by databases and experts. Nevertheless, the main keywords used in this study are prediction, judicial decision and machine learning. This study refers to two major indexed databases, namely Scopus and Web of Science. These databases were chosen due to several advantages.
First, the databases control the quality of the articles they index and cover articles from various multidisciplinary fields. Second, the databases provide comprehensive and advanced search functions. The researchers constructed a full search string using the Boolean operators "AND" and "OR", phrase searching, truncation and wildcards provided in both databases, as shown in Table 1. Furthermore, the identification process also included manual searching to identify relevant articles on predicting judicial decisions using machine learning. This process retrieved 94 articles from Scopus and 32 articles from Web of Science.

Screening
The screening process was undertaken for all the articles selected in the identification process. The purpose of the screening process is to include and exclude articles based on predetermined criteria. The initial screening restricted the timeline to a specific interval, as recommended by Okoli [49]. The search was limited to articles published from 2000 to 2021. However, the search started in March 2021, before the year had ended. Thus, the findings were limited to articles published up to March 2021. The second inclusion criterion was the language of the published articles. All non-English articles were excluded due to possible translation difficulties. The inclusion and exclusion criteria are listed in Table 2.

Table 1. The search strings
Scopus: TITLE-ABS-KEY(("predict*" OR "prediction*" OR "predicting*" OR "forecast*") AND ("court decision*" OR "legal decision*" OR "law decision*" OR "judicial case*") AND ("machine learning*" OR "artificial intelligence*" OR "AI*" OR "supervise* machine learning*"))
Web of Science: (TS = (("prediction*" OR "predict*" OR "predicting") AND ("court decision*" OR "judicial decision*" OR "legal decision*") AND ("machine learning" OR "AI")))
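The search strings in Table 1 follow a regular pattern: synonyms within a keyword group are joined with OR, and the groups are joined with AND. The following is a minimal sketch of that pattern, not the tool the authors used. The function name `build_query` is an assumption introduced for illustration; the keyword groups are copied from the Web of Science string.

```python
def build_query(groups):
    """Join synonym groups: OR within a group, AND between groups."""
    clauses = []
    for terms in groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


# Keyword groups copied from the Web of Science search string in Table 1
keyword_groups = [
    ["prediction*", "predict*", "predicting"],
    ["court decision*", "judicial decision*", "legal decision*"],
    ["machine learning", "AI"],
]

# TS is the Web of Science topic field tag used in the original string
wos_query = f"(TS = ({build_query(keyword_groups)}))"
print(wos_query)
```

Keeping the synonym groups as data makes it easier to reuse the same keywords across databases while changing only the field wrapper, for example TITLE-ABS-KEY for Scopus.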

Eligibility
The final process in the systematic searching procedure is eligibility. This process was undertaken manually by thoroughly reading the titles and abstracts of all the articles. The eligibility step ensures that all the selected articles comply with the predetermined criteria. After manual review, the eligibility process retained 20 articles retrieved from Scopus and 14 articles from Web of Science.

Quality appraisal
The purpose of the quality assessment (QA) is to judge the overall quality of the chosen studies [22]. Thus, quality criteria were used to evaluate the chosen studies and determine the strength of their findings. The 26 selected studies were examined through five QA questions to determine the researchers' confidence in the studies' credibility. Two experts were invited to carry out the QA and judge the quality of the articles' content. The reviewers ranked the articles into three levels: low, moderate, and high, as suggested by [51]. The articles ranked as moderate and high were eligible for review in the following process. The researchers adapted the scoring strategy employed by [52] to assess the articles' quality. The scoring of the quality evaluation was structured as: i) 1 point represents 'Yes', ii) 0.5 point represents 'Partly', and iii) 0 point represents 'No'. The total score ranked the articles into three categories: i) 0 to 2 points were considered low, ii) 2.5 to 3 points were considered moderate, and iii) 3.5 to 5 points were considered high. Finally, only 22 articles remained eligible for review after the QA scoring was undertaken.
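The scoring strategy described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' procedure: the function names `qa_score`, `qa_level` and `is_eligible` are hypothetical, and the example answers are invented rather than taken from Table 4.

```python
# Points per answer, as described in the scoring strategy
POINTS = {"Yes": 1.0, "Partly": 0.5, "No": 0.0}


def qa_score(answers):
    """Sum the points for the five QA answers of one article."""
    if len(answers) != 5:
        raise ValueError("expected answers to exactly five QA questions")
    return sum(POINTS[answer] for answer in answers)


def qa_level(score):
    """Map a total score (0 to 5, in steps of 0.5) to low, moderate or high."""
    if score <= 2:
        return "low"
    if score <= 3:
        return "moderate"
    return "high"


def is_eligible(answers):
    """Articles ranked moderate or high proceed to the review."""
    return qa_level(qa_score(answers)) in ("moderate", "high")


# Hypothetical article: these answers are illustrative only
example = ["Yes", "Yes", "Partly", "No", "Partly"]
print(qa_score(example), qa_level(qa_score(example)), is_eligible(example))
```

Because scores move in steps of 0.5, the three bands (0 to 2, 2.5 to 3, 3.5 to 5) leave no gaps, so two threshold comparisons are enough.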

RESULTS AND DISCUSSION
The selected primary studies, a visualisation of their publication years and an outline of the QA findings are summarised in the following sections.

Selected primary studies
Twenty-two studies were chosen through the SLR to identify the types of legal judgement cases in which machine learning methods are employed to predict outcomes. Subsequently, the machine learning methods used are listed, and the performance of each method is discussed. Table 3 summarises the selected studies and lists each study's identity (ID), title, authors and publication year.

Publication years
The search covered studies published between 2000 and 2021, although the earliest selected study on this topic was published in 2005. Figure 2 displays the number of studies published within the selected timeline. The graph is not plotted for the year 2021, as research for that year was still ongoing. The most recent study was published in January 2021, while four articles were published in 2020. Five articles were published in 2019, two in 2018, three in 2017 and two in 2016. One article each was published in 2012, 2010 and 2006, whereas two articles were published in 2005. Based on these results, most of the studies were published in the last five years. This suggests growing interest in machine learning as one of the approaches for improving the legal system by predicting outcomes.
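The yearly counts reported above can be tallied to check that they account for all 22 selected studies. The sketch below uses only the numbers stated in the text; reading "the last five years" as 2017 to 2021 inclusive is an assumption.

```python
# Number of selected studies per publication year, as reported in the text
studies_per_year = {
    2005: 2, 2006: 1, 2010: 1, 2012: 1,
    2016: 2, 2017: 3, 2018: 2, 2019: 5, 2020: 4, 2021: 1,
}

total = sum(studies_per_year.values())

# Assumption: "the last five years" is read here as 2017 to 2021 inclusive
recent = sum(count for year, count in studies_per_year.items() if year >= 2017)

print(total, recent, round(recent / total, 2))
```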

QA result
The chosen studies were assessed based on the QA questions explained in Section 2.4, and the analysis is presented in Table 4. The table shows that 17 studies received high total scores of 3.5 to 5, whereas five studies obtained a moderate score of 3. The four studies that obtained low scores were excluded from the review.

Types of judicial decision
The research questions are discussed in this section. The first research question addressed is: (RQ1) What types of judicial decisions have been predicted using the machine learning method? In the legal system, a judgement consists of various subtasks that have to be considered. The legal system is difficult for laypeople to understand, as the legal process includes interacting with and hiring a lawyer, the proceedings, the consequences of legal decisions and the implications of the words in the case files [53]. This study investigated how machine learning can be used in court proceedings to predict judicial decisions. The prediction can be of various types, such as predicting the outcome of a legal judgement or predicting the charges, which requires multilabel text classification. The multiple subtasks in a legal judgement typically comprise comprehensive and complex sub-clauses, such as charges, penalty terms, and fines [52]. Nevertheless, most research experimented with a binary task that classifies only two possible outcomes. Besides predicting the outcome of a judicial decision, in several countries that use the civil law system, such as Germany, France and China, the prediction of relevant law articles is deemed a fundamental subtask that guides and supports the prediction [52].
In this SLR, seven research papers were found to have discussed predicting the outcome of construction litigation. Arditi and Pulket [54] mentioned that construction litigation is common in numerous construction projects, especially those involving large contracts. Miscommunication, insufficient specifications and plans, rigid contracts, changes in site conditions, non-payment, catch-up profits, limited workforce, insufficient tools and equipment, ineffective supervision, notice requirements, constructive changes not acknowledged by the owner, delays, and acceleration measures are among the factors that provoke claims and cause disputes. Therefore, Arditi and Pulket [54] proposed a tool to predict the outcome of litigation in order to minimise construction disputes caused by disagreements that are too complicated to settle without legal action [54], [55].
Legal action entails a higher settlement cost because the litigation process is costly and involves complex issues. Additionally, the disagreement between client and contractor may damage the reputation of both sides [54]. In addition, legal action is time-consuming for complex construction disputes and may take two to six years before trial, depending on the jurisdiction [56]. Therefore, researchers have recommended several machine learning methods to predict accurately the outcome of a dispute resolution in court. The methods can efficiently decrease the number of disputes that incur the higher costs of the litigation process [51].
According to the current study's findings, nine research papers predicted the outcome of crime-related cases. Nevertheless, crime-related cases can be divided into several categories. Aletras et al. [57] presented the first systematic study that predicted the outcome of cases in the European Court of Human Rights based on textual analysis. The authors classified the prediction outputs into 'violation' and 'non-violation' based on text extracted from previous cases. Further studies extended this work by increasing the number of articles and using different variables on the same dataset [58]. This proposal can benefit lawyers and judges as a supporting tool to identify cases and extract text that guides decision-making [57].
Luo et al. [59] asserted that the analysis of textual facts is crucial for legal assistant systems, in which laypeople unfamiliar with legal terms can find similar cases or possible penalties by describing a case in their own words and can understand the legal basis of the cases found. Furthermore, Luo et al. [59] proposed an attention-based neural network method to predict charges and extract relevant articles in a unified framework. The findings demonstrated that providing related articles can enhance the charge prediction results and that the method can effectively predict charges for cases with diverse expression styles.
Zhong et al. [60] proposed a different approach to modelling the judgement prediction framework that utilises multiple subtasks. The authors claimed that previous studies only designed approaches for a particular set of subtasks and, although developed to predict law articles and charges simultaneously, are difficult to scale to other subtasks. Additionally, murder-related cases have been the focus of such analysis. The extraction of legal judgements can be used to identify the details of case-specific legal factors, but it is neither easy nor quick. Therefore, the essential factors that affect the prediction for murder-related cases are evaluated by preparing a dataset in which the factors serve as descriptors for the prediction outcomes. The outcome prediction is viewed as a binary classification with the classes 'acquittal' and 'conviction' of the accused person.
The current study's findings also include studies that focus specifically on family law cases. Among the highlighted cases are disengagement, divorce, parental rights and dowry. Ben-David [61] conducted a crucial study of court decisions 'in favour of' or 'against' the termination of parental rights, decisions that must find a balance between the child's best interest, the parent's rights and the privacy of the family unit [62]. Li et al. [63] proposed a prediction model for divorce. The research objectives were to predict the decisions for divorce cases with diverse expression styles and to make the results easy for the public to understand [60]. In addition, García-Jiménez et al. [34] studied disengagement prediction, examining what victims need from legal proceedings before building the prediction model. This study developed a binary logistic regression model that predicts disengagement with two variables that differ from previous approaches. The first variable is the contact with the abuser, whereas the second variable is the interaction between the contact and the thought of reuniting with the abuser. The paper aimed to predict disengagement with the aim of protecting women from being oppressed by court decisions. The authors believed that other factors, such as not being granted a protection order, not feeling supported by lawyers or receiving unconvincing responses from professionals during proceedings, should not influence court decisions in disengagement cases [64].
Beneficiaries in India spend a long time waiting for court decisions due to the scarcity of skilled workforce and infrastructure [21]. Prolonged legal proceedings may lead to various consequences. Sil et al. proposed a model that assists legal professionals in analysing cases and predicting an outcome of 'guilty' or 'not guilty' depending on the parameters of dowry-related death cases [21]. An approach that predicts worker type has also been proposed for court decisions concerning employment rights and protection [65]. Machine learning has thus been explored for predicting court decisions across various types of cases, which leads to the conclusion that there is still room for other case types to adopt machine learning as a supporting tool in decision-making. Future studies can examine more extensively the cases in which machine learning could serve as a prediction model to reduce decision-making time.
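The structure of the binary logistic regression model for disengagement described above can be written out as a sketch. The notation is an assumption introduced here for illustration and is not taken from [34]: $p$ is the probability of disengagement, $C$ indicates contact with the abuser, $T$ indicates the thought of reuniting with the abuser, and the $\beta$ terms are coefficients that would be estimated from data.

```latex
\ln\frac{p}{1-p} = \beta_0 + \beta_1 C + \beta_2 (C \times T)
```

The second predictor is the interaction term $C \times T$, so the thought of reuniting enters the model only in combination with contact, which matches the two variables named in the text.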

Methods of machine learning
In this section, the following research question is discussed: (RQ2) What are the machine learning methods used to predict judicial decisions? Legal professionals are currently focused on artificial intelligence [66]. Predicting judicial decisions based on historical datasets in the legal domain is not new and is widely used in legal systems globally. Machine learning is an emerging scientific study of algorithms and statistical models that are part of artificial intelligence, enabling a system to learn automatically and improve from experience with data. The core research aspects of applying machine learning in jurisprudence are the extraction of information from, and the analysis of, existing legal documents. In previous practice, lawyers and judges had to do all the work manually. However, machine learning has made this work more intelligent by interpreting text documents and extracting their content [53].
The researchers examined the machine learning methods proposed in the studies of this SLR by determining the types and names of the classifiers used in predicting judicial decisions. The majority of studies attempted to extract effective features from text content or case annotations (dates, terms, locations, and types) [1]. Nevertheless, Zhong et al. [60] asserted that the conventional methods could only employ shallow textual features and manually designed factors. These features and factors require enormous human effort and regularly suffer from generalisation problems when applied to other scenarios. The success of neural networks on natural language processing (NLP) tasks inspired researchers to start handling legal judgement prediction by integrating neural models with legal knowledge [59]. Luo et al. [59] laid out an attention-based neural network that jointly models charge prediction and relevant article extraction. Nonetheless, these models are designed for specific subtasks. Therefore, extending them to other subtasks of legal judgement prediction with complex dependencies is non-trivial.
The researchers of the current study classified the methods into two types: single classifier and combined classifier. Subsequently, the researchers identified the name of the classifier(s) used as the prediction model. A single classifier refers to an individual machine learning model that is used in the prediction. In contrast, a combined classifier refers to an ensemble model that uses more than one classifier in making predictions. As shown in Table 5, the most common classifier is the support vector machine (SVM). According to the current SLR, six papers proposed SVM as the prediction model in various cases.
Nevertheless, SVM cannot be concluded to be the preferred prediction method, as other models also performed well in predicting judicial decisions, depending on the type of case. The ensemble method provides an enhanced approach compared with a single classifier. Thus, the researchers concluded that this research area is still new and open for exploration. Research has remained active over the last five years, as observed in Figure 2. Therefore, there is a great opportunity for further research on implementing machine learning methods in predicting court decisions.
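The distinction between a single classifier and a combined classifier can be illustrated with a toy majority-vote ensemble. This is a sketch of one common ensemble technique (hard voting), not a method confirmed by any reviewed study. The three rule-based classifiers, their feature names and their thresholds are all hypothetical and stand in for trained models.

```python
from collections import Counter


# Three toy single classifiers. The features and thresholds are hypothetical
# and only illustrate the idea; they are not taken from any reviewed study.
def rule_by_contract(case):
    return "claimant" if case["written_contract"] else "defendant"


def rule_by_notice(case):
    return "claimant" if case["notice_given"] else "defendant"


def rule_by_documents(case):
    return "claimant" if case["documents"] >= 3 else "defendant"


def majority_vote(classifiers, case):
    """Combined classifier: return the label predicted by most members."""
    votes = Counter(classifier(case) for classifier in classifiers)
    return votes.most_common(1)[0][0]


case = {"written_contract": True, "notice_given": False, "documents": 4}

single = rule_by_notice(case)
combined = majority_vote(
    [rule_by_contract, rule_by_notice, rule_by_documents], case
)
print(single, combined)
```

With an odd number of members and two possible labels, a tie cannot occur, which keeps the voting rule simple.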

Performance of the machine learning methods
The following research question is addressed in this section: (RQ3) How was the performance of machine learning methods used to predict judicial decisions? The performance of a proposed prediction model should be assessed in order to understand how well the approach works. The efficiency of any machine learning model can be measured through k-fold cross-validation, accuracy, sensitivity, specificity, recall, precision, and F-measure [63]. Based on the observations from the 22 reviewed papers, most researchers used accuracy, precision, recall and F-measure to evaluate the performance of their models. F-measure, precision and recall are frequently used as performance measures in information extraction, since machine learning performance assessments involve trade-offs between true positive and true negative rates [63]. Table 6 summarises the definitions of accuracy, precision, recall (or sensitivity) and F1 score, adapted from [21]. Four terms are used in computing these metrics, namely true positive (tp), true negative (tn), false positive (fp) and false negative (fn) [21]. Earlier research (S1, S2, S3 and S4) used different approaches to evaluate the performance of the methods used. The average prediction rate reported was within the range of 80% to 91%. Nevertheless, the work was extended by adjusting the number and format of attributes and the number of cases used in order to improve prediction rates [67].

Table 6. Performance metrics (adapted from [21])
Accuracy: the ratio of correctly predicted results to the total actual results, $\frac{tp + tn}{tp + tn + fp + fn}$
Precision: the ratio of correctly predicted positive results to the total predicted positive results, $\frac{tp}{tp + fp}$
Recall or Sensitivity: the ratio of correctly predicted positive results to the total actual positive results, $\frac{tp}{tp + fn}$
F1 Score: the weighted average (harmonic mean) of precision and recall, useful if the class distribution is uneven, $\frac{2(\text{recall} \times \text{precision})}{\text{recall} + \text{precision}}$

The most intriguing finding of the SLR is that 16 out of the 22 reviewed papers obtained more than 80% accuracy, precision or prediction rate in their evaluation. Only four papers (S7, S10, S15 and S22) obtained accuracy or precision in the range of 50% to 70%. Two papers (S6 and S13) did not discuss the performance of their prediction models in detail. The summary of the performance results of the 22 reviewed papers is presented in Table 7. These results indicate that a prediction model could be a reliable supporting tool in determining court decisions, as most of the models achieved more than 70% accuracy.
Int J Artif Intell ISSN: 2252-8938
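The metrics defined in Table 6 can be computed directly from the four counts tp, tn, fp and fn. The following is a minimal sketch, assuming all denominators are non-zero. The counts in the example are hypothetical and are not results from any reviewed study.

```python
def accuracy(tp, tn, fp, fn):
    """Correct predictions over all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Correct positive predictions over all positive predictions."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Correct positive predictions over all actual positives (sensitivity)."""
    return tp / (tp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, as in Table 6."""
    p = precision(tp, fp)
    r = recall(tp, fn)
    return 2 * (r * p) / (r + p)


# Hypothetical confusion-matrix counts, for illustration only
tp, tn, fp, fn = 40, 45, 5, 10

print(round(accuracy(tp, tn, fp, fn), 3))
print(round(precision(tp, fp), 3))
print(round(recall(tp, fn), 3))
print(round(f1_score(tp, fp, fn), 3))
```

Reporting precision and recall alongside accuracy matters when the two outcome classes are imbalanced, since accuracy alone can look high even if one class is rarely predicted correctly.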

CONCLUSION
This study has presented an investigation of predicting court decisions using machine learning methods. The importance of predicting judicial decisions can be seen across various types of cases and in the research outcomes obtained. This approach can improve the legal system by making it more systematic and reliable. The methods and features derived from the findings could fill existing gaps in the study area for future scholarly work. This systematic review is expected to contribute to the body of knowledge by providing an overview of the existing models used in predicting judicial decisions, the performance of the prediction models and a discussion of the types of cases in the legal system that have adopted this approach. The review also offers several recommendations for future studies, including new types of cases for predicting judicial decisions and new machine learning methods that use a combined classifier to improve the performance of the prediction tools.