IAES International Journal of Artificial Intelligence (IJ-AI)

Rico Kurniawan, Budi Utomo, Kemal N. Siregar, Kalamullah Ramli, Besral, Ruddy J. Suhatril, Okky Assetya Pratiwi
Department of Biostatistics and Population Studies, Faculty of Public Health, Universitas Indonesia, Depok, Indonesia
Department of Electrical Engineering, Faculty of Engineering, University of Indonesia, Depok, Indonesia
Faculty of Computer Science and Information Technology, Gunadarma University, Depok, Indonesia
Department of Environmental Health, Faculty of Public Health, Universitas Indonesia, Depok, Indonesia


INTRODUCTION
Hypertension is a serious disorder that can lead to a variety of life-threatening conditions, including cardiovascular disease [1], [2]. It is thought to contribute to 13% to 19% of all deaths worldwide each year [1], [3]-[5], and about 1.56 billion people are projected to have hypertension by 2025 [6]. Around 22% of the world's population aged 18 years or older has elevated blood pressure. In Indonesia, approximately 34% of the population aged 15 years or older has high blood pressure, which is higher than the world average, but only 8.8% of people with elevated blood pressure are aware of their condition [7]. Uncontrolled blood pressure increases the risk of coronary heart disease, heart failure, stroke, myocardial infarction, atrial fibrillation, peripheral artery disease, chronic kidney disease, and cognitive impairment, and hypertension is known to be a significant contributor to deaths caused by these diseases [8]. Hypertension is the leading modifiable risk factor for cardiovascular disease. As a medical condition, however, high blood pressure is influenced by several factors, both demographic and lifestyle [9]. Risk factors such as age, sex, family history, smoking, alcohol consumption, body mass index, waist circumference, hip circumference, and waist-hip ratio are among the most practical and cost-effective measures for predicting cardiovascular risk as well as hypertension [9]-[11]. Prevention of hypertension and its complications has long been a subject of public health research. Population-based approaches that reduce risk factor levels through lifestyle modification are receiving increasing attention in the prevention, detection, evaluation, and treatment of high blood pressure. Early identification of risks and classification of blood pressure conditions are essential for controlling hypertension [12].
Early identification of blood pressure levels makes it possible to classify a person's blood pressure condition as normal, prehypertension, or hypertension. This classification demonstrates the progressive nature of hypertension and highlights the possibility of early detection of prehypertension and advanced hypertension [13].
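As an illustration of such a three-level classification, the sketch below assigns a category from one pair of readings. The paper does not list the cut-offs behind these categories; the values used here are an assumption taken from the JNC 7 office-measurement categories (normal below 120/80 mmHg, prehypertension at 120-139 systolic or 80-89 diastolic, hypertension at 140 systolic or 90 diastolic and above), and the function name `classify_bp` is hypothetical.

```python
def classify_bp(sbp: float, dbp: float) -> str:
    """Three-level blood pressure category (assumed JNC 7 office cut-offs, mmHg).

    The higher of the two categories implied by SBP and DBP decides the result.
    """
    if sbp >= 140 or dbp >= 90:
        return "hypertension"
    if sbp >= 120 or dbp >= 80:
        return "prehypertension"
    return "normal"


for reading in [(118, 76), (128, 78), (136, 92)]:
    print(reading, classify_bp(*reading))
```

These are office thresholds; home readings would instead be compared against the lower HBPM threshold used in this study.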
Predicting hypertension risk is expected to improve decision making [14]. The use of predictive models for hypertension, either in routine care or at the community level, has many potential benefits, including adjusting medication and the intensity of prevention strategies in high-risk populations. Hypertension risk prediction is used to identify individuals at high risk of hypertension so that preventive strategies can be applied to delay or prevent its onset and to control the related health complications. Many hypertension risk prediction models have been constructed, but they could not be generalized to all populations [15]. Most predictive models have been developed in developed countries, and only a few are based on populations in developing or less developed countries [9]. Differences between populations in risk factors and characteristics affect the results of risk prediction [16], [17]. Hence, it is necessary to construct a risk prediction model specific to the Indonesian population. Moreover, many existing prediction models still use a traditional statistical approach.
Machine learning (ML) approaches to predicting and classifying health outcomes are increasingly used in the health sector. As a branch of artificial intelligence (AI), ML is gaining immense attention in the management of chronic disease and is considered a promising alternative to traditional methods for clinical prediction [11], [18], [19]. Identifying and focusing on people at high risk is one effective preventive strategy [12], so prediction models that can assist in diagnosing hypertension are needed, and developing such a model with an ML approach is a natural step. Combining several methods to detect hypertension may be of great use in both clinical and community settings, particularly for the Indonesian population [20]. This study aims to develop and validate a hypertension risk prediction model using an ML algorithm for the Indonesian population.

METHOD
This is a cross-sectional study using the fifth wave of the Indonesia Family Life Survey (IFLS5), conducted in 2014/2015. The IFLS is a longitudinal panel survey that began in 1993. It collects extensive information on socio-demographic factors and health at the individual, household, and community levels, and it measures vital health indicators, including blood pressure [21]. The data used in this study were from individuals aged 15 years and over whose blood pressure was measured.

The features
The predictors, or features, used in this study include socio-demographic factors (age, sex, employment status, and education) [9], [14], body mass index [22], lifestyle factors (tobacco use and physical activity), history of chronic diseases (diabetes and/or high cholesterol), blood pressure, and acute morbidity symptoms (headache). The average of three measurements of systolic blood pressure (SBP) and diastolic blood pressure (DBP) was used to indicate blood pressure condition. Blood pressure was recorded by trained interviewers at the respondent's home using an Omron meter, with the respondent seated [23]. Because this is an out-of-office, or home blood pressure measurement (HBPM), we classified respondents as having elevated blood pressure if SBP ≥ 135 mmHg and/or DBP ≥ 85 mmHg [24], [25]. Physical activity in IFLS5 was assessed using a modified version of the International Physical Activity Questionnaire [23]; for further analysis, we categorized it into two groups: insufficient and sufficient. The outcome variable was hypertension, defined as a diagnosis of hypertension (SBP ≥ 140 mmHg and/or DBP ≥ 90 mmHg) made by a health worker in a person aged 15 years or older, or the routine use of antihypertensive medication.
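The HBPM rule described above can be written as a small function. This is a sketch under stated assumptions: each respondent has exactly three (SBP, DBP) readings in mmHg, and the names `mean_bp` and `elevated_by_hbpm` are hypothetical, not part of IFLS5 or Orange.

```python
from statistics import mean

# HBPM thresholds used in this study (mmHg).
HBPM_SBP_THRESHOLD = 135
HBPM_DBP_THRESHOLD = 85


def mean_bp(readings):
    """Average three (systolic, diastolic) readings taken in a seated position."""
    if len(readings) != 3:
        raise ValueError("expected exactly three readings")
    sbp = mean([r[0] for r in readings])
    dbp = mean([r[1] for r in readings])
    return sbp, dbp


def elevated_by_hbpm(readings):
    """True if the mean SBP >= 135 and/or the mean DBP >= 85 mmHg."""
    sbp, dbp = mean_bp(readings)
    return sbp >= HBPM_SBP_THRESHOLD or dbp >= HBPM_DBP_THRESHOLD


# Hypothetical respondent: three (SBP, DBP) readings in mmHg.
respondent = [(138, 82), (134, 80), (136, 84)]
print(mean_bp(respondent))           # mean SBP 136, mean DBP 82
print(elevated_by_hbpm(respondent))  # SBP criterion met, so True
```

Because the rule uses "and/or", either mean reading alone is enough to flag a respondent.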

Model development and evaluation
Before developing the hypertension prediction model, we conducted a univariate correlation analysis to explore the characteristics of the data and to identify correlations between the predictors and the target variable. Using the Orange data mining application, we compared several ML models, namely decision tree, random forest, gradient boosting, and logistic regression. We divided the data randomly into training (75%) and testing (25%) sets using the Data Sampler tool in Orange.
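In this study the split was made with Orange's Data Sampler tool. As a hedged illustration of what a random 75%/25% split does, the standard-library sketch below shuffles record identifiers with a fixed seed and cuts the list. The function name and the seed value are assumptions, and this is not Orange's implementation.

```python
import random


def train_test_split(records, train_fraction=0.75, seed=42):
    """Randomly split records into training and testing lists."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(records)            # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Sample size reported in this study after pre-processing.
ids = list(range(30_320))
train, test = train_test_split(ids)
print(len(train), len(test))  # 22740 7580
```

Fixing the seed makes the split reproducible, which matters when the same training and testing sets are reused for several algorithms.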
To evaluate the models, we used ten-fold cross-validation. This approach randomly splits the dataset into ten equal groups, each with a comparable proportion of people with hypertension. Orange uses each subset in turn as the test dataset while the remaining data are used to train the models. The models were compared using the area under the curve (AUC), classification accuracy (CA), precision (the proportion of true positives among observations classified as positive), recall or sensitivity (the proportion of actual positive observations that are correctly predicted), and the F1 score (the harmonic mean of precision and recall).
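The stratified ten-fold scheme described above can be sketched as follows. This is a minimal illustration, not Orange's internal procedure: it assumes binary labels (1 = hypertension), deals the shuffled indices of each class round-robin across the folds so that every fold receives a similar share of hypertensive people, and uses hypothetical function names.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each index to one of k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[(j + offset) % k].append(i)
        offset += len(indices)  # continue dealing where the previous class stopped
    return folds


def cross_validation_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices); each fold serves once as the test set."""
    folds = stratified_folds(labels, k, seed)
    for f in range(k):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, folds[f]


# Hypothetical labels with 12% prevalence, similar to the study sample.
labels = [1] * 120 + [0] * 880
folds = stratified_folds(labels)
print([sum(labels[i] for i in fold) for fold in folds])  # 12 positives in every fold
```

Each model is trained on nine folds and scored on the tenth, and the ten scores are then averaged.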

Subject characteristics
Of the 48,139 individuals recorded in IFLS5, 32,804 aged 15 years or over were eligible for this study. After data pre-processing, data from 30,320 individuals were suitable for further analysis and for developing the prediction model. In all, 3,637 individuals (12%) were diagnosed with hypertension. Based on the blood pressure measurements taken during data collection, 9,992 respondents (32.96%) were identified as having high blood pressure (SBP ≥ 135 mmHg and/or DBP ≥ 85 mmHg) according to the HBPM guideline, of whom 26% had been clinically diagnosed. In bivariate analyses, all predictors in this study showed a strong association with hypertension except physical activity (p=0.730). Table 1 shows the characteristics of the study participants and their association with hypertension. Hypertension was more prevalent in females (14.4%) than in males (9.3%). There were also differences by social determinants such as education and employment status: those with higher education had less hypertension (9.3%) than those with low education (14.3%). Hypertension was more common in non-tobacco users (13.2%) than in tobacco users (9.9%). Although physical activity did not differ significantly between people with and without hypertension (p=0.735), it is considered one of the main factors affecting blood pressure in general, so we included it as a predictor in model development using ML algorithms.
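The text does not name the test used for the bivariate analyses. A Pearson chi-square test on a 2x2 table is a common choice for a binary predictor against a binary outcome, so the sketch below assumes it; the counts in the example are hypothetical, not taken from Table 1. For one degree of freedom the p-value can be obtained from the complementary error function, which keeps the sketch free of third-party dependencies.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

                   hypertension   no hypertension
        group 1         a                b
        group 2         c                d

    Returns (chi2, p_value) with 1 degree of freedom.
    All row and column totals must be non-zero.
    """
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for r in range(2):
        for col in range(2):
            expected = row_totals[r] * col_totals[col] / n
            chi2 += (observed[r][col] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts (not the study's data): hypertension yes/no in two groups.
chi2, p = chi_square_2x2(a=140, b=860, c=90, d=910)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

A large p-value, as reported for physical activity, means the observed cell counts are close to what independence between predictor and outcome would produce.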

Model prediction performance
Several algorithms can be used to develop models with an ML approach. We applied four algorithms to develop a hypertension prediction model: decision tree, random forest, gradient boosting, and logistic regression. The dataset, previously divided into training and testing data, was analyzed using each of these four algorithms. Table 2 lists the parameter values used to assess the performance of the algorithms. The ML algorithms were shown to have good predictive values. In the prediction results, random forest and decision tree showed better accuracy and precision than the other algorithms. However, evaluation on the testing data showed that logistic regression and gradient boosting had better parameter values. The logistic regression model had the best parameter values overall, with AUC 0.829, accuracy 0.898, recall (sensitivity) 0.896, precision 0.878, and F1 score 0.877.
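To make the reported scores concrete, the sketch below computes CA, precision, recall, F1 and AUC from true labels, predicted labels and predicted scores. It is a plain-Python illustration with hypothetical function names, not Orange's code. It treats hypertension (1) as the positive class; the text does not say whether the reported values refer to the hypertension class alone or are averaged over both classes, and the two can differ. The AUC uses its rank interpretation (the probability that a randomly chosen hypertensive person receives a higher score than a randomly chosen non-hypertensive person), computed by brute force, which is fine for illustration but quadratic in the sample size.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) with `positive` as the hypertension class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy (CA), precision, recall and F1 for the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"CA": accuracy, "precision": precision, "recall": recall, "F1": f1}


def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive case outscores a random negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Reference point from the study's counts: 3,637 of 30,320 were diagnosed.
# Labelling everyone "no hypertension" gives CA of about 0.88 but recall 0.
y_true = [1] * 3637 + [0] * (30320 - 3637)
baseline = classification_metrics(y_true, [0] * len(y_true))
print(round(baseline["CA"], 3), baseline["recall"])
```

Because about 88% of respondents were not diagnosed, CA is most informative when read alongside recall, precision and AUC; a constant score always yields an AUC of 0.5 under this definition.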
The AUC obtained from the logistic regression model was 0.829, indicating that this model discriminated better between the hypertension and non-hypertension classes. The AUC reflects how well a model distinguishes between the classes: the greater the AUC, the better. An AUC of 0.5 corresponds to chance-level discrimination, and values above 0.5 indicate that the classifier separates hypertension from non-hypertension better than chance. The other intuitive performance indicator used was CA, the proportion of correctly predicted observations among all observations. Our model scored 0.898, meaning that it correctly classified 89.8% of observations.
Figure 1 compares the receiver operating characteristic (ROC) curves of the algorithms used in this study. The ROC curve visualizes an algorithm's classification performance: the closer the curve is to the top-left corner, the better the algorithm classifies. The ROC curve for logistic regression in Figure 1(a), with an AUC of 0.829, lies closest to the top-left corner. Gradient boosting and random forest, in Figure 1(b) and Figure 1(c), have lower AUC values than logistic regression, 0.821 and 0.781, respectively. The curve furthest from the top-left corner belongs to the decision tree in Figure 1(d), with an AUC of 0.544. Thus, these ROC graphs show that logistic regression is a better classifier than the other algorithms used in this study.
An ML model, in this case a logistic regression algorithm, is not much different from the multivariate logistic regression analysis performed in other statistical software. The overall percentage correct in the classification table of the multivariate logistic regression test is 89.5%.
Therefore, after controlling for other predictors, home blood pressure measurement (HBPM) has a predictive ability of 89.5% for diagnosing hypertension relative to diagnosis by a physician or health worker as the gold standard.
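The sketch below shows how a fitted logistic model turns predictors into a probability, and what "controlling for other predictors" means for the HBPM term. The intercept and coefficients are invented for illustration and are NOT the estimates from this study; the predictor names and the 0.5 cut-off are likewise assumptions.

```python
import math

# HYPOTHETICAL coefficients for illustration only; these are NOT the
# estimates fitted in this study.
INTERCEPT = -4.0
COEFFICIENTS = {
    "age_per_10_years": 0.5,
    "female": 0.3,
    "diabetes_history": 0.6,
    "elevated_hbpm": 1.8,
}


def predicted_risk(features):
    """P(hypertension) = 1 / (1 + exp(-(b0 + sum_j b_j * x_j))).

    Predictors missing from `features` are treated as 0.
    """
    linear = INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-linear))


def classify(features, threshold=0.5):
    """Label as hypertensive when the predicted risk reaches the cut-off."""
    return predicted_risk(features) >= threshold


def odds(p):
    return p / (1.0 - p)


# Two otherwise identical people who differ only in the HBPM flag.
base = {"age_per_10_years": 4.0, "female": 1, "diabetes_history": 0}
ratio = odds(predicted_risk({**base, "elevated_hbpm": 1})) / odds(
    predicted_risk({**base, "elevated_hbpm": 0})
)
print(ratio, math.exp(1.8))  # equal up to rounding: the adjusted odds ratio
```

Whatever values the other predictors take, the odds ratio for the HBPM flag stays at exp(1.8) in this toy model, which is what an adjusted effect means in logistic regression. The overall percentage correct in a classification table is the share of people for whom a rule like `classify` agrees with the diagnosis; statistical packages commonly use a 0.5 cut-off by default.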
High blood pressure is a major risk factor for morbidity and mortality worldwide, especially from cardiovascular diseases [26]. In general, hypertension risk factors can be grouped into modifiable and non-modifiable risk factors. Modifiable risk factors include diet, smoking behavior [27], [28], alcohol consumption [29], [30], stress level [31], physical activity [32]-[34], and body mass index. Non-modifiable risk factors include age [35], [36], sex, parental history of hypertension, and other genetic factors [37]. Early intervention, through both lifestyle modification and appropriate treatment of elevated blood pressure, is recognized to reduce the risk of hypertension [17]. Therefore, the ability to predict an individual's risk of developing hypertension will be very helpful for health workers and for the wider community. Early identification of blood pressure condition will help health workers plan and deliver lifestyle modification recommendations or therapeutic interventions to prevent or delay the development of hypertension [9], [16], [38].
With its diagnostic accuracy and its prognostic significance for cardiovascular events, home blood pressure monitoring has the potential to enhance hypertension control and is a helpful addition to standard office blood pressure readings [39]. This study indicates that about 89.6% of people with elevated blood pressure based on HBPM have clinical hypertension, with SBP ≥ 140 mmHg and/or DBP ≥ 90 mmHg. This is in line with the results reported by Jacob George, who found that around 15-30% of home blood pressure measurements cannot determine blood pressure classification [25].
Four ML algorithms were used to produce hypertension predictions from non-invasively collected data. The predictors were age, sex, education level, working status, tobacco use, physical activity, body mass index, history of diabetes, history of high cholesterol, and home blood pressure measurement; all except physical activity were significantly associated with hypertension. In general, the algorithms demonstrated good prediction accuracy, with logistic regression outperforming the decision tree, random forest, and gradient boosting algorithms in discrimination ability. Consistent with similar studies [38], [40], the results of the gradient boosting model used in this study were only slightly different from those of the logistic regression model. ML, a field that focuses on the construction and study of systems that can automatically learn from data to generate highly accurate predictive models, is relatively new in public health research [10]. ML predictive models can generate robust diagnostic parameters because they produce correct predictions from observed correlations [41]. ML models can also identify which variable or group of variables is most useful for predicting hypertension [10]. Health research could benefit from using ML techniques to identify the combinations of variables that best predict a particular outcome, which in our case is hypertension.
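The text does not say how the most useful variables were identified. Permutation importance is one common model-agnostic way to do it, and the sketch below swaps it in as an illustration: shuffle one feature's values across rows and measure how much accuracy drops. The toy model, the data and the feature names are hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Share of rows whose prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, feature, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature's values are shuffled across rows.

    Shuffling breaks the link between that feature and the label; a larger
    drop means the model relies more on the feature.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        permuted = [{**r, feature: v} for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / n_repeats


# Hypothetical toy data: the label equals the HBPM flag, and "respondent_id"
# carries no information about the label.
rows = [{"elevated_hbpm": i % 2, "respondent_id": i} for i in range(200)]
labels = [r["elevated_hbpm"] for r in rows]


def toy_model(row):
    """Hypothetical stand-in for a fitted model: predicts from HBPM only."""
    return row["elevated_hbpm"]


for feature in ("elevated_hbpm", "respondent_id"):
    print(feature, round(permutation_importance(toy_model, rows, labels, feature), 2))
```

Shuffling the HBPM flag roughly halves the toy model's accuracy, while shuffling the identifier changes nothing, so the ranking reflects what the model actually relies on rather than any built-in coefficient.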
Hypertension predictive models used in health facilities or in the community have several benefits, including enabling adjustment of prescriptions or therapy and of the intensity of preventive measures in those at high risk of developing hypertension, and improving shared decision making through accurate risk communication to people at high risk. Apart from routine clinical use, hypertension risk scores can also be used to identify high-risk people for inclusion in hypertension prevention efforts and to project the future burden of hypertension at the community level. In each of these applications, estimates of hypertension risk obtained from predictive models must be accurate and valid [9], [14].
Illness prediction, disease categorization, and medical image recognition are just a few of the many ML applications that have been used extensively in medicine [42]. ML approaches to hypertension prediction can produce robust prediction models [10], [12]. In contrast, traditional statistical approaches such as binary logistic regression or linear regression rely on several essential assumptions, such as independence of observations and absence of multicollinearity, whereas many ML algorithms do not require these assumptions.

CONCLUSION
Nowadays, ML is often employed for sophisticated data analysis and optimization in many types of medical problems, and ML models have been widely used for prediction, especially in the health sector. Although much research has been conducted on hypertension, no universal instrument for predicting hypertension has yet been developed, and many of the prediction models that have been developed are still underutilized, both in health care facilities and in the community. Because hypertension is complex and associated with many variables, researchers tend to use a limited set of predictors and overlook the influence of others. The hypertension prediction model developed here estimates the probability of a person's risk of hypertension based on blood pressure measurements taken at home. This model could be used to assist decision making at the clinical or health care facility level as well as at the household or community level. Further development and translation of ML algorithms into decision support system applications is therefore important. The model is easy to use, relies on simple predictors, and does not require invasive interaction with patients. From this study, we estimate that 89.6% of people with elevated blood pressure obtained through home blood pressure measurement will show clinical hypertension.

LIMITATIONS
Our findings were based on a single cross-sectional study to predict hypertension. These predictors have a restricted range of use, and their values may change over time. The study data may not represent the entire population of Indonesia. Longitudinal data are needed to predict the risk of new-onset hypertension and to produce a better prediction model.
ISSN: 2252-8938