Towards a disease prediction system: BioBERT-based medical profile representation

Rima Hatoum, Ali Alkhazraji, Zein Al Abidin Ibrahim, Houssein Dhayni, Ihab Sbeity


Predicting diseases in advance is crucial in healthcare because it allows early intervention and can save lives. Machine learning now plays a pivotal role in healthcare, and many studies aim to predict diseases from prior knowledge. A significant challenge, however, lies in representing medical information for machine learning: patient medical histories are often stored in formats that models cannot read directly, so they must be filtered and converted into numerical data. Natural language processing (NLP) techniques have made this task more manageable. In this paper, we propose three medical information representations, two of which are based on bidirectional encoder representations from transformers for biomedical text mining (BioBERT), a state-of-the-art text representation technique in the biomedical field. We compare these representations to highlight the advantages of BioBERT-based methods in disease prediction. We evaluate the efficiency of our approach on the Medical Information Mart for Intensive Care III (MIMIC-III) database, which contains data from 46,520 patients, focusing on the prediction of coronary artery disease. The results demonstrate the effectiveness of our proposal. In summary, BioBERT, NLP techniques, and the MIMIC-III database are the key components of our work, which significantly enhances disease prediction in healthcare.
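
The idea of turning a patient's free-text history into a single numerical profile can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not specify how the representations are built, so the averaging step, the function names (`toy_embed`, `profile_vector`), and the sample notes are all assumptions made for illustration. In particular, `toy_embed` is a hypothetical stand-in for a BioBERT encoder so that the example runs without downloading a model.

```python
import hashlib
from typing import List

DIM = 8  # toy dimension; BioBERT base models output 768-dimensional vectors


def toy_embed(note: str, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in for a BioBERT encoder.

    Maps a text to a deterministic fixed-length vector with values in [0, 1].
    A real pipeline would return the encoder's embedding of the note instead.
    """
    digest = hashlib.sha256(note.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def profile_vector(notes: List[str], dim: int = DIM) -> List[float]:
    """Average per-note vectors into one patient-level representation.

    Averaging is an assumption of this sketch, chosen because it yields a
    fixed-length vector regardless of how many notes a patient has.
    """
    if not notes:
        return [0.0] * dim  # no notes: fall back to a zero vector
    vectors = [toy_embed(note, dim) for note in notes]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Illustrative notes, not taken from MIMIC-III.
patient_notes = ["chest pain on exertion", "history of hypertension"]
vec = profile_vector(patient_notes)
```

A fixed-length vector like `vec` is the kind of input that a downstream classifier or clustering step would consume. Swapping `toy_embed` for a real encoder would leave the rest of the sketch unchanged.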


Clustering; Coronary artery disease; Disease prediction; Healthcare


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938 
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
