Enhancing medical language models with big data technologies

Ayoub Allali, Ibtihal Abouchabaka, Najat Rafalia

Abstract


In this study, we present an end-to-end, big-data–driven framework for continuously enriching and fine-tuning large language models (LLMs) with the latest professional and scientific medical knowledge. Streaming updates from premier sources such as The New England Journal of Medicine (NEJM) are ingested via an Apache Kafka cluster for low-latency delivery and durably archived in a three-node Apache Hadoop Distributed File System (HDFS) cluster. Each new article is preprocessed into high-dimensional embeddings and indexed in a Milvus vector database to enable sub-second semantic retrieval over millions of records. At query or batch time, our retrieval-augmented generation (RAG) module retrieves the top-k most relevant embeddings from Milvus and injects the corresponding passages into prompts for DeepSeek-R1, GPT-4o-mini, and Llama 3, models that are hosted, fine-tuned, and served via Ollama on an NVIDIA GeForce RTX 3050 Ti GPU for efficient inference and continual learning. The enriched outputs are delivered to end users through a Telegram bot written in Python with the Telebot library, which links the RAG-enhanced LLMs to an intuitive chat interface. Our Kafka, HDFS, Milvus, RAG, LLM, and Telegram bot pipeline demonstrably improves the factual accuracy and topical currency of AI-generated medical insights across clinical decision support, patient engagement and education, drug discovery and development, virtual health assistants, and mental health support, laying the groundwork for intelligent, responsive, and data-driven healthcare solutions.
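
The retrieval step described in the abstract (select the top-k stored embeddings closest to a query, then inject the matching passages into an LLM prompt) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes cosine similarity as the ranking metric, replaces Milvus with a plain in-memory list, and uses hand-written three-dimensional vectors in place of real embeddings. The names `cosine_similarity`, `retrieve_top_k`, `build_prompt`, and `TOY_INDEX`, along with the placeholder passages, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query_vec: Sequence[float],
    index: List[Tuple[Sequence[float], str]],
    k: int = 2,
) -> List[Tuple[float, str]]:
    """Return the k (score, passage) pairs most similar to the query vector.

    Assumption: a brute-force scan over an in-memory list stands in for the
    vector-database search described in the abstract.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for vec, text in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question: str, passages: List[str]) -> str:
    """Inject the retrieved passages into a prompt as numbered context entries."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical toy index: placeholder vectors and placeholder passages,
# not real embeddings or real article text.
TOY_INDEX = [
    ([0.9, 0.1, 0.0], "Passage A about hypertension treatment."),
    ([0.1, 0.9, 0.0], "Passage B about diabetes screening."),
    ([0.0, 0.2, 0.9], "Passage C about sleep and mental health."),
]

query_vec = [0.8, 0.2, 0.0]  # placeholder for an embedded user question
hits = retrieve_top_k(query_vec, TOY_INDEX, k=2)
prompt = build_prompt("How is hypertension treated?", [text for _, text in hits])
print(prompt)
```

In a deployment along the lines the abstract describes, the in-memory scan would presumably be replaced by a vector-database query and the resulting prompt would be sent to the served model; those integration details are not given in this abstract and are left out of the sketch.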

Keywords


Big data; DeepSeek-R1; GPT-4o-mini; LLMs; Llama 3; Retrieval-augmented generation; Vector database


DOI: http://doi.org/10.11591/ijai.v15.i1.pp289-299


Copyright (c) 2026 Ayoub Allali, Ibtihal Abouchabaka, Najat Rafalia

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938 
This journal is published by the Institute of Advanced Engineering and Science (IAES).
