Explainable deep learning for scalable record linkage: a TabNet-based framework for structured data integration

Fatima Zahrae Saber, Ali Choukri, Mohamed Amnai, Abderrahim Waga

Abstract


Record linkage is a fundamental process for ensuring data quality and reliability, with critical applications in domains such as healthcare, finance, and commerce. This paper presents a machine learning-based approach for optimizing record linkage in structured datasets. By integrating hybrid blocking methods (combining standard blocking and sorted neighborhood approaches) with advanced similarity measures, the approach significantly reduces computational overhead while maintaining high accuracy. In the classification phase, the performance of TabNet, a deep learning model designed for tabular data, is compared with that of traditional deep neural networks (DNNs). Experimental results on a synthetic dataset of 5,000 records show that TabNet achieves precision and recall comparable to those of DNNs while reducing execution time by over 79%. These findings highlight the scalability and efficiency of the proposed method and make it well suited to large-scale data management tasks. This work contributes practical, computationally efficient solutions for record linkage in the era of big data.
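The hybrid blocking step can be illustrated with a minimal Python sketch. It assumes toy records with hypothetical `first`, `last`, and `zip` fields, uses ZIP code as the standard-blocking key and last name plus first name as the sorted-neighborhood key, and combines the two candidate sets by union. These keys, the window size, the union rule, and the use of `difflib.SequenceMatcher` as a stand-in similarity measure are illustrative assumptions, not the authors' configuration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def standard_blocking(records, key_fn):
    """Candidate pairs (i, j), i < j, whose records share the same blocking key."""
    blocks = defaultdict(list)
    for i, rec in enumerate(records):
        blocks[key_fn(rec)].append(i)
    pairs = set()
    for ids in blocks.values():
        # ids are appended in ascending order, so combinations yields i < j
        pairs.update(combinations(ids, 2))
    return pairs


def sorted_neighborhood(records, sort_key_fn, window):
    """Candidate pairs of records within `window` positions of each other after sorting."""
    order = sorted(range(len(records)), key=lambda i: sort_key_fn(records[i]))
    pairs = set()
    for pos, i in enumerate(order):
        # compare each record with the next (window - 1) records in sorted order
        for j in order[pos + 1 : pos + window]:
            pairs.add((min(i, j), max(i, j)))
    return pairs


def hybrid_blocking(records, key_fn, sort_key_fn, window=3):
    """Assumed combination rule: keep a pair if either method proposes it."""
    return standard_blocking(records, key_fn) | sorted_neighborhood(records, sort_key_fn, window)


def similarity_features(a, b):
    """Hypothetical per-pair feature vector; SequenceMatcher stands in for the paper's measures."""
    return [
        SequenceMatcher(None, a["first"], b["first"]).ratio(),
        SequenceMatcher(None, a["last"], b["last"]).ratio(),
        1.0 if a["zip"] == b["zip"] else 0.0,  # exact match on ZIP code
    ]


# Hypothetical toy records; field names and values are illustrative only.
records = [
    {"first": "John", "last": "Smith", "zip": "10001"},
    {"first": "Jon", "last": "Smith", "zip": "10001"},
    {"first": "Mary", "last": "Jones", "zip": "20002"},
    {"first": "Marie", "last": "Jones", "zip": "20003"},
    {"first": "Peter", "last": "Brown", "zip": "30003"},
]

candidates = hybrid_blocking(
    records,
    key_fn=lambda r: r["zip"],
    sort_key_fn=lambda r: r["last"] + r["first"],
    window=2,
)
all_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(candidates)} of {all_pairs} pairs kept")  # 4 of 10 pairs kept
for i, j in sorted(candidates):
    print((i, j), similarity_features(records[i], records[j]))
```

Without blocking, the 5,000-record dataset would require 5,000 × 4,999 / 2 = 12,497,500 pairwise comparisons; blocking restricts similarity computation, and the subsequent TabNet or DNN classification, to the candidate pairs only. The union rule favors recall (a true match missed by one key can still be caught by the other) at the cost of slightly more candidates.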

Keywords


Big data; Quality; Deep neural networks; Record linkage; TabNet

DOI: http://doi.org/10.11591/ijai.v15.i1.pp725-743

Copyright (c) 2026 Fatima Zahrae Saber, Ali Choukri, Mohamed Amnai, Abderrahim Waga

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938 
This journal is published by the Institute of Advanced Engineering and Science (IAES).