FrBMedQA: the first French biomedical question answering dataset

Zakaria Kaddari, Toumi Bouchentouf

Abstract


FrBMedQA is the first French biomedical question answering dataset, containing more than 41k passage-question instances. It was constructed automatically, in a cloze-style manner, from French Wikipedia articles on biomedical topics. To test the validity and difficulty of the dataset, we experimented with four statistical baseline models, a biomedical bidirectional encoder representations from transformers (BERT)-based model, and two French BERT-based language models. We also performed a human evaluation on a subset of the test set. None of the three tested models was able to surpass the best-performing baseline model. Human performance, at 61.11%, leads the leaderboard, more than 8% ahead of the best-performing model. We have made the dataset and the code to reproduce our results available.
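
As an illustration of what a cloze-style passage-question instance can look like, here is a minimal sketch. It is not the authors' construction pipeline: the function name make_cloze, the "@placeholder" mask token, and the example sentences are all assumptions made for this illustration. The idea is simply that an answer term is masked in a query sentence, and a model must recover it from the passage.

```python
import re

# Hypothetical mask token; the token used in the actual dataset is not stated here.
PLACEHOLDER = "@placeholder"


def make_cloze(passage, query_sentence, answer):
    """Build one passage-question instance by masking `answer` in `query_sentence`.

    Returns None when the answer does not occur in the query sentence.
    """
    # Match the answer as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(answer) + r"\b", flags=re.IGNORECASE)
    if not pattern.search(query_sentence):
        return None
    # Mask only the first occurrence so the question has a single blank.
    question = pattern.sub(PLACEHOLDER, query_sentence, count=1)
    return {"passage": passage, "question": question, "answer": answer}


# Invented example text, for illustration only.
example = make_cloze(
    passage="L'insuline est une hormone sécrétée par le pancréas. Elle régule la glycémie.",
    query_sentence="Le pancréas sécrète l'insuline.",
    answer="insuline",
)
print(example["question"])
```

A model evaluated on such an instance would be asked to fill the blank using only the passage, and accuracy would be the share of blanks filled with the correct answer.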

Keywords


Biomedical; Dataset; FrBMedQA; Information retrieval; Question answering



DOI: http://doi.org/10.11591/ijai.v11.i4.pp1588-1595




This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.