FrBMedQA: The first French biomedical question answering dataset
Abstract
FrBMedQA is the first French biomedical question answering dataset, containing more than 41k passage-question instances. It was constructed automatically, in a cloze-style manner, from French Wikipedia articles on biomedical topics. To test the validity and difficulty of the dataset, we experimented with four statistical baseline models, a biomedical bidirectional encoder representations from transformers (BERT)-based model, and two French BERT-based language models. We also conducted a human evaluation on a subset of the test set. None of the three BERT-based models surpassed the best-performing baseline model. Human performance, at 61.11%, leads the leaderboard by a margin of more than 8% over the best-performing model. We make the dataset and the code to reproduce our results publicly available.
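The abstract states only that the instances were built automatically in a cloze-style manner. As a rough illustration of that general technique, and not a description of the authors' actual pipeline, the minimal sketch below blanks out a biomedical term in a held-out query sentence and keeps the surrounding passage as the context from which the answer must be recovered. The candidate term list, the `_____` placeholder, and the helper names `make_cloze` and `ClozeInstance` are hypothetical choices made for this example.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical blank marker; the real dataset may use a different token.
PLACEHOLDER = "_____"


@dataclass
class ClozeInstance:
    passage: str   # context the reader must use to fill the blank
    question: str  # query sentence with one term replaced by PLACEHOLDER
    answer: str    # the term that was removed


def make_cloze(passage: str, query_sentence: str,
               candidate_terms: List[str]) -> List[ClozeInstance]:
    """Build one cloze instance per candidate term that occurs in both the
    held-out query sentence and the passage, so each blank is answerable
    from the passage alone (an assumption of this sketch)."""
    instances = []
    for term in candidate_terms:
        # Whole-word match; str patterns are Unicode-aware, so accented
        # French words such as "glycémie" are handled correctly.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b")
        if pattern.search(query_sentence) and pattern.search(passage):
            # Blank out only the first occurrence in the query sentence.
            question = pattern.sub(PLACEHOLDER, query_sentence, count=1)
            instances.append(ClozeInstance(passage, question, term))
    return instances


passage = ("L'insuline est une hormone sécrétée par le pancréas. "
           "Elle régule la glycémie.")
query = "Le diabète de type 1 résulte d'un déficit en insuline."
terms = ["insuline", "pancréas", "glycémie"]

for inst in make_cloze(passage, query, terms):
    print(inst.question, "->", inst.answer)
```

Here only "insuline" appears in both the query sentence and the passage, so a single instance is produced. Requiring the answer to occur in the passage is one common way to keep cloze questions answerable; the criteria actually used for FrBMedQA may differ.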
Keywords
biomedical; dataset; FrBMedQA; information retrieval; question answering
DOI: http://doi.org/10.11591/ijai.v11.i4.pp%25p
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.