Graph Transformer for Cross-Lingual Plagiarism Detection

Oumaima Hourrane, EL Habib Benlahmar


The vast amount of multilingual text available on the internet has given rise to cross-lingual plagiarism, which has become a serious problem in areas such as education, literature, and science. Current cross-lingual plagiarism detection approaches usually rely on syntactic and lexical properties, external machine translation (MT) systems, or similarity to a multilingual set of documents. However, most of these methods are designed for literal plagiarism such as copy and paste, and their performance degrades on more complex cases such as paraphrasing. In this paper, we propose a graph-based approach that represents text fragments in different languages using knowledge graphs. We also present a new graph structure modeling method based on the Transformer architecture that uses explicit relation encoding and provides a more efficient way to build a global graph representation. The mappings between the graphs are learned under both semi-supervised and unsupervised training. Experimental results on French-English and Spanish-English plagiarism detection indicate that our graph transformer approach outperforms state-of-the-art cross-lingual plagiarism detection approaches. It proves effective on paraphrased plagiarism and also offers insights into the use of knowledge graphs in a language-independent model.


Knowledge Graphs; Graph Neural Network; Graph Transformer; Cross-lingual Plagiarism




Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.