Financial text embeddings for the Russian language: a global vectors-based approach

Kostyantyn A. Malyshenko, Dmitriy Anashkin

Abstract


The article presents a software implementation of a linguistic embedding method for the Russian language, based on the global vectors for word representation (GloVe) model. The GloVe method produces word vectors that reflect the semantic and syntactic properties of words. The resulting vector model can be used in various natural language processing (NLP) tasks, such as machine translation and text clustering. The article describes the architecture of software that implements a method similar to the GloVe algorithm for Russian-language financial texts, together with the mechanisms used to train the model and to compute the word vectors. Testing with typical classification methods demonstrated that the developed program generates accurate vector representations of Russian-language texts and is effective in various NLP tasks. This work is one of the first studies devoted to the software implementation of the GloVe method for the Russian language using learning algorithms based on sparse matrices.
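To make the sparse-matrix idea concrete, the following is a minimal sketch of the two ingredients a GloVe-style method starts from: a sparse word co-occurrence table and the weighting function from the original GloVe formulation. It is an illustration under stated assumptions, not the authors' implementation: the function names, the dictionary-based sparse layout, the window size, and the toy corpus are all hypothetical, while the 1/distance weighting and the defaults `x_max=100`, `alpha=0.75` follow the original GloVe paper.

```python
from collections import defaultdict


def build_cooccurrence(sentences, window=2):
    """Return sparse co-occurrence counts as {(word_i, word_j): weight}.

    Only pairs that actually co-occur are stored, which is what makes the
    representation sparse. Each pair contributes 1/distance, so closer
    words count more (the weighting used in the original GloVe paper).
    """
    counts = defaultdict(float)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            # Look back at most `window` tokens and update both directions.
            for j in range(max(0, i - window), i):
                distance = i - j
                counts[(word, tokens[j])] += 1.0 / distance
                counts[(tokens[j], word)] += 1.0 / distance
    return dict(counts)


def glove_weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting f(x): grows as (x / x_max) ** alpha, capped at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


# Hypothetical toy corpus of tokenized Russian financial sentences.
corpus = [
    ["банк", "повысил", "ставку"],
    ["банк", "снизил", "ставку"],
]
cooc = build_cooccurrence(corpus, window=2)
```

In a full training loop, each stored pair would contribute a squared error between the dot product of its two word vectors (plus biases) and the logarithm of its count, scaled by `glove_weight`; pairs that never co-occur are skipped entirely, which is why the sparse layout matters.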

Keywords


Global vectors; Linguistic embedding; Machine learning; Natural language processing; Russian language


DOI: http://doi.org/10.11591/ijai.v14.i1.pp692-701


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938 
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
