Effect of word embedding vector dimensionality on sentiment analysis through short and long texts

Mohamed Chiny, Marouane Chihab, Abdelkarim Ait Lahcen, Omar Bencharef, Younes Chihab

Abstract


Word embedding has become the most popular method of lexical description in context in natural language processing, notably through the word-to-vector (Word2Vec) and global vectors (GloVe) implementations. Because GloVe is distributed as a pre-trained model that provides word vectors in several dimensionalities, a large number of applications rely on it, especially in sentiment analysis. However, we found that in many cases in the literature, GloVe is used with an arbitrarily chosen dimensionality (often 300d), regardless of the length of the text to be analyzed. In this work, we study the effect of the dimensionality of word embedding vectors on short and long texts in a sentiment analysis context. The results suggest that, for long texts, the performance metrics of the model increase as the dimensionality of the vectors increases. For short texts, in contrast, we recorded a threshold beyond which dimensionality no longer matters.
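To make the role of dimensionality concrete, the following minimal Python sketch shows how the size of a text's input representation changes with the embedding dimensionality and the text length. It is an illustration only, not the authors' pipeline: the random vectors stand in for pre-trained GloVe vectors, the dimensionalities 50, 100, 200 and 300 are assumed here because they are the sizes commonly distributed with GloVe, and the helper names `toy_embeddings` and `embed_text` are hypothetical.

```python
import random


def toy_embeddings(vocab, dim, seed=0):
    """Stand-in for a pre-trained table: one random vector of length `dim` per word.

    Assumption: real experiments would load GloVe vectors here instead.
    """
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in vocab}


def embed_text(text, table, dim, max_len):
    """Map a text to a max_len x dim matrix.

    Texts longer than max_len are truncated; unknown words and padding
    positions are represented by zero vectors.
    """
    tokens = text.lower().split()[:max_len]
    rows = [table.get(tok, [0.0] * dim) for tok in tokens]
    rows += [[0.0] * dim for _ in range(max_len - len(rows))]
    return rows


vocab = ["the", "film", "was", "great", "boring"]
short_text = "great film"                 # 2 tokens
long_text = "the film was great " * 10    # 40 tokens

shapes = {}
for dim in (50, 100, 200, 300):
    table = toy_embeddings(vocab, dim)
    short_matrix = embed_text(short_text, table, dim, max_len=5)
    long_matrix = embed_text(long_text, table, dim, max_len=40)
    # (rows, columns) of the short-text matrix, then of the long-text matrix
    shapes[dim] = (len(short_matrix), len(short_matrix[0]),
                   len(long_matrix), len(long_matrix[0]))
    print(dim, shapes[dim])
```

In this sketch the number of values fed to a downstream model grows as sequence length times dimensionality, so a short text carries far fewer informative rows than a long one at any given dimensionality. Whether the extra dimensions actually help is the empirical question the study addresses.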

Keywords


Deep learning; Gated recurrent unit; Global vectors; Sentiment analysis; Word embedding


DOI: http://doi.org/10.11591/ijai.v12.i2.pp823-830


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.