Co-training pseudo-labeling for text classification with support vector machine and long short-term memory

Sri Handayani, Rizal Isnanto, Budi Warsito

Abstract


The scarcity of labeled data can hamper the training of text-processing models. To address this issue, this study combines the co-training method with a pseudo-labeling design to improve model performance, as part of an efficient semi-supervised learning paradigm for text processing and comprehension. The model pairs a support vector machine (SVM) for classification with a long short-term memory (LSTM) network for interpreting text sequences. Starting from a small amount of labeled data, with the remainder unlabeled, the co-training approach can also help mitigate class imbalance by introducing samples that would otherwise be marginalized in the labeled set. This study assesses the model's performance on a student dataset from higher education institutions to establish a confidence threshold for each model and to determine how well the model generalizes at each threshold. Using a combination of confidence metrics, the SVM threshold was determined as >=0.88 and the LSTM threshold as >=0.5.
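The co-training pseudo-labeling loop described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the two scorer callbacks stand in for the trained SVM and LSTM, and the thresholds are the values reported in the abstract. The function names and data layout are assumptions.

```python
# Minimal co-training pseudo-labeling sketch (illustrative only; in the paper
# the two "views" are a trained SVM and an LSTM, here passed as callbacks).

SVM_THRESHOLD = 0.88   # confidence threshold reported for the SVM
LSTM_THRESHOLD = 0.50  # confidence threshold reported for the LSTM

def co_train(labeled, unlabeled, predict_a, predict_b,
             thresh_a=SVM_THRESHOLD, thresh_b=LSTM_THRESHOLD, rounds=5):
    """Each predictor maps a sample to (label, confidence). A sample that one
    model labels confidently is pseudo-labeled and added to the OTHER model's
    training pool; samples neither model is confident about stay unlabeled."""
    pool_a = list(labeled)        # training pool for model A (e.g. SVM)
    pool_b = list(labeled)        # training pool for model B (e.g. LSTM)
    remaining = list(unlabeled)
    for _ in range(rounds):
        still_unlabeled = []
        for x in remaining:
            label_a, conf_a = predict_a(x)
            label_b, conf_b = predict_b(x)
            if conf_a >= thresh_a:
                pool_b.append((x, label_a))    # A teaches B
            elif conf_b >= thresh_b:
                pool_a.append((x, label_b))    # B teaches A
            else:
                still_unlabeled.append(x)      # neither model is confident
        if len(still_unlabeled) == len(remaining):
            break                              # no progress; stop early
        remaining = still_unlabeled            # (re-training would go here)
    return pool_a, pool_b, remaining
```

In a full implementation, each round would re-train both models on their enlarged pools before re-scoring the remaining unlabeled samples; the sketch keeps the predictors fixed to focus on the threshold-gated pseudo-labeling exchange.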

Keywords


Co-training; Long short-term memory; Pseudo-labeling; Semi-supervised learning; Support vector machine



DOI: http://doi.org/10.11591/ijai.v14.i3.pp2158-2168



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN: 2089-4872 / 2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES).
