Co-training pseudo-labeling for text classification with support vector machine and long short-term memory
Abstract
The scarcity of labeled data can hamper the training of text-processing models. To address this issue, this study applies a semi-supervised learning strategy that combines the co-training method with pseudo-labeling. The model pairs a support vector machine (SVM) for classification with a long short-term memory (LSTM) network for interpreting text sequences. Because co-training requires only a small amount of labeled data and leaves the rest unlabeled, it can also help mitigate class imbalance by introducing samples that might otherwise be marginalized in the labeled set. The model's performance is assessed on a student dataset from higher education institutions to establish a confidence threshold for each model and to determine how well the model generalizes at each threshold. Using a combination of confidence metrics, the SVM threshold was determined to be >=0.88 and the LSTM threshold >=0.5.
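As a concrete illustration of the co-training pseudo-labeling loop described above, the following Python sketch pairs an SVM on TF-IDF features with a small LSTM on token sequences, exchanging pseudo-labels at the reported confidence thresholds (SVM >=0.88, LSTM >=0.5). The dataset, feature pipeline, network architecture, and all function names are illustrative assumptions, not the authors' published implementation.

```python
# Co-training pseudo-labeling sketch. Assumptions: binary labels in {0, 1},
# TF-IDF features for the SVM view, raw token sequences for the LSTM view.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from tensorflow.keras import layers, models

SVM_THRESHOLD = 0.88   # SVM confidence cut-off reported in the abstract
LSTM_THRESHOLD = 0.5   # LSTM confidence cut-off reported in the abstract


def co_train(texts_lab, y_lab, texts_unlab, rounds=3):
    """Each model pseudo-labels the unlabeled pool; predictions above its
    threshold are added to the *other* model's training set."""
    svm_texts, svm_y = list(texts_lab), list(y_lab)    # SVM training pool
    lstm_texts, lstm_y = list(texts_lab), list(y_lab)  # LSTM training pool
    pool = list(texts_unlab)

    for _ in range(rounds):
        if not pool:
            break

        # SVM view: TF-IDF features with a probability-calibrated SVC.
        tfidf = TfidfVectorizer(max_features=20_000)
        svm = SVC(probability=True).fit(tfidf.fit_transform(svm_texts), svm_y)
        svm_proba = svm.predict_proba(tfidf.transform(pool))

        # LSTM view: token sequences through an embedding + LSTM classifier.
        vec = layers.TextVectorization(max_tokens=20_000,
                                       output_sequence_length=50)
        vec.adapt(np.array(lstm_texts))
        lstm = models.Sequential([
            vec,
            layers.Embedding(input_dim=20_000, output_dim=64),
            layers.LSTM(32),
            layers.Dense(1, activation="sigmoid"),
        ])
        lstm.compile(optimizer="adam", loss="binary_crossentropy")
        lstm.fit(np.array(lstm_texts), np.array(lstm_y), epochs=3, verbose=0)
        p_pos = lstm.predict(np.array(pool), verbose=0).ravel()

        # Exchange confident pseudo-labels; keep the rest for the next round.
        # Note: with a sigmoid output, max(p, 1 - p) >= 0.5 always holds, so
        # the 0.5 threshold accepts every LSTM prediction the SVM skipped.
        keep = []
        for i, text in enumerate(pool):
            if svm_proba[i].max() >= SVM_THRESHOLD:
                lstm_texts.append(text)
                lstm_y.append(int(svm.classes_[svm_proba[i].argmax()]))
            elif max(p_pos[i], 1.0 - p_pos[i]) >= LSTM_THRESHOLD:
                svm_texts.append(text)
                svm_y.append(int(p_pos[i] >= 0.5))
            else:
                keep.append(text)
        pool = keep

    return svm, lstm
```

The cross-teaching form shown here, where each model's confident predictions augment the other's training set, follows the classic co-training design; a common variant instead adds pseudo-labels to a single shared pool. Which variant the paper uses is not stated in the abstract.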
Keywords
Co-training; Long short-term memory; Pseudo-labeling; Semi-supervised learning; Support vector machine
DOI: http://doi.org/10.11591/ijai.v14.i3.pp2158-2168
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES).