Classification of Kannada documents using novel semantic symbolic representation and selection method
Abstract
Kannada is one of the 22 scheduled languages of India and is a low-resource regional language. Classifying Kannada documents is difficult because of the language's rich vocabulary, agglutinative terms, and lack of resources. A good representation and the selection of prominent features help address these challenges. In this paper, we propose a semantic symbolic representation and feature selection method that represents Kannada terms as interval values embedded with positional information. We also propose a method for selecting prominent, discriminative symbolic feature vectors. A symbolic document classifier is then used to classify the Kannada documents. The proposed cluster-based symbolic representation preserves intra-class variance and reduces ambiguity in the classification of Kannada documents. Experiments are performed on two Kannada document datasets, both multilabel and unbalanced. A comparative analysis of the proposed method against other standard methods is also presented.
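The abstract does not spell out how the interval values are built, so the following is only a minimal sketch of one common interval-valued symbolic scheme, not the paper's method. It assumes each class is summarized per feature by the interval [mean - std, mean + std], and that a document is assigned to the class whose intervals contain the most of its feature values. The clustering, positional information, and feature selection steps described above are omitted; the function names (fit_intervals, classify) and the toy term-weight vectors are hypothetical.

```python
from statistics import mean, pstdev


def fit_intervals(vectors_by_class):
    """Build one interval (mean - std, mean + std) per feature for each class."""
    intervals = {}
    for label, vectors in vectors_by_class.items():
        columns = list(zip(*vectors))  # transpose: one tuple per feature
        intervals[label] = [
            (mean(col) - pstdev(col), mean(col) + pstdev(col)) for col in columns
        ]
    return intervals


def classify(vector, intervals):
    """Return the class whose intervals contain the most feature values."""
    def acceptance(label):
        return sum(lo <= x <= hi for x, (lo, hi) in zip(vector, intervals[label]))
    return max(intervals, key=acceptance)


# Toy, made-up term-weight vectors (assumption: not taken from the paper's datasets).
train = {
    "sports": [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1], [0.7, 0.1, 0.3]],
    "politics": [[0.1, 0.9, 0.7], [0.2, 0.8, 0.8], [0.1, 0.7, 0.9]],
}
model = fit_intervals(train)
predicted = classify([0.8, 0.15, 0.2], model)
```

Storing an interval rather than a single centroid value is what lets a representation of this kind retain some of the within-class spread of each feature.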
Keywords
Classification; Feature selection; Kannada documents; Semantic analysis; Symbolic representation
DOI: http://doi.org/10.11591/ijai.v14.i4.pp3354-3365
Copyright (c) 2025 Institute of Advanced Engineering and Science
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES).