Classification of Kannada documents using novel semantic symbolic representation and selection method

Ranganathbabu Kasturi Rangan, Bukahally Somashekar Harish, Chaluvegowda Kanakalakshmi Roopa

Abstract


Kannada is one of the 22 scheduled languages of India and a low-resource regional language. Classifying Kannada documents is arduous because of the language's rich vocabulary, agglutinative terms, and lack of resources. A good representation and the selection of prominent features help address these challenges in document classification. In this paper, we propose a semantic symbolic representation and feature selection method that represents Kannada terms as interval values embedded with positional information. We also propose a method for selecting prominent, discriminative symbolic feature vectors. A symbolic document classifier is then used to classify the Kannada documents. The proposed cluster-based symbolic representation preserves intra-class variance and reduces ambiguity in the classification of Kannada documents. Experiments are performed on two Kannada document datasets that are multilabel and unbalanced. A comparative analysis of the proposed method against other standard methods is also presented.
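
To make the idea of an interval-valued symbolic representation concrete, the following is a minimal sketch and not the authors' method. It assumes each class is summarized per feature by the interval [mean - std, mean + std] over its training vectors, and that a document is assigned to the class whose intervals contain the most of its feature values. The function names, the toy vectors, and the class labels are hypothetical, and the clustering, positional information, and feature selection steps described in the abstract are omitted.

```python
from statistics import mean, pstdev


def build_intervals(class_vectors):
    """Summarize one class: a (mean - std, mean + std) interval per feature."""
    intervals = []
    for column in zip(*class_vectors):  # iterate feature by feature
        mu, sigma = mean(column), pstdev(column)
        intervals.append((mu - sigma, mu + sigma))
    return intervals


def acceptance_count(vector, intervals):
    """Count how many feature values fall inside the class intervals."""
    return sum(low <= x <= high for x, (low, high) in zip(vector, intervals))


def classify(vector, class_intervals):
    """Pick the class whose intervals accept the most feature values."""
    return max(
        class_intervals,
        key=lambda label: acceptance_count(vector, class_intervals[label]),
    )


# Hypothetical toy term-weight vectors for two illustrative classes.
training = {
    "sports": [[0.9, 0.1, 0.0], [0.7, 0.3, 0.1], [0.8, 0.2, 0.0]],
    "politics": [[0.1, 0.8, 0.6], [0.2, 0.9, 0.5], [0.0, 0.7, 0.7]],
}
class_intervals = {
    label: build_intervals(vectors) for label, vectors in training.items()
}

predicted = classify([0.8, 0.2, 0.05], class_intervals)
print(predicted)
```

Storing an interval rather than a single centroid is what lets a representation of this kind retain the spread of values within a class, which is the property the abstract refers to as preserving intra-class variance.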

Keywords


Classification; Feature selection; Kannada documents; Semantic analysis; Symbolic representation


DOI: http://doi.org/10.11591/ijai.v14.i4.pp3354-3365

Copyright (c) 2025 Institute of Advanced Engineering and Science

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938 
This journal is published by the Institute of Advanced Engineering and Science (IAES).
