Impact of smoothing techniques for text classification: implementation in hidden Markov model

Norsyela Muhammad Noor Mathivanan, Roziah Mohd Janor, Shukor Abd Razak, Nor Azura Md. Ghani

Abstract


A hidden Markov model (HMM) is widely used for sequence modeling in various text classification tasks. This study investigates the impact of different smoothing techniques, such as Laplace smoothing, absolute discounting, and Gibbs sampling, on HMM performance across three distinct domains: e-commerce products, spam filtering, and occupational data mining. In the comparative analysis, Laplace smoothing consistently outperforms the other techniques in handling zero-probability issues and achieves the best performance on the e-commerce and SMS spam datasets. In contrast, the HMM without any smoothing achieved the best results for job title classification. This divergence underscores the dataset-specific nature of smoothing requirements: simple, unsmoothed parameter estimation proves effective in contexts characterized by a limited and repetitive vocabulary. Hence, the findings suggest that tailored smoothing strategies are crucial for optimizing HMM performance in different text analysis applications.
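To make the zero-probability issue concrete, the following minimal Python sketch estimates HMM emission probabilities P(word | state) with three estimators: unsmoothed maximum likelihood, Laplace (add-one) smoothing, and absolute discounting with a uniform backoff over the vocabulary. The toy corpus, the SPAM/HAM state labels, the function names, and the discount d = 0.5 are illustrative assumptions, not the paper's data or implementation. Gibbs sampling is not shown. Because an HMM path probability is a product of transition and emission probabilities, any path that emits a word from a state that never emitted it in training scores zero under the unsmoothed estimate.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy corpus: each document is a sequence of (word, hidden_state) pairs.
TRAIN = [
    [("cheap", "SPAM"), ("offer", "SPAM"), ("now", "SPAM")],
    [("meeting", "HAM"), ("at", "HAM"), ("noon", "HAM")],
    [("free", "SPAM"), ("offer", "SPAM")],
]
VOCAB = {word for doc in TRAIN for word, _ in doc}


def count_emissions(docs: Iterable[List[Tuple[str, str]]]) -> Dict[str, Counter]:
    """Count how often each hidden state emits each word."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for doc in docs:
        for word, state in doc:
            counts[state][word] += 1
    return counts


def mle(counts: Dict[str, Counter], state: str, word: str) -> float:
    """Unsmoothed estimate c(s, w) / c(s); zero for any unseen (state, word) pair."""
    c = counts[state]
    return c[word] / sum(c.values())


def laplace(counts: Dict[str, Counter], state: str, word: str,
            vocab_size: int, alpha: float = 1.0) -> float:
    """Add-alpha estimate (Laplace when alpha = 1): (c(s, w) + alpha) / (c(s) + alpha * V)."""
    c = counts[state]
    return (c[word] + alpha) / (sum(c.values()) + alpha * vocab_size)


def absolute_discount(counts: Dict[str, Counter], state: str, word: str,
                      vocab_size: int, d: float = 0.5) -> float:
    """Subtract d (assumed 0 < d <= 1) from each seen count; spread the freed mass uniformly over V."""
    c = counts[state]
    total = sum(c.values())
    seen_types = len(c)  # distinct words this state emitted at least once
    discounted = max(c[word] - d, 0.0) / total
    reserved = d * seen_types / total  # probability mass taken from seen words
    return discounted + reserved / vocab_size


counts = count_emissions(TRAIN)
V = len(VOCAB)  # 7 word types in the toy corpus
print(mle(counts, "SPAM", "noon"))                   # 0.0: "noon" never emitted by SPAM
print(laplace(counts, "SPAM", "noon", V))            # 1/12, about 0.0833
print(absolute_discount(counts, "SPAM", "noon", V))  # 0.4/7, about 0.0571
```

Both smoothed estimators still sum to one over the vocabulary for each state, so they remain valid emission distributions. In this toy example Laplace smoothing assigns more probability to the unseen word than absolute discounting does, because it adds a full pseudo-count to every vocabulary entry, whereas absolute discounting reserves mass only in proportion to the number of distinct words the state has already seen.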

Keywords


E-commerce products; Job title classification; Occupational data mining; Product classification; Sequential data; Spam filtering; Supervised learning model



DOI: http://doi.org/10.11591/ijai.v14.i6.pp5183-5192



Copyright (c) 2025 Norsyela Muhammad Noor Mathivanan, Roziah Mohd Janor, Shukor Abd Razak, Nor Azura Md. Ghani

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938 
This journal is published by the Institute of Advanced Engineering and Science (IAES).
