Electrocardiogram sequences data analytics and classification using unsupervised and supervised machine learning algorithms
Abstract
This paper explores the prediction of cardiovascular disease (CVD) through the classification of electrocardiogram (ECG) sequences using both supervised and unsupervised machine learning (ML) algorithms. The ECG 5000 dataset is used for data analytics, clustering, and classification, grouping ECG heartbeats into an optimal number of categories to forecast CVD. The Elbow and Silhouette methods are applied to estimate the optimal number of clusters in the dataset. Using K-means and hierarchical clustering algorithms, the data are grouped into two and into five distinguishable clusters, and the performance metrics indicate that two clusters are more viable. Subsequently, multiple supervised ML classifiers, including kernel classifiers, support vector machine (SVM), naïve Bayes (NB), decision trees (DT), k-nearest neighbor (KNN), and neural networks (NN), are trained on the labeled and clustered datasets to classify ECG sequences and detect anomalies. A novel modified ML classifier, kernel-SVM with Chi-Square (χ²) feature selection, is introduced; it achieves an accuracy of 0.9848, a recall of 0.9973, and a training time of 1.6944 seconds, surpassing benchmarks from prior research. A comparison of algorithm performance in the results and discussion section indicates that the proposed approach is a viable alternative to complex deep learning (DL) and transformer-based models.
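The two main steps named in the abstract, choosing a cluster count with the Silhouette method and classifying with a kernel SVM after χ² feature selection, can be illustrated with a minimal scikit-learn sketch. It is not the authors' implementation: the data are synthetic two-class waveforms of length 140 (chosen to resemble ECG 5000 sequences), and the helper names, the number of selected features (`k_features=40`), the RBF kernel, and the min-max scaling are all assumptions made for illustration. Scaling to [0, 1] is included because the χ² score requires non-negative features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import accuracy_score, recall_score, silhouette_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def make_synthetic_beats(n_per_class=200, length=140, seed=0):
    """Hypothetical stand-in for ECG sequences: two noisy waveform families."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, length)
    normal = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal((n_per_class, length))
    abnormal = 0.5 * np.sin(2 * np.pi * t + 1.5) + 0.1 * rng.standard_normal((n_per_class, length))
    X = np.vstack([normal, abnormal])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y


def best_k_by_silhouette(X, candidates=(2, 3, 4, 5)):
    """Run K-means for each candidate k and keep the k with the highest mean silhouette."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores


def fit_chi2_svm(X, y, k_features=40):
    """Scale to [0, 1], keep the k_features best columns by chi-square, fit an RBF SVM."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0
    )
    model = make_pipeline(
        MinMaxScaler(),                    # chi2 needs non-negative inputs
        SelectKBest(chi2, k=k_features),   # assumed feature count, not from the paper
        SVC(kernel="rbf", C=1.0),          # assumed kernel and regularization
    )
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return accuracy_score(y_test, y_pred), recall_score(y_test, y_pred)


if __name__ == "__main__":
    X, y = make_synthetic_beats()
    k, scores = best_k_by_silhouette(X)
    print("silhouette by k:", {key: round(val, 3) for key, val in scores.items()})
    print("selected k:", k)
    accuracy, recall = fit_chi2_svm(X, y)
    print("accuracy:", accuracy, "recall:", recall)
```

Scores obtained on this synthetic data say nothing about the reported results; reproducing those would require the ECG 5000 data and the paper's actual settings.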
DOI: http://doi.org/10.11591/ijai.v14.i3.pp2055-2071
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES).