A spark-based parallel distributed posterior decoding algorithm for big data hidden Markov models decoding problem

Imad Sassi, Samir Anter, Abdelkrim Bekkhoucha


Hidden Markov models (HMMs) are machine learning models that have been widely used and have demonstrated their efficiency in many conventional applications. This paper proposes a modified posterior decoding algorithm that solves the HMM decoding problem for large-scale data processing, based on the MapReduce paradigm and Spark's resilient distributed datasets (RDDs). The objective of this work is to improve the performance of HMMs in the face of big data challenges. The proposed algorithm substantially reduces time complexity and gives good results in terms of running time, speedup, and parallelization efficiency for large amounts of data, i.e., a large number of states and a large number of sequences.
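For readers unfamiliar with posterior decoding, the following is a minimal single-machine sketch of the textbook technique (scaled forward-backward, then a per-position argmax over state posteriors). It is not the modified algorithm proposed in the paper, whose details are not given in this abstract. The names `posterior_decode` and `decode_all`, and the list-of-lists model layout (`pi`, `A`, `B`), are assumptions made for illustration. The per-sequence `map` in `decode_all` only mimics the kind of independent map step that an RDD could distribute across workers.

```python
from typing import List, Sequence


def posterior_decode(obs: Sequence[int],
                     pi: Sequence[float],
                     A: Sequence[Sequence[float]],
                     B: Sequence[Sequence[float]]) -> List[int]:
    """Return the most probable state at each position of one sequence.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability that state i emits symbol k
    """
    n, T = len(pi), len(obs)
    if T == 0:
        return []

    # Forward pass, normalizing each step to avoid numerical underflow.
    alpha = [[0.0] * n for _ in range(T)]
    scale = [0.0] * T
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    scale[0] = sum(alpha[0])
    alpha[0] = [a / scale[0] for a in alpha[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = B[j][obs[t]] * sum(
                alpha[t - 1][i] * A[i][j] for i in range(n))
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]

    # Backward pass, reusing the forward scaling factors.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(
                A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                for j in range(n)) / scale[t + 1]

    # The posterior of state i at time t is proportional to alpha * beta;
    # keep the argmax at every position.
    path = []
    for t in range(T):
        gamma = [alpha[t][i] * beta[t][i] for i in range(n)]
        path.append(max(range(n), key=gamma.__getitem__))
    return path


def decode_all(sequences, pi, A, B):
    """Decode each sequence independently (the 'map' step of the sketch)."""
    return list(map(lambda seq: posterior_decode(seq, pi, A, B), sequences))
```

Because each sequence is decoded independently, the same function could in principle be applied with PySpark as `sc.parallelize(sequences).map(...).collect()`, where `sc` is an assumed existing `SparkContext`. That covers only parallelism over sequences; how the paper handles a large number of states is not described here.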


Apache Spark, Big data, Cloud computing, Hidden Markov models, Posterior decoding, Parallel distributed approach



DOI: http://doi.org/10.11591/ijai.v10.i3.pp789-800



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.