Machine learning approaches for anomaly detection of Jakarta air quality index
Abstract
Anomalies in time series data are observations that deviate markedly from neighboring values or from the overall pattern. Air quality index (AQI) data, which vary over time, provide a suitable context for anomaly detection. Time series anomalies can be detected with machine learning approaches such as long short-term memory (LSTM) and extreme gradient boosting (XGBoost), which handle nonlinearity and high-dimensional data better than conventional methods. This study compares LSTM and XGBoost for detecting anomalies in Jakarta's hourly AQI data, obtained from the AirNow website for the period from January 1, 2018, to December 31, 2023. Anomalies in the observed data were labeled using the moving range (MR) (2) and MR (3) approaches with three- and four-sigma thresholds, and feature engineering (FE) was applied to improve model performance. The results indicate that LSTM is more suitable than XGBoost for both forecasting and classification on AQI data. Under MR (2) labeling with the four-sigma rule, LSTM achieved an average mean absolute percentage error (MAPE) of 10.3840%, a root mean square error (RMSE) of 10.5913, and a balanced accuracy (BACC) of 0.9424. Most detected anomalies occurred between 21:00 and 09:00 and during the rainy season.
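The abstract does not spell out how the MR labels or the evaluation metrics are computed. The Python sketch below shows one common reading and is not the authors' code. It assumes an individuals and moving range (I-MR) control chart applied directly to the series: MR (w) is taken as the range (max minus min) of each window of w consecutive observations, sigma is estimated as the average moving range divided by the standard bias-correction constant d2 (1.128 for w = 2, 1.693 for w = 3), and a point is labeled anomalous when it falls outside the mean plus or minus k sigma. All function names (`moving_ranges`, `mr_anomaly_labels`, `mape`, `rmse`, `balanced_accuracy`) are hypothetical; MAPE, RMSE, and BACC follow their textbook definitions.

```python
import math
from typing import List, Sequence

# Standard SPC bias-correction constants d2 for ranges of n observations.
D2 = {2: 1.128, 3: 1.693}


def moving_ranges(x: Sequence[float], span: int) -> List[float]:
    """Range (max - min) of every window of `span` consecutive observations."""
    return [
        max(x[i - span + 1 : i + 1]) - min(x[i - span + 1 : i + 1])
        for i in range(span - 1, len(x))
    ]


def mr_anomaly_labels(x: Sequence[float], span: int = 2, k: float = 4.0) -> List[int]:
    """Label 1 for points outside mean +/- k * sigma_hat, else 0.

    Assumption: I-MR chart on the raw series, sigma_hat = mean(MR) / d2.
    """
    mrs = moving_ranges(x, span)
    sigma_hat = (sum(mrs) / len(mrs)) / D2[span]
    center = sum(x) / len(x)
    lower, upper = center - k * sigma_hat, center + k * sigma_hat
    return [int(v < lower or v > upper) for v in x]


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error in percent (assumes no zero actual values)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity (assumes both classes are present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


if __name__ == "__main__":
    series = [50, 52, 51, 53, 52, 50, 51, 150, 52, 51]  # toy data, not from the study
    print(mr_anomaly_labels(series, span=2, k=4.0))  # [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
    print(mr_anomaly_labels(series, span=3, k=4.0))  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

On this toy series the wider MR (3) window inflates the average range enough that the four-sigma limit no longer flags the spike, while the three-sigma limit still does. This illustrates why the choice of span and sigma multiplier changes the labels a classifier is trained and scored on.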
Keywords
Air quality index; Anomalies; Machine learning; Outliers; Time series
DOI: http://doi.org/10.11591/ijai.v15.i3.pp2543-2553
Copyright (c) 2026 Muhammad Rizky Nurhambali, Yenni Angraini, Anwar Fitrianto

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES).