Efficient commodity price forecasting using long short-term memory model

ABSTRACT


INTRODUCTION
Commodity price forecasting is an essential task for stakeholders such as governments, policymakers, retailers, and customers. Accurate commodity price forecasting has a huge impact on managing inflation, securing commodity supplies, and avoiding socio-economic disruptions [1], [2]. Particularly in a world marked by interconnected economies and complex trade relationships [3], precise commodity price forecasts can aid in avoiding crises and promoting stability in the global commodity supply chain [4]. In recent years, commodity price fluctuation has been a significant concern for stakeholders around the world. Several factors have contributed to these fluctuations, leading to challenges in managing commodity supplies, inflation, and socio-economic stability [5]-[8]. In particular, food price fluctuations can have far-reaching consequences, especially for vulnerable populations in low-income countries, where a significant portion of income is spent on food. High and unpredictable food prices can lead to food insecurity, malnutrition, and social unrest [9], [10]. Governments, policymakers, and international organizations continue to work on measures to mitigate the impact of these fluctuations. Accurate food price forecasting remains a crucial tool in managing these challenges and promoting global food security [11]-[13].
Many traditional statistical and machine learning (ML) models have been utilized for commodity price prediction [14]-[18], and specifically for food price forecasting [19]-[22]. These models are easy to implement and do not require substantial computing power. However, they fail to capture complex patterns and trends in time-series data because they make prior assumptions about the data [23], [24]. With the rise of deep learning (DL), neural network (NN) models have gained traction in many downstream forecasting tasks [25], [26]. Historically, different methods have been applied to commodity price prediction, including statistical approaches like the autoregressive integrated moving average (ARIMA) [27] and seasonal autoregressive integrated moving average (SARIMA) [28], machine learning techniques such as the support vector machine (SVM) [29], and deep learning models like long short-term memory (LSTM) [30], [31]. LSTM models are often preferred over other models for time-series forecasting due to their ability to capture long-term dependencies and handle sequential data effectively [32]. LSTM's non-parametric character, combined with its ability to handle non-linear patterns and its independence from the need for a stationary process, makes it a favorable option for time-series applications [33].
Various researchers have extended LSTM models to improve forecasting performance. For instance, Ly et al. [34] introduced a hybrid model that combines the strengths of LSTM and ARIMA to forecast the prices of cotton and oil; their approach trains two separate models, namely LSTM and ARIMA, and then averages the forecasts of both trained models, achieving better results than either individual model. In contrast, Krishnan et al. [35] utilized a diverse set of complex LSTM models, namely basic LSTM, bidirectional LSTM (Bi-LSTM), stacked LSTM, convolutional neural network LSTM (CNN-LSTM), and convolutional LSTM (Conv-LSTM), in their five-commodity forecasting study. Each of these models offered unique architectural variations to explore and analyze their predictive capabilities. The authors argued that these complex models are capable of capturing intricate patterns and dependencies within commodity market data, allowing for more accurate and robust predictions. On the other hand, the use of deep learning models in combination with natural language processing, such as transformer-based models [36] and bidirectional encoder representations from transformers (BERT) [37], has been gaining traction in recent years due to their capacity for market sentiment tracking and context-aware analysis. For instance, Sonkiya et al. [38] proposed a generative adversarial network (GAN) combined with a BERT model for stock price prediction; in their study, BERT is utilized to analyze news and headlines, extracting valuable insights that are then fed into the GAN model as external factors, enhancing the model's predictive capabilities.
Despite the advancements in machine learning and deep learning models, there remains a need for models that are not only accurate but also interpretable and computationally efficient. The black-box nature of many deep learning models limits their interpretability, which can slow their adoption in certain sectors [39]. In addition, the computational requirements of these models can be a limiting factor, especially in low-resource settings [40]. This calls for the development of models that balance accuracy, interpretability, and efficiency, which is the focus of this paper. This paper addresses these limitations by introducing a straightforward yet effective LSTM model for predicting the prices of various commodities. The developed model demonstrates performance comparable to other deep learning models. Our approach places significant emphasis on the feature engineering aspect of the task: we found that implementing diverse feature transformations, such as moving averages and volatility measures, greatly enhances the performance of the proposed model. The model's simplicity and efficacy, combined with its modest demand for computational resources, suggest its potential for broader application in commodity price forecasting. Future work could incorporate external factors, such as social media sentiment and news headlines, into the model to further improve its predictive performance.

METHODOLOGY
A comprehensive methodology is used in this paper to address the research objectives effectively. Figure 1 illustrates the chronological flow chart of the adopted methodology. The first step involved data acquisition. Subsequently, a data cleaning process was performed, in which missing prices were imputed by the average of the lag and lead prices. Next, an exploratory data analysis (EDA) phase was conducted to gain deeper insight into the dataset, identify patterns, and establish descriptive statistics that aid in understanding the underlying characteristics of the data. Following the EDA, feature engineering techniques were employed to transform the raw data into meaningful and informative features that can enhance model performance. After feature engineering, data transformation techniques were applied to normalize the data, ensuring that all variables are on a comparable scale; this step helps improve the model's convergence and performance. Subsequently, the modeling phase involved a training loop in which various hyperparameters were tested to optimize the model's performance and achieve the best possible results. Finally, the performance of the trained models was evaluated using appropriate evaluation metrics to assess their effectiveness in solving the research problem. This methodology ensures a systematic and rigorous approach to conducting the study, enabling reliable and insightful conclusions to be drawn from the analysis. Through this research, we illustrate the process of choosing appropriate features, transforming them suitably, and training an LSTM model for forecasting commodity prices. The outcome is expected to provide valuable insights for stakeholders in the commodity industry, enhancing their decision-making capabilities and promoting more efficient practices.
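As an illustration of the cleaning step, the sketch below imputes a missing price with the mean of its lag and lead observations within each commodity/city series. This is a minimal sketch: the file name and column names are hypothetical, loosely following the format of WFP price exports, and are not taken from the paper.

```python
import pandas as pd

# Hypothetical file/column names, loosely following the WFP export format.
df = pd.read_csv("wfp_food_prices_pse.csv", parse_dates=["date"])
df = df.sort_values(["commodity", "city", "date"])

# Impute a missing price with the mean of the lag and lead prices,
# computed within each commodity/city series; if either neighbor is
# also missing, the entry is left as NaN for later handling.
grouped = df.groupby(["commodity", "city"])["price"]
lag, lead = grouped.shift(1), grouped.shift(-1)
df["price"] = df["price"].fillna((lag + lead) / 2)
```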

Dataset
The foundation of the proposed model is an extensive dataset containing a detailed history of various commodity prices within the State of Palestine. The origin of this dataset is the World Food Program Price Database [41], an extensive global source of commodity price information spanning 98 countries and approximately 3,000 distinct markets. The database tracks prices for many commodities, including but not limited to bread, meat, milk, oil, and petrol. Although parts of the database are refreshed weekly, monthly data are more widely available. The dataset considered in this research, referred to as the 'Palestine food price dataset', contains a rich history of commodity prices in Palestine dating back to 2007 and spans approximately 28,000 entries. The volume of data in this dataset introduces an opportunity to derive significant insights for food security studies within the region. The Palestine food price dataset consists of 14 attributes, which include the date (at a monthly frequency), district (West Bank and Gaza), city, geographical location (latitude and longitude), commodity category, commodity item, unit, and price, among others. Price samples were collected from twelve distinct cities in the West Bank and Gaza. The commodities are categorized into eight groups, with a total of 39 distinct items. The dataset was cleaned and preprocessed prior to analysis; this process involved dealing with missing data, transforming attribute types, and normalizing certain variables.

Feature engineering
Time-series datasets distinguish themselves from other types of datasets mainly through their temporal component (time), which introduces an extra dimension to the analysis [42]. Unlike typical datasets, where data points might be independent, time-series data is sequential, and each point often has a relationship with its preceding and sometimes succeeding points. The temporal nature of such datasets makes them particularly challenging to analyze and model, especially when the goal is to forecast future values. Additionally, owing to inherent characteristics of time-series data such as autocorrelation, the risk of misleading interpretations increases if the analysis is not handled properly. To address these challenges and effectively capture underlying patterns, it is essential to apply certain transformations. These transformations not only assist in identifying trends and seasonality but also reduce the impact of noise and outliers. By enriching each data point with information from prior periods, models can be trained to make more accurate and informed predictions, ultimately leading to more insightful and actionable results [43].

Moving average
A moving average, also known as a rolling or running average, is a technique often used in time-series data analysis to smooth out short-term fluctuations and highlight long-term trends or cycles [44]. The idea is to calculate the average of a particular subset of values and, as new data arrives, recalculate that average by moving the subset window forward, as described in (1).

Simple moving average
$$\text{SMA} = \frac{1}{N}\sum_{i=1}^{N} P_i \tag{1}$$

where $P_i$ is the price of the $i$-th period and $N$ is the total number of time periods. On the other hand, the exponential moving average (EMA) is a type of moving average that places more weight and importance on the most recent data points while still considering historical data [45]. This makes the EMA more responsive to recent price changes than the simple moving average (SMA), which gives equal weight to all data points, as shown in (2).

$$\text{EMA} = C \cdot \frac{2}{N+1} + P \cdot \left(1 - \frac{2}{N+1}\right) \tag{2}$$

where $C$ is the current data point price, $P$ is the exponential moving average of the previous period, and $N$ is the total number of time periods; the simple moving average is applied for the first period.
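A minimal sketch of these transformations with pandas is shown below, using the four SMA window sizes discussed next; the function name and the 12-month EMA span are illustrative assumptions, not taken from the paper.

```python
import pandas as pd

def add_moving_averages(prices: pd.Series) -> pd.DataFrame:
    """prices: a monthly price series for a single commodity."""
    feats = pd.DataFrame({"price": prices})
    for n in (6, 12, 18, 24):                                  # window sizes used in this study
        feats[f"sma_{n}"] = prices.rolling(window=n).mean()    # equation (1)
    # EMA with span N weights the newest point by 2/(N+1), matching (2);
    # note pandas seeds the recursion with the first observation, whereas
    # the paper seeds it with a simple moving average for the first period.
    feats["ema_12"] = prices.ewm(span=12, adjust=False).mean()
    return feats
```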
In this study, we found that the simple moving average performed better when applied over four different time periods: 6, 12, 18, and 24 months, as illustrated in Figure 2. Its superior performance over the four distinct time periods can be attributed to several reasons. First, commodity prices often follow cyclical patterns and trends, and the simple moving average, being a trend-following method, effectively smooths out short-term fluctuations and captures the underlying price movement. Second, using multiple time periods captures the varying dynamics of the market, from short-term to more extended cycles, providing more comprehensive insight.

Price volatility
Price volatility refers to the rate at which the price of an asset, such as a stock or commodity, increases or decreases. It is a statistical measure of the dispersion of price changes for a given market index. High volatility indicates that a commodity's price can change significantly in a short time frame in either direction, whereas low volatility implies that the price remains steady [46].
Volatility is often calculated using the standard deviation or variance of returns from the same market index. The most common method is the standard deviation, typically calculated as illustrated in (3).

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2} \tag{3}$$

where $n$ is the number of price returns used in the calculation, $x_i$ represents each individual price return, and $\mu$ is the average (mean) of the price returns. The price volatility of each data point was calculated over five different time frames: three months (quarterly), six months (semi-annually), one year, two years, and five years. This analysis aids the model in predicting future trends: when there is significant volatility during a certain period, the model is more likely to forecast a greater probability of significant price fluctuations, as illustrated in Figure 3. Figure 3 shows the volatility of a price over a 3-month window as a line graph, with time on the x-axis and the calculated volatility of the price at each data point on the y-axis. Volatility measures the level of price fluctuation; higher volatility indicates greater price variability. The line illustrates how volatility changes over time, highlighting periods of higher and lower price instability. In price forecasting with LSTM models, these volatility features play a crucial role: incorporating volatility as a feature helps capture market dynamics and improves forecasting accuracy. High-volatility periods can indicate potential market disruptions or significant price movements, which LSTM models can leverage to generate more accurate predictions.
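A rolling-window sketch of this volatility feature in pandas follows, with the five window sizes named above; the use of month-over-month percentage returns is an assumption, since the paper does not state how returns are computed.

```python
import pandas as pd

def add_volatility(prices: pd.Series) -> pd.DataFrame:
    """prices: a monthly price series for a single commodity."""
    feats = pd.DataFrame({"price": prices})
    returns = prices.pct_change()          # month-over-month price returns (assumed)
    # Windows in months: quarterly, semi-annual, one, two, and five years.
    for n in (3, 6, 12, 24, 60):
        # Rolling standard deviation of returns, as in (3); note pandas
        # uses the sample standard deviation (ddof=1) by default.
        feats[f"vol_{n}m"] = returns.rolling(window=n).std()
    return feats
```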

Long short-term memory (LSTM): An overview
Long short-term memory (LSTM) is a variation of the recurrent neural network (RNN) architecture, designed to model temporal sequences and their long-range dependencies more accurately than vanilla RNNs. It was proposed by Hochreiter and Schmidhuber [32] in 1997. The key to the LSTM is the cell state, a kind of "conveyor belt" that carries information across time steps with only minimal changes, which helps mitigate the vanishing gradient problem faced by traditional RNNs [47]. In LSTMs, the flow of information is controlled by various gates, as illustrated in Figure 4. These gates decide what information should be kept or discarded at each time step.
− Forget gate: decides which information should be kept and which discarded; the input is processed through a sigmoid function.
− Input gate: produces the new cell state using a sigmoid function, which decides which parts to update, and a tanh function, which creates a new candidate vector.
− Output gate: produces the next hidden state, which embeds information about the processed input; this hidden state is crucial for prediction.
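To make the gate mechanics concrete, here is a minimal, illustrative PyTorch cell that spells out the three gates explicitly; it mirrors the standard LSTM recurrence rather than the paper's production code (which uses PyTorch's built-in nn.LSTM).

```python
import torch
import torch.nn as nn

class MinimalLSTMCell(nn.Module):
    """Illustrative single LSTM cell with the three gates written out."""
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.forget_gate = nn.Linear(input_size + hidden_size, hidden_size)
        self.input_gate = nn.Linear(input_size + hidden_size, hidden_size)
        self.candidate = nn.Linear(input_size + hidden_size, hidden_size)
        self.output_gate = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x, h_prev, c_prev):
        z = torch.cat([x, h_prev], dim=-1)
        f = torch.sigmoid(self.forget_gate(z))  # what to discard from the cell state
        i = torch.sigmoid(self.input_gate(z))   # which entries to update
        g = torch.tanh(self.candidate(z))       # new candidate values
        o = torch.sigmoid(self.output_gate(z))  # what to expose as the hidden state
        c = f * c_prev + i * g                  # updated cell state (the "conveyor belt")
        h = o * torch.tanh(c)                   # next hidden state, used for prediction
        return h, c
```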

Model architecture and training loop
The primary aim of this study is to utilize sophisticated feature engineering methods to build a straightforward yet efficient deep learning model. The LSTM model architecture consists of an input layer of 77 features and 4 hidden layers, followed by a fully connected neural network that takes the 4 outputs of the final hidden layer. Lastly, the rectified linear unit (ReLU) function is applied to prevent the model from predicting negative values. This structure excels due to the advanced transformations applied to the data prior to training. Training was carried out on a local Mac laptop equipped with 16 GB of memory and an 8-core Intel processor. Remarkably, the training time for the model on the specified dataset was a few minutes, a significant reduction compared to the hours of computation other researchers report [33], [34]. The LSTM model was implemented using the PyTorch deep learning library. Three categorical variables (city, category, commodity) were encoded using one-hot encoding, and the numerical variables were normalized with MinMaxScaler to ensure they share the same range. Lastly, the dataset was divided into three subsets, training, evaluation, and testing, with ratios of 70%, 15%, and 15% respectively, as illustrated in Figure 5. The model was trained to minimize the mean squared error (MSE) loss in (4).

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(Y_i - F_i)^2 \tag{4}$$

where $n$ is the number of data points, $Y_i$ is the observed price, and $F_i$ is the forecasted price.
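A minimal sketch of such an architecture in PyTorch is given below. The layer sizes (77 input features, 4 stacked LSTM layers, 4 units feeding the fully connected head) are read from the prose above and should be treated as assumptions; the exact arrangement in the original implementation may differ.

```python
import torch
import torch.nn as nn

class CommodityLSTM(nn.Module):
    """Sketch of the described model; sizes are assumptions from the text."""
    def __init__(self, n_features: int = 77, hidden_size: int = 4,
                 num_layers: int = 4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)  # combines the 4 outputs of the final hidden layer

    def forward(self, x):                    # x: (batch, seq_len, n_features)
        out, _ = self.lstm(x)
        last = out[:, -1, :]                 # hidden state at the final time step
        return torch.relu(self.fc(last))     # ReLU prevents negative price predictions
```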
To optimize the training process, the widely used Adam optimizer [48] was employed, leveraging its adaptive learning rate capabilities. The training loop was executed for a substantial number of iterations, precisely 1,000 epochs, ensuring that the model had ample opportunity to learn from the data. To prevent overfitting, two effective early-stopping techniques were employed [49]. These techniques acted as gatekeepers during training, monitoring the model's performance closely: training halts either when the improvement in the training loss falls below 1/1000 of the learning rate, or when the validation loss increases for two successive epochs. By incorporating these strategies, described in Table 1, the training process became more robust and resilient to overfitting, ultimately leading to a more accurate model.
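The sketch below shows one way to implement this loop, assuming the MSE objective in (4) and hypothetical train_loader/val_loader DataLoaders; the two stopping rules follow the description above, while details such as checkpointing the best validation state are our own additions.

```python
import copy
import torch

def train_model(model, train_loader, val_loader, lr=1e-3, max_epochs=1000):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    prev_train, prev_val, best_val = float("inf"), float("inf"), float("inf")
    bad_epochs = 0
    best_state = copy.deepcopy(model.state_dict())

    for epoch in range(max_epochs):
        model.train()
        running = 0.0
        for xb, yb in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            optimizer.step()
            running += loss.item() * len(xb)
        train_loss = running / len(train_loader.dataset)

        model.eval()
        with torch.no_grad():
            val_loss = sum(loss_fn(model(xb), yb).item() * len(xb)
                           for xb, yb in val_loader) / len(val_loader.dataset)

        if val_loss < best_val:                       # checkpoint the best model so far
            best_val = val_loss
            best_state = copy.deepcopy(model.state_dict())
        bad_epochs = bad_epochs + 1 if val_loss > prev_val else 0

        # Stopping rule 1: training-loss improvement below lr / 1000.
        if prev_train - train_loss < lr / 1000:
            break
        # Stopping rule 2: validation loss rose for two successive epochs.
        if bad_epochs >= 2:
            break
        prev_train, prev_val = train_loss, val_loss

    model.load_state_dict(best_state)
    return model
```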

Model evaluation
The performance of the proposed model was assessed using the three evaluation metrics described in (5) to (7): root mean squared error (RMSE), mean absolute percentage error (MAPE), and the coefficient of determination (R²). RMSE measures the square root of the average of the squared differences between predicted and actual values, providing a measure of the model's accuracy. MAPE calculates the average percentage difference between predicted and actual values, offering insight into the model's relative performance. Lastly, R² indicates the proportion of the variance in the dependent variable that is predictable from the independent variables, reflecting the model's ability to explain the data's variability.

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(A_i - F_i)^2} \tag{5}$$

$$\text{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{A_i - F_i}{A_i}\right| \tag{6}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(A_i - F_i)^2}{\sum_{i=1}^{n}(A_i - \mu_y)^2} \tag{7}$$

In (5) to (7), $A_i$ and $F_i$ are the actual and forecasted prices, respectively, $\mu_y$ is the mean price of all data points, and $n$ is the number of data points.
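These three metrics translate directly into a few lines of NumPy; a minimal sketch, with function names of our choosing:

```python
import numpy as np

def rmse(actual, forecast):
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return np.sqrt(np.mean((a - f) ** 2))            # equation (5)

def mape(actual, forecast):
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 * np.mean(np.abs((a - f) / a))      # equation (6), in percent

def r2(actual, forecast):
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    ss_res = np.sum((a - f) ** 2)                    # residual sum of squares
    ss_tot = np.sum((a - a.mean()) ** 2)             # total sum of squares
    return 1.0 - ss_res / ss_tot                     # equation (7)
```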

RESULTS AND DISCUSSION
The utilized dataset consists of eight categories containing thirty-nine distinct commodities. For training the model, five commodities were selected, yielding varied but closely related outcomes; the differences can be attributed to the availability of pricing data for each commodity. Among the predictions, the estimation of bread prices showed exceptional performance, as shown in Figure 6. With an RMSE of merely 0.14, the average difference between forecasted and actual prices was exceptionally low, indicating the model's precision in capturing price fluctuations. Additionally, the MAPE of 3.04% indicates that, on average, the model's predictions were within a very small percentage of the actual prices, further substantiating its reliability. Moreover, the high R² value of 98.2% denotes an impressive fit to the observed data, indicating that a significant proportion of the variability in bread prices was accurately accounted for by the LSTM model. For the other commodities (meat, milk, oil, and petrol), Table 2 summarizes the evaluations.

The findings in this paper show a notable advancement of the proposed model over earlier published models. Several previously published models were compared to ours to offer a thorough assessment of the efficiency and accuracy of the proposed model. The advantages of the proposed model become clear when considering its superior scores across all evaluation metrics. Furthermore, while many past models have shown strength in one or two metrics but weakness in others, our model maintains consistently high performance across RMSE, MAPE, and R². This consistency is indicative of the robustness of our approach, setting a new standard in the field. Table 3 presents a comparison of different predictive models in the literature, evaluating their performance based on the RMSE, MAPE, and R² metrics. The models in Table 3 showcase various methods, including hybrid approaches, recurrent neural networks (RNNs), and generative models. The first model, a hybrid of ARIMA and LSTM, achieves an RMSE of 0.15 and a MAPE of 4.3%; however, its R² value is missing, making a comprehensive evaluation challenging. The second model, a stacked LSTM, performs similarly in terms of RMSE (0.15) but significantly better in MAPE (0.079%) and R² (96.8%), highlighting its superiority over the hybrid model. The third model, based on a gated recurrent unit (GRU), presents an RMSE of 0.7 without reporting MAPE or R² values, which makes its overall performance difficult to assess. The fourth model uses sentiment analysis with a generative adversarial network (S-GAN) and achieves an RMSE of 0.56, but again, the lack of MAPE and R² values limits its evaluation. The fifth model, a hybrid of a time-delay neural network (TDNN) and ARIMA, performs poorly with an RMSE of 3.35, with MAPE and R² values not provided for further assessment. Our LSTM model delivers outstanding results with an RMSE of 0.14, a MAPE of 3.04%, and an impressive R² value of 98.2%. These findings demonstrate the LSTM model's superior predictive accuracy and its ability to capture underlying patterns effectively. In conclusion, the comparison of various predictive models highlights the superiority of the stacked LSTM and the LSTM model presented in this paper. Both models exhibit remarkable performance with low RMSE, low MAPE, and high R² values. Notably, the LSTM model presented in this paper stands out
due to its simple design and effective feature engineering techniques. Despite its simplicity, our model demonstrates outstanding predictive accuracy, surpassing even the stacked LSTM in some metrics. The emphasis on feature engineering has likely contributed to the model's ability to capture essential patterns in the data, leading to superior predictions. Moreover, one of the key advantages of the LSTM model presented in this paper is its ability to achieve such high performance with relatively low computational power and training time, which is crucial in practical applications where computational resources and time are at a premium. The model's efficiency in training and inference makes it an attractive choice for real-time predictions and large-scale deployments.

CONCLUSION
The study presented in this paper focuses on predicting essential commodity prices using an LSTM model enhanced by feature engineering. Our results indicate that this approach yields competitive performance compared to other models found in the literature. The proposed model demonstrated superior performance, achieving an RMSE of 0.14, a MAPE of 3.04%, and an R² of 98.2%. The simplicity and computational efficiency of the proposed model make it a promising approach for commodity price forecasting, especially in scenarios where computational resources are limited. Future work could explore several directions. For instance, future models could incorporate external factors into the price prediction model. This could be achieved by using natural language processing techniques to analyze news headlines and social media sentiment related to the commodities under consideration, or by integrating economic indicators and climate data. Further research could also explore the use of other deep learning architectures for commodity price prediction. Hybrid models combining the strengths of different architectures could be particularly promising.

Figure 2. Simple moving average of five window sizes

Figure 3. Price volatility over a 3-month window size

Figure 5. The dataset split into three subsets: training, evaluation, and testing

Figure 6. The actual and predicted price of bread

Table 2. Model evaluation on five commodities

Table 3. Comparison between results in the literature and the proposed model