Industrial AI Blog from China

A sales prediction method based on LSTM with hyper-parameter search

18 May 2021

Dai Yun
Hitachi (China) Research & Development Corporation

In business operations, accurate sales forecasting is critical for planning production and supply, and it affects merchants' replenishment, promotions, and so on. Existing sales forecasting methods fall into three categories: statistical methods,[1] traditional machine learning,[2] and deep learning.[3]

In the real world, we face three challenges in accurate prediction: sparse data, user preference, and the need for a single model with good performance. Data sparsity occurs when a given product accounts for only a small part of each day's sales records. Furthermore, many products record sales only in certain periods and none in others. User preference refers to the fact that a business may want forecasts that lean in a particular direction depending on its strategy. For example, if the predicted value is too small, the shop may not restock in time, resulting in customer dissatisfaction and lost sales. If the predicted value is too large, logistical costs for transportation and storage increase. If the cost of storage is less than the cost of being out of stock, appropriately increasing the predicted value benefits the whole business operation.
The third challenge concerns maintainability. An ensemble that combines multiple machine learning models is effective in real forecasting systems, but it is not easy to maintain, so a single model with good performance is more valuable.
However, few experts and scholars consider these constraints in sales forecasting, and there is not enough work on a single practical and effective sales forecasting model for the real world.

Our proposal

We designed an LSTM (long short-term memory) model to overcome these challenges in real-world sales forecasting. LSTM is an extension of the RNN (recurrent neural network) that can learn long-term dependencies in a sequence. The loss function commonly used with LSTM assigns equal weight to each prediction and so cannot incorporate user preferences. To accommodate business strategies by appropriately underestimating or overestimating predicted values, we designed a new loss function that can be adjusted by a weight w: the greater w is, the more the predicted result is overestimated, although it still follows the pattern of the ordinary predicted value. Further, as sales of merchandise cannot be less than 0, we use the rectified linear activation function (ReLU), defined as ReLU(x) = max(x, 0). The model structure and loss function are shown in Figure 1 below.

Figure 1. Model structure and loss function
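
To illustrate how a single weight can tilt predictions up or down, here is a minimal sketch in plain Python. It is not the loss function from Figure 1, whose exact form is not reproduced in this post. The function name weighted_loss, the scale of 10, and the rule "under-prediction is weighted by w, over-prediction by scale - w" are all assumptions, chosen only so that w=5 is neutral and larger w pushes the forecast upward.

```python
def relu(x):
    """Rectified linear unit: clamps negative sales predictions to zero."""
    return max(x, 0.0)


def weighted_loss(actual, predicted, w, scale=10.0):
    """Hypothetical asymmetric squared error (an assumption, not the paper's loss).

    Under-predictions are weighted by w and over-predictions by scale - w,
    so w = scale / 2 treats both directions equally.
    """
    total = 0.0
    for y, y_hat in zip(actual, predicted):
        error = y - relu(y_hat)           # positive error means under-prediction
        weight = w if error > 0 else scale - w
        total += weight * error ** 2
    return total / len(actual)


def best_constant_forecast(actual, w):
    """Integer constant forecast that minimizes weighted_loss (grid search)."""
    candidates = range(0, int(max(actual)) + 1)
    return min(candidates,
               key=lambda c: weighted_loss(actual, [c] * len(actual), w))


# Made-up weekly sales, used only to show the direction of the effect.
sales = [80, 100, 120]
forecasts = {w: best_constant_forecast(sales, w) for w in (3, 5, 7)}
print(forecasts)
```

With this toy rule, the forecast that minimizes the loss moves below the mean for w=3, sits at the mean for w=5, and moves above it for w=7, which mirrors the qualitative behavior described above.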

Also, while hyper-parameter search is an effective way to tune the parameters of a deep learning model such as LSTM automatically,[4] running it on all of the data would take a prohibitive amount of time. We therefore decided to approach the problem with a greedy strategy: sample a representative training dataset, run a large-scale search on this small dataset, and then use the full data for fine-tuning. This helps to balance accuracy against time cost.
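
The two-stage idea can be sketched as follows. To keep the example runnable without a deep learning library, a toy exponential-smoothing forecaster stands in for the LSTM, and its smoothing factor alpha plays the role of the hyper-parameter. The names two_stage_search and mean_error, the search range, and the narrow refinement used as a stand-in for fine-tuning are all assumptions made for illustration.

```python
import random


def smoothing_error(series, alpha):
    """Mean squared one-step-ahead error of simple exponential smoothing."""
    level = series[0]
    total = 0.0
    for value in series[1:]:
        total += (value - level) ** 2
        level = alpha * value + (1 - alpha) * level
    return total / (len(series) - 1)


def mean_error(dataset, alpha):
    """Average error over all series in the dataset."""
    return sum(smoothing_error(s, alpha) for s in dataset) / len(dataset)


def two_stage_search(dataset, sample_size, n_trials, seed=0):
    rng = random.Random(seed)
    # Stage 1: broad random search, evaluated only on a small sample.
    sample = rng.sample(dataset, sample_size)
    candidates = [rng.uniform(0.01, 0.99) for _ in range(n_trials)]
    coarse = min(candidates, key=lambda a: mean_error(sample, a))
    # Stage 2: narrow refinement around the stage-1 winner on the full data
    # (a stand-in for fine-tuning; the offsets are arbitrary).
    nearby = [min(0.99, max(0.01, coarse + d))
              for d in (-0.05, -0.02, 0.0, 0.02, 0.05)]
    tuned = min(nearby, key=lambda a: mean_error(dataset, a))
    return coarse, tuned


# Synthetic data: 50 noisy series of 30 points each (made up for the demo).
data_rng = random.Random(1)
dataset = [[data_rng.gauss(100, 10) for _ in range(30)] for _ in range(50)]
coarse_alpha, tuned_alpha = two_stage_search(dataset, sample_size=10, n_trials=20)
print(coarse_alpha, tuned_alpha)
```

The expensive part, evaluating many candidates, touches only the sample, while the full dataset is used for a handful of evaluations at the end.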

Results from our experiments

We used automated machine learning (AutoML) models and an ordinary LSTM model as baselines, and conducted experiments on the Rossmann Store dataset provided by Kaggle, which contains sales and store information for 1,115 chain stores from 2013 to 2015. A sparse dataset was extracted to simulate the real-world challenge. The results of the weekly prediction are shown in Table 1. Our conclusions were as follows:

(1): Among the AutoML models, KNN and random forest gave the worst predictions. CatBoost, XGBoost, and LightGBM performed significantly better, but the stacking ensemble of all the models did not yield better results.
(2): Compared with the AutoML models, the LSTM model showed better prediction performance.
(3): Compared with the ordinary LSTM model, adding a hyper-parameter search significantly improved the prediction results for the first, third, and fourth weeks. The lowest (best) root mean square error (RMSE) and root mean squared percentage error (RMSPE) are seen in the first-week prediction. For our design (LSTM + hyper-parameter search), the mean absolute percentage error (MAPE) and RMSPE improved by 79% and 75%, respectively, compared with the LSTM model alone.
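
For readers unfamiliar with the error metrics named above, here is a small sketch of their standard definitions in plain Python. Skipping records whose actual sales are zero in the percentage metrics is an assumption made here to avoid division by zero, and the sample numbers are made up rather than taken from Table 1.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def _nonzero_pairs(actual, predicted):
    """Drop records with zero actual sales (assumption: avoids dividing by 0)."""
    return [(a, p) for a, p in zip(actual, predicted) if a != 0]


def rmspe(actual, predicted):
    """Root mean squared percentage error, as a fraction (0.1 means 10%)."""
    pairs = _nonzero_pairs(actual, predicted)
    return math.sqrt(sum(((a - p) / a) ** 2 for a, p in pairs) / len(pairs))


def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction."""
    pairs = _nonzero_pairs(actual, predicted)
    return sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def relative_improvement(baseline_error, new_error):
    """Share of the baseline error that the new model removes."""
    return (baseline_error - new_error) / baseline_error


# Made-up values for illustration only.
actual = [100, 200]
predicted = [110, 180]
print(rmse(actual, predicted), rmspe(actual, predicted), mape(actual, predicted))
```

Under this reading, an improvement of 75% means the new error is one quarter of the baseline error, for example relative_improvement(0.4, 0.1).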

Table 1. Results of weekly predictions using different models

To verify the influence of the loss function weight on the predicted values, we extracted the predicted values for four stores in the same week. Table 2 shows the results. We found that the predicted value increased as w increased. When w=5, the result is close to the true value, which corresponds to a prediction without preference. When w=3 and w=7, the model gave underestimated and overestimated predictions, respectively.

Table 2. Influence of different weights on the prediction results

To verify the prediction ability of our proposed model at different time granularities, we conducted a monthly sales volume prediction. The results are shown in Table 3.

We found that the error is smaller in the weekly forecast, where the nearer the forecast period, the more accurate the forecast; this trend is not obvious in the monthly forecast. Overall, the model achieved good results at different time granularities.

Table 3. Prediction results for different periods using LSTM + hyper-parameter search

Conclusion

We designed and proposed a new LSTM model with a special loss function and hyper-parameter search which can be applied to overcome challenges in real-world sales forecasts. The new model was compared with traditional methods, and the results indicated that the proposed model could provide more reliable predictions.

If you would like to learn more, please read the full paper, which I co-authored with Jinghao Huang at South China Normal University. The paper, entitled "A Sales Prediction Method Based on LSTM with Hyper-Parameter Search," was presented at the 2020 International Conference on Industrial Applications of Big Data and Artificial Intelligence (BDAI2020), held on 26-29 November 2020 in Shenzhen, China.

Acknowledgements

Thanks to my co-author Jinghao Huang, and my colleagues, Teruo Nakata, Bin Zuo and Suhong Lai, for discussion on research direction as well as refinement of this blog.

References

[1]: P.C. Chang and Y.W. Wang. Fuzzy Delphi and back-propagation model for sales forecasting in PCB industry. Expert Systems with Applications, vol. 30, issue 4, pp. 715-726, May 2006.
[2]: S. Ji, X. Wang, W. Zhao and D. Guo. An Application of a Three-Stage XGBoost-Based Model to Sales Forecasting of a Cross-Border E-Commerce Enterprise. Mathematical Problems in Engineering, vol. 2019, Article ID 8503252.
[3]: H. Hewamalage, C. Bergmeir and K. Bandara. Recurrent neural networks for time series forecasting: Current status and future directions. arXiv preprint arXiv:1909.00590, 2019. (Also in International Journal of Forecasting, vol. 37, issue 1, pp. 388-427, 2021.)
[4]: J. Bergstra and Y. Bengio. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(10): 281-305, 2012.
