Data augmentation for stock return prediction

Tanapong Potipiti, Win Supanwanid

Abstract


Over the last decade, machine learning performance has advanced in many domains, including image classification, natural language processing, and speech recognition. Larger training sets have been essential to these improvements. There are two ways to obtain a larger training set: acquiring more original data and applying effective data augmentation techniques. In stock prediction studies, however, dataset sizes have changed little and there is no accepted data augmentation technique. Consequently, stock prediction has not seen similar progress. This paper proposes an intuitive and effective data augmentation technique for stock return prediction. New synthetic stocks are generated as linear combinations of the original stocks. Unlike the augmentations used in previous studies, ours mimics the process by which actual financial assets are created. Our data augmentation significantly improves prediction accuracy. Moreover, we investigate how the characteristics of the original data affect the performance of the augmentation. We find a U-shaped relationship between the accuracy gain from augmentation and the return correlation in the original data.
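
The idea of building synthetic stocks from linear combinations of original stocks can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the weights are chosen, so the non-negative weights summing to one, the number of stocks per combination, and the names `make_synthetic_returns`, `n_synthetic`, and `n_components` are all assumptions made for illustration. With such weights, each synthetic series is the return of a portfolio rebalanced to fixed weights every period.

```python
import random


def make_synthetic_returns(returns, n_synthetic, n_components=2, seed=0):
    """Generate synthetic return series as weighted averages of original ones.

    returns: dict mapping a ticker to a list of per-period simple returns;
             all lists are assumed to have the same length.
    Returns a list of (weights, series) pairs, where weights maps each
    chosen ticker to its weight in the combination.

    Assumption (not from the paper): weights are random, non-negative,
    and normalised to sum to one.
    """
    rng = random.Random(seed)
    tickers = sorted(returns)
    n_periods = len(returns[tickers[0]])

    synthetic = []
    for _ in range(n_synthetic):
        # Pick distinct original stocks to combine.
        chosen = rng.sample(tickers, n_components)

        # Draw positive raw weights and normalise them to sum to one.
        raw = [rng.random() + 1e-9 for _ in chosen]
        total = sum(raw)
        weights = [w / total for w in raw]

        # Per-period return of the combination.
        series = [
            sum(w * returns[t][i] for w, t in zip(weights, chosen))
            for i in range(n_periods)
        ]
        synthetic.append((dict(zip(chosen, weights)), series))
    return synthetic


if __name__ == "__main__":
    # Made-up returns for three hypothetical stocks, for illustration only.
    original = {
        "A": [0.01, -0.02, 0.03],
        "B": [0.00, 0.01, -0.01],
        "C": [0.02, 0.02, 0.02],
    }
    for weights, series in make_synthetic_returns(original, n_synthetic=2):
        print(weights, series)
```

The synthetic series would be appended to the original ones before training. Because each is a weighted average, its return in any period lies between the lowest and highest return of the stocks it combines.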

Keywords


Data augmentation; Forecasting; Machine learning; Prediction; Stocks



DOI: http://doi.org/10.11591/ijai.v11.i4.pp1563-1569


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.