Data augmentation for stock return prediction

Tanapong Potipiti, Win Supanwanid


In the last decade, machine learning performance has advanced in various domains, including image classification, natural language processing, and speech recognition. Larger training sets have been essential to these improvements. There are two ways to obtain a larger training set: acquiring more original data and employing effective data augmentation techniques. In stock prediction studies, however, dataset sizes have not changed much and there is no accepted data augmentation technique. Consequently, stock prediction has not seen similar progress. This paper proposes an intuitive and effective data augmentation technique for stock return prediction. New synthetic stocks are generated from linear combinations of original stocks. Unlike previous studies, our augmentation mimics actual financial asset creation processes. Our data augmentation significantly improves prediction accuracy. Moreover, we investigate how the characteristics of the original data affect augmentation performance. We find a U-shaped relationship between the accuracy improvement from augmentation and the return correlation in the original data.
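
The idea of generating synthetic stocks from linear combinations of original stocks can be illustrated with a minimal sketch. It is not the authors' implementation: the function name make_synthetic_returns, the choice of k = 2 constituents per synthetic stock, the random convex weights, the "SYN" naming scheme, and the toy return data are all assumptions made here for illustration. The sketch also assumes simple per-period returns with fixed weights, so that each synthetic return is the weighted sum of its constituents' returns.

```python
import random


def make_synthetic_returns(returns, n_synthetic, k=2, seed=0):
    """Build synthetic return series as convex combinations of original ones.

    returns: dict mapping a stock name to a list of per-period simple returns
             (all lists must have the same length).
    k:       number of original stocks mixed into each synthetic stock
             (hypothetical choice, not taken from the paper).
    """
    rng = random.Random(seed)
    names = sorted(returns)
    n_periods = len(returns[names[0]])
    synthetic, weights = {}, {}
    for j in range(n_synthetic):
        chosen = rng.sample(names, k)
        # Normalised Gamma(1, 1) draws give positive weights that sum to 1.
        raw = [rng.gammavariate(1.0, 1.0) for _ in chosen]
        total = sum(raw)
        w = {name: r / total for name, r in zip(chosen, raw)}
        key = f"SYN{j}"  # hypothetical label for the synthetic stock
        weights[key] = w
        synthetic[key] = [
            sum(w[name] * returns[name][t] for name in chosen)
            for t in range(n_periods)
        ]
    return synthetic, weights


# Toy data (made up): 5 stocks, 250 periods of random returns.
toy_rng = random.Random(42)
original = {
    name: [toy_rng.gauss(0.0, 0.01) for _ in range(250)]
    for name in ["A", "B", "C", "D", "E"]
}
synthetic, weights = make_synthetic_returns(original, n_synthetic=10, k=2)

# The augmented training set holds the original and the synthetic series.
augmented = {**original, **synthetic}
```

In this sketch each synthetic series behaves like a small portfolio of the chosen stocks. Whether weights should be convex, how many constituents to mix, and how the prediction targets are constructed for the synthetic stocks are design choices that the abstract does not specify.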


Data augmentation; Forecasting; Machine learning; Prediction; Stocks


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938 
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
