This research paper presents a new benchmark dataset for mid-price prediction in high-frequency financial markets. The dataset, which is publicly available, consists of normalized representations of high-frequency data for five stocks from the NASDAQ Nordic stock market.
The paper also outlines an experimental protocol for evaluating related research methods. The authors report baseline results from linear and nonlinear regression models to demonstrate the potential of these methods for mid-price prediction.
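To make the prediction task concrete, the following is a minimal sketch of a linear baseline for mid-price prediction. It is not the authors' protocol or code. The mid-price is taken as the average of the best ask and best bid, ridge regression stands in as one example of a linear regression model, and the quotes, the feature choice, the chronological split, and the names `mid_price` and `fit_ridge` are all assumptions made for illustration.

```python
import numpy as np

def mid_price(best_ask, best_bid):
    """Mid-price: average of the best ask and best bid prices."""
    return (np.asarray(best_ask, dtype=float) + np.asarray(best_bid, dtype=float)) / 2.0

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X^T X + lam * I)^{-1} X^T y.

    Note: for simplicity the penalty is applied to every weight,
    including the bias column if one is present.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

# Toy quotes (made-up numbers, NOT taken from the dataset).
ask = np.array([10.02, 10.03, 10.03, 10.05, 10.04, 10.06])
bid = np.array([10.00, 10.01, 10.02, 10.03, 10.03, 10.04])
mid = mid_price(ask, bid)

# Hypothetical features: current mid-price, current spread, and a bias term.
# Target: the mid-price at the next event.
X = np.column_stack([mid[:-1], (ask - bid)[:-1], np.ones(len(mid) - 1)])
y = mid[1:]

# Chronological split: fit on earlier events, evaluate on later ones,
# so that no future information leaks into training.
split = 3
w = fit_ridge(X[:split], y[:split], lam=1e-3)
pred = X[split:] @ w
rmse = float(np.sqrt(np.mean((pred - y[split:]) ** 2)))
```

The chronological split matters more than the model here: shuffling events before splitting would let the model see the future, which is why time-ordered evaluation is the usual choice for this kind of data.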
The dataset is built from ITCH flow data and covers ten full trading days (June 1 to 14, 2010) of ultra-high-frequency intra-day data for five stocks traded on the Helsinki exchange. The underlying messages include order submissions, trades, cancellations, administrative messages, event controls, and a net order imbalance indicator. The data is stored on a Linux cluster.
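As a rough illustration of how a stream of submissions, trades, and cancellations turns into a limit order book, the sketch below replays a handful of messages and reads off the best quotes. The tuple layout (`"add"`, `"trade"`, `"cancel"`), the integer tick prices, and the function names `replay` and `best_quotes` are hypothetical; this is not the ITCH wire format and not the paper's extraction pipeline.

```python
from collections import defaultdict

def replay(messages):
    """Replay add/cancel/trade messages and return aggregated depth per side."""
    orders = {}  # order_id -> [side, price, remaining quantity]
    for msg in messages:
        kind, order_id = msg[0], msg[1]
        if kind == "add":
            _, _, side, price, qty = msg
            orders[order_id] = [side, price, qty]
        elif kind == "cancel":
            # Remove the resting order; ignore unknown ids.
            orders.pop(order_id, None)
        elif kind == "trade":
            # A trade reduces the resting order's remaining quantity.
            qty = msg[2]
            order = orders.get(order_id)
            if order is not None:
                order[2] -= qty
                if order[2] <= 0:
                    del orders[order_id]
    depth = {"bid": defaultdict(int), "ask": defaultdict(int)}
    for side, price, qty in orders.values():
        depth[side][price] += qty
    return depth

def best_quotes(depth):
    """Best bid is the highest bid price; best ask is the lowest ask price."""
    best_bid = max(depth["bid"]) if depth["bid"] else None
    best_ask = min(depth["ask"]) if depth["ask"] else None
    return best_bid, best_ask

# Made-up messages with prices in integer ticks (illustrative only).
messages = [
    ("add", 1, "bid", 1000, 50),
    ("add", 2, "bid", 1001, 20),
    ("add", 3, "ask", 1003, 30),
    ("add", 4, "ask", 1004, 10),
    ("trade", 3, 30),   # fully fills order 3
    ("cancel", 2),      # removes order 2
]
depth = replay(messages)
best_bid, best_ask = best_quotes(depth)
```

After the replay, only orders 1 and 4 remain, so the best bid and ask sit at 1000 and 1004 ticks and the mid-price would be 1002.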
The complexity of high-frequency trading and limit order books makes them well suited to interdisciplinary research. The paper also provides a comprehensive review of recent machine learning methods, such as regression models and neural networks, that have been proposed for stock market analysis and prediction.
January 18, 2023