Universal features of price formation in financial markets: perspectives from deep learning
JUSTIN SIRIGNANO and RAMA CONT
(Received 21 July 2018; accepted 6 May 2019; published online 9 July 2019)
Using a large-scale Deep Learning approach applied to a high-frequency database containing billions of market quotes and transactions for US equities, we uncover nonparametric evidence for the existence of a universal and stationary relation between order flow history and the direction of price moves. The universal price formation model exhibits a remarkably stable out-of-sample accuracy across a wide range of stocks and time periods. Interestingly, these results also hold for stocks which are not part of the training sample, showing that the relations captured by the model are universal and not asset-specific.
The universal model, trained on data from all stocks, outperforms asset-specific models trained on time series of any given stock. This weighs in favor of pooling together financial data from various stocks, rather than designing asset- or sector-specific models, as is currently commonly done. Standard data normalizations based on volatility, price level or average spread, or partitioning the training data into sectors or categories such as large/small tick stocks, do not improve training results. On the other hand, inclusion of price and order flow history over many past observations improves forecast accuracy, indicating that there is path-dependence in price dynamics.
1. Price formation: how markets react to fluctuations in supply and demand
The computerization of financial markets and the availability of detailed electronic records of order flow and price dynamics over the last decade have unleashed terabytes of high-frequency data on transactions, order flow and order book dynamics in listed markets, which provide a detailed view of the high-frequency dynamics of supply, demand and price in these markets (Cont 2011). These data may be put to use to explore the nature of the price formation mechanism, which describes how market prices react to fluctuations in supply and demand. At a high level, a "price formation mechanism" is a map which represents the relationship between the market price and variables such as price history and order flow:
$$\text{Price}(t + \Delta t) = F\big(\text{Price history}(0 \ldots t),\ \text{Order flow}(0 \ldots t),\ \text{Other information}\big) = F(X_t, \epsilon_t),$$
where $X_t$ is a set of state variables (e.g. lagged values of price, volatility, and order flow), endowed with some dynamics, and $\epsilon_t$ is a random "noise" or innovation term representing the arrival of new information and other effects not captured entirely by the state variables. Market microstructure models, stochastic models and machine learning price prediction models can all be viewed as different ways of representing this map F.
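As a rough illustration of what the inputs and output of the map F might look like in a supervised learning setting, the sketch below builds state vectors from lagged price changes and lagged order flow, and labels each one with the direction of the next price move. The function name `build_examples`, the choice of features, and the toy numbers are assumptions made for illustration only; they are not the feature set used in this work.

```python
import numpy as np

def build_examples(prices, order_flow, n_lags):
    """Pair each state vector (lagged price changes and lagged order flow)
    with the sign of the next price change. Hypothetical helper."""
    prices = np.asarray(prices, dtype=float)
    order_flow = np.asarray(order_flow, dtype=float)
    states, directions = [], []
    for t in range(n_lags, len(prices) - 1):
        # The n_lags most recent price changes, ending at time t.
        lagged_changes = np.diff(prices[t - n_lags : t + 1])
        # The n_lags most recent order flow observations, ending at time t.
        lagged_flow = order_flow[t - n_lags + 1 : t + 1]
        states.append(np.concatenate([lagged_changes, lagged_flow]))
        # Target: +1 for an up move, -1 for a down move, 0 for no change.
        directions.append(np.sign(prices[t + 1] - prices[t]))
    return np.array(states), np.array(directions)

# Toy series (made-up numbers): prices and signed order flow.
toy_prices = [10.00, 10.01, 10.01, 10.02, 10.00, 10.01]
toy_flow = [0.0, 5.0, -2.0, 3.0, -7.0, 4.0]
states, directions = build_examples(toy_prices, toy_flow, n_lags=2)
```

Any model of F, whether a microstructure model or a neural network, can then be viewed as a rule that maps each row of `states` to a prediction for the corresponding entry of `directions`.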
One question, which has been implicit in the literature, is the degree to which this map F is universal (i.e. independent of the specific asset being considered). The generic, as opposed to asset-specific, formulation of market microstructure models seems to implicitly assume such a universality. Empirical evidence on the universality of certain stylized facts (Cont 2001) and scaling relations (Mandelbrot et al. 1997, Benzaquen et al. 2016, Kyle and Obizhaeva 2016, Patzelt and Bouchaud 2017, Toth et al. 2017, Andersen et al. 2018) seems to support the universality hypothesis. Creamer and Freund (2007) recommended training models via a universal approach in order to capture the diversity of different companies. Yet, the practice of statistical modeling of financial time series has remained asset-specific: when building a model for the returns of a given asset, market practitioners and econometricians typically use data from the same asset. For example, a model for Microsoft shares would be estimated using only time series of Microsoft share prices and would not use data from other stocks.
Furthermore, the data used for estimation is often limited to a recent time window, reflecting the belief that financial data can be "non-stationary" and prone to regime changes which may render older data less relevant for prediction.
Due to such considerations, models considered in financial econometrics, trading and risk management applications are asset-specific and their parameters are (re)estimated over time using a time window of recent data. Such a model for an asset i may be expressed in the form
$$\text{Price}_i(t + \Delta t) = F\big(X^i_t,\ \epsilon_t \mid \theta_i(t)\big),$$
where the model parameter $\theta_i(t)$ is estimated using recent data on price and other state variables $X^i_t$ related to asset i. As a result, data sets are fragmented across assets and time and, even in the high-frequency realm, the size of the data sets used for model estimation and training is orders of magnitude smaller than those encountered in other fields where Big Data analytics have been successfully applied. This is one of the reasons why, except in a few instances (Sirignano et al. 2016, Kolanovic and Krishnamachari 2017, Dixon 2018a, 2018b, Sirignano 2019), large-scale machine learning methods such as Deep Learning (Goodfellow et al. 2017) have not yet been deployed for quantitative modeling in finance.
On the other hand, if the relation between these variables were universal and stationary, i.e. if the parameter $\theta_i(t)$ varies neither with the asset i nor with time t, then one could potentially pool data across different assets and time periods and use a much richer data set to estimate/train the model. For instance, data on a flash crash episode in one asset market could provide insights into how the price of another asset would react to severe imbalances in order flow, whether or not such an episode has occurred in its history. This idea, known as transfer learning, has been used with great success in applications such as image and text recognition.
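The statistical benefit of pooling can be illustrated with a small synthetic experiment. The sketch below assumes that every asset shares the same parameter vector by construction, replaces the deep network with a plain logistic regression, and uses simulated data rather than market data. All names (`simulate_asset`, `fit_logistic`, `true_theta`) and all sample sizes are hypothetical choices made for the illustration.

```python
import numpy as np

def simulate_asset(rng, n_obs, theta):
    """Draw synthetic state variables and up/down labels from a logistic model."""
    x = rng.normal(size=(n_obs, theta.shape[0]))   # stand-in state variables
    noise = rng.logistic(size=n_obs)               # unobserved innovation term
    y = (x @ theta + noise > 0).astype(float)      # 1.0 = up move, 0.0 = down move
    return x, y

def fit_logistic(x, y, n_steps=500, lr=0.5):
    """Fit logistic regression by gradient descent on the mean log-loss."""
    theta = np.zeros(x.shape[1])
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(x @ theta)))
        theta -= lr * (x.T @ (p - y)) / len(y)
    return theta

def accuracy(theta, x, y):
    """Fraction of correctly predicted move directions."""
    return float(np.mean((x @ theta > 0) == (y > 0.5)))

rng = np.random.default_rng(0)
true_theta = np.array([1.0, -0.5, 0.25])  # assumed identical for every asset
n_assets, n_train, n_test = 20, 50, 2000

train = [simulate_asset(rng, n_train, true_theta) for _ in range(n_assets)]
test = [simulate_asset(rng, n_test, true_theta) for _ in range(n_assets)]

# Asset-specific approach: one model per asset, fitted on that asset's data only.
specific_thetas = [fit_logistic(x, y) for x, y in train]
specific_acc = float(np.mean(
    [accuracy(th, xt, yt) for th, (xt, yt) in zip(specific_thetas, test)]
))

# Pooled approach: a single model fitted on the data of all assets together.
x_pool = np.vstack([x for x, _ in train])
y_pool = np.concatenate([y for _, y in train])
pooled_theta = fit_logistic(x_pool, y_pool)
pooled_acc = float(np.mean([accuracy(pooled_theta, xt, yt) for xt, yt in test]))

print(f"asset-specific: {specific_acc:.3f}  pooled: {pooled_acc:.3f}")
```

In this toy setting the pooled model benefits only because the parameter is shared across assets by construction. Whether the same holds for real order flow data is the empirical question addressed in this work.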
In this work, we provide evidence for the existence of such a universal, stationary relation between order flow and market price fluctuations, using a nonparametric approach based on Deep Learning. Deep learning can estimate nonlinear relations between variables using "deep" multilayer neural networks which are trained on large data sets using "supervised learning" methods (Bengio et al. 2015).
Using a deep neural network architecture trained on a high-frequency database containing billions of electronic market transactions and quotes for US equities, we uncover nonparametric evidence for the existence of a universal and stationary price formation mechanism relating the dynamics of supply and demand for a stock, as revealed through the order book, to subsequent variations in its market price. We assess the model by testing its out-of-sample predictions for the direction of price moves given the history of price and order flow, across a wide range of stocks and time periods. The universal price formation model exhibits a remarkably stable out-of-sample prediction accuracy across time and...