#jeff

Jeff Neuenschwander

Investigating Limit Order Book Characteristics for Short Term Price Prediction: a Machine Learning Approach, Faisal Qureshi

Link to paper: https://arxiv.org/abs/1901.10534

Link to source: Noted in the paper, stored on Google Drive.

Further steps: Some of the features proposed in this paper are interesting. They could easily be coded up and explored with other datasets.

The author of this paper explores one day of limit order book data in four large-cap stocks to see if he can find predictive power in the data. The dataset is about 800k data points, which seems oddly small for one day of data in those instruments. This appears to be at least partially because he only uses the top 10 levels.

The experiment was run on a Celeron with 8 GB of RAM. Since he is at a major university, my first questions are 1) why didn’t he use TAQ data, and 2) why didn’t he at least use a real desktop PC for this. But I digress.

The baseline classifier is one that makes random predictions based on the frequencies of the target classes.
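To make the baseline concrete, here is a minimal sketch of a frequency-based random classifier. The function names and the toy label counts are my own assumptions, not taken from the paper's code; scikit-learn's `DummyClassifier(strategy="stratified")` does essentially the same thing.

```python
import random
from collections import Counter

def fit_class_frequencies(labels):
    """Estimate the relative frequency of each class in the training labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def predict_random(frequencies, n, seed=0):
    """Draw n predictions at random, weighted by the observed class frequencies."""
    rng = random.Random(seed)
    classes = list(frequencies)
    weights = [frequencies[c] for c in classes]
    return rng.choices(classes, weights=weights, k=n)

# Toy, made-up label distribution: mostly "flat", few moves.
train_labels = ["flat"] * 80 + ["up"] * 10 + ["down"] * 10
freqs = fit_class_frequencies(train_labels)
preds = predict_random(freqs, 1000)
```

Any real classifier has to beat this, and with heavily imbalanced classes the raw accuracy of such a baseline can already look deceptively high.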

Several classifiers were then tested.

One result was that the classifiers were better at predicting whether the price would change than the direction of the change.
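For context, a naive way to generate such class labels is to compare the mid price now with the mid price a few events ahead. This is only a sketch under my own assumptions (the `label_moves` name, the horizon, and the toy prices are hypothetical); the paper's actual labeling scheme may differ.

```python
def label_moves(mid_prices, horizon=1):
    """Label each point by the sign of the mid-price change `horizon` steps ahead."""
    labels = []
    for i in range(len(mid_prices) - horizon):
        diff = mid_prices[i + horizon] - mid_prices[i]
        if diff > 0:
            labels.append("up")
        elif diff < 0:
            labels.append("down")
        else:
            labels.append("flat")
    return labels

# Made-up mid prices: most consecutive events leave the mid price unchanged.
mids = [100.0, 100.0, 100.0, 100.05, 100.05, 100.0, 100.0]
labels = label_moves(mids)
```

Even in this tiny example most labels are "flat", which hints at why telling change from no change and telling up from down are rather different problems.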

The author used a smoothing method and a new class label to try to deal with that issue. He decided that Random Forests showed the most promise, and then set about trying to determine which features were the most important.

Several feature types are discussed. The author notes that orderID has predictive power, which doesn’t make sense as it is just a unique order ID. The limit order book features, which are the volume and price for various sums of order book levels, are also found to have some predictive power by the author. He doesn’t explain exactly what these features look like or how they are calculated, but this could be determined by examining his source code.
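Since the paper does not spell these features out, the following is only one plausible reading: cumulative volume and volume-weighted price over the top k levels of one side of the book. The function name, the chosen depths, and the `(price, volume)` layout are my assumptions, not the author's definitions.

```python
def depth_features(levels, depths=(1, 3, 5, 10)):
    """Cumulative volume and volume-weighted price over the top k levels.

    `levels` is one side of the book as (price, volume) tuples, best level first.
    """
    features = {}
    for k in depths:
        top = levels[:k]
        volume = sum(v for _, v in top)
        notional = sum(p * v for p, v in top)
        features[f"vol_{k}"] = volume
        # Guard against an empty side of the book.
        features[f"vwap_{k}"] = notional / volume if volume else None
    return features

# Made-up bid side with three levels.
bids = [(100.0, 200), (99.9, 300), (99.8, 500)]
feats = depth_features(bids, depths=(1, 3))
```

The same function could be applied to the ask side, giving one feature vector per book snapshot.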

The author also discusses some features based on order imbalances at certain levels, and cites another paper on this topic. The author finds that order imbalances do have some predictive power.
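A common way to express order imbalance is the normalized difference between resting bid and ask volume over the top few levels. The sketch below uses that textbook form as an assumption; the paper and the work it cites may define imbalance differently.

```python
def order_imbalance(bid_volumes, ask_volumes, depth=1):
    """Normalized imbalance in [-1, 1] over the top `depth` levels.

    Positive values mean more resting volume on the bid side.
    """
    bid = sum(bid_volumes[:depth])
    ask = sum(ask_volumes[:depth])
    total = bid + ask
    return (bid - ask) / total if total else 0.0

# Made-up volumes per level, best level first.
top_of_book = order_imbalance([300, 100], [100, 100], depth=1)
two_levels = order_imbalance([300, 100], [100, 100], depth=2)
```

Computing this at several depths gives one imbalance feature per level range.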

Next, “order arrival rates” are discussed as another feature type. These features “capture the volume of buy and sell order created, canceled, and executed within a short historic window” and also contain information on which order book level this happens at. This paper cites another paper that discusses these features in more detail.
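A rough sketch of how such windowed features could be computed from an event stream follows. The event tuple layout `(timestamp, side, kind, level, volume)`, the function name, and the sample values are hypothetical; the paper's source code would be the place to check the real definitions.

```python
from collections import defaultdict

def arrival_volumes(events, now, window):
    """Sum event volume per (side, kind, level) within (now - window, now]."""
    totals = defaultdict(int)
    for ts, side, kind, level, volume in events:
        if now - window < ts <= now:
            totals[(side, kind, level)] += volume
    return dict(totals)

# Made-up events: (timestamp, side, kind, level, volume).
events = [
    (0.1, "sell", "executed", 1, 70),   # falls outside the window below
    (1.0, "buy", "created", 1, 100),
    (2.0, "buy", "created", 1, 50),
    (2.5, "sell", "canceled", 2, 30),
]
rates = arrival_volumes(events, now=3.0, window=2.5)
```

Dividing each total by the window length would turn these volumes into rates.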

The author combined the “best” features of each category into a single dataset, but this did not improve Random Forest prediction.

This paper is mainly interesting for its discussion of features based on order book information, and some order-book-based features could be coded up for later use in other models. Also of note are the problem of generating classes for microstructure events and the class imbalance issues found at the microstructure level.