Advances in Financial Machine Learning

Marcos Lopez de Prado

Published by
Wiley
ISBN 978-1-119-48208-6
RRP £42.50
Reviewed by Paul Ramsay
Score

8 out of 10

Machine learning is about gaining confidence in your algorithm. For a financial trading model, you only get a limited amount of data (from Bloomberg services, for example) on which to build that confidence. Drilling down, you may approximate third-party transactions over which you have only partial visibility. The book looks at the various factors that obscure a supply data model and therefore reduce the information that can be derived from it. Given a large and diverse supply population, backtesting becomes a crucial retrospective that may give pointers to trading forecasts. They are only pointers, though: looking backwards is at best a simple guide to forecasting. However, there are several ways of analysing supply data to extract further information.

Holding separate PhDs in Financial Economics and Mathematical Finance, as well as multiple patent applications on algorithmic trading, Dr Marcos Lopez de Prado, a one-time academic, now manages several multibillion-dollar funds using ML algorithms.

Complex, often interrelated topics are covered with a simplicity that only comes from mastery of the subject, which makes the book useful to every data analyst and business analyst supporting risk management. Being a prolific author, Lopez often refers back to his earlier publications. A matrix maps each chapter and part of the book against six topics: Financial Data, Software, Hardware, Math (well, he is American!), Meta-Strat and Overfitting. However, defining the Sharpe Ratio and common derivatives such as the Deflated Sharpe Ratio only after they have been used extensively is a presentational faux pas.
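As a quick reference for readers who meet these ratios before their definitions, here is a minimal Python sketch of the annualized Sharpe ratio and of the Probabilistic Sharpe Ratio (PSR), the building block that the Deflated Sharpe Ratio extends with a correction for the number of trials. It assumes returns are already in excess of the risk-free rate and that a year has 252 periods; the function names are illustrative and this is not the book's code.

```python
import math
from statistics import NormalDist, fmean, pstdev


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio of periodic excess returns (risk-free rate already subtracted)."""
    return fmean(returns) / pstdev(returns) * math.sqrt(periods_per_year)


def probabilistic_sharpe_ratio(returns, sr_benchmark=0.0):
    """Probability that the true per-period Sharpe ratio exceeds sr_benchmark, adjusting the
    estimate for sample length, skewness and (non-excess) kurtosis of the returns."""
    n = len(returns)
    mu, sd = fmean(returns), pstdev(returns)
    sr = mu / sd  # per-period (non-annualized) Sharpe ratio estimate
    skew = sum(((r - mu) / sd) ** 3 for r in returns) / n
    kurt = sum(((r - mu) / sd) ** 4 for r in returns) / n  # equals 3 for a normal distribution
    denom = math.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return NormalDist().cdf((sr - sr_benchmark) * math.sqrt(n - 1) / denom)


# Hypothetical daily excess returns, for illustration only.
returns = [0.01, -0.005, 0.02, 0.0, 0.015, -0.01]
print(round(sharpe_ratio(returns), 2))
print(round(probabilistic_sharpe_ratio(returns), 3))
```

Note that PSR works with the per-period Sharpe ratio; only the first function annualizes.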

Standard industry financial risk models are presented with copious Python snippets. The terms "atoms" and "molecules" are used to illustrate parallel Python programming: an atom is an indivisible task, and a molecule is a group of atoms processed in sequence by a single thread. Using the KISS (Keep It Simple, Stupid) principle, should Python Go?
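To make the atoms-and-molecules vocabulary concrete, here is a minimal sketch assuming a list of independent tasks (atoms) that is cut into contiguous molecules, each handled by one worker. The helper names (`lin_parts`, `process_molecule`, `run_in_parallel`) are hypothetical; `lin_parts` only echoes the idea of a linear partition. It uses threads so that it runs anywhere; for CPU-bound work in CPython you would swap in `ProcessPoolExecutor`, keeping the worker function at module level and launching from under an `if __name__ == "__main__":` guard.

```python
from concurrent.futures import ThreadPoolExecutor


def lin_parts(num_atoms, num_molecules):
    """Boundaries that split num_atoms atoms into contiguous molecules of near-equal size."""
    parts = max(1, min(num_molecules, num_atoms))
    return [i * num_atoms // parts for i in range(parts + 1)]


def process_molecule(atoms):
    """Hypothetical per-molecule job: one worker handles all of its atoms in sequence."""
    return [a * a for a in atoms]


def run_in_parallel(atoms, num_molecules=4):
    """Cut atoms into molecules, process molecules concurrently, then restore atom order."""
    bounds = lin_parts(len(atoms), num_molecules)
    molecules = [atoms[lo:hi] for lo, hi in zip(bounds[:-1], bounds[1:])]
    with ThreadPoolExecutor(max_workers=len(molecules)) as pool:
        results = pool.map(process_molecule, molecules)  # map preserves molecule order
        return [x for molecule in results for x in molecule]  # flatten back to atom order


print(run_in_parallel(list(range(10)), num_molecules=3))
```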

Several elementary tools are then introduced for structuring and labelling supply market data: Time Bars, Tick Bars, Volume Bars and Dollar Bars; the triple-barrier method, with one vertical and two horizontal barriers; information-driven bars; and multi-threaded Monte Carlo.
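As a rough illustration of two of these tools, the sketch below builds dollar bars (a bar closes once a fixed dollar value has traded) and applies a simplified triple-barrier label. It assumes fixed percentage barrier widths and a label of 0 when the vertical (time) barrier is reached first; in practice barrier widths are usually scaled by an estimate of volatility, so treat the names and choices here as illustrative rather than the book's implementation.

```python
def dollar_bars(prices, volumes, threshold):
    """Close a bar each time traded dollar value (price * volume) reaches `threshold`.
    Returns (open, high, low, close, dollar_value) tuples; a trailing partial bar is dropped."""
    bars, bucket, dollars = [], [], 0.0
    for p, v in zip(prices, volumes):
        bucket.append(p)
        dollars += p * v
        if dollars >= threshold:
            bars.append((bucket[0], max(bucket), min(bucket), bucket[-1], dollars))
            bucket, dollars = [], 0.0
    return bars


def triple_barrier_label(prices, t0, upper_pct, lower_pct, max_holding):
    """Label the position opened at index t0 by whichever barrier is touched first:
    +1 for the upper (profit-taking) barrier, -1 for the lower (stop-loss) barrier,
    0 if the vertical (time) barrier at t0 + max_holding comes first."""
    entry = prices[t0]
    last = min(t0 + max_holding, len(prices) - 1)
    for t in range(t0 + 1, last + 1):
        ret = prices[t] / entry - 1.0
        if ret >= upper_pct:
            return 1
        if ret <= -lower_pct:
            return -1
    return 0


print(dollar_bars([100, 101, 102, 103], [10, 10, 10, 10], threshold=2000))
print(triple_barrier_label([100, 101, 103, 99], t0=0, upper_pct=0.02, lower_pct=0.02, max_holding=3))
```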

Cross-validation (CV) splits supply data into training and testing sets to assist in model development and backtesting. Lopez considers the popular k-fold CV to be faulty for financial data, because labels that overlap in time leak information from the testing set into the training set. His own Purged k-fold CV reduces this leakage through purging and embargoes, and he also uses it to tune hyper-parameters.
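A minimal sketch of purged k-fold splitting, assuming observation `i` is sampled at bar `i` and its label depends on bars `i` through `label_end[i]`. Training observations whose label window overlaps the test window are purged, and the `embargo` bars immediately after the test window are dropped as well. The function name and the choice of expressing the embargo as a number of bars (rather than a fraction of the sample) are assumptions for illustration.

```python
def purged_kfold_splits(label_end, n_splits=5, embargo=0):
    """Yield (train_idx, test_idx) pairs for purged k-fold CV with an embargo.

    Assumes observation i is sampled at bar i and its label depends on bars i..label_end[i].
    Requires n_splits <= len(label_end).
    """
    n = len(label_end)
    bounds = [k * n // n_splits for k in range(n_splits + 1)]  # contiguous, unshuffled folds
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        test_end = max(label_end[lo:hi])  # last bar that any test label depends on
        train = [
            i
            for i in range(n)
            if not (lo <= i < hi)  # skip the test observations themselves
            and not (i <= test_end and label_end[i] >= lo)  # purge: label overlaps test window
            and not (test_end < i <= test_end + embargo)  # embargo: bars right after the test window
        ]
        yield train, list(range(lo, hi))


# Each label looks two bars ahead (capped at the last bar).
ends = [min(i + 2, 9) for i in range(10)]
for train, test in purged_kfold_splits(ends, n_splits=5, embargo=1):
    print(test, train)
```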

Backtesting (i.e. stress testing) is considered from several viewpoints. Three major disadvantages of Walk-Forward (WF) backtesting are each considered: only a single scenario is tested, which can easily be overfit; WF is not necessarily representative of future performance; and its initial decisions are made on a limited portion of the total sample. Arguing against WF, Lopez concludes that the goal is to infer future performance from a number of out-of-sample scenarios. Extending his Purged k-fold principles, he offers his Combinatorial Purged Cross-Validation (CPCV) method, which generates many backtest paths instead of one, claiming it leads to fewer false discoveries and is far harder to overfit than WF.
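The combinatorics behind CPCV are easy to check in code. With the sample cut into N groups and k of them held out for testing in every split, there are C(N, k) splits, and each group is tested in C(N-1, k-1) = (k/N)·C(N, k) of them, which is the number of complete backtest paths that can be stitched together. The sketch below, with hypothetical function names, enumerates the splits and assigns them to paths; purging and embargoing within each split would follow the same logic as in ordinary purged k-fold CV and is omitted here.

```python
from itertools import combinations
from math import comb


def cpcv_splits(n_groups, k_test):
    """Every (train_groups, test_groups) split: k_test of n_groups groups are held out each time."""
    groups = range(n_groups)
    return [
        (tuple(g for g in groups if g not in test), test)
        for test in combinations(groups, k_test)
    ]


def cpcv_num_paths(n_groups, k_test):
    """phi[N, k] = (k / N) * C(N, k) = C(N - 1, k - 1): the number of splits testing each group."""
    return comb(n_groups - 1, k_test - 1)


def cpcv_paths(n_groups, k_test):
    """Path p uses, for every group, the p-th split (in enumeration order) that tests that group."""
    splits = cpcv_splits(n_groups, k_test)
    tested_in = {
        g: [s for s, (_, test) in enumerate(splits) if g in test] for g in range(n_groups)
    }
    n_paths = cpcv_num_paths(n_groups, k_test)
    return [[tested_in[g][p] for g in range(n_groups)] for p in range(n_paths)]


print(len(cpcv_splits(6, 2)), cpcv_num_paths(6, 2))
for path in cpcv_paths(6, 2):
    print(path)  # split index supplying the out-of-sample forecasts for each group
```

For N = 6 and k = 2 this gives 15 splits and 5 backtest paths, compared with the single path of a walk-forward test.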

Strategy risk and portfolio risk are well covered, although this is a relatively stable topic with few recent advances.

With the basics covered, the book turns to Asset Allocation, making increasing use of GCE A Level mathematics: Markowitz's Curse, Tree Clustering, Out-of-Sample Monte Carlo Simulations, Inverse Variance Allocation and others. Shannon's Entropy and other financial applications of entropy follow, as does a review of the Microstructural Features literature, including various lambdas such as Kyle's and Hasbrouck's.
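Three of the simpler quantities mentioned above fit in a few lines of Python: inverse-variance portfolio weights, a plug-in (maximum-likelihood) estimate of Shannon entropy, and a simplified Kyle's lambda estimated as the OLS slope of price changes on signed volume. This is a sketch under stated assumptions: the entropy estimator works on single symbols of an already discretised message, and the lambda regression is the plainest textbook form, not a reproduction of the book's estimators.

```python
import math
from collections import Counter


def inverse_variance_weights(variances):
    """Inverse-variance allocation: w_i proportional to 1 / sigma_i^2, normalised to sum to 1."""
    inv = [1.0 / v for v in variances]
    total = sum(inv)
    return [x / total for x in inv]


def plugin_entropy(message):
    """Plug-in Shannon entropy, in bits per symbol, of a discretised message (word length 1)."""
    n = len(message)
    return -sum(c / n * math.log2(c / n) for c in Counter(message).values())


def kyle_lambda(price_changes, signed_volumes):
    """Simplified Kyle's lambda: OLS slope (with intercept) of price changes on signed volume."""
    n = len(price_changes)
    mx = sum(signed_volumes) / n
    my = sum(price_changes) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(signed_volumes, price_changes))
    var = sum((x - mx) ** 2 for x in signed_volumes)
    return cov / var


print(inverse_variance_weights([0.04, 0.09, 0.16]))
print(plugin_entropy("0110100110010110"))
```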

The simple illustrations conclude with brute force, which Quantum Computing can apply to find optimal solutions by examining all feasible solutions at the same time. Considering the maturity of Quantum Computing, the advances presented, especially over standard models, seem somewhat lacking.
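Brute force here means scoring every feasible discrete portfolio. The classical sketch below enumerates every way to split K indivisible units among N assets (stars and bars) and keeps the allocation with the best mean-variance utility. The utility function, the uncorrelated-assets assumption and the function names are illustrative. The point is the size of the search space, which grows combinatorially with K and N; the code runs on an ordinary CPU and is not a quantum algorithm.

```python
from itertools import combinations


def compositions(total_units, n_assets):
    """Every way to split total_units indivisible units among n_assets (stars and bars)."""
    slots = total_units + n_assets - 1
    for bars in combinations(range(slots), n_assets - 1):
        cuts = (-1,) + bars + (slots,)
        yield tuple(cuts[i + 1] - cuts[i] - 1 for i in range(n_assets))


def best_allocation(total_units, means, variances, risk_aversion=1.0):
    """Score every integer allocation with a mean-variance utility (assets assumed
    uncorrelated) and return the allocation with the highest score."""

    def utility(alloc):
        w = [u / total_units for u in alloc]
        mu = sum(wi * m for wi, m in zip(w, means))
        var = sum(wi ** 2 * v for wi, v in zip(w, variances))
        return mu - risk_aversion * var

    return max(compositions(total_units, len(means)), key=utility)


print(len(list(compositions(10, 4))))  # size of the search space for K = 10, N = 4
print(best_allocation(10, [0.08, 0.05, 0.03], [0.04, 0.02, 0.01], risk_aversion=2.0))
```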

Finally, Dr Kesheng Wu and Dr Horst Simon look at Hierarchical Data Format 5 (HDF5), supernova hunting, and High Performance Computing (HPC) versus cloud computing, arguing that HPC offers better cost effectiveness and higher performance. Several use cases are presented: Intraday Peak Electricity Usage, which provides interesting recent insights (2014) from a summer-time study of American Advanced Metering Infrastructure (AMI) data; the Flash Crash of 2010; and High Frequency Events, where the Non-uniform Fast Fourier Transform (NUFFT) is applied to the natural gas futures market.
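Trade arrival times are irregular, so an ordinary FFT, which assumes uniformly spaced samples, does not apply directly. The sketch below evaluates the non-uniform discrete Fourier transform directly, F(f) = Σ x_j·exp(-2πi·f·t_j), which is the sum a NUFFT approximates at much lower cost. It is an O(N·M) illustration on a synthetic signal with a hypothetical function name, not the authors' code and not a fast algorithm.

```python
import numpy as np


def nudft_power(times, values, freqs):
    """Power spectrum |F(f)|^2 of irregularly sampled data via a direct non-uniform DFT:
    F(f) = sum_j x_j * exp(-2*pi*i*f*t_j), costing O(len(times) * len(freqs))."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    spectrum = np.exp(-2j * np.pi * np.outer(freqs, times)) @ values
    return np.abs(spectrum) ** 2


rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 1.0, 500))  # irregular sample times
x = np.cos(2 * np.pi * 5 * t)  # synthetic signal: 5 cycles per unit time
power = nudft_power(t, x, np.arange(11))
print(int(np.argmax(power)))
```

With 500 irregularly spaced samples of a five-cycle cosine, the power at frequency 5 stands far above the other integer frequencies.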

Advances in Financial Machine Learning is a very interesting book. I would give it 8 out of 10 - the author knows his subject.

Further information: Wiley

August 2018