Applied Predictive Analytics: Principles and Techniques for the Professional Data Analyst

Dean Abbott

Published by: Wiley
ISBN: 9781118727966
RRP: £33.99
Reviewed by: Dr Patrick Hill CEng MBCS CITP
Score: 10 out of 10

In recent years, the data collected by computer applications has grown significantly in both volume and diversity. Data owners are keen to derive value from these data sets by discovering patterns and relationships within them and by using them to generate predictions that can be employed in a variety of ways.

This book takes a process-level view of predictive analytics programmes.

The content is aligned to the Cross-Industry Standard Process for Data Mining (CRISP-DM). Drawing on the author’s own experience, it gives a practical description of each of the six stages of CRISP-DM, which together take a predictive analytics programme from initial business requirements gathering through to final deployment.

After an introductory chapter, which outlines key concepts relating to predictive analytics, the book moves on to discuss the definition and initiation of a predictive analytics programme, including the identification of business objectives, source data and evaluation criteria.

The author describes a variety of exploratory data analysis techniques, including data visualisation approaches, that enable practitioners to familiarise themselves with the available data sets and their relationships to the business context. Unsupervised learning techniques are also covered as a means of deriving descriptive models, which support further exploration of the source data.

Recognising that the data sets used in analytics programmes are often incomplete, the author devotes an extensive chapter to data preparation. This chapter explores some of the key challenges posed by real-world data sets and describes techniques that can be used to identify and handle incorrect or missing data, reduce data size and select appropriate data subsets for model training.

As might be expected, a significant part of the book is devoted to supervised predictive analytics approaches, including practical advice on the selection and evaluation of algorithms.

In contrast to some books in this field, which focus on the mathematical and statistical inner workings of algorithms, this book explains concepts in plain English and treats analytics algorithms more in terms of ‘black boxes’ with particular characteristics, which make them more or less suitable for different data sets and applications.

This is supplemented by a chapter on the use of multiple models, “model ensembles”, as a means to improve model accuracy, as well as a chapter devoted to mining data from unstructured text sources.

Having covered the process of building and evaluating suitable predictive models, the author provides a useful chapter that describes some of the practical issues, especially those of a non-technical nature, that may be encountered when deploying the models.

This book provides an excellent background to predictive analytics and should appeal to a broad readership. The writing style is authoritative and, although it occasionally jumps ahead or assumes knowledge that the book has yet to cover, is generally easy to read. Real-world examples are used throughout. Although the book does not refer to any particular analytics software, the data for these examples is available for download from the book’s website, enabling readers to explore the topics using their software of choice.

Further information: Wiley

December 2014