Data Wrangling with Python

Jacqueline Kazil and Katharine Jarmul

Published by
O’Reilly
ISBN 978-1-4919-4881-1
RRP £31.99
Reviewed by Deryn Graham FBCS, Retired Senior Lecturer / Data Scientist
Score

10 out of 10

From the introduction, the target audience for this book is not obvious, beyond readers who are not Python experts. The description of data wrangling seems to relate broadly to data analytics: “taking a messy or unrefined source of data and turning it into something useful”.

The book perhaps does not sufficiently highlight the first and most difficult stage of analytics, establishing the business case or identifying the problem, referred to in the book as “formulating a question”. This omission supports the common over-simplification of data analytics, since the fundamental difficulty in obtaining value is determining the business case (if any).

Python is excellent for encoding algorithms for cleansing, analysis and so on, but not all of data analytics (such as establishing the business case) can be achieved through Python alone. Implementation comes later and is, by comparison, much easier.

Going beyond this important point and the acknowledged, if a little peculiar, relationship to journalism, the book is well written and comprehensive. Not every topic is covered, although most are touched upon. Beginning with advice on topics that are often neglected but necessary, such as installation, the book has helpful chapters on data and file types (as expected), and the chapter on PDFs is particularly useful and insightful. Advanced topics include some details on parallel processing.

The book also provides examples and online support/forums. It attempts to explain very difficult concepts and is, as stated, aimed at non-Python experts; however, a good background in computer science is essential if the reader is to get the most from it. It’s excellent overall.

Recommended as a good supporting text on data wrangling (analytics) for computer scientists who are not experts in Python.

Further information: O’Reilly

February 2018