Spark Cookbook

Rishi Yadav

Published by

Packt Publishing





Reviewed by

Patrick Hill CEng CITP MBCS


7 out of 10

Apache Spark is part of the open-source big data ecosystem that provides a general-purpose cluster computing platform along with specialised tools for machine learning, structured data processing, graph data processing and real-time analytics.

This book attempts to demonstrate how to use a number of Spark’s components and capabilities in a variety of contexts by providing short sets of step-by-step instructions or ‘recipes’.

The book’s author is CEO of the InfoObjects consultancy, which provides a freely downloadable ‘sandbox’ environment in the form of a Linux virtual machine pre-installed with a variety of open-source big data components, including Spark, along with sample datasets. A note in the book’s preface indicates that the sandbox is required for the recipes.

The code content of the book is entirely based on using Spark’s interactive shell, which is essentially the Scala shell with some preconfigured Spark objects. The examples are generally quite short, which in itself illustrates the power of Spark. While the background to the examples is often explained in detail, the explanation of the code examples themselves is rather light.

Consequently, the reader is often assumed to have the requisite background knowledge, or is required to reach for other resources. As an example, the book frequently refers to RDD, the core data abstraction in Spark, but RDD is never defined or described. Similarly, there are no obvious instructions on how to retrieve and use the sandbox.

That said, there is undoubtedly a lot of useful material in this book and, once the environment is configured and running, the reader is quickly able to use Spark on real-world datasets. The book’s structure enables it to act both as a reference and as a springboard for the reader to explore related topics using other resources. The gaps made the book a little frustrating at first, but this was overcome with a little perseverance.

Further information: Packt Publishing

February 2016