Hadoop 2 Quick Start Guide

Douglas Eadline

Published by

Addison Wesley, Data & Analytics Series

ISBN

9780134049946

RRP

£21.99

Reviewed by

Len Keighley

Score

9 out of 10

The book covers the concepts and operation of a Big Data environment using the Apache Hadoop 2 ecosystem. As well as the usual introductory sections, it contains 10 major sections and 5 appendices.

The book really does take you from soup to nuts, as they say in the US, starting with an introduction to the concepts and history of Hadoop and Big Data, through installation, file system basics, MapReduce Framework & Programming, Hadoop Tools (including YARN applications), and finally the management and administration of Hadoop under Apache Ambari. The book also has its own website, complete with code downloads, question & answer forums, resource links and update information.

The guide starts at the very beginning for the complete novice user, taking them through a step-by-step process to install Hadoop in a single-platform environment, either as a virtual Hadoop sandbox (the Hortonworks HDP [Hortonworks Data Platform] Sandbox, to be precise) or in pseudo-distributed mode. The former is available for Microsoft or Apple operating systems.

The latter, while more complex, more closely resembles a fully operational Hadoop environment. Normally, the Hadoop environment uses a cluster of servers running in a data centre setup, but this Quick Start Guide provides the necessary process to implement Hadoop on a stand-alone desktop or laptop for personal use and evaluation. Obviously, this restricts the size of data involved and the analysis that can be undertaken, but it also provides an introduction for the individual approaching Big Data for the first time.

In a similar manner, the book then takes the reader through the full operation of the Hadoop 2 system, with code examples where necessary. All this can therefore be used either by novices or by more experienced users working with the full-blown operational Hadoop environment.

The structure of the book is also linked to the video tutorials, Hadoop Fundamentals: Live Lessons and Apache Hadoop YARN Fundamentals: Live Lessons, also produced by Douglas Eadline and Addison-Wesley, so that the two can be used in conjunction. The author suggests that this may be the best approach for taking on board the subject matter.

In essence, there is something in this book for everyone, from those who just want to see what all the Hadoop noise is about to those who are regular Hadoop users or administrators. The format used is excellent for this type of book, and one that should perhaps set the standard for other ‘quick start’ guides.

The instructions and code examples are easy to follow and provide all the required background. The layout also aids the reader who wants to pick and choose what they read, depending on their needs at that time, while still providing for the reader who needs to see the whole picture.

Particularly interesting was the section on HDFS (Hadoop Distributed File System), which explains the background to the structure chosen for its storage and command environment.

One of the appendices even gives a summary of the additional resource content in the full sections so that the really high-level ‘helicopter’ reader is also served.

Obviously, as the title suggests, there is more detail to be had and I look forward to reading Douglas Eadline’s books at that level as well.

Further information: Addison Wesley, Data & Analytics Series

February 2016