Practical Hadoop Ecosystem

Deepak Vohra

Published by
ISBN 9781484221990
RRP £25.99
Reviewed by Prof Emmanuel Ojo Ademola SME, FBCS

9 out of 10

It is becoming increasingly difficult to predict how data will grow. Even so, innovation arrives like clockwork in uncertain times, and we always seem to be on the cusp of the next “awesome” thing. Often it turns out to be a popular fashion that is soon supplanted by the next glossy trinket. Technical experts' expectations of what will come next in coding nonetheless place Hadoop as “the next great thing”. Deepak Vohra’s Practical Hadoop Ecosystem answers many of the questions about what experts could use Hadoop to do.

This book is a useful guide to using the Apache Hadoop ecosystem, including MapReduce, HDFS, Apache Hive, Apache HBase, Apache Kafka, Apache Mahout and Apache Solr. It takes the reader from setting up the working environment to running various example applications in every part. It is pragmatic and instructional, and presents exercises on using the Apache Hadoop ecosystem projects. While a few books on Apache Hadoop are available, most focus on the primary functions and HDFS, and none discuss the rest of the Apache Hadoop ecosystem and how its parts integrate to form a robust big data development platform.

The author uses the book to present multiple learning paths for readers, although I doubt the content will attract the attention of non-technical readers. Some of the topics covered are: how to set up an environment in Linux for Hadoop projects using Cloudera Hadoop Distribution CDH 5; how to run a MapReduce job; how to store data with Apache Hive and Apache HBase; how to index data in HDFS with Apache Solr; how to develop a Kafka messaging system; step-by-step instructions to develop a Mahout User Recommender System; step-by-step instructions to stream logs to HDFS with Apache Flume; step-by-step instructions to transfer data from a MySQL database to Hive, HDFS and HBase with Sqoop; and how to create a Hive table over Apache Solr. The ideas on why Hadoop is essential for web-scale data processing and storage are well presented.

Overall, the book offers extensive hands-on implementation to show why Hadoop could be a big thing, a tool of exemplary impact in distributed computing systems. It provides examples throughout the chapters and explains challenging concepts. It is aimed at coders, software developers, programmers and technical reviewers; it is a book for experts and mentors. Recommended as a useful guide for Hadoop learners and experts.

Further information: Apress

April 2018