This book is an in-depth guide on developing distributed systems using the Apache Hadoop framework.
Summary of the Book
Hadoop is an open-source software framework that facilitates the construction of large distributed systems for handling huge amounts of data, running into terabytes or petabytes. It is built on MapReduce, a programming model developed at Google, and is developed and maintained by the Apache Software Foundation.
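To make the MapReduce model concrete, here is a toy word-count sketch in plain Python rather than the Hadoop API: a map phase emits (word, 1) pairs, a shuffle step groups the pairs by key (which the framework does automatically between phases), and a reduce phase sums the counts for each word. The function names and sample input are illustrative only.

```python
from collections import defaultdict

def map_phase(line):
    # Emit a (word, 1) pair for every word in the input line.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Sum the counts emitted for a single word.
    return key, sum(values)

lines = ["hadoop stores big data", "hadoop processes big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'hadoop': 2, 'stores': 1, 'big': 2, 'data': 2, 'processes': 1}
```

In Hadoop itself, the map and reduce functions run in parallel across a cluster and the shuffle moves data between machines, but the division of labor is the same as in this sketch.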
Fundamentally, all implementations of information technology are about handling data, processing it, and presenting it in a useful form. This is a data-intensive age in which huge amounts of data need to be shared across vast geographical distances, across different networks and systems. The sharing and processing of data has to be transparent and smooth, so that the applications processing the data and the people using it need not bother with details not directly relevant to them.
This involves building networks and distributed systems that can spread data transmission and application execution across many machines so that everything runs smoothly. Provisions must also be made so that a failure in one part of the system does not bring the whole system to a halt.
Apache Hadoop helps companies build large distributed systems that are reliable, scalable, extremely fast, and capable of handling huge amounts of data. It is built for data-intensive operations.
The book Hadoop: The Definitive Guide is divided into 16 chapters. The first chapter provides an overview of Hadoop, and the next three chapters introduce MapReduce, the programming model on which Hadoop is built. The fifth chapter shows readers how to set up MapReduce to run an application, and Chapter 6 explains how MapReduce is implemented in Hadoop. Chapter 7 discusses the MapReduce programming model and the data formats it can handle, while Chapter 8 covers MapReduce features such as library classes and sorting and joining data.
Chapters 9 and 10 are about the administrative side of Hadoop; they explain how to set up and maintain Hadoop clusters. The next five chapters cover related Apache projects: the Pig programming platform and its Pig Latin language, the Hive data warehouse system, the HBase database system, the ZooKeeper coordination service, and the Sqoop data transfer tool. The last chapter is a compilation of case studies from the Apache Hadoop community.
Hadoop: The Definitive Guide shows readers how to use the Hadoop Distributed File System (HDFS) to store and process large data sets. It shows them how to configure MapReduce and then set up Hadoop and its various components to form a distributed system that is reliable, fast, and secure.
About Tom White
Tom White is a member of the Apache Software Foundation and a committer on the Apache Hadoop project.
Hadoop: The Definitive Guide is his only book so far.
White currently works for Cloudera, a company that provides Apache Hadoop based solutions and services. He has a degree in Mathematics from Cambridge University and a degree in Philosophy of Science from the University of Leeds.