Introducing Hadoop – Part I

What is Hadoop?

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-avaiability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-availabile service on top of a cluster of computers, each of which may be prone to failures. – Source :

The above definition can be bit overwhelming when read all at once. But the reality is that it is a very simple concept. Continue reading

Brick walls are there for a reason, they let us prove how badly we want something.”Randy Pausch

Cloudera Developer Training for Apache Hadoop

I just completed the Cloudera Developer Training for Apache Hadoop course which was held in Denver, CO, USA from 4th Dec 2012 to 7th Dec 2012.

This program prepares you for the following certification:

Cloudera Certified Developer for Apache Hadoop CDH4 (CCD-410)

The course was spread over 4 days that gives you a complete understanding of the Hadoop system (HDFS & MapReduce) along with a few tools that are part of the Hadoop ecosystem. Continue reading

[HOW TO] Install Hadoop on Ubuntu/Linux Mint

Note : You can watch the screencast for a more complete tutorial at  Hadoop Screencasts – Installing Apache Hadoop

Following are the steps for installing Hadoop. I have just listed the steps with very brief explanation at some places. This is more or less like some reference notes for installation. I made a note of this when I was installing Hadoop on my system for the very first time. Continue reading

“It’s not the daily increase but daily decrease. Hack away at the unessential.”Bruce Lee