So Good They Can’t Ignore You



I got hold of this book after reading Derek’s book notes. I was a bit skeptical at first because of the title, thinking it would be just another book built around a famous quote, but the notes convinced me that I had to read it. The book is an absolute beauty and a must-read for everyone. Cal Newport uses simple words and very good examples to show how and why skills trump passion. I don’t have much to add to what the book outlines, as Derek has done a great job summarizing the major points. You can read his notes here.


Introducing Hadoop Screencasts


Apache Hadoop has gained a lot of attention over the past few years, and many organizations now use it to process their large (terabytes to petabytes) data sets. This has also got many software engineers interested in Hadoop and the various components of its ecosystem. Organizations such as Cloudera and Hortonworks conduct some really good courses and certifications to help interested individuals get a good grip on Apache Hadoop. The internet, too, has tons of resources on Apache Hadoop.

In November 2012 I completed the Cloudera Training for Apache Hadoop Developers, and in December I became a Cloudera Certified Apache Hadoop Developer. Since then, I have posted a few tutorials on my blog and have been receiving good feedback.

The continuous interaction with visitors via blog comments and email made me want to start Hadoop Screencasts. Hadoop Screencasts is a simple website that will contain 10-15 minute video tutorials and screencasts on Apache Hadoop and the various components of its ecosystem.

I intend to start by covering the basics of Hadoop and slowly move on to more advanced topics and components. Every week, two videos will be posted. I am starting this as a side project.

Hadoop Screencasts will launch in the last week of June 2013.

Please visit Hadoop Screencasts and fill out the questionnaire, which will help me develop better quality, more relevant tutorials on Apache Hadoop.

Hadoop Screencasts Questionnaire
Follow Hadoop Screencasts on Twitter: @hadoopcasts

Put Your Head Down

Exploration is human nature. Everyone wants to know more, learn more, read more, and write more. There is a very thin line between exploring and being distracted, and too often we use exploration as an excuse for distraction. Somewhere in our heads is the idea that the more aware we are of what is happening around us, the more productive we will be at our work.

In my opinion, this is not TRUE.

Exploring and being amazed by what you find will not, by itself, get you anywhere. It is applying what you discover that will get you doing the things you always wanted to do. Discovering a new motivational article is not going to motivate you more than the last one did, though it may seem that way for a while. Nothing motivates you more than doing what you have always wanted to do and seeing it through to the end.

Read, but read well
Never stop reading. Set aside some time to read; it should not be an activity squeezed in whenever you happen to be free. Set aside two hours to read and research the subject you are reading about. Make notes.

Write and write often
It is a common belief that if one is a good reader, he or she will also be a good writer. Again, this is not true. Writing is a different game. You will be amazed how words don’t make it out of your head onto paper. Writing needs constant practice, just like any other skill. The more you write, the better you will get.

Keep your knives sharp
Skills acquired over time need to be sharpened once in a while. Keep revising them and don’t let them fade. It is frustrating to realize that a skill you were once so good at has become a tough task.

Keep your eyes set on growth.

Relax and ponder
You need to relax. As human beings we can relax our bodies, but it is a battle to relax our minds. Stop reading, writing, and listening; sit aside and just ponder. The best ideas often surface while pondering over the things you have in your head.

Be Simple
Bruce Lee puts this really well: “It’s not the daily increase but daily decrease. Hack away at the unessential.”

So put your head down right now and start working.

[HOW TO] Install Apache Hive

Important: I have made a complete screencast demonstrating the installation of Apache Hive. You can find it at Hadoop Screencasts – Episode 4: Installing Apache Hive. You can ignore the post below and follow the instructions on the screencast.

Environment Details
Operating System : Linux Mint Release 14
Hadoop Version : 0.20.2

Following are the steps to install Apache Hive:

  • Download Apache Hive

$ wget

  • Untar the archive. I have untarred the file to /usr/local/:

$ tar -xzvf hive-0.10.0.tar.gz -C /usr/local/
$ cd /usr/local
$ mv hive-0.10.0 hive

  • Set the environment variable HIVE_HOME to point to the installation directory:

$ export HIVE_HOME=/usr/local/hive

  • Add $HIVE_HOME/bin to your PATH:

$ export PATH=$HIVE_HOME/bin:$PATH
$ hive
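One caveat worth noting: the two `export` commands only last for the current shell session. A small sketch of making them permanent, assuming a Bash setup where `~/.bashrc` is the profile file (adjust the path if your shell differs):

```shell
# Persist the Hive environment variables across sessions by
# appending them to the shell profile (~/.bashrc on most Linux setups).
echo 'export HIVE_HOME=/usr/local/hive' >> ~/.bashrc
echo 'export PATH=$HIVE_HOME/bin:$PATH' >> ~/.bashrc

# Reload the profile so the current session picks them up.
. ~/.bashrc
```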

Do let me know if you need any information.

Cloudera Certified Hadoop Developer (CCD-410)

I cleared the Cloudera Certified Hadoop Developer (CCD-410) examination, and I wanted to list a few suggestions for those planning to appear for it.

Note: If you are here looking for questions that are part of the CCD-410 test, you have come to the wrong place. However, if you are here to learn what the test is like and how to prepare for it, you have come to the right place.

If you are very serious about the Hadoop certification, I highly recommend the Cloudera Developer Training for Apache Hadoop. I had attended it and found it very useful in understanding the correct working of Hadoop. For more details on the Cloudera training you can read my blog post: Cloudera Developer Training for Apache Hadoop.

Introducing MapReduce – Part I

MapReduce is the programming model for working on data stored in HDFS. MapReduce programs are written in Java, but Hadoop also provides Streaming, through which other languages can be used to write MapReduce programs. All data emitted in the flow of a MapReduce program is in the form of <Key,Value> pairs.
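The map → shuffle/sort → reduce flow of <Key,Value> pairs can be simulated with ordinary shell tools, much like a Streaming job would see it. This is only an illustrative word-count sketch, not an actual Hadoop job; the sample input text is made up:

```shell
# Map phase: emit one tab-separated <word, 1> pair per word.
# Shuffle/sort phase: 'sort' groups identical keys together.
# Reduce phase: sum the values for each key.
printf 'the quick brown fox\nthe lazy fox\n' |
  tr -s ' ' '\n' |
  awk '{print $0 "\t1"}' |
  sort |
  awk -F'\t' '{count[$1] += $2} END {for (w in count) print w "\t" count[w]}' |
  sort
# Output:
# brown   1
# fox     2
# lazy    1
# quick   1
# the     2
```

In a real Streaming job the two `awk` stages would be your mapper and reducer scripts, and Hadoop itself would perform the sort between them.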

We have seen in the previous post a typical flow for the Hadoop system. Here we will break down the MapReduce program and try to understand each part in detail.

Introducing Hadoop – Part II

Hadoop uses HDFS to store files efficiently in the cluster. When a file is placed in HDFS it is broken down into blocks, with a 64 MB block size by default. These blocks are then replicated across the different nodes (DataNodes) in the cluster. The default replication value is 3, i.e. there will be 3 copies of each block in the cluster. We will see later why we maintain replicas of the blocks in the cluster.
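The block math is worth working through once. As a back-of-envelope sketch for a hypothetical 200 MB file (the file size is my example, not from the post), with the default 64 MB block size and replication factor 3:

```shell
#!/bin/sh
# Hypothetical example: how HDFS would store a 200 MB file with the
# default 64 MB block size and a replication factor of 3.
FILE_MB=200
BLOCK_MB=64
REPLICATION=3

# Ceiling division: the file splits into full blocks plus one final partial block.
BLOCKS=$(( (FILE_MB + BLOCK_MB - 1) / BLOCK_MB ))
COPIES=$(( BLOCKS * REPLICATION ))

echo "blocks: $BLOCKS"   # 4  (three full 64 MB blocks + one 8 MB block)
echo "copies: $COPIES"   # 12 block copies spread across the DataNodes
```

Note that the last block occupies only its actual size (8 MB here), not a full 64 MB.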