I cleared the Cloudera Certified Developer for Apache Hadoop (CCD-410) exam, and I wanted to list a few suggestions for anyone planning to take it.
Note: If you are here looking for questions from the CCD-410 test, you have come to the wrong place. However, if you are here to learn what the test is like and how to prepare for it, you have come to the right place.
If you are very serious about the Hadoop certification, I highly recommend the Cloudera Developer Training for Apache Hadoop. I attended it and found it very useful for understanding how Hadoop really works. For more details on the Cloudera training, you can read my blog post: Cloudera Developer Training for Apache Hadoop.
To clear this test you need a very good understanding of how data flows in Hadoop, i.e. how files are stored and read. You should be able to visualize how MapReduce programs interact with the data and process it as key-value pairs.
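As a rough mental model of the storage side (this is not Hadoop code), the Python sketch below cuts a file into fixed-size blocks and assigns each block to several DataNodes. The 128 MB block size and the replication factor of 3 are assumptions standing in for the cluster's block size and replication settings, and the round-robin placement is a deliberate simplification of HDFS's rack-aware placement policy. In a real cluster the NameNode keeps the block-to-DataNode mapping, while the DataNodes hold the actual bytes.

```python
from typing import Dict, List

# Assumed values for illustration only; real clusters configure both.
BLOCK_SIZE_MB = 128
REPLICATION = 3


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> List[int]:
    """Return the size of each block a file of file_size_mb would occupy.

    The last block holds only the remainder; it is not padded to a full block.
    """
    blocks = []
    remaining = file_size_mb
    while remaining > 0:
        blocks.append(min(block_size_mb, remaining))
        remaining -= block_size_mb
    return blocks


def place_replicas(num_blocks: int, datanodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct DataNodes.

    Simplification: plain round-robin. Real HDFS uses a rack-aware policy.
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [datanodes[(block_id + r) % len(datanodes)]
                               for r in range(replication)]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(300)
    print(blocks)  # [128, 128, 44]
    nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical DataNode names
    for block_id, replicas in place_replicas(len(blocks), nodes).items():
        print(block_id, replicas)
```

Reading the file back is the reverse: the client asks the NameNode which DataNodes hold each block, then streams the blocks in order from one replica each.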
All questions are multiple choice. A few were very tricky: every answer seemed correct, and you were expected to select the best one. You are not expected to write code for this test; however, you will be asked questions about the different Hadoop API calls.
Read about the differences between the old and new Hadoop APIs, and understand the changes introduced with YARN.
Concentrate heavily on understanding HDFS and MapReduce. Spend some time analyzing how the different daemons (NameNode, TaskTracker, JobTracker) work and interact.
The exam guidelines on the Cloudera website are very good and will help you understand the weighting of the different topics on the test. They also explain each of those topics in detail.
Link to the Cloudera website: Cloudera Certified Developer for Apache Hadoop CDH4 (CCD-410)
Books and tutorials:
- Hadoop in Action
- Yahoo Developer Network: Hadoop Tutorial (highly recommended)
Do let me know if you need any further information.
All the best!

Hi Rohit,
I am planning to prepare for and take the examination by the end of March. I have started going through the Definitive Guide and try to get hands-on practice with MapReduce almost every day. Will this be enough preparation? Although I have heard from a lot of people that the Cloudera training is good, I won’t be able to afford it because of its high price ($1700, is it?). So any tips you can give will be deeply appreciated.
Thanks,
Arun Allamsetty
Hi Arun,
The Definitive Guide is very good reference material for the certification. However, it gets a little difficult to keep reading the book straight through. I would suggest you start with the Yahoo Developer Network Tutorial on Hadoop and then move on to the Definitive Guide. The course fee I paid was $2995, which included the certification exam fee.
1. Spend more time on the concepts rather than coding.
2. Study the function of each daemon in detail.
3. Try to visualize how a file is stored in HDFS and how it is read and processed.
Try going through my blog posts on Hadoop and MapReduce:
Introducing Hadoop – Part I
Introducing Hadoop – Part II
Introducing MapReduce – Part I
I should be adding more to this soon.
Please feel free to get in touch with me if you need any help.
Hi,
I have been a programmer in a technology called cool:Gen for 7 years. I am looking to move away from it now, and I have learnt that Hadoop technologies have a good future (correct me if I am wrong). My interests lie predominantly in data analysis.
I do not have Java experience, will Hadoop be a good career path for me in the future?
Jay
Hi Jay,
I too have been a developer for a few years now, and my interest in the architecture of Hadoop got me interested in learning it. The field of Data Science is definitely something to watch, as many organizations are now leveraging their archived data, thanks to tools like Hadoop.
This is a good article which will give you a fair idea about Data Science and the Data Scientist : http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1
Since you are already a programmer, you should not have a problem learning the concepts of Hadoop, and doing so should give you a fair idea of whether Hadoop really interests you. The majority of Hadoop users do most of their work with tools like Hive (an SQL-like interface) and Pig (a scripting language), which are simpler abstractions over Hadoop's MapReduce model and make Hadoop accessible to non-programmers too.
Go through the articles on my blog and the Hadoop tutorial from Yahoo Developer Network and feel free to ask me any questions you have.
Hello Rohit,
I have been learning Hadoop concepts and coding, and I already have a few doubts. I understand that Perl and Java can be used with Hadoop (correct me if I am wrong). What else can be used, and which is preferable? (I studied Java during my undergraduate degree and it did not interest me much.)
Thanks
Anand
Hi Anand,
Hadoop has a Streaming API, which allows developers to write Mappers and Reducers in any language, as long as the language can read from standard input and write to standard output.
So languages like Ruby, Perl, Python, etc. can be used to write MapReduce programs for Hadoop.
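Here is a minimal word-count sketch in the Streaming style, assuming Python 3 is installed on every node. The file name wordcount.py and the map/reduce command-line argument are illustrative choices, not a Hadoop convention. The mapper writes tab-separated key/value lines; Hadoop sorts them by key before the reducer runs, so the reducer only has to notice where one key ends and the next begins.

```python
#!/usr/bin/env python3
"""Word count for Hadoop Streaming (a sketch).

Run as `wordcount.py map` or `wordcount.py reduce`. Streaming feeds input
lines on stdin and reads tab-separated key/value lines from stdout.
"""
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    # Emit "word<TAB>1" for every whitespace-separated word.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    # Input must already be sorted by key, which Hadoop guarantees between
    # the map and reduce phases; groupby then sees each key as one run.
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def main(phase: str) -> None:
    if phase not in ("map", "reduce"):
        raise SystemExit("usage: wordcount.py map|reduce")
    step = mapper if phase == "map" else reducer
    for out in step(sys.stdin):
        print(out)


# Act as a filter only when a phase argument is given.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

You can test the same pipeline locally without a cluster, with Unix `sort` playing the part of Hadoop's shuffle: `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`. On a cluster, the script is passed to the streaming jar through its `-input`, `-output`, `-mapper`, `-reducer` and `-file` options; the jar's location depends on your installation.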
You could also pick up Pig (a scripting language from Yahoo) which is a very good tool to query data from HDFS (Hadoop Distributed File System).
If you are not into programming, you can use tools like Hive to query the data stored in HDFS. Hive gives you an SQL-like interface to query data in HDFS, just as you would query data from tables.
Hive Tutorial : https://cwiki.apache.org/confluence/display/Hive/Tutorial
Do let me know if you need any information.
Hi Rohit,
I have been a mainframe programmer for around 6 years. I got interested in Hadoop last year and started preparing for this certification, but I didn’t put my full energy into it. I don’t have experience in Java, distributed programming or data warehousing. I looked at job openings in the market, and most of them ask for 5+ years of experience in either Java or distributed programming. Is it worth doing the certification? Please advise me.
Hi Chidu,
I feel that hands-on experience matters more than the certification.
If you can build something of your own and put the details online (code, design, etc.), you can prove that you have knowledge of Hadoop.
Go through the books and links I have mentioned and I believe that should be sufficient to get started.
Your programming experience should be of great help, and you will grasp the concepts of MapReduce with ease.
So, the certification is not a must.
Do let me know if you need any information and feel free to get in touch if you need any help.
Thanks Rohit.
I have one question:
What font are you using for your site?
A small suggestion:
Add a Home link in the nav bar.
I usually don’t get time to update the look and feel of the blog.
However, thanks for your suggestion. I have added the Home link, along with a few other links, to the nav bar.
The font that I use is http://www.google.com/webfonts/specimen/Lora
Thanks.
Hi Rohit,
I am mostly a C++/Python developer with around 8 years of experience. I want to transition to being a Hadoop developer. It looks interesting and the future looks promising.
Self-study is my strength.
Right now I have almost decided to do the certification. Of course, a lot of groundwork will be needed, but I am not worried about that.
What is eating at me is whether companies will see me as a potential candidate.
I have not dealt with terabytes of data in my career. This will be new territory for me, and I will have no professional Hadoop experience to show.
Any inputs on this?
Also, how long did it take you to finish the certification, from the start of your preparation to the date of the exam?
BTW, is Java really needed?
And are there any books that teach the concepts with examples in Python/C++?
Sorry for so many questions.
Thanks in advance.
Pankaj
Hi Pankaj,
Self-study is the best way to learn anything and everything. So if that is your strength, you are already halfway there.
As a C++/Python developer, you can use Hadoop through its Streaming API.
The certification alone will get you recognized but will not be sufficient to prove your expertise on the subject.
My recommendation would be to build something with what you learn. Put it up online and open source it. Let the potential recruiters look at what you have built and that will show your expertise.
Create a problem and solve it using Hadoop.
Think of terabytes of data as many small chunks of data. If your code can process a small chunk efficiently, Hadoop will take care of making it work on terabytes of data. The more you read about Hadoop, the clearer this will become. If your demo project can demonstrate this ability, that should be good enough.
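A small Python illustration of that idea: the per-split function below only ever sees one piece of the input, and a merge step combines the partial answers. Finding the maximum value per key is used because it gives the same result however the input is cut up; the split size and the sample (key, value) records are hypothetical. In Hadoop the framework does the splitting, shuffling and merging for you; this only mimics it inside one process.

```python
import functools
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, int]  # hypothetical (key, value), e.g. (station, temperature)


def process_split(records: Iterable[Record]) -> Dict[str, int]:
    """Logic that only ever looks at one small piece of the input."""
    best: Dict[str, int] = {}
    for key, value in records:
        if key not in best or value > best[key]:
            best[key] = value
    return best


def merge(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, int]:
    """Combine two partial results; for max, merge order does not matter."""
    merged = dict(a)
    for key, value in b.items():
        merged[key] = max(value, merged.get(key, value))
    return merged


def run_in_splits(records: List[Record], split_size: int) -> Dict[str, int]:
    # Cut the input into fixed-size pieces, as if each were handed to its
    # own map task, then fold the partial answers together.
    splits = [records[i:i + split_size] for i in range(0, len(records), split_size)]
    return functools.reduce(merge, (process_split(s) for s in splits), {})
```

Not every computation splits this cleanly: an average, for example, has to carry a running sum and count per key, because an average of per-split averages is generally not the overall average.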
Java is not a must, but as a C++ developer with the years of experience you have mentioned, you should have no problem picking it up.
Having Java in your arsenal is a good thing, as it lets you use the full potential of Hadoop's MapReduce model. Almost 80%-85% of jobs expect you to have Hive or Pig experience.
I will try to put up some examples of using Python with Hadoop and will let you know once they are up.
To get started in Hadoop you can refer to the following:
Introducing Hadoop – Part I
Introducing Hadoop – Part II
Introducing MapReduce – Part I
Introducing MapReduce – Part II (Code Listing)
For a complete tutorial on Hadoop I highly recommend:
Yahoo Developer Network – Hadoop Tutorial
Do let me know if you need any information.
Many thanks for your detailed reply Rohit.
In fact, right now I am going through only your website tutorials and the Yahoo Developer Network Tutorial. The books seem a little dense for now.
Can you give me an idea of how long it takes to complete the preparation with, say, 4 hours of study every day?
Just an approximation is fine.
Thanks in advance.
Hi Pankaj,
It is hard to say, because that depends on the individual.
However, I would recommend focusing on the topics rather than worrying about how many hours you are putting in.
Spend most of your time understanding the core concepts of Hadoop. Understand the functions of each daemon that is part of Hadoop.
There won’t be any questions that require you to write logic or code. The focus is primarily on understanding the Hadoop architecture and the flow of data across the system.
Hope this helps.
Happy learning!
Hi Rohit
I have installed Hadoop on multiple VMs and have configured a mini cluster of 5 machines – one master and 4 Slaves. I wrote a couple of programs just to test the data flow between these machines.
I am reading Hadoop in Action and use ‘Hadoop – The Definitive Guide’ more for reference. I also intend to go through your blog as well as the Yahoo Developer Network tutorial.
I have scheduled the certification for the end of April.
Can you please advise on what else I should study and concentrate on?
Should I practice more M-R programs?
– Subu
Hi Subu,
The two books you have mentioned are really good. Installing Hadoop on multiple VMs has probably already helped you understand several aspects of Hadoop, so I would suggest that you concentrate on understanding the individual components clearly. Understand the complete flow of how a file is stored in HDFS and how a program interacts with files on HDFS.
Learn the complete details of how MapReduce works. Study the functions of the NameNode, Secondary NameNode, JobTracker, TaskTracker and DataNodes.
Understand how Hadoop handles node failures, slow task execution, etc.
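Two of these recovery ideas are easy to picture in a few lines of Python. This is a toy model with made-up node names, not Hadoop's implementation: a read falls back to the next replica when a DataNode is down, and a failed task is re-run on another node until an attempt limit (assumed here to be 4; the real limit is configurable) is reached. For slow tasks Hadoop uses speculative execution, launching a duplicate attempt and keeping whichever finishes first; that part is not modeled here.

```python
from typing import Callable, List, Optional, Set

MAX_ATTEMPTS = 4  # assumption for illustration; configurable in a real cluster


def read_block(replicas: List[str], dead_nodes: Set[str]) -> Optional[str]:
    """Return the first DataNode holding a live replica, or None if all are gone."""
    for node in replicas:
        if node not in dead_nodes:
            return node
    return None


def run_with_retries(task: Callable[[str], bool], nodes: List[str],
                     max_attempts: int = MAX_ATTEMPTS) -> Optional[str]:
    """Re-run a failed task on the next node until it succeeds or attempts run out.

    `task` is a hypothetical callable returning True if the attempt on that
    node succeeded. Returns the node that succeeded, or None if the job fails.
    """
    if not nodes:
        raise ValueError("no nodes available")
    for attempt in range(max_attempts):
        node = nodes[attempt % len(nodes)]
        if task(node):
            return node
    return None
```

Losing a single DataNode is survivable precisely because every block has several replicas, which is why the replication factor matters when you reason about failures.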
No programming questions are expected from you, but understand how the programs work. You should understand what the Map phase and the Reduce phase are, and the signatures of their respective functions.
Hadoop in Action is a really good book, and I would suggest going through it thoroughly. Use the Definitive Guide to understand the concepts in detail.
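The shape of those signatures can be written down as Python type hints. The sketch below builds a small inverted index (word to byte offsets) so that the types differ at each stage: map takes an (offset, line) pair, mirroring the (LongWritable, Text) input that TextInputFormat gives a Java mapper, and emits (word, offset); reduce receives each word together with all of its offsets after the shuffle. run_job is a hypothetical single-process stand-in for the framework, not Hadoop's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# map:    (K1, V1)       -> list of (K2, V2)
# reduce: (K2, list[V2]) -> list of (K3, V3)


def map_fn(offset: int, line: str) -> Iterator[Tuple[str, int]]:
    # K1 = byte offset of the line, V1 = line text; emit (word, offset).
    for word in line.split():
        yield word.lower(), offset


def reduce_fn(word: str, offsets: List[int]) -> Iterator[Tuple[str, List[int]]]:
    # All values for one key arrive together after the shuffle.
    yield word, sorted(set(offsets))


def run_job(lines: Iterable[str]) -> List[Tuple[str, List[int]]]:
    # Assign byte offsets as a line reader would (lines given without "\n").
    records = []
    offset = 0
    for line in lines:
        records.append((offset, line))
        offset += len(line.encode("utf-8")) + 1  # +1 for the newline
    # Map phase, then shuffle: group every V2 under its K2.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in map_fn(k1, v1):
            grouped[k2].append(v2)
    # Reduce phase: each reducer sees its keys in sorted order.
    output = []
    for k2 in sorted(grouped):
        output.extend(reduce_fn(k2, grouped[k2]))
    return output
```

For example, `run_job(["to be", "or not to be"])` groups "to" and "be" under both line offsets (0 and 6), while "or" and "not" appear only under offset 6.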
My blog posts on Hadoop should help you with the complete flow of data.
I hope I have answered your questions.
Do let me know if you need any information.
Thanks a ton Rohit. Will bug you more as the exam comes closer!!
Hi Rohit, I have worked with Hadoop for almost 18 months and want to get certified. Can you please guide me? How much preparation time is good, and what would be the best sequence and books to read? I appreciate your help.
Hi Jagaran,
Your hands-on experience with Hadoop is a great advantage and should, in most cases, be more than enough to clear the certification. As I have mentioned in my blog post and in replies to comments, understand the basics really well.
You should find a lot of information as part of the comments to this blog post.
I highly recommend the Yahoo Developer Network Tutorial on Hadoop. If you need anything specific feel free to ask me and I shall try my best to help you.
Hi Rohit, do you have any idea about the Hortonworks certification? I searched the web and couldn’t find enough information about the exam topics. Could you please help me?
Hi Yuvraj,
I do not know anything about the Hortonworks certifications.
Hi Rohit,
Thanks for the post!
I have been working in Oracle middleware for quite some time, and I am very interested in learning Hadoop. I know this is a very hypothetical question, but what are the job prospects for Hadoop right now? And do you think a Cloudera certification helps in finding a job, even without any Hadoop-specific experience?
Hi Akshay,
I cannot speak to the job prospects for Hadoop specifically, but the fields of Big Data and Data Science are on the rise, and knowing Hadoop is a good thing.
Cloudera certification is a good way to get into Hadoop, but does not assure you a job. Also, it is not the only way to learn Hadoop as you can read in the other comments of this post.
Working on a project and putting it out on the internet so that others can see is a good way to demonstrate your Hadoop skills.
Hello Rohit,
I am looking for a good online Hadoop tutor, as I want to learn Hadoop.
But I have wasted a lot of time and money searching for a good tutor.
Now this is my last chance. I have found what seems to be a good online Hadoop tutor:
http://www.wiziq.com/course/21308-hadoop-big-data-training
but I am a little confused about the syllabus and the topics covered in this online course.
Please help me figure out whether this online Hadoop course is good for me or not,
as I want to relate Hadoop to my daily life.
Thank you
Hi Shruti,
I can’t comment on these online courses, as I am not really aware of how they work or the quality of their training.
If you have already spent time and money on similar courses, I would suggest that you stop and take a break from online courses.
Read the following tutorial from start to end:
http://developer.yahoo.com/hadoop/tutorial/
Next, pick up Hadoop in Action by Chuck Lam and go through it in detail. It is a pretty good book and covers the various components of Hadoop very well.
I also plan to start http://www.hadoopscreencasts.com, specifically to make learning Hadoop easy and fun. You can visit the site and leave your feedback.
Feel free to get in touch if you need any help.