Cloudera Certified Hadoop Developer (CCD-410)




I cleared the Cloudera Certified Hadoop Developer (CCD-410) examination, and I want to list a few suggestions for those planning to take it.

Note: If you are here looking for questions that appear on the CCD-410 test, you have come to the wrong place. However, if you are here to learn what the test is like and how to prepare for it, you have come to the right place.

If you are very serious about the Hadoop certification, I highly recommend the Cloudera Developer Training for Apache Hadoop. I attended it and found it very useful for understanding how Hadoop really works. For more details on the Cloudera training, you can read my blog post: Cloudera Developer Training for Apache Hadoop.

To clear this test you need a very good understanding of the flow of data in Hadoop, i.e. how files are stored and read. You should be able to visualize how MapReduce programs interact with data and how they process it as key-value pairs.
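
To make the key-value flow concrete, here is a minimal sketch in plain Python of how a word count moves through the map, shuffle and reduce steps. It is only an in-memory simulation, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are made up for illustration, and a real job would be written against the Hadoop API or run through Streaming.

```python
from collections import defaultdict


def map_phase(offset, line):
    """Mapper: receives (offset, line of text) and emits (word, 1) pairs."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group all values by key and sort the keys, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(word, counts):
    """Reducer: receives one key with all of its values and emits (word, total)."""
    yield word, sum(counts)


def run_job(lines):
    """Run the three steps in memory over a list of text lines."""
    mapped = []
    offset = 0
    for line in lines:
        mapped.extend(map_phase(offset, line))
        # Illustrative offset only: assumes one byte per character plus a newline.
        offset += len(line) + 1
    results = []
    for key, values in shuffle(mapped):
        results.extend(reduce_phase(key, values))
    return results


print(run_job(["the quick brown fox", "the lazy dog"]))
```

The mapper sees each line keyed by its position in the file, the grouping step collects every value emitted for the same key, and the reducer is called once per key. Tracing a small input through these steps by hand is a good way to practice the visualization described above.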

All questions are multiple choice. A few questions were very tricky: all the answers seemed correct, and you are expected to select the best one. You are not expected to write code for this test. You will, however, be asked questions about the different Hadoop API calls.

Read about the differences between the new and old Hadoop APIs. Understand the changes introduced with YARN.

Concentrate heavily on understanding HDFS and MapReduce. Spend some time analyzing how the different daemons (NameNode, JobTracker, TaskTracker) work and interact.

The guidelines provided on the Cloudera website are very good and will help you understand the weightage of the different topics that are part of the test. They also explain each of those topics in detail.

Link to the Cloudera website: Cloudera Certified Developer for Apache Hadoop CDH4 (CCD-410)

Books:

  1. Hadoop in Action
  2. Yahoo Developer Network : Hadoop Tutorial – Highly Recommended.

Do let me know if you need any further information.

All the best!

Read the FAQs if you have some basic questions related to Hadoop.




129 Comments on Cloudera Certified Hadoop Developer (CCD-410)

  1. Arun Allamsetty

    Hi Rohit,

    I am planning to prepare for and take the examination by the end of March. I have started going through the Definitive Guide and try to get hands-on practice with MapReduce almost every day. Will this be enough preparation? Though I have heard from a lot of people that the Cloudera training is good, I won’t be able to afford it owing to its high price ($1700, is it?). So any tips you can give will be deeply appreciated.

    Thanks,
    Arun Allamsetty

    1. admin

      Hi Arun,

      The Definitive Guide is very good reference material for the certification. However, it gets a little difficult to keep reading the book. I would suggest you start with the Yahoo Developer Network tutorial on Hadoop and then read the Definitive Guide. The course fee I paid was $2995, which includes the certification exam fee.

      1. Spend more time on the concepts rather than coding.
      2. Study the function of each daemon in detail.
      3. Try and visualize the file in HDFS and how it is read and processed.

      Try and go through my blog posts on Hadoop and MapReduce:

      Introducing Hadoop Part I
      Introducing Hadoop Part II
      Introducing MapReduce – Part I

      I should be adding more to this soon.

      Please feel free to get in touch with me if you need any help.

  2. Jay

    Hi,

    I have been a programmer in a technology called cool:Gen for 7 years. I am looking to move out of it now and have learnt that Hadoop technologies have a good future (correct me if I am wrong), and my interests are predominantly in data analysis.

    I do not have Java experience, will Hadoop be a good career path for me in the future?

    Jay

    1. Rohit Menon

      Hi Jay,

      I too have been a developer for a few years now, and my interest in the architecture of Hadoop got me into learning it. The field of Data Science is definitely something to look out for, as many organizations are now leveraging their archived data, thanks to tools like Hadoop.

      This is a good article which will give you a fair idea about Data Science and the Data Scientist : http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1

      Since you are already a programmer you should not have a problem learning the concepts of Hadoop. This should give you a fair idea about whether Hadoop really interests you. The majority of Hadoop users do most of their work with tools like Hive (an SQL-like interface) and Pig (a scripting language), which are simpler abstractions over the MapReduce model of Hadoop, making Hadoop accessible to non-programmers too.

      Go through the articles on my blog and the Hadoop tutorial from Yahoo Developer Network and feel free to ask me any questions you have.

  3. Anandkk

    Hello Rohit,

    I have been learning Hadoop concepts and coding and have a few doubts right away. I understand Perl and Java can be used with Hadoop (correct me here); what else can be used, and which is preferable? (I did Java in my UG and it did not interest me much.)

    Thanks
    Anand

    1. Rohit Menon

      Hi Anand,

      Hadoop has a Streaming API. It allows developers to write Mappers and Reducers in any language, as long as the language can read from standard input and write to standard output.
      So languages like Ruby, Perl, Python, etc. can be used to write MapReduce programs for Hadoop.
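
      As a minimal sketch of what a Streaming mapper and reducer can look like in Python, the word count below reads lines from standard input and writes tab-separated key/value lines to standard output. The function names and the "map"/"reduce" command-line argument are assumptions made for this example, and the reducer relies on its input arriving sorted by key, which Streaming provides between the two steps.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' line for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts of consecutive lines that share the same key.

    Assumes the input lines are already sorted by key.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Hypothetical entry point: run as "script.py map" or "script.py reduce".
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for output_line in step(sys.stdin):
        print(output_line)
```

      The same script can be tried locally by piping a text file through the map step, then `sort`, then the reduce step. On a cluster, the script would be passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options together with `-input` and `-output`; the location of that jar depends on your distribution.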

      You could also pick up Pig (a scripting language from Yahoo) which is a very good tool to query data from HDFS (Hadoop Distributed File System).

      If you are not into programming, you can use tools like Hive to query the data stored in HDFS. Hive gives you an SQL-like interface, so you can query data in HDFS just as you would query data from tables.
      Hive Tutorial : https://cwiki.apache.org/confluence/display/Hive/Tutorial

      Do let me know if you need any information.

  4. Chidu

    Hi Rohit,
    I have been a mainframe programmer for around 6 years. I got interested in Hadoop last year and started my preparation to clear this certification. But I didn’t do it with full energy. I don’t have experience in Java, distributed programming or data warehousing. I looked at job openings in the market; most of the jobs ask for 5+ years of experience in either Java or distributed programming. Is it worth doing the certification? Please advise me.

    1. Rohit Menon

      Hi Chidu,

      More than the certification, I feel hands-on experience is what matters.
      If you can build something of your own and put the details online (code, design, etc.), you can prove that you have knowledge of Hadoop.

      Go through the books and links I have mentioned and I believe that should be sufficient to get started.
      Your programming experience should be of great help and you will grasp the concepts of MapReduce with ease.

      So, certification is not a must.

      Do let me know if you need any information and feel free to get in touch if you need any help.

      1. Chidu

        Thanks Rohit.

        I have one question
        What font you are using for your site?
        small suggestion
        add Home link in the nav bar.

  5. Pankaj

    Hi Rohit,

    I am mostly a C++/Python developer with around 8 years of experience. I want to transition to being a Hadoop developer. It looks interesting and the future looks promising.

    Self Study is my strength. 🙂
    I am right now almost decided on doing the certification. Of course a lot of groundwork will be needed I guess. But I am not worried on that.

    What is chewing me up is if the companies will look at me as a potential candidate.

    I have not dealt with Terabytes of Data as such in my career. This will be a virgin territory to me. I will have no professional exp on Hadoop to show.

    Any inputs on this?
    Also, how long did you take to finish the certification? I mean the time from the start of prep until the date of the exam.

    BTW is Java really needed?
    And any books which teach the concepts and have example in Python/C++?

    Sorry for so many Questions.

    Thanks in Advance.
    Pankaj

    1. Rohit Menon

      Hi Pankaj,

      Self study is the best way to learn anything and everything. So if that is your strength, you are already half way there 🙂
      As a C++/Python developer you can utilize the potential of Hadoop using “Streaming”.

      The certification alone will get you recognized but will not be sufficient to prove your expertise on the subject.
      My recommendation would be to build something with what you learn. Put it up online and open source it. Let the potential recruiters look at what you have built and that will show your expertise.
      Create a problem and solve it using Hadoop.

      Think of terabytes of data as many small pieces of data. If your code works efficiently on small pieces of data, Hadoop will take care of making it work on terabytes. The more you read about Hadoop, the more you will understand this. If your demo project can demonstrate this ability, that should be good enough.

      Java is not a must, but if you are a C++ developer with the years of experience you have mentioned then you should have no problem picking it up.
      Having Java in your armory is a good thing, as you could then use the complete potential of the MapReduce model of Hadoop. Almost 80%-85% of the jobs expect you to have Hive or Pig experience.

      I will try to put up some examples of using Python with Hadoop and will let you know once they are up.

      To get started in Hadoop you can refer to the following:

      Introducing Hadoop – Part I
      Introducing Hadoop – Part II
      Introducing MapReduce – Part I
      Introducing MapReduce – Part II (Code Listing)

      For a complete tutorial on Hadoop I highly recommend:
      Yahoo Developer Network – Hadoop Tutorial

      Do let me know if you need any information.

      1. Pankaj

        Many thanks for your detailed reply Rohit.

        In fact, right now I am going through only your website tutorials and the Yahoo Developer Network Tutorial. The books seem a little dense as of now.

        Can you give me an idea how long it takes to complete the prep with say 4 hrs of study everyday?

        Just an approximation is fine.

        Thanks in advance.

        1. Rohit Menon

          Hi Pankaj,

          I am not sure how that will work because that depends on every individual.
          However, I would recommend working on the topics rather than bothering about how many hours you are putting in.

          Spend most of your time understanding the core concepts of Hadoop. Understand the functions of each daemon that is part of Hadoop.
          There won’t be any questions that would require you to write logic or code. The focus is primarily on understanding the Hadoop architecture and the flow of data across the system.
          Hope this helps.

          Happy learning !

  6. Subu S

    Hi Rohit

    I have installed Hadoop on multiple VMs and have configured a mini cluster of 5 machines – one master and 4 Slaves. I wrote a couple of programs just to test the data flow between these machines.

    I am reading Hadoop in Action and use ‘Hadoop – The Definitive Guide’ more for reference purposes. Also intend to go thru’ your blog as well as the Yahoo Dev Network tutorial.

    I have scheduled certification in April end

    Can you please advise on what more I need to study and concentrate on?

    Should I practice more M-R programs?

    — Subu

    1. Rohit Menon

      Hi Subu,

      The 2 books you have mentioned are really good books. Having installed Hadoop on multiple VMs would have helped you understand several aspects of Hadoop already, so I would suggest that you concentrate on understanding the individual components clearly. Understand the complete flow of how a file is stored in HDFS and how a program interacts with files on HDFS.
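
      As a rough illustration of how a file is laid out in HDFS, the sketch below works out how many blocks a file is split into and how much raw storage its replicas take. The 64 MB block size and the replication factor of 3 are assumed values; both are configurable per cluster, so check your own settings. The function name `hdfs_footprint` is hypothetical.

```python
import math

BLOCK_SIZE_MB = 64  # assumed block size; configurable per cluster
REPLICATION = 3     # assumed replication factor; configurable per cluster


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Estimate how a file of the given size is split into blocks and replicated."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    # The last block only holds the remaining data, not a full block.
    last_block_mb = file_size_mb - (blocks - 1) * block_size_mb if blocks else 0
    return {
        "blocks": blocks,
        "last_block_mb": last_block_mb,
        "block_replicas": blocks * replication,
        "raw_storage_mb": file_size_mb * replication,
    }


print(hdfs_footprint(200))
```

      With these assumed settings a 200 MB file becomes three full blocks plus one 8 MB block, stored as 12 block replicas spread across the DataNodes. With the default input format, a splittable file like this would typically be processed by one map task per block.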

      Learn the complete details of how MapReduce works. Study the functions of the NameNode, Secondary NameNode, JobTracker, TaskTracker and DataNodes.

      Understand how Hadoop handles node failures, slow task execution, etc.

      No programming questions are expected, but understand how the programs work. You should understand what the Map phase, the Reduce phase, etc. are, and the signatures of their respective functions.

      Hadoop in Action is a really good book, and I would suggest going through it well. Use the Definitive Guide to understand the concepts in detail.

      My blog posts on Hadoop should help you with the complete flow of data.
      I hope I have answered your questions.

      Do let me know if you need any information.

  7. jagaran

    Hi Rohit, I have worked with Hadoop for almost 18 months and want to get certified. Can you please guide me? How much prep time is good, and what would be the best sequence and books to read? I appreciate your help.

    1. Rohit Menon

      Hi Jagaran,

      Having hands-on experience in Hadoop is a great advantage and should in most cases be more than enough to clear the certification. As I have mentioned in my blog post and replies to comments, understand the basics really well.

      You should find a lot of information as part of the comments to this blog post.

      I highly recommend the Yahoo Developer Network Tutorial on Hadoop. If you need anything specific feel free to ask me and I shall try my best to help you.

  8. Yuvaraj

    Hi Rohit, do you have any idea about the Hortonworks certification? I searched the web and couldn’t get enough information about the exam topics. Could you please help me?

  9. Akshay

    Hi Rohit,
    Thanks for the post!
    I have been in Oracle middleware for quite some time. I am very interested in learning Hadoop. I know this is a very hypothetical question to ask, but what are the job prospects for Hadoop as of now? And do you think having the Cloudera certification helps in finding a job even with no experience in Hadoop specifically?

    1. Rohit Menon

      Hi Akshay,

      I would not know about the job prospects for Hadoop but the field of Big Data and Data Science is on the rise and having the knowledge of Hadoop is good.
      Cloudera certification is a good way to get into Hadoop, but does not assure you a job. Also, it is not the only way to learn Hadoop as you can read in the other comments of this post.
      Working on a project and putting it out on the internet so that others can see is a good way to demonstrate your Hadoop skills.

  10. Shruti Garg

    Hello Rohit,
    I am looking for a good online tutor for hadoop.
    As i want to learn hadoop.
    But i have wasted much time and money searching good tutor.
    Now this is my last chance
    http://www.wiziq.com/course/21308-hadoop-big-data-training
    I have found a good online hadoop tutor
    but I am a little bit confused about the syllabus and the topics covered in such an online course.
    Please help me find out whether this online Hadoop course is good for me or not.

    as I want to relate Hadoop with my daily life.
    Thank you

    1. Rohit Menon

      Hi Shruti,

      I can’t comment on these online courses as I am not really aware of how they work or the quality of the training.
      If you have already spent time and money on similar courses, I would suggest that you stop and take a break from online courses.

      Read the following tutorial from start to end:
      http://developer.yahoo.com/hadoop/tutorial/

      Next, pick up Hadoop in Action by Chuck Lam and go through it in detail. It is a pretty good book and covers the various components of Hadoop very well.
      I also plan to start http://www.hadoopscreencasts.com especially to make learning Hadoop easy and fun. You can visit this site and leave your feedback.

      Feel free to get in touch if you need any help.

    1. Rohit Menon

      Hi Ankit,

      To put it simply, an administrator manages the Hadoop cluster (infrastructure) and services.
      A developer, on the other hand, develops applications and programs that run on the cluster to solve specific business problems or needs.

      There is a ton of information available on the internet describing in detail the roles and responsibilities of an Apache Hadoop Administrator and Developer. The first place I would visit is http://www.cloudera.com

  11. ankit sharma

    thank you sir for your response…
    sir i am a 7th semester student of engineering so what i should choose hadoop developer or administrator…..
    and when parallel to my degree or after completion of my degree…..
    i have a basic knowledge of java and sql…..

    1. SOURABH PATEL

      WHEN ONE SHOULD DO HADOOP AFTER COMPLETION OF BE OR WITH IN BETWEEN ALSO WE CAN DO IT…
      I AM PURSUING ENGINEERING SO WHEN SHOULD I GO FOR HADOOP…
      AND WHICH DEVELOPER OR ADMIN

    2. palash selot

      SIR I AM AN ENGINEERING STUDENT SO WHICH IS BETTER FOR ME HADOOP DEVELOPER OR HADOOP ADMINISTRATOR..
      AND WHEN SHOULD I DO THIS AFTER MY ENGINEERING OR PARALLEL TO MY ENGINEERING…..

      1. Rohit Menon

        Hi Palash,

        You would be the best person to decide this as it all depends on your interest. If you think administration of infrastructure and services is what interests you, you should go ahead with Hadoop Administration certification, else if you are inclined towards programming, you can focus on Apache Hadoop Development.

        Do let me know if you need any information.

  12. Naveen Konduru

    Hi Rohit,

    Your blog is pretty impressive and thanks for the information on Hadoop. I have been learning Hadoop on my own and I’m able to install a Hadoop cluster (1 master and 3 data nodes) on VMs. I have also played with the configuration files, for example changing block replication. And I also tried some commands to see where the data is stored (I mean, on which nodes).

    I’m basically a PHP developer. I usually create websites using PHP, Python, MySQL, HTML, CSS, JavaScript and some frameworks.
    I have more interest to learn hadoop but I still have some basic questions on what to learn in hadoop? since there is so many tools included like hive, hbase, pig, map-reduce. So all I wanted to know is what should I learn exactly to become a Hadoop Developer. Please guide me.

    1. Rohit Menon

      Hi Naveen,

      I am glad that you find the information in the blog useful.
      The tools you decide to learn would be based upon the problems you are trying to solve.
      MapReduce is a programming model; in Hadoop, MapReduce programs are typically written in Java.
      Hive provides an SQL-like interface for Hadoop, whereas Pig is a scripting language.
      Both Pig and Hive are abstractions over MapReduce.

      So it all depends on what you are comfortable with. As you are already comfortable setting up a cluster and have experience in Python and MySql it would be easy for you to grasp Hive and Pig. Start with these two and see how it goes.

      Feel free to ask me any questions you may have.

      1. Naveen Konduru

        Hi Rohit,

        Thanks for the response.

        Now I have an idea about those tools (Hive and Pig), which do almost the same job as MapReduce. But I am still wondering how we can load tons of SQL data (for example, an employee table which has 10 columns and a number of records). As of now I can only find word count examples online, which copy files (I mean .txt and .log files) into HDFS and query the data using MapReduce jobs. I am also wondering how we can put real-time data directly into HDFS.

        For example:
        From user registration page we usually grab the data into mysql, oracle or someother database. So here my question is how can we store the registration information directly into hadoop HDFS without mysql or any other databases.

        Am I missing something? What is the role of HBase in Hadoop?

        1. Rohit Menon

          Hi Naveen,

          I would recommend reading up a bit on what Hadoop is and problems it solves. Hadoop is a batch processing system, although there are efforts to provide real time response to queries.
          It is not a replacement for a traditional RDBMS. You are not going to store information from a registration form directly into HDFS. Apache Hadoop is used to query large volumes (terabytes to petabytes) of data.
          If you start reading up the basics of Hadoop I am sure all these concepts will be clear.

          Do let me know if you need any information.

          1. Naveen Konduru

            Hi Rohit,

            Thanks for your response.
            Yes, I just came to know about what hadoop can do.
            So in the Hadoop developer certification, will there be any questions from the link below? http://developer.yahoo.com/hadoop/tutorial/module5.html

            and I am not comfortable with JAVA but I am ok with python and php.
            Do we really need to know java to pass the certification?

          2. Rohit Menon

            Hi Naveen,

            The certification exam does not ask you any Java programming-level questions, but you may be asked about the Java classes and their usage.

  13. ankit sharma

    Sir, I am a final-year engineering student. Is it the right time to go for Hadoop, or should I wait for my degree to be completed?

  14. Swagatika

    Hi Ankit,
    I have registered for the Cloudera Dev Exam. It’s on the 15th. With this much time left, can you please tell me which parts I should mainly focus on, as I don’t think I can read the whole book in such a short span? I also have some basic doubts on Hadoop, if you can clear them.

  16. swagatika

    Hi Rohit,
    I have registered for the Cloudera Dev Exam. It’s on the 15th. With this much time left, can you please tell me which parts I should mainly focus on, as I don’t think I can read the whole book in such a short span? I also have some basic doubts on Hadoop, if you can clear them.

    1. Rohit Menon

      Hi Swagatika,

      As mentioned in my previous comments:

      1. Spend more time on the concepts rather than coding.
      2. Study the function of each daemon in detail.
      3. Try and visualize the file in HDFS and how it is read and processed.

      Try and go through my blog posts on Hadoop and MapReduce:

      Introducing Hadoop Part I
      Introducing Hadoop Part II
      Introducing MapReduce – Part I
      Introducing MapReduce – Part II

      You gave me very little time to help you, but I hope you do well. All the best.
      If you have any doubts please feel free to ask.

      1. Drumal Mina

        Thank you, Rohit Menon, very much for all your work in helping us on Hadoop and certification!

        I have three questions. First, I read Swagatika’s comment where he was taking the certification exam in June. Can you please respond to Swagatika to see if he passed? This would be a big help, because if he passed by studying your blog posts then that is where I would focus my studying. I have tried to read Tom White’s Hadoop Definitive Guide but it’s hard to understand. Your Hadoop screencasts make it more understandable, and the explanations in your Hadoop and MapReduce blog posts are so well written that it is much easier to understand.

        Second question: for the certification exam, are you supposed to memorize things like the port numbers of daemons such as the DataNode, NameNode, JobTracker and TaskTracker?

        Third question – you recommend studying concepts over coding but what about the questions on the Ecosystem which is 8% of the exam and probably around 5 questions – are we supposed to learn Pig, Hive, Sqoop programming or do we just need to know what they do and how they are different (for example Hive is sql like, Pig handles flows, sqoop to transfer data from database)

        Again, thank you for taking so much of your time helping others. Your answers will help me focus my time and studying to prepare for certification

        Drumal

  17. Sriram

    Hi Rohit
    I have passed Oracle Certified Java Associate (1Z0-083) last week. Now, i will attend Cloudera Developer Training on July 08th-11th.
    I want to install Hadoop framework on my MAC machine. Do you know how to install the VMware of hadoop.

    Thanks
    Sriram

  18. Amr

    Thanks Rohit,
    My question is: do I really have to have practical Hadoop experience, or is it enough to study it on my own?

    1. Rohit Menon

      Hi Amr,

      If you have the interest to learn you can easily do it all by yourself.
      There are several resources that I have listed in my previous comments, blog posts and on the internet that could help you.
      Do let me know if you need any information.

      1. Amr

        Thanks man,
        I understand that I can study Hadoop myself. But my doubt is whether this self-study is going to be enough to clear the exam, or whether I should be part of a real-life Hadoop project that I practice on a regular basis.

  19. Vinod

    hi Rohit,

    Your blog is good and have taken a lot of guidance and information.

    Am a non-IT MBA graduate and I came to know thro’ friends about the Hadoop admin/architect.

    Am planning to do a course in Hadoop admin. Do you think, whether this software would be helpful for a non-IT graduate. Please guide & suggest if am on the wrong side.

    1. Rohit Menon

      Hi Vinod,

      I am glad that you liked my blog. Learning Hadoop and doing something with it in the form of sample projects, will get you some attention from the people interested in hiring Hadoop Admins/developers.
      Right or wrong, I am not sure. But, if distributed computing, Data Science, Data Analysis, Big Data, etc ring a bell to you, Hadoop as a tool will definitely help you.

      Do let me know if you need any information.

  20. swagatika

    Hi Rohit,
    Thanks for the reply. I have postponed my exam by 2-3 more weeks seeing my poor preparation level. Can you help with some sample questions and answers that can help me test my knowledge (in case you remember any, or any from a book)?

    1. Rohit Menon

      Hi Swagatika,

      I would not have any sample questions and answers to share with you.
      The best way to prepare for this is to go through the Yahoo Hadoop Tutorial and Hadoop in Action by Chuck Lam.

  21. Dipankar Biswas

    I can see there are two certifications listed on the Cloudera website.

    Cloudera Certified Developer for Apache Hadoop (CCD-410)

    Cloudera Certified Developer for Apache Hadoop CDH4 Upgrade Exam (CCD-470)

    Which one should we take? Does it mean CCD-410 is obsolete?

    1. Rohit Menon

      Hi Dipankar,

      The Cloudera website gives the difference in detail.
      In short, one is an upgrade (for people who already hold a certification from Cloudera) and the other is a standalone certification.
      CCD-410 is not obsolete.

  22. Rajesh

    Hi Rohit ,
    I am working in data warehousing; my expertise is in Informatica and Oracle. Now I want to learn Hadoop, so please suggest which of HBase, Hive and Pig I should learn.
    Also, some of my friends went to an institute for Hadoop training (they also work only on databases). The institute counselor told them that without knowledge of MapReduce and HDFS you can’t do anything in HBase and Hive, so you must start from MapReduce. Please advise on this also.
    If you have any online documents for this, please share them with me.

    Thanks
    Rajesh

    1. Rohit Menon

      Hi Rajesh,

      You need to understand the concepts of Apache Hadoop very well. Having an understanding of MapReduce is going to be a plus. But, there are several people who work in Hive alone day in and day out.
      Learn the basics of MapReduce and HDFS. This is important as it will help you understand the inner workings of a Hadoop Cluster.

      You can find some information on my blog and also recently I have launched http://www.hadoopscreencasts.com. Hope you find it useful.

      Do let me know if you need any information.

  23. biswojit

    I am having 7 years of experience in Middleware administration & SAP ABAP. Can I switch to Hadoop development field.
    Is it possible that employers will hire me as I don’t have any Hadoop Experience.

    Is it mandatory to put some hadoop experience to switch job.

    1. Rohit Menon

      Hi Biswojit,

      There are no qualifications required to learn Hadoop. If you are interested go ahead and learn it. Doing sample projects will help you get a hold of how Hadoop works and will also certify your skills to the prospective employers.

      1. biswojit

        Hi Rohit,
        Thanks for your reply as the info given by you are very useful for me to take a decision.

        I will get back to you with some doubts once I complete a thorough reading of prescribed references.

        Thanks once again

  24. Naveen

    Hi Rohit,
    I am much happy to see a blog such…

    I have been working in data warehouse support for the past year. I do not have hands-on experience in any programming language. I know the basics of C# and Linux.

    I am interested in learning Apache Hadoop. Can you please help me with where to start? Should I learn Java first to understand Hadoop, or is it advisable to start with the tutorial you mentioned? What skill set should I have in order to understand Hadoop?
    Thanks in Advance

    1. Rohit Menon

      Hi Naveen,

      The tutorials are a good start. If required you can later get yourself accustomed to Java. There are several abstractions on top of MapReduce like Hive and Pig that do not use any Java which can help you do almost all of your Apache Hadoop related tasks.

  25. asn

    I have been a Java developer for the last 3.7 years. Now I want to shift paradigms, and figured that learning and getting certified in Hadoop could be a good way.
    Please tell me if a Java developer with no background or actual work profile in Hadoop will be able to find a job in Hadoop.

    Because my aim is to land a good job out of this cert.

    thanks in advance!!

    1. Rohit Menon

      Hi asn,

      I can’t comment on you getting a job. But you could definitely work on good sample projects and improve your skills, which in turn should get you good attention from prospective employers.

  26. andy

    Rohit,

    Good post. I have some basic question?

    What is Cloudera? database or just a training company that certifies individuals in hadoop.
    In the ecosystem of Hadoop where does Cloudera fit in.

  27. Ganesh

    Hi Rohit,

    I am glad that I found a lot of information here. I have 8 years of experience in software configuration and build/release management, with a little programming knowledge of C, shell and Perl. Will my experience be helpful for getting into Hadoop? What opportunities will there be for me? With this basic programming knowledge, can I go for Hadoop training?

    Reply
    1. Rohit Menon

      Hi Ganesh,

      Your skills are sufficient to start learning Hadoop. Once you learn Hadoop, start building sample projects and showcase them online. That should help people identify you.

      Reply
  28. Dhiru

    Hi Rohit,

    I have around 10 years of experience as a database (Oracle PL/SQL) resource. I got training in Hadoop and started practicing with Hive, but whenever I share my resume for a Hadoop requirement I get rejected for lack of Java skills, even when the JD lists only Hadoop, Hive, Pig, etc. It seems you must have Java knowledge, because recruiters want people who can write MapReduce programs.

    Could you please suggest what I need to do? Or should I leave this technology and choose something else?

    Thanks
    Dhiru

    Reply
    1. Rohit Menon

      Hi Dhiru,

      I am not the right person to suggest your career path. Asking the prospective employer about the job description in detail would probably be a good first step. Also, if it helps, you can pick up basic Java and do a bit of MapReduce, as it may come in handy for your interviews. You should learn and do something only if you really want to do it. A good way to prove your skills is to build sample projects and put them up online.

      Reply
  29. smatri

    Hi Rohit,

    I am new to Hadoop and got a chance to work on a POC. I have data on the Hadoop platform and I need to pull this data out in CSV format in minimum time. Can you please guide me?

    Thanks

    Reply
  30. Mani

    Hi Rohit,

    I am a Java/J2EE developer with close to 8 years of experience. I am looking into taking the Hadoop training course. Will investing such a large amount help me in my job hunt, or is real-world work experience needed to get jobs in Hadoop?

    Regards,
    Mani

    Reply
    1. Rohit Menon

      Hi Mani,

      If you are skeptical about the investment in the course, I would suggest learning Hadoop on your own using the references like Yahoo Developer Hadoop Tutorial and books like – Hadoop in Action by Chuck Lam.
      After getting started in Hadoop, you could build sample projects and put them up on GitHub or something similar to showcase your knowledge to prospective employers. This should be a good start. Once you have a good grip, you could appear for the certification, and that should give you added recognition. Hope this helps.

      Reply
  31. sesh

    Your blog is good, and the way you have replied to comments is awesome.
    I have taken a lot of guidance and information from it.

    I understood clearly how to prepare for the certification. My only doubt is this: if someone can’t clear it on the first attempt, will he or she get a second attempt for free, or does it have to be paid for?

    thx in advance and good work sir 🙂

    Reply
  32. Deepak

    Hi Rohit,

    First of all I want to say thank you, for creating such a good blog and for your valuable inputs to answer all the queries asked by so many people.

    I am new to Hadoop and have more than 5 years of experience in Java. I read that you have mentioned a couple of times that, to prove our Hadoop skills, we should build a project and post it online as open source.

    Can you please suggest some sample problems that can be solved with Hadoop and would make a good project to demonstrate my skills?

    Besides this, where exactly should I post the project online?

    Thanks,
    Deepak

    Reply
    1. Rohit Menon

      Hi Deepak,

      I am glad that you liked the blog.

      1. You can search for free data sets online. Just google for it and you should find good information on some large data sets.
      2. Once you have that, try and come up with some insight on the data and use Hadoop and the various components of its ecosystem (Hive, Pig, etc) to perform the analysis and insights.
      3. Blog about the data you have and the different analysis you performed on it.
      4. Upload your code to http://www.github.com
      5. Create a good LinkedIn presence and provide links to your blog posts.
      6. Join the various Hadoop groups on LinkedIn and try answering the different questions.

      The above should help you get recognized.
      Do let me know if you need any information.

      Reply
  33. kalyan

    Hi Rohit,
    I learned a lot from your blog. You are doing a great job. Coming to my case, I am about to start my master’s thesis at a university in Sweden, so I am racking my brain to do some research on Hadoop. Can you please advise me? I am especially interested in handling node failures, since information is the primary concern for the industry to which we would like to send our proposal.

    Waiting for your valuable advice. Thank You
    -kalyan

    Reply
    1. kalyan

      According to some journals and published theses I have read, many of them verified and validated that, for large input data, execution time decreases as the number of nodes increases. Also, controlled experiments verified that training time increases as the number of instances increases. We already knew this, as it follows from the characteristics of Hadoop itself.
      But can we think about another side of research in Hadoop? Please suggest something if you have any ideas.

      Thank You..

      Reply
    2. Rohit Menon

      Hi Kalyan,

      I am not sure I am the right person to suggest a research area to you. But I am aware that companies like Cloudera and Hortonworks contribute a lot to the Hadoop source.
      The community is actively working on this and has released High Availability for the NameNode. I am not sure if DataNode failures are a matter of big concern. If you have any articles or posts related to this, I would be interested in reading them.

      Reply
      1. kalyan

        Thank you for the suggestion. I will go through that.

        A few publications:
        - http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.226.4993&rep=rep1&type=pdf
        - https://publish.inf.ed.ac.uk/publications/thesis/online/IM100859.pdf

        Reply
  34. Ran

    Some insights from my experience with CCD-410:

    – I attended the Cloudera Hadoop course. It is NOT sufficient to pass the certification exam. The main gaps are the job lifecycle (which is covered, but not sufficiently), and YARN (which is hardly touched).
    For the job lifecycle, make SURE you read and understand the Yahoo! document, as it has a beautiful page on the topic.

    – Downloading the Cloudera VM and playing with it (running jobs, monitoring them, understanding them) really helps!

    – For the exam you need to know the role of YARN daemons, and how they map to MRv1 daemons.

    – During the exam, mostly composed of multiple choice questions, PAY. ATTENTION. TO. EVERY. OPTION. Read them thoroughly before answering. If two options seem identical, chances are they have a minor BUT VERY IMPORTANT difference (like a ‘not’ that’s hiding in there…)

    – During the exam you will notice some questions repeat themselves, in different wording. Do not be startled by that.

    Reply
  35. Abhishek Singh

    Hi Rohit,

    Thanks for this informative post, and added thanks for answering so many queries ..

    I am interested in knowing whether a Hadoop developer job needs any kind of prior programming knowledge (Java or otherwise).

    I am asking because, if this is the case, I will go for Hadoop admin training and career instead.

    And please tell me the difference in career prospects between the two. (You have avoided answering this, but please give some opinions this time.)

    Reply
    1. Rohit Menon

      Hi Abhishek,

      The JAVA knowledge requirement is based on the job description. If the position expects you to write MapReduce programs, you may be required to show your expertise in Java.
      Analytics is a field that is gaining a lot of importance and will continue to grow. Anyone equipped with the tools and techniques that could assist in analytics would definitely be in a good position.

      Reply
  36. Patty

    Hi Rohit,
    Thanks a lot for this blog. It is really very helpful. But I have one very basic question.
    I have been working in Hadoop for 1.5 years, but I have mainly worked on Pig and Hive.
    I haven’t worked with MR yet. Is MR necessary for my further career,
    or are HDFS, Pig, and Hive sufficient?

    Reply
    1. Rohit Menon

      Hi Patty,

      There are several systems that work entirely on Pig and Hive, so I am sure you can go a long way.
      That being said, you evolve when you learn things that you don’t know. So if you are getting good at Pig and Hive, move on to MapReduce (which needs Java).
      You could also move on to other components of the Hadoop ecosystem, like Oozie, Flume, Sqoop, etc.
      Never stop learning…

      Reply
  37. Kevin

    I just passed the cert today! Though I had not seen this blog before, what Rohit says is exactly right. They don’t ask you programming questions, and the questions are all conceptual. If it matters, I have not taken the Cloudera training course, but I did take a Hadoop course taught by people with extensive experience. The course itself was not enough to pass the cert, as you have to know in GREAT detail how the map/reduce lifecycle works. My biggest reference was Hadoop: The Definitive Guide. Reading this book wasn’t easy even with that course, and I had to Google many times to get the details. Anyway, I just thought I would share how I passed the cert.

    Reply
  38. arghya B

    Dear Rohit,

    Great to find this site answering so many queries.

    I am planning to learn Hadoop Big Data development and architecture, but I need to do hands-on work as well. Can you please point me to a simple guide on how to set up a Hadoop environment on my laptop to try out the features?

    Also, please let me know which sample projects would really stand out and attract recruiters.

    Thanks in advance !

    Reply
  39. Janardhan

    Hi Rohit,

    I found this blog very helpful; thanks in advance. I am planning to take the certification, for which I am studying the Definitive Guide; however, it’s taking a lot of time.

    As I want to expedite the learning,
    1. I would go through the YDN tutorial.
    2. I am not sure whether to choose Hadoop in Action or Hadoop in Practice, the latter being published recently.
    Also Cloudera suggests all the three books mentioned here.

    Could you please guide me in choosing the right book?

    Much appreciated.
    -Janardhan

    Reply
    1. Rohit Menon

      Hi Janardhan,

      I would suggest that you start with the YDN tutorial. That will take you a long way. After YDN, you will be in a good position to select the book that suits you best.
      All the best!

      Reply
  40. Maroof Kazmi

    Hi Rohit Sir,

    First of all, thanks for such a wonderful and informative blog, which gives impressive insight into, and inspiration for, learning Hadoop.
    Sir, I completed my B.Tech in CSE this year and am keenly interested in learning Hadoop.
    Is self-study sufficient for me (using the Yahoo Hadoop tutorial and the reference books) to develop sample projects and post them online?

    Thanks & Regards.

    Reply
  41. Rakesh

    Hey Rohit,

    I think this is the most useful place I came across while looking for information on Hadoop Certification.

    I too am planning to take the CCAH (Admin).

    Currently I am following the below link as my study guide:
    http://cloudera.com/content/cloudera/en/training/certification/ccah/prep.html

    Do you have any study material for this one?

    Or if you can please guide me in the right direction, would be great help!

    Keep up the good work!
    Thanks a Lot!

    Reply
    1. Rohit Menon

      Hi Rakesh,

      I am glad you found this blog useful.
      For CCAH, I would recommend the Hadoop Operations book along with Hadoop Definitive Guide.
      However, if you are new to Hadoop, I would first recommend the Yahoo Developer Network’s Hadoop Tutorial.
      It just makes the whole learning experience easier.
      Hope this helps.

      Reply
      1. Rakesh

        Hi Rohit,

        Thanks for your inputs!

        I have been going through these dense books. Is it good to read them completely, or shall I study selectively?

        I have worked on Hadoop VMs, Hive, Pig, and Sqoop as part of POCs, but not on any real-world projects.

        Would you also recommend setting up a cluster (using VMs or otherwise) so as to practice the CCAH concepts?

        Thanks,
        Rakesh

        Reply
        1. Ankita

          Hi Rohit,

          I have been working on Hadoop for the past year, but I haven’t worked on any projects. I have done installations and small POCs on Hadoop components.
          Now I am thinking of going for the Cloudera training and certification.
          Would you recommend it for getting my basic concepts crystal clear?
          Will they provide hands-on practice in their 4-day training?
          Please guide me.

          Reply
          1. Rohit Menon

            Hi Ankita,

            If you have been working on Hadoop for the past year, I don’t think you need to go for the course.
            Spend time reading about the newer technologies being built on top of Hadoop, like Cloudera Impala, the Hortonworks Stinger initiative, etc.
            Try your own POCs on these newer abstractions.

  42. Neal

    Rohit,
    What is your take on the bigdatauniversity courses? They are free ($0), and though some of the videos are of bad quality (hey, backend developers are more content-oriented :) ), they seem good. Also, I think IBM is behind it, so passing each course should have some value.

    Regards
    Neal

    Reply
  43. Gopakumar.G

    Hi Rohit,

    Hope you are doing well. Your blog inspired me to decide to go for Hadoop certification. I am referring to Tom White’s Hadoop: The Definitive Guide. Today I also ordered Hadoop in Action, as per your suggestion.

    I would like to know whether the Cloudera Hadoop Developer course is mandatory for taking the CCDH exam. Please reply.

    Regards
    Gopakumar.G

    Reply
  44. Nirali

    Your blog is very good; thanks for the useful information.

    I am planning to take the Hadoop developer examination by the end of March.
    I am a fresher.

    I have started going through the Yahoo Hadoop Tutorial and Hadoop in Action.
    I have installed Hadoop on a single machine and practiced MapReduce programs.

    But I have not installed multiple VMs. Is it necessary to install multiple VMs for the examination?

    Reply
  45. J Lake

    Hi,

    I have an ETL background. You mentioned that after getting the certification it is a good idea to do a project and post it online. What would be some good projects that would display a sufficient level of expertise?

    Reply
    1. Rohit Menon

      Hi there,

      Since you have an ETL background, you should try and showcase it using Hadoop. Pig is an ETL framework designed to be used with large volumes of data stored on Hadoop.
      So, if I were you, I would search for large datasets online (they are freely available) and apply all possible ETL techniques using Pig.

      For example, try getting the traffic data of a city, apply some ETL on top of it, and generate some simple analysis from it, like the peak hours, low-traffic hours, etc. Then put it up online, on your blog or GitHub.
      This will help others see that you are trying stuff with what you have learned. Also explore other components like Hive, Sqoop, etc.

      You could also try building recommendation engines from datasets like movie datasets, music datasets, etc.
      Hope I was able to give you some direction. Do let me know if you need any information.
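
      One way to picture the peak-hours analysis suggested above is a tiny map-and-reduce pass over timestamped records. The sketch below is plain Python rather than Pig, and the sample records, the field layout, and the function names (`map_record`, `reduce_by_hour`) are made up for illustration; real traffic data would need its own parsing.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records: (ISO timestamp, vehicle count).
RECORDS = [
    ("2014-01-06T08:15:00", 120),
    ("2014-01-06T08:45:00", 140),
    ("2014-01-06T13:10:00", 60),
    ("2014-01-06T18:05:00", 150),
    ("2014-01-06T18:40:00", 170),
    ("2014-01-06T03:20:00", 5),
]


def map_record(record):
    """Map step: turn one record into an (hour, vehicle_count) key-value pair."""
    timestamp, vehicles = record
    hour = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S").hour
    return hour, vehicles


def reduce_by_hour(pairs):
    """Reduce step: sum the vehicle counts that share the same hour key."""
    totals = Counter()
    for hour, vehicles in pairs:
        totals[hour] += vehicles
    return dict(totals)


totals = reduce_by_hour(map_record(r) for r in RECORDS)
peak_hour = max(totals, key=totals.get)   # hour with the most traffic
quiet_hour = min(totals, key=totals.get)  # hour with the least traffic
print("peak hour:", peak_hour, "quiet hour:", quiet_hour)
```

      The same grouping by hour and summing is what a Pig or Hive job would express over a much larger dataset stored on Hadoop.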

      Reply
      1. Geetha

        Hi,
        I do manual testing and I am considering taking Hadoop training. Do I have to learn Java first? I do not know any coding. Please let me know where I should start.
        Thanks
        Geetha

        Reply
  46. Veera

    Simply recommending what you learned is superb. I just started reading YDN; it has fabulous examples, and it cleared up a lot of my architecture doubts. I really spent time over there.

    Reply
  47. Rajiv

    Hi
    Hadoop in Action looks like a good book for getting started with the Hadoop ecosystem. But one of the customer reviews on Amazon says it’s outdated (http://www.amazon.com/Hadoop-Action-Chuck-Lam/product-reviews/1935182196/ref=dp_top_cm_cr_acr_txt?showViewpoints=1). So I was wondering if there is a better option, or do you think reading this book will still help? Also, if you still suggest that I read this book, what else could I read after it to stay up to date? (Does reading Hadoop: The Definitive Guide after this make sense?) Thanks.

    Reply
  48. Prassanna

    Hi Rohit,

    I am very much interested in Hadoop administration, so I studied some basics of Hadoop functionality and administration by myself, through online material and books like Hadoop Operations and the Definitive Guide. I have installed and run Hadoop in pseudo-distributed mode on my laptop. Now I want to learn further, including other important concepts of Hadoop administration. Can you please recommend some good and useful books for learning the latest in Hadoop administration? Thanks in advance.

    Reply
    1. Rohit Menon

      Hi Prasanna,

      Going through the official Hadoop site should help you get information on the latest developments.
      Following organizations like Hortonworks and Cloudera and trying out their distributions will also be useful.
      Go through their documentation. It is pretty good.

      Reply
  49. vinodh

    Hi,
    I have more than a decade of hands-on Java experience.
    I want to do the Cloudera certification. My question is whether we can prepare for the certification on our own, or whether training from institutes like edureka.in or EduPristine is helpful.
    regards
    vinodh

    Reply
    1. Rohit Menon

      Hi Vinodh,

      As you would see in the several comments on this blog, training is not a must.
      You can clear this certification by reading good books alone.

      All the best.

      Reply
  50. Meenakshi Sundaram

    Hi Rohit,

    Thanks for sharing your knowledge !

    I’m just starting on Hadoop and I’m planning to install a single-node cluster on my Windows machine.
    Could you please advise how to set one up on a Windows 7 machine? Is it possible?

    I saw your video on the other page on how to install a Hadoop cluster, but it looks like it’s for Linux, and I don’t have a Linux machine with me right now.

    I browsed around and finally landed on the Hortonworks VMware installation, but I’m looking to install one on my own. Could you please help me here?

    Reply
    1. Meenakshi Sundaram

      Also, I’m an ETL developer (data warehousing) with 7+ years of experience on the Ab Initio ETL tool. I have no Java work experience, but I learnt Java when I was a fresher. I’m a good Unix shell scripter and good on the data warehousing side. Could you please advise me on which part of Hadoop I should start focusing on if I want to become a Hadoop developer, say MapReduce, Pig, or Hive?
      Thanks and Regards
      Meenakshi Sundaram

      Reply
      1. Rohit Menon

        If you can understand the development stack/tools provided by Hadoop like the ones you have already mentioned you should have a good head start. Do some projects to get some familiarity. That should be of good help. Also, spend time reading about all these tools extensively.

        Reply
  51. Chandan Mishra

    Hi Rohit,

    I have 7-8 months of experience in the software field, and now I am planning to do OCJP and Hadoop with MongoDB so that I can switch to Hadoop. Can you please tell me whether it is mandatory to do CCD, and whether I can get a job without any experience in Hadoop? I have not seen any jobs for fresh Hadoop developers.
    Thanks in advance.

    Reply
  52. anit

    Hi Rohit,

    I am a PHP web developer with more than 2 years of experience, but now I want to boost my career. From the articles and news I have read about big data and Hadoop, I think there is a bright future in it. Please guide me on whether I should go for its certification or not. If yes, where should I start? Also, please enlighten me some more on its future prospects.

    Thanks 🙂

    Reply
