Apache Hive – Getting Started

The Apache Hive™ data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL. Source: hive.apache.org

To install Apache Hive, you can follow the instructions in Hadoop Screencasts – Episode 4 – Installing Apache Hive.

This post is a fast-paced, instruction-based tutorial that dives directly into using Hive.

Creating a database

A database can be created using the CREATE DATABASE command at the hive prompt.

Syntax:

CREATE DATABASE <database_name>;

E.g.

hive> CREATE DATABASE test_hive_db;
OK
Time taken: 0.048 seconds

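By default, the CREATE DATABASE command creates a directory named <database_name>.db for the database in HDFS under the Hive warehouse directory, /user/hive/warehouse (the location is controlled by the hive.metastore.warehouse.dir setting).
This can be verified using the DESCRIBE DATABASE command.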

Syntax:

DESCRIBE DATABASE <database_name>;

E.g.

hive> DESCRIBE DATABASE test_hive_db;
OK
test_hive_db hdfs://localhost:54310/user/hive/warehouse/test_hive_db.db
Time taken: 0.042 seconds, Fetched: 1 row(s)
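
If the default location is not suitable, CREATE DATABASE also accepts a few optional clauses. The sketch below is only an illustration: the database name (staging_db), the HDFS path (/data/staging), and the property key (owner_team) are hypothetical placeholders that do not appear elsewhere in this post, so substitute your own.

```sql
-- Hypothetical example: staging_db, /data/staging and owner_team are placeholders.
-- IF NOT EXISTS avoids an error when the database is already present.
CREATE DATABASE IF NOT EXISTS staging_db
COMMENT 'Scratch area for staging loads'
LOCATION '/data/staging'
WITH DBPROPERTIES ('owner_team' = 'analytics');

-- EXTENDED also prints the DBPROPERTIES set above.
DESCRIBE DATABASE EXTENDED staging_db;
```

The LOCATION clause overrides the warehouse default for this one database only; other databases keep using /user/hive/warehouse.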

Using a database

To switch to a database, use the USE command. Tables referenced without a database prefix are then resolved against that database.

Syntax:

USE <database_name>;

E.g.

hive> USE test_hive_db;
OK
Time taken: 0.045 seconds
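
Once several databases exist, it is easy to lose track of which one is active. The following is a minimal sketch for checking, assuming a reasonably recent Hive release (current_database() is only available from Hive 0.13 onward) and the test_hive_db database created above.

```sql
-- List all databases known to the metastore.
SHOW DATABASES;

-- Switch to the database created earlier in this post.
USE test_hive_db;

-- Print the database that unqualified table names now resolve against.
SELECT current_database();

-- Optional, Hive CLI only: show the current database in the prompt.
SET hive.cli.print.current.db=true;
```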

Dropping a database

To drop a database, use the DROP DATABASE command.

Syntax:

DROP DATABASE <database_name>;

E.g.

hive> DROP DATABASE test_hive_db;
OK
Time taken: 0.233 seconds

By default (RESTRICT behavior), DROP DATABASE fails if the database still contains tables. To drop a database together with its tables, add the CASCADE keyword to the DROP DATABASE command.

Syntax:

DROP DATABASE <database_name> CASCADE;
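
For example, the sketch below removes the database used in this post together with any tables it contains. The IF EXISTS clause is an optional addition, not part of the syntax shown above, that suppresses the error when the database is already gone.

```sql
-- Drops test_hive_db and all of its tables in one statement.
-- Caution: for managed tables this also deletes the table data.
DROP DATABASE IF EXISTS test_hive_db CASCADE;
```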

In the next post, we will be creating tables with data and performing some basic queries on them.
