Monday, 17 May 2021

HIVE Installation on top of Hadoop

Apache Hive is an enterprise data warehouse system used to query, manage, and analyze data stored in HDFS (the Hadoop Distributed File System).
 
The Hive Query Language (HiveQL) facilitates queries in a Hive command-line interface shell. Hadoop can use HiveQL as a bridge to communicate with relational database management systems and perform tasks based on SQL-like commands. 
 
To configure Apache Hive, first you need to download and unzip Hive. Then you need to customize the following files and settings:
  • Edit .bashrc file
  • Edit hive-config.sh file
  • Create Hive directories in HDFS
  • Configure hive-site.xml file
  • Initiate Derby database 

Step 1: Download and Untar Hive  

Download the compressed Hive files using the wget command:

wget https://downloads.apache.org/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz

Once the download process is complete, untar the compressed Hive package:

tar xzf apache-hive-3.1.2-bin.tar.gz

Step 2: Configure Hive Environment Variables (bashrc)

Edit the .bashrc shell configuration file using nano:

hduser@rk-virtual-machine:~$ nano .bashrc
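
The source does not list the variables themselves. A minimal sketch of the lines to append to .bashrc, assuming Hive was unpacked into the home directory as in Step 1 (adjust the path if yours differs):

```shell
# Assumption: the tarball from Step 1 was extracted in the home directory.
export HIVE_HOME="$HOME/apache-hive-3.1.2-bin"
# Put the Hive launcher scripts (hive, schematool) on the PATH.
export PATH="$PATH:$HIVE_HOME/bin"
```

With these in place, commands such as schematool and hive can be run from any directory.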
 

Save and exit the .bashrc file once you have added the Hive variables, then apply the changes to the current shell:

hduser@rk-virtual-machine:~$ source ~/.bashrc 

Step 3: Edit hive-config.sh file

hduser@rk-virtual-machine:~$ cd apache-hive-3.1.2-bin/
hduser@rk-virtual-machine:~/apache-hive-3.1.2-bin$ cd bin
hduser@rk-virtual-machine:~/apache-hive-3.1.2-bin/bin$ ls
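
The edit itself is not shown in the source. A common approach, sketched here as an assumption, is to tell Hive where Hadoop lives by adding a HADOOP_HOME line to hive-config.sh. The fallback path $HOME/hadoop is hypothetical; replace it with your real Hadoop directory.

```shell
# Keep HADOOP_HOME if it is already set; otherwise fall back to a
# hypothetical location. Replace $HOME/hadoop with your Hadoop directory.
export HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop}"
```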

 

Step 4: Create Hive Directories in HDFS

Create two separate directories to store data in the HDFS layer:

  1. The temporary tmp directory stores the intermediate results of Hive processes.
  2. The warehouse directory stores the Hive-related tables.

1. Create tmp Directory

Create a tmp directory within the HDFS storage layer. This directory is going to store the intermediary data Hive sends to the HDFS:

hdfs dfs -mkdir /tmp

Add write permission for group members on the tmp directory (g+w grants write access only):

hdfs dfs -chmod g+w /tmp

Check if the permissions were added correctly:

hdfs dfs -ls /

The output should show a w in the group field of the permission string for /tmp, confirming that group members can now write to it.


2. Create warehouse Directory

Create the warehouse directory within the /user/hive/ parent directory:

hdfs dfs -mkdir -p /user/hive/warehouse

Add write permission for group members on the warehouse directory:

hdfs dfs -chmod g+w /user/hive/warehouse

Check if the permissions were added correctly:

hdfs dfs -ls /user/hive

The output should show a w in the group field of the permission string for the warehouse directory.
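
The directory steps above can be collected into one repeatable function. This is a sketch, not part of the original walkthrough: the function name create_hive_dirs is made up for illustration, and it uses mkdir -p so that it also succeeds when /tmp already exists.

```shell
# Create the HDFS directories Hive needs and grant group write access.
# -p makes mkdir succeed even when the directory already exists.
create_hive_dirs() {
  for dir in /tmp /user/hive/warehouse; do
    hdfs dfs -mkdir -p "$dir" || return 1
    hdfs dfs -chmod g+w "$dir" || return 1
  done
  # -d lists the directories themselves so their permissions can be checked.
  hdfs dfs -ls -d /tmp /user/hive/warehouse
}
```

Run create_hive_dirs once HDFS is up; it stops at the first command that fails.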

Step 5: Configure hive-site.xml File (Optional)

Use the following command to locate the correct file:

hduser@rk-virtual-machine:~/apache-hive-3.1.2-bin$ cd conf/

Use the hive-default.xml.template to create the hive-site.xml file:

cp hive-default.xml.template hive-site.xml

Access the hive-site.xml file using the nano text editor:

sudo nano hive-site.xml

Add the following properties at the beginning of the file, immediately after the opening <configuration> tag:
  <property>
    <name>system:java.io.tmpdir</name>
    <value>/tmp/hive/java</value>
  </property>
  <property>
    <name>system:user.name</name>
    <value>${user.name}</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=$HIVE_HOME/metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore </description>
  </property>
 
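Alternatively, add only the following two properties and keep the default connection URL from the template:

  <property>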
    <name>system:java.io.tmpdir</name>
    <value>/tmp/hive/java</value>
  </property>
  <property>
    <name>system:user.name</name>
    <value>${user.name}</value>
  </property>
 
Then add the following property in the middle of the file:
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
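
After editing, it can help to confirm that the properties actually landed in the file. A small sketch follows; the helper name has_property is hypothetical, and it only checks that the name element is present, not that the XML is valid.

```shell
# Succeed if the given property name appears in a Hadoop-style XML config.
# -F treats the pattern as a fixed string, so dots are matched literally.
has_property() {
  grep -qF "<name>$2</name>" "$1"
}
```

For example, has_property hive-site.xml system:user.name && echo found can be run from the conf directory.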

Note: If the schematool -initSchema -dbType derby command fails, first replace the guava jar as described in the guava section below. If it still fails, open hive-site.xml and go to the line named in the error message, usually around line 3224. In Hive 3.1.2 the culprit is typically an illegal character entity (&#8;) inside a description element. Delete it, save the file, and run the command again:

schematool -initSchema -dbType derby
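
To find the offending spot in hive-site.xml without scrolling, the line can be printed from the shell. This is a sketch with hypothetical helper names (show_line, find_char_refs); it assumes the problem is a numeric character reference, which should be confirmed against the actual error message.

```shell
# Print one line of a file by its number, e.g. the row from the parser error.
show_line() {
  sed -n "${2}p" "$1"
}

# List every line that contains a numeric character reference such as &#8;
# so that each one can be inspected by hand.
find_char_refs() {
  grep -n '&#[0-9][0-9]*;' "$1"
}
```

For example, show_line hive-site.xml 3224 prints the line mentioned in the note, and find_char_refs hive-site.xml lists candidate lines with their numbers.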

Step 6: Initiate Derby Database

Initiate the Derby database, from the Hive bin directory using the schematool command:

hduser@rk-virtual-machine:~/apache-hive-3.1.2-bin/bin$ schematool -initSchema -dbType derby

The process can take a few moments to complete.

The schematool command has initiated the Derby database.

Derby is the default metadata store for Hive. If you plan to use a different database solution, such as MySQL or PostgreSQL, you can specify a database type in the hive-site.xml file.

How to Fix guava Incompatibility Error in Hive

If the Derby database does not initiate successfully, you might receive an error with the following content:

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V

This error indicates that there is most likely an incompatibility issue between Hadoop and Hive guava versions.

Locate the guava jar file in the Hive lib directory:

ls $HIVE_HOME/lib


Locate the guava jar file in the Hadoop lib directory as well:

ls $HADOOP_HOME/share/hadoop/hdfs/lib


The two listed versions are not compatible and are causing the error. Remove the existing guava file from the Hive lib directory:

rm $HIVE_HOME/lib/guava-19.0.jar

Copy the guava file from the Hadoop lib directory to the Hive lib directory:

cp $HADOOP_HOME/share/hadoop/hdfs/lib/guava-27.0-jre.jar $HIVE_HOME/lib/
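
The remove-and-copy steps above hard-code the jar versions. A sketch that works with whatever guava versions are present is shown here; the function name sync_guava is made up for illustration, and it checks that Hadoop's jar exists before deleting anything.

```shell
# Replace Hive's guava jar with the one shipped by Hadoop.
# $1 = Hive lib directory, $2 = Hadoop lib directory.
sync_guava() {
  # Bail out before deleting anything if Hadoop's guava jar is missing.
  ls "$2"/guava-*.jar >/dev/null 2>&1 || return 1
  rm -f "$1"/guava-*.jar
  cp "$2"/guava-*.jar "$1"/
}
```

With the paths from this section, the call would be sync_guava "$HIVE_HOME/lib" "$HADOOP_HOME/share/hadoop/hdfs/lib".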

Use the schematool command once again to initiate the Derby database:

$HIVE_HOME/bin/schematool -initSchema -dbType derby

Launch Hive Client Shell on Ubuntu

Start the Hive command-line interface using the following commands:

cd $HIVE_HOME/bin
hive

You are now able to issue SQL-like commands and directly interact with HDFS.
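
For a quick non-interactive check that the installation works, the hive command accepts a query string through its -e option. The wrapper below is only a sketch; the name hive_smoke_test is hypothetical.

```shell
# Run one trivial query and exit, or explain why hive cannot be started.
hive_smoke_test() {
  if ! command -v hive >/dev/null 2>&1; then
    echo "hive not found on PATH; check HIVE_HOME in .bashrc" >&2
    return 127
  fi
  hive -e 'SHOW DATABASES;'
}
```

On a fresh installation the query should list at least the default database.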


HIVE Installation Screenshots on top of Hadoop 

  


How to Fix guava Incompatibility Error in Hive:

If the error still occurs, edit the hive-site.xml file created in the conf directory in Step 5. In nano, press Ctrl+W and then Ctrl+T to jump to a line number, and go to line 3224.


Launch Hive Client Shell on Ubuntu:

If an error occurs when launching Hive, it is usually because the Hive shell was started before the metastore_db schema had been initialized. To fix this, delete or move the existing metastore_db directory and run the commands below:

$ mv metastore_db metastore_db.tmp

$ schematool -dbType derby -initSchema

$ ./bin/hive
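
The move-and-reinitialize steps can be wrapped so that the old database is kept under a timestamped name rather than overwritten on a second attempt. This is a sketch; the function name reset_metastore and the .bak naming are assumptions, not part of the original steps.

```shell
# Move an existing metastore_db aside, then recreate the Derby schema.
# Run from the directory that contains metastore_db.
reset_metastore() {
  if [ -d metastore_db ]; then
    # Keep the old database under a timestamped name instead of deleting it.
    mv metastore_db "metastore_db.$(date +%Y%m%d%H%M%S).bak" || return 1
  fi
  schematool -dbType derby -initSchema
}
```

After reset_metastore succeeds, start the shell again with ./bin/hive.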

 

 

