Title:
Installation of HADOOP in PSEUDO-DISTRIBUTED MODE.
Objective:
Pseudo-distributed mode is a mode of operation of Hadoop in which all daemons run on a single node (a node is your machine).
Requirements:
Software Requirements:
- Oracle Virtual Box
- Ubuntu Desktop OS (64bit)
- Hadoop-3.3.4
- OpenJdk version-8
- SSH
Hardware Requirements:
- Minimum RAM required: 4GB (Suggested: 8GB)
- Minimum Free Disk Space: 25GB
- Minimum Processor i3 or above
Analysis:
Pseudo-distributed mode stands between standalone mode and the fully distributed mode of a production-level cluster. It is used to simulate an actual cluster: it emulates a two-node setup (a master and a slave) by running each daemon in its own JVM process, giving you a fully fledged test environment. HDFS is used for storage, taking some portion of your disk space, and YARN needs to run to manage resources on this Hadoop installation.
Installation Procedure In UBUNTU:
1. Open a terminal
2. sudo apt update
Install OpenSSH on Ubuntu
3. sudo apt install openssh-server openssh-client -y
Enable Passwordless SSH for Hadoop User
4. ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
/* Use the cat command to store the public key as authorized_keys in the .ssh directory */
5. cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
/* Set the permissions for your user with the chmod command */
6. chmod 0600 ~/.ssh/authorized_keys
/* The user is now able to SSH without needing to enter a password every time */
7. ssh localhost
8. logout
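As an optional check that passwordless login really works, you can run a non-interactive SSH command; BatchMode disables password prompts, so this fails outright if key-based authentication is not set up correctly:

```shell
# Should print "ok" without ever asking for a password
ssh -o BatchMode=yes localhost echo ok
```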
Java Installation
9. sudo apt install openjdk-8-jdk
(JAVA_HOME: /usr/lib/jvm/java-8-openjdk-amd64)
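After installing the JDK, it is worth verifying the version and the actual installation path before pointing JAVA_HOME at it (the path below is the usual one on 64-bit Ubuntu; it may differ on other architectures):

```shell
# Print the installed Java version (should report 1.8.x)
java -version
# Resolve the real path of the java binary to confirm the JAVA_HOME directory
readlink -f "$(which java)"
```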
Installation of HADOOP
Download the Hadoop release from hadoop.apache.org
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
/* file extraction */
10. tar -zxvf hadoop-3.3.4.tar.gz
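Before (or after) extracting, you can optionally verify the integrity of the download against the SHA-512 checksum that Apache publishes next to the archive on the same mirror; compare the two values by eye:

```shell
# Fetch the published checksum file from the same mirror
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz.sha512
# Compute the local checksum and compare it with the published one
sha512sum hadoop-3.3.4.tar.gz
cat hadoop-3.3.4.tar.gz.sha512
```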
/* creation of the Hadoop home directory */
11. sudo mkdir /usr/lib/hadoop3
/* change ownership of the Hadoop home directory to your user */
12. sudo chown username /usr/lib/hadoop3
/* Move extracted file to hadoop home directory */
13. sudo mv hadoop-3.3.4/* /usr/lib/hadoop3
14. cd /usr/lib/hadoop3
cd /home/username (change directory to your Ubuntu home)
pwd (print working directory)
Setting of paths
15. sudo gedit ~/.bashrc
(set HADOOP_PREFIX and JAVA_HOME at the end of the file; note there must be no space around the = sign)
export HADOOP_PREFIX=/usr/lib/hadoop3
export PATH=$PATH:$HADOOP_PREFIX/bin
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
/* start a fresh bash shell */
16. exec bash
/* reload the bash profile */
17. source ~/.bashrc
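Once the profile has been reloaded, a quick sanity check confirms that the variables are active and that the hadoop binary is on the PATH:

```shell
echo "$HADOOP_PREFIX"   # expected: /usr/lib/hadoop3
echo "$JAVA_HOME"       # expected: /usr/lib/jvm/java-8-openjdk-amd64
hadoop version          # should report the installed release, e.g. 3.3.4
```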
18. cd /usr/lib/hadoop3/etc/hadoop
19. ls (find hadoop-env.sh)
20. sudo gedit hadoop-env.sh (optional)
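Although step 20 is marked optional, explicitly setting JAVA_HOME inside hadoop-env.sh is recommended, because the Hadoop scripts do not always inherit it from your login shell. Add (or uncomment) this line in the file:

```shell
# In /usr/lib/hadoop3/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```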
Edit the XML files in hadoop
21. cd /usr/lib/hadoop3/etc/hadoop
Configure All the Following files by using gedit
sudo gedit core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
sudo gedit hdfs-site.xml
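The minimal hdfs-site.xml for pseudo-distributed operation, as given in the official Apache single-node setup guide, sets the block replication factor to 1, since there is only a single DataNode:

```xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
```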
sudo gedit mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
sudo gedit yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
RUNNING of HADOOP SERVICES
22. hdfs namenode -format
/* starting the DFS services - it starts the NameNode, DataNode, and Secondary NameNode */
23. $HADOOP_PREFIX/sbin/start-dfs.sh
/* starting the YARN services - it starts the ResourceManager and NodeManager */
24. $HADOOP_PREFIX/sbin/start-yarn.sh
25. jps
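If everything started correctly, jps should list one JVM per daemon (the process IDs will differ on your machine):

```shell
jps
# Typical daemons in pseudo-distributed mode, one JVM each:
#   NameNode
#   DataNode
#   SecondaryNameNode
#   ResourceManager
#   NodeManager
#   Jps
```

You can also browse the NameNode web UI at http://localhost:9870 and the YARN ResourceManager UI at http://localhost:8088, the default ports in Hadoop 3.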
Limitations:
Pseudo-distributed mode is a partial distribution mode of Hadoop: it runs on a single node (a node is your machine), where the HDFS and YARN services run in individual JVMs but reside on the same system.
Conclusion:
The Hadoop daemons run on a local machine, thus simulating a cluster on a small scale. Different Hadoop daemons run in different JVM instances, but on a single machine, and HDFS is used instead of the local file system. For a pseudo-distributed setup, you need to configure at least the following four files, along with JAVA_HOME: core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.