Running a MapReduce WordCount Example in STANDALONE MODE.
Software Requirements:
Oracle Virtual Box
Ubuntu Desktop OS (64bit)
Hadoop-3.1.0
OpenJdk version-8
Minimum RAM required: 4GB (Suggested: 8GB)
Minimum Free Disk Space: 25GB
Minimum Processor: i3 or above
The following diagram summarizes the flow of the MapReduce algorithm:
1. The input data is divided into n chunks, depending on the amount of data and the
processing capacity of each individual unit.
2. Each chunk is then passed to a mapper function. All chunks are processed at the same
time, which is what makes the processing parallel.
3. After that, shuffling takes place: mapper outputs that share the same key are grouped together.
4. Finally, the reducers combine each group to produce a consolidated output according to the job's logic.
5. The algorithm scales well: as the size of the input data grows, we can keep increasing the number of parallel processing units.
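The steps above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Hadoop's implementation: the function names (split_into_chunks, mapper, shuffle, reducer, word_count) are made up for this sketch, the chunks are processed one after another rather than in parallel, and the sample text is the input.txt used later in this post.

```python
from collections import defaultdict

# Sample text copied from the input.txt used in this post.
TEXT = "welcome to hadoop\nclass hadoop is\ngood hadoop is\nbad"


def split_into_chunks(text, n):
    """Step 1: divide the input lines into at most n roughly equal chunks."""
    lines = text.splitlines()
    size = max(1, -(-len(lines) // n))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def mapper(chunk):
    """Step 2: emit a (word, 1) pair for every word in the chunk."""
    return [(word, 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Step 3: group the emitted values by key (the word)."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for word, one in pairs:
            groups[word].append(one)
    return groups


def reducer(word, values):
    """Step 4: combine all values for one key into a single count."""
    return word, sum(values)


def word_count(text, n_chunks=2):
    """Run the four steps in order and return {word: count}, sorted by word."""
    chunks = split_into_chunks(text, n_chunks)
    # Sequential here; a real cluster would run the mappers in parallel.
    mapped = [mapper(chunk) for chunk in chunks]
    grouped = shuffle(mapped)
    return dict(reducer(w, v) for w, v in sorted(grouped.items()))


if __name__ == "__main__":
    for word, count in word_count(TEXT).items():
        print(word, count)
```

Changing n_chunks does not change the result, only how the work is divided. That is the property point 5 relies on when more processing units are added.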
Installation steps:
bda@bda-VirtualBox:~$ ls
Desktop Downloads hadoop-3.3.4 Music output1 Public Templates
Documents examples.desktop hadoop-3.3.4.tar.gz out2 Pictures sam.txt Videos
bda@bda-VirtualBox:~$ cd /usr/lib/hadoop3/
bda@bda-VirtualBox:/usr/lib/hadoop3$ ls
bin etc include lib libexec LICENSE-binary licenses-binary LICENSE.txt NOTICE-binary NOTICE.txt README.txt sbin share
bda@bda-VirtualBox:/usr/lib/hadoop3$ cd bin/
bda@bda-VirtualBox:/usr/lib/hadoop3/bin$ ls
container-executor hadoop hadoop.cmd hdfs hdfs.cmd mapred mapred.cmd oom-listener test-container-executor yarn yarn.cmd
bda@bda-VirtualBox:~$ ls
input.txt
bda@bda-VirtualBox:~/mapin$ cat input.txt
welcome to hadoop
class hadoop is
good hadoop is
bad
bda@bda-VirtualBox:~$ ls
Desktop Downloads hadoop-3.3.4 mapin Music Public Templates
Documents examples.desktop hadoop-3.3.4.tar.gz mapout Pictures sam.txt Videos
part-r-00000 _SUCCESS
3 hadoop
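The part-r-00000 file listed above holds the result of the job. As a rough sketch, the following Python reads such a file and picks out the most frequent words. It assumes each line has the form word, a tab, then the count, which is the usual text output of the WordCount example. The helper names are hypothetical, and the sample lines were worked out by hand from input.txt rather than captured from a real run.

```python
def parse_wordcount_output(lines):
    """Parse 'word<TAB>count' lines into a list of (word, count) tuples."""
    results = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        word, count = line.rsplit("\t", 1)
        results.append((word, int(count)))
    return results


def most_frequent(results, k=3):
    """Return the k entries with the highest counts (ties broken alphabetically)."""
    return sorted(results, key=lambda wc: (-wc[1], wc[0]))[:k]


# Illustrative lines, worked out by hand from the input.txt shown earlier;
# they are NOT captured from a real Hadoop run.
SAMPLE = [
    "bad\t1\n", "class\t1\n", "good\t1\n", "hadoop\t3\n",
    "is\t2\n", "to\t1\n", "welcome\t1\n",
]

if __name__ == "__main__":
    # With a real job you would instead iterate over the output file,
    # e.g. open("mapout/part-r-00000") (path assumed from the listing above).
    for word, count in most_frequent(parse_wordcount_output(SAMPLE)):
        print(word, count)
```

Keeping the parsing separate from the ranking lets the same function serve any other check on the output, such as comparing two runs.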
Output:
Limitations:
Standalone mode is the default mode of operation of Hadoop, and it runs on a single node (a node is your machine). HDFS and YARN do not run in standalone mode.
Conclusion:
Standalone mode is the default mode of operation of the Hadoop ecosystem, in which the Hadoop services run in a single JVM. As this experiment shows, a basic Java installation and extraction of the Hadoop files are sufficient to run the Hadoop services and the MapReduce WordCount program.