
Saturday 25 March 2023

Hadoop Pipes

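Hadoop Pipes allows C++ code to use HDFS (the Hadoop Distributed File System) and MapReduce. The application-specific C++ code runs as a separate process. In many ways the approach is similar to Hadoop Streaming, but instead of passing text over standard input and output, it uses Writable serialization to convert keys and values into bytes that are sent to the process over a socket.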
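To make the Writable serialization idea concrete, here is a minimal sketch of the variable-length integer format used by Hadoop's WritableUtils.writeVLong/readVLong, with strings written as a length followed by the raw bytes. The Pipes C++ library serializes ints and strings in a comparable way, but the function names below (writeVLong, readVLong, writeString, readString) are hypothetical, and treating this as byte-for-byte identical to the Pipes wire protocol is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helpers modeled on Hadoop's WritableUtils.writeVLong/readVLong.
// Values in [-112, 127] fit in one byte; anything else gets a marker byte
// (encoding sign and length) followed by big-endian magnitude bytes.
void writeVLong(std::vector<uint8_t>& out, int64_t value) {
  if (value >= -112 && value <= 127) {
    out.push_back(static_cast<uint8_t>(value));
    return;
  }
  int len = -112;
  uint64_t magnitude = static_cast<uint64_t>(value);
  if (value < 0) {
    magnitude = ~magnitude;  // one's complement, as Hadoop does with i ^= -1L
    len = -120;
  }
  for (uint64_t tmp = magnitude; tmp != 0; tmp >>= 8) --len;  // one step per byte
  out.push_back(static_cast<uint8_t>(len));
  int numBytes = (len < -120) ? -(len + 120) : -(len + 112);
  for (int idx = numBytes; idx > 0; --idx) {
    out.push_back(static_cast<uint8_t>((magnitude >> ((idx - 1) * 8)) & 0xFF));
  }
}

// Reads one vlong starting at pos and advances pos past it.
int64_t readVLong(const std::vector<uint8_t>& in, std::size_t& pos) {
  if (pos >= in.size()) throw std::out_of_range("truncated vlong");
  int first = static_cast<int8_t>(in[pos++]);  // marker byte as a signed value
  if (first >= -112) return first;             // single-byte value
  bool negative = first < -120;
  std::size_t numBytes =
      static_cast<std::size_t>(negative ? -(first + 120) : -(first + 112));
  if (in.size() - pos < numBytes) throw std::out_of_range("truncated vlong");
  uint64_t magnitude = 0;
  for (std::size_t i = 0; i < numBytes; ++i) magnitude = (magnitude << 8) | in[pos++];
  return static_cast<int64_t>(negative ? ~magnitude : magnitude);
}

// Strings: vlong byte length, then the raw bytes.
void writeString(std::vector<uint8_t>& out, const std::string& s) {
  writeVLong(out, static_cast<int64_t>(s.size()));
  out.insert(out.end(), s.begin(), s.end());
}

std::string readString(const std::vector<uint8_t>& in, std::size_t& pos) {
  int64_t len = readVLong(in, pos);
  if (len < 0 || in.size() - pos < static_cast<std::size_t>(len))
    throw std::out_of_range("bad string length");
  std::string s(reinterpret_cast<const char*>(in.data()) + pos,
                static_cast<std::size_t>(len));
  pos += static_cast<std::size_t>(len);
  return s;
}
```

Under this scheme a key such as "hello" followed by the count 42 takes seven bytes: one length byte, five characters, and one byte for the count, which keeps the traffic on the socket compact for small keys and values.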


The class org.apache.hadoop.mapred.pipes.Submitter has a public static method for submitting a job described by a JobConf, and a main method that takes an application and an optional configuration file, input directories, and output directory. The bin/hadoop pipes command calls this main method and accepts the following options:
bin/hadoop pipes \
 [-input inputDir] \
 [-output outputDir] \
 [-jar applicationJarFile] \
 [-inputformat class] \
 [-map class] \
 [-partitioner class] \
 [-reduce class] \
 [-writer class] \
 [-program program url] \
 [-conf configuration file] \
 [-D property=value] \
 [-fs local|namenode:port] \
 [-jt local|jobtracker:port] \
 [-files comma separated list of files] \
 [-libjars comma separated list of jars] \
 [-archives comma separated list of archives]
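The -program option names the C++ executable. Such a program typically subclasses HadoopPipes::Mapper and HadoopPipes::Reducer from hadoop/Pipes.hh and hands a HadoopPipes::TemplateFactory to HadoopPipes::runTask, which needs the Hadoop C++ libraries to build. So that the sketch below compiles on its own, its MapContext and ReduceContext are hypothetical simplified stand-ins that only mimic the shape of the real context interfaces (getInputValue, nextValue, emit), and the hypothetical runLocal function replaces the framework with an in-memory sort-and-group step.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins that only mimic the shape of HadoopPipes'
// context classes; they are NOT the real hadoop/Pipes.hh API.
struct MapContext {
  std::string inputValue;                                   // one input line
  std::vector<std::pair<std::string, std::string>> output;  // emitted pairs
  const std::string& getInputValue() const { return inputValue; }
  void emit(const std::string& k, const std::string& v) { output.emplace_back(k, v); }
};

struct ReduceContext {
  std::string key;
  std::vector<std::string> values;
  std::size_t next = 0;  // index of the next value to hand out
  std::string current;
  std::vector<std::pair<std::string, std::string>> output;
  const std::string& getInputKey() const { return key; }
  bool nextValue() {
    if (next >= values.size()) return false;
    current = values[next++];
    return true;
  }
  const std::string& getInputValue() const { return current; }
  void emit(const std::string& k, const std::string& v) { output.emplace_back(k, v); }
};

// Word-count logic written the way a Pipes mapper/reducer usually is:
// keys and values are strings, and numbers travel as text.
void wordCountMap(MapContext& ctx) {
  std::istringstream words(ctx.getInputValue());
  std::string word;
  while (words >> word) ctx.emit(word, "1");
}

void wordCountReduce(ReduceContext& ctx) {
  long sum = 0;
  while (ctx.nextValue()) sum += std::stol(ctx.getInputValue());
  ctx.emit(ctx.getInputKey(), std::to_string(sum));
}

// In-memory stand-in for the framework: map every line, group by key
// (std::map keeps keys sorted, like the shuffle), then reduce each group.
std::map<std::string, std::string> runLocal(const std::vector<std::string>& lines) {
  std::map<std::string, std::vector<std::string>> grouped;
  for (const auto& line : lines) {
    MapContext mctx;
    mctx.inputValue = line;
    wordCountMap(mctx);
    for (auto& kv : mctx.output) grouped[kv.first].push_back(kv.second);
  }
  std::map<std::string, std::string> result;
  for (auto& group : grouped) {
    ReduceContext rctx;
    rctx.key = group.first;
    rctx.values = group.second;
    wordCountReduce(rctx);
    for (auto& kv : rctx.output) result[kv.first] = kv.second;
  }
  return result;
}
```

With the real library, wordCountMap and wordCountReduce would become the map and reduce member functions of the Mapper and Reducer subclasses, and the compiled binary would typically be copied to HDFS and referenced with -program.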

Hadoop Pipes has generic Java classes for handling the mapper and reducer (PipesMapRunner and PipesReducer). They fork off the application program and communicate with it over a socket; the communication is handled by the C++ wrapper library on one side and by PipesMapRunner and PipesReducer on the other.
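The process model described above, a Java parent that forks the application and exchanges records with it over a socket, can be sketched with POSIX primitives. This is only an illustration: for simplicity it uses socketpair, the newline-terminated text messages are a hypothetical stand-in for the binary Pipes protocol, and the helpers writeAll, readLine, and roundTrip are our own names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: write a whole string to a file descriptor.
static bool writeAll(int fd, const std::string& data) {
  std::size_t done = 0;
  while (done < data.size()) {
    ssize_t n = write(fd, data.data() + done, data.size() - done);
    if (n <= 0) return false;
    done += static_cast<std::size_t>(n);
  }
  return true;
}

// Hypothetical helper: read one '\n'-terminated line (without the '\n').
static bool readLine(int fd, std::string& line) {
  line.clear();
  char c;
  while (true) {
    ssize_t n = read(fd, &c, 1);
    if (n <= 0) return false;  // EOF or error
    if (c == '\n') return true;
    line.push_back(c);
  }
}

// The parent plays the Java side (sends a record, collects the result);
// the forked child plays the C++ application (here it upper-cases records).
std::string roundTrip(const std::string& record) {
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return "";
  pid_t pid = fork();
  if (pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return "";
  }
  if (pid == 0) {  // child: the "application program"
    close(fds[0]);
    std::string in;
    while (readLine(fds[1], in)) {
      for (char& ch : in) ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
      writeAll(fds[1], in + "\n");
    }
    close(fds[1]);
    _exit(0);
  }
  close(fds[1]);  // parent keeps only its own end
  std::string reply;
  if (writeAll(fds[0], record + "\n")) readLine(fds[0], reply);
  close(fds[0]);  // child now reads EOF and exits
  int status = 0;
  waitpid(pid, &status, 0);
  return reply;
}
```

The real framework streams many records per connection and adds commands for setup, progress, and completion, but the shape is the same: one long-lived child process per task, fed over a single socket.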
