Working with Hadoop Streaming

Hadoop streaming is a utility that comes with the Hadoop distribution. The utility allows you to create and run map/reduce jobs with any executable or script as the mapper and/or the reducer. For example:

shell> $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar  -input myInputDirs -output myOutputDir -mapper /bin/cat -reducer /bin/wc

If you using the tar package from Apache Hadoop. You can find the hadoop-streaming.jar in $HADOOP_HOME/contrib/streaming/hadoop-streaming-xxx.jar

Leave a Reply