I have a problem where I need to chain
Mapper >> Reducer >> Reducer
This is my data:
And finally I want something like this:
Dept1 >> Total_Salary_Dept_1
One major issue is that my first reducer is not getting called when I use multiple files as input.
The second issue is that I can't pass that output to the next reducer (ChainReducer can't chain two reducers).
I was using this as a reference but quickly realized it won't help.
I found this link where, in one of the comments, the author says: "In Hadoop 2.X series, internally you can chain mappers before reducer with ChainMapper and chain Mappers after reducer with ChainReducer."
Does this mean I will have a structure like this:
Chain Mapper(mapper 1) --> Chain Reducer(reducer 1) --> ChainMapper(unnecessary mapper) --> Chain Reducer(reducer 2)
If so, how exactly is the data handed off from Reducer 1 to Mapper 2?
Can someone help me out?
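As far as I can tell from the Hadoop 2.x chain API, ChainMapper and ChainReducer compose one job of the form [MAP+ / REDUCE MAP*]: one or more mappers, a single reducer, then optionally more mappers that run inside the reduce task. The plain-Java model below (no Hadoop dependency; ChainHandoffSketch and runJob are hypothetical names) sketches how the hand-off from Reducer 1 to a post-reduce mapper works under that reading: each record the reducer writes is passed straight to the next mapper, one at a time, with no second shuffle, so nothing downstream can regroup values by key the way a second reducer would.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Plain-Java model (no Hadoop classes) of one job shaped like [MAP / REDUCE MAP].
// All names are hypothetical; output.add(...) stands in for Context.write().
class ChainHandoffSketch {

    static List<String> runJob(List<String> lines) {
        // Map phase + shuffle: "key,value" lines are grouped by key, keys sorted.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            grouped.computeIfAbsent(parts[0], k -> new ArrayList<>())
                   .add(Integer.parseInt(parts[1].trim()));
        }

        List<String> output = new ArrayList<>();
        // Post-reduce mapper (the role of a mapper added with ChainReducer.addMapper):
        // called once per record the reducer writes, with no second shuffle, so it
        // never sees values grouped by key. Here it simply swaps key and value.
        BiConsumer<String, Integer> postReduceMapper =
                (key, value) -> output.add(value + "\t" + key);

        // Reduce phase: called once per key with all of that key's values.
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            postReduceMapper.accept(e.getKey(), sum); // the hand-off, record by record
        }
        return output;
    }

    public static void main(String[] args) {
        runJob(Arrays.asList("d1,10", "d2,5", "d1,7")).forEach(System.out::println);
    }
}
```

If that reading holds, a second real reduction needs a second job that takes the first job's output directory as its input.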
This is my code so far.
package Aggregate;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import org.apache.commons.io.FileUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;
import org.apache.hadoop.mapreduce.lib.chain.ChainReducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.InverseMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Sales extends Configured implements Tool {

    // Emits (first CSV column, second CSV column) for every line of both input files.
    public static class CollectionMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] vals = value.toString().split(",");
            context.write(new Text(vals[0]), new Text(vals[1]));
        }
    }

    // Collects the values for one key and emits the first two as (key, value).
    // Note: Hadoop does not guarantee the order in which the values arrive.
    public static class DeptSalaryJoiner extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            ArrayList<String> DeptSal = new ArrayList<>();
            for (Text val : values) {
                DeptSal.add(val.toString());
            }
            context.write(new Text(DeptSal.get(0)), new Text(DeptSal.get(1)));
        }
    }

    // Sums all salaries received for one department.
    public static class SalaryAggregator extends Reducer<Text, Text, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            int totalSal = 0;
            for (Text val : values) {
                totalSal += Integer.parseInt(val.toString());
            }
            context.write(key, new IntWritable(totalSal));
        }
    }

    public static void main(String[] args) throws Exception {
        int exitFlag = ToolRunner.run(new Sales(), args);
        System.exit(exitFlag);
    }

    @Override
    public int run(String[] args) throws Exception {
        String input1 = "./emp.csv";
        String input2 = "./dept.csv";
        String output = "./DeptAggregate";

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Sales");
        job.setJarByClass(Sales.class);

        Configuration mapConf = new Configuration(false);
        ChainMapper.addMapper(job, CollectionMapper.class, LongWritable.class, Text.class, Text.class, Text.class, mapConf);

        Configuration reduce1Conf = new Configuration(false);
        ChainReducer.setReducer(job, DeptSalaryJoiner.class, Text.class, Text.class, Text.class, Text.class, reduce1Conf);

        // Second reducer in the same chain (the second issue described above).
        Configuration reduce2Conf = new Configuration(false);
        ChainReducer.setReducer(job, SalaryAggregator.class, Text.class, Text.class, Text.class, IntWritable.class, reduce2Conf);

        FileInputFormat.addInputPath(job, new Path(input1));
        FileInputFormat.addInputPath(job, new Path(input2));

        // Remove any previous output directory so the job can recreate it.
        try {
            File f = new File(output);
            FileUtils.deleteDirectory(f);
        } catch (Exception e) {
            e.printStackTrace();
        }
        FileOutputFormat.setOutputPath(job, new Path(output));
        return job.waitForCompletion(true) ? 0 : 1;
    }
}
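One way to get Dept >> Total_Salary is two separate jobs: job 1 joins the two files on their shared first column and writes (dept, salary) lines, and job 2 reads job 1's output and sums per department. In a Hadoop driver that would mean waiting for job1.waitForCompletion(true), then passing job 1's output path to FileInputFormat.addInputPath for job 2. The sketch below models that pipeline in plain Java, assuming a hypothetical layout of emp.csv as empId,salary and dept.csv as empId,dept, since the actual data isn't shown. It also tags each value with its source file (the "S:" and "D:" prefixes are made up for this sketch), because a reducer cannot rely on the order in which values arrive, which is what DeptSalaryJoiner's get(0)/get(1) assumes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of two chained MapReduce jobs (no Hadoop classes).
// Assumed, hypothetical input layout: emp.csv = "empId,salary", dept.csv = "empId,dept".
class TwoJobSalarySketch {

    // Shuffle stand-in: group (key, value) pairs by key, keys sorted.
    static Map<String, List<String>> shuffle(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : pairs) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return grouped;
    }

    // Job 1 (join): key = empId; values are tagged with their source file.
    static List<String> joinJob(List<String> empLines, List<String> deptLines) {
        List<String[]> mapped = new ArrayList<>();
        for (String line : empLines) {
            String[] v = line.split(",");
            mapped.add(new String[] {v[0], "S:" + v[1]}); // hypothetical tag: salary
        }
        for (String line : deptLines) {
            String[] v = line.split(",");
            mapped.add(new String[] {v[0], "D:" + v[1]}); // hypothetical tag: department
        }
        List<String> out = new ArrayList<>(); // stands in for job 1's output directory
        for (List<String> values : shuffle(mapped).values()) {
            String dept = null;
            String salary = null;
            for (String tagged : values) {
                if (tagged.startsWith("D:")) {
                    dept = tagged.substring(2);
                } else {
                    salary = tagged.substring(2);
                }
            }
            if (dept != null && salary != null) {
                out.add(dept + "\t" + salary); // one "dept<TAB>salary" line per employee
            }
        }
        return out;
    }

    // Job 2 (aggregate): reads job 1's output lines and sums salaries per department.
    static List<String> aggregateJob(List<String> joinedLines) {
        List<String[]> mapped = new ArrayList<>();
        for (String line : joinedLines) {
            mapped.add(line.split("\t"));
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : shuffle(mapped).entrySet()) {
            int total = 0;
            for (String s : e.getValue()) {
                total += Integer.parseInt(s);
            }
            out.add(e.getKey() + "\t" + total);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> emp = Arrays.asList("1,100", "2,200", "3,50");
        List<String> dept = Arrays.asList("1,Dept1", "2,Dept2", "3,Dept1");
        aggregateJob(joinJob(emp, dept)).forEach(System.out::println);
    }
}
```

Job 2's mapper here only re-splits job 1's tab-separated text output, so in Hadoop it could plausibly read that output with KeyValueTextInputFormat, whose default separator is a tab.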
I have successfully installed Hadoop 3.0.0 in standalone mode on Ubuntu 16.04.
I created a jar from the following code, taken from the Apache Hadoop tutorial.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WDCount {

  // Emits (word, 1) for every whitespace-separated token of a line.
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Sums the counts for each word.
  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WDCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
Creating WDCount.jar was successful, with no errors.
Then I created Input and Output folders, made a text file containing a phrase, and saved it as fileo1.txt in the Input folder.
I used this command to run WDCount.jar with Hadoop:
/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/Wordcount/WDCount.jar /usr/local/hadoop/share/hadoop/mapreduce/Wordcount/Input /usr/local/hadoop/share/hadoop/mapreduce/Wordcount/Output
When I run it, I get this message:
Exception in thread "main" java.lang.ClassNotFoundException: /usr/local/hadoop/share/hadoop/mapreduce/Wordcount/Input
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.util.RunJar.run(RunJar.java:232)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
Can anyone tell me what is wrong?
Include the name of the class that contains the main method after the jar name:
/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/Wordcount/WDCount.jar WDCount /usr/local/hadoop/share/hadoop/mapreduce/Wordcount/Input /usr/local/hadoop/share/hadoop/mapreduce/Wordcount/Output
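The stack trace shows RunJar treating the Input path as a class name. Below is a simplified, hypothetical model of the rule behind that (RunJarSketch and resolve are made-up names, not the actual RunJar source): if the jar's manifest declares a Main-Class, that class is run and every remaining argument is passed to it; otherwise the first argument after the jar path is taken as the class name.

```java
import java.util.Arrays;

// Simplified, hypothetical model of how "hadoop jar <jar> [mainClass] args..."
// picks the class to run. This is NOT the real RunJar source.
class RunJarSketch {

    // Returns {mainClass, program args...}; manifestMainClass may be null.
    static String[] resolve(String manifestMainClass, String[] argsAfterJar) {
        if (manifestMainClass != null) {
            // Manifest wins: every argument after the jar goes to main().
            String[] out = new String[argsAfterJar.length + 1];
            out[0] = manifestMainClass;
            System.arraycopy(argsAfterJar, 0, out, 1, argsAfterJar.length);
            return out;
        }
        if (argsAfterJar.length == 0) {
            throw new IllegalArgumentException("no Main-Class in manifest and no class name given");
        }
        // No Main-Class in the manifest: the next argument must be the class name.
        return argsAfterJar;
    }

    public static void main(String[] args) {
        String in = "/usr/local/hadoop/share/hadoop/mapreduce/Wordcount/Input";
        String out = "/usr/local/hadoop/share/hadoop/mapreduce/Wordcount/Output";
        // Original command: the Input path lands in the class-name slot.
        System.out.println(Arrays.toString(resolve(null, new String[] {in, out})));
        // Fixed command: WDCount is the class, Input/Output are program arguments.
        System.out.println(Arrays.toString(resolve(null, new String[] {"WDCount", in, out})));
    }
}
```

Under this model, the class name should be given only when the manifest lacks Main-Class; if the manifest does set it, an extra WDCount would be passed through as args[0]. Separately, FileOutputFormat refuses to start a job whose output directory already exists, so the pre-created Output folder will likely need to be removed before the job can run.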
I am puzzled by this issue while parsing the input file in the Mapper. My code is pretty simple: each input line is a "::"-separated record.
For example (input):
1::Toy Story (1995)::2077
This is the mapper snippet I normally use:
String tokens[]= value.toString().split("::");
int empId = Integer.parseInt(tokens[0]) ;
int count = Integer.parseInt(tokens[2]) ;
The line should split as follows:
tokens[0] = 1
tokens[1] = Toy Story (1995)
tokens[2] = 2077
So if I only parse tokens[0] and tokens[2], why is the job picking up tokens[1] and throwing the NumberFormatException below? That exception would be expected if I tried to parse non-numeric text as an int. Could you please help me with this?
17/09/05 19:06:49 INFO mapreduce.Job: Task Id : attempt_1500305785265_0095_m_000000_2, Status : FAILED
Error: java.lang.NumberFormatException: For input string: "1::Toy Story (1995)::2077"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at com.dataflair.comparableTest.ValueSortExp$MapTask.map(ValueSortExp.java:93)
at com.dataflair.comparableTest.ValueSortExp$MapTask.map(ValueSortExp.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.IntWritable.Comparator;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Mapper.Context;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class ValueSortExp2 {

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(true);
    String arguments[] = new GenericOptionsParser(conf, args).getRemainingArgs();
    Job job = Job.getInstance(conf, "Test commond");

    // Setup MapReduce
    job.setJarByClass(ValueSortExp2.class);
    job.setMapperClass(MapTask.class);
    job.setReducerClass(ReduceTask.class);
    // Specify key / value
    job.setOutputKeyClass(IntWritable.class);
    job.setOutputValueClass(IntWritable.class);
    // Input
    FileInputFormat.addInputPath(job, new Path(arguments[0]));
    job.setInputFormatClass(TextInputFormat.class);
    // Output
    FileOutputFormat.setOutputPath(job, new Path(arguments[1]));
    job.setOutputFormatClass(TextOutputFormat.class);

    /*
     * // Delete output if exists FileSystem hdfs = FileSystem.get(conf); if
     * (hdfs.exists(outputDir)) hdfs.delete(outputDir, true);
     * // Execute job int code = job.waitForCompletion(true) ? 0 : 1;
     * System.exit(code);
     */

    // Execute job
    int code = job.waitForCompletion(true) ? 0 : 1;
    System.exit(code);
  }

  /*public static class IntComparator extends WritableComparator {
    public IntComparator() {
      super(IntWritable.class);
    }

    // Decodes both serialized ints and sorts them in descending order.
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
      Integer v1 = ByteBuffer.wrap(b1, s1, l1).getInt();
      Integer v2 = ByteBuffer.wrap(b2, s2, l2).getInt();
      return v1.compareTo(v2) * (-1);
    }
  }*/

  // Parses "id::title::count" and emits (count, id).
  public static class MapTask extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
      String tokens[] = value.toString().split("::");
      int empId = Integer.parseInt(tokens[0]);
      int count = Integer.parseInt(tokens[2]);
      context.write(new IntWritable(count), new IntWritable(empId));
    }
  }

  // Writes every (count, id) pair through unchanged, in key order.
  public static class ReduceTask extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
    @Override
    public void reduce(IntWritable key, Iterable<IntWritable> list, Context context)
        throws java.io.IOException, InterruptedException {
      for (IntWritable value : list) {
        context.write(key, value);
      }
    }
  }
}
Sample input:
1::Toy Story (1995)::2077
10::GoldenEye (1995)::888
100::City Hall (1996)::128
1000::Curdled (1996)::20
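The exception message quotes the entire line, "1::Toy Story (1995)::2077", so the value passed to parseInt was the whole record: split("::") did not split the line the task actually processed, and tokens[1] is never parsed. The stack trace also names ValueSortExp$MapTask (ValueSortExp.java:93) while the posted class is ValueSortExp2, so it may be worth checking that the jar being run contains the posted code and that the input really uses "::". The plain-Java check below (SplitCheck and parse are hypothetical names) runs the mapper's parsing outside Hadoop: with "::" it parses fine, and with a delimiter that does not occur in the line (a tab here, purely as an example) it reproduces the same message.

```java
// Plain-Java check (no Hadoop) of the parsing done in MapTask.map().
// SplitCheck and parse() are hypothetical names used only for this sketch.
class SplitCheck {

    // Same steps as the mapper: split, then parse tokens[0] and tokens[2].
    // Returns {count, id}, i.e. the (key, value) the mapper would write.
    static int[] parse(String line, String delimiterRegex) {
        String[] tokens = line.split(delimiterRegex);
        int id = Integer.parseInt(tokens[0]);
        int count = Integer.parseInt(tokens[2]);
        return new int[] {count, id};
    }

    public static void main(String[] args) {
        String line = "1::Toy Story (1995)::2077";
        int[] ok = parse(line, "::");
        System.out.println(ok[0] + " " + ok[1]); // 2077 1
        try {
            // A delimiter that never occurs leaves the whole line in tokens[0].
            parse(line, "\t");
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage()); // For input string: "1::Toy Story (1995)::2077"
        }
    }
}
```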
I'm trying to use the basic word count as defined here. When IntSumReducer calls context.write, is it possible to pass that output to a second reducer (or an output class) that reduces the final list produced by IntSumReducer down to the single largest frequency?
I am quite new to Hadoop/MapReduce and to the concept of jobs in Java, so I'm not sure how I would need to modify the default WordCount to make that possible. Could I write a second Reducer and place it inside the same job? How would I do that, and how would I signal that another reducer should run after IntSumReducer?
Base WordCount:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Emits (word, 1) for every whitespace-separated token of a line.
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Sums the counts for each word.
  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
What you're looking for is called a Combiner in Hadoop, which performs a partial reduction on each mapper's output before it is sent to the final reducer class. For more info on it, click here.
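A combiner runs on the output of each map task (Hadoop may apply it zero or more times), so it shrinks the per-word counts sent to the reducers but never sees every word's final total. Reducing the list to the single largest frequency still needs one step that sees all totals, for example a single reducer that tracks the maximum and writes it in cleanup(), or a second job over the word-count output. The plain-Java model below (no Hadoop; MaxWordSketch and its methods are hypothetical names) walks through those stages on two tiny input splits.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model (no Hadoop) of word count with a combiner, followed by a
// "largest frequency" step. All names are hypothetical.
class MaxWordSketch {

    // Map + combine for ONE input split: counts are pre-summed locally, per word.
    static Map<String, Integer> mapAndCombine(String split) {
        Map<String, Integer> partial = new HashMap<>();
        for (String word : split.split("\\s+")) {
            if (!word.isEmpty()) {
                partial.merge(word, 1, Integer::sum);
            }
        }
        return partial;
    }

    // Reduce: sums the partial counts of every split for each word.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> p : partials) {
            p.forEach((w, c) -> totals.merge(w, c, Integer::sum));
        }
        return totals;
    }

    // Global max: needs ALL reduced counts (one reducer + cleanup(), or a second job).
    static Map.Entry<String, Integer> largest(Map<String, Integer> totals) {
        Map.Entry<String, Integer> best = null;
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            if (best == null || e.getValue() > best.getValue()) {
                best = e;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> partials = Arrays.asList(
                mapAndCombine("the cat sat on the mat"),
                mapAndCombine("the dog sat"));
        Map.Entry<String, Integer> top = largest(reduce(partials));
        System.out.println(top.getKey() + " " + top.getValue()); // the 3
    }
}
```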
I am learning Hadoop and tried running my MapReduce program. All map and reduce tasks complete fine, but the reducer writes the mapper output into the output file unchanged, which suggests that the reduce function is not invoked at all. My sample input looks like this:
and the expected output is:
1 a,b,c
2 s,d
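The sample input isn't shown, so the line format used below ("1,a") is an assumption. KeyValueTextInputFormat splits each line at the first occurrence of a separator byte, a tab by default (configurable in Hadoop 2.x and later through mapreduce.input.keyvaluelinerecordreader.key.value.separator); if a line has no separator, the whole line becomes the key and the value is empty. Every distinct line then forms its own group, so a concatenating reducer would write out something that looks just like the mapper output. The plain-Java helper below (splitKeyValue is hypothetical, not Hadoop API) mimics that rule.

```java
// Plain-Java model of how KeyValueTextInputFormat turns a line into (key, value).
// splitKeyValue is a hypothetical helper, not Hadoop API.
class KeyValueSplitSketch {

    static String[] splitKeyValue(String line, char separator) {
        int pos = line.indexOf(separator); // first occurrence only
        if (pos < 0) {
            return new String[] {line, ""}; // no separator: key = whole line, value = ""
        }
        return new String[] {line.substring(0, pos), line.substring(pos + 1)};
    }

    public static void main(String[] args) {
        // Default separator is a tab; the assumed line "1,a" contains none.
        String[] kv = splitKeyValue("1,a", '\t');
        System.out.println("key=[" + kv[0] + "] value=[" + kv[1] + "]"); // key=[1,a] value=[]
        kv = splitKeyValue("1,a", ',');
        System.out.println("key=[" + kv[0] + "] value=[" + kv[1] + "]"); // key=[1] value=[a]
    }
}
```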
Below is my program.
package patentcitation;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyJob
{
    public static class Mymapper extends Mapper <Text, Text, Text, Text>
    {
        public void map (Text key, Text value, Context context) throws IOException, InterruptedException
        {
            context.write(key, value);
        }
    }

    public static class Myreducer extends Reducer<Text,Text,Text,Text>
    {
        StringBuilder str = new StringBuilder();

        public void reduce(Text key, Iterable<Text> value, Context context) throws IOException, InterruptedException
        {
            for(Text x : value)
            {
                if(str.length() > 0)
                    str.append(",");
                str.append(x.toString());
            }
            context.write(key, new Text(str.toString()));
        }
    }

    public static void main(String args[]) throws IOException, ClassNotFoundException, InterruptedException
    {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "PatentCitation");
        FileSystem fs = FileSystem.get(conf);
        job.setJarByClass(MyJob.class);
        job.setMapperClass(Mymapper.class);
        job.setReducerClass(Myreducer.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job,new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        if(fs.exists(new Path(args[1]))){
            //If exist delete the output path
            fs.delete(new Path(args[1]),true);
        }
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
The same question was asked here, and I changed my reduce function to take an Iterable, as the answer in that thread suggested, but that doesn't fix the issue. I cannot comment there because my reputation is too low, so I created this new thread.
Please help me find what I am doing wrong.
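You have made a few mistakes in your program:
In the driver, the following statement should be called before instantiating the Job class (Job.getInstance(conf) works on a copy of the Configuration, so settings added to conf afterwards never reach the job):
In the reducer, you should create the StringBuilder inside the reduce() method; as an instance field it is shared by every key the reducer processes, so values from earlier keys carry over.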
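Hadoop's Reducer.run() calls reduce() repeatedly on one reducer instance for all the keys of a reduce task, which is why a StringBuilder held in a field keeps growing. The plain-Java model below (no Hadoop; BuilderScopeSketch and the nested class names are hypothetical) feeds the groups 1: a,b,c and 2: s,d to a single instance of each variant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model (no Hadoop) of the StringBuilder-scope bug.
// One reducer instance handles every key, as in Hadoop's Reducer.run().
class BuilderScopeSketch {

    // Like the original Myreducer: the builder is an instance field.
    static class FieldBuilderReducer {
        private final StringBuilder str = new StringBuilder();

        String reduce(String key, List<String> values) {
            for (String x : values) {
                if (str.length() > 0) str.append(",");
                str.append(x);
            }
            return key + "\t" + str;
        }
    }

    // Like the corrected Myreducer: a fresh builder for each key.
    static class LocalBuilderReducer {
        String reduce(String key, List<String> values) {
            StringBuilder str = new StringBuilder();
            for (String x : values) {
                if (str.length() > 0) str.append(",");
                str.append(x);
            }
            return key + "\t" + str;
        }
    }

    // Feeds the groups 1 -> [a, b, c] and 2 -> [s, d] to a single reducer instance.
    static List<String> runAll(boolean builderIsField) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        groups.put("1", Arrays.asList("a", "b", "c"));
        groups.put("2", Arrays.asList("s", "d"));
        FieldBuilderReducer fieldReducer = new FieldBuilderReducer();
        LocalBuilderReducer localReducer = new LocalBuilderReducer();
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            out.add(builderIsField
                    ? fieldReducer.reduce(e.getKey(), e.getValue())
                    : localReducer.reduce(e.getKey(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runAll(true));  // key 2 gets a,b,c,s,d
        System.out.println(runAll(false)); // key 2 gets s,d
    }
}
```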
I modified your code as shown below and got this output:
E:\hdp\hadoop-\bin>hadoop fs -cat /out/part-r-00000
1 c,b,a
2 d,s
Modified code:
package patentcitation;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;

public class MyJob
{
    public static class Mymapper extends Mapper <Text, Text, Text, Text>
    {
        public void map(Text key, Text value, Context context) throws IOException, InterruptedException
        {
            context.write(key, value);
        }
    }

    public static class Myreducer extends Reducer<Text,Text,Text,Text>
    {
        public void reduce(Text key, Iterable<Text> value, Context context) throws IOException, InterruptedException
        {
            // A new builder per key, so values from earlier keys do not carry over.
            StringBuilder str = new StringBuilder();
            for(Text x : value)
            {
                if(str.length() > 0)
                    str.append(",");
                str.append(x.toString());
            }
            context.write(key, new Text(str.toString()));
        }
    }

    public static void main(String args[]) throws IOException, ClassNotFoundException, InterruptedException
    {
        Configuration conf = new Configuration();
        // Per the first point above: configuration statements must be applied to conf
        // here, before Job.getInstance(conf) copies it.
        Job job = Job.getInstance(conf, "PatentCitation");
        FileSystem fs = FileSystem.get(conf);
        job.setJarByClass(MyJob.class);
        job.setMapperClass(Mymapper.class);
        job.setReducerClass(Myreducer.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job,new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        /*if(fs.exists(new Path(args[1]))){
            //If exist delete the output path
            fs.delete(new Path(args[1]),true);
        }*/
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
I get the following error when I execute my alphabet count program.
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1014)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at com.example.AlphabetCount$Map.map(AlphabetCount.java:40)
Command used to run: ./bin/hadoop jar /home/ubuntu/Documents/AlphabetCount.jar input output
I checked the first eight results when I googled the error message and implemented their advice, yet the error still appears. Can you help me, please?
package com.example;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AlphabetCount {

  // Emits ("a", number of 'a'/'A' characters) for every input line.
  public static class Map1 extends
      Mapper<LongWritable, Text, Text, IntWritable> {
    private Text alphabet = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String line = value.toString();
      byte[] byteArray = line.getBytes();
      int sum = 0;
      for (int i = 0; i < byteArray.length; i++) {
        if ((byteArray[i] == 'a') || (byteArray[i] == 'A')) {
          sum += 1;
        }
      }
      alphabet.set("a");
      context.write(alphabet, new IntWritable(sum));
    }
  }

  // Sums the per-line counts. Iterable (not Iterator) is required for this
  // method to override Reducer.reduce(); @Override lets the compiler check it.
  public static class Reduce1 extends
      Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values,
        Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable value : values) {
        sum += value.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws IOException,
      InterruptedException, ClassNotFoundException {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf);
    job.setJarByClass(AlphabetCount.class);
    job.setMapperClass(Map1.class);
    job.setReducerClass(Reduce1.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
Update (solution): the above code works! I was getting the error because the jar I was executing was not the one I was rebuilding with the above code. I had initially exported the jar (with the erroneous code) from Eclipse to location x, then kept updating the code in location y while still executing the old jar from location x.
Try specifying your input and output format classes in your main method, and also change the input key type of your Mapper. You should have something similar to this:
public class AlphabetCount {

  public static class Map1 extends
      Mapper<Text, Text, Text, IntWritable> {
    public void map(Text key, Text value, Context context)
        throws IOException, InterruptedException {
      // ...
    }
  }

  public static class Reduce1 extends
      Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> value,
        Context context) throws IOException, InterruptedException {
      // ...
    }
  }

  public static void main(String[] args) throws IOException,
      InterruptedException, ClassNotFoundException {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf);
    job.setInputFormatClass(KeyValueTextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    // ...
  }
}
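A reduce() declared with Iterator<IntWritable> instead of Iterable<IntWritable> still compiles, but it overloads rather than overrides Reducer.reduce(), so Hadoop runs the default identity reduce and every value passes through unchanged. Likewise, a map() that does not override Mapper.map() (or a mapper class that is never registered) falls back to the identity mapper, whose LongWritable keys can produce exactly the "Type mismatch in key from map" error above. Adding @Override makes the compiler catch a wrong signature. The plain-Java model below shows the overload pitfall; BaseReducer, runKey and the other names are hypothetical stand-ins, not Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java model of the Iterator-vs-Iterable pitfall. BaseReducer is a
// hypothetical stand-in for Hadoop's Reducer, whose default reduce() is identity.
class OverrideSketch {

    static class BaseReducer {
        final List<String> written = new ArrayList<>();

        // Default behaviour: one output record per input value (identity).
        void reduce(String key, Iterable<Integer> values) {
            for (Integer v : values) {
                written.add(key + "=" + v);
            }
        }

        // The "framework" always calls the Iterable signature.
        List<String> runKey(String key, List<Integer> values) {
            reduce(key, values);
            return written;
        }
    }

    // Iterator parameter: an OVERLOAD, so runKey() never calls it.
    static class IteratorReducer extends BaseReducer {
        void reduce(String key, Iterator<Integer> values) {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next();
            }
            written.add(key + "=" + sum);
        }
    }

    // Iterable parameter: a real override, checked by @Override.
    static class IterableReducer extends BaseReducer {
        @Override
        void reduce(String key, Iterable<Integer> values) {
            int sum = 0;
            for (int v : values) {
                sum += v;
            }
            written.add(key + "=" + sum);
        }
    }

    public static void main(String[] args) {
        List<Integer> counts = Arrays.asList(2, 3);
        System.out.println(new IteratorReducer().runKey("a", counts)); // [a=2, a=3]
        System.out.println(new IterableReducer().runKey("a", counts)); // [a=5]
    }
}
```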