Hadoop Reducer Not Writing anything despite writing to context - java

I run the exported jar as a MapReduce job on Hadoop, and 0 bytes are written to the output file.
LOGS
2022-10-22 21:38:19,004 INFO mapreduce.Job: map 100% reduce 100%
2022-10-22 21:38:19,012 INFO mapreduce.Job: Job job_1666492742770_0009 completed successfully
2022-10-22 21:38:19,159 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=6
FILE: Number of bytes written=1134025
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=446009085
HDFS: Number of bytes written=0
HDFS: Number of read operations=17
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=4
Launched reduce tasks=1
Rack-local map tasks=4
Total time spent by all maps in occupied slots (ms)=38622
Total time spent by all reduces in occupied slots (ms)=6317
Total time spent by all map tasks (ms)=38622
Total time spent by all reduce tasks (ms)=6317
Total vcore-milliseconds taken by all map tasks=38622
Total vcore-milliseconds taken by all reduce tasks=6317
Total megabyte-milliseconds taken by all map tasks=39548928
Total megabyte-milliseconds taken by all reduce tasks=6468608
Map-Reduce Framework
Map input records=3208607
Map output records=0
Map output bytes=0
Map output materialized bytes=24
Input split bytes=424
Combine input records=0
Combine output records=0
Reduce input groups=0
Reduce shuffle bytes=24
Reduce input records=0
Reduce output records=0
Spilled Records=0
Shuffled Maps =4
Failed Shuffles=0
Merged Map outputs=4
GC time elapsed (ms)=505
CPU time spent (ms)=9339
Physical memory (bytes) snapshot=2058481664
Virtual memory (bytes) snapshot=2935365632
Total committed heap usage (bytes)=1875378176
Peak Map Physical memory (bytes)=501469184
Peak Map Virtual memory (bytes)=643743744
Peak Reduce Physical memory (bytes)=206155776
Peak Reduce Virtual memory (bytes)=384512000
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=446008661
File Output Format Counters
Bytes Written=0
Any help is appreciated!
Map Function:
public void map(LongWritable arg0, Text Value, Context context) throws IOException, InterruptedException {
    String line = Value.toString();
    if (line.length() == 0 && !line.contains("MAX")) {
        String date = line.substring(14, 21);
        float temp_Max;
        float temp_Min;
        try {
            temp_Max = Float.parseFloat(line.substring(104, 108).trim());
        } catch (NumberFormatException e) {
            temp_Max = Float.parseFloat(line.substring(104, 107).trim());
        }
        try {
            temp_Min = Float.parseFloat(line.substring(112, 117).trim());
        } catch (NumberFormatException e) {
            temp_Min = Float.parseFloat(line.substring(112, 116).trim());
        }
        if (temp_Max > 35.0) {
            context.write(new Text("Hot Day" + date), new FloatWritable(temp_Max));
        }
        if (temp_Min < 10) {
            context.write(new Text("Cold Day" + date), new FloatWritable(temp_Min));
        }
    }
}
Reducer Function:
public static class MaxMinTemperatureReducer extends Reducer<Text, Text, Text, FloatWritable> {
    FloatWritable res = new FloatWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
        float sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        res.set(sum);
        LogManager lgmngr = LogManager.getLogManager();
        // lgmngr now contains a reference to the log manager.
        Logger log = lgmngr.getLogger(Logger.GLOBAL_LOGGER_NAME);
        // Getting the global application-level logger
        // from the Java Log Manager.
        log.log(Level.INFO, "LOL_PLS_WORK", res.toString());
        context.write(key, res);
    }
}
Main:
Configuration conf = new Configuration();
Job job = Job.getInstance(conf,"weather example");
job.setJarByClass(MyMaxMin.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setMapperClass(MaxMinTemperatureMapper.class);
job.setReducerClass(MaxMinTemperatureReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
Path OutputPath = new Path(args[1]);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
OutputPath.getFileSystem(conf).delete(OutputPath, true);
System.exit(job.waitForCompletion(true) ? 0 : 1);

As per your mapper code:
public void map(LongWritable arg0, Text Value, Context context) throws IOException, InterruptedException {
String line = Value.toString();
if(line.length() == 0 && !line.contains("MAX")) {
With line.length() == 0 you are discarding any input line that isn't blank. You want line.length() != 0.
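For reference, a minimal sketch of the corrected guard (the rest of the map method from the question stays unchanged):

// Process only non-empty lines that are not the MAX header line.
if (line.length() != 0 && !line.contains("MAX")) {
    String date = line.substring(14, 21);
    // ... parse temp_Max / temp_Min and call context.write exactly as in the question ...
}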

Related

Map reduce job executes but does not produce output

Looking for some help please.
The MapReduce job executes but no output is produced. It is a simple program to count the total number of words in a file. I started very simply, to make sure it works, with a txt file that has one row with the following content:
tiny country second largest country second tiny food exporter second
second second
Unfortunately it does not work; any suggestion about where to look next would be appreciated. I have cut and pasted the last part of the output log.
File System Counters
FILE: Number of bytes read=890
FILE: Number of bytes written=947710
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=1
Map output records=1
Map output bytes=87
Map output materialized bytes=95
Input split bytes=198
Combine input records=0
Combine output records=0
Reduce input groups=1
Reduce shuffle bytes=95
Reduce input records=1
Reduce output records=1
Spilled Records=2
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=7
Total committed heap usage (bytes)=468713472
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=82
File Output Format Counters
Bytes Written=97
Process finished with exit code 0
public class Map extends Mapper<LongWritable, Text, Text,
IntWritable>{
@Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String line = value.toString();
String[] datas = line.split("\t");
for(String data: datas) {
Text outputKey = new Text(data);
IntWritable outputValue = new IntWritable();
context.write(outputKey, outputValue);
}
}
}
public class Reduce extends Reducer<Text, IntWritable, Text,
IntWritable> {
@Override
public void reduce(final Text outputKey,
final Iterable<IntWritable> values,
final Context context)
throws IOException, InterruptedException {
int sum = 0;
for(IntWritable value : values)
{
sum += value.get();
}
context.write(outputKey, new IntWritable(sum));
}
}
public class Main extends Configured implements Tool {
@Override
public int run(String[] args) throws Exception {
Job job = Job.getInstance(getConf());
job.setJobName("WordCount");
job.setJarByClass(Main.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
Path inputFilePath = new Path("/Users/francesco/input/input.txt");
Path outputFilePath = new Path("/Users/francesco/output/first");
FileInputFormat.addInputPath(job, inputFilePath);
FileOutputFormat.setOutputPath(job, outputFilePath);
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String[] args) throws Exception{
int exitCode = ToolRunner.run(new Main(), args);
System.exit(exitCode);
}
}
You don't set any IntWritable value to emit in your mapper:
IntWritable outputValue = new IntWritable();
You need to replace it with:
IntWritable outputValue = new IntWritable(1);
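With that one change, a minimal sketch of the mapper (class name and the tab split kept from the question) would be:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] datas = line.split("\t");
        for (String data : datas) {
            // Emit a count of 1 per token; a bare new IntWritable() defaults to 0,
            // so every reduced sum would otherwise stay 0.
            context.write(new Text(data), new IntWritable(1));
        }
    }
}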

WordCount job in Cloudera is successful but output of reducer is the same as output of mapper

This program is written in Cloudera. This is the driver class I have created.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class WordCount2
{
public static void main(String[] args) throws Exception
{
if(args.length < 2)
{
System.out.println("Enter input and output path correctly ");
System.exit(-1);//exit if error occurs
}
Configuration conf = new Configuration();
@SuppressWarnings("deprecation")
Job job = new Job(conf,"WordCount2");
//Define MapReduce job
//
//job.setJobName("WordCount2");// job name created
job.setJarByClass(WordCount2.class); //Jar file will be created
//Set input/ouptput paths
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job,new Path(args[1]));
//Set input/output Format
job.setInputFormatClass(TextInputFormat.class);// input format is of TextInput Type
job.setOutputFormatClass(TextOutputFormat.class); // output format is of TextOutputType
//set Mapper and Reducer class
job.setMapperClass(WordMapper.class);
job.setReducerClass(WordReducer.class);
//Set output key-value types
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
//submit job
System.exit(job.waitForCompletion(true)?0:1);// If job is completed exit successfully, else throw error
}
}
Below is the code for Mapper class.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Mapper;
public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
@Override
public void map(LongWritable key, Text value,Context context)
throws IOException, InterruptedException
{
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while(tokenizer.hasMoreTokens())
{
String word= tokenizer.nextToken();
context.write(new Text(word), new IntWritable(1));
}
}
}
//----------Reducer Class-----------
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class WordReducer extends Reducer <Text,IntWritable,Text,IntWritable>
{
public void reduce(Text key,Iterator<IntWritable> values,Context context)
throws IOException, InterruptedException
{
int sum = 0;
while(values.hasNext())
{
sum += values.next().get();
}
context.write(key, new IntWritable(sum));
}
}
Below are the command-line logs:
[cloudera@quickstart workspace]$ hadoop jar wordcount2.jar WordCount2 /user/training/soni.txt /user/training/sonioutput2
18/04/23 07:17:23 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/04/23 07:17:24 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/04/23 07:17:25 INFO input.FileInputFormat: Total input paths to process : 1
18/04/23 07:17:25 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:952)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:690)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:879)
18/04/23 07:17:26 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:952)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:690)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:879)
18/04/23 07:17:26 INFO mapreduce.JobSubmitter: number of splits:1
18/04/23 07:17:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1523897572171_0005
18/04/23 07:17:27 INFO impl.YarnClientImpl: Submitted application application_1523897572171_0005
18/04/23 07:17:27 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1523897572171_0005/
18/04/23 07:17:27 INFO mapreduce.Job: Running job: job_1523897572171_0005
18/04/23 07:17:45 INFO mapreduce.Job: Job job_1523897572171_0005 running in uber mode : false
18/04/23 07:17:45 INFO mapreduce.Job: map 0% reduce 0%
18/04/23 07:18:01 INFO mapreduce.Job: map 100% reduce 0%
18/04/23 07:18:16 INFO mapreduce.Job: map 100% reduce 100%
18/04/23 07:18:17 INFO mapreduce.Job: Job job_1523897572171_0005 completed successfully
18/04/23 07:18:17 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=310
FILE: Number of bytes written=251053
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=250
HDFS: Number of bytes written=188
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=14346
Total time spent by all reduces in occupied slots (ms)=12546
Total time spent by all map tasks (ms)=14346
Total time spent by all reduce tasks (ms)=12546
Total vcore-milliseconds taken by all map tasks=14346
Total vcore-milliseconds taken by all reduce tasks=12546
Total megabyte-milliseconds taken by all map tasks=14690304
Total megabyte-milliseconds taken by all reduce tasks=12847104
Map-Reduce Framework
Map input records=7
Map output records=29
Map output bytes=246
Map output materialized bytes=310
Input split bytes=119
Combine input records=0
Combine output records=0
Reduce input groups=19
Reduce shuffle bytes=310
Reduce input records=29
Reduce output records=29
Spilled Records=58
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=1095
CPU time spent (ms)=4680
Physical memory (bytes) snapshot=407855104
Virtual memory (bytes) snapshot=3016044544
Total committed heap usage (bytes)=354553856
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=131
File Output Format Counters
Bytes Written=188
[cloudera@quickstart workspace]$
Below is the input data present in the input file soni.txt:
Hi How are you
I am fine
What about you
What are you doing these days
How is your job going
How is your family
My family is great
The following output is received in the part-r-00000 file:
family 1
family 1
fine 1
going 1
great 1
is 1
is 1
is 1
job 1
these 1
you 1
you 1
you 1
your 1
your 1
But I don't think this is the correct output; it should give the exact count of each word.
Your reduce method signature is wrong, so it never overrides the reduce method of the Reducer base class and is therefore never called; the default implementation just passes each key/value pair through, which is why the output matches the mapper output. You need to override this one from the Reducer class:
protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context) throws IOException, InterruptedException;
It is an Iterable, not an Iterator.
Try this:
@Override
protected void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int sum = 0;
for (IntWritable value : values) {
sum += value.get();
}
context.write(key, new IntWritable(sum));
}
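For completeness, a sketch of the whole reducer class with the corrected signature (names taken from the question); the @Override annotation makes the compiler reject a mismatched signature instead of silently falling back to the default reduce:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();   // add up the 1s emitted by the mapper
        }
        context.write(key, new IntWritable(sum));
    }
}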

Reducer is not being called in hadoop mapreduce job

I have two mapper classes that simply create key-value pairs; my main logic is supposed to be in the reducer. I am trying to compare data from two different text files.
My mapper class is
public static class Map extends
Mapper<LongWritable, Text, Text, Text> {
private String ky,vl="a";
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
String tokens[] = line.split("\t");
vl = tokens[1].trim();
ky = tokens[2].trim();
//sending key-value pairs to the reducer
context.write(new Text(ky),new Text(vl));
}
}
My second mapper is
public static class Map2 extends
Mapper<LongWritable, Text, Text, Text> {
private String ky2,vl2 = "a";
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
String tokens[] = line.split("\t");
vl2 = tokens[1].trim();
ky2 = tokens[2].trim();
//sending key-value pairs to the reducer
context.write(new Text(ky2),new Text(vl2));
}
}
Reducer class is
public static class Reduce extends
Reducer<Text, Text, Text, Text> {
private String rslt = "l";
public void reduce(Text key, Iterator<Text> values,Context context) throws IOException, InterruptedException {
int count = 0;
while(values.hasNext()){
count++;
}
rslt = Integer.toString(count);
if(count>1){
context.write(key,new Text(rslt));
}
}
}
And my main method is
Configuration conf = new Configuration();
Job job = new Job(conf);
job.setJarByClass(CompareTwoFiles.class);
job.setJobName("Compare Two Files and Identify the Difference");
FileOutputFormat.setOutputPath(job, new Path(args[2]));
job.setReducerClass(Reduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
MultipleInputs.addInputPath(job, new Path(args[0]),
TextInputFormat.class, Map.class);
MultipleInputs.addInputPath(job, new Path(args[1]),
TextInputFormat.class, Map2.class);
job.waitForCompletion(true);
Output:
File System Counters
FILE: Number of bytes read=361621
FILE: Number of bytes written=1501806
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=552085
HDFS: Number of bytes written=150962
HDFS: Number of read operations=28
HDFS: Number of large read operations=0
HDFS: Number of write operations=5
Map-Reduce Framework
Map input records=10783
Map output records=10783
Map output bytes=150962
Map output materialized bytes=172540
Input split bytes=507
Combine input records=0
Combine output records=0
Reduce input groups=7985
Reduce shuffle bytes=172540
Reduce input records=10783
Reduce output records=10783
Spilled Records=21566
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=12
Total committed heap usage (bytes)=928514048
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=150962

SequenceFile is not created in hadoop

I am writing a MapReduce job to test some calculations. I split my input across the maps so that each map does part of the calculation; the result will be a list of (X,y) pairs that I want to flush into a SequenceFile.
The map part goes well, but when the Reducer kicks in I get this error: Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://172.16.199.132:9000/user/hduser/FractalJob_1452257628594_410365359/out/reduce-out.
Another observation is that this error appears only when I use more than one map.
UPDATE: Here is my Mapper and Reducer code.
public static class RasterMapper extends Mapper<IntWritable, IntWritable, IntWritable, IntWritable> {
private int imageS;
private static Complex mapConstant;
@Override
public void setup(Context context) throws IOException {
imageS = context.getConfiguration().getInt("image.size", -1);
mapConstant = new Complex(context.getConfiguration().getDouble("constant.re", -1),
context.getConfiguration().getDouble("constant.im", -1));
}
@Override
public void map(IntWritable begin, IntWritable end, Context context) throws IOException, InterruptedException {
for (int x = (int) begin.get(); x < end.get(); x++) {
for (int y = 0; y < imageS; y++) {
float hue = 0, brighness = 0;
int icolor = 0;
Complex z = new Complex(2.0 * (x - imageS / 2) / (imageS / 2),
1.33 * (y - imageS / 2) / (imageS / 2));
icolor = startCompute(generateZ(z), 0);
if (icolor != -1) {
brighness = 1f;
}
hue = (icolor % 256) / 255.0f;
Color color = Color.getHSBColor(hue, 1f, brighness);
try {
context.write(new IntWritable(x + y * imageS), new IntWritable(color.getRGB()));
} catch (Exception e) {
e.printStackTrace();
}
}
}
}
private static Complex generateZ(Complex z) {
return (z.times(z)).plus(mapConstant);
}
private static int startCompute(Complex z, int color) {
if (z.abs() > 4) {
return color;
} else if (color >= 255) {
return -1;
} else {
color = color + 1;
return startCompute(generateZ(z), color);
}
}
}
public static class ImageReducer extends Reducer<IntWritable, IntWritable, WritableComparable<?>, Writable> {
private SequenceFile.Writer writer;
@Override
protected void cleanup(Context context) throws IOException, InterruptedException {
writer.close();
}
@Override
public void setup(Context context) throws IOException, InterruptedException {
Configuration conf = context.getConfiguration();
Path outDir = new Path(conf.get(FileOutputFormat.OUTDIR));
Path outFile = new Path(outDir, "pixels-out");
Option optPath = SequenceFile.Writer.file(outFile);
Option optKey = SequenceFile.Writer.keyClass(IntWritable.class);
Option optVal = SequenceFile.Writer.valueClass(IntWritable.class);
Option optCom = SequenceFile.Writer.compression(CompressionType.NONE);
try {
writer = SequenceFile.createWriter(conf, optCom, optKey, optPath, optVal);
} catch (Exception e) {
e.printStackTrace();
}
}
@Override
public void reduce (IntWritable key, Iterable<IntWritable> value, Context context) throws IOException, InterruptedException {
try{
writer.append(key, value.iterator().next());
} catch (Exception e) {
e.printStackTrace();
}
}
}
I hope you guys can help me out.
Thank you!
EDIT:
Job failed as tasks failed. failedMaps:1 failedReduces:0
Looking more closely at the logs, I think the issue comes from the way I feed my data to the maps. I split my image size into several sequence files so that the maps can read from them and compute the colors for the pixels in that area.
This is the way I create the files:
try {
int offset = 0;
// generate an input file for each map task
for (int i = 0; i < mapNr; ++i) {
final Path file = new Path(input, "part" + i);
final IntWritable begin = new IntWritable(offset);
final IntWritable end = new IntWritable(offset + imgSize / mapNr);
offset = (int) end.get();
Option optPath = SequenceFile.Writer.file(file);
Option optKey = SequenceFile.Writer.keyClass(IntWritable.class);
Option optVal = SequenceFile.Writer.valueClass(IntWritable.class);
Option optCom = SequenceFile.Writer.compression(CompressionType.NONE);
SequenceFile.Writer writer = SequenceFile.createWriter(conf, optCom, optKey, optPath, optVal);
try {
writer.append(begin, end);
} catch (Exception e) {
e.printStackTrace();
} finally {
writer.close();
}
System.out.println("Wrote input for Map #" + i);
}
Log file:
16/01/10 19:06:04 INFO client.RMProxy: Connecting to ResourceManager at /172.16.199.132:8032
16/01/10 19:06:07 INFO input.FileInputFormat: Total input paths to process : 4
16/01/10 19:06:07 INFO mapreduce.JobSubmitter: number of splits:4
16/01/10 19:06:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1452444283951_0007
16/01/10 19:06:08 INFO impl.YarnClientImpl: Submitted application application_1452444283951_0007
16/01/10 19:06:08 INFO mapreduce.Job: The url to track the job: http://172.16.199.132:8088/proxy/application_1452444283951_0007/
16/01/10 19:06:08 INFO mapreduce.Job: Running job: job_1452444283951_0007
16/01/10 19:06:19 INFO mapreduce.Job: Job job_1452444283951_0007 running in uber mode : false
16/01/10 19:06:20 INFO mapreduce.Job: map 0% reduce 0%
16/01/10 19:06:49 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_m_000002_0, Status : FAILED
16/01/10 19:06:49 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_m_000001_0, Status : FAILED
16/01/10 19:06:49 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_m_000000_0, Status : FAILED
16/01/10 19:06:49 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_m_000003_0, Status : FAILED
16/01/10 19:07:07 INFO mapreduce.Job: map 25% reduce 0%
16/01/10 19:07:08 INFO mapreduce.Job: map 50% reduce 0%
16/01/10 19:07:10 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_m_000001_1, Status : FAILED
16/01/10 19:07:11 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_m_000003_1, Status : FAILED
16/01/10 19:07:25 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_r_000000_0, Status : FAILED
16/01/10 19:07:32 INFO mapreduce.Job: map 100% reduce 0%
16/01/10 19:07:32 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_m_000003_2, Status : FAILED
16/01/10 19:07:32 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_m_000001_2, Status : FAILED
16/01/10 19:07:33 INFO mapreduce.Job: map 50% reduce 0%
16/01/10 19:07:43 INFO mapreduce.Job: map 75% reduce 0%
16/01/10 19:07:44 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_r_000000_1, Status : FAILED
16/01/10 19:07:50 INFO mapreduce.Job: map 100% reduce 100%
16/01/10 19:07:51 INFO mapreduce.Job: Job job_1452444283951_0007 failed with state FAILED due to: Task failed task_1452444283951_0007_m_000003
Job failed as tasks failed. failedMaps:1 failedReduces:0
16/01/10 19:07:51 INFO mapreduce.Job: Counters: 40
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=3048165
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=765
HDFS: Number of bytes written=0
HDFS: Number of read operations=12
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Failed map tasks=9
Failed reduce tasks=2
Killed reduce tasks=1
Launched map tasks=12
Launched reduce tasks=3
Other local map tasks=8
Data-local map tasks=4
Total time spent by all maps in occupied slots (ms)=239938
Total time spent by all reduces in occupied slots (ms)=34189
Total time spent by all map tasks (ms)=239938
Total time spent by all reduce tasks (ms)=34189
Total vcore-seconds taken by all map tasks=239938
Total vcore-seconds taken by all reduce tasks=34189
Total megabyte-seconds taken by all map tasks=245696512
Total megabyte-seconds taken by all reduce tasks=35009536
Map-Reduce Framework
Map input records=3
Map output records=270000
Map output bytes=2160000
Map output materialized bytes=2700018
Input split bytes=441
Combine input records=0
Spilled Records=270000
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=538
CPU time spent (ms)=5520
Physical memory (bytes) snapshot=643928064
Virtual memory (bytes) snapshot=2537975808
Total committed heap usage (bytes)=408760320
File Input Format Counters
Bytes Read=324
Constructing image...
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://172.16.199.132:9000/user/hduser/FractalJob_1452445557585_342741171/out/pixels-out
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1752)
at FractalJob.generateFractal(FractalJob.j..
This is the configuration:
conf.setInt("image.size", imgSize);
conf.setDouble("constant.re", FractalJob.constant.re());
conf.setDouble("constant.im", FractalJob.constant.im());
Job job = Job.getInstance(conf);
job.setJobName(FractalJob.class.getSimpleName());
job.setJarByClass(FractalJob.class);
job.setInputFormatClass(SequenceFileInputFormat.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(IntWritable.class);
job.setOutputFormatClass(SequenceFileOutputFormat.class);
job.setMapperClass(RasterMapper.class);
job.setReducerClass(ImageReducer.class);
job.setNumReduceTasks(1);
job.setSpeculativeExecution(false);
final Path input = new Path(filePath, "in");
final Path output = new Path(filePath, "out");
FileInputFormat.setInputPaths(job, input);
FileOutputFormat.setOutputPath(job, output);
You don't need to worry about creating your own sequence files. MapReduce has an output format that does it automatically.
So, in your driver class you would use:
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(IntWritable.class);
job.setOutputFormatClass(SequenceFileOutputFormat.class);
and then in the reducer you'd write:
context.write(key, values.iterator().next());
and delete all of the setup method.
As a kind of aside, it doesn't look like you need a reducer at all. If you're not doing any calculations in the reducer and you're not doing anything with grouping (which I presume you're not), then why not just delete it? job.setOutputFormatClass(SequenceFileOutputFormat.class) will write your mapper output to sequence files.
If you do only want one output file, set
job.setNumReduceTasks(1);
And provided your final data isn't > 1 block size, you'll get the output you want.
It's worth noting that you're currently only outputting one value per key - you should ensure that you want that, and include a loop in the reducer to iterate over the values if you don't.
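If the reducer is kept, a minimal sketch with that advice applied might look like this (key/value types taken from the question; the job's SequenceFileOutputFormat does all the file writing, so no setup/cleanup or hand-rolled SequenceFile.Writer is needed):

public static class ImageReducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
    @Override
    protected void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Emit every value for this key; the job's output format writes the pairs.
        for (IntWritable value : values) {
            context.write(key, value);
        }
    }
}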

Hadoop - a reducer is not being initiated

I am trying to run an open-source kNN join MapReduce algorithm (hbrj) on Hadoop 2.6.0 in a single-node, pseudo-distributed cluster installed on my laptop (OS X). This is the code.
Mapper, reducer and the main driver:
public class RPhase2 extends Configured implements Tool
{
public static class MapClass extends MapReduceBase
implements Mapper<LongWritable, Text, IntWritable, RPhase2Value>
{
public void map(LongWritable key, Text value,
OutputCollector<IntWritable, RPhase2Value> output,
Reporter reporter) throws IOException
{
String line = value.toString();
String[] parts = line.split(" +");
// key format <rid1>
IntWritable mapKey = new IntWritable(Integer.valueOf(parts[0]));
// value format <rid2, dist>
RPhase2Value np2v = new RPhase2Value(Integer.valueOf(parts[1]), Float.valueOf(parts[2]));
System.out.println("############### key: " + mapKey.toString() + " np2v: " + np2v.toString());
output.collect(mapKey, np2v);
}
}
public static class Reduce extends MapReduceBase
implements Reducer<IntWritable, RPhase2Value, NullWritable, Text>
{
int numberOfPartition;
int knn;
class Record {...}
class RecordComparator implements Comparator<Record> {...}
public void configure(JobConf job)
{
numberOfPartition = job.getInt("numberOfPartition", 2);
knn = job.getInt("knn", 3);
System.out.println("########## configuring!");
}
public void reduce(IntWritable key, Iterator<RPhase2Value> values,
OutputCollector<NullWritable, Text> output,
Reporter reporter) throws IOException
{
//initialize the pq
RecordComparator rc = new RecordComparator();
PriorityQueue<Record> pq = new PriorityQueue<Record>(knn + 1, rc);
System.out.println("Phase 2 is at reduce");
System.out.println("########## key: " + key.toString());
// For each record we have a reduce task
// value format <rid1, rid2, dist>
while (values.hasNext())
{
RPhase2Value np2v = values.next();
int id2 = np2v.getFirst().get();
float dist = np2v.getSecond().get();
Record record = new Record(id2, dist);
pq.add(record);
if (pq.size() > knn)
pq.poll();
}
while(pq.size() > 0)
{
output.collect(NullWritable.get(), new Text(key.toString() + " " + pq.poll().toString()));
//break; // only ouput the first record
}
} // reduce
} // Reducer
public int run(String[] args) throws Exception {
JobConf conf = new JobConf(getConf(), RPhase2.class);
conf.setJobName("RPhase2");
conf.setMapOutputKeyClass(IntWritable.class);
conf.setMapOutputValueClass(RPhase2Value.class);
conf.setOutputKeyClass(NullWritable.class);
conf.setOutputValueClass(Text.class);
conf.setMapperClass(MapClass.class);
conf.setReducerClass(Reduce.class);
int numberOfPartition = 0;
List<String> other_args = new ArrayList<String>();
for(int i = 0; i < args.length; ++i)
{
try {
if ("-m".equals(args[i])) {
//conf.setNumMapTasks(Integer.parseInt(args[++i]));
++i;
} else if ("-r".equals(args[i])) {
conf.setNumReduceTasks(Integer.parseInt(args[++i]));
} else if ("-p".equals(args[i])) {
numberOfPartition = Integer.parseInt(args[++i]);
conf.setInt("numberOfPartition", numberOfPartition);
} else if ("-k".equals(args[i])) {
int knn = Integer.parseInt(args[++i]);
conf.setInt("knn", knn);
System.out.println(knn + "~ hi");
} else {
other_args.add(args[i]);
}
conf.setNumReduceTasks(numberOfPartition * numberOfPartition);
//conf.setNumReduceTasks(1);
} catch (NumberFormatException except) {
System.out.println("ERROR: Integer expected instead of " + args[i]);
return printUsage();
} catch (ArrayIndexOutOfBoundsException except) {
System.out.println("ERROR: Required parameter missing from " + args[i-1]);
return printUsage();
}
}
FileInputFormat.setInputPaths(conf, other_args.get(0));
FileOutputFormat.setOutputPath(conf, new Path(other_args.get(1)));
JobClient.runJob(conf);
return 0;
}
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new RPhase2(), args);
}
} // RPhase2
When I run this, the mapper succeeds but the job terminates suddenly and the reducer is never instantiated. Moreover, no errors are ever printed (even in the log files). I also know this because the print statements in the Reducer's configure method never get printed. Output:
15/06/15 14:00:37 INFO mapred.LocalJobRunner: map task executor complete.
15/06/15 14:00:38 INFO mapreduce.Job: map 100% reduce 0%
15/06/15 14:00:38 INFO mapreduce.Job: Job job_local833125918_0001 completed successfully
15/06/15 14:00:38 INFO mapreduce.Job: Counters: 20
File System Counters
FILE: Number of bytes read=12505456
FILE: Number of bytes written=14977422
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=11408
HDFS: Number of bytes written=8724
HDFS: Number of read operations=216
HDFS: Number of large read operations=0
HDFS: Number of write operations=99
Map-Reduce Framework
Map input records=60
Map output records=60
Input split bytes=963
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=14
Total committed heap usage (bytes)=1717567488
File Input Format Counters
Bytes Read=2153
File Output Format Counters
Bytes Written=1645
What I have done so far:
I have been looking at similar questions, and I found that the most frequent problem is not configuring the output classes when the output types of the mapper and reducer differ, which is already done in the code above: conf.setMapOutputKeyClass(Class); conf.setMapOutputValueClass(Class);
In another post I found a suggestion to change reduce(..., Iterator <...>, ...) to (..., Iterable <...>, ...), which gave me trouble compiling. I could no longer use the .hasNext() and .next() methods, and I got this error:
error: Reduce is not abstract and does not override abstract method reduce(IntWritable,Iterator,OutputCollector,Reporter) in Reducer
If anyone has any hints or suggestions on what I can try in order to find the issue, I would be very appreciative!
Just a note that I posted a question about this problem before (Hadoop kNN join algorithm stuck at map 100% reduce 0%), but it did not get enough attention, so I wanted to re-ask it from a different perspective. You can use that link for more details on my log files.
I figured out the problem, and it was something silly. If you notice in the code above, numberOfPartition is set to 0 before the arguments are read, and the number of reducers is set to numberOfPartition * numberOfPartition. I, as the user, did not change the number-of-partitions parameter (mostly because I simply copy-pasted the argument line from the provided README), so the reducer never even started.
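A minimal sketch of a guard against that (hypothetical; the argument parsing loop itself stays as in the question) is to set the reducer count once, after the loop, with a fallback when -p is not supplied:

// After the argument loop, instead of calling conf.setNumReduceTasks inside it:
int partitions = (numberOfPartition > 0) ? numberOfPartition : 1; // avoid 0 reducers
conf.setNumReduceTasks(partitions * partitions);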
