Looking for some help please.
The MapReduce job executes but no output is produced. It is a simple program to count the total number of words in a file. I started very simply, to make sure it works, with a txt file that has one row with the following content:
tiny country second largest country second tiny food exporter second
second second
Unfortunately it does not; any suggestion about where to look next would be appreciated. I have cut and pasted the last part of the output log.
File System Counters
FILE: Number of bytes read=890
FILE: Number of bytes written=947710
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=1
Map output records=1
Map output bytes=87
Map output materialized bytes=95
Input split bytes=198
Combine input records=0
Combine output records=0
Reduce input groups=1
Reduce shuffle bytes=95
Reduce input records=1
Reduce output records=1
Spilled Records=2
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=7
Total committed heap usage (bytes)=468713472
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=82
File Output Format Counters
Bytes Written=97
Process finished with exit code 0
public class Map extends Mapper<LongWritable, Text, Text,
IntWritable>{
@Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String line = value.toString();
String[] datas = line.split("\t");
for(String data: datas) {
Text outputKey = new Text(data);
IntWritable outputValue = new IntWritable();
context.write(outputKey, outputValue);
}
}
}
public class Reduce extends Reducer<Text, IntWritable, Text,
IntWritable> {
@Override
public void reduce(final Text outputKey,
final Iterable<IntWritable> values,
final Context context)
throws IOException, InterruptedException {
int sum = 0;
for(IntWritable value : values)
{
sum += value.get();
}
context.write(outputKey, new IntWritable(sum));
}
}
public class Main extends Configured implements Tool {
@Override
public int run(String[] args) throws Exception {
Job job = Job.getInstance(getConf());
job.setJobName("WordCount");
job.setJarByClass(Main.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
Path inputFilePath = new Path("/Users/francesco/input/input.txt");
Path outputFilePath = new Path("/Users/francesco/output/first");
FileInputFormat.addInputPath(job, inputFilePath);
FileOutputFormat.setOutputPath(job, outputFilePath);
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String[] args) throws Exception{
int exitCode = ToolRunner.run(new Main(), args);
System.exit(exitCode);
}
}
You don't set any IntWritable value to emit in your mapper:
IntWritable outputValue = new IntWritable();
You need to replace it with:
IntWritable outputValue = new IntWritable(1);
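For reference, a corrected mapper might look like the sketch below. It also splits on whitespace rather than "\t", since the sample input is space-separated (the counters above show Map output records=1 for an 11-word line, which suggests the tab split never matched); treat that part as a suggestion, not part of the accepted fix.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

    // One reusable counter value of 1 per emitted word
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split on any run of whitespace; the sample input uses spaces, not tabs
        String[] words = value.toString().split("\\s+");
        for (String word : words) {
            if (!word.isEmpty()) {
                context.write(new Text(word), ONE);
            }
        }
    }
}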
Related
I run the exported jar as a Hadoop MapReduce job and 0 bytes are being written to the output file.
LOGS
2022-10-22 21:38:19,004 INFO mapreduce.Job: map 100% reduce 100%
2022-10-22 21:38:19,012 INFO mapreduce.Job: Job job_1666492742770_0009 completed successfully
2022-10-22 21:38:19,159 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=6
FILE: Number of bytes written=1134025
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=446009085
HDFS: Number of bytes written=0
HDFS: Number of read operations=17
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=4
Launched reduce tasks=1
Rack-local map tasks=4
Total time spent by all maps in occupied slots (ms)=38622
Total time spent by all reduces in occupied slots (ms)=6317
Total time spent by all map tasks (ms)=38622
Total time spent by all reduce tasks (ms)=6317
Total vcore-milliseconds taken by all map tasks=38622
Total vcore-milliseconds taken by all reduce tasks=6317
Total megabyte-milliseconds taken by all map tasks=39548928
Total megabyte-milliseconds taken by all reduce tasks=6468608
Map-Reduce Framework
Map input records=3208607
Map output records=0
Map output bytes=0
Map output materialized bytes=24
Input split bytes=424
Combine input records=0
Combine output records=0
Reduce input groups=0
Reduce shuffle bytes=24
Reduce input records=0
Reduce output records=0
Spilled Records=0
Shuffled Maps =4
Failed Shuffles=0
Merged Map outputs=4
GC time elapsed (ms)=505
CPU time spent (ms)=9339
Physical memory (bytes) snapshot=2058481664
Virtual memory (bytes) snapshot=2935365632
Total committed heap usage (bytes)=1875378176
Peak Map Physical memory (bytes)=501469184
Peak Map Virtual memory (bytes)=643743744
Peak Reduce Physical memory (bytes)=206155776
Peak Reduce Virtual memory (bytes)=384512000
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=446008661
File Output Format Counters
Bytes Written=0
Any help appreciated!
Map Function:
public void map(LongWritable arg0, Text Value, Context context) throws IOException, InterruptedException {
String line = Value.toString();
if(line.length() == 0 && !line.contains("MAX")) {
String date = line.substring(14,21);
float temp_Max;
float temp_Min;
try {
temp_Max = Float.parseFloat(line.substring(104,108).trim());
}catch(NumberFormatException e) {
temp_Max = Float.parseFloat(line.substring(104,107).trim());
}
try {
temp_Min = Float.parseFloat(line.substring(112,117).trim());
}catch(NumberFormatException e) {
temp_Min = Float.parseFloat(line.substring(112,116).trim());
}
if(temp_Max > 35.0) {
context.write(new Text("Hot Day" + date), new FloatWritable(temp_Max));
}
if(temp_Min < 10) {
context.write(new Text("Cold Day" + date), new FloatWritable(temp_Min));
}
}
}
Reducer Function:
public static class MaxMinTemperatureReducer extends Reducer<Text, Text, Text, FloatWritable> {
FloatWritable res = new FloatWritable();
public void reduce(Text key, Iterable<IntWritable> values,
Context context
) throws IOException, InterruptedException {
float sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
res.set(sum);
LogManager lgmngr = LogManager.getLogManager();
// lgmngr now contains a reference to the log manager.
Logger log = lgmngr.getLogger(Logger.GLOBAL_LOGGER_NAME);
// Getting the global application level logger
// from the Java Log Manager
log.log(Level.INFO, "LOL_PLS_WORK",res.toString());
context.write(key,res);
}
}
Main:
Configuration conf = new Configuration();
Job job = Job.getInstance(conf,"weather example");
job.setJarByClass(MyMaxMin.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setMapperClass(MaxMinTemperatureMapper.class);
job.setReducerClass(MaxMinTemperatureReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
Path OutputPath = new Path(args[1]);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
OutputPath.getFileSystem(conf).delete(OutputPath, true);
System.exit(job.waitForCompletion(true) ? 0 : 1);
As per your mapper code:
public void map(LongWritable arg0, Text Value, Context context) throws IOException, InterruptedException {
String line = Value.toString();
if(line.length() == 0 && !line.contains("MAX")) {
With line.length() == 0 you are discarding any input that isn't blank. You want line.length() != 0.
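Applied to the map method from the question, the corrected guard would look like this (everything else left as in the original):
public void map(LongWritable arg0, Text Value, Context context) throws IOException, InterruptedException {
    String line = Value.toString();
    // Keep non-empty lines and skip any header line containing "MAX"
    if (line.length() != 0 && !line.contains("MAX")) {
        String date = line.substring(14, 21);
        float temp_Max;
        float temp_Min;
        try {
            temp_Max = Float.parseFloat(line.substring(104, 108).trim());
        } catch (NumberFormatException e) {
            temp_Max = Float.parseFloat(line.substring(104, 107).trim());
        }
        try {
            temp_Min = Float.parseFloat(line.substring(112, 117).trim());
        } catch (NumberFormatException e) {
            temp_Min = Float.parseFloat(line.substring(112, 116).trim());
        }
        if (temp_Max > 35.0) {
            context.write(new Text("Hot Day" + date), new FloatWritable(temp_Max));
        }
        if (temp_Min < 10) {
            context.write(new Text("Cold Day" + date), new FloatWritable(temp_Min));
        }
    }
}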
I have two mapper classes which simply create key-value pairs; my main logic is supposed to be in the reducer part. I am trying to compare data from two different text files.
My mapper class is
public static class Map extends
Mapper<LongWritable, Text, Text, Text> {
private String ky,vl="a";
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
String tokens[] = line.split("\t");
vl = tokens[1].trim();
ky = tokens[2].trim();
//sending key-value pairs to the reducer
context.write(new Text(ky),new Text(vl));
}
}
My second mapper is
public static class Map2 extends
Mapper<LongWritable, Text, Text, Text> {
private String ky2,vl2 = "a";
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
String tokens[] = line.split("\t");
vl2 = tokens[1].trim();
ky2 = tokens[2].trim();
//sending key-value pairs to the reducer
context.write(new Text(ky2),new Text(vl2));
}
}
Reducer class is
public static class Reduce extends
Reducer<Text, Text, Text, Text> {
private String rslt = "l";
public void reduce(Text key, Iterator<Text> values,Context context) throws IOException, InterruptedException {
int count = 0;
while(values.hasNext()){
count++;
}
rslt = Integer.toString(count);
if(count>1){
context.write(key,new Text(rslt));
}
}
}
And my main method is
Configuration conf = new Configuration();
Job job = new Job(conf);
job.setJarByClass(CompareTwoFiles.class);
job.setJobName("Compare Two Files and Identify the Difference");
FileOutputFormat.setOutputPath(job, new Path(args[2]));
job.setReducerClass(Reduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
MultipleInputs.addInputPath(job, new Path(args[0]),
TextInputFormat.class, Map.class);
MultipleInputs.addInputPath(job, new Path(args[1]),
TextInputFormat.class, Map2.class);
job.waitForCompletion(true);
Output:
File System Counters
FILE: Number of bytes read=361621
FILE: Number of bytes written=1501806
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=552085
HDFS: Number of bytes written=150962
HDFS: Number of read operations=28
HDFS: Number of large read operations=0
HDFS: Number of write operations=5
Map-Reduce Framework
Map input records=10783
Map output records=10783
Map output bytes=150962
Map output materialized bytes=172540
Input split bytes=507
Combine input records=0
Combine output records=0
Reduce input groups=7985
Reduce shuffle bytes=172540
Reduce input records=10783
Reduce output records=10783
Spilled Records=21566
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=12
Total committed heap usage (bytes)=928514048
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=150962
I am trying to run the open source kNN join MapReduce hbrj algorithm on Hadoop 2.6.0, single-node cluster (pseudo-distributed operation), installed on my laptop (OS X). This is the code.
Mapper, reducer and the main driver:
public class RPhase2 extends Configured implements Tool
{
public static class MapClass extends MapReduceBase
implements Mapper<LongWritable, Text, IntWritable, RPhase2Value>
{
public void map(LongWritable key, Text value,
OutputCollector<IntWritable, RPhase2Value> output,
Reporter reporter) throws IOException
{
String line = value.toString();
String[] parts = line.split(" +");
// key format <rid1>
IntWritable mapKey = new IntWritable(Integer.valueOf(parts[0]));
// value format <rid2, dist>
RPhase2Value np2v = new RPhase2Value(Integer.valueOf(parts[1]), Float.valueOf(parts[2]));
System.out.println("############### key: " + mapKey.toString() + " np2v: " + np2v.toString());
output.collect(mapKey, np2v);
}
}
public static class Reduce extends MapReduceBase
implements Reducer<IntWritable, RPhase2Value, NullWritable, Text>
{
int numberOfPartition;
int knn;
class Record {...}
class RecordComparator implements Comparator<Record> {...}
public void configure(JobConf job)
{
numberOfPartition = job.getInt("numberOfPartition", 2);
knn = job.getInt("knn", 3);
System.out.println("########## configuring!");
}
public void reduce(IntWritable key, Iterator<RPhase2Value> values,
OutputCollector<NullWritable, Text> output,
Reporter reporter) throws IOException
{
//initialize the pq
RecordComparator rc = new RecordComparator();
PriorityQueue<Record> pq = new PriorityQueue<Record>(knn + 1, rc);
System.out.println("Phase 2 is at reduce");
System.out.println("########## key: " + key.toString());
// For each record we have a reduce task
// value format <rid1, rid2, dist>
while (values.hasNext())
{
RPhase2Value np2v = values.next();
int id2 = np2v.getFirst().get();
float dist = np2v.getSecond().get();
Record record = new Record(id2, dist);
pq.add(record);
if (pq.size() > knn)
pq.poll();
}
while(pq.size() > 0)
{
output.collect(NullWritable.get(), new Text(key.toString() + " " + pq.poll().toString()));
//break; // only ouput the first record
}
} // reduce
} // Reducer
public int run(String[] args) throws Exception {
JobConf conf = new JobConf(getConf(), RPhase2.class);
conf.setJobName("RPhase2");
conf.setMapOutputKeyClass(IntWritable.class);
conf.setMapOutputValueClass(RPhase2Value.class);
conf.setOutputKeyClass(NullWritable.class);
conf.setOutputValueClass(Text.class);
conf.setMapperClass(MapClass.class);
conf.setReducerClass(Reduce.class);
int numberOfPartition = 0;
List<String> other_args = new ArrayList<String>();
for(int i = 0; i < args.length; ++i)
{
try {
if ("-m".equals(args[i])) {
//conf.setNumMapTasks(Integer.parseInt(args[++i]));
++i;
} else if ("-r".equals(args[i])) {
conf.setNumReduceTasks(Integer.parseInt(args[++i]));
} else if ("-p".equals(args[i])) {
numberOfPartition = Integer.parseInt(args[++i]);
conf.setInt("numberOfPartition", numberOfPartition);
} else if ("-k".equals(args[i])) {
int knn = Integer.parseInt(args[++i]);
conf.setInt("knn", knn);
System.out.println(knn + "~ hi");
} else {
other_args.add(args[i]);
}
conf.setNumReduceTasks(numberOfPartition * numberOfPartition);
//conf.setNumReduceTasks(1);
} catch (NumberFormatException except) {
System.out.println("ERROR: Integer expected instead of " + args[i]);
return printUsage();
} catch (ArrayIndexOutOfBoundsException except) {
System.out.println("ERROR: Required parameter missing from " + args[i-1]);
return printUsage();
}
}
FileInputFormat.setInputPaths(conf, other_args.get(0));
FileOutputFormat.setOutputPath(conf, new Path(other_args.get(1)));
JobClient.runJob(conf);
return 0;
}
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new RPhase2(), args);
}
} // RPhase2
When I run this, the mapper is successful but the job terminates suddenly and the reducer is never instantiated. Moreover, no errors are ever printed (even in the log files). I know that also because the print statements in the Reducer's configure method never get printed. Output:
15/06/15 14:00:37 INFO mapred.LocalJobRunner: map task executor complete.
15/06/15 14:00:38 INFO mapreduce.Job: map 100% reduce 0%
15/06/15 14:00:38 INFO mapreduce.Job: Job job_local833125918_0001 completed successfully
15/06/15 14:00:38 INFO mapreduce.Job: Counters: 20
File System Counters
FILE: Number of bytes read=12505456
FILE: Number of bytes written=14977422
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=11408
HDFS: Number of bytes written=8724
HDFS: Number of read operations=216
HDFS: Number of large read operations=0
HDFS: Number of write operations=99
Map-Reduce Framework
Map input records=60
Map output records=60
Input split bytes=963
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=14
Total committed heap usage (bytes)=1717567488
File Input Format Counters
Bytes Read=2153
File Output Format Counters
Bytes Written=1645
What I have done so far:
I have been looking at similar questions, and I found that the most frequent problem is not configuring the map output classes when the outputs of the mapper and reducer are different, which is already done in the code above: conf.setMapOutputKeyClass(Class); conf.setMapOutputValueClass(Class);
In another post I found a suggestion to change reduce(..., Iterator <...>, ...) to (..., Iterable <...>, ...), which gave me trouble compiling. I could no longer use the .getNext() and .next() methods, and I got this error:
error: Reduce is not abstract and does not override abstract method reduce(IntWritable,Iterator,OutputCollector,Reporter) in Reducer
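(For context: that compile error appears because this code uses the old org.apache.hadoop.mapred API, whose Reducer interface takes an Iterator, an OutputCollector and a Reporter; Iterable only exists in the newer org.apache.hadoop.mapreduce API. A sketch of what the reducer's shell would look like if ported to the new API is below; RPhase2Value is the question's own type, and the priority-queue logic from the original reduce would go inside the loop.)
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class RPhase2NewApiReduce
        extends Reducer<IntWritable, RPhase2Value, NullWritable, Text> {
    @Override
    protected void reduce(IntWritable key, Iterable<RPhase2Value> values, Context context)
            throws IOException, InterruptedException {
        for (RPhase2Value np2v : values) {
            // The original code would push np2v into the priority queue here
            context.write(NullWritable.get(), new Text(key.toString() + " " + np2v.toString()));
        }
    }
}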
If anyone has any hints or suggestions on what I can try to find the issue, I would be very appreciative!
Just a note that I have posted a question about my problem before (Hadoop kNN join algorithm stuck at map 100% reduce 0%), but it did not get enough attention, so I wanted to re-ask it from a different perspective. You can use that link for more details on my log files.
I have figured out the problem, and it was something silly. If you notice in the code above, numberOfPartition is set to 0 before the arguments are read, and the number of reducers is set to numberOfPartition * numberOfPartition. As the user, I did not change the number-of-partitions parameter (mostly because I simply copy-pasted the argument line from the provided README), so the job was configured with zero reduce tasks and the reducer never even started.
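A small guard in run(), after the argument loop, would have made this fail fast instead of silently running with zero reducers (sketch only; printUsage() is the helper already referenced in the code above):
// Inside run(), after parsing the arguments:
if (numberOfPartition <= 0) {
    System.out.println("ERROR: -p (numberOfPartition) must be a positive integer");
    return printUsage();
}
// Set the reducer count once, outside the argument loop
conf.setNumReduceTasks(numberOfPartition * numberOfPartition);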
I am having a strange problem with a Hadoop Map/Reduce job. The job submits correctly, runs, but produces incorrect/strange results. It seems as if the mapper and reducer are not run at all. The input file is transformed from:
12
16
132
654
132
12
to
0 12
4 16
8 132
13 654
18 132
23 12
I assume the first column contains the generated keys for the pairs before the mapper, but neither the mapper nor the reducer seems to run. The job ran fine when I used the old API.
Source for the job is provided below. I am using Hortonworks as the platform.
public class HadoopAnalyzer
{
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable>
{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
{
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens())
{
word.set(tokenizer.nextToken());
context.write(word, one);
}
}
}
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable>
{
@Override
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
{
int sum = 0;
for (IntWritable val : values)
{
sum += val.get();
}
context.write(key, new IntWritable(sum));
}
}
public static void main(String[] args) throws Exception
{
JobConf conf = new JobConf(HadoopAnalyzer.class);
conf.setJobName("wordcount");
conf.set("mapred.job.tracker", "192.168.229.128:50300");
conf.set("fs.default.name", "hdfs://192.168.229.128:8020");
conf.set("fs.defaultFS", "hdfs://192.168.229.128:8020");
conf.set("hbase.master", "192.168.229.128:60000");
conf.set("hbase.zookeeper.quorum", "192.168.229.128");
conf.set("hbase.zookeeper.property.clientPort", "2181");
System.out.println("Executing job.");
Job job = new Job(conf, "job");
job.setInputFormatClass(InputFormat.class);
job.setOutputFormatClass(OutputFormat.class);
job.setJarByClass(HadoopAnalyzer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
TextInputFormat.addInputPath(job, new Path("/user/usr/in"));
TextOutputFormat.setOutputPath(job, new Path("/user/usr/out"));
job.setMapperClass(Mapper.class);
job.setReducerClass(Reducer.class);
job.waitForCompletion(true);
System.out.println("Done.");
}
}
Maybe I am missing something obvious, but can anyone shed some light on what might be going wrong here?
The output is exactly what you should expect, because you used the following:
job.setMapperClass(Mapper.class);
job.setReducerClass(Reducer.class);
which should have been:
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
You extended the Mapper and Reducer classes with Map and Reduce but didn't use them in your job. Since the base Mapper and Reducer are identity implementations, the job simply passed each (byte offset, line) pair straight through, which is exactly the output you are seeing.
I'm new to Hadoop.
I have a MapReduce job which is supposed to get its input from HDFS and write the reducer's output to HBase. I haven't found any good example.
Here's the code. The error when running this example is: Type mismatch in map, expected ImmutableBytesWritable, received IntWritable.
Mapper Class
public static class AddValueMapper extends Mapper < LongWritable,
Text, ImmutableBytesWritable, IntWritable > {
/* input <key, line number : value, full line>
* output <key, log key : value >*/
public void map(LongWritable key, Text value,
Context context)throws IOException,
InterruptedException {
byte[] key;
int value, pos = 0;
String line = value.toString();
String p1 , p2 = null;
pos = line.indexOf("=");
//Key part
p1 = line.substring(0, pos);
p1 = p1.trim();
key = Bytes.toBytes(p1);
//Value part
p2 = line.substring(pos +1);
p2 = p2.trim();
value = Integer.parseInt(p2);
context.write(new ImmutableBytesWritable(key),new IntWritable(value));
}
}
Reducer Class
public static class AddValuesReducer extends TableReducer<
ImmutableBytesWritable, IntWritable, ImmutableBytesWritable> {
public void reduce(ImmutableBytesWritable key, Iterable<IntWritable> values,
Context context) throws IOException, InterruptedException {
long total =0;
// Loop values
while(values.iterator().hasNext()){
total += values.iterator().next().get();
}
// Put to HBase
Put put = new Put(key.get());
put.add(Bytes.toBytes("data"), Bytes.toBytes("total"),
Bytes.toBytes(total));
context.write(key, put);
}
}
I had a similar job with HDFS only, and it works fine.
Edited 18-06-2013: the college project finished successfully two years ago. For the job configuration (driver part), check the correct answer.
Here is the code that will solve your problem:
Driver
HBaseConfiguration conf = HBaseConfiguration.create();
Job job = new Job(conf,"JOB_NAME");
job.setJarByClass(yourclass.class);
job.setMapperClass(yourMapper.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
FileInputFormat.setInputPaths(job, new Path(inputPath));
TableMapReduceUtil.initTableReducerJob(TABLE,
yourReducer.class, job);
job.setReducerClass(yourReducer.class);
job.waitForCompletion(true);
Mapper&Reducer
class yourMapper extends Mapper<LongWritable, Text, Text,IntWritable> {
// @Override map()
}
class yourReducer
extends
TableReducer<Text, IntWritable,
ImmutableBytesWritable>
{
// @Override reduce()
}
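A filled-in version of that skeleton might look roughly like this (a sketch; the column family "data" and qualifier "total" are just illustrative, and it uses the same older HBase Put.add call as the question):
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

class yourReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        long total = 0;
        for (IntWritable val : values) {
            total += val.get();
        }
        // Row key is the text key; write one cell: data:total = sum
        byte[] row = Bytes.toBytes(key.toString());
        Put put = new Put(row);
        put.add(Bytes.toBytes("data"), Bytes.toBytes("total"), Bytes.toBytes(total));
        context.write(new ImmutableBytesWritable(row), put);
    }
}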
Not sure why the HDFS version works: normally you have to set the input format for the job, and FileInputFormat is an abstract class. Perhaps you left some lines out, such as:
job.setInputFormatClass(TextInputFormat.class);
The best and fastest way to bulk load data into HBase is to use HFileOutputFormat and the CompleteBulkLoad utility.
You will find a sample code here:
Hope this will be useful :)
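In outline, that approach looks roughly like the sketch below (using HBase APIs of that era; class names such as HFileOutputFormat vs HFileOutputFormat2 differ between versions, and the mapper here is only an illustrative parser for key=value lines like those in the question):
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {

    // Illustrative mapper: turns "someKey=123" lines into Puts keyed by row
    public static class HFileMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split("=", 2);
            byte[] row = Bytes.toBytes(parts[0].trim());
            Put put = new Put(row);
            put.add(Bytes.toBytes("data"), Bytes.toBytes("total"),
                    Bytes.toBytes(Long.parseLong(parts[1].trim())));
            context.write(new ImmutableBytesWritable(row), put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "hbase-bulkload");
        job.setJarByClass(BulkLoadDriver.class);
        job.setMapperClass(HFileMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        HTable table = new HTable(conf, "your_table");
        // Configures the partitioner, reducer and output format so that
        // one sorted HFile is produced per region of the target table
        HFileOutputFormat.configureIncrementalLoad(job, table);
        if (job.waitForCompletion(true)) {
            // "completebulkload" step: moves the generated HFiles into the table
            new LoadIncrementalHFiles(conf).doBulkLoad(new Path(args[1]), table);
        }
    }
}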
public void map(LongWritable key, Text value,
Context context)throws IOException,
InterruptedException {
Change this to ImmutableBytesWritable, IntWritable.
I am not sure... hope it works.
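One way to read that suggestion: the map output classes configured in the driver must match what the mapper actually emits (ImmutableBytesWritable keys and IntWritable values in the code above), for example:
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(IntWritable.class);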