Implementing join in Hadoop: java.lang.NullPointerException encountered - java

I have 3 files say File1, File2 and File3. File1 and File2 are in the same directory of HDFS and File3 is in a different directory. The format of the files is as below:
File 1:
V1 V2 V3 V4 V5 V6 V7 V8 V9 (V1-V9 are attributes)
V2+V3 is the key combination
File 2:
V1 V2 V3 V4 V5 V6 V7 V8 V9 (same format as File1)
V2+V3 is the key combination
File 3:
T1 T2 T3 T4 (T1-T4 variables)
Here T2+T3 is the common key, corresponding to V2+V3 in Files 1 and 2.
Required output after join:
Case 1:
Matching records (I need to get V9 and T4 based on the common key)
(Is there any way to also get (V2+V3) along with V9?)
Case 2:
Non matching records
Now, using MapReduce, I want to read the files from these two directories separately with two mappers and produce the output with a single reducer.
Please find the job log and the code below (run against a small test sample with a few records) and let me know where the possible error is.
13/12/03 08:23:04 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/12/03 08:23:04 WARN snappy.LoadSnappy: Snappy native library not loaded
13/12/03 08:23:04 INFO mapred.FileInputFormat: Total input paths to process : 1
13/12/03 08:23:05 INFO mapred.FileInputFormat: Total input paths to process : 1
13/12/03 08:23:05 INFO mapred.JobClient: Running job: job_201311220353_0068
13/12/03 08:23:06 INFO mapred.JobClient: map 0% reduce 0%
13/12/03 08:23:27 INFO mapred.JobClient: map 25% reduce 0%
13/12/03 08:23:28 INFO mapred.JobClient: map 50% reduce 0%
13/12/03 08:25:58 INFO mapred.JobClient: map 50% reduce 16%
13/12/03 08:26:00 INFO mapred.JobClient: map 100% reduce 16%
13/12/03 08:26:16 INFO mapred.JobClient: map 100% reduce 33%
13/12/03 08:26:23 INFO mapred.JobClient: Task Id : attempt_201311220353_0068_r_000000_0, Status : FAILED
java.lang.NullPointerException
at org.apache.hadoop.io.Text.encode(Text.java:388)
at org.apache.hadoop.io.Text.set(Text.java:178)
at org.apache.hadoop.io.Text.<init>(Text.java:81)
at StockAnalyzer$StockAnalysisReducer.reduce(StockAnalyzer.java:82)
at StockAnalyzer$StockAnalysisReducer.reduce(StockAnalyzer.java:1)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
13/12/03 08:28:32 INFO mapred.JobClient: map 100% reduce 33%
13/12/03 08:28:36 INFO mapred.JobClient: map 100% reduce 0%
13/12/03 08:28:39 INFO mapred.JobClient: Job complete: job_201311220353_0068
13/12/03 08:28:39 INFO mapred.JobClient: Counters: 24
13/12/03 08:28:39 INFO mapred.JobClient: Job Counters
13/12/03 08:28:39 INFO mapred.JobClient: Launched reduce tasks=4
13/12/03 08:28:39 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=342406
13/12/03 08:28:39 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/12/03 08:28:39 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/12/03 08:28:39 INFO mapred.JobClient: Launched map tasks=4
13/12/03 08:28:39 INFO mapred.JobClient: Data-local map tasks=4
13/12/03 08:28:39 INFO mapred.JobClient: Failed reduce tasks=1
13/12/03 08:28:39 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=307424
13/12/03 08:28:39 INFO mapred.JobClient: File Input Format Counters
13/12/03 08:28:39 INFO mapred.JobClient: Bytes Read=0
13/12/03 08:28:39 INFO mapred.JobClient: FileSystemCounters
13/12/03 08:28:39 INFO mapred.JobClient: HDFS_BYTES_READ=3227
13/12/03 08:28:39 INFO mapred.JobClient: FILE_BYTES_WRITTEN=228636
13/12/03 08:28:39 INFO mapred.JobClient: Map-Reduce Framework
13/12/03 08:28:39 INFO mapred.JobClient: Map output materialized bytes=940
13/12/03 08:28:39 INFO mapred.JobClient: Map input records=36
13/12/03 08:28:39 INFO mapred.JobClient: Spilled Records=36
13/12/03 08:28:39 INFO mapred.JobClient: Map output bytes=844
13/12/03 08:28:39 INFO mapred.JobClient: Total committed heap usage (bytes)=571555840
13/12/03 08:28:39 INFO mapred.JobClient: CPU time spent (ms)=10550
13/12/03 08:28:39 INFO mapred.JobClient: Map input bytes=1471
13/12/03 08:28:39 INFO mapred.JobClient: SPLIT_RAW_BYTES=1020
13/12/03 08:28:39 INFO mapred.JobClient: Combine input records=0
13/12/03 08:28:39 INFO mapred.JobClient: Combine output records=0
13/12/03 08:28:39 INFO mapred.JobClient: Physical memory (bytes) snapshot=690450432
13/12/03 08:28:39 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2663706624
13/12/03 08:28:39 INFO mapred.JobClient: Map output records=36
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.MultipleInputs;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class StockAnalyzer extends Configured implements Tool
{
public static class StockAnalysisMapper1 extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text>
{
private String Commonkey, Stockadj, FileTag = "f1~";
public void map(LongWritable key, Text value,OutputCollector<Text, Text> output, Reporter reporter)
throws IOException
{
String values[] = value.toString().split(",");
Commonkey = values[1].trim()+values[2].trim();
Stockadj = values[8].trim();
output.collect(new Text(Commonkey), new Text(FileTag + Stockadj));
}
}
public static class StockAnalysisMapper2 extends MapReduceBase implements Mapper <LongWritable, Text, Text, Text>
{
private String Commonkey, Dividend, FileTag = "f2~";
public void map(LongWritable key, Text value,OutputCollector<Text, Text> output, Reporter reporter)
throws IOException {
String values[] = value.toString().split(",");
Commonkey = values[1].trim()+values[2].trim();
Dividend = values[3].trim();
output.collect(new Text(Commonkey), new Text(FileTag + Dividend));
}
}
public static class StockAnalysisReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text>
{
private String Stockadj=null;
private String Dividend=null;
public void reduce(Text key, Iterator<Text> values,OutputCollector<Text, Text> output, Reporter reporter)
throws IOException
{
while (values.hasNext())
{
String currValue = values.next().toString();
String splitVals[] = currValue.split("~");
if (splitVals[0].equals("f1"))
{
Stockadj = splitVals[1] != null ? splitVals[1].trim(): "Stockadj";
}
else if (splitVals[0].equals("f2"))
{
Dividend = splitVals[2] != null ? splitVals[2].trim(): "Dividend";
}
output.collect(new Text(Stockadj), new Text(Dividend));
}
}
}
public int run(String [] arguments) throws Exception
{
JobConf conf = new JobConf(StockAnalyzer.class);
conf.setJobName("Stock Analysis");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);
conf.setMapperClass(StockAnalysisMapper1.class);
conf.setMapperClass(StockAnalysisMapper2.class);
conf.setReducerClass(StockAnalysisReducer.class);
Path Mapper1InputPath = new Path(arguments[0]);
Path Mapper2InputPath = new Path(arguments[1]);
Path OutputPath = new Path(arguments[2]);
MultipleInputs.addInputPath(conf,Mapper1InputPath,
TextInputFormat.class,StockAnalysisMapper1.class);
MultipleInputs.addInputPath(conf, Mapper2InputPath,
TextInputFormat.class,StockAnalysisMapper2.class);
FileOutputFormat.setOutputPath(conf, OutputPath);
JobClient.runJob(conf);
return 0;
}
public static void main(String [] arguments) throws Exception
{
int res = ToolRunner.run(new Configuration(),new StockAnalyzer(), arguments);
System.exit(res);
}
}
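For reference, the stack trace above points at new Text(...) inside the reducer: Text.encode throws a NullPointerException when it is handed a null String, and Stockadj/Dividend start out as null and are only filled in once a matching f1/f2 value has been seen for the key (the f2 branch also reads splitVals[2], which does not exist after splitting "f2~value" on "~"). A minimal sketch of a guarded reduce method, as an assumption based on the trace rather than a verified answer, could look like this:
public void reduce(Text key, Iterator<Text> values,
        OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException
{
    String stockadj = null;
    String dividend = null;
    while (values.hasNext())
    {
        String[] splitVals = values.next().toString().split("~");
        if (splitVals[0].equals("f1") && splitVals.length > 1)
        {
            stockadj = splitVals[1].trim();
        }
        else if (splitVals[0].equals("f2") && splitVals.length > 1)
        {
            dividend = splitVals[1].trim(); // was splitVals[2], which is out of range
        }
    }
    // Emit only when both sides of the join are present; new Text(null) is what throws the NPE
    if (stockadj != null && dividend != null)
    {
        output.collect(new Text(stockadj), new Text(dividend));
    }
}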

Related

Hadoop Reducer Not Writing anything despite writing to context

I run the exported jar as a MapReduce job on Hadoop, and 0 bytes are written to the output file.
LOGS
2022-10-22 21:38:19,004 INFO mapreduce.Job: map 100% reduce 100%
2022-10-22 21:38:19,012 INFO mapreduce.Job: Job job_1666492742770_0009 completed successfully
2022-10-22 21:38:19,159 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=6
FILE: Number of bytes written=1134025
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=446009085
HDFS: Number of bytes written=0
HDFS: Number of read operations=17
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=4
Launched reduce tasks=1
Rack-local map tasks=4
Total time spent by all maps in occupied slots (ms)=38622
Total time spent by all reduces in occupied slots (ms)=6317
Total time spent by all map tasks (ms)=38622
Total time spent by all reduce tasks (ms)=6317
Total vcore-milliseconds taken by all map tasks=38622
Total vcore-milliseconds taken by all reduce tasks=6317
Total megabyte-milliseconds taken by all map tasks=39548928
Total megabyte-milliseconds taken by all reduce tasks=6468608
Map-Reduce Framework
Map input records=3208607
Map output records=0
Map output bytes=0
Map output materialized bytes=24
Input split bytes=424
Combine input records=0
Combine output records=0
Reduce input groups=0
Reduce shuffle bytes=24
Reduce input records=0
Reduce output records=0
Spilled Records=0
Shuffled Maps =4
Failed Shuffles=0
Merged Map outputs=4
GC time elapsed (ms)=505
CPU time spent (ms)=9339
Physical memory (bytes) snapshot=2058481664
Virtual memory (bytes) snapshot=2935365632
Total committed heap usage (bytes)=1875378176
Peak Map Physical memory (bytes)=501469184
Peak Map Virtual memory (bytes)=643743744
Peak Reduce Physical memory (bytes)=206155776
Peak Reduce Virtual memory (bytes)=384512000
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=446008661
File Output Format Counters
Bytes Written=0
any help appreciated!
Map Function :
public void map(LongWritable arg0, Text Value, Context context) throws IOException, InterruptedException {
String line = Value.toString();
if(line.length() == 0 && !line.contains("MAX")) {
String date = line.substring(14,21);
float temp_Max;
float temp_Min;
try {
temp_Max = Float.parseFloat(line.substring(104,108).trim());
}catch(NumberFormatException e) {
temp_Max = Float.parseFloat(line.substring(104,107).trim());
}
try {
temp_Min = Float.parseFloat(line.substring(112,117).trim());
}catch(NumberFormatException e) {
temp_Min = Float.parseFloat(line.substring(112,116).trim());
}
if(temp_Max > 35.0) {
context.write(new Text("Hot Day" + date), new FloatWritable(temp_Max));
}
if(temp_Min < 10) {
context.write(new Text("Cold Day" + date), new FloatWritable(temp_Min));
}
}
}
Reducer Function:
public static class MaxMinTemperatureReducer extends Reducer<Text, Text, Text, FloatWritable> {
FloatWritable res = new FloatWritable();
public void reduce(Text key, Iterable<IntWritable> values,
Context context
) throws IOException, InterruptedException {
float sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
res.set(sum);
LogManager lgmngr = LogManager.getLogManager();
// lgmngr now contains a reference to the log manager.
Logger log = lgmngr.getLogger(Logger.GLOBAL_LOGGER_NAME);
// Getting the global application level logger
// from the Java Log Manager
log.log(Level.INFO, "LOL_PLS_WORK",res.toString());
context.write(key,res);
}
}
Main:
Configuration conf = new Configuration();
Job job = Job.getInstance(conf,"weather example");
job.setJarByClass(MyMaxMin.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setMapperClass(MaxMinTemperatureMapper.class);
job.setReducerClass(MaxMinTemperatureReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
Path OutputPath = new Path(args[1]);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
OutputPath.getFileSystem(conf).delete(OutputPath, true);
System.exit(job.waitForCompletion(true) ? 0 : 1);
As per your mapper code:
public void map(LongWritable arg0, Text Value, Context context) throws IOException, InterruptedException {
String line = Value.toString();
if(line.length() == 0 && !line.contains("MAX")) {
With line.length() == 0 you are discarding any input that isn't blank. You want line.length() != 0.
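In other words, only the condition changes; the rest of the map body stays as in the question. A sketch of the corrected guard:
String line = Value.toString();
// process only non-empty lines that are not the header line containing "MAX"
if (line.length() != 0 && !line.contains("MAX")) {
    // ... parse date, temp_Max and temp_Min exactly as before ...
}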

WordCount job in Cloudera is successful but output of reducer is the same as output of mapper

This program is written in Cloudera. This is the driver class I have created.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class WordCount2
{
public static void main(String[] args) throws Exception
{
if(args.length < 2)
{
System.out.println("Enter input and output path correctly ");
System.exit(-1);//exit if error occurs
}
Configuration conf = new Configuration();
@SuppressWarnings("deprecation")
Job job = new Job(conf,"WordCount2");
//Define MapReduce job
//
//job.setJobName("WordCount2");// job name created
job.setJarByClass(WordCount2.class); //Jar file will be created
//Set input/ouptput paths
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job,new Path(args[1]));
//Set input/output Format
job.setInputFormatClass(TextInputFormat.class);// input format is of TextInput Type
job.setOutputFormatClass(TextOutputFormat.class); // output format is of TextOutputType
//set Mapper and Reducer class
job.setMapperClass(WordMapper.class);
job.setReducerClass(WordReducer.class);
//Set output key-value types
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
//submit job
System.exit(job.waitForCompletion(true)?0:1);// If job is completed exit successfully, else throw error
}
}
Below is the code for Mapper class.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Mapper;
public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
@Override
public void map(LongWritable key, Text value,Context context)
throws IOException, InterruptedException
{
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while(tokenizer.hasMoreTokens())
{
String word= tokenizer.nextToken();
context.write(new Text(word), new IntWritable(1));
}
}
}
//----------Reducer Class-----------
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class WordReducer extends Reducer <Text,IntWritable,Text,IntWritable>
{
public void reduce(Text key,Iterator<IntWritable> values,Context context)
throws IOException, InterruptedException
{
int sum = 0;
while(values.hasNext())
{
sum += values.next().get();
}
context.write(key, new IntWritable(sum));
}
}
Below is command line logs
[cloudera@quickstart workspace]$ hadoop jar wordcount2.jar WordCount2 /user/training/soni.txt /user/training/sonioutput2
18/04/23 07:17:23 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/04/23 07:17:24 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/04/23 07:17:25 INFO input.FileInputFormat: Total input paths to process : 1
18/04/23 07:17:25 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:952)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:690)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:879)
18/04/23 07:17:26 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:952)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:690)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:879)
18/04/23 07:17:26 INFO mapreduce.JobSubmitter: number of splits:1
18/04/23 07:17:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1523897572171_0005
18/04/23 07:17:27 INFO impl.YarnClientImpl: Submitted application application_1523897572171_0005
18/04/23 07:17:27 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1523897572171_0005/
18/04/23 07:17:27 INFO mapreduce.Job: Running job: job_1523897572171_0005
18/04/23 07:17:45 INFO mapreduce.Job: Job job_1523897572171_0005 running in uber mode : false
18/04/23 07:17:45 INFO mapreduce.Job: map 0% reduce 0%
18/04/23 07:18:01 INFO mapreduce.Job: map 100% reduce 0%
18/04/23 07:18:16 INFO mapreduce.Job: map 100% reduce 100%
18/04/23 07:18:17 INFO mapreduce.Job: Job job_1523897572171_0005 completed successfully
18/04/23 07:18:17 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=310
FILE: Number of bytes written=251053
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=250
HDFS: Number of bytes written=188
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=14346
Total time spent by all reduces in occupied slots (ms)=12546
Total time spent by all map tasks (ms)=14346
Total time spent by all reduce tasks (ms)=12546
Total vcore-milliseconds taken by all map tasks=14346
Total vcore-milliseconds taken by all reduce tasks=12546
Total megabyte-milliseconds taken by all map tasks=14690304
Total megabyte-milliseconds taken by all reduce tasks=12847104
Map-Reduce Framework
Map input records=7
Map output records=29
Map output bytes=246
Map output materialized bytes=310
Input split bytes=119
Combine input records=0
Combine output records=0
Reduce input groups=19
Reduce shuffle bytes=310
Reduce input records=29
Reduce output records=29
Spilled Records=58
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=1095
CPU time spent (ms)=4680
Physical memory (bytes) snapshot=407855104
Virtual memory (bytes) snapshot=3016044544
Total committed heap usage (bytes)=354553856
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=131
File Output Format Counters
Bytes Written=188
[cloudera@quickstart workspace]$
Below is the input data present in the input file soni.txt:
Hi How are you
I am fine
What about you
What are you doing these days
How is your job going
How is your family
My family is great
The following output is received in the part-r-00000 file:
family 1
family 1
fine 1
going 1
great 1
is 1
is 1
is 1
job 1
these 1
you 1
you 1
you 1
your 1
your 1
But I think this is not the correct output. It should give the exact count of each word.
Your reduce method signature is wrong, so it is never called. You need to override this one from the Reducer class:
protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context) throws IOException, InterruptedException;
It is an Iterable not an Iterator
Try this:
@Override
protected void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int sum = 0;
for (IntWritable value : values) {
sum += value.get();
}
context.write(key, new IntWritable(sum));
}

Hadoop Reduce input records=0

I am new to Hadoop. My map-reduce code runs, but it does not produce any output. Here is the info from the map-reduce job:
16/09/20 13:11:40 INFO mapred.JobClient: Job complete: job_201609081210_0078
16/09/20 13:11:40 INFO mapred.JobClient: Counters: 28
16/09/20 13:11:40 INFO mapred.JobClient: Map-Reduce Framework
16/09/20 13:11:40 INFO mapred.JobClient: Spilled Records=0
16/09/20 13:11:40 INFO mapred.JobClient: Map output materialized bytes=1362
16/09/20 13:11:40 INFO mapred.JobClient: Reduce input records=0
16/09/20 13:11:40 INFO mapred.JobClient: Virtual memory (bytes) snapshot=466248720384
16/09/20 13:11:40 INFO mapred.JobClient: Map input records=852032443
16/09/20 13:11:40 INFO mapred.JobClient: SPLIT_RAW_BYTES=29964
16/09/20 13:11:40 INFO mapred.JobClient: Map output bytes=0
16/09/20 13:11:40 INFO mapred.JobClient: Reduce shuffle bytes=1362
16/09/20 13:11:40 INFO mapred.JobClient: Physical memory (bytes) snapshot=57472311296
16/09/20 13:11:40 INFO mapred.JobClient: Reduce input groups=0
16/09/20 13:11:40 INFO mapred.JobClient: Combine output records=0
16/09/20 13:11:40 INFO mapred.JobClient: Reduce output records=0
16/09/20 13:11:40 INFO mapred.JobClient: Map output records=0
16/09/20 13:11:40 INFO mapred.JobClient: Combine input records=0
16/09/20 13:11:40 INFO mapred.JobClient: CPU time spent (ms)=2375210
16/09/20 13:11:40 INFO mapred.JobClient: Total committed heap usage (bytes)=47554494464
16/09/20 13:11:40 INFO mapred.JobClient: File Input Format Counters
16/09/20 13:11:40 INFO mapred.JobClient: Bytes Read=15163097088
16/09/20 13:11:40 INFO mapred.JobClient: FileSystemCounters
16/09/20 13:11:40 INFO mapred.JobClient: HDFS_BYTES_READ=15163127052
16/09/20 13:11:40 INFO mapred.JobClient: FILE_BYTES_WRITTEN=13170190
16/09/20 13:11:40 INFO mapred.JobClient: FILE_BYTES_READ=6
16/09/20 13:11:40 INFO mapred.JobClient: Job Counters
16/09/20 13:11:40 INFO mapred.JobClient: Launched map tasks=227
16/09/20 13:11:40 INFO mapred.JobClient: Launched reduce tasks=1
16/09/20 13:11:40 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=759045
16/09/20 13:11:40 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
16/09/20 13:11:40 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=1613259
16/09/20 13:11:40 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
16/09/20 13:11:40 INFO mapred.JobClient: Data-local map tasks=227
16/09/20 13:11:40 INFO mapred.JobClient: File Output Format Counters
16/09/20 13:11:40 INFO mapred.JobClient: Bytes Written=0
Here is the code that launches the MapReduce job:
import java.io.File;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class mp{
public static void main(String[] args) throws Exception {
Job job1 = new Job();
job1.setJarByClass(mp.class);
FileInputFormat.addInputPath(job1, new Path(args[0]));
String oFolder = args[0] + "/output";
FileOutputFormat.setOutputPath(job1, new Path(oFolder));
job1.setMapperClass(TransMapper1.class);
job1.setReducerClass(TransReducer1.class);
job1.setMapOutputKeyClass(LongWritable.class);
job1.setMapOutputValueClass(DnaWritable.class);
job1.setOutputKeyClass(LongWritable.class);
job1.setOutputValueClass(Text.class);
}
}
And here is the mapper class (TransMapper1):
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class TransMapper1 extends Mapper<LongWritable, Text, LongWritable, DnaWritable> {
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
LongWritable bamWindow = new LongWritable(Long.parseLong(tokenizer.nextToken()));
LongWritable read = new LongWritable(Long.parseLong(tokenizer.nextToken()));
LongWritable refWindow = new LongWritable(Long.parseLong(tokenizer.nextToken()));
IntWritable chr = new IntWritable(Integer.parseInt(tokenizer.nextToken()));
DoubleWritable dist = new DoubleWritable(Double.parseDouble(tokenizer.nextToken()));
DnaWritable dnaW = new DnaWritable(bamWindow,read,refWindow,chr,dist);
context.write(bamWindow,dnaW);
}
}
And this is the Reducer class (TransReducer1):
import java.io.IOException;
import java.util.ArrayList;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class TransReducer1 extends Reducer<LongWritable, DnaWritable, LongWritable, Text> {
@Override
public void reduce(LongWritable key, Iterable<DnaWritable> values, Context context) throws IOException, InterruptedException {
ArrayList<DnaWritable> list = new ArrayList<DnaWritable>();
double minDist = Double.MAX_VALUE;
for (DnaWritable value : values) {
long bamWindow = value.getBamWindow().get();
long read = value.getRead().get();
long refWindow = value.getRefWindow().get();
int chr = value.getChr().get();
double dist = value.getDist().get();
if (dist > minDist)
continue;
else
if (dist < minDist)
list.clear();
list.add(new DnaWritable(bamWindow,read,refWindow,chr,dist));
minDist = Math.min(minDist, value.getDist().get());
}
for(int i = 0; i < list.size(); i++){
context.write(new LongWritable(list.get(i).getRead().get()),new Text(new DnaWritable(list.get(i).getBamWindow(),list.get(i).getRead(),list.get(i).getRefWindow(),list.get(i).getChr(),list.get(i).getDist()).toString()));
}
}
}
And this is the DnaWritable class (I did not include the import section, to keep it short):
public class DnaWritable implements Writable {
LongWritable bamWindow;
LongWritable read;
LongWritable refWindow;
IntWritable chr;
DoubleWritable dist;
public DnaWritable(LongWritable bamWindow, LongWritable read, LongWritable refWindow, IntWritable chr, DoubleWritable dist){
this.bamWindow = bamWindow;
this.read = read;
this.refWindow = refWindow;
this.chr = chr;
this.dist = dist;
}
public DnaWritable(long bamWindow, long read, long refWindow, int chr, double dist){
this.bamWindow = new LongWritable(bamWindow);
this.read = new LongWritable(read);
this.refWindow = new LongWritable(refWindow);
this.chr = new IntWritable(chr);
this.dist = new DoubleWritable(dist);
}
@Override
public void write(DataOutput dataOutput) throws IOException {
bamWindow.write(dataOutput);
read.write(dataOutput);
refWindow.write(dataOutput);
chr.write(dataOutput);
dist.write(dataOutput);
}
@Override
public void readFields(DataInput dataInput) throws IOException {
bamWindow.readFields(dataInput);
read.readFields(dataInput);
refWindow.readFields(dataInput);
chr.readFields(dataInput);
dist.readFields(dataInput);
}
}
Any help would be really appreciated.. Thank you
Can you change your DnaWritable class to the following and test again? (This handles the NPE.)
package com.hadoop.intellipaat;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;
public class DnaWritable implements Writable {
private Long bamWindow;
private Long read;
private Long refWindow;
private Integer chr;
private Double dist;
public DnaWritable(Long bamWindow, Long read, Long refWindow, Integer chr, Double dist) {
super();
this.bamWindow = bamWindow;
this.read = read;
this.refWindow = refWindow;
this.chr = chr;
this.dist = dist;
}
@Override
public void write(DataOutput out) throws IOException {
out.writeLong(bamWindow);
out.writeLong(read);
out.writeLong(refWindow);
out.writeInt(chr);
out.writeDouble(dist);
}
@Override
public void readFields(DataInput in) throws IOException {
this.bamWindow = in.readLong();
this.read = in.readLong();
this.refWindow = in.readLong();
this.chr = in.readInt();
this.dist = in.readDouble();
}
}
I don't think you have submitted your job to the cluster at all. There is no job1.submit() or job1.waitForCompletion(true) in your main class.
////submit the job to hadoop
if (!job1.waitForCompletion(true))
return;
Also, there is a correction required in your main method:
Job job1 = new Job(); // the new Job() constructor is deprecated now
Below is the correct way to create a Job object:
Configuration conf = new Configuration();
Job job1 = Job.getInstance(conf, "Your Program name");
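Putting the two corrections together, a sketch of how the end of the driver might look (the job name is just a placeholder, and the rest of the configuration stays as in the question):
Configuration conf = new Configuration();
Job job1 = Job.getInstance(conf, "dna min distance"); // placeholder name
job1.setJarByClass(mp.class);
// ... keep the FileInputFormat/FileOutputFormat paths, mapper/reducer and
//     key/value class settings from the question ...
// actually submit the job and wait for it to finish
System.exit(job1.waitForCompletion(true) ? 0 : 1);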
I think you have not properly implemented write(DataOutput out) and readFields(DataInput in) methods in your DnaWritable class.
Consider also implementing WritableComparable as follows, and add a no-args constructor.
public class DnaWritable implements Writable, WritableComparable<DnaWritable> {
//Add a no-args constructor
public DnaWritable(){
}
//Add this static method as well
public static DnaWritable read(DataInput in) throws IOException {
DnaWritable dnaWritable = new DnaWritable();
dnaWritable.readFields(in);
return dnaWritable;
}
@Override
public int compareTo(DnaWritable dnaWritable) {
//Put your comparison logic there.
}
}
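Purely for illustration, the compareTo placeholder could be filled in like this, assuming you want to order by the question's LongWritable bamWindow field (the choice of field is an assumption):
@Override
public int compareTo(DnaWritable other) {
    // ordering by bamWindow is only an example; use whichever field defines your sort order
    return this.bamWindow.compareTo(other.bamWindow);
}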
If that is still failing, add the following log4j.properties under src/main/resources so you can see whether there is any underlying error you are not seeing:
hadoop.root.logger=DEBUG, console
log4j.rootLogger=INFO, stdout
# Direct log messages to stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

SequenceFile is not created in hadoop

I am writing a MapReduce job to test some calculations. I split my input across maps so that each map does part of the calculation; the result will be a list of (x, y) pairs which I want to flush into a SequenceFile.
The map part goes well but when the Reducer kicks in I get this error: Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://172.16.199.132:9000/user/hduser/FractalJob_1452257628594_410365359/out/reduce-out.
Another observation would be that this error appears only when I use more than one map.
UPDATED Here is my Mapper and Reducer code.
public static class RasterMapper extends Mapper<IntWritable, IntWritable, IntWritable, IntWritable> {
private int imageS;
private static Complex mapConstant;
@Override
public void setup(Context context) throws IOException {
imageS = context.getConfiguration().getInt("image.size", -1);
mapConstant = new Complex(context.getConfiguration().getDouble("constant.re", -1),
context.getConfiguration().getDouble("constant.im", -1));
}
@Override
public void map(IntWritable begin, IntWritable end, Context context) throws IOException, InterruptedException {
for (int x = (int) begin.get(); x < end.get(); x++) {
for (int y = 0; y < imageS; y++) {
float hue = 0, brighness = 0;
int icolor = 0;
Complex z = new Complex(2.0 * (x - imageS / 2) / (imageS / 2),
1.33 * (y - imageS / 2) / (imageS / 2));
icolor = startCompute(generateZ(z), 0);
if (icolor != -1) {
brighness = 1f;
}
hue = (icolor % 256) / 255.0f;
Color color = Color.getHSBColor(hue, 1f, brighness);
try {
context.write(new IntWritable(x + y * imageS), new IntWritable(color.getRGB()));
} catch (Exception e) {
e.printStackTrace();
}
}
}
}
private static Complex generateZ(Complex z) {
return (z.times(z)).plus(mapConstant);
}
private static int startCompute(Complex z, int color) {
if (z.abs() > 4) {
return color;
} else if (color >= 255) {
return -1;
} else {
color = color + 1;
return startCompute(generateZ(z), color);
}
}
}
public static class ImageReducer extends Reducer<IntWritable, IntWritable, WritableComparable<?>, Writable> {
private SequenceFile.Writer writer;
@Override
protected void cleanup(Context context) throws IOException, InterruptedException {
writer.close();
}
@Override
public void setup(Context context) throws IOException, InterruptedException {
Configuration conf = context.getConfiguration();
Path outDir = new Path(conf.get(FileOutputFormat.OUTDIR));
Path outFile = new Path(outDir, "pixels-out");
Option optPath = SequenceFile.Writer.file(outFile);
Option optKey = SequenceFile.Writer.keyClass(IntWritable.class);
Option optVal = SequenceFile.Writer.valueClass(IntWritable.class);
Option optCom = SequenceFile.Writer.compression(CompressionType.NONE);
try {
writer = SequenceFile.createWriter(conf, optCom, optKey, optPath, optVal);
} catch (Exception e) {
e.printStackTrace();
}
}
@Override
public void reduce (IntWritable key, Iterable<IntWritable> value, Context context) throws IOException, InterruptedException {
try{
writer.append(key, value.iterator().next());
} catch (Exception e) {
e.printStackTrace();
}
}
}
I hope you guys can help me out.
Thank you!
EDIT:
Job failed as tasks failed. failedMaps:1 failedReduces:0
Looking more closely at the logs, I think the issue comes from the way I feed my data to the maps. I split my image size into several sequence files so that the maps can read from them and compute the colors for the pixels in that area.
This is the way I create the files:
try {
int offset = 0;
// generate an input file for each map task
for (int i = 0; i < mapNr; ++i) {
final Path file = new Path(input, "part" + i);
final IntWritable begin = new IntWritable(offset);
final IntWritable end = new IntWritable(offset + imgSize / mapNr);
offset = (int) end.get();
Option optPath = SequenceFile.Writer.file(file);
Option optKey = SequenceFile.Writer.keyClass(IntWritable.class);
Option optVal = SequenceFile.Writer.valueClass(IntWritable.class);
Option optCom = SequenceFile.Writer.compression(CompressionType.NONE);
SequenceFile.Writer writer = SequenceFile.createWriter(conf, optCom, optKey, optPath, optVal);
try {
writer.append(begin, end);
} catch (Exception e) {
e.printStackTrace();
} finally {
writer.close();
}
System.out.println("Wrote input for Map #" + i);
}
Log file:
16/01/10 19:06:04 INFO client.RMProxy: Connecting to ResourceManager at /172.16.199.132:8032
16/01/10 19:06:07 INFO input.FileInputFormat: Total input paths to process : 4
16/01/10 19:06:07 INFO mapreduce.JobSubmitter: number of splits:4
16/01/10 19:06:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1452444283951_0007
16/01/10 19:06:08 INFO impl.YarnClientImpl: Submitted application application_1452444283951_0007
16/01/10 19:06:08 INFO mapreduce.Job: The url to track the job: http://172.16.199.132:8088/proxy/application_1452444283951_0007/
16/01/10 19:06:08 INFO mapreduce.Job: Running job: job_1452444283951_0007
16/01/10 19:06:19 INFO mapreduce.Job: Job job_1452444283951_0007 running in uber mode : false
16/01/10 19:06:20 INFO mapreduce.Job: map 0% reduce 0%
16/01/10 19:06:49 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_m_000002_0, Status : FAILED
16/01/10 19:06:49 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_m_000001_0, Status : FAILED
16/01/10 19:06:49 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_m_000000_0, Status : FAILED
16/01/10 19:06:49 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_m_000003_0, Status : FAILED
16/01/10 19:07:07 INFO mapreduce.Job: map 25% reduce 0%
16/01/10 19:07:08 INFO mapreduce.Job: map 50% reduce 0%
16/01/10 19:07:10 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_m_000001_1, Status : FAILED
16/01/10 19:07:11 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_m_000003_1, Status : FAILED
16/01/10 19:07:25 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_r_000000_0, Status : FAILED
16/01/10 19:07:32 INFO mapreduce.Job: map 100% reduce 0%
16/01/10 19:07:32 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_m_000003_2, Status : FAILED
16/01/10 19:07:32 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_m_000001_2, Status : FAILED
16/01/10 19:07:33 INFO mapreduce.Job: map 50% reduce 0%
16/01/10 19:07:43 INFO mapreduce.Job: map 75% reduce 0%
16/01/10 19:07:44 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_r_000000_1, Status : FAILED
16/01/10 19:07:50 INFO mapreduce.Job: map 100% reduce 100%
16/01/10 19:07:51 INFO mapreduce.Job: Job job_1452444283951_0007 failed with state FAILED due to: Task failed task_1452444283951_0007_m_000003
Job failed as tasks failed. failedMaps:1 failedReduces:0
16/01/10 19:07:51 INFO mapreduce.Job: Counters: 40
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=3048165
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=765
HDFS: Number of bytes written=0
HDFS: Number of read operations=12
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Failed map tasks=9
Failed reduce tasks=2
Killed reduce tasks=1
Launched map tasks=12
Launched reduce tasks=3
Other local map tasks=8
Data-local map tasks=4
Total time spent by all maps in occupied slots (ms)=239938
Total time spent by all reduces in occupied slots (ms)=34189
Total time spent by all map tasks (ms)=239938
Total time spent by all reduce tasks (ms)=34189
Total vcore-seconds taken by all map tasks=239938
Total vcore-seconds taken by all reduce tasks=34189
Total megabyte-seconds taken by all map tasks=245696512
Total megabyte-seconds taken by all reduce tasks=35009536
Map-Reduce Framework
Map input records=3
Map output records=270000
Map output bytes=2160000
Map output materialized bytes=2700018
Input split bytes=441
Combine input records=0
Spilled Records=270000
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=538
CPU time spent (ms)=5520
Physical memory (bytes) snapshot=643928064
Virtual memory (bytes) snapshot=2537975808
Total committed heap usage (bytes)=408760320
File Input Format Counters
Bytes Read=324
Constructing image...
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://172.16.199.132:9000/user/hduser/FractalJob_1452445557585_342741171/out/pixels-out
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1752)
at FractalJob.generateFractal(FractalJob.j..
This is the configuration:
conf.setInt("image.size", imgSize);
conf.setDouble("constant.re", FractalJob.constant.re());
conf.setDouble("constant.im", FractalJob.constant.im());
Job job = Job.getInstance(conf);
job.setJobName(FractalJob.class.getSimpleName());
job.setJarByClass(FractalJob.class);
job.setInputFormatClass(SequenceFileInputFormat.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(IntWritable.class);
job.setOutputFormatClass(SequenceFileOutputFormat.class);
job.setMapperClass(RasterMapper.class);
job.setReducerClass(ImageReducer.class);
job.setNumReduceTasks(1);
job.setSpeculativeExecution(false);
final Path input = new Path(filePath, "in");
final Path output = new Path(filePath, "out");
FileInputFormat.setInputPaths(job, input);
FileOutputFormat.setOutputPath(job, output);
You don't need to worry about creating your own sequence files. MapReduce has an output format that does it automatically.
So, in your driver class you would use:
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(IntWritable.class);
job.setOutputFormatClass(SequenceFileOutputFormat.class);
and then in the reducer you'd write:
context.write(key, values.iterator().next());
and delete all of the setup method.
As a kind of aside, it doesn't look like you need a reducer at all. If you're not doing any calculations in the reducer and you're not doing anything with grouping (which I presume you're not), then why not just delete it? job.setOutputFormatClass(SequenceFileOutputFormat.class) will write your mapper output to sequence files.
If you do only want one output file, set
job.setNumReduceTasks(1);
And provided your final data isn't > 1 block size, you'll get the output you want.
It's worth noting that you're currently only outputting one value per key - you should ensure that you want that, and include a loop in the reducer to iterate over the values if you don't.
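On that last point, a reducer that emits every value for a key, rather than only the first, could be sketched as follows, reusing the IntWritable key/value types the job already configures:
public static class ImageReducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
    @Override
    protected void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // write every value for this key instead of only values.iterator().next()
        for (IntWritable value : values) {
            context.write(key, value);
        }
    }
}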

HADOOP: After 100% map and 100% reduce throws java.lang.NumberFormatException

I am running a shortest path algorithm using the map-reduce framework on Hadoop-1.2.1, on a graph (~1 million vertices, 10 million edges). The code I am using works fine, as I have tested it on a small data set. But when I run it over this large data set, the code runs for a while, reaches 100% Map and 100% Reduce, but then gets stuck and throws a java.lang.NumberFormatException.
13/11/26 09:27:52 INFO output.FileOutputCommitter: Saved output of task 'attempt_local849927259_0001_r_000000_0' to /home/hduser/Desktop/final65440004050210
13/11/26 09:27:52 INFO mapred.LocalJobRunner: reduce > reduce
13/11/26 09:27:52 INFO mapred.Task: Task 'attempt_local849927259_0001_r_000000_0' done.
13/11/26 09:27:53 INFO mapred.JobClient: map 100% reduce 100%
13/11/26 09:27:53 INFO mapred.JobClient: Job complete: job_local849927259_0001
13/11/26 09:27:53 INFO mapred.JobClient: Counters: 20
13/11/26 09:27:53 INFO mapred.JobClient: File Output Format Counters
13/11/26 09:27:53 INFO mapred.JobClient: Bytes Written=52398725
13/11/26 09:27:53 INFO mapred.JobClient: FileSystemCounters
13/11/26 09:27:53 INFO mapred.JobClient: FILE_BYTES_READ=988857216
13/11/26 09:27:53 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1230974329
13/11/26 09:27:53 INFO mapred.JobClient: File Input Format Counters
13/11/26 09:27:53 INFO mapred.JobClient: Bytes Read=39978936
13/11/26 09:27:53 INFO mapred.JobClient: Map-Reduce Framework
13/11/26 09:27:53 INFO mapred.JobClient: Reduce input groups=1137931
13/11/26 09:27:53 INFO mapred.JobClient: Map output materialized bytes=163158951
13/11/26 09:27:53 INFO mapred.JobClient: Combine output records=0
13/11/26 09:27:53 INFO mapred.JobClient: Map input records=570075
13/11/26 09:27:53 INFO mapred.JobClient: Reduce shuffle bytes=0
13/11/26 09:27:53 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
13/11/26 09:27:53 INFO mapred.JobClient: Reduce output records=1137931
13/11/26 09:27:53 INFO mapred.JobClient: Spilled Records=21331172
13/11/26 09:27:53 INFO mapred.JobClient: Map output bytes=150932554
13/11/26 09:27:53 INFO mapred.JobClient: CPU time spent (ms)=0
13/11/26 09:27:53 INFO mapred.JobClient: Total committed heap usage (bytes)=1638268928
13/11/26 09:27:53 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
13/11/26 09:27:53 INFO mapred.JobClient: Combine input records=0
13/11/26 09:27:53 INFO mapred.JobClient: Map output records=6084261
13/11/26 09:27:53 INFO mapred.JobClient: SPLIT_RAW_BYTES=202
13/11/26 09:27:53 INFO mapred.JobClient: Reduce input records=6084261
13/11/26 09:27:55 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/11/26 09:27:55 INFO input.FileInputFormat: Total input paths to process : 1
13/11/26 09:27:56 INFO mapred.JobClient: Running job: job_local2046662654_0002
13/11/26 09:27:56 INFO mapred.LocalJobRunner: Waiting for map tasks
13/11/26 09:27:56 INFO mapred.LocalJobRunner: Starting task: attempt_local2046662654_0002_m_000000_0
13/11/26 09:27:56 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@43c319b9
13/11/26 09:27:56 INFO mapred.MapTask: Processing split: file:/home/hduser/Desktop/final65440004050210/part-r-00000:0+33554432
13/11/26 09:27:56 INFO mapred.MapTask: io.sort.mb = 100
13/11/26 09:27:56 INFO mapred.MapTask: data buffer = 79691776/99614720
13/11/26 09:27:56 INFO mapred.MapTask: record buffer = 262144/327680
13/11/26 09:27:56 INFO mapred.MapTask: Starting flush of map output
13/11/26 09:27:56 INFO mapred.MapTask: Finished spill 0
13/11/26 09:27:56 INFO mapred.LocalJobRunner: Starting task: attempt_local2046662654_0002_m_000001_0
13/11/26 09:27:56 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4c6b851b
13/11/26 09:27:56 INFO mapred.MapTask: Processing split: file:/home/hduser/Desktop/final65440004050210/part-r-00000:33554432+18438093
13/11/26 09:27:56 INFO mapred.MapTask: io.sort.mb = 100
13/11/26 09:27:56 INFO mapred.MapTask: data buffer = 79691776/99614720
13/11/26 09:27:56 INFO mapred.MapTask: record buffer = 262144/327680
13/11/26 09:27:56 INFO mapred.MapTask: Starting flush of map output
13/11/26 09:27:56 INFO mapred.LocalJobRunner: Map task executor complete.
13/11/26 09:27:56 WARN mapred.LocalJobRunner: job_local2046662654_0002
java.lang.Exception: java.lang.NumberFormatException: For input string: "UNMODED"
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.NumberFormatException: For input string: "UNMODED"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.parseInt(Integer.java:527)
at graph.Dijkstra$TheMapper.map(Dijkstra.java:42)
at graph.Dijkstra$TheMapper.map(Dijkstra.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
13/11/26 09:27:57 INFO mapred.JobClient: map 0% reduce 0%
13/11/26 09:27:57 INFO mapred.JobClient: Job complete: job_local2046662654_0002
13/11/26 09:27:57 INFO mapred.JobClient: Counters: 0
Exception in thread "main" java.io.FileNotFoundException: File /home/hduser/Desktop/final65474682682135/part-r-00000 does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:402)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:255)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:436)
at graph.Dijkstra.run(Dijkstra.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at graph.Dijkstra.main(Dijkstra.java:181)
Code:
package graph;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Iterator;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.*;
public class Dijkstra extends Configured implements Tool {
public static String OUT = "outfile";
public static String IN = "inputlarger";
public static class TheMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
//From slide 20 of Graph Algorithms with MapReduce (by Jimmy Lin, Univ @ Maryland)
//Key is node n
//Value is D, Points-To
//For every point (or key), look at everything it points to.
//Emit or write to the points to variable with the current distance + 1
Text word = new Text();
String line = value.toString();//looks like 1 0 2:3:
String[] sp = line.split(" ");//splits on space
/* String[] bh = null;
for(int i=0; i<sp.length-2;i++){
bh[i] = sp[i+2];
}*/
int distanceadd = Integer.parseInt(sp[1]) + 1;
String[] PointsTo = sp[2].split(":");
//System.out.println("Pont4");
for(int i=0; i<PointsTo.length; i++){
word.set("VALUE "+distanceadd);//tells me to look at distance value
context.write(new LongWritable(Integer.parseInt(PointsTo[i])), word);
word.clear();
}
//pass in current node's distance (if it is the lowest distance)
//System.out.println("Pont3");
word.set("VALUE "+sp[1]);
context.write( new LongWritable( Integer.parseInt( sp[0] ) ), word );
word.clear();
word.set("NODES "+sp[2]);//tells me to append on the final tally
context.write( new LongWritable( Integer.parseInt( sp[0] ) ), word );
word.clear();
}
}
public static class TheReducer extends Reducer<LongWritable, Text, LongWritable, Text> {
public void reduce(LongWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
//From slide 20 of Graph Algorithms with MapReduce (by Jimmy Lin, Univ @ Maryland)
//The key is the current point
//The values are all the possible distances to this point
//we simply emit the point and the minimum distance value
System.out.println("in reuduce");
String nodes = "UNMODED";
Text word = new Text();
int lowest = 10009;//start at infinity
for (Text val : values) {//looks like NODES/VALUES 1 0 2:3:, we need to use the first as a key
String[] sp = val.toString().split(" ");//splits on space
//look at first value
if(sp[0].equalsIgnoreCase("NODES")){
//System.out.println("Pont1");
nodes = null;
nodes = sp[1];
}else if(sp[0].equalsIgnoreCase("VALUE")){
//System.out.println("Pont2");
int distance = Integer.parseInt(sp[1]);
lowest = Math.min(distance, lowest);
}
}
word.set(lowest+" "+nodes);
context.write(key, word);
word.clear();
}
}
//Almost exactly from http://hadoop.apache.org/mapreduce/docs/current/mapred_tutorial.html
public int run(String[] args) throws Exception {
//http://code.google.com/p/joycrawler/source/browse/NetflixChallenge/src/org/niubility/learning/knn/KNNDriver.java?r=242
getConf().set("mapred.textoutputformat.separator", " ");//make the key -> value space separated (for iterations)
//set in and out to args.
//IN = args[0];
//OUT = args[1];
IN = "/home/hduser/Desktop/youCP3.txt";
OUT = "/home/hduser/Desktop/final";
String infile = IN;
String outputfile = OUT + System.nanoTime();
boolean isdone = false;
boolean success = false;
HashMap <Integer, Integer> _map = new HashMap<Integer, Integer>();
while(isdone == false){
Job job = new Job(getConf());
job.setJarByClass(Dijkstra.class);
job.setJobName("Dijkstra");
job.setOutputKeyClass(LongWritable.class);
job.setOutputValueClass(Text.class);
job.setMapperClass(TheMapper.class);
job.setReducerClass(TheReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(infile));
FileOutputFormat.setOutputPath(job, new Path(outputfile));
success = job.waitForCompletion(true);
//remove the input file
//http://eclipse.sys-con.com/node/1287801/mobile
if(infile != IN){
String indir = infile.replace("part-r-00000", "");
Path ddir = new Path(indir);
FileSystem dfs = FileSystem.get(getConf());
dfs.delete(ddir, true);
}
infile = outputfile+"/part-r-00000";
outputfile = OUT + System.nanoTime();
//do we need to re-run the job with the new input file??
//http://www.hadoop-blog.com/2010/11/how-to-read-file-from-hdfs-in-hadoop.html
isdone = true;//set the job to NOT run again!
Path ofile = new Path(infile);
FileSystem fs = FileSystem.get(new Configuration());
BufferedReader br=new BufferedReader(new InputStreamReader(fs.open(ofile)));
HashMap<Integer, Integer> imap = new HashMap<Integer, Integer>();
String line=br.readLine();
while (line != null){
//each line looks like 0 1 2:3:
//we need to verify node -> distance doesn't change
String[] sp = line.split(" ");
int node = Integer.parseInt(sp[0]);
int distance = Integer.parseInt(sp[1]);
imap.put(node, distance);
line=br.readLine();
}
if(_map.isEmpty()){
//first iteration... must do a second iteration regardless!
isdone = false;
}else{
//http://www.java-examples.com/iterate-through-values-java-hashmap-example
//http://www.javabeat.net/articles/33-generics-in-java-50-1.html
Iterator<Integer> itr = imap.keySet().iterator();
while(itr.hasNext()){
int key = itr.next();
int val = imap.get(key);
if(_map.get(key) != val){
//values aren't the same... we aren't at convergence yet
isdone = false;
}
}
}
if(isdone == false){
_map.putAll(imap);//copy imap to _map for the next iteration (if required)
}
}
return success ? 0 : 1;
}
public static void main(String[] args) throws Exception {
System.exit(ToolRunner.run(new Dijkstra(), args));
}
}
small working dataset:
1 0 2:3:
2 10000 1:4:5:
3 10000 1:
4 10000 2:5:
4 10000 6:
5 10000 2:4:
6 10000 4:
6 10000 7:
7 10000 6:
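For reference, the stack trace shows the second iteration's mapper calling Integer.parseInt on the literal string "UNMODED", which is the placeholder the reducer writes out when it never receives a NODES value for a key. A minimal sketch of one possible guard in TheMapper.map (an assumption based on the trace, not a verified fix) is to skip such placeholder adjacency lists before emitting neighbor updates:
// Sketch only: inside TheMapper.map, skip placeholder adjacency lists ("UNMODED")
// that the previous iteration's reducer wrote for nodes with no outgoing edges
String[] sp = line.split(" ");
int distanceadd = Integer.parseInt(sp[1]) + 1;
if (!sp[2].equalsIgnoreCase("UNMODED")) {
    for (String target : sp[2].split(":")) {
        word.set("VALUE " + distanceadd);
        context.write(new LongWritable(Integer.parseInt(target)), word);
        word.clear();
    }
}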
