Getting NullPointerException while running a normal MapReduce program in Eclipse

I am getting a NullPointerException when trying to execute a simple MapReduce program, and I am unable to understand where the problem is.
package MapReduce.HadMapReduce;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class RecCount extends Configured implements Tool {
    public int run(String[] arg0) throws Exception {
        Job job = Job.getInstance(getConf());
        FileInputFormat.setInputPaths(job, new Path("C:\\singledeck.txt"));
        FileOutputFormat.setOutputPath(job, new Path("C:\\temp123"));
        return job.waitForCompletion(true) ? 0 : 1;
    }
    public static void main(String args[]) throws Exception {
        System.exit(ToolRunner.run(new RecCount(), args));
    }
}
Error is:
Exception in thread "main" java.lang.NullPointerException
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:483)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:815)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:798)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:731)
at org.apache.hadoop.fs.RawLocalFileSystem.mkOneDirWithMode(RawLocalFileSystem.java:489)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:530)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:507)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:305)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:133)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:144)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at MapReduce.HadMapReduce.RecCount.run(RecCount.java:22)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at MapReduce.HadMapReduce.RecCount.main(RecCount.java:26)
This is the logic happening behind the scenes:
ToolRunner calls the run method below, and that method calls its other run method (pasted right below it), where the configuration is created if it is null.
public static int run(Tool tool, String[] args) throws Exception {
    return run(tool.getConf(), tool, args);
}

public static int run(Configuration conf, Tool tool, String[] args) throws Exception {
    if (conf == null) {
        conf = new Configuration();
    }
    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    // set the configuration back, so that Tool can configure itself
    tool.setConf(conf);
    // get the args w/o generic hadoop args
    String[] toolArgs = parser.getRemainingArgs();
    return tool.run(toolArgs);
}
In the last statement above, tool.run(toolArgs) calls the run method that I implemented, since I implemented the Tool interface. I don't see any error in my code; please let me know if you can find one!
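For reference, the equivalent main method that supplies the Configuration explicitly through ToolRunner's three-argument overload looks roughly like this (a minimal sketch; it additionally assumes an import of org.apache.hadoop.conf.Configuration):

public static void main(String args[]) throws Exception {
    // ToolRunner still runs GenericOptionsParser and calls setConf() exactly as shown above;
    // the only difference is that the null check is skipped because conf is non-null here.
    System.exit(ToolRunner.run(new Configuration(), new RecCount(), args));
}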
Can someone please explain what is the problem with my code?

Related

Not able to use CompositetextinputFormat in Mapside Join

I am trying to implement a map-side join using CompositeInputFormat. However, I am getting the errors below in the MapReduce job, which I am unable to resolve.
In the code below I get an error while using the compose method, and also an error while setting the input format class. The error says:
The method compose(String, Class, Path...) in the type CompositeInputFormat is not applicable for the arguments (String, Class, Path[])
Can someone please help?
package Hadoop.MR.Practice;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.join.CompositeInputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
//import org.apache.hadoop.mapred.join.CompositeInputFormat;
public class MapJoinJob implements Tool {
    private Configuration conf;

    public Configuration getConf() {
        return conf;
    }

    public void setConf(Configuration conf) {
        this.conf = conf;
    }

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "MapSideJoinJob");
        job.setJarByClass(this.getClass());
        Path[] inputs = new Path[] { new Path(args[0]), new Path(args[1]) };
        String join = CompositeInputFormat.compose("inner", KeyValueTextInputFormat.class, inputs);
        job.getConfiguration().set("mapreduce.join.expr", join);
        job.setInputFormatClass(CompositeInputFormat.class);
        job.setMapperClass(MapJoinMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        // Configuring reducer
        job.setReducerClass(WCReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setNumReduceTasks(0);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        job.waitForCompletion(true);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        MapJoinJob mjJob = new MapJoinJob();
        ToolRunner.run(conf, mjJob, args);
    }
}
I would say your problem is likely related to mixing Hadoop APIs. You can see that your imports mix mapred and mapreduce.
For example, you're trying to use org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat with org.apache.hadoop.mapred.join.CompositeInputFormat, which is unlikely to work.
You should choose one (probably mapreduce, I would say) and make sure everything uses the same API.
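As a rough sketch of what that looks like (assuming Hadoop 2.x, where the new-API join classes live under org.apache.hadoop.mapreduce.lib.join), the relevant part of the driver could stay entirely on the mapreduce API:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.join.CompositeInputFormat; // new-API version, not org.apache.hadoop.mapred.join
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// inside run():
Job job = Job.getInstance(getConf(), "MapSideJoinJob");
job.setJarByClass(this.getClass());
// the new-API compose() accepts Path varargs, so passing the two input paths compiles
String join = CompositeInputFormat.compose("inner", KeyValueTextInputFormat.class,
        new Path(args[0]), new Path(args[1]));
job.getConfiguration().set("mapreduce.join.expr", join);
job.setInputFormatClass(CompositeInputFormat.class);
FileOutputFormat.setOutputPath(job, new Path(args[2]));

The mapper and reducer would then also need to extend the org.apache.hadoop.mapreduce classes rather than implement the org.apache.hadoop.mapred interfaces.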

Getting error while executing WordCount program in Hadoop

I'm trying to execute the WordCount program for Hadoop in Eclipse and I'm getting the following error:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
at WordCount.run(WordCount.java:22)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at WordCount.main(WordCount.java:35)
I copied the code from the internet and it seems fine; for reference I'm pasting it here:
WordCount.java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
public class WordCount extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        // creating a JobConf object and assigning a job name for identification purposes
        JobConf conf = new JobConf(getConf(), WordCount.class);
        conf.setJobName("WordCount");
        // Setting configuration object with the data type of output key and value
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        // Providing the mapper and reducer class names
        conf.setMapperClass(WordCountMapper.class);
        conf.setReducerClass(WordCountReducer.class);
        // We will give 2 arguments at run time: one is the input path and the other is the output path
        Path inp = new Path(args[0]);
        Path out = new Path(args[1]);
        // the hdfs input and output directory to be fetched from the command line
        FileInputFormat.addInputPath(conf, inp);
        FileOutputFormat.setOutputPath(conf, out);
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // this main function will call the run method defined above
        int res = ToolRunner.run(new Configuration(), new WordCount(), args);
        System.exit(res);
    }
}
It seems you are not passing the proper command-line arguments.
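In Eclipse that usually means nothing was entered under Run -> Run Configurations... -> Arguments -> Program arguments, so args is empty and the first use of args[0] throws ArrayIndexOutOfBoundsException: 0. A small guard at the top of run() also makes the failure mode obvious; a minimal sketch:

public int run(String[] args) throws Exception {
    // fail fast with a usage message instead of an ArrayIndexOutOfBoundsException
    if (args.length != 2) {
        System.err.println("Usage: WordCount <input path> <output path>");
        return -1;
    }
    // ... the rest of the existing run() body stays the same ...
}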

Cannot find HibInputFormat class. Getting exception NoClassDefFoundError

hduser#akshay-Lenovo-G580:~$ hadoop jar /home/hduser/HipiDemo.jar HelloWorld sampleimages.hib sampleimages_average
Warning: $HADOOP_HOME is deprecated.
Exception in thread "main" java.lang.NoClassDefFoundError: org/hipi/imagebundle/mapreduce/HibInputFormat
at HelloWorld.run(HelloWorld.java:44)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at HelloWorld.main(HelloWorld.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.ClassNotFoundException: org.hipi.imagebundle.mapreduce.HibInputFormat
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 9 more
My code:
import hipi.image.FloatImage;
import java.io.IOException;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.hipi.image.HipiImageHeader;
import org.hipi.imagebundle.mapreduce.HibInputFormat;
public class HelloWorld extends Configured implements Tool {

    public static class HelloWorldMapper extends Mapper<HipiImageHeader, FloatImage, IntWritable, FloatImage> {
        public void map(HipiImageHeader key, FloatImage value, Context context)
                throws IOException, InterruptedException {
        }
    }

    public static class HelloWorldReducer extends Reducer<IntWritable, FloatImage, IntWritable, Text> {
        public void reduce(IntWritable key, Iterable<FloatImage> values, Context context)
                throws IOException, InterruptedException {
        }
    }

    public int run(String[] args) throws Exception {
        // Check input arguments
        if (args.length != 2) {
            System.out.println("Usage: helloWorld <input HIB> <output directory>");
            System.exit(0);
        }
        // Initialize and configure MapReduce job
        //Job job = Job.getInstance();
        Job job = new Job(getConf(), "Employee Salary");
        // Set input format class which parses the input HIB and spawns map tasks
        job.setInputFormatClass(HibInputFormat.class);
        // Set the driver, mapper, and reducer classes which express the computation
        job.setJarByClass(HelloWorld.class);
        job.setMapperClass(HelloWorldMapper.class);
        job.setReducerClass(HelloWorldReducer.class);
        // Set the types for the key/value pairs passed to/from map and reduce layers
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(FloatImage.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);
        // Set the input and output paths on the HDFS
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Execute the MapReduce job and block until it completes
        boolean success = job.waitForCompletion(true);
        // Return success or failure
        return success ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        ToolRunner.run(new HelloWorld(), args);
        System.exit(0);
    }
}
Add the jar containing the HibInputFormat class to your classpath.
Or, if you compile from the command line, for example:
javac -classpath /lib/jarContainingTheClass.jar /examples/HelloWorld.java
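Since the NoClassDefFoundError in your trace is thrown while running hadoop jar (at HelloWorld.run), the HIPI jar also has to be visible at run time, not only when you compile. A sketch of one way to do that (the jar path is a placeholder for wherever your HIPI jar actually lives):

export HADOOP_CLASSPATH=/path/to/hipi.jar
hadoop jar /home/hduser/HipiDemo.jar HelloWorld sampleimages.hib sampleimages_average

If the job then fails with the same error inside the map tasks, the jar also needs to be shipped to them, for example with -libjars, which works here because HelloWorld goes through ToolRunner.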

Exception in thread "main" java.lang.ClassNotFoundException: org.myorg.MaxTemperatureWithCompression

I am trying to run a MapReduce job (using the new API) on Hadoop 2.7.1 from the command line. I have followed the steps below. There is no error in compiling and creating the jar file.
javac -cp `hadoop classpath` MaxTemperatureWithCompression.java -d /Users/gangadharkadam/hadoopdata/build
jar -cvf MaxTemperatureWithCompression.jar /Users/gangadharkadam/hadoopdata/build
hadoop jar MaxTemperatureWithCompression.jar org.myorg.MaxTemperatureWithCompression user/ncdc/input /user/ncdc/output
Error messages:
Exception in thread "main" java.lang.ClassNotFoundException: org.myorg.MaxTemperatureWithCompression
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Java code:
package org.myorg;
//Standard Java Classes
import java.io.IOException;
import java.util.regex.Pattern;
//extends the class Configured, and implements the Tool utility class
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.util.GenericOptionsParser;
//send debugging messages from inside the mapper and reducer classes
import org.apache.log4j.Logger;
//Job class in order to create, configure, and run an instance of your MapReduce
import org.apache.hadoop.mapreduce.Job;
//extend the Mapper class with your own Map class and add your own processing instructions
import org.apache.hadoop.mapreduce.Mapper;
//extend it to create and customize your own Reduce class
import org.apache.hadoop.mapreduce.Reducer;
//Path class to access files in HDFS
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FileSystem;
//pass required paths using the FileInputFormat and FileOutputFormat classes
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
//Writable objects for writing, reading,and comparing values during map and reduce processing
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
public class MaxTemperatureWithCompression extends Configured implements Tool {

    private static final Logger LOG = Logger.getLogger(MaxTemperatureWithCompression.class);

    // main method to invoke the ToolRunner to create an instance of MaxTemperatureWithCompression
    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new MaxTemperatureWithCompression(), args);
        System.exit(res);
    }

    // call the run method to configure the job
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperatureWithCompression <input path> " + "<output path>");
            System.exit(-1);
        }
        Job job = Job.getInstance(getConf(), "MaxTemperatureWithCompression");
        // set the jar to use based on the class
        job.setJarByClass(MaxTemperatureWithCompression.class);
        // set the input and output path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // set the output key and value
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // set the compression format
        /*[*/FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);/*]*/
        // set the mapper and reducer class
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    // mapper
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final int MISSING = 9999;

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String year = line.substring(15, 19);
            int airTemperature;
            if (line.charAt(87) == '+') {
                airTemperature = Integer.parseInt(line.substring(88, 92));
            } else {
                airTemperature = Integer.parseInt(line.substring(87, 92));
            }
            String quality = line.substring(92, 93);
            if (airTemperature != MISSING && quality.matches("[01459]")) {
                context.write(new Text(year), new IntWritable(airTemperature));
            }
        }
    }

    // reducer
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int maxValue = Integer.MIN_VALUE;
            for (IntWritable value : values) {
                maxValue = Math.max(maxValue, value.get());
            }
            context.write(key, new IntWritable(maxValue));
        }
    }
}
I have seen a few posts on the same issue, but they didn't help me resolve it. Any help with this is highly appreciated. Thanks in advance.

java.lang.ClassNotFoundException error while trying to read a Word document in MapReduce using Apache POI

I am trying to read a Word document file in my MapReduce program, for which I have written user-defined input format classes, WordDocxInputFormat and WordDocxInputFormatRecordReader. In the WordDocxInputFormatRecordReader class I am using Apache POI to read the .docx file, but I am getting a java.lang.ClassNotFoundException at run time.
I am using Eclipse and Hadoop-0.20.2 on the Windows 7 platform.
I have defined my CLASSPATH as: JAVA_HOME\lib;C:\cygwin\home\bmohanty6\poijars\;
In C:\cygwin\home\bmohanty6\poijars\ I have kept the jar files needed for POI (shown in the attached image) and also added them via Project -> Properties -> Libraries -> Add External JARs.
I am getting the error:
13/09/17 12:35:26 INFO mapred.JobClient: Task Id : attempt_201309101108_0040_m_000000_2, Status : FAILED
Error: java.lang.ClassNotFoundException: org.apache.poi.xwpf.usermodel.XWPFDocument
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at WordDocxInputFormat$WordDocxInputFormatRecordReader.next(WordDocxInputFormat.java:112)
at WordDocxInputFormat$WordDocxInputFormatRecordReader.next(WordDocxInputFormat.java:1)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Here is my WordDocxInputFormat class:
import java.io.IOException;
import java.util.Arrays;
import java.io.FileInputStream;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.poi.xwpf.usermodel.*;
import org.apache.poi.xwpf.extractor.*;
/**
* Reads complete documents in Binary format.
*/
public class WordDocxInputFormat extends FileInputFormat<Text, Text> {

    public WordDocxInputFormat() {
        super();
    }

    protected boolean isSplitable(FileSystem fs, Path filename) {
        return false;
    }

    @Override
    public RecordReader<Text, Text> getRecordReader(
            InputSplit split, JobConf job, Reporter reporter) throws IOException {
        return new WordDocxInputFormatRecordReader((FileSplit) split, job);
    }

    /**
     * WordDocxInputFormatRecordReader class to read through a given binary document
     * Outputs the filename along with the complete document
     */
    public class WordDocxInputFormatRecordReader implements RecordReader<Text, Text> {

        private final FileSplit fileSplit;
        private final Configuration conf;
        private boolean processed = false;

        public WordDocxInputFormatRecordReader(FileSplit fileSplit, Configuration conf)
                throws IOException {
            this.fileSplit = fileSplit;
            this.conf = conf;
        }

        @Override
        public Text createKey() {
            return new Text();
        }

        @Override
        public Text createValue() {
            return new Text();
        }

        @Override
        public long getPos() throws IOException {
            return this.processed ? this.fileSplit.getLength() : 0;
        }

        @Override
        public float getProgress() throws IOException {
            return this.processed ? 1.0f : 0.0f;
        }

        @Override
        public boolean next(Text key, Text value) throws IOException {
            if (!this.processed) {
                Path file = this.fileSplit.getPath();
                try {
                    XWPFDocument docx = new XWPFDocument(new FileInputStream(file.toString()));
                    XWPFWordExtractor we = new XWPFWordExtractor(docx);
                    key.set(file.getName());
                    value.set(we.getText());
                } catch (IOException ex) {
                    Logger.getLogger(WordDocxInputFormatRecordReader.class.getName()).log(Level.SEVERE, null, ex);
                }
                this.processed = true;
                return true;
            } else {
                return false;
            }
        }

        @Override
        public void close() throws IOException {
        }
    }
}
"But I am getting a java.lang.ClassNotFoundException run time error"
If you are getting this error and runtime but not when you compile, then you are almost definitely using a different setup when you run the program as to when you compile. i.e when you compile the compiler looks in the area with the jars, but when you run the program it is looking somewhere else and not finding them hence why your error would only appear at runtime. If this is indeed your issue I have a couple of suggestions:
If you are using eclipse as your tag suggests check your build path, I don't know how eclipse works with compiling and running so you may need to check into that.
Alternatively if you will be using the jars regularly you can try adding them into the external lib of your jvm as this tutorial here shows and then ensure you use that jvm for compiling and running. Placing them in the area shown in the tutorial allows the compiler to check for the jars automatically at compile and runtime when that jvm is used.
Hope this helps,
Good luck!
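One Hadoop-specific note: the stack trace fails inside the map task (org.apache.hadoop.mapred.Child.main), so the POI jars also have to reach the task JVMs on the cluster, not only the Eclipse build path on the client. A sketch of one common way to do that, assuming the commands are run from the folder holding your compiled classes (the jar name is a placeholder and the POI path is the one from the question): put the dependency jars in a lib/ directory inside the job jar, which Hadoop adds to the task classpath when the job runs.

mkdir lib
cp /home/bmohanty6/poijars/*.jar lib/
jar cvf worddocx-job.jar *.class lib/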
