Exception in thread "main" java.lang.ClassNotFoundException: org.myorg.MaxTemperatureWithCompression

I am trying to run a MapReduce job (using the new API) on Hadoop 2.7.1 from the command line. I followed the steps below. Compiling the code and creating the jar file produced no errors.
javac -cp `hadoop classpath` MaxTemperatureWithCompression.java -d /Users/gangadharkadam/hadoopdata/build
jar -cvf MaxTemperatureWithCompression.jar /Users/gangadharkadam/hadoopdata/build
hadoop jar MaxTemperatureWithCompression.jar org.myorg.MaxTemperatureWithCompression user/ncdc/input /user/ncdc/output
Error Messages-
Exception in thread "main" java.lang.ClassNotFoundException: org.myorg.MaxTemperatureWithCompression
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Java Code-
package org.myorg;
//Standard Java Classes
import java.io.IOException;
import java.util.regex.Pattern;
//extends the class Configured, and implements the Tool utility class
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.util.GenericOptionsParser;
//send debugging messages from inside the mapper and reducer classes
import org.apache.log4j.Logger;
//Job class in order to create, configure, and run an instance of your MapReduce
import org.apache.hadoop.mapreduce.Job;
//extend the Mapper class with your own Map class and add your own processing instructions
import org.apache.hadoop.mapreduce.Mapper;
//extend it to create and customize your own Reduce class
import org.apache.hadoop.mapreduce.Reducer;
//Path class to access files in HDFS
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FileSystem;
//pass required paths using the FileInputFormat and FileOutputFormat classes
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
//Writable objects for writing, reading,and comparing values during map and reduce processing
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
public class MaxTemperatureWithCompression extends Configured implements Tool {
private static final Logger LOG = Logger.getLogger(MaxTemperatureWithCompression.class);
//main method to invoke the ToolRunner to create an instance of MaxTemperatureWithCompression
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new MaxTemperatureWithCompression(), args);
System.exit(res);
}
//call the run method to configure the job
public int run(String[] args) throws Exception {
if (args.length != 2) {
System.err.println("Usage: MaxTemperatureWithCompression <input path> " + "<output path>");
System.exit(-1);
}
Job job = Job.getInstance(getConf(), "MaxTemperatureWithCompression");
//set the jar to use based on the class
job.setJarByClass(MaxTemperatureWithCompression.class);
//set the input and output path
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
//set the output key and value
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
//set the compressionformat
/*[*/FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);/*]*/
//set the mapper and reducer class
job.setMapperClass(Map.class);
job.setCombinerClass(Reduce.class);
job.setReducerClass(Reduce.class);
return job.waitForCompletion(true) ? 0 : 1;
}
//mapper
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
private static final int MISSING = 9999;
@Override
public void map(LongWritable key, Text value, Context context)
throws IOException,InterruptedException {
String line = value.toString();
String year = line.substring(15,19);
int airTemperature;
if (line.charAt(87) == '+') {
airTemperature = Integer.parseInt(line.substring(88, 92));
}
else {
airTemperature = Integer.parseInt(line.substring(87, 92));
}
String quality = line.substring(92,93);
if (airTemperature != MISSING && quality.matches("[01459]")) {
context.write(new Text(year), new IntWritable(airTemperature));
}
}
}
//reducer
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
@Override
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int maxValue = Integer.MIN_VALUE;
for (IntWritable value : values) {
maxValue = Math.max(maxValue, value.get());
}
context.write(key, new IntWritable(maxValue));
}
}
}
I have seen a few posts on the same issue, but they did not help me resolve it. Any help is highly appreciated. Thanks in advance.

Related

Hadoop run command java.lang.ClassNotFoundException

I have successfully installed hadoop 3.0.0 stand alone to run on Ubuntu 16.04.
I created a jar using the following code from Apache hadoop tutorial.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WDCount {
public static class TokenizerMapper
extends Mapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
public static class IntSumReducer
extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values,
Context context
) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WDCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
Creating WDCount.jar was successful, with no errors.
Then I created Input and Output folders, made a text file containing a phrase, and saved it as fileo1.txt in the Input folder.
I used this command to run Hadoop on WDCount.jar:
/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/Wordcount/WDCount.jar /usr/local/hadoop/share/hadoop/mapreduce/Wordcount/Input /usr/local/hadoop/share/hadoop/mapreduce/Wordcount/Output
When I run the command I get this message:
Exception in thread "main" java.lang.ClassNotFoundException: /usr/local/hadoop/share/hadoop/mapreduce/Wordcount/Input
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.util.RunJar.run(RunJar.java:232)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
Can anyone tell me what is wrong?
Include the name of the class containing the main method after the jar name:
/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/Wordcount/WDCount.jar WDCount /usr/local/hadoop/share/hadoop/mapreduce/Wordcount/Input /usr/local/hadoop/share/hadoop/mapreduce/Wordcount/Output
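To illustrate why the input path showed up inside the ClassNotFoundException: when the jar's manifest has no Main-Class entry, hadoop jar treats the first argument after the jar as the class to run. The sketch below only mimics that rule. The names RunJarArgsSketch, pickMainClass and jobArgs are made up for this illustration and are not Hadoop's actual code.

```java
import java.util.Arrays;

// Hypothetical sketch: mimics how the launcher chooses the main class.
// It is NOT Hadoop's RunJar source, only an illustration of the rule.
class RunJarArgsSketch {

    // manifestMainClass: value of Main-Class in the jar manifest, or null if absent.
    // argsAfterJar: everything typed after the jar file name.
    static String pickMainClass(String manifestMainClass, String[] argsAfterJar) {
        if (manifestMainClass != null) {
            return manifestMainClass;
        }
        if (argsAfterJar.length == 0) {
            throw new IllegalArgumentException("no main class given");
        }
        // No Main-Class in the manifest: the first argument is taken as the class name.
        return argsAfterJar[0];
    }

    // Arguments that are handed on to the job's main(String[]).
    static String[] jobArgs(String manifestMainClass, String[] argsAfterJar) {
        int skip = (manifestMainClass != null) ? 0 : 1;
        int from = Math.min(skip, argsAfterJar.length);
        return Arrays.copyOfRange(argsAfterJar, from, argsAfterJar.length);
    }

    public static void main(String[] args) {
        String[] withoutClass = {"Wordcount/Input", "Wordcount/Output"};
        String[] withClass = {"WDCount", "Wordcount/Input", "Wordcount/Output"};

        // The input directory is mistaken for the class name.
        System.out.println(pickMainClass(null, withoutClass));
        // The class name is picked up, and the two paths reach main().
        System.out.println(pickMainClass(null, withClass));
        System.out.println(Arrays.toString(jobArgs(null, withClass)));
    }
}
```

With the class name in place, the two paths arrive in main() as args[0] and args[1], which is what the WDCount driver expects.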

Cannot find HibInputFormat class. Getting NoClassDefFoundError exception

hduser@akshay-Lenovo-G580:~$ hadoop jar /home/hduser/HipiDemo.jar HelloWorld sampleimages.hib sampleimages_average
Warning: $HADOOP_HOME is deprecated.
Exception in thread "main" java.lang.NoClassDefFoundError: org/hipi/imagebundle/mapreduce/HibInputFormat
at HelloWorld.run(HelloWorld.java:44)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at HelloWorld.main(HelloWorld.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.ClassNotFoundException: org.hipi.imagebundle.mapreduce.HibInputFormat
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 9 more
My code:
import hipi.image.FloatImage;
import java.io.IOException;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.hipi.image.HipiImageHeader;
import org.hipi.imagebundle.mapreduce.HibInputFormat;
public class HelloWorld extends Configured implements Tool {
public static class HelloWorldMapper extends Mapper<HipiImageHeader, FloatImage, IntWritable, FloatImage> {
public void map(HipiImageHeader key, FloatImage value, Context context)
throws IOException, InterruptedException {
}
}
public static class HelloWorldReducer extends Reducer<IntWritable, FloatImage, IntWritable, Text> {
public void reduce(IntWritable key, Iterable<FloatImage> values, Context context)
throws IOException, InterruptedException {
}
}
public int run(String[] args) throws Exception {
// Check input arguments
if (args.length != 2) {
System.out.println("Usage: helloWorld <input HIB> <output directory>");
System.exit(0);
}
// Initialize and configure MapReduce job
//Job job = Job.getInstance();
Job job = new Job(getConf(), "Employee Salary");
// Set input format class which parses the input HIB and spawns map tasks
job.setInputFormatClass(HibInputFormat.class);
// Set the driver, mapper, and reducer classes which express the computation
job.setJarByClass(HelloWorld.class);
job.setMapperClass(HelloWorldMapper.class);
job.setReducerClass(HelloWorldReducer.class);
// Set the types for the key/value pairs passed to/from map and reduce layers
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(FloatImage.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(Text.class);
// Set the input and output paths on the HDFS
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
// Execute the MapReduce job and block until it completes
boolean success = job.waitForCompletion(true);
// Return success or failure
return success ? 0 : 1;
}
public static void main(String[] args) throws Exception {
ToolRunner.run(new HelloWorld(), args);
System.exit(0);
}
}
Add the jar containing the HibInputFormat class to your classpath.
Or, if you compile from the command line, pass it with -classpath, for example:
javac -classpath /lib/jarContainingTheClass.jar /examples/HelloWorld.java
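The stack trace in the question is thrown while the job is starting, so the jar also has to be visible at run time, not only to javac (the HADOOP_CLASSPATH environment variable is one common way to do that). A small diagnostic along the following lines can confirm whether a class is visible to the JVM that runs it. This is only a sketch: ClasspathCheck and isOnClasspath are hypothetical names, not part of Hadoop or HIPI.

```java
// Hypothetical diagnostic: reports whether a class is visible on the current classpath.
class ClasspathCheck {

    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but not loadable (for example a missing dependency)
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the stack trace; pass another name to check it instead.
        String name = args.length > 0
                ? args[0]
                : "org.hipi.imagebundle.mapreduce.HibInputFormat";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
    }
}
```

Running the check with the same classpath as the job shows whether the HIPI jar is actually being picked up.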

Error opening job jar: file in hdfs

I have been trying to fix this, but I am not sure what mistake I am making here. Can you please help me with this? Thanks a lot in advance!
My program:
package hadoopbook;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {
//Mapper
public static class WcMapperDemo extends Mapper<LongWritable, Text, Text, IntWritable>{
Text MapKey = new Text();
IntWritable MapValue = new IntWritable();
public void map(LongWritable key, Text Value, Context Context) throws IOException, InterruptedException{
String Record = Value.toString();
String[] Words = Record.split(",");
for (String Word:Words){
MapKey.set(Word);
MapValue.set(1);
Context.write(MapKey, MapValue);
}
}
}
//Reducer
public static class WcReducerDemo extends Reducer<Text, IntWritable, Text, IntWritable>{
IntWritable RedValue = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> Values, Context Context) throws IOException, InterruptedException{
int sum = 0;
for (IntWritable Value:Values){
sum = sum + Value.get();
}
RedValue.set(sum);
Context.write(key, RedValue);
}
}
//Driver
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
Configuration Conf = new Configuration();
Job Job = new Job(Conf, "Word Count Job");
Job.setJarByClass(WordCount.class);
Job.setMapperClass(WcMapperDemo.class);
Job.setReducerClass(WcReducerDemo.class);
Job.setMapOutputKeyClass(Text.class);
Job.setMapOutputValueClass(IntWritable.class);
Job.setOutputKeyClass(Text.class);
Job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(Job, new Path (args[0]));
FileOutputFormat.setOutputPath(Job, new Path (args[1]));
System.exit(Job.waitForCompletion(true) ? 0:1);
}
}
Jar file is placed on hdfs in the below location:
/user/cloudera/Programs/WordCount.jar
Permissions are:
rw-rw-rw-
Input file is placed in below location:
/user/cloudera/Input/Words.txt
Permissions are:
rw-rw-rw-
Output folder is as below:
/user/cloudera/Output
When I am trying to run this:
[cloudera@localhost ~]$ hadoop jar /user/cloudera/Programs/WordCount.jar hadoopbook.WordCount /user/cloudera/Input/Words.txt /user/cloudera/Output
After this I get an error and I am stuck here!
Exception in thread "main" java.io.IOException: Error opening job jar: /user/cloudera/Programs/WordCount.jar
at org.apache.hadoop.util.RunJar.main(RunJar.java:135)
Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:127)
at java.util.jar.JarFile.<init>(JarFile.java:135)
at java.util.jar.JarFile.<init>(JarFile.java:72)
at org.apache.hadoop.util.RunJar.main(RunJar.java:133)
The jar needs to be on the local file system (not in HDFS), and you need to give the fully qualified name of the main class, including its package.
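The "error in opening zip file" in the trace comes from java.util.jar.JarFile, which only reads files on the local file system. The following sketch shows the same behaviour in isolation. LocalJarCheck and opensAsLocalJar are hypothetical names, and the temporary jar exists purely for the demonstration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Hypothetical sketch: JarFile can only open jars that exist on the local disk.
class LocalJarCheck {

    // True only if the path names a readable jar on the LOCAL file system.
    static boolean opensAsLocalJar(String path) {
        try (JarFile jar = new JarFile(path)) {
            return true;
        } catch (IOException e) {
            // Covers a missing file as well as a file that is not a valid zip archive
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny jar with a single empty entry in the local temp directory.
        Path tmp = Files.createTempFile("sketch", ".jar");
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(tmp))) {
            out.putNextEntry(new JarEntry("placeholder.txt"));
            out.closeEntry();
        }

        System.out.println("local temp jar: " + opensAsLocalJar(tmp.toString()));
        // An HDFS path is just a non-existent local path as far as JarFile is concerned.
        System.out.println("HDFS-style path: "
                + opensAsLocalJar("/user/cloudera/Programs/WordCount.jar"));

        Files.delete(tmp);
    }
}
```

The input and output paths, by contrast, are resolved by the job against the configured file system, so they can stay in HDFS.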

Hadoop Java Error : Exception in thread "main" java.lang.NoClassDefFoundError: WordCount (wrong name: org/myorg/WordCount)

I am new to Hadoop. I followed the Michael Noll tutorial to set up Hadoop on a single node. I tried running the WordCount program. This is the code I used:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {
public static class TokenizerMapper
extends Mapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
public static class IntSumReducer
extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values,
Context context
) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "WordCount");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
This is what I get when I try running it.
hduser@aswin-HP-Pavilion-15-Notebook-PC:/usr/local/hadoop$ bin/hadoop jar wc.jar WordCount /home/hduser/gutenberg /home/hduser/gutenberg-output/sample.txt
Exception in thread "main" java.lang.NoClassDefFoundError: WordCount (wrong name: org/myorg/WordCount)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:788)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:447)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.util.RunJar.main(RunJar.java:205)
Can anyone please help me.
My class path :
hduser@aswin-HP-Pavilion-15-Notebook-PC:/usr/local/hadoop$ hadoop classpath
/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/hadoop/share/hadoop/mapreduce/*:/usr/lib/jvm/java-7-openjdk-i386/lib/tools.jar:/usr/local/hadoop/contrib/capacity-scheduler/*.jar
Try this:
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
public class WordCount {
public static class Map extends MapReduceBase implements
Mapper<LongWritable, Text, Text, IntWritable> {
@Override
public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
throws IOException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
value.set(tokenizer.nextToken());
output.collect(value, new IntWritable(1));
}
}
}
public static class Reduce extends MapReduceBase implements
Reducer<Text, IntWritable, Text, IntWritable> {
@Override
public void reduce(Text key, Iterator<IntWritable> values,
OutputCollector<Text, IntWritable> output, Reporter reporter)
throws IOException {
int sum = 0;
while (values.hasNext()) {
sum += values.next().get();
}
output.collect(key, new IntWritable(sum));
}
}
public static void main(String[] args) throws Exception {
JobConf conf = new JobConf(WordCount.class);
conf.setJobName("wordcount");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(Map.class);
conf.setReducerClass(Reduce.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
JobClient.runJob(conf);
}
}
Then run the command:
bin/hadoop jar WordCount.jar WordCount /hdfs_Input_filename /output_filename
If your code is in a package, you have to put the package name in front of the class name:
bin/hadoop jar WordCount.jar PackageName.WordCount /hdfs_Input_filename /output_filename
This may sound crazy. I added package org.myorg; to my code and compiled it again. I placed the class files in an org/myorg folder and created the jar file from them. Then I ran it with jar wc.jar org.myorg.WordCount and it executed successfully. It would be nice if someone could explain to me why this worked :D . Anyway, thanks a lot for helping me, guys.
Try explicitly including the nested classes (i.e. TokenizerMapper and IntSumReducer) in your jar file. Here is how I did it:
jar cvf WordCount.jar WordCount.class WordCount\$TokenizerMapper.class WordCount\$IntSumReducer.class
There is something wrong with the packaging.
You should try this:
jar cf wc.jar WordCount*.class
Notice the '*' symbol: it also picks up the nested class files.
Your class is declared in a package, so your command should be
bin/hadoop jar wc.jar org.myorg.WordCount /home/hduser/gutenberg /home/hduser/gutenberg-output/sample.txt
I think you made a mistake here:
/usr/local/hadoop$ bin/hadoop jar wc.jar WordCount /home/hduser/gutenberg /home/hduser/gutenberg-output/sample.txt
Please change it to:
/usr/local/hadoop$ bin/hadoop jar wc.jar org.myorg.WordCount /home/hduser/gutenberg /home/hduser/gutenberg-output/sample.txt
That should work.
@Aswin Alagappan: The reason is that a jar file stores each class under its package path. The JVM could not find your class under the plain name WordCount because inside the jar it sits under org/myorg, so it has to be referred to as org.myorg.WordCount.
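The mapping between a class name and its location inside the jar can be sketched in a few lines. This is a simplified illustration, assuming the usual rule that a class loader looks for the name with dots replaced by slashes plus a .class suffix. JarEntryPathSketch and its methods are hypothetical names, and the entry lists are invented examples rather than the contents of any jar from this thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: where a class loader expects a class to sit inside a jar.
class JarEntryPathSketch {

    // org.myorg.WordCount -> org/myorg/WordCount.class
    static String entryPathFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    // True if the jar's entry list contains the class at the expected path.
    static boolean jarCanProvide(List<String> jarEntries, String className) {
        return jarEntries.contains(entryPathFor(className));
    }

    public static void main(String[] args) {
        // Entries start at the package root: the class can be found.
        List<String> packagedCorrectly = Arrays.asList(
                "org/myorg/WordCount.class",
                "org/myorg/WordCount$TokenizerMapper.class");
        // Entries carry an extra leading directory: the lookup misses.
        List<String> extraLeadingDirectory = Arrays.asList(
                "build/org/myorg/WordCount.class");

        System.out.println(entryPathFor("org.myorg.WordCount"));
        System.out.println(jarCanProvide(packagedCorrectly, "org.myorg.WordCount"));
        System.out.println(jarCanProvide(extraLeadingDirectory, "org.myorg.WordCount"));
    }
}
```

The "wrong name" variant of the error is the mirror image: a file is found at WordCount.class, but the class inside it declares itself as org/myorg/WordCount, so the JVM rejects it.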
Kishore's answer pointed me in the right direction.
To confirm it, here is what I did in an experiment with Java code for sparse matrix multiplication:
1) Source code (downloaded from https://github.com/marufaytekin/MatrixMultiply/tree/master/src/main/java/com/lendap/hadoop), and saved in /home/hduser/playground/src/matrixMult
2) Downloaded datasets (matrix M and N from https://github.com/marufaytekin/MatrixMultiply/tree/master/input), and then saved in HDFS, with the following path : /user/hduser/inMatrix
3) Compilation with hadoop classes, with creation of java Classes in playground/classes5 :
javac -classpath $HADOOP_HOME/share/hadoop/common/lib/activation-1.1.jar:$HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.1.jar:/usr/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/* -d playground/classes5 playground/src/matrixMult/*
4) Creation of jar file MatrixMultiply.jar with the following command :
jar -cvf playground/MatrixMultiply.jar -C playground/classes5/ .
5) Hadoop MapReduce command (run from the $HADOOP_HOME path, which in my case is /usr/hadoop/hadoop-2.7.1):
hadoop jar /home/hduser/playground/MatrixMultiply.jar com.lendap.hadoop.MatrixMultiply /user/hduser/inMatrix/ outputMatrix
6) Correct execution of mapreduce job on my 4 nodes cluster. Here, part of the final output :
0,375,890.0
0,376,1005.0
0,377,1377.0
0,378,604.0
0,379,924.0
0,38,476.0
0,380,621.0
0,381,730.0
990,225,542.0
990,226,639.0
990,227,466.0
990,228,406.0
990,229,343.0
990,23,397.0
990,230,794.0

ClassNotFoundException when running WordCount example in Eclipse

I'm trying to run the example code for the WordCount map/reduce job. I'm running it on Hadoop 1.2.1, from Eclipse. Here is the code I am trying to run:
package mypackage;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Reducer.Context;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class WordCount {
public static class Map extends
Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
context.write(word, one);
}
}
}
public static class Reduce extends
Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key, Iterable<IntWritable> values,
Context context) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
context.write(key, new IntWritable(sum));
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
conf.set("mapred.job.tracker", "maprfs://,y_address");
conf.set("fs.default.name", "hdfs://my_address");
Job job = new Job(conf, "wordcount");
job.setJarByClass(WordCount.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
}
}
Unfortunately, running this code ends with the following error:
13/11/04 13:27:53 INFO mapred.JobClient: Task Id : attempt_201310311611_0005_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: com.rf.hadoopspikes.WordCount$Map
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:857)
at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:199)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:718)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
I understand that the WordCount class cannot be found, but I have no idea how to make this work.
Any ideas?
When running this directly from Eclipse, you need to make sure the classes have been bundled into a jar file (which Hadoop then copies up to HDFS). Your error most probably means that your jar hasn't been built, or that at runtime the classes are being run from the output directory and not from the bundled jar.
Try exporting the classes into a jar file, and then run your WordCount class from that jar. You could also look into the Eclipse Hadoop plugin, which I think handles all this for you. A final option is to bundle the jar and then launch it from the command line (as outlined in the various Hadoop tutorials).
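One way to check which of the two cases applies is to print where a class was actually loaded from. The sketch below uses only standard JDK calls. CodeSourceSketch, locationOf and looksLikeJar are hypothetical helper names, and in a real project you would pass your own driver class (for example WordCount.class) instead of the sketch class.

```java
import java.security.CodeSource;

// Hypothetical sketch: shows whether a class comes from a jar or from a class directory.
class CodeSourceSketch {

    // Returns the URL the class was loaded from, or a note when none is recorded.
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(no code source, e.g. a JDK class)";
        }
        return source.getLocation().toString();
    }

    // Crude check: class directories such as Eclipse's bin/ do not end in ".jar".
    static boolean looksLikeJar(String location) {
        return location.endsWith(".jar");
    }

    public static void main(String[] args) {
        String location = locationOf(CodeSourceSketch.class);
        System.out.println("loaded from: " + location);
        System.out.println("is a jar: " + looksLikeJar(location));
    }
}
```

If the location is a directory rather than a jar, there is no job jar for job.setJarByClass(...) to pick up, which would fit the task-side ClassNotFoundException described above.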
