Hadoop run command java.lang.ClassNotFoundException - java

I have successfully installed Hadoop 3.0.0 standalone to run on Ubuntu 16.04.
I created a jar using the following code from the Apache Hadoop tutorial.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WDCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
                        ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
                           ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WDCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Creating WDCount.jar was successful with no errors.
Then I created Input and Output folders and made a text file with a phrase in it, saved as fileo1.txt in the Input folder.
I used this command to run Hadoop on WDCount.jar:
/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/Wordcount/WDCount.jar /usr/local/hadoop/share/hadoop/mapreduce/Wordcount/Input /usr/local/hadoop/share/hadoop/mapreduce/Wordcount/Output
When I run the command I get this message:
Exception in thread "main" java.lang.ClassNotFoundException: /usr/local/hadoop/share/hadoop/mapreduce/Wordcount/Input
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:232)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
Can anyone tell me what is wrong?

Include the name of the class containing the main method after the jar name:
/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/Wordcount/WDCount.jar WDCount /usr/local/hadoop/share/hadoop/mapreduce/Wordcount/Input /usr/local/hadoop/share/hadoop/mapreduce/Wordcount/Output
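As an aside (my addition, not part of the original answer): hadoop jar falls back to the class-name argument only when the jar has no Main-Class entry in its manifest. If you build the jar with an entry point, the class name can be omitted. Assuming the compiled WDCount classes sit in the current directory, the JDK jar tool can do this:

    jar cfe WDCount.jar WDCount WDCount*.class

After that, the original command without the WDCount argument would also have worked.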

Related

Hadoop MapReduce Jobs for Highest Frequency

I'm trying to use the basic word count as defined here. Is it possible, when IntSumReducer calls context.write, for that output to be passed to a second reducer or output class that reduces the final list produced by IntSumReducer down to the single largest frequency?
I am quite new to Hadoop/MapReduce and the concept of jobs in Java, so I'm uncertain how exactly I would need to modify the default WordCount to make that possible. Could I write a second Reducer function and place it inside the same job? How would I do that? How would I signal that there is another reducer to run after IntSumReducer?
Base WordCount:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
                        ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
                           ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
What you're looking for is called a Combiner in Hadoop, which does some semi-reduction of the data before emitting the output to a final reducer class. For more info on it, see the Hadoop MapReduce documentation.
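As a hedged addition (not part of the original answer): a combiner still runs once per key, so to collapse everything to the single largest frequency you can also chain a second job after the word count, whose mapper funnels every (word, count) line under one constant key and whose reducer keeps the maximum. A minimal sketch, reusing the imports of the question's file; the class names MaxMapper and MaxReducer are hypothetical:

    public static class MaxMapper extends Mapper<Object, Text, Text, Text> {
        private static final Text ALL = new Text("max"); // one key sends every pair to one reducer

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // each line of the first job's output is "word<TAB>count"
            context.write(ALL, value);
        }
    }

    public static class MaxReducer extends Reducer<Text, Text, Text, IntWritable> {
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String bestWord = null;
            int bestCount = -1;
            for (Text val : values) {
                String[] parts = val.toString().split("\t"); // TextOutputFormat's default separator
                int count = Integer.parseInt(parts[1]);
                if (count > bestCount) {
                    bestCount = count;
                    bestWord = parts[0];
                }
            }
            context.write(new Text(bestWord), new IntWritable(bestCount));
        }
    }

In main, the two jobs would run back to back, with the second job reading the first job's output directory:

    if (!job.waitForCompletion(true)) System.exit(1);
    Job job2 = Job.getInstance(conf, "max frequency");
    job2.setJarByClass(WordCount.class);
    job2.setMapperClass(MaxMapper.class);
    job2.setReducerClass(MaxReducer.class);
    job2.setMapOutputKeyClass(Text.class);   // mapper emits (Text, Text),
    job2.setMapOutputValueClass(Text.class); // unlike the final (Text, IntWritable)
    job2.setOutputKeyClass(Text.class);
    job2.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job2, new Path(args[1]));
    FileOutputFormat.setOutputPath(job2, new Path(args[1] + "_max"));
    System.exit(job2.waitForCompletion(true) ? 0 : 1);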

java.lang.VerifyError with Hadoop

I'm working on a Java project using Hadoop and I have a java.lang.VerifyError that I don't know how to resolve. I've seen people with the same type of question, but either they got no answer or the solutions don't work in my case.
My class:
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class GetStats {

    public static List<Statistique> stats; // class with one String and one int

    public static class TokenizerMapper extends
            Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            if (key.toString().contains("HEAD")
                    || key.toString().contains("POST")
                    || key.toString().contains("GET")
                    || key.toString().contains("OPTIONS")
                    || key.toString().contains("CONNECT"))
                GetStats.stats.add(new Statistique(key.toString().replace("\"", ""), sum));
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Start wc");
        stats = new ArrayList<>();
        // File file = new File("err.txt");
        // FileOutputStream fos = new FileOutputStream(file);
        // PrintStream ps = new PrintStream(fos);
        // System.setErr(ps);
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(GetStats.class);
        job.setMapperClass(TokenizerMapper.class);
        // job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("input"));
        job.setOutputFormatClass(NullOutputFormat.class);
        job.waitForCompletion(true);
        System.out.println(stats);
        System.out.println("End");
    }
}
And the error:
Exception in thread "main" java.lang.VerifyError: Bad type on operand stack
Exception Details:
  Location:
    org/apache/hadoop/mapred/JobTrackerInstrumentation.create(Lorg/apache/hadoop/mapred/JobTracker;Lorg/apache/hadoop/mapred/JobConf;)Lorg/apache/hadoop/mapred/JobTrackerInstrumentation; @5: invokestatic
  Reason:
    Type 'org/apache/hadoop/metrics2/lib/DefaultMetricsSystem' (current frame, stack[2]) is not assignable to 'org/apache/hadoop/metrics2/MetricsSystem'
  Current Frame:
    bci: @5
    flags: { }
    locals: { 'org/apache/hadoop/mapred/JobTracker', 'org/apache/hadoop/mapred/JobConf' }
    stack: { 'org/apache/hadoop/mapred/JobTracker', 'org/apache/hadoop/mapred/JobConf', 'org/apache/hadoop/metrics2/lib/DefaultMetricsSystem' }
  Bytecode:
    0000000: 2a2b b200 03b8 0004 b0
	at org.apache.hadoop.mapred.LocalJobRunner.<init>(LocalJobRunner.java:573)
	at org.apache.hadoop.mapred.JobClient.init(JobClient.java:494)
	at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:479)
	at org.apache.hadoop.mapreduce.Job$1.run(Job.java:563)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapreduce.Job.connect(Job.java:561)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:549)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
	at hadoop.GetStats.main(GetStats.java:79)
Do you have any idea? If you need anything more to help me, just ask.
I solved my problem.
The imported jar was fine, but another (probably older) version that I had tried earlier was also in the project folder. When I ran the class, that older jar was apparently being used, since it came before the one I wanted on the classpath. I deleted the older jar from the project folder and it worked.
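A quick way to spot that situation (my suggestion, not part of the original answer) is to list every Hadoop jar visible to the project and look for duplicate versions:

    find . -name "hadoop-*.jar"

If the same artifact shows up twice with different version numbers, remove the one you don't want before rebuilding.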

Error opening job jar: file in hdfs

I have been trying to fix this one, but I'm not sure what mistake I'm making here. Can you please help me with this? Thanks a lot in advance!
My program:
package hadoopbook;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    //Mapper
    public static class WcMapperDemo extends Mapper<LongWritable, Text, Text, IntWritable> {

        Text MapKey = new Text();
        IntWritable MapValue = new IntWritable();

        public void map(LongWritable key, Text Value, Context Context) throws IOException, InterruptedException {
            String Record = Value.toString();
            String[] Words = Record.split(",");
            for (String Word : Words) {
                MapKey.set(Word);
                MapValue.set(1);
                Context.write(MapKey, MapValue);
            }
        }
    }

    //Reducer
    public static class WcReducerDemo extends Reducer<Text, IntWritable, Text, IntWritable> {

        IntWritable RedValue = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> Values, Context Context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable Value : Values) {
                sum = sum + Value.get();
            }
            RedValue.set(sum);
            Context.write(key, RedValue);
        }
    }

    //Driver
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration Conf = new Configuration();
        Job Job = new Job(Conf, "Word Count Job");
        Job.setJarByClass(WordCount.class);
        Job.setMapperClass(WcMapperDemo.class);
        Job.setReducerClass(WcReducerDemo.class);
        Job.setMapOutputKeyClass(Text.class);
        Job.setMapOutputValueClass(IntWritable.class);
        Job.setOutputKeyClass(Text.class);
        Job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(Job, new Path(args[0]));
        FileOutputFormat.setOutputPath(Job, new Path(args[1]));
        System.exit(Job.waitForCompletion(true) ? 0 : 1);
    }
}
The jar file is placed on HDFS at the location below:
/user/cloudera/Programs/WordCount.jar
Permissions are:
rw-rw-rw-
The input file is placed at the location below:
/user/cloudera/Input/Words.txt
Permissions are:
rw-rw-rw-
The output folder is as below:
/user/cloudera/Output
When I try to run this:
[cloudera@localhost ~]$ hadoop jar /user/cloudera/Programs/WordCount.jar hadoopbook.WordCount /user/cloudera/Input/Words.txt /user/cloudera/Output
I get an error and I am stuck here:
Exception in thread "main" java.io.IOException: Error opening job jar: /user/cloudera/Programs/WordCount.jar
	at org.apache.hadoop.util.RunJar.main(RunJar.java:135)
Caused by: java.util.zip.ZipException: error in opening zip file
	at java.util.zip.ZipFile.open(Native Method)
	at java.util.zip.ZipFile.<init>(ZipFile.java:127)
	at java.util.jar.JarFile.<init>(JarFile.java:135)
	at java.util.jar.JarFile.<init>(JarFile.java:72)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:133)
The jar needs to be present on the local file system (it should not be in HDFS), and you need to use the fully qualified package name for the main class.
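A minimal sketch of the fix, reusing the paths from the question (the local destination /home/cloudera is my assumption):

    hadoop fs -get /user/cloudera/Programs/WordCount.jar /home/cloudera/WordCount.jar
    hadoop jar /home/cloudera/WordCount.jar hadoopbook.WordCount /user/cloudera/Input/Words.txt /user/cloudera/Output

Note that the command already used the fully qualified class name hadoopbook.WordCount, so only the jar location needs to change.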

Hadoop : Reducer class not called even with Overrides

I was trying a MapReduce wordcount program in Hadoop, but the reducer class is never called and the program terminates after running the mapper class.
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "wordcount");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
I have even overridden the methods as required.
IDE : Eclipse Luna
Hadoop : version 2.5
A Job object forms the specification of the job and gives you control over how the job is run. When we run a job on a Hadoop cluster, we package the code into a JAR file (which Hadoop distributes around the cluster).
Rather than explicitly specifying the name of the JAR file, we can pass a class to the Job's setJarByClass() method, which Hadoop uses to locate the relevant JAR file by looking for the JAR containing this class.
I do not see this statement in the main method. Include it, then recompile and run the code:
job.setJarByClass(WordCount.class);
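For context, a sketch of where the call sits in the driver from the question (only the placement is my addition):

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCount.class); // lets Hadoop locate and distribute the jar containing this class
        // ... rest of the driver unchanged
    }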

ClassNotFoundException when running WordCount example in Eclipse

I'm trying to run the example code for the WordCount map/reduce job. I'm running it on Hadoop 1.2.1, from Eclipse. Here is the code I try to run:
package mypackage;

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Reducer.Context;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static class Map extends
            Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("mapred.job.tracker", "maprfs://my_address");
        conf.set("fs.default.name", "hdfs://my_address");
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
Unfortunately, running this code ends with the following error:
13/11/04 13:27:53 INFO mapred.JobClient: Task Id : attempt_201310311611_0005_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: com.rf.hadoopspikes.WordCount$Map
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:857)
	at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:199)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:718)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
I understand that the WordCount$Map class cannot be found, but I have no idea how to make this work.
Any ideas?
When running this directly from Eclipse, you need to make sure the classes have been bundled into a jar file (which Hadoop then copies up to HDFS). Your error most probably relates to the fact that your jar hasn't been built, or that at runtime the classes are being run from the output directory and not the bundled jar.
Try exporting the classes into a jar file, and then run your WordCount class from that jar. You could also look into using the Eclipse Hadoop plugin, which I think handles all this for you. A final option would be to bundle the jar and then launch from the command line (as outlined in the various Hadoop tutorials).
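One more hedged option, not from the original answer: on Hadoop 1.x you can also point the job submission at a pre-built jar explicitly, so the job still launches from Eclipse but ships the jar you exported. The path below is hypothetical:

    conf.set("mapred.jar", "/path/to/WordCount.jar"); // jar previously exported from Eclipse

Set this on the Configuration before constructing the Job, in place of (or alongside) setJarByClass().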
