Getting error while executing WordCount program in Hadoop - Java

I'm trying to execute the WordCount program for Hadoop in Eclipse and I'm getting the following error:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
at WordCount.run(WordCount.java:22)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at WordCount.main(WordCount.java:35)
I copied the code from the internet and it seems fine; however, for reference I'm pasting it here:
WordCount.java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
public class WordCount extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        // Create a JobConf object and assign a job name for identification purposes
        JobConf conf = new JobConf(getConf(), WordCount.class);
        conf.setJobName("WordCount");

        // Set the data types of the output key and value
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        // Provide the mapper and reducer class names
        conf.setMapperClass(WordCountMapper.class);
        conf.setReducerClass(WordCountReducer.class);

        // We will pass 2 arguments at runtime: the input path and the output path
        Path inp = new Path(args[0]);
        Path out = new Path(args[1]);

        // The HDFS input and output directories are fetched from the command line
        FileInputFormat.addInputPath(conf, inp);
        FileOutputFormat.setOutputPath(conf, out);

        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // main simply delegates to the run method defined above via ToolRunner
        int res = ToolRunner.run(new Configuration(), new WordCount(), args);
        System.exit(res);
    }
}

It seems you are not passing the proper command-line arguments.
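One way to make the failure obvious (a minimal sketch; the usage message is illustrative) is to validate the arguments at the top of run(). When launching from Eclipse, the two paths must be supplied under Run → Run Configurations → Arguments; otherwise args is empty and args[0] throws exactly the exception in the stack trace above.

public int run(String[] args) throws Exception {
    // With no Program Arguments configured, args.length is 0 and
    // args[0] throws ArrayIndexOutOfBoundsException: 0
    if (args.length != 2) {
        System.err.println("Usage: WordCount <input path> <output path>");
        return -1;
    }
    // ... job setup as shown above ...
    return 0;
}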

Solving java.lang.ArrayIndexOutOfBoundsException

Can anyone help me solve this error, please? Here is my code:
package bigdata.tp1;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.log4j.BasicConfigurator;

public class WordCount {
    public static void main(String[] args) throws Exception {
        BasicConfigurator.configure();
        Configuration conf = new Configuration();
        Path inputPath = new Path(args[0]);
        Path outputPath = new Path(args[1]); // line 16
        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(MyMapper.class);
        job.setCombinerClass(MyReducer.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, inputPath);
        FileOutputFormat.setOutputPath(job, outputPath);
        if (!job.waitForCompletion(true))
            return;
    }
}
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
    at bigdata.tp1.WordCount.main(WordCount.java:16)
Looks like you're trying to access your second command-line parameter while only supplying one.
Check the parameter count before accessing them.
I forgot to add the second argument to the program's arguments, which is src/main/resources/output.
So under Run → Edit Configurations... → + → Application, I have to add src/main/resources/output under Program Arguments.
If you expect 2 program arguments, then you need to check them first.

public static void main(String[] args) throws Exception {
    if (args.length != 2) {
        throw new IllegalArgumentException("Exactly 2 arguments needed");
    }
    // your code here
}
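For reference, a run with both arguments supplied would look something like this (the jar name and input path are illustrative; only the output path comes from the question):

hadoop jar wordcount.jar bigdata.tp1.WordCount src/main/resources/input src/main/resources/output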

Hadoop Java Class cannot be found

Exception in thread "main" java.lang.ClassNotFoundException: WordCount → so many answers relate to this issue, and it seems I am definitely missing a small point again, one that has taken me hours to figure out.
I will try to be as clear as possible about the paths, the code itself, and the other possible solutions I tried that did not work.
I am fairly sure I configured Hadoop correctly, as everything was working up until the last stage.
But I am still posting the details:
Environment variables and paths
#HADOOP VARIABLES START
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export HADOOP_INSTALL=/usr/local/hadoop
export HADOOP_CLASSPATH=/usr/lib/jvm/java-8-oracle/lib/tools.jar
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
#export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
#export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$JAVA_HOME/bin
#HADOOP VARIABLES END
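After sourcing this file, a quick sanity check (assuming the paths above are correct) is to ask Hadoop for the classpath it actually resolves:

source ~/.bashrc
hadoop classpath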
The Class itself:
package com.cloud.hw03;

/**
 * Hello world!
 *
 */
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "wordcount");
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setJarByClass(WordCount.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
What I did for compiling and running:
Created the jar file in the same folder as my WordCount Maven project (eclipse-workspace):
$ hadoop com.sun.tools.javac.Main WordCount.java
$ jar cf WordCount.jar WordCount*.class
Running the program (I already created a directory and copied the input files into HDFS):
hadoop jar WordCount.jar WordCount /input/inputfile01 /input/outputfile01
The result is: Exception in thread "main" java.lang.ClassNotFoundException: WordCount
Since I am in the same directory as WordCount.class and I created my jar file in that same directory, I am not specifying the full path to WordCount; I am running the 2nd command above in this directory.
I already added job.setJarByClass(WordCount.class); to the code, so that was no help. I would appreciate your spending time on answering!
I am sure I am doing something unexpected again and cannot figure it out; I've been at it for 4 hours.
The WordCount example code on the Hadoop site does not use a package.
Since you do have one, you would run the fully qualified class, the exact same way as a regular Java application:
hadoop jar WordCount.jar com.cloud.hw03.WordCount
Also, if you actually had a Maven project, then hadoop com.sun.tools.javac.Main is not correct. You would use Maven to compile and create the JAR with all the classes, not only the WordCount* files.
For example, from the folder with the pom.xml:
mvn package
Otherwise, you need to be in the parent directory:
hadoop com.sun.tools.javac.Main ./com/cloud/hw03/WordCount.java
And run the jar cf command from that directory as well.
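For completeness, the Maven route would look roughly like this (the artifact name is an assumption, since Maven derives it from the pom.xml; the jar lands under target/):

cd ~/eclipse-workspace/WordCount    # the folder containing pom.xml
mvn package
hadoop jar target/WordCount-0.0.1-SNAPSHOT.jar com.cloud.hw03.WordCount /input/inputfile01 /input/outputfile01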

Hadoop MapReduce word count program

I am new to Hadoop. Below is my code. I am getting the following error message when I run the jar.
Input file (wordcount.txt) => this file is stored at "/home/cloudera/SK_JAR/jsonFile/wordcount.txt":
Hello Hadoop, Goodbye Hadoop.
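For reference, since the mapper below tokenizes on whitespace only, punctuation stays attached to the tokens, so the expected output for this input would be:

Goodbye	1
Hadoop,	1
Hadoop.	1
Hello	1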
package com.main;

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws IllegalArgumentException, IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "wordcount");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
The following is the error message. Can someone please help me with this?
Let me know if you need more details.
hadoop jar Wordcount.jar WordCount '/home/cloudera/SK_JAR/jsonFile/wordcount.txt' output
Error in laoding args file.java.io.FileNotFoundException: WordCount (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at java.io.FileInputStream.<init>(FileInputStream.java:101)
at com.main.mainClass.main(mainClass.java:28)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
hadoop jar Wordcount.jar WordCount
Your main class is part of the package com.main, therefore com.main.WordCount is needed to start your class.
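In other words, with the arguments from the question kept as-is:

hadoop jar Wordcount.jar com.main.WordCount '/home/cloudera/SK_JAR/jsonFile/wordcount.txt' output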
You can open your JAR file as a ZIP file to verify whether com/main/WordCount$Map.class is present in it. If it is not there, then Eclipse is building your JAR wrong.
I suggest you learn Maven, Gradle, or SBT to build the JAR rather than using your IDE. In production, these are the tools commonly used to bundle Hadoop JAR files.
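A quick way to inspect the jar without unzipping it (standard JDK tooling; the jar name is taken from the question):

jar tf Wordcount.jar

and check that com/main/WordCount.class and com/main/WordCount$Map.class show up in the listing.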
It seems the main class was not specified in your runnable jar file.
If you are using Eclipse to create the jar file, then follow the steps below to create Wordcount.jar:
Right click on the project → Export → JAR file → click Next → uncheck all other resources, then provide the path for exporting the .jar file → click Next → keep the options selected → click Next and Finish.
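Alternatively, a sketch for building the jar from the command line with the entry point baked into the manifest (it assumes your compiled classes live under bin/):

jar cfe Wordcount.jar com.main.WordCount -C bin .

With a Main-Class recorded this way, hadoop jar Wordcount.jar <input> <output> runs without naming the class at all.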

Compilation of Hadoop Java program with additional dependencies

I'm attempting to build a Hadoop program whose purpose is to cat files that I've previously uploaded to HDFS. Based largely on this tutorial, the program looks like this:
import java.io.*;
import java.net.URI;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;

public class ReadHDFS {
    public static void main(String[] args) throws IOException {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
It seems to me that the tutorial is flawed because, as I understand it, IOUtils is part of the Apache Commons library. However, although I added the following line to the program I've been trying to deploy:
import org.apache.commons.compress.utils.IOUtils;
I'm still met with the following errors:
FileSystemCat.java:37: error: cannot find symbol
IOUtils.copyBytes(in, System.out, 4096, false);
^
symbol: method copyBytes(InputStream,PrintStream,int,boolean)
location: class IOUtils
FileSystemCat.java:40: error: cannot find symbol
IOUtils.closeStream(in);
^
symbol: variable in
location: class FileSystemCat
2 errors
I'm executing it on the NameNode with this command:
javac -cp /usr/local/hadoop/share/hadoop/common/hadoop-common-2.8.1.jar:/home/ubuntu/job_program/commons-io-2.5/commons-io-2.5.jar FileSystemCat.java
The necessary addition to ~/.bashrc:
# Classpath for Java
# export HADOOP_CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath)
export HADOOP_CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath)
How to compile the program at the bottom:
javac -cp ${HADOOP_CLASSPATH}:commons-io-2.5.jar ReaderHDFS.java
How to generate the jar file for that program:
jar cf rhdfs.jar ReaderHDFS*.class
Run the command:
$HADOOP_HOME/bin/hadoop jar rhdfs.jar ReaderHDFS hdfs://master:9000/input_1/codes.txt
This is the program; note that it imports org.apache.hadoop.io.IOUtils (Hadoop's own IOUtils, which provides copyBytes and closeStream), not the Commons one:
import org.apache.hadoop.io.IOUtils;
//import org.apache.commons.io.IOUtils;
import java.io.*;
import java.net.URI;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;

public class ReaderHDFS {
    public static void main(String[] args) throws IOException {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}

Getting NullPointerException while running a normal Mapreduce program in Eclipse

I am getting a NullPointerException when trying to execute a simple MapReduce program. I am unable to understand where the problem is.
package MapReduce.HadMapReduce;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RecCount extends Configured implements Tool {
    public int run(String[] arg0) throws Exception {
        Job job = Job.getInstance(getConf());
        FileInputFormat.setInputPaths(job, new Path("C:\\singledeck.txt"));
        FileOutputFormat.setOutputPath(job, new Path("C:\\temp123"));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String args[]) throws Exception {
        System.exit(ToolRunner.run(new RecCount(), args));
    }
}
Error is:
Exception in thread "main" java.lang.NullPointerException
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:483)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:815)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:798)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:731)
at org.apache.hadoop.fs.RawLocalFileSystem.mkOneDirWithMode(RawLocalFileSystem.java:489)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:530)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:507)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:305)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:133)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:144)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at MapReduce.HadMapReduce.RecCount.run(RecCount.java:22)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at MapReduce.HadMapReduce.RecCount.main(RecCount.java:26)
This is the logic happening behind the scenes:
ToolRunner calls the run method below, which in turn calls its other run method (pasted right below it), where the configuration is set if it is null.
public static int run(Tool tool, String[] args) throws Exception {
    return run(tool.getConf(), tool, args);
}

public static int run(Configuration conf, Tool tool, String[] args) throws Exception {
    if (conf == null) {
        conf = new Configuration();
    }
    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    // set the configuration back, so that Tool can configure itself
    tool.setConf(conf);
    // get the args w/o generic hadoop args
    String[] toolArgs = parser.getRemainingArgs();
    return tool.run(toolArgs);
}
In the last statement above, the run method that I implemented is called, since I implement the Tool interface. I don't see any error in my code; please let me know if you can find any!
Can someone please explain what the problem with my code is?
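For comparison, a variant grounded in the first WordCount above: passing an explicit Configuration skips the (conf == null) branch entirely, so either form ends up with a non-null conf, and the NullPointerException in the trace originates deeper, in Shell.runCommand, not in the ToolRunner logic traced here.

public static void main(String[] args) throws Exception {
    // Equivalent ToolRunner flow, but with an explicit Configuration;
    // requires: import org.apache.hadoop.conf.Configuration;
    System.exit(ToolRunner.run(new Configuration(), new RecCount(), args));
}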
