I have seen threads with resolutions to this error, but somehow the solutions don't help me.
I am running the code below:
public class WordCount
{
    public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable>
    {
        final static IntWritable one = new IntWritable(1);
        Text word = new Text();

        public void map(LongWritable key, Text Value, Context context) throws IOException, InterruptedException
        {
            String line = Value.toString();
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens())
            {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        public void reduce(Text Key, Iterable<IntWritable> Values, Context context) throws IOException, InterruptedException
        {
            int sum = 0;
            for (IntWritable value : Values)
            {
                sum = sum + value.get();
            }
            context.write(Key, new IntWritable(sum));
        }

        public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException
        {
            //Job job=new Job();
            Configuration conf = new Configuration();
            Job job = new Job(conf, "Word Count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(MyMapper.class);
            job.setReducerClass(MyReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
}
It has the main method in the proper format.
It still gives me the error below when I try to run it:
**hadoop jar MyTesting-0.0.1-SNAPSHOT.jar MyPackage.WordCount <Input> <Output>**
Exception in thread "main" java.lang.NoSuchMethodException: MyPackage.WordCount.main([Ljava.lang.String;)
at java.lang.Class.getMethod(Class.java:1670)
at org.apache.hadoop.util.RunJar.run(RunJar.java:215)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
I have made a test with a Hadoop Mapper and Reducer with the code below:
App.java
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
System.out.println("Words Count Start!");
System.setProperty("hadoop.home.dir", "C:/Users/xx/Meekou/Meekou.hadoop/hadoop-3.2.2.tar/hadoop-3.2.2");
URL url = App.class.getClassLoader().getResource("wordcount.txt");
Path inputPath = new Path(URLDecoder.decode(url.getFile(),"UTF-8") );
Configuration config = new Configuration();
Job job = Job.getInstance(config, "WordsCount");
job.setJarByClass(App.class);
job.setMapperClass(WordsCountMapper.class);
job.setReducerClass(WordsCountReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, inputPath);
FileOutputFormat.setOutputPath(job, new Path("./result.txt"));
job.waitForCompletion(true);
System.out.println("Words Count Complete!");
}
WordsCountMapper.java
public class WordsCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private static final Log LOG = LogFactory.getLog(WordsCountMapper.class);
@Override
protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
throws IOException, InterruptedException {
String line = value.toString();
LOG.info(line);
String[] words = line.split(",");
for (String word : words) {
LOG.info(word);
context.write(new Text(word), one);
}
// StringTokenizer tokenizer = new StringTokenizer(line);
// while (tokenizer.hasMoreTokens()) {
// word.set(tokenizer.nextToken());
// context.write(word, one);
// }
}
}
WordsCountReducer.java
public class WordsCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
@Override
protected void reduce(Text key, Iterable<IntWritable> words,
Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
int frequencyForWord = 0;
for(IntWritable word: words){
frequencyForWord += word.get();
}
System.out.println(key);
context.write(key, new IntWritable(frequencyForWord));
}
}
And the content of the test txt file is:
tom,jack,mary,
rose,anly,billo,anly,
billo,mary,zoor,
zoor,poly,
It generated an empty result folder instead of the expected result.
I am trying to practice Big Data MapReduce by making a movie recommendation system. My code:
(imports omitted)
public class MRS {
public static class Map extends Mapper<LongWritable, Text, Text, Text> {
public void map(LongWritable key, Text value, Context con)
throws IOException, InterruptedException {
String line = value.toString();
StringTokenizer token = new StringTokenizer(line);
while(token.hasMoreTokens()){
String userId = token.nextToken();
String movieId = token.nextToken();
String ratings =token.nextToken();
token.nextToken();
con.write(new Text(userId), new Text(movieId + "," + ratings));
}
}
}
public static class Reduce extends
Reducer<Text, IntWritable, Text, Text> {
public void reduce(Text key, Iterable<Text> value,Context con ) throws IOException, InterruptedException{
int item_count=0;
int item_sum =0;
String result="[";
for(Text t : value){
String s = t.toString();
StringTokenizer token = new StringTokenizer(s,",");
while(token.hasMoreTokens()){
token.nextToken();
item_sum=item_sum+Integer.parseInt(token.nextToken());
item_count++;
}
result=result+"("+s+"),";
}
result=result.substring(0, result.length()-1);
result=result+"]";
result=String.valueOf(item_count)+","+String.valueOf(item_sum)+","+result;
con.write(key, new Text(result));
}
}
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
Configuration con = new Configuration();
Job job = new Job(con,"Movie Recommendation");
job.setJarByClass(MRS.class);
job.setMapperClass(Map.class);
job.setCombinerClass(Reduce.class);
job.setReducerClass(Reduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
I am using the MovieLens dataset from here.
The input file is u.data,
and my output after running this code should look like:
userId Item_count,Item_sum,[list of movie_Id with rating]
However, I am getting this:
99 173,4
99 288,4
99 66,3
99 203,4
99 105,2
99 12,5
99 1,4
99 741,3
99 895,3
99 619,4
99 742,5
99 294,4
99 196,4
99 328,4
99 120,2
99 246,3
99 232,4
99 181,5
99 201,3
99 978,3
99 123,3
99 433,4
99 345,3
This appears to be the output of the Map class.
I made a few adjustments to the code and it is now giving me the exact expected result.
Here is my new code:
(imports omitted)
public class MRS {
public static class Map extends
Mapper<LongWritable, Text, IntWritable, Text> {
public void map(LongWritable key, Text value, Context con)
throws IOException, InterruptedException {
String line = value.toString();
String[] s = line.split("\t");
StringTokenizer token = new StringTokenizer(line);
while (token.hasMoreTokens()) {
IntWritable userId = new IntWritable(Integer.parseInt(token
.nextToken()));
String movieId = token.nextToken();
String ratings = token.nextToken();
token.nextToken();
con.write(userId, new Text(movieId + "," + ratings));
}
}
}
public static class Reduce extends
Reducer<IntWritable, Text, IntWritable, Text> {
public void reduce(IntWritable key, Iterable<Text> value, Context con)
throws IOException, InterruptedException {
int item_count = 0;
int item_sum = 0;
String result = "";
for (Text t : value) {
String s = t.toString();
StringTokenizer token = new StringTokenizer(s, ",");
result = result + "[" + s + "],";
}
result = result.substring(1, result.length() - 2);
System.out.println(result);
con.write(key, new Text(result));
}
}
public static void main(String[] args) throws IOException,
ClassNotFoundException, InterruptedException {
Configuration con = new Configuration();
Job job = new Job(con, "Movie Recommendation");
job.setJarByClass(MRS.class);
job.setMapperClass(Map.class);
job.setCombinerClass(Reduce.class);
job.setReducerClass(Reduce.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(Text.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
What I changed is:
Driver code
job.setOutputKeyClass(IntWritable.class);
Mapper code
Mapper<LongWritable, Text, IntWritable, Text>
Reducer code
public static class Reduce extends
Reducer<Text, IntWritable, Text, Text> {
public void reduce(Text key, Iterable<Text> value,Context con ) throws
IOException, InterruptedException{
I think the problem was that the output key and output value types were matching only the mapper's output; that's why it was printing the mapper output and not even executing the reducer.
Correct me if I am wrong.
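For reference, a general point behind that reasoning: Hadoop assumes the mapper's output key/value classes are the same as the classes passed to setOutputKeyClass/setOutputValueClass unless the driver says otherwise. A minimal driver sketch illustrating this (the hypothetical mapper here emits Text/IntWritable while the reducer emits Text/Text; these types are just for illustration, not taken from the code above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class TypeDeclarationSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "type declaration sketch");

        // The mapper in this hypothetical job emits (Text, IntWritable) ...
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // ... while the reducer (and therefore the job) emits (Text, Text).
        // Without the two calls above, Hadoop would expect the mapper to emit
        // (Text, Text) as well and report a type mismatch at runtime.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
    }
}

One related caveat: when a reducer class is reused as a combiner (setCombinerClass), the combiner runs on the map side, so its output types must match the map output types, not the final job output types.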
I create two jobs and I want to chain them, so that one job is executed right after the previous job completes. So I wrote the following code. But as far as I have observed, job1 finishes correctly, while job2 never seems to execute.
public class Simpletask extends Configured implements Tool {
public static enum FileCounters {
COUNT;
}
public static class TokenizerMapper extends Mapper<Object, Text, IntWritable, Text>{
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
String line = itr.nextToken();
String part[] = line.split(",");
int id = Integer.valueOf(part[0]);
int x1 = Integer.valueOf(part[1]);
int y1 = Integer.valueOf(part[2]);
int z1 = Integer.valueOf(part[3]);
int x2 = Integer.valueOf(part[4]);
int y2 = Integer.valueOf(part[5]);
int z2 = Integer.valueOf(part[6]);
int h_v = Hilbert(x1,y1,z1);
int parti = h_v/10;
IntWritable partition = new IntWritable(parti);
Text neuron = new Text();
neuron.set(line);
context.write(partition,neuron);
}
}
public int Hilbert(int x,int y,int z){
return (int) (Math.random()*20);
}
}
public static class IntSumReducer extends Reducer<IntWritable,Text,IntWritable,Text> {
private Text result = new Text();
private MultipleOutputs<IntWritable, Text> mos;
public void setup(Context context) {
mos = new MultipleOutputs<IntWritable, Text>(context);
}
<K, V> String generateFileName(K k) {
return "p"+k.toString();
}
public void reduce(IntWritable key,Iterable<Text> values, Context context) throws IOException, InterruptedException {
String accu = "";
for (Text val : values) {
String[] entry=val.toString().split(",");
String MBR = entry[1];
accu+=entry[0]+",MBR"+MBR+" ";
}
result.set(accu);
context.getCounter(FileCounters.COUNT).increment(1);
mos.write(key, result, generateFileName(key));
}
}
public static class RTreeMapper extends Mapper<Object, Text, IntWritable, Text>{
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
System.out.println("WOWOWOWOW RUNNING");// NOTHING SHOWS UP!
}
}
public static class RTreeReducer extends Reducer<IntWritable,Text,IntWritable,Text> {
private MultipleOutputs<IntWritable, Text> mos;
Text t = new Text();
public void setup(Context context) {
mos = new MultipleOutputs<IntWritable, Text>(context);
}
public void reduce(IntWritable key,Iterable<Text> values, Context context) throws IOException, InterruptedException {
t.set("dsfs");
mos.write(key, t, "WOWOWOWOWOW"+key.get());
//ALSO, NOTHING IS WRITTEN TO THE FILE!!!!!
}
}
public static class RTreeInputFormat extends TextInputFormat{
protected boolean isSplitable(FileSystem fs, Path file) {
return false;
}
}
public static void main(String[] args) throws Exception {
if (args.length != 2) {
System.err.println("Enter valid number of arguments <Inputdirectory> <Outputlocation>");
System.exit(0);
}
ToolRunner.run(new Configuration(), new Simpletask(), args);
}
@Override
public int run(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "Job1");
job.setJarByClass(Simpletask.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
boolean complete = job.waitForCompletion(true);
//================RTree Loop============
int capacity = 3;
Configuration rconf = new Configuration();
Job rtreejob = Job.getInstance(rconf, "rtree");
if(complete){
int count = (int) job.getCounters().findCounter(FileCounters.COUNT).getValue();
System.out.println("File count: "+count);
String path = null;
for(int i=0;i<count;i++){
path = "/Worker/p"+i+"-m-00000";
System.out.println("Add input path: "+path);
FileInputFormat.addInputPath(rtreejob, new Path(path));
}
System.out.println("Input path done.");
FileOutputFormat.setOutputPath(rtreejob, new Path("/RTree"));
rtreejob.setJarByClass(Simpletask.class);
rtreejob.setMapperClass(RTreeMapper.class);
rtreejob.setCombinerClass(RTreeReducer.class);
rtreejob.setReducerClass(RTreeReducer.class);
rtreejob.setOutputKeyClass(IntWritable.class);
rtreejob.setOutputValueClass(Text.class);
rtreejob.setInputFormatClass(RTreeInputFormat.class);
complete = rtreejob.waitForCompletion(true);
}
return 0;
}
}
For a MapReduce job, the output directory should not already exist. The framework checks for the output directory first; if it exists, the job will fail. In your case, you specified the same output directory for both jobs. I modified your code: I changed args[1] to args[2] in job2, so the third argument will be the output directory of the second job. Pass a third argument as well.
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "Job1");
job.setJarByClass(Simpletask.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
//AND THEN I WAIT THIS JOB TO COMPLETE.
boolean complete = job.waitForCompletion(true);
//I START A NEW JOB, BUT WHY IS IT NOT RUNNING?
Configuration conf = new Configuration();
Job job2 = Job.getInstance(conf, "Job2");
job2.setJarByClass(Simpletask.class);
job2.setMapperClass(TokenizerMapper.class);
job2.setCombinerClass(IntSumReducer.class);
job2.setReducerClass(IntSumReducer.class);
job2.setOutputKeyClass(IntWritable.class);
job2.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job2, new Path(args[0]));
FileOutputFormat.setOutputPath(job2, new Path(args[2]));
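As a side note on the output-directory point above, a driver can also delete a stale output directory before resubmitting, so re-runs don't fail on an existing path. A minimal sketch (the helper class and method names here are made up for illustration):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OutputDirUtil {
    // Deletes the given output directory if it already exists,
    // so a job can write to the same path on a re-run.
    public static void deleteIfExists(Configuration conf, String dir) throws IOException {
        Path out = new Path(dir);
        FileSystem fs = out.getFileSystem(conf);
        if (fs.exists(out)) {
            fs.delete(out, true);   // true = delete recursively
        }
    }
}

Calling something like OutputDirUtil.deleteIfExists(conf, args[2]) just before FileOutputFormat.setOutputPath(job2, new Path(args[2])) would keep job2 from failing when its output directory is left over from an earlier run.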
A few possible causes of errors:
conf is declared twice (no compile error there?)
The output path of job2 already exists, as it was created from job1 (+1 to Amal G Jose's answer)
I think you should also use job.setMapOutputKeyClass(Text.class); and job.setMapOutputValueClass(IntWritable.class); for both jobs.
Do you also have code to execute job2 after the snippet that you posted? I mean, do you actually call job2.waitForCompletion(true); or something similar?
Overall: check the logs for error messages, which should clearly explain what went wrong.
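On that last point, the usual pattern for chaining two jobs sequentially is to gate the second submission on the first job's waitForCompletion result, all inside the Tool's run() method. A rough sketch (job names and configuration details are placeholders, not taken from the question):

@Override
public int run(String[] args) throws Exception {
    Job job1 = Job.getInstance(getConf(), "job1");
    // ... set jar, mapper, reducer, types, and input/output paths for job1 ...

    if (!job1.waitForCompletion(true)) {
        return 1;                      // stop the chain if the first job failed
    }

    Job job2 = Job.getInstance(getConf(), "job2");
    // ... configure job2, typically reading job1's output directory ...

    return job2.waitForCompletion(true) ? 0 : 1;   // job2 is only ever submitted here
}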
I am new to Hadoop, so pardon me if this looks like a silly question.
I am running my MapReduce program below and getting the following error:
java.lang.Exception: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1019)
Any help is appreciated.
public class WordCount {
// Mapper Class
public static class MapperClass extends Mapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
// Mapper method defined
public void mapperMethod(Object key,Text lineContent,Context context){
try{
StringTokenizer strToken = new StringTokenizer(lineContent.toString());
//Iterating through the line
while(strToken.hasMoreTokens()){
word.set(strToken.nextToken());
try{
context.write(word, one);
}
catch(Exception e){
System.err.println(new Date()+" ---> Cannot write data to hadoop in Mapper.");
e.printStackTrace();
}
}
}
catch(Exception ex){
ex.printStackTrace();
}
}
}
// Reducer Class
public static class ReducerClass extends Reducer<Text, IntWritable, Text, IntWritable>{
private IntWritable result = new IntWritable();
//Reducer method
public void reduce(Text key,Iterable<IntWritable> values,Context context){
try{
int sum=0;
for(IntWritable itr : values){
sum+=itr.get();
}
result.set(sum);
try {
context.write(key,result);
} catch (Exception e) {
System.err.println(new Date()+" ---> Error while sending data to Hadoop in Reducer");
e.printStackTrace();
}
}
catch (Exception err){
err.printStackTrace();
}
}
}
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
try{
Configuration conf = new Configuration();
String [] arguments = new GenericOptionsParser(conf, args).getRemainingArgs();
if(arguments.length!=2){
System.err.println("Enter both and input and output location.");
System.exit(1);
}
Job job = new Job(conf,"Simple Word Count");
job.setJarByClass(WordCount.class);
job.setMapperClass(MapperClass.class);
job.setReducerClass(ReducerClass.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(arguments[0]));
FileOutputFormat.setOutputPath(job, new Path(arguments[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
catch(Exception e){
}
}
}
You need to override the map method in the Mapper class; instead, you have defined a new method (mapperMethod).
Coming to your error: since the map method is not overridden, the default identity mapper runs, which passes through LongWritable keys and Text values, while the job has declared Text and IntWritable as the map output types, hence the type mismatch.
Hope this explains it.
public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
output.collect(word, one);
}
}
}
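For the newer org.apache.hadoop.mapreduce API that the question's code actually uses, the same fix is to rename mapperMethod to map with the signature the framework calls, and to mark it with @Override so the compiler rejects a mismatched signature. A sketch based on the question's MapperClass (assuming the same imports as the original code):

public static class MapperClass extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text lineContent, Context context)
            throws IOException, InterruptedException {
        StringTokenizer strToken = new StringTokenizer(lineContent.toString());
        while (strToken.hasMoreTokens()) {
            word.set(strToken.nextToken());
            context.write(word, one);   // now invoked by the framework instead of the identity mapper
        }
    }
}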
New to Hadoop, and I'm trying to understand how Hadoop reads file input. I am able to use the code below to run a Hadoop job from a 2-column (key/value) input file.
But what if I have 5 columns and the (key/value) pair I want is A&E (instead of A&B)? Which function do I need to modify exactly?
public class InverterCounter extends Configured implements Tool {
public static class MapClass extends MapReduceBase
implements Mapper<Text, Text, Text, Text> {
public void map(Text key, Text value,
OutputCollector<Text, Text> output,
Reporter reporter) throws IOException {
output.collect(value, key);
}
}
public static class Reduce extends MapReduceBase
implements Reducer<Text, Text, Text, IntWritable> {
public void reduce(Text key, Iterator<Text> values,
OutputCollector<Text, IntWritable> output,
Reporter reporter) throws IOException {
int count = 0;
while (values.hasNext()) {
values.next();
count++;
}
output.collect(key, new IntWritable(count));
}
}
public int run(String[] args) throws Exception {
Configuration conf = getConf();
JobConf job = new JobConf(conf, InverterCounter.class);
Path in = new Path(args[0]);
Path out = new Path(args[1]);
FileInputFormat.setInputPaths(job, in);
FileOutputFormat.setOutputPath(job, out);
job.setJobName("InverterCounter");
job.setMapperClass(MapClass.class);
job.setReducerClass(Reduce.class);
job.setInputFormat(KeyValueTextInputFormat.class);
job.setOutputFormat(TextOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.set("key.value.separator.in.input.line", ",");
JobClient.runJob(job);
return 0;
}
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new InverterCounter(), args);
System.exit(res);
}
}
Any recommendation would be appreciated. I was trying to change job.set("key.value.separator.in.input.line", ","); and job.setInputFormat(KeyValueTextInputFormat.class); with no luck; I still could not figure this out.
Thanks
KeyValueTextInputFormat assumes that the key is at the start of each line, so it isn't applicable for your 6 column data set.
Instead, you can use TextInputFormat and extract the key and value yourself. I'm assuming all values in the line are separated by commas (and that there are no commas in the data, which is another story).
With TextInputFormat you receive the full line in value, and the position of the line in the file in key. We don't need the position so we will ignore it. With the full line in a single Text we can turn it into a String, split it by commas, and derive the key and value to emit:
public class InverterCounter extends Configured implements Tool {
public static class MapClass extends MapReduceBase
implements Mapper<LongWritable, Text, Text, Text> {
public void map(LongWritable key, Text value,
OutputCollector<Text, Text> output,
Reporter reporter) throws IOException {
String[] lineFields = value.toString().split(",");
Text outputKey = new Text(lineFields[0] + "," + lineFields[4]);
Text outputValue = new Text(lineFields[1] + "," + lineFields[2] + "," +
lineFields[3] + "," + lineFields[5]);
output.collect(outputKey, outputValue);
}
}
public static class Reduce extends MapReduceBase
implements Reducer<Text, Text, Text, IntWritable> {
public void reduce(Text key, Iterator<Text> values,
OutputCollector<Text, IntWritable> output,
Reporter reporter) throws IOException {
int count = 0;
while (values.hasNext()) {
values.next();
count++;
}
output.collect(key, new IntWritable(count));
}
}
public int run(String[] args) throws Exception {
Configuration conf = getConf();
JobConf job = new JobConf(conf, InverterCounter.class);
Path in = new Path(args[0]);
Path out = new Path(args[1]);
FileInputFormat.setInputPaths(job, in);
FileOutputFormat.setOutputPath(job, out);
job.setJobName("InverterCounter");
job.setMapperClass(MapClass.class);
job.setReducerClass(Reduce.class);
job.setInputFormat(TextInputFormat.class);
job.setOutputFormat(TextOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
JobClient.runJob(job);
return 0;
}
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new InverterCounter(), args);
System.exit(res);
}
}
I haven't had a chance to test this, so there may be small bugs. You would probably want to rename the class because it is no longer inverting anything. Finally, the value has been sent to the reducer but it isn't being used, so you could just as easily send a NullWritable instead.
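To illustrate that last suggestion, only the map method and the generic types would change if the value were dropped in favour of NullWritable; a rough sketch, under the same untested caveat as the answer above:

public static class MapClass extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, NullWritable> {

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, NullWritable> output,
                    Reporter reporter) throws IOException {
        String[] lineFields = value.toString().split(",");
        // Emit only the composite key; the reducer just counts occurrences,
        // so no payload is needed in the value.
        Text outputKey = new Text(lineFields[0] + "," + lineFields[4]);
        output.collect(outputKey, NullWritable.get());
    }
}

The reducer would then implement Reducer<Text, NullWritable, Text, IntWritable>, and the driver would need job.setMapOutputValueClass(NullWritable.class), since the map output value class would no longer match the job's output value class.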