Configuration object is null in Hadoop mapper - Java

I have the following mapper class. I want to write to HDFS in my mapper function, so I need access to the configuration object, which I am retrieving in the setup() method. However, it is being returned as null and I am getting an NPE. Can you let me know what I am doing wrong?
Here is the stack trace:
hduser@nikhil-VirtualBox:/usr/local/hadoop/hadoop-1.0.4$ bin/hadoop jar GWASMapReduce.jar /user/hduser/tet.gpg /user/hduser/output3
12/11/04 08:50:17 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/11/04 08:50:24 INFO mapred.FileInputFormat: Total input paths to process : 1
12/11/04 08:50:28 INFO mapred.JobClient: Running job: job_201211031924_0008
12/11/04 08:50:29 INFO mapred.JobClient: map 0% reduce 0%
12/11/04 08:51:35 INFO mapred.JobClient: Task Id : attempt_201211031924_0008_m_000000_0, Status : FAILED
java.lang.NullPointerException
at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:131)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
at com.test.GWASMapper.writeCsvFileSmry(GWASMapper.java:208)
at com.test.GWASMapper.checkForNulls(GWASMapper.java:153)
at com.test.GWASMapper.map(GWASMapper.java:51)
at com.test.GWASMapper.map(GWASMapper.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
attempt_201211031924_0008_m_000000_0: ******************************************************************************************************************************************************************************************
attempt_201211031924_0008_m_000000_0: null
attempt_201211031924_0008_m_000000_0: ******************************************************************************************************************************************************************************************
12/11/04 08:51:37 INFO mapred.JobClient: Task Id : attempt_201211031924_0008_m_000001_0, Status : FAILED
Here is my driver class
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class GWASMapReduce extends Configured implements Tool{
/**
* @param args
*/
public static void main(String[] args) throws Exception {
Configuration configuration = new Configuration();
ToolRunner.run(configuration, new GWASMapReduce(), args);
}
@Override
public int run(String[] arg0) throws Exception {
JobConf conf = new JobConf();
conf.setInputFormat(GWASInputFormat.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);
conf.setJarByClass(GWASMapReduce.class);
conf.setMapperClass(GWASMapper.class);
conf.setNumReduceTasks(0);
FileInputFormat.addInputPath(conf, new Path(arg0[0]));
FileOutputFormat.setOutputPath(conf, new Path(arg0[1]));
JobClient.runJob(conf);
return 0;
}
}
Mapper Class
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import com.google.common.base.Strings;
public class GWASMapper extends MapReduceBase implements Mapper<LongWritable, GWASGenotypeBean, Text, Text> {
private static Configuration conf;
@SuppressWarnings("rawtypes")
public void setup(org.apache.hadoop.mapreduce.Mapper.Context context) throws IOException {
conf = context.getConfiguration();
// conf is null here
}
@Override
public void map(LongWritable inputKey, GWASGenotypeBean inputValue, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
// mapper code
}
}

I think you are missing this:
JobClient jobClient = new JobClient();
jobClient.setConf(conf);
JobClient.runJob(conf);
The conf parameter is not passed to the JobClient. Try this and see if it helps.
I also suggest using the new mapreduce library; check the WordCount v2.0 example:
http://hadoop.apache.org/docs/mapreduce/r0.22.0/mapred_tutorial.html#Example%3A+WordCount+v2.0
Also try this:
JobConf job = new JobConf(new Configuration());
I think the configuration object is not initialized here.
Moreover, you don't have anything special in the configuration object, so you could also initialize the Configuration inside the mapper just to try it out, although that is not good practice.
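If you do want the configuration inside the mapper, here is a minimal sketch of what that could look like with the old (mapred) API, where the per-task hook is configure(JobConf) rather than the new API's setup(Context). The value type is simplified to Text because GWASGenotypeBean is not shown, so treat this as illustrative rather than the asker's exact class:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class GWASMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {

    private Configuration conf;

    @Override
    public void configure(JobConf job) {
        // Old-API lifecycle hook: called once per task with the job configuration,
        // so conf is no longer null by the time map() runs.
        this.conf = job;
    }

    @Override
    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        FileSystem fs = FileSystem.get(conf); // backed by a real configuration now
        // ... write the summary file to HDFS here, or collect output as usual
        output.collect(new Text("key"), value);
    }
}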

This is just a tip for others facing a similar issue:
please make sure that you set the configuration values first and only then create the job.
For example:
Configuration conf = new Configuration();
conf.set("a","2");
conf.set("inputpath",args[0]);
//Must be set before the below line:
Job myjob = new Job(conf);
Hope this helps.

Related

Hadoop jar runs but no output. Driver, mapper and reducer compile successfully on the namenode

I'm a newbie to Hadoop programming and I have started learning by setting up Hadoop 2.7.1 on a three-node cluster. I have tried running the hello-world jars that come out of the box with Hadoop and they ran fine, but I wrote my own driver code on my local machine, bundled it into a jar and executed it the same way, and it fails with NO error messages.
Here is my code and this is what I did.
WordCountMapper.java
package mot.com.bin.test;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
public class WordCountMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>
{
public void map(LongWritable key, Text Value,
OutputCollector<Text, IntWritable> opc, Reporter r)
throws IOException {
String s = Value.toString();
for (String word :s.split(" ")) {
if( word.length() > 0) {
opc.collect(new Text(word), new IntWritable(1));
}
}
}
}
WordCountReduce.java
package mot.com.bin.test;
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
public class WordCountReduce extends MapReduceBase implements Reducer < Text, IntWritable, Text, IntWritable>{
public void reduce(Text key, Iterator<IntWritable> values,
OutputCollector<Text, IntWritable> opc, Reporter r)
throws IOException {
// TODO Auto-generated method stub
int i = 0;
while (values.hasNext()) {
IntWritable in = values.next();
i+=in.get();
}
opc.collect(key, new IntWritable (i));
}
}
WordCount.java
/**
* **DRIVER**
*/
package mot.com.bin.test;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.io.Text;
//import com.sun.jersey.core.impl.provider.entity.XMLJAXBElementProvider.Text;
/**
* @author rgb764
*
*/
public class WordCount extends Configured implements Tool{
/**
* @param args
*/
public static void main(String[] args) {
// TODO Auto-generated method stub
}
public int run(String[] arg0) throws Exception {
if (arg0.length < 2) {
System.out.println("Need input file and output directory");
return -1;
}
JobConf conf = new JobConf();
FileInputFormat.setInputPaths(conf, new Path( arg0[0]));
FileOutputFormat.setOutputPath(conf, new Path( arg0[1]));
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(WordCountMapper.class);
conf.setReducerClass(WordCountReduce.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
JobClient.runJob(conf);
return 0;
}
}
First I tried exporting it as a jar from Eclipse and running it on my Hadoop cluster. No errors, yet no success either. Then I moved my individual Java files to my NameNode, compiled each of them there and created the jar file; the hadoop command still returns no results, but no errors either. Kindly help me with this.
hadoop jar WordCout.jar mot.com.bin.test.WordCount /karthik/mytext.txt /tempo
I extracted all dependent jar files using Maven and added them to the classpath on my NameNode. Help me figure out what I am doing wrong and where.
IMO you are missing the code in your main method that instantiates the Tool implementation (WordCount in your case) and runs it.
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new WordCount(), args);
System.exit(res);
}
Refer to this.

Error while importing data from MongoDB to HDFS

I'm trying to import the documents of a MongoDB collection into HDFS through a MapReduce job. I am using the old API. This is the driver code:
package my.pac;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import com.mongodb.hadoop.mapred.MongoInputFormat;
import com.mongodb.hadoop.util.MongoConfigUtil;
public class ImportDriver extends Configured implements Tool {
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new ImportDriver(), args);
System.exit(exitCode);
}
@Override
public int run(String[] args) throws Exception {
JobConf conf = new JobConf();
MongoConfigUtil.setInputURI(conf,"mongodb://127.0.0.1:27017/SampleDb.shows");
conf.setJarByClass(ImportDriver.class);
conf.addResource(new Path("/usr/lib/hadoop/hadoop-1.2.1/conf/core-site.xml"));
conf.addResource(new Path("/usr/lib/hadoop/hadoop-1.2.1/conf/hdfs-site.xml"));
FileOutputFormat.setOutputPath(conf, new Path(args[0]));
conf.setInputFormat(MongoInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
conf.setMapperClass(ImportMapper.class);
conf.setMapOutputKeyClass(Text.class);
conf.setMapOutputKeyClass(Text.class);
JobClient.runJob(conf);
return 0;
}
}
This is my Mapper Code:
package my.pac;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.bson.BSONObject;
import com.mongodb.hadoop.io.BSONWritable;
public class ImportMapper extends MapReduceBase implements Mapper<BSONWritable, BSONWritable, Text, Text>{
@Override
public void map(BSONWritable key, BSONWritable value,
OutputCollector<Text, Text> o, Reporter arg3)
throws IOException {
String val = ((BSONObject) value).get("_id").toString();
System.out.println(val);
o.collect( new Text(val), new Text(val));
}
}
I am using
Ubuntu-14.0
Hadoop-1.2.1
MongoDb-3.0.4
I have added the following jars:
mongo-2.9.3.jar
mongo-hadoop-core-1.3.0.jar
mongo-java-driver-2.13.2.jar
When I run this, I am getting an error like this :
java.lang.Exception: java.lang.ClassCastException: com.mongodb.hadoop.io.BSONWritable cannot be cast to org.bson.BSONObject
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.ClassCastException: com.mongodb.hadoop.io.BSONWritable cannot be cast to org.bson.BSONObject
at my.pac.ImportMapper.map(ImportMapper.java:18)
at my.pac.ImportMapper.map(ImportMapper.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
How can I rectify this?
You may have an outdated driver in your classpath that's causing the conflict in read preference settings.
See below links for similar issues:
https://jira.mongodb.org/browse/JAVA-849
https://serverfault.com/questions/268953/mongodb-java-r2-5-3-nosuchmethoderror-on-dbcollection-savedbobject-in-tomca
If that doesn't help,
https://jira.talendforge.org/browse/TBD-1002
suggests you may need to re-run MongoDB or use a separate connection.
Apparently all the jars I used are correct. The way I tried getting data out of BSONWritable was wrong: I tried to cast BSONWritable to BSONObject, which cannot be cast. Here is how I solved the problem:
String name = (String)value.getDoc().get("name");
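For reference, a minimal sketch of the corrected mapper with the getDoc() approach applied to the original "_id" lookup (the rest of the job is assumed unchanged):

package my.pac;

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import com.mongodb.hadoop.io.BSONWritable;

public class ImportMapper extends MapReduceBase implements Mapper<BSONWritable, BSONWritable, Text, Text> {
    @Override
    public void map(BSONWritable key, BSONWritable value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        // Read the field through getDoc() instead of casting BSONWritable to BSONObject.
        String val = value.getDoc().get("_id").toString();
        output.collect(new Text(val), new Text(val));
    }
}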

When to use NLineInputFormat in Hadoop Map-Reduce?

I have a text-based input file of around 25 GB in size. In that file a single record consists of 4 lines, and the processing for every record is the same, but within each record each of the four lines is processed differently.
I'm new to Hadoop, so I would like guidance on whether to use NLineInputFormat in this situation or the default TextInputFormat. Thanks in advance!
Assuming you have the text file in the following format :
2015-8-02
error2014 blahblahblahblah
2015-8-02
blahblahbalh error2014
You could use NLineInputFormat.
With NLineInputFormat you can specify exactly how many lines should go to each mapper.
In your case you can use it to feed 4 lines to each mapper.
EDIT:
Here is an example for using NLineInputFormat:
Mapper Class:
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class MapperNLine extends Mapper<LongWritable, Text, LongWritable, Text> {
@Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
context.write(key, value);
}
}
Driver class:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class Driver extends Configured implements Tool {
@Override
public int run(String[] args) throws Exception {
if (args.length != 2) {
System.out
.printf("Two parameters are required for DriverNLineInputFormat- <input dir> <output dir>\n");
return -1;
}
Job job = new Job(getConf());
job.setJobName("NLineInputFormat example");
job.setJarByClass(Driver.class);
job.setInputFormatClass(NLineInputFormat.class);
NLineInputFormat.addInputPath(job, new Path(args[0]));
job.getConfiguration().setInt("mapreduce.input.lineinputformat.linespermap", 4);
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setMapperClass(MapperNLine.class);
job.setNumReduceTasks(0);
boolean success = job.waitForCompletion(true);
return success ? 0 : 1;
}
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new Configuration(), new Driver(), args);
System.exit(exitCode);
}
}

Eclipse - Run on Hadoop does not prompt anything

I'm trying to build a simple WordCount Hadoop project (https://developer.yahoo.com/hadoop/tutorial/module3.html#running) but when I click "Run on Hadoop" there is no action at all... In fact, nothing is displayed in the console.
Here is my project structure -
Here is my wordcount job file...
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
public class WordCount {
public static void main(String[] args) {
Configuration config = new Configuration();
config.addResource(new Path("/HADOOP_HOME/conf/hadoop-default.xml"));
config.addResource(new Path("/HADOOP_HOME/conf/hadoop-site.xml"));
JobClient client = new JobClient();
JobConf conf = new JobConf(WordCount.class);
// specify output types
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
// specify input and output dirs
FileInputPath.addInputPath(conf, new Path("input"));
FileOutputPath.addOutputPath(conf, new Path("output"));
// specify a mapper
conf.setMapperClass(WordCountMapper.class);
// specify a reducer
conf.setReducerClass(WordCountReducer.class);
conf.setCombinerClass(WordCountReducer.class);
client.setConf(conf);
try {
JobClient.runJob(conf);
} catch (Exception e) {
e.printStackTrace();
}
}
}
I think the problem is with the jar file you used for the Hadoop client-to-server connection, so what happens is that it hangs at the following line while it tries to reach the server:
Configuration config = new Configuration();
Try to debug and let us know if you face any more problems.
If not, try the following.
Have you tried running the program in Eclipse by pointing it at core-site.xml and hdfs-site.xml?
config.addResource(new Path("path-to-your-core-site.xml"));
config.addResource(new Path("path-to-your-hdfs-site.xml"));
and
FileInputFormat.addInputPath(conf, new Path("hdfs path to your input file"));
FileOutputFormat.setOutputPath(conf, new Path("hdfs path to your output file"));
See if it works and get back to us.
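A rough sketch of how those pieces could fit into the asker's main, using the imports already present in the question and placeholder paths that you would replace with your own cluster's values (old mapred API assumed):

Configuration config = new Configuration();
config.addResource(new Path("/path/to/your/core-site.xml"));
config.addResource(new Path("/path/to/your/hdfs-site.xml"));

JobConf conf = new JobConf(config, WordCount.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(WordCountMapper.class);
conf.setReducerClass(WordCountReducer.class);

// HDFS paths, not local ones
FileInputFormat.addInputPath(conf, new Path("hdfs://localhost:9000/user/you/input"));
FileOutputFormat.setOutputPath(conf, new Path("hdfs://localhost:9000/user/you/output"));

JobClient.runJob(conf);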
try this,
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
public class WordCount {
public static class Map extends MapReduceBase implements
Mapper<Object, Text, Text, IntWritable> {
@Override
public void map(Object key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
throws IOException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
System.out.println(line);
while (tokenizer.hasMoreTokens()) {
value.set(tokenizer.nextToken());
output.collect(value, new IntWritable(1));
}
}
}
public static class Reduce extends MapReduceBase implements
Reducer<Text, IntWritable, Text, IntWritable> {
@Override
public void reduce(Text key, Iterator<IntWritable> values,
OutputCollector<Text, IntWritable> output, Reporter reporter)
throws IOException {
int sum = 0;
while (values.hasNext()) {
sum += values.next().get();
}
output.collect(key, new IntWritable(sum));
}
}
public static void main(String[] args) throws Exception,IOException {
JobConf conf = new JobConf(WordCount.class);
conf.setJobName("WordCount");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(Map.class);
conf.setReducerClass(Reduce.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path("/home/user17/test.txt"));
FileOutputFormat.setOutputPath(conf, new Path("hdfs://localhost:9000/out2"));
JobClient.runJob(conf);
}
}
I had the exact same problem, and I have just figured it out.
Add parameters in the run configurations.
Right Click WordCount.java > Run As > Run Configurations > Java Application > Word Count > Arguments
Enter this hdfs://hadoop:9000/ hdfs://hadoop:9000/
Apply and finish, then run again.
After running, refresh the project and the result is in the output folder.

MapReduce: WordCount doesn't produce anything

I want to write my own word count example using MapReduce and Hadoop v1.0.3 (I'm on macOS), but I don't understand why it doesn't work.
Sharing my code:
main:
package org.myorg;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
public class WordCount {
public static void main(String[] args) throws Exception {
JobConf conf = new JobConf(WordCount.class);
// set job name, mapper, combiner, and reducer classes
conf.setJobName("WordCount");
// set input and output formats
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
// set input and output paths
//FileInputFormat. setInputPaths(conf, new Path(input));
//FileOutputFormat.setOutputPath(conf, new Path(output));
FileOutputFormat.setCompressOutput(conf, false);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(org.myorg.Map.class);
conf.setReducerClass(org.myorg.Reduce.class);
String host = args[0];
String input = host + "/" + args[1];
String output = host + "/" + args[2];
// set input and output paths
FileInputFormat.addInputPath(conf, new Path(input));
FileOutputFormat.setOutputPath(conf, new Path(output));
JobClient j=new JobClient(conf);
(j.submitJob(conf)).waitForCompletion();
}
}
Mapper:
package org.myorg;
import java.io.IOException;
import java.util.HashMap;
import java.util.StringTokenizer;
import java.util.TreeMap;
import java.util.Vector;
import java.util.Map.Entry;
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapreduce.Mapper.Context;
public class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
@Override
public void map(LongWritable key, Text value,
OutputCollector<Text, IntWritable> output, Reporter reporter)
throws IOException {
MapWritable hs = new MapWritable();
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
//hs.put(word, one);
output.collect(word,one);
}
// TODO Auto-generated method stub
}
}
Reducer:
package org.myorg;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.math.RoundingMode;
import java.net.URI;
import java.net.URISyntaxException;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;
import java.util.Map.Entry;
import java.util.Vector;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapreduce.Reducer.Context;
//public class Reduce extends MapReduceBase implements Reducer<Text, MapWritable, Text, Text> {
public class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, Text> {
public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
int sum = 0;
while (values.hasNext()) {
sum += values.next().get();
}
String host = "hdfs://localhost:54310/";
String tmp = host + "Temporany/output.txt";
FileSystem srcFS;
try {
srcFS = FileSystem.get(new URI(tmp), new JobConf());
srcFS.delete(new Path(tmp), true);
BufferedWriter wr = new BufferedWriter(new OutputStreamWriter(
srcFS.create(new Path(tmp))));
wr.write(key.toString() + ":" + sum);
wr.close();
srcFS.close();
} catch (URISyntaxException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
// context.write(key, new IntWritable(sum));
}
}
The job started and ended with no error but didn't write the output file.
I launch the jar with Hadoop with this command:
./Hadoop jar /Users/User/Desktop/hadoop/wordcount.jar hdfs://localhost:54310 /In/testo.txt /Out/wordcount17
This is the output:
2014-03-03 17:56:22.063 java[6365:1203] Unable to load realm info from SCDynamicStore
14/03/03 17:56:22 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/03/03 17:56:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/03/03 17:56:23 WARN snappy.LoadSnappy: Snappy native library not loaded
14/03/03 17:56:23 INFO mapred.FileInputFormat: Total input paths to process : 1
I suppose the problem is "Unable to load native-hadoop library", but it works fine for other jars.
Q: The job starts and ends with no error but doesn't write the output file?
Ans: I am not sure that the job actually ends successfully without errors.
Problems:
Job Configuration:
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
The setOutputKeyClass() and setOutputValueClass() methods control the
output types for the map and the reduce functions, which are often the
same. If they are different, then the map
output types can be set using the methods setMapOutputKeyClass() and
setMapOutputValueClass().
The output classes in your case are:
Map key : Text
Map Value : IntWritable
Reduce key : Text
Reduce Value : Text
which will result in a type mismatch exception.
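If the map output types really do differ from the reduce output types, a minimal sketch of the extra driver calls would be (class names follow the asker's job; treat this as illustrative):

JobConf conf = new JobConf(WordCount.class);
// Intermediate types emitted by the mapper
conf.setMapOutputKeyClass(Text.class);
conf.setMapOutputValueClass(IntWritable.class);
// Final types emitted by the reducer (IntWritable if the reducer collects counts)
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);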
Reduce
I am not sure why you are using the HDFS API to write your output to a file.
You should use output.collect(key, value).
In the case of multiple reducers, are you handling the simultaneous write operations?
And I wonder what context.write would do in the old API (it's commented out).
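As a sketch, a reducer that lets the framework write the output instead of using the HDFS API directly could look like this (declared with IntWritable output values to match the driver settings above; an assumption, not the asker's exact code):

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        // Each reducer writes to its own part file, so there is no concurrent-write problem.
        output.collect(key, new IntWritable(sum));
    }
}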
You can use the following for more information.
Debug your map-reduce job:
Counters
Jobtracker web interface
Q: What is the difference between submitJob() and waitForCompletion()?
Ans: submitJob(): submits the job and returns.
waitForCompletion(): submits the job and prints the status of the job to the console.
So waitForCompletion() is submitJob() plus status updates of the job until it completes.
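Roughly, with the old API (assuming a fully configured JobConf named conf):

// submitJob() returns immediately with a handle; block explicitly if you need to wait.
RunningJob running = new JobClient(conf).submitJob(conf);
running.waitForCompletion();

// runJob() submits the job and keeps printing progress to the console until it finishes.
JobClient.runJob(conf);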
For a reference word count implementation, please read the Apache MapReduce tutorial.
You can also find hadoop-examples-X.X.X.jar in your installation folder.
Go through $HADOOP_HOME/src/examples/ for the source code.
$HADOOP_HOME = Hadoop installation folder
