I'm trying to import the documents of a MongoDB collection into HDFS through a MapReduce job, using the old (mapred) API. This is the driver code:
package my.pac;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import com.mongodb.hadoop.mapred.MongoInputFormat;
import com.mongodb.hadoop.util.MongoConfigUtil;
public class ImportDriver extends Configured implements Tool {
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new ImportDriver(), args);
System.exit(exitCode);
}
@Override
public int run(String[] args) throws Exception {
JobConf conf = new JobConf();
MongoConfigUtil.setInputURI(conf,"mongodb://127.0.0.1:27017/SampleDb.shows");
conf.setJarByClass(ImportDriver.class);
conf.addResource(new Path("/usr/lib/hadoop/hadoop-1.2.1/conf/core-site.xml"));
conf.addResource(new Path("/usr/lib/hadoop/hadoop-1.2.1/conf/hdfs-site.xml"));
FileOutputFormat.setOutputPath(conf, new Path(args[0]));
conf.setInputFormat(MongoInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
conf.setMapperClass(ImportMapper.class);
conf.setMapOutputKeyClass(Text.class);
conf.setMapOutputValueClass(Text.class);
JobClient.runJob(conf);
return 0;
}
}
This is my Mapper Code:
package my.pac;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.bson.BSONObject;
import com.mongodb.hadoop.io.BSONWritable;
public class ImportMapper extends MapReduceBase implements Mapper<BSONWritable, BSONWritable, Text, Text>{
@Override
public void map(BSONWritable key, BSONWritable value,
OutputCollector<Text, Text> o, Reporter arg3)
throws IOException {
String val = ((BSONObject) value).get("_id").toString();
System.out.println(val);
o.collect( new Text(val), new Text(val));
}
}
I am using
Ubuntu-14.0
Hadoop-1.2.1
MongoDb-3.0.4
I have added the following jars:
mongo-2.9.3.jar
mongo-hadoop-core-1.3.0.jar
mongo-java-driver-2.13.2.jar
When I run this, I am getting an error like this:
java.lang.Exception: java.lang.ClassCastException: com.mongodb.hadoop.io.BSONWritable cannot be cast to org.bson.BSONObject
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.ClassCastException: com.mongodb.hadoop.io.BSONWritable cannot be cast to org.bson.BSONObject
at my.pac.ImportMapper.map(ImportMapper.java:18)
at my.pac.ImportMapper.map(ImportMapper.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
How can I rectify this?
You may have an outdated MongoDB Java driver on your classpath that is causing a conflict in the read preference settings.
See the links below for similar issues:
https://jira.mongodb.org/browse/JAVA-849
https://serverfault.com/questions/268953/mongodb-java-r2-5-3-nosuchmethoderror-on-dbcollection-savedbobject-in-tomca
If that doesn't help,
https://jira.talendforge.org/browse/TBD-1002
suggests you may need to restart MongoDB or use a separate connection.
It turns out all the jars I used are correct. The way I was trying to get data out of the BSONWritable was wrong: I tried to cast BSONWritable to BSONObject, which cannot be cast. Here is how I solved the problem:
String name = (String)value.getDoc().get("name");
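For reference, here is a minimal sketch of the mapper with that fix applied (the "name" field is only an example; substitute whatever field your documents actually contain):
package my.pac;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import com.mongodb.hadoop.io.BSONWritable;
public class ImportMapper extends MapReduceBase implements Mapper<BSONWritable, BSONWritable, Text, Text> {
    @Override
    public void map(BSONWritable key, BSONWritable value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        // getDoc() exposes the underlying BSONObject, so no cast is needed
        String name = (String) value.getDoc().get("name");
        output.collect(new Text(name), new Text(name));
    }
}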
Related
I am new to Hadoop and trying to run my first Hadoop program, but I am facing a problem when I execute my wordcount job.
WordCount.java
package hdp;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class WordCount extends Configured implements Tool{
public static void main(String[] args) throws Exception {
System.out.println("application starting ....");
int exitCode = ToolRunner.run(new WordCount(), args);
System.out.println(exitCode);
}
@Override
public int run(String[] args) throws Exception {
if (args.length < 2) {
System.out.println("Plz enter input and output directory properly... ");
return -1;
}
JobConf conf = new JobConf(WordCount.class);
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
conf.setMapperClass(WordMapper.class);
conf.setReducerClass(WordReducer.class);
conf.setMapOutputKeyClass(Text.class);
conf.setMapOutputKeyClass(IntWritable.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
JobClient.runJob(conf);
return 0;
}
@Override
public Configuration getConf() {
return null;
}
@Override
public void setConf(Configuration arg0) {
}
}
WordMapper.java
package hdp;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
public class WordMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>{
@Override
public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> collect, Reporter reporter) throws IOException {
String str = value.toString();
for (String s : str.split(" ")) {
if (s.length() > 0) {
collect.collect(new Text(s), new IntWritable(1));
}
}
}
}
WordReducer
package hdp;
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
public class WordReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
@Override
public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
int count = 0;
while (values.hasNext()) {
IntWritable intWritable = values.next();
count += intWritable.get();
}
output.collect(key, new IntWritable(count));
}
}
When I run my program then I get following error message.
16/12/23 00:22:41 INFO mapreduce.Job: Task Id : attempt_1482432671993_0001_m_000001_1, Status : FAILED
Error: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.IntWritable, received org.apache.hadoop.io.Text
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1072)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:610)
at hdp.WordMapper.map(WordMapper.java:19)
at hdp.WordMapper.map(WordMapper.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
16/12/23 00:22:47 INFO mapreduce.Job: Task Id : attempt_1482432671993_0001_m_000000_2, Status : FAILED
Error: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.IntWritable, received org.apache.hadoop.io.Text
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1072)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:610)
at hdp.WordMapper.map(WordMapper.java:19)
at hdp.WordMapper.map(WordMapper.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Please tell me where I went wrong and what kind of changes I need, whether in WordCount.java, WordReducer.java, or WordMapper.java.
You accidentally set the map output key class twice:
conf.setMapOutputKeyClass(IntWritable.class);
Should become
conf.setMapOutputValueClass(IntWritable.class);
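With that change, the type setup in the driver matches what WordMapper and WordReducer actually emit; a sketch of just those lines:
conf.setMapOutputKeyClass(Text.class);          // key type emitted by WordMapper
conf.setMapOutputValueClass(IntWritable.class); // value type emitted by WordMapper
conf.setOutputKeyClass(Text.class);             // final key type written by WordReducer
conf.setOutputValueClass(IntWritable.class);    // final value type written by WordReducer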
I'm a newbie to Hadoop programming and I have started learning by setting up Hadoop 2.7.1 on a three-node cluster. I have tried running the hello-world jars that come out of the box with Hadoop and they ran fine, but when I wrote my own driver code on my local machine, bundled it into a jar, and executed it, it fails with NO error messages.
Here is my code and this is what I did.
WordCountMapper.java
package mot.com.bin.test;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
public class WordCountMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>
{
public void map(LongWritable key, Text Value,
OutputCollector<Text, IntWritable> opc, Reporter r)
throws IOException {
String s = Value.toString();
for (String word :s.split(" ")) {
if( word.length() > 0) {
opc.collect(new Text(word), new IntWritable(1));
}
}
}
}
WordCountReduce.java
package mot.com.bin.test;
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
public class WordCountReduce extends MapReduceBase implements Reducer < Text, IntWritable, Text, IntWritable>{
public void reduce(Text key, Iterator<IntWritable> values,
OutputCollector<Text, IntWritable> opc, Reporter r)
throws IOException {
// TODO Auto-generated method stub
int i = 0;
while (values.hasNext()) {
IntWritable in = values.next();
i+=in.get();
}
opc.collect(key, new IntWritable (i));
}
}
WordCount.java
/**
* **DRIVER**
*/
package mot.com.bin.test;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.io.Text;
//import com.sun.jersey.core.impl.provider.entity.XMLJAXBElementProvider.Text;
/**
* @author rgb764
*
*/
public class WordCount extends Configured implements Tool{
/**
* @param args
*/
public static void main(String[] args) {
// TODO Auto-generated method stub
}
public int run(String[] arg0) throws Exception {
if (arg0.length < 2) {
System.out.println("Need input file and output directory");
return -1;
}
JobConf conf = new JobConf();
FileInputFormat.setInputPaths(conf, new Path( arg0[0]));
FileOutputFormat.setOutputPath(conf, new Path( arg0[1]));
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(WordCountMapper.class);
conf.setReducerClass(WordCountReduce.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
JobClient.runJob(conf);
return 0;
}
}
First I tried exporting it as a jar from Eclipse and running it on my Hadoop cluster: no errors, yet no success either. Then I moved my individual Java files to my NameNode, compiled each of them, and created the jar file there; the hadoop command still returns no results, but no errors either. Kindly help me with this.
hadoop jar WordCout.jar mot.com.bin.test.WordCount /karthik/mytext.txt /tempo
I extracted all the dependent jar files using Maven and added them to the classpath on my NameNode. Help me figure out what I am doing wrong and where.
IMO you are missing the code in your main method that instantiates the Tool implementation (WordCount in your case) and runs it.
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new WordCount(), args);
System.exit(res);
}
Refer to this.
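Put together, the driver skeleton would look roughly like this (a sketch; the body of run() stays as you already have it, and Configuration and ToolRunner need to be imported):
public class WordCount extends Configured implements Tool {
    public static void main(String[] args) throws Exception {
        // ToolRunner parses the generic Hadoop options and then invokes run()
        int res = ToolRunner.run(new Configuration(), new WordCount(), args);
        System.exit(res);
    }
    public int run(String[] arg0) throws Exception {
        // ... your existing JobConf setup and JobClient.runJob(conf) ...
        return 0;
    }
}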
I am trying to run the MapReduce job shown below, but it's giving me a ClassNotFoundException, even though this inner class is present in the jar. Can anyone give a hint?
package com.example;
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
public class Example {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf);
job.setMapOutputKeyClass(LongWritable.class);
job.setMapOutputValueClass(Text.class);
job.setInputFormatClass(TextInputFormat.class);
job.setReducerClass(Example.ReduceTask.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
}
public static class ReduceTask
extends Reducer<LongWritable, Text, Text, Text> {
public void reduce(LongWritable key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
for (Text value: values) {
String[] cols = value.toString().split(",");
context.write(new Text(cols[0]), value);
break;
}
}
}
}
java.lang.RuntimeException: java.lang.ClassNotFoundException: com.example.Example$ReduceTask
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:996)
at org.apache.hadoop.mapreduce.JobContext.getReducerClass(JobContext.java:236)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:556)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
I am running it via command:
hadoop jar PracticeHadoop.jar com.example.Example workspace/input workspace/op
I wrote a MapReduce class and created a jar file from it. Now I want to use this jar in another Java program. Can anyone please help me with how I could do this?
Thanks
Here is my MapReduce program:
package org.apache.cassandra.com;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.Map.Entry;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class CassandraSumLib extends Configured implements Tool
{
public CassandraSumLib(){
}
static final String KEYSPACE = "weather";
static final String COLUMN_FAMILY = "momentinfo1";
static final String OUTPUT_PATH = "/tmp/OutPut";
private static final Logger logger = LoggerFactory.getLogger(CassandraSum.class);
public int CassandraSum(String[] args) throws Exception
{
return ToolRunner.run(new Configuration(), new CassandraSumLib(), args);
}
///////////////////////////////////////////////////////////
public static class Summap extends Mapper<Map<String, ByteBuffer>, Map<FloatWritable, ByteBuffer>, Text, DoubleWritable>
{
Text word = new Text("SUM");
float temp;
public void map(Map<String, ByteBuffer> keys, Map<FloatWritable, ByteBuffer> columns, Context context) throws IOException, InterruptedException
{
for (Entry<FloatWritable, ByteBuffer> column : columns.entrySet())
{
if (!"column".equals(column.getKey()))
continue;
temp = ByteBufferUtil.toFloat(column.getValue());
//System.out.println(temp);
context.write(word, new DoubleWritable(temp));
//System.out.println(word + " " + temp);
}
}
}
///////////////////////////////////////////////////////////
public static class Sumred extends Reducer<Text, DoubleWritable, Text, DoubleWritable>
{
public void reduce(Text key, Iterable<DoubleWritable> values, Context context) throws IOException, InterruptedException
{
Double sum = 0.0;
for (DoubleWritable val : values){
// System.out.println(val.get());
sum += val.get();}
context.write(key, new DoubleWritable(sum));
}
}
///////////////////////////////////////////////////////////
public int run(String[] args) throws Exception
{
Job job = new Job(getConf(), "SUM");
job.setJarByClass(CassandraSum.class);
job.setMapperClass(Summap.class);
JobConf conf = new JobConf( getConf(), CassandraSum.class);
// conf.setNumMapTasks(1000);
// conf.setNumReduceTasks(900);
job.setOutputFormatClass(TextOutputFormat.class);
job.setCombinerClass(Sumred.class);
job.setReducerClass(Sumred.class);
job.setOutputKeyClass(Text.class);
job.setNumReduceTasks(900);
job.setOutputValueClass(DoubleWritable.class);
FileOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH));
job.setInputFormatClass(CqlPagingInputFormat.class);
ConfigHelper.setInputRpcPort(job.getConfiguration(), "9160");
ConfigHelper.setInputInitialAddress(job.getConfiguration(), "localhost");
ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY);
ConfigHelper.setInputPartitioner(job.getConfiguration(), "Murmur3Partitioner");
CqlConfigHelper.setInputCQLPageRowSize(job.getConfiguration(), "3");
job.waitForCompletion(true);
return 0;
}
}
I want to call this class from another program. Here is my second program, which calls my first program:
package org.apache.cassandra.com;
import java.util;
import org.apache.hadoop.util.RunJar;
import org.apache.cassandra.com.CassandraSumLib;
public class CassandraSum {
public static void main(String[] args) throws Exception{
CassandraSumLib CSL = new CassandraSumLib();
CSL.??? (which method should I write here?)
}
}
thanks
Steps to add a jar file in Eclipse:
1. Right-click on the project
2. Click on Build Path -> Configure Build Path
3. Click on Java Build Path
4. Click on the Libraries tab
5. Click on the Add External JARs button
6. Choose the jar file
7. Click OK
Add the jar to the classpath of the second program. If you are compiling/running from the command line, use the -cp option.
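If the question is what to call from the second program: since CassandraSumLib implements Tool and has a run() method, one option (a sketch, assuming the jar containing CassandraSumLib is already on the classpath) is to hand it to ToolRunner:
package org.apache.cassandra.com;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;
public class CassandraSum {
    public static void main(String[] args) throws Exception {
        // ToolRunner wires up the Configuration and calls CassandraSumLib.run(args)
        int exitCode = ToolRunner.run(new Configuration(), new CassandraSumLib(), args);
        System.exit(exitCode);
    }
}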
I have the following mapper class. I want to write to HDFS in my map function, so I need access to the Configuration object, which I am retrieving in the setup() method. However it is coming back as null and I am getting an NPE. Can you let me know what I am doing wrong?
Here is the stack trace:
hduser#nikhil-VirtualBox:/usr/local/hadoop/hadoop-1.0.4$ bin/hadoop jar GWASMapReduce.jar /user/hduser/tet.gpg /user/hduser/output3
12/11/04 08:50:17 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/11/04 08:50:24 INFO mapred.FileInputFormat: Total input paths to process : 1
12/11/04 08:50:28 INFO mapred.JobClient: Running job: job_201211031924_0008
12/11/04 08:50:29 INFO mapred.JobClient: map 0% reduce 0%
12/11/04 08:51:35 INFO mapred.JobClient: Task Id : attempt_201211031924_0008_m_000000_0, Status : FAILED
java.lang.NullPointerException
at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:131)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
at com.test.GWASMapper.writeCsvFileSmry(GWASMapper.java:208)
at com.test.GWASMapper.checkForNulls(GWASMapper.java:153)
at com.test.GWASMapper.map(GWASMapper.java:51)
at com.test.GWASMapper.map(GWASMapper.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
attempt_201211031924_0008_m_000000_0: ******************************************************************************************************************************************************************************************
attempt_201211031924_0008_m_000000_0: null
attempt_201211031924_0008_m_000000_0: ******************************************************************************************************************************************************************************************
12/11/04 08:51:37 INFO mapred.JobClient: Task Id : attempt_201211031924_0008_m_000001_0, Status : FAILED
Here is my driver class
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class GWASMapReduce extends Configured implements Tool{
/**
* @param args
*/
public static void main(String[] args) throws Exception {
Configuration configuration = new Configuration();
ToolRunner.run(configuration, new GWASMapReduce(), args);
}
@Override
public int run(String[] arg0) throws Exception {
JobConf conf = new JobConf();
conf.setInputFormat(GWASInputFormat.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);
conf.setJarByClass(GWASMapReduce.class);
conf.setMapperClass(GWASMapper.class);
conf.setNumReduceTasks(0);
FileInputFormat.addInputPath(conf, new Path(arg0[0]));
FileOutputFormat.setOutputPath(conf, new Path(arg0[1]));
JobClient.runJob(conf);
return 0;
}
}
Mapper Class
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import com.google.common.base.Strings;
public class GWASMapper extends MapReduceBase implements Mapper<LongWritable, GWASGenotypeBean, Text, Text> {
private static Configuration conf;
@SuppressWarnings("rawtypes")
public void setup(org.apache.hadoop.mapreduce.Mapper.Context context) throws IOException {
conf = context.getConfiguration();
// conf is null here
}
@Override
public void map(LongWritable inputKey, GWASGenotypeBean inputValue, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
// mapper code
}
}
I think you are missing this:
JobClient jobClient = new JobClient();
jobClient.setConf(conf);
JobClient.runJob(conf);
The conf parameter is not passed to the JobClient. Try this and see if it helps.
I also suggest using the new mapreduce library; check the WordCount v2.0 example:
http://hadoop.apache.org/docs/mapreduce/r0.22.0/mapred_tutorial.html#Example%3A+WordCount+v2.0
Also try this: JobConf job = new JobConf(new Configuration());
I think the Configuration object is not initialized here.
Moreover, you don't have anything special in the Configuration object, so you could also initialize a Configuration object inside the mapper; that is not good practice, but it is fine just for trying things out.
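For the old mapred API used here, MapReduceBase provides a configure(JobConf) hook instead of the new-API setup(Context), so the configuration can be picked up there. A minimal sketch, reusing your GWASGenotypeBean value type (JobConf needs an extra import, org.apache.hadoop.mapred.JobConf):
public class GWASMapper extends MapReduceBase implements Mapper<LongWritable, GWASGenotypeBean, Text, Text> {
    private JobConf conf;
    @Override
    public void configure(JobConf job) {
        // called once per task by the old mapred API; job is never null here
        this.conf = job;
    }
    @Override
    public void map(LongWritable inputKey, GWASGenotypeBean inputValue, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        // conf is now usable here, e.g. FileSystem.get(conf) to write to HDFS
    }
}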
This is just a tip for others facing a similar issue: make sure that you set the configuration values first and then create the Job.
For example:
Configuration conf = new Configuration();
conf.set("a","2");
conf.set("inputpath",args[0]);
//Must be set before the below line:
Job myjob = new Job(conf);
Hope this helps.
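A short follow-up sketch (MyMapper is hypothetical) of reading those values back inside a task with the new API; this only works because they were set before the Job was constructed:
public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        String a = conf.get("a");                  // "2", set in the driver
        String inputPath = conf.get("inputpath");  // args[0], set in the driver
    }
}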