I have a very simple "Hello world" style map/reduce job.
public class Tester extends Configured implements Tool {
@Override
public int run(String[] args) throws Exception {
if (args.length != 2) {
System.err.printf("Usage: %s [generic options] <input> <output>\n",
getClass().getSimpleName());
ToolRunner.printGenericCommandUsage(System.err);
return -1;
}
Job job = Job.getInstance(new Configuration());
job.setJarByClass(getClass());
getConf().set("mapreduce.job.queuename", "adhoc");
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(NullWritable.class);
job.setMapperClass(TesterMapper.class);
job.setNumReduceTasks(0);
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new Tester(), args);
System.exit(exitCode);
}
}
This class implements Tool and is run via ToolRunner, but when I run it the generic options are not being parsed.
$hadoop jar target/manifold-mapreduce-0.1.0.jar ga.manifold.mapreduce.Tester -conf conf.xml etl/manifold/pipeline/ABV1T/ingest/input etl/manifold/pipeline/ABV1T/ingest/output
15/02/04 16:35:24 INFO client.RMProxy: Connecting to ResourceManager at lxjh116-pvt.phibred.com/10.56.100.23:8050
15/02/04 16:35:25 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
I can verify that the configuration passed with -conf is not being picked up.
Does anyone know why Hadoop thinks the Tool interface isn't implemented?
$hadoop version
Hadoop 2.4.0.2.1.2.0-402
Hortonworks
Thanks,
Chris
As this question pops up near the top of a Google search for this warning, I'll give a proper answer here:
As user1797538 said (sorry about that):
user1797538: "The problem was the call to get a Job instance"
The superclass Configured must actually be used: as its name suggests, it is already configured, so the Tester class must use that existing Configuration instead of setting up a new, empty one.
If we extract the Job creation into a method:
private Job createJob() throws IOException {
// On this line use getConf() instead of new Configuration()
Job job = Job.getInstance(getConf(), Tester.class.getCanonicalName());
// Other job setter calls go here, for example
job.setJarByClass(Tester.class);
job.setMapperClass(TesterMapper.class);
job.setCombinerClass(TesterReducer.class);
job.setReducerClass(TesterReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
// adapt this to your needs of course.
return job;
}
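For completeness, here is a minimal sketch (my own illustration, not part of the original answer) of a run() method in the Tester class that uses this helper; the key point is that createJob() builds the Job from getConf(), which ToolRunner has already populated from the command-line options:
@Override
public int run(String[] args) throws Exception {
    if (args.length != 2) {
        System.err.printf("Usage: %s [generic options] <input> <output>\n",
                getClass().getSimpleName());
        ToolRunner.printGenericCommandUsage(System.err);
        return -1;
    }
    // createJob() uses getConf(), so -conf and -D options are honoured
    Job job = createJob();
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
}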
Another example from the Javadoc: org.apache.hadoop.util.Tool
And the Javadoc: Configured.getConf()
Related
I have been working on a MapReduce program, and it works well in the Hadoop/HDFS environment in a virtual machine. But when I try the same program on Windows with IntelliJ, I get the error below.
WordCount.class // used as a sample program to test whether it works
public class WordCount {
public static class TokenizerMapper
extends Mapper<Object, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
public static class IntSumReducer
extends Reducer<Text, IntWritable, Text, IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values,
Context context
) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
IntelliJ error log:
2019-12-12 21:42:04,139 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1181)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2019-12-12 21:42:04,144 INFO [main] jvm.JvmMetrics (JvmMetrics.java:init(79)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
2019-12-12 21:42:08,029 WARN [main] mapreduce.JobResourceUploader (JobResourceUploader.java:uploadFiles(64)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2019-12-12 21:42:08,089 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(251)) - Cleaning up the staging area file:/tmp/hadoop/mapred/staging/Abhishek1224360463/.staging/job_local1224360463_0001
Exception in thread "main" 0: No such file or directory
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmod(NativeIO.java:236)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:767)
at org.apache.hadoop.fs.ChecksumFileSystem$1.apply(ChecksumFileSystem.java:506)
at org.apache.hadoop.fs.ChecksumFileSystem$FsOperation.run(ChecksumFileSystem.java:487)
at org.apache.hadoop.fs.ChecksumFileSystem.setPermission(ChecksumFileSystem.java:503)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:619)
at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:94)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:97)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:192)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1359)
at WordCount.main(WordCount.java:59)
I provide the input by passing the directory name as an argument to the main class, i.e. by editing the run configuration and passing the name of the directory that contains the text file (program arguments: input output).
The input directory is under the project root folder.
Running IntelliJ in Administrator mode did the trick. That is weird, though. I would appreciate it if anyone could explain this to me.
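What is probably going on (my assumption, not confirmed in the thread): NativeIO$POSIX.chmod failing on Windows is usually related to the Hadoop native helpers (winutils.exe/hadoop.dll) or to permissions on the local staging directory, which is why elevating the IDE helps. A hedged sketch of an alternative setup when launching from the IDE; the C:/hadoop path is hypothetical and must contain bin/winutils.exe:
// set before the Job is created in main()
System.setProperty("hadoop.home.dir", "C:/hadoop"); // hypothetical winutils location
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "file:///");               // work against the local filesystem
conf.set("mapreduce.framework.name", "local");      // use the local job runner
Job job = Job.getInstance(conf, "word count");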
I was trying to write MapReduce code in Java. Here are my files.
Mapper class (bmapper):
public class bmapper extends Mapper<LongWritable,Text,Text,NullWritable>{
private String txt=new String();
public void mapper(LongWritable key,Text value,Context context)
throws IOException, InterruptedException{
String str =value.toString();
int index1 = str.indexOf("TABLE OF CONTENTS");
int index2 = str.indexOf("</table>");
int index3 = str.indexOf("MANAGEMENT'S DISCUSSION AND ANALYSIS");
if(index1 == -1)
{ txt ="nil";
}
else
{
if(index1<index3 && index2>index3)
{
int index4 = index3+ 109;
int pageno =str.charAt(index4);
String[] pages =str.split("<page>");
txt = pages[pageno+1];
}
else
{
txt ="nil";
}
}
context.write(new Text(txt), NullWritable.get());
}
}
Reducer class (breducer):
public class breducer extends Reducer<Text,NullWritable,Text,NullWritable>{
public void reducer(Text key,NullWritable value,Context context) throws IOException,InterruptedException{
context.write(key, value);
}
}
Driver class (bdriver):
public class bdriver {
public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
Configuration conf = new Configuration();
Job job = new Job(conf);
job.setJobName("black coffer");
job.setJarByClass(bdriver.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(NullWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(NullWritable.class);
job.setReducerClass(breducer.class);
job.setMapperClass(bmapper.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, new Path[]{new Path(args[0])});
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
}
}
I am getting the following error.
[training#localhost ~]$ hadoop jar blackcoffer.jar com.test.bdriver /page1.txt /MROUT4
18/03/16 04:38:56 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
18/03/16 04:38:57 INFO input.FileInputFormat: Total input paths to process : 1
18/03/16 04:38:57 WARN snappy.LoadSnappy: Snappy native library is available
18/03/16 04:38:57 INFO util.NativeCodeLoader: Loaded the native-hadoop library
18/03/16 04:38:57 INFO snappy.LoadSnappy: Snappy native library loaded
18/03/16 04:38:57 INFO mapred.JobClient: Running job: job_201803151041_0007
18/03/16 04:38:58 INFO mapred.JobClient: map 0% reduce 0%
18/03/16 04:39:03 INFO mapred.JobClient: Task Id : attempt_201803151041_0007_m_000000_0, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:871)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:574)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
I think it is not able to find my Mapper and Reducer classes. Even though I set them in the driver class, it seems to be falling back to the default Mapper and Reducer.
Your input/output types seem compatible with the job configuration.
Adding the issue detail and resolution here (as per the discussion in the comments, the OP confirmed that this resolved the issue).
As per the Javadoc, the reducer's reduce() method has the following signature:
protected void reduce(KEYIN key,
Iterable<VALUEIN> values,
org.apache.hadoop.mapreduce.Reducer.Context context)
throws IOException,
InterruptedException
According to it, the reducer should be:
public class breducer extends Reducer<Text,NullWritable,Text,NullWritable>{
@Override
public void reduce(Text key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
// Your logic
}
}
The issue was that, because of the slight difference between the declared methods and the map()/reduce() signatures (they were named mapper() and reducer()), the framework's methods were never actually overridden; the default implementations ran instead.
The issue was caught after putting the @Override annotation on the map() and reduce() methods. Although it is not mandatory, as a best practice always add the @Override annotation to your map() and reduce() implementations.
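For the same reason, the mapper should override map() rather than declaring a separate mapper() method. A minimal corrected sketch, keeping the OP's class name and types, with the extraction logic elided:
public class bmapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... the OP's page-extraction logic goes here ...
        context.write(new Text("nil"), NullWritable.get());
    }
}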
I am a new Hadoop user. My program is meant to skip bad records in MapReduce. I have not implemented the skipping yet; first I want to find out which error occurs, so I added myCustomRunJob() to understand why I cannot skip bad records, and for now I have deleted the skip-related lines. I have a problem when running this program, even though I have already set the output file path:
import java.io.IOException;
import org.apache.hadoop.conf.* ;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.* ;
import org.apache.hadoop.mapred.* ;
import org.apache.hadoop.mapred.lib.* ;
public class SkipData
{
public static class MapClass extends MapReduceBase implements Mapper<LongWritable, Text, Text, LongWritable>
{
private final static LongWritable one = new LongWritable(1);
private Text word = new Text("totalcount");
public void map(LongWritable key, Text value, OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException
{
String line = value.toString();
if (line.equals("skiptext"))
throw new RuntimeException("Found skiptext") ;
output.collect(word, one);
}
}
public static RunningJob myCustomRunJob(JobConf job) throws Exception {
JobClient jc = new JobClient(job);
RunningJob rj = jc.submitJob(job);
if (!jc.monitorAndPrintJob(job, rj)) {
throw new IOException("Job failed with info: " + rj.getFailureInfo());
}
return rj;
}
public static void main(String[] args) throws Exception
{
System.setProperty("hadoop.home.dir", "/");
Configuration config = new Configuration() ;
JobConf conf = new JobConf(config, SkipData.class);
RunningJob result=myCustomRunJob(conf);
conf.setJobName("SkipData");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(LongWritable.class);
conf.setMapperClass(MapClass.class);
conf.setCombinerClass(LongSumReducer.class);
conf.setReducerClass(LongSumReducer.class);
FileInputFormat.setInputPaths(conf,args[0]) ;
FileOutputFormat.setOutputPath(conf, new Path(args[1])) ;
JobClient.runJob(conf);
}
}
I have been trying to resolve this error many times. I am using the old API. How can I solve this?
18/02/28 11:05:28 DEBUG security.UserGroupInformation: PrivilegedActionException as:saung (auth:SIMPLE) cause:org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set in JobConf.
18/02/28 11:05:28 DEBUG security.UserGroupInformation: PrivilegedActionException as:saung (auth:SIMPLE) cause:org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set in JobConf.
Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set in JobConf.
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:117)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:268)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
at mapredpack.SkipData.myCustomRunJob(SkipData.java:90)
at mapredpack.SkipData.main(SkipData.java:140)
You're trying to run the job twice - by calling
RunningJob result=myCustomRunJob(conf);
so early on, your job will fail because none of the configuration has been set at that stage. I would remove that line (and the myCustomRunJob(JobConf job) method). The JobClient.runJob(conf) call at the very bottom will deal with running the job.
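In other words (a sketch of that suggestion, not code from the question), main() keeps all the existing conf.set*() and FileInputFormat/FileOutputFormat calls and ends with the single submission:
// ... all the JobConf setters and input/output path calls stay as they are ...
JobClient.runJob(conf);  // submits the fully configured job, monitors it and prints progress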
There are two issues in the code.
1. You are calling the job the first time without setting any input/output path.
2. You are also trying to resubmit the job, which is bound to fail (because each MR job needs a new output directory).
Change your main method like this:
public static void main(String[] args) throws Exception
{
System.setProperty("hadoop.home.dir", "/");
Configuration config = new Configuration() ;
JobConf conf = new JobConf(config, SkipData.class);
conf.setJobName("SkipData");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(LongWritable.class);
conf.setMapperClass(MapClass.class);
conf.setCombinerClass(LongSumReducer.class);
conf.setReducerClass(LongSumReducer.class);
FileInputFormat.setInputPaths(conf,args[0]) ;
FileOutputFormat.setOutputPath(conf, new Path(args[1])) ;
RunningJob result=myCustomRunJob(conf);
}
I was trying to do a simple sort example with TotalOrderPartitioner. The input is a sequence file with IntWritable as key and NullWritable as value, and I want to sort based on the key. The output is a sequence file with IntWritable as key and NullWritable as value. I'm running this job in a clustered environment. This is my driver class:
public class SortDriver extends Configured implements Tool {
@Override
public int run(String[] args) throws Exception {
Configuration conf = this.getConf();
Job job = Job.getInstance(conf);
job.setJobName("SORT-WITH-TOTAL-ORDER-PARTITIONER");
job.setJarByClass(SortDriver.class);
job.setInputFormatClass(SequenceFileInputFormat.class);
SequenceFileInputFormat.setInputPaths(job, new Path("/user/client/seq-input"));
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(NullWritable.class);
job.setMapperClass(SortMapper.class);
job.setReducerClass(SortReducer.class);
job.setPartitionerClass(TotalOrderPartitioner.class);
TotalOrderPartitioner.setPartitionFile(conf, new Path("/user/client/partition.lst"));
job.setOutputFormatClass(SequenceFileOutputFormat.class);
SequenceFileOutputFormat.setCompressOutput(job, true);
SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);
SequenceFileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);
SequenceFileOutputFormat.setOutputPath(job, new Path("/user/client/sorted-output"));
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(NullWritable.class);
job.setNumReduceTasks(3);
InputSampler.Sampler<IntWritable, NullWritable> sampler = new InputSampler.RandomSampler<>(0.1, 200);
InputSampler.writePartitionFile(job, sampler);
boolean res = job.waitForCompletion(true);
return res ? 0 : 1;
}
public static void main(String[] args) throws Exception {
System.exit(ToolRunner.run(new Configuration(), new SortDriver(), args));
}
}
Mapper class:
public class SortMapper extends Mapper<IntWritable, NullWritable, IntWritable, NullWritable>{
@Override
protected void map(IntWritable key, NullWritable value, Context context) throws IOException, InterruptedException {
context.write(key, value);
}
}
Reducer class:
public class SortReducer extends Reducer<IntWritable, NullWritable, IntWritable, NullWritable> {
@Override
protected void reduce(IntWritable key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
context.write(key, NullWritable.get());
}
}
When I run this job I get:
Error: java.lang.IllegalArgumentException: Can't read partitions file
at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:678)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.FileNotFoundException: File file:/grid/hadoop/yarn/local/usercache/client/appcache/application_1406784047304_0002/container_1406784047304_0002_01_000003/_partition.lst does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:397)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1749)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:301)
at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:88)
... 10 more
I found the partition file in my home directory (/user/client) with the name _partition.lst. That name does not match the one in the code: TotalOrderPartitioner.setPartitionFile(conf, new Path("/user/client/partition.lst"));. Can anyone help me with this problem? I'm using Hadoop 2.4 in the HDP 2.1 distribution.
I think the problem is in the line:
TotalOrderPartitioner.setPartitionFile(conf, new Path("/user/client/partition.lst"));
You have to replace it with:
TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), new Path("/user/client/partition.lst"));
since you are using
InputSampler.writePartitionFile(job, sampler);
Otherwise, try replacing the last line only with:
InputSampler.writePartitionFile(conf, sampler);
But I am not sure if it works like that in the new API.
Hope it helps! Good luck!
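Putting the suggested fix together, the relevant part of run() would look roughly like this (my own sketch; the ordering matters because Job.getInstance(conf) copies the Configuration, so settings made on conf afterwards are not seen by the job):
Job job = Job.getInstance(getConf());
job.setPartitionerClass(TotalOrderPartitioner.class);
// set the partition file on the job's own copy of the configuration
TotalOrderPartitioner.setPartitionFile(job.getConfiguration(),
        new Path("/user/client/partition.lst"));
InputSampler.Sampler<IntWritable, NullWritable> sampler =
        new InputSampler.RandomSampler<>(0.1, 200);
InputSampler.writePartitionFile(job, sampler);  // now writes to the path set above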
I also ran into this error when using Hadoop MapReduce where the MapReduce service had not been installed and started. After installing and starting MapReduce, the exception disappeared.
I got this error when I had job.setNumReduceTasks(3); and was running my code in standalone mode.
Changing it to job.setNumReduceTasks(1); made it work fine in standalone mode.
I'm using Hadoop 0.20.203.0. I want to output to two different files, so I'm trying to get MultipleOutputs working.
Here's my configuration method:
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
System.err.println("Usage: indycascade <in> <out>");
System.exit(2);
}
Job job = new Job(conf, "indy cascade");
job.setJarByClass(IndyCascade.class);
job.setMapperClass(ICMapper.class);
job.setCombinerClass(ICReducer.class);
job.setReducerClass(ICReducer.class);
TextInputFormat.addInputPath(job, new Path(otherArgs[0]));
TextOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
MultipleOutputs.addNamedOutput(conf, "sql", TextOutputFormat.class, LongWritable.class, Text.class);
job.waitForCompletion(true);
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
However, this won't compile. The offending line is MultipleOutputs.addNamedOutput(...), which throws a "cannot find symbol" error.
isaac/me/saac/i/IndyCascade.java:94: cannot find symbol
symbol : method addNamedOutput(org.apache.hadoop.conf.Configuration,java.lang.String,java.lang.Class<org.apache.hadoop.mapreduce.lib.output.TextOutputFormat>,java.lang.Class<org.apache.hadoop.io.LongWritable>,java.lang.Class<org.apache.hadoop.io.Text>)
location: class org.apache.hadoop.mapred.lib.MultipleOutputs
MultipleOutputs.addNamedOutput(conf, "sql", TextOutputFormat.class, LongWritable.class, Text.class);
Of course, I tried using a JobConf instead of Configuration, as the API demands, but that leads to the same error. Additionally, JobConf is deprecated.
How do I get MultipleOutputs to work? Is that even the correct class to use?
You're mixing old and new API types:
You're using the old API org.apache.hadoop.mapred.lib.MultipleOutputs:
location: class org.apache.hadoop.mapred.lib.MultipleOutputs
With the new API org.apache.hadoop.mapreduce.lib.output.TextOutputFormat:
symbol : method addNamedOutput(org.apache.hadoop.conf.Configuration,java.lang.String,java.lang.Class<org.apache.hadoop.mapreduce.lib.output.TextOutputFormat>,java.lang.Class<org.apache.hadoop.io.LongWritable>,java.lang.Class<org.apache.hadoop.io.Text>)
Make the APIs consistent and you should be OK.
Edit: In fact, 0.20.203 doesn't have a port of MultipleOutputs for the new API, so you'll have to use the old API, find a new-API port online (Cloudera's 0.20.2+320, for example), or port it yourself.
Also, you should look at the ToolRunner class for executing your jobs; it removes the need to call GenericOptionsParser explicitly:
public static class Driver extends Configured implements Tool {
public static void main(String[] args) throws Exception {
System.exit(ToolRunner.run(new Driver(), args));
}
public int run(String[] args) throws Exception {
if (args.length != 2) {
System.err.println("Usage: indycascade <in> <out>");
System.exit(2);
}
Job job = new Job(getConf());
Configuration conf = job.getConfiguration();
// insert other job set up here
return job.waitForCompletion(true) ? 0 : 1;
}
}
Final point: any reference to conf after you create the Job instance will be to the original conf. Job makes a deep copy of the conf object, so calling MultipleOutputs.addNamedOutput(conf, ...) will not have the desired effect; use MultipleOutputs.addNamedOutput(job.getConfiguration(), ...) instead. See my example code above for the correct way to do this.
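For reference, a hedged sketch of what the old-API-consistent calls could look like inside run() (using org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapred.TextOutputFormat and org.apache.hadoop.mapred.lib.MultipleOutputs; not tested against 0.20.203 specifically):
JobConf conf = new JobConf(getConf(), IndyCascade.class);
conf.setJobName("indy cascade");
// old-API MultipleOutputs takes a JobConf and an org.apache.hadoop.mapred output format
MultipleOutputs.addNamedOutput(conf, "sql", TextOutputFormat.class,
        LongWritable.class, Text.class);
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
// note: the mapper/combiner/reducer classes would also have to implement the old
// org.apache.hadoop.mapred interfaces for a fully old-API job
JobClient.runJob(conf);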