I was reading and implementing this tutorial. At the last, I implement the three classes- Mapper, Reducer and driver. I copied the exact code given on the webpage for all three classes. But following two errors didn't go away:-
Mapper Class
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
public class WordCountMapper extends MapReduceBase // Here WordCountMapper was underlined as error source by Eclipse
implements Mapper<LongWritable, Text, Text, IntWritable> {
private final IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(WritableComparable key, Writable value,
OutputCollector output, Reporter reporter) throws IOException {
String line = value.toString();
StringTokenizer itr = new StringTokenizer(line.toLowerCase());
while(itr.hasMoreTokens()) {
word.set(itr.nextToken());
output.collect(word, one);
}
}
}
The error was:
The type WordCountMapper must implement the inherited abstract method
Mapper.map(LongWritable, Text,
OutputCollector, Reporter)
Driver Class (WordCount.java)
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
public class WordCount {
public static void main(String[] args) {
JobClient client = new JobClient();
JobConf conf = new JobConf(WordCount.class);
// specify output types
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
// specify input and output dirs
FileInputPath.addInputPath(conf, new Path("input")); //////////FileInputPath was underlined
FileOutputPath.addOutputPath(conf, new Path("output")); ////////FileOutputPath as underlined
// specify a mapper
conf.setMapperClass(WordCountMapper.class);
// specify a reducer
conf.setReducerClass(WordCountReducer.class);
conf.setCombinerClass(WordCountReducer.class);
client.setConf(conf);
try {
JobClient.runJob(conf);
} catch (Exception e) {
e.printStackTrace();
}
}
}
The error was:
FileInputPath cannot be resolved
FileOutputPath cannot be resolved
Use this
FileInputFormat.addInputPath(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf,new Path(args[1]));
or
FileInputFormat.addInputPath(conf, new Path("inputfile.txt"));
FileOutputFormat.setOutputPath(conf,new Path("outputfile.txt"));
instead of this
// specify input and output dirs
FileInputPath.addInputPath(conf, new Path("input")); //////////FileInputPath was underlined
FileOutputPath.addOutputPath(conf, new Path("output")); ////////FileOutputPath as underlined
This should be the one to import in this case
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
Related
I am having unique requirement where i have to pass the zip shell command from text file and mapper will process the script that will create zip files in parallel fashion using mapper only. I am thinking to execute shell command using exec in java. I am bit stuck on how to implement the custom mapper as my output would be compressed format.
Below is my mapper class -
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class Map extends Mapper<LongWritable, Text, Text, NullWritable>{
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException{
String line= value.toString();
StringTokenizer tokenizer= new StringTokenizer(line);
while(tokenizer.hasMoreTokens()){
value.set(tokenizer.nextToken());
context.write(value,NullWritable.get());
}
}
}
Processor class -
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
public class ZipProcessor extends Configured implements Tool {
public static void main(String [] args) throws Exception{
int exitCode = ToolRunner.run(new ZipProcessor(), args);
System.exit(exitCode);
}
public int run(String[] args) throws Exception {
if(args.length!=2){
System.err.printf("Usage: %s needs two arguments, input and output files\n", getClass().getSimpleName());
return -1;
}
Configuration conf=new Configuration();
Job job = Job.getInstance(conf,"zipping");
job.setJarByClass(ZipProcessor.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(NullWritable.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setMapperClass(Map.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
int returnValue = job.waitForCompletion(true) ? 0:1;
if(job.isSuccessful()) {
System.out.println("Job was successful");
} else if(!job.isSuccessful()) {
System.out.println("Job was not successful");
}
return returnValue;
}
}
Sample mapr.txt
zip -r "/folder1/file.zip" "sourceFolder"
zip -r "/folder2/file.zip" "sourceFolder"
zip -r "/folder3/file.zip" "sourceFolder"
I'm a newbie to Hadoop Programming and I have started learning by setting up Hadoop 2.7.1 on a three node cluster. I have tried running helloworld jars that comes out of the box in Hadoop and it ran fine with success but I wrote my own driver code in my local machine and bundled it into a jar and executed it this way but it fails with NO error messages.
Here is my code and this is what I did.
WordCountMapper.java
package mot.com.bin.test;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
public class WordCountMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>
{
public void map(LongWritable key, Text Value,
OutputCollector<Text, IntWritable> opc, Reporter r)
throws IOException {
String s = Value.toString();
for (String word :s.split(" ")) {
if( word.length() > 0) {
opc.collect(new Text(word), new IntWritable(1));
}
}
}
}
WordCountReduce.java
package mot.com.bin.test;
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
public class WordCountReduce extends MapReduceBase implements Reducer < Text, IntWritable, Text, IntWritable>{
public void reduce(Text key, Iterator<IntWritable> values,
OutputCollector<Text, IntWritable> opc, Reporter r)
throws IOException {
// TODO Auto-generated method stub
int i = 0;
while (values.hasNext()) {
IntWritable in = values.next();
i+=in.get();
}
opc.collect(key, new IntWritable (i));
}
WordCount.java
/**
* **DRIVER**
*/
package mot.com.bin.test;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.io.Text;
//import com.sun.jersey.core.impl.provider.entity.XMLJAXBElementProvider.Text;
/**
* #author rgb764
*
*/
public class WordCount extends Configured implements Tool{
/**
* #param args
*/
public static void main(String[] args) {
// TODO Auto-generated method stub
}
public int run(String[] arg0) throws Exception {
if (arg0.length < 2) {
System.out.println("Need input file and output directory");
return -1;
}
JobConf conf = new JobConf();
FileInputFormat.setInputPaths(conf, new Path( arg0[0]));
FileOutputFormat.setOutputPath(conf, new Path( arg0[1]));
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(WordCountMapper.class);
conf.setReducerClass(WordCountReduce.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
JobClient.runJob(conf);
return 0;
}
}
First I tried extracting it as a jar from eclipse and run it in my hadoop cluster. No errors yet no success as well. Then moved my individual java files to my NameNode and compiled each java files and then created the jar file there, still hadoop command returns no results but no errors as well. Kindly help me on this.
hadoop jar WordCout.jar mot.com.bin.test.WordCount /karthik/mytext.txt /tempo
Extracted all dependent jar files using Maven and I added them into the classpath in my name node. Help me figure what and where am I going wrong.
IMO you are missing the code in your main method which instantiate the Tool implementation ( WordCount in your case) and runs the same.
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new WordCount(), args);
System.exit(res);
}
Refer this.
Im trying to build a simple Wordcount Hadoop project(https://developer.yahoo.com/hadoop/tutorial/module3.html#running) but when I click "Run on Hadoop" there is no action at all...Infact nothing is displayed in the console.
Here is my project structure -
Here is my wordcount job file...
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
public class WordCount {
public static void main(String[] args) {
Configuration config = new Configuration();
config.addResource(new Path("/HADOOP_HOME/conf/hadoop-default.xml"));
config.addResource(new Path("/HADOOP_HOME/conf/hadoop-site.xml"));
JobClient client = new JobClient();
JobConf conf = new JobConf(WordCount.class);
// specify output types
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
// specify input and output dirs
FileInputPath.addInputPath(conf, new Path("input"));
FileOutputPath.addOutputPath(conf, new Path("output"));
// specify a mapper
conf.setMapperClass(WordCountMapper.class);
// specify a reducer
conf.setReducerClass(WordCountReducer.class);
conf.setCombinerClass(WordCountReducer.class);
client.setConf(conf);
try {
JobClient.runJob(conf);
} catch (Exception e) {
e.printStackTrace();
}
}
I think the problem is with the jar file you used for hadoop client to server, So What happens is it tries to stay in the following line and try to search for the server
Configuration config = new Configuration();
Try to debug and let us know if you face any more problem,
If not try the following
Have you tried running the program on eclipse by pointing the core-site , hdfs-site
Configuration.addResource(new Path("path-to-your-core-site.xml file"));
Configuration.addResource(new Path("path-to-your-hdfs-site.xml file"));
and
FileInputPath.addInputPath(hdfs path to your input file);
FileInputPath.addOutputPath(hdfs path to your output file);
See that It works and get back to us
try this,
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
public class WordCount {
public static class Map extends MapReduceBase implements
Mapper<Object, Text, Text, IntWritable> {
#Override
public void map(Object key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
throws IOException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
System.out.println(line);
while (tokenizer.hasMoreTokens()) {
value.set(tokenizer.nextToken());
output.collect(value, new IntWritable(1));
}
}
}
public static class Reduce extends MapReduceBase implements
Reducer<Text, IntWritable, Text, IntWritable> {
#Override
public void reduce(Text key, Iterator<IntWritable> values,
OutputCollector<Text, IntWritable> output, Reporter reporter)
throws IOException {
int sum = 0;
while (values.hasNext()) {
sum += values.next().get();
}
output.collect(key, new IntWritable(sum));
}
}
public static void main(String[] args) throws Exception,IOException {
JobConf conf = new JobConf(WordCount.class);
conf.setJobName("WordCount");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(Map.class);
conf.setReducerClass(Reduce.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path("/home/user17/test.txt"));
FileOutputFormat.setOutputPath(conf, new Path("hdfs://localhost:9000/out2"));
JobClient.runJob(conf);
}
}
I had the exact same problem, and I have just figured it out.
Add parameters in the run configurations.
Right Click WordCount.java > Run As > Run Configurations > Java Application > Word Count > Arguments
Enter this hdfs://hadoop:9000/ hdfs://hadoop:9000/
Apply and Finish Run again.
After running, refresh the project and the result is in the output folder.
In new API (apache.hadoop.mapreduce.KeyValueTextInputFormat) , how to specify separator (delimiter) other than tab(which is default) to separate key and Value.
Sample Input :
106298345|Surender,Raja,CTS,50000,Chennai
106297845|Murali,Bala,TCS,60000,Chennai
106291271|Rajagopal,Ravi,CTS,50000,Chennai
106298616|Vikram,Darma,TCS,70000,Chennai
106299100|Kumar,Selvam,TCS,90000,Chennai
106299288|Sandeep,Krishna,CTS,10000,Chennai
106290071|Vimal,Pillai,TCS,20000,Chennai
I am specifying KeyValueTextInputFormat as :
Configuration conf = new Configuration();
conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", "|");
Job myhadoopJob = new Job(conf);
my mapper code is below
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Mapper.Context;
public class KeyValueMapper extends Mapper<Text, Text, Text, Text>
{
#Override
protected void map(Text key, Text value, Context context)throws IOException, InterruptedException {
String mapOutPutValue="";
String line = value.toString();
String[] details=line.split(",");
for(int i=0;i<details.length;i++)
{
if(details[i].equalsIgnoreCase("TCS"))
{
mapOutPutValue=line;
}
}if(mapOutPutValue!="")context.write(key, new Text(mapOutPutValue)); }
}
but my mapper class is printing all the output in my inputfile.My mapper class is not filtering the input as per logic in map method..
Can Someone help me
Please try the below option in driver code.
conf.set("key.value.separator.in.input.line", "|");
I want to write my own word count example using MapReduce and hadoop v. 1.0.3 (I'm on MacOS) but i don't understand why it doesn't work
Sharing my code :
main:
package org.myorg;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
public class WordCount {
public static void main(String[] args) throws Exception {
JobConf conf = new JobConf(WordCount.class);
// set job name, mapper, combiner, and reducer classes
conf.setJobName("WordCount");
// set input and output formats
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
// set input and output paths
//FileInputFormat. setInputPaths(conf, new Path(input));
//FileOutputFormat.setOutputPath(conf, new Path(output));
FileOutputFormat.setCompressOutput(conf, false);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(org.myorg.Map.class);
conf.setReducerClass(org.myorg.Reduce.class);
String host = args[0];
String input = host + "/" + args[1];
String output = host + "/" + args[2];
// set input and output paths
FileInputFormat.addInputPath(conf, new Path(input));
FileOutputFormat.setOutputPath(conf, new Path(output));
JobClient j=new JobClient(conf);
(j.submitJob(conf)).waitForCompletion();
}
}
Mapper:
package org.myorg;
import java.io.IOException;
import java.util.HashMap;
import java.util.StringTokenizer;
import java.util.TreeMap;
import java.util.Vector;
import java.util.Map.Entry;
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapreduce.Mapper.Context;
public class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
#Override
public void map(LongWritable key, Text value,
OutputCollector<Text, IntWritable> output, Reporter reporter)
throws IOException {
MapWritable hs = new MapWritable();
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
//hs.put(word, one);
output.collect(word,one);
}
// TODO Auto-generated method stub
}
}
Reducer:
package org.myorg;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.math.RoundingMode;
import java.net.URI;
import java.net.URISyntaxException;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;
import java.util.Map.Entry;
import java.util.Vector;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapreduce.Reducer.Context;
//public class Reduce extends MapReduceBase implements Reducer<Text, MapWritable, Text, Text> {
public class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, Text> {
public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
int sum = 0;
while (values.hasNext()) {
sum += values.next().get();
}
String host = "hdfs://localhost:54310/";
String tmp = host + "Temporany/output.txt";
FileSystem srcFS;
try {
srcFS = FileSystem.get(new URI(tmp), new JobConf());
srcFS.delete(new Path(tmp), true);
BufferedWriter wr = new BufferedWriter(new OutputStreamWriter(
srcFS.create(new Path(tmp))));
wr.write(key.toString() + ":" + sum);
wr.close();
srcFS.close();
} catch (URISyntaxException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
// context.write(key, new IntWritable(sum));
}
}
The Job started and end with no error but didn't write the output file.
I launch the jar with Hadoop with this command:
./Hadoop jar /Users/User/Desktop/hadoop/wordcount.jar hdfs://localhost:54310 /In/testo.txt /Out/wordcount17
This is the output:
2014-03-03 17:56:22.063 java[6365:1203] Unable to load realm info from SCDynamicStore
14/03/03 17:56:22 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/03/03 17:56:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/03/03 17:56:23 WARN snappy.LoadSnappy: Snappy native library not loaded
14/03/03 17:56:23 INFO mapred.FileInputFormat: Total input paths to process : 1
I suppose the problem is "unable to load native-hadoop library" but works fine for other Jar's.
Q : The Job start and end with no error but don't write the output-file ??
Ans : I am not sure that job successfully ends w/o error .
Problems :
Job Configuration:
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
The setOutputKeyClass() and setOutputValueClass() methods control the
output types for the map and the reduce functions, which are often the
same. If they are different, then the map
output types can be set using the methods setMapOutputKeyClass() and
setMapOutputValueClass().
The Output class in your case is :
Map key : Text
Map Value : IntWritable
Reduce key : Text
Reduce Value : Text
Which will result in Type mismatch exception
Reduce
I am not sure why you are using hdfs API to write your output to a file ?.
Should use output.collect(key,value).
In case of multiple reducer are you handling the simultaneous write operation?
And I wonder what context.write is doing in old apis (it's commented )? .
You can use following for more information
Debug your map-reduce Job :
Counters
Jobtracker web interface
Q. Difference b/w SubmitJob() && waitForCompletion() ?
Ans : SubmitJob() : submits the job and ends .
waitForCompletion() : submits the job and prints the status of the job on console .
So waitForCompletion() is SubmitJob()+status update of job until complete.
word count
Please read
Map Reduce Apache
You can also find hadoop-examples-X.X.X.jar in your installation folder.
Go through $HADOOP_HOME/src/expalmes/ for source code .
**$HADOOP_HOME = hadoop installation folder