I am new to Java/Hadoop and am currently trying to reproduce the results of some code I found here:
https://sunilmistri.wordpress.com/2015/02/13/mapreduce-example-for-minimum-and-maximum-value-by-group-key/
In contrast to the example, I only have 3 "columns", each containing an integer, delimited by a tab (/t), e.g. 100 115 3
The first two numbers are nodes, the third is the weight between the nodes. Now I am trying to find the minimum and maximum weight for the first node.
The only things I have changed in the code are the delimiter (/t) and that I put both classes into one file (which, from what I have read, should be fine).
Now I get the error "java.lang.Object cannot be converted to MinMaxDuration" on the line indicated below.
I found similar questions like "incompatible types: java.lang.Object cannot be converted to T",
but they didn't really help me. How can I fix this?
Thanks in advance.
import java.io.IOException;
import java.io.DataInput;
import java.io.DataOutput;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class Q1{
public static class DurationMapper
extends Mapper<Object, Text, Text, MinMaxDuration>{
private Text month= new Text();
private Integer minduration;
private Integer maxduration;
private MinMaxDuration outPut= new MinMaxDuration();
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
String[] campaignFields= value.toString().split("/t");
//10001,'telephone','may','mon',100,1,999,0,'nonexistent','no',0
month.set(campaignFields[0]);
minduration=Integer.parseInt(campaignFields[2]);
maxduration=Integer.parseInt(campaignFields[2]);
if (month == null || minduration == null || maxduration== null) {
return;
}
try {
outPut.setMinDuration(minduration);
outPut.setMaxDuration(maxduration);
context.write(month,outPut);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
public static class DurationReducer
extends Reducer<Text,MinMaxDuration,Text,MinMaxDuration> {
private MinMaxDuration resultRow = new MinMaxDuration();
public void reduce(Text key, Iterable values,
Context context
) throws IOException, InterruptedException {
Integer minduration = 0;
Integer maxduration = 0;
resultRow.setMinDuration(null);
resultRow.setMaxDuration(null);
for (MinMaxDuration val : values) { //ERROR HERE
minduration = val.getMinDuration();
maxduration = val.getMaxDuration();
// get min score
if (resultRow.getMinDuration()==null || minduration.compareTo(resultRow.getMinDuration())<0) {
resultRow.setMinDuration(minduration);
}
// get min bonus
if (resultRow.getMaxDuration()==null || maxduration.compareTo(resultRow.getMaxDuration())>0) {
resultRow.setMaxDuration(maxduration);
}
} // end of for loop
context.write(key, resultRow);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "Campaign Duration Count");
job.setJarByClass(CampaignMinMax.class);
job.setMapperClass(DurationMapper.class);
job.setCombinerClass(DurationReducer.class);
job.setReducerClass(DurationReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(MinMaxDuration.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
class MinMaxDuration implements Writable {
// declare variables
Integer minDuration;
Integer maxDuration;
// constructor
public MinMaxDuration() {
minDuration=0;
maxDuration=0;
}
//set method
void setMinDuration(Integer duration){
this.minDuration=duration;
}
void setMaxDuration(Integer duration){
this.maxDuration=duration;
}
//get method
Integer getMinDuration() {
return minDuration;
}
Integer getMaxDuration(){
return maxDuration;
}
// write method
public void write(DataOutput out) throws IOException {
// what order we want to write !
out.writeInt(minDuration);
out.writeInt(maxDuration);
}
// readFields Method
public void readFields(DataInput in) throws IOException {
minDuration=new Integer(in.readInt());
maxDuration=new Integer(in.readInt());
}
public String toString() {
return minDuration + "\t" + maxDuration;
}
}
You are missing the type parameter in the method signature, so the elements of values are treated as plain Object instead of your concrete type. Change the method signature from
public void reduce(Text key, Iterable values,Context context) throws IOException, InterruptedException
to
public void reduce(Text key, Iterable<MinMaxDuration> values,
Context context
) throws IOException, InterruptedException
That way you specify the proper type.
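For reference, the complete reducer with the type parameter in place would look roughly like this (an untested sketch that keeps the logic of the posted code; the @Override annotation is optional but lets the compiler catch this kind of signature mismatch):
public static class DurationReducer
        extends Reducer<Text, MinMaxDuration, Text, MinMaxDuration> {

    private MinMaxDuration resultRow = new MinMaxDuration();

    @Override
    public void reduce(Text key, Iterable<MinMaxDuration> values, Context context)
            throws IOException, InterruptedException {
        resultRow.setMinDuration(null);
        resultRow.setMaxDuration(null);
        for (MinMaxDuration val : values) {
            Integer minduration = val.getMinDuration();
            Integer maxduration = val.getMaxDuration();
            // keep the smallest minimum seen so far
            if (resultRow.getMinDuration() == null || minduration.compareTo(resultRow.getMinDuration()) < 0) {
                resultRow.setMinDuration(minduration);
            }
            // keep the largest maximum seen so far
            if (resultRow.getMaxDuration() == null || maxduration.compareTo(resultRow.getMaxDuration()) > 0) {
                resultRow.setMaxDuration(maxduration);
            }
        }
        context.write(key, resultRow);
    }
}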
Related
I am new to MapReduce and Hadoop (Hadoop 3.2.3 and Java 8).
I am trying to separate some lines based on a symbol in each line.
Example: "q1,a,q0," should return ('a',"q1,a,q0,") as (key, value).
My dataset contains ten (10) records: five (5) for key 'a' and five for key 'b'.
I expect to get five lines for each key, but I always get five for 'a' and ten for 'b'.
Data
A,q0,a,q1;A,q0,b,q0;A,q1,a,q1;A,q1,b,q2;A,q2,a,q1;A,q2,b,q0;B,s0,a,s0;B,s0,b,s1;B,s1,a,s1;B,s1,b,s0
Mapper class:
import java.io.IOException;
import org.apache.hadoop.io.ByteWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class MyMapper extends Mapper<LongWritable, Text, ByteWritable ,Text>{
private ByteWritable key1 = new ByteWritable();
//private int n ;
private int count =0 ;
private Text wordObject = new Text();
@Override
public void map(LongWritable key, Text value, Context context)throws IOException, InterruptedException {
String ftext = value.toString();
for (String line: ftext.split(";")) {
wordObject = new Text();
if (line.split(",")[2].equals("b")) {
key1.set((byte) 'b');
wordObject.set(line) ;
context.write(key1,wordObject);
continue ;
}
key1.set((byte) 'a');
wordObject.set(line) ;
context.write(key1,wordObject);
}
}
}
Reducer class:
import java.io.IOException;
import org.apache.hadoop.io.ByteWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Reducer.Context;
public class MyReducer extends Reducer<ByteWritable, Text, ByteWritable ,Text>{
private Integer count=0 ;
@Override
public void reduce(ByteWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
for(Text val : values ) {
count++ ;
}
Text symb = new Text(count.toString()) ;
context.write(key , symb);
}
}
Driver class:
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.ByteWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class MyDriver extends Configured implements Tool {
public int run(String[] args) throws Exception {
if (args.length != 2) {
System.out.printf("Usage: %s [generic options] <inputdir> <outputdir>\n", getClass().getSimpleName());
return -1;
}
@SuppressWarnings("deprecation")
Job job = new Job(getConf());
job.setJarByClass(MyDriver.class);
job.setJobName("separation ");
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setMapperClass(MyMapper.class);
job.setReducerClass(MyReducer.class);
job.setMapOutputKeyClass(ByteWritable.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(ByteWritable.class);
job.setOutputValueClass(Text.class);
boolean success = job.waitForCompletion(true);
return success ? 0 : 1;
}
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new Configuration(), new MyDriver(), args);
System.exit(exitCode);
}
}
The problem was solved by declaring the variable "count" inside the reduce() function instead of as an instance field.
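In other words, roughly this (an untested sketch of the fix): the counter must be local to each reduce() call, otherwise the count accumulated for key 'a' carries over into the count for key 'b'.
@Override
public void reduce(ByteWritable key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    int count = 0;  // reset for every key instead of being an instance field
    for (Text val : values) {
        count++;
    }
    context.write(key, new Text(Integer.toString(count)));
}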
Does your input contain more than the one line shown, with five more 'b' records in it? I cannot reproduce the problem with that single line, but your code can be cleaned up.
For the following code, I get output as
a 5
b 5
static class Mapper extends org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, ByteWritable, Text> {
final ByteWritable keyOut = new ByteWritable();
final Text valueOut = new Text();
@Override
protected void map(LongWritable key, Text value, org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, ByteWritable, Text>.Context context) throws IOException, InterruptedException {
String line = value.toString();
if (line.isEmpty()) {
return;
}
StringTokenizer tokenizer = new StringTokenizer(line, ";");
while (tokenizer.hasMoreTokens()) {
String token = tokenizer.nextToken();
String[] parts = token.split(",");
String keyStr = parts[2];
if (keyStr.matches("[ab]")) {
keyOut.set((byte) keyStr.charAt(0));
valueOut.set(token);
context.write(keyOut, valueOut);
}
}
}
}
static class Reducer extends org.apache.hadoop.mapreduce.Reducer<ByteWritable, Text, Text, LongWritable> {
static final Text keyOut = new Text();
static final LongWritable valueOut = new LongWritable();
@Override
protected void reduce(ByteWritable key, Iterable<Text> values, org.apache.hadoop.mapreduce.Reducer<ByteWritable, Text, Text, LongWritable>.Context context)
throws IOException, InterruptedException {
keyOut.set(new String(new byte[]{key.get()}, StandardCharsets.UTF_8));
valueOut.set(StreamSupport.stream(values.spliterator(), true)
.mapToLong(v -> 1).sum());
context.write(keyOut, valueOut);
}
}
I am not an expert in Hadoop and I have the following problem. I have a job that has to run on a cluster with Hadoop version 0.20.2.
When I start the job I specify some parameters. Two of them I want to pass to the mapper and reducer classes because I need them there.
I have tried different solutions and now my code looks like this:
package bigdata;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.TreeMap;
import org.apache.commons.math3.stat.regression.SimpleRegression;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobConfigurable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import com.vividsolutions.jts.geom.Geometry;
import com.vividsolutions.jts.io.ParseException;
import com.vividsolutions.jts.io.WKTReader;
public class BoxCount extends Configured implements Tool{
private static String mbr;
private static double cs;
public static class Map extends Mapper<LongWritable, Text, IntWritable, Text> implements JobConfigurable
{
public void configure(JobConf job) {
mbr = job.get(mbr);
cs = job.getDouble("cellSide", 0.1);
}
protected void setup(Context context)
throws IOException, InterruptedException {
// method in which the MBR passed as a parameter is read
System.out.println("mbr: " + mbr + "\ncs: " + cs);
// ...
}
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
// some code here
}
protected void cleanup(Context context) throws IOException, InterruptedException
{
// other code
}
}
public static class Reduce extends Reducer<IntWritable,Text,IntWritable,IntWritable>implements JobConfigurable
{
private static String mbr;
private static double cs;
public void configure(JobConf job) {
mbr = job.get(mbr);
cs = job.getDouble("cellSide", 0.1);
}
protected void setup(Context context) throws IOException, InterruptedException
{
System.out.println("mbr: " + mbr + " cs: " + cs);
}
public void reduce(IntWritable key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
//the reduce code
}
@SuppressWarnings("unused")
protected void cleanup(Context context)
throws IOException, InterruptedException {
// cleanup code
}
public BoxCount (String[] args) {
if (args.length != 4) {
// 0 1 2 3
System.out.println("Usage: OneGrid <mbr (Rectangle: (xmin,ymin)-(xmax,ymax))> <cell_Side> <input_path> <output_path>");
System.out.println("args.length = "+args.length);
for(int i = 0; i< args.length;i++)
System.out.println("args["+i+"]"+" = "+args[i]);
System.exit(0);
}
this.numReducers = 1;
//this.mbr = new String(args[0]);
// this.mbr = "Rectangle: (0.01,0.01)-(99.99,99.99)";
// for sierpinski_jts
this.mbr = "Rectangle: (0.0,0.0)-(100.01,86.6125)";
// for the diagonal dataset
//this.mbr = "Rectangle: (1.5104351688932738,1.0787616413335854)-(99999.3453727045,99999.98043392139)";
// for the uniform dataset
// this.mbr = "Rectangle: (0.3020720559407146,0.2163091760095974)-(99999.68881210628,99999.46079314972)";
this.cellSide = Double.parseDouble(args[1]);
this.inputPath = new Path(args[2]);
this.outputDir = new Path(args[3]);
// Ricalcola la cellSize in modo da ottenere
// almeno minMunGriglie (10) griglie!
Grid g = new Grid(mbr, cellSide);
if ((this.cellSide*(Math.pow(2,minNumGriglie))) > g.width)
this.cellSide = g.width/(Math.pow(2,minNumGriglie));
}
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new BoxCount(args), args);
System.exit(res);
}
public int run(String[] args) throws Exception
{
// define new job instead of null using conf
Configuration conf = getConf();
@SuppressWarnings("deprecation")
Job job = new Job(conf, "BoxCount");
// conf.set("mapreduce.framework.name", "local");
// conf.set("mapreduce.jobtracker.address", "local");
// conf.set("fs.defaultFS","file:///");
// pass the mbr value to create the grid
conf.set("mbr", mbr);
// pass the cell side
conf.setDouble("cellSide", cellSide);
job.setJarByClass(BoxCount.class);
// set job input format
job.setInputFormatClass(TextInputFormat.class);
// set map class and the map output key and value classes
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(Text.class);
job.setMapperClass(Map.class);
// set reduce class and the reduce output key and value classes
job.setReducerClass(Reduce.class);
// set job output format
job.setOutputFormatClass(TextOutputFormat.class);
// add the input file as job input (from HDFS) to the variable
// inputFile
TextInputFormat.setInputPaths(job, inputPath);
// set the output path for the job results (to HDFS) to the variable
// outputPath
TextOutputFormat.setOutputPath(job, outputDir);
// set the number of reducers using variable numberReducers
job.setNumReduceTasks(numReducers);
// set the jar class
job.setJarByClass(BoxCount.class);
return job.waitForCompletion(true) ? 0 : 1; // this will execute the job
}
}
But the job does not run as expected. What is the correct solution?
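For what it's worth, a minimal sketch of the usual approach with the new (org.apache.hadoop.mapreduce) API, untested and keeping the names from the posted code: set the values on the Configuration before the Job is created from it, and read them back in setup() via context.getConfiguration(). The configure(JobConf) hook belongs to the old mapred API and is never called for a new-API Mapper or Reducer.
// In run(): set the parameters on the Configuration first, then create the Job from it.
Configuration conf = getConf();
conf.set("mbr", mbr);
conf.setDouble("cellSide", cellSide);
Job job = new Job(conf, "BoxCount");

// In the Mapper (and likewise in the Reducer): read them back in setup().
@Override
protected void setup(Context context) throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    String mbr = conf.get("mbr");
    double cs = conf.getDouble("cellSide", 0.1);
    System.out.println("mbr: " + mbr + " cs: " + cs);
}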
I have to process 250 XML files, each of which is 25 MB in size. For processing the XML files, I am using XMLInputFormat from Apache Mahout and generating a sequence file. In the sequence file, the key is the filename and the value is the entire file content. The problem with this approach is that 250 mappers are launched, which makes the MapReduce job slow.
I have come across CombineFileInputFormat (while going through Tom White's book), with which 250 mappers wouldn't be launched for 250 files. But CombineFileInputFormat is an abstract class, and I am having difficulty implementing it for XML files as I am new to Java as well as Hadoop.
So, can someone please provide me an implementation of CombineFileInputFormat for XML files?
Driver Code:
package com.ericsson.sequencefile;
//A MapReduce program for packaging a collection of small files as a single SequenceFile.
//hadoop jar sequencefiles.jar com.ericsson.sequencefile.SmallFilesToSequenceFileConverter -D xmlinput.start="<XMLstart>" -D xmlinput.end="</XMLstart>" /IRIS_NG/pfinder2/ccn/archive /IRIS_NG/pfinder2/output
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class SmallFilesToSequenceFileConverter extends Configured implements Tool {
public static class SequenceFileMapper extends Mapper<LongWritable, Text, Text, Text> {
private Text filenameKey;
@Override
public void setup(Context context) throws IOException, InterruptedException {
InputSplit split = context.getInputSplit();
Path path = ((FileSplit) split).getPath();
filenameKey = new Text(path.toString() + "\n");
}
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String document = value.toString();
context.write(filenameKey, new Text(document));
}
}
@Override
public int run(String[] args) throws Exception {
if (args.length != 2) {
System.err.printf("Usage: %s [generic options] <input> <output>\n", getClass().getSimpleName());
ToolRunner.printGenericCommandUsage(System.err);
return -1;
}
Configuration conf = getConf();
Job job = Job.getInstance(conf,"SmallFilesToSequenceFile");
job.setJarByClass(getClass());
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setOutputFormatClass(SequenceFileOutputFormat.class);
job.setInputFormatClass(XmlInputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setMapperClass(SequenceFileMapper.class);
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new SmallFilesToSequenceFileConverter(), args);
System.exit(exitCode);
}
}
XmlInputFormat.java
package com.ericsson.sequencefile;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.slf4j.*;
import java.io.IOException;
/**
* Reads records that are delimited by a specific begin/end tag.
*/
public class XmlInputFormat extends TextInputFormat {
private static final Logger log =
LoggerFactory.getLogger(XmlInputFormat.class);
public static final String START_TAG_KEY = "xmlinput.start";
public static final String END_TAG_KEY = "xmlinput.end";
@Override
public RecordReader<LongWritable, Text> createRecordReader(
InputSplit split, TaskAttemptContext context) {
try {
return new XmlRecordReader((FileSplit) split,
context.getConfiguration());
} catch (IOException ioe) {
log.warn("Error while creating XmlRecordReader", ioe);
return null;
}
}
/**
* XMLRecordReader class to read through a given xml document to
* output xml blocks as records as specified
* by the start tag and end tag
*/
public static class XmlRecordReader
extends RecordReader<LongWritable, Text> {
private final byte[] startTag;
private final byte[] endTag;
private final long start;
private final long end;
private final FSDataInputStream fsin;
private final DataOutputBuffer buffer = new DataOutputBuffer();
private LongWritable currentKey;
private Text currentValue;
public XmlRecordReader(FileSplit split, Configuration conf)
throws IOException {
startTag = conf.get(START_TAG_KEY).getBytes("UTF-8");
endTag = conf.get(END_TAG_KEY).getBytes("UTF-8");
// open the file and seek to the start of the split
start = split.getStart();
end = start + split.getLength();
Path file = split.getPath();
FileSystem fs = file.getFileSystem(conf);
fsin = fs.open(split.getPath());
fsin.seek(start);
}
private boolean next(LongWritable key, Text value)
throws IOException {
if (fsin.getPos() < end && readUntilMatch(startTag, false)) {
try {
buffer.write(startTag);
if (readUntilMatch(endTag, true)) {
key.set(fsin.getPos());
value.set(buffer.getData(), 0, buffer.getLength());
return true;
}
} finally {
buffer.reset();
}
}
return false;
}
@Override
public void close() throws IOException {
fsin.close();
}
@Override
public float getProgress() throws IOException {
return (fsin.getPos() - start) / (float) (end - start);
}
private boolean readUntilMatch(byte[] match, boolean withinBlock)
throws IOException {
int i = 0;
while (true) {
int b = fsin.read();
// end of file:
if (b == -1) {
return false;
}
// save to buffer:
if (withinBlock) {
buffer.write(b);
}
// check if we're matching:
if (b == match[i]) {
i++;
if (i >= match.length) {
return true;
}
} else {
i = 0;
}
// see if we've passed the stop point:
if (!withinBlock && i == 0 && fsin.getPos() >= end) {
return false;
}
}
}
@Override
public LongWritable getCurrentKey()
throws IOException, InterruptedException {
return currentKey;
}
@Override
public Text getCurrentValue()
throws IOException, InterruptedException {
return currentValue;
}
@Override
public void initialize(InputSplit split,
TaskAttemptContext context)
throws IOException, InterruptedException {
}
@Override
public boolean nextKeyValue()
throws IOException, InterruptedException {
currentKey = new LongWritable();
currentValue = new Text();
return next(currentKey, currentValue);
}
}
}
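Since CombineFileInputFormat only asks you to supply a record reader for each file inside a combined split, the usual pattern is a thin delegate around the record reader you already have. Below is a rough, untested sketch; CombineXmlInputFormat and XmlRecordReaderDelegate are illustrative names, not an existing API, and the delegate simply hands each file of the combined split to the existing XmlInputFormat.XmlRecordReader.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader;
import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class CombineXmlInputFormat extends CombineFileInputFormat<LongWritable, Text> {

    public CombineXmlInputFormat() {
        // optional: cap the size of a combined split (here 128 MB) so splits stay manageable
        setMaxSplitSize(128 * 1024 * 1024);
    }

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(InputSplit split, TaskAttemptContext context) throws IOException {
        // CombineFileRecordReader creates one delegate reader per file in the combined split
        return new CombineFileRecordReader<LongWritable, Text>((CombineFileSplit) split, context, XmlRecordReaderDelegate.class);
    }

    // Adapts a single file of a CombineFileSplit to the existing XmlInputFormat.XmlRecordReader.
    public static class XmlRecordReaderDelegate extends RecordReader<LongWritable, Text> {
        private final XmlInputFormat.XmlRecordReader reader;

        // CombineFileRecordReader instantiates this via reflection and requires exactly this constructor signature.
        public XmlRecordReaderDelegate(CombineFileSplit split, TaskAttemptContext context, Integer index) throws IOException {
            FileSplit fileSplit = new FileSplit(split.getPath(index), split.getOffset(index), split.getLength(index), null);
            reader = new XmlInputFormat.XmlRecordReader(fileSplit, context.getConfiguration());
        }

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) throws IOException, InterruptedException {
            // nothing to do: the delegate was already opened in the constructor
        }

        @Override
        public boolean nextKeyValue() throws IOException, InterruptedException {
            return reader.nextKeyValue();
        }

        @Override
        public LongWritable getCurrentKey() throws IOException, InterruptedException {
            return reader.getCurrentKey();
        }

        @Override
        public Text getCurrentValue() throws IOException, InterruptedException {
            return reader.getCurrentValue();
        }

        @Override
        public float getProgress() throws IOException {
            return reader.getProgress();
        }

        @Override
        public void close() throws IOException {
            reader.close();
        }
    }
}
In the driver you would then point job.setInputFormatClass at CombineXmlInputFormat.class instead of XmlInputFormat.class.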
I am new to Hadoop's MapReduce. I have written a MapReduce job and am trying to run it on my local machine, but the job hangs after map 100%.
Below is the code; I don't understand what I am missing.
I have a custom key class:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
public class AirlineMonthKey implements WritableComparable<AirlineMonthKey>{
Text airlineName;
Text month;
public AirlineMonthKey(){
super();
}
public AirlineMonthKey(Text airlineName, Text month) {
super();
this.airlineName = airlineName;
this.month = month;
}
public Text getAirlineName() {
return airlineName;
}
public void setAirlineName(Text airlineName) {
this.airlineName = airlineName;
}
public Text getMonth() {
return month;
}
public void setMonth(Text month) {
this.month = month;
}
@Override
public void readFields(DataInput in) throws IOException {
// TODO Auto-generated method stub
this.airlineName.readFields(in);
this.month.readFields(in);
}
@Override
public void write(DataOutput out) throws IOException {
// TODO Auto-generated method stub
this.airlineName.write(out);
this.month.write(out);
}
@Override
public int compareTo(AirlineMonthKey airlineMonthKey) {
// TODO Auto-generated method stub
int diff = getAirlineName().compareTo(airlineMonthKey.getAirlineName());
if(diff != 0){
return diff;
}
int m1 = Integer.parseInt(getMonth().toString());
int m2 = Integer.parseInt(airlineMonthKey.getMonth().toString());
if(m1>m2){
return -1;
}
else
return 1;
}
}
and the mapper and reducer classes that use the custom key are below.
package com.mapresuce.secondarysort;
import java.io.IOException;
import java.io.StringReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import com.opencsv.CSVReader;
public class FlightDelayByMonth {
public static class FlightDelayByMonthMapper extends
Mapper<Object, Text, AirlineMonthKey, Text> {
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
String str = value.toString();
// Reading Line one by one from the input CSV.
CSVReader reader = new CSVReader(new StringReader(str));
String[] split = reader.readNext();
reader.close();
String airlineName = split[6];
String month = split[2];
String year = split[0];
String delayMinutes = split[37];
String cancelled = split[41];
if (!(airlineName.equals("") || month.equals("") || delayMinutes
.equals(""))) {
if (year.equals("2008") && cancelled.equals("0.00")) {
AirlineMonthKey airlineMonthKey = new AirlineMonthKey(
new Text(airlineName), new Text(month));
Text delay = new Text(delayMinutes);
context.write(airlineMonthKey, delay);
System.out.println("1");
}
}
}
}
public static class FlightDelayByMonthReducer extends
Reducer<AirlineMonthKey, Text, Text, Text> {
public void reduce(AirlineMonthKey key, Iterable<Text> values,
Context context) throws IOException, InterruptedException {
for(Text val : values){
context.write(new Text(key.getAirlineName().toString()+" "+key.getMonth().toString()), val);
}
}
}
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args)
.getRemainingArgs();
if (otherArgs.length != 2) {
System.err.println("Usage:<in> <out>");
System.exit(2);
}
Job job = new Job(conf, "Average monthly flight dealy");
job.setJarByClass(FlightDelayByMonth.class);
job.setMapperClass(FlightDelayByMonthMapper.class);
job.setReducerClass(FlightDelayByMonthReducer.class);
job.setOutputKeyClass(AirlineMonthKey.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
Also, I have created a job and configuration in the main method. I don't know what I am missing. I am running all of this in a local environment.
Try writing custom implementations of toString(), equals() and hashCode() in your AirlineMonthKey class.
Read the link below:
http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/io/WritableComparable.html
It is important for key types to implement hashCode().
Hope this helps.
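For the posted key, that could look something like this (an untested sketch, added inside AirlineMonthKey):
@Override
public int hashCode() {
    // combine both fields so equal keys hash the same way
    return 31 * airlineName.hashCode() + month.hashCode();
}

@Override
public boolean equals(Object o) {
    if (!(o instanceof AirlineMonthKey)) {
        return false;
    }
    AirlineMonthKey other = (AirlineMonthKey) o;
    return airlineName.equals(other.airlineName) && month.equals(other.month);
}

@Override
public String toString() {
    return airlineName.toString() + " " + month.toString();
}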
The issue was that I had to provide a default constructor in AirlineMonthKey (which I did) and initialize the instance variables in it (which I didn't).
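Concretely, the no-argument constructor has to create the Text objects, because Hadoop builds the key by reflection and then calls readFields(), which would hit a NullPointerException on null fields. A minimal sketch:
public AirlineMonthKey() {
    super();
    this.airlineName = new Text();  // must be non-null before readFields() is called
    this.month = new Text();
}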
Here's my source code:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class PageRank {
public static final String MAGIC_STRING = ">>>>";
boolean overwrite = true;
PageRank(boolean overwrite){
this.overwrite = overwrite;
}
public static class TextPair implements WritableComparable<TextPair>{
Text x;
int ordering;
public TextPair(){
x = new Text();
ordering = 1;
}
public void setText(Text t, int o){
x = t;
ordering = o;
}
public void setText(String t, int o){
x.set(t);
ordering = o;
}
public void readFields(DataInput in) throws IOException {
x.readFields(in);
ordering = in.readInt();
}
public void write(DataOutput out) throws IOException {
x.write(out);
out.writeInt(ordering);
}
public int hashCode() {
return x.hashCode();
}
public int compareTo(TextPair o) {
int x = this.x.compareTo(o.x);
if(x==0)
return ordering-o.ordering;
else
return x;
}
}
public static class MapperA extends Mapper<LongWritable, Text, TextPair, Text> {
private Text word = new Text();
Text title = new Text();
Text link = new Text();
TextPair textpair = new TextPair();
boolean start=false;
String currentTitle="";
private Pattern linkPattern = Pattern.compile("\\[\\[\\s*(.+?)\\s*\\]\\]");
private Pattern titlePattern = Pattern.compile("<title>\\s*(.+?)\\s*</title>");
private Pattern pagePattern = Pattern.compile("<page>\\s*(.+?)\\s*</page>");
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
int startPage=line.lastIndexOf("<title>");
if(startPage<0)
{
Matcher matcher = linkPattern.matcher(line);
int n = 0;
title.set(currentTitle);
while(matcher.find()){
textpair.setText(matcher.group(1), 1);
context.write(textpair, title);
}
link.set(MAGIC_STRING);
textpair.setText(title.toString(), 0);
context.write(textpair, link);
}
else
{
String result=line.trim();
Matcher titleMatcher = titlePattern.matcher(result);
if(titleMatcher.find()){
currentTitle = titleMatcher.group(1);
}
else
{
currentTitle=result;
}
}
}
}
public static class ReducerA extends Reducer<TextPair, Text, Text, Text>{
Text aw = new Text();
boolean valid = false;
String last = "";
public void run(Context context) throws IOException, InterruptedException {
setup(context);
while (context.nextKeyValue()) {
TextPair key = context.getCurrentKey();
Text value = context.getCurrentValue();
if(key.ordering==0){
last = key.x.toString();
}
else if(key.x.toString().equals(last)){
context.write(key.x, value);
}
}
cleanup(context);
}
}
public static class MapperB extends Mapper<Text, Text, Text, Text>{
Text t = new Text();
public void map(Text key, Text value, Context context) throws InterruptedException, IOException{
context.write(value, key);
}
}
public static class ReducerB extends Reducer<Text, Text, Text, PageRankRecord>{
ArrayList<String> q = new ArrayList<String>();
public void reduce(Text key, Iterable<Text> values, Context context)throws InterruptedException, IOException{
q.clear();
for(Text value:values){
q.add(value.toString());
}
PageRankRecord prr = new PageRankRecord();
prr.setPageRank(1.0);
if(q.size()>0){
String[] a = new String[q.size()];
q.toArray(a);
prr.setlinks(a);
}
context.write(key, prr);
}
}
public boolean roundA(Configuration conf, String inputPath, String outputPath, boolean overwrite) throws IOException, InterruptedException, ClassNotFoundException{
if(FileSystem.get(conf).exists(new Path(outputPath))){
if(overwrite){
FileSystem.get(conf).delete(new Path(outputPath), true);
System.err.println("The target file is dirty, overwriting!");
}
else
return true;
}
Job job = new Job(conf, "closure graph build round A");
//job.setJarByClass(GraphBuilder.class);
job.setMapperClass(MapperA.class);
//job.setCombinerClass(RankCombiner.class);
job.setReducerClass(ReducerA.class);
job.setMapOutputKeyClass(TextPair.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setOutputFormatClass(SequenceFileOutputFormat.class);
job.setNumReduceTasks(30);
FileInputFormat.addInputPath(job, new Path(inputPath));
SequenceFileOutputFormat.setOutputPath(job, new Path(outputPath));
return job.waitForCompletion(true);
}
public boolean roundB(Configuration conf, String inputPath, String outputPath) throws IOException, InterruptedException, ClassNotFoundException{
if(FileSystem.get(conf).exists(new Path(outputPath))){
if(overwrite){
FileSystem.get(conf).delete(new Path(outputPath), true);
System.err.println("The target file is dirty, overwriting!");
}
else
return true;
}
Job job = new Job(conf, "closure graph build round B");
//job.setJarByClass(PageRank.class);
job.setMapperClass(MapperB.class);
//job.setCombinerClass(RankCombiner.class);
job.setReducerClass(ReducerB.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(PageRankRecord.class);
job.setInputFormatClass(SequenceFileInputFormat.class);
job.setOutputFormatClass(SequenceFileOutputFormat.class);
job.setNumReduceTasks(30);
SequenceFileInputFormat.addInputPath(job, new Path(inputPath));
SequenceFileOutputFormat.setOutputPath(job, new Path(outputPath));
return job.waitForCompletion(true);
}
public boolean build(Configuration conf, String inputPath, String outputPath) throws IOException, InterruptedException, ClassNotFoundException{
System.err.println(inputPath);
if(roundA(conf, inputPath, "cgb", true)){
return roundB(conf, "cgb", outputPath);
}
else
return false;
}
public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException{
Configuration conf = new Configuration();
//PageRanking.banner("ClosureGraphBuilder");
PageRank cgb = new PageRank(true);
cgb.build(conf, args[0], args[1]);
}
}
Here's how I compile and run it:
javac -classpath hadoop-0.20.1-core.jar -d pagerank_classes PageRank.java PageRankRecord.java
jar -cvf pagerank.jar -C pagerank_classes/ .
bin/hadoop jar pagerank.jar PageRank pagerank result
But I am getting the following error:
INFO mapred.JobClient: Task Id : attempt_201001012025_0009_m_000001_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: PageRank$MapperA
Can someone tell me what's wrong?
Thanks
If you are using Hadoop 0.20 (and want to use the non-deprecated classes), you can do:
public int run(String[] args) throws Exception {
Job job = new Job();
job.setJarByClass(YourMapReduceClass.class); // <-- omitting this causes above error
job.setMapperClass(MyMapper.class);
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
return 0;
}
Did "PageRank$MapperA.class" end up inside that jar file? It should be in the same place as "PageRank.class".
Try adding "--libjars pagerank.jar". The mapper and reducer run on different machines, so you need to distribute your jar to every machine; "--libjars" helps to do that.
For the HADOOP_CLASSPATH you should specify the folder where the JAR file is located...
If you want to understand how the classpath works: http://download.oracle.com/javase/6/docs/technotes/tools/windows/classpath.html
I guess you should change your HADOOP_CLASSPATH variable so that it points to the jar file,
e.g. HADOOP_CLASSPATH=<whatever the path>/PageRank.jar or something like that.
If you are using Eclipse to generate the jar, then use the "Extract generated libraries into generated JAR" option.
Although a MapReduce program processes data in parallel, the Mapper, Combiner and Reducer classes follow a sequential flow: each stage has to finish before the stage that depends on it can start, which is why you need job.waitForCompletion(true). Also, the input and output paths must be set before the Mapper, Combiner and Reducer classes start.
The solution for this is already answered in https://stackoverflow.com/a/38145962/3452185