Hadoop MapReduce custom Writable static context - Java

I'm working on a university homework assignment and we have to use Hadoop MapReduce for it. I'm trying to create a new custom Writable because I want to output key-value pairs as (key, (doc_name, 1)).
public class Detector {
private static final Path TEMP_PATH = new Path("temp");
private static final String LENGTH = "gramLength";
private static final String THRESHOLD = "threshold";
public class Custom implements Writable {
private Text document;
private IntWritable count;
public Custom(){
setDocument("");
setCount(0);
}
public Custom(String document, int count) {
setDocument(document);
setCount(count);
}
@Override
public void readFields(DataInput in) throws IOException {
// TODO Auto-generated method stub
document.readFields(in);
count.readFields(in);
}
@Override
public void write(DataOutput out) throws IOException {
document.write(out);
count.write(out);
}
public int getCount() {
return count.get();
}
public void setCount(int count) {
this.count = new IntWritable(count);
}
public String getDocument() {
return document.toString();
}
public void setDocument(String document) {
this.document = new Text(document);
}
}
public static class NGramMapper extends Mapper<Text, Text, Text, Text> {
private int gramLength;
private Pattern space_pattern=Pattern.compile("[ ]");
private StringBuilder gramBuilder= new StringBuilder();
@Override
protected void setup(Context context) throws IOException, InterruptedException{
gramLength=context.getConfiguration().getInt(LENGTH, 0);
}
public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
String[] tokens=space_pattern.split(value.toString());
for(int i=0;i<tokens.length;i++){
gramBuilder.setLength(0);
if(i+gramLength<=tokens.length){
for(int j=i;j<i+gramLength;j++){
gramBuilder.append(tokens[j]);
gramBuilder.append(" ");
}
context.write(new Text(gramBuilder.toString()), key);
}
}
}
}
public static class OutputReducer extends Reducer<Text, Text, Text, Custom> {
public void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
for (Text val : values) {
context.write(key,new Custom(val.toString(),1));
}
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
conf.setInt(LENGTH, Integer.parseInt(args[0]));
conf.setInt(THRESHOLD, Integer.parseInt(args[1]));
// Setup first MapReduce phase
Job job1 = Job.getInstance(conf, "WordOrder-first");
job1.setJarByClass(Detector.class);
job1.setMapperClass(NGramMapper.class);
job1.setReducerClass(OutputReducer.class);
job1.setMapOutputKeyClass(Text.class);
job1.setMapOutputValueClass(Text.class);
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(Custom.class);
job1.setInputFormatClass(WholeFileInputFormat.class);
FileInputFormat.addInputPath(job1, new Path(args[2]));
FileOutputFormat.setOutputPath(job1, new Path(args[3]));
boolean status1 = job1.waitForCompletion(true);
if (!status1) {
System.exit(1);
}
}
}
When I compile the code to a class file, I get this error:
Detector.java:147: error: non-static variable this cannot be referenced from a static context
context.write(key,new Custom(val.toString(),1));
I followed different tutorials about custom Writables and my solution is the same as the others. Any suggestions?

Static fields and methods are shared by all instances. They are for values that belong to the class itself rather than to a specific instance; avoid them where you can.
To solve your problem, you need to instantiate an object of your class so the runtime can reserve memory for it, or change the code that accesses it to use static access (not recommended!).
The keyword this refers to the current instance (hence the name), not to anything static, which should be referenced through the class name instead. You are using it in a static context, which is not allowed.
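In this particular code, the nested Custom class is a non-static inner class of Detector, so new Custom(...) inside the static OutputReducer implicitly needs an enclosing Detector instance, which is exactly what the compiler is complaining about. A minimal sketch of the usual fix, assuming nothing in Custom actually needs the outer Detector instance, is to declare the nested Writable as static:
public static class Custom implements Writable {
    private Text document;
    private IntWritable count;

    public Custom() {                      // no-arg constructor, needed by Hadoop's reflection
        this("", 0);
    }

    public Custom(String document, int count) {
        this.document = new Text(document);
        this.count = new IntWritable(count);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        document.write(out);
        count.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        document.readFields(in);
        count.readFields(in);
    }
}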

Related

How to implement a custom Writable's write and readFields for a map or list of custom objects?

I have a wrapper class that contains the following
class myWrapperClass {
Map<Long, myInnerClass> myMap;
int myInt1;
int myInt2;
}
class myInnerClass {
int myInnerInt;
long myInnerLong;
}
I want to make this a custom Writable; so far I have the following:
@Override
public void write(DataOutput out) throws IOException {
out.writeInt(this.myInt1);
out.writeInt(this.myInt2);
// What do I do here
}
@Override
public void readFields(DataInput in) throws IOException {
myInt1 = in.readInt();
myInt2 = in.readInt();
// What do I do here
}
I am not sure how I would write to and read from DataOutput if I have a custom object.
Can someone point me in the right direction?
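One common pattern, sketched here under the assumption that the inner fields are accessible from the wrapper (they are package-private above) and that java.util.Map and java.util.HashMap are imported, is to write the map size first and then each entry, mirroring that order on read:
@Override
public void write(DataOutput out) throws IOException {
    out.writeInt(this.myInt1);
    out.writeInt(this.myInt2);
    out.writeInt(myMap.size());                    // number of entries
    for (Map.Entry<Long, myInnerClass> e : myMap.entrySet()) {
        out.writeLong(e.getKey());
        out.writeInt(e.getValue().myInnerInt);     // write the inner object field by field
        out.writeLong(e.getValue().myInnerLong);
    }
}
@Override
public void readFields(DataInput in) throws IOException {
    myInt1 = in.readInt();
    myInt2 = in.readInt();
    int size = in.readInt();
    myMap = new HashMap<Long, myInnerClass>(size);
    for (int i = 0; i < size; i++) {
        long key = in.readLong();
        myInnerClass inner = new myInnerClass();
        inner.myInnerInt = in.readInt();
        inner.myInnerLong = in.readLong();
        myMap.put(key, inner);
    }
}
Alternatively, myInnerClass can itself implement Writable, in which case the loop simply delegates to its write and readFields.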

Extend a class that extends Hadoop's Mapper

This is an example of a Map class [1] from Hadoop that extends the Mapper class. [3] is Hadoop's Mapper class.
I want to create MyExampleMapper, which extends ExampleMapper, which in turn extends Hadoop's Mapper [2]. I do this because I just want to set a property in ExampleMapper so that, when I create MyExampleMapper or other examples, I don't have to set the property myself, since ExampleMapper already does it. Is it possible to do this?
[1] Example mapper
import org.apache.hadoop.mapreduce.Mapper;
public class ExampleMapper
extends Mapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
[2] What I want
import org.apache.hadoop.mapreduce.Mapper;
public class MyExampleMapper
extends ExampleMapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
String result = System.getProperty("job.examplemapper");
if (result.equals("true")) {
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
}
public class ExampleMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
extends Mapper{
System.setProperty("job.examplemapper", "true");
}
[3] This is Hadoop's Mapper class
public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
public Mapper() {
}
protected void setup(Mapper.Context context) throws IOException, InterruptedException {
}
protected void map(KEYIN key, VALUEIN value, Mapper.Context context) throws IOException, InterruptedException {
context.write(key, value);
}
protected void cleanup(Mapper.Context context) throws IOException, InterruptedException {
}
public void run(Mapper.Context context) throws IOException, InterruptedException {
this.setup(context);
try {
while(context.nextKeyValue()) {
this.map(context.getCurrentKey(), context.getCurrentValue(), context);
}
} finally {
this.cleanup(context);
}
}
public class Context extends MapContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
public Context(Configuration conf, TaskAttemptID taskid, RecordReader<KEYIN, VALUEIN> reader, RecordWriter<KEYOUT, VALUEOUT> writer, OutputCommitter committer, StatusReporter reporter, InputSplit split) throws IOException, InterruptedException {
super(conf, taskid, reader, writer, committer, reporter, split);
}
}
}
import org.apache.hadoop.mapreduce.Mapper;
public class ExampleMapper<T, X, Y, Z> extends Mapper<T, X, Y, Z> {
static {
System.setProperty("job.examplemapper", "true");
}
}
Then extend it in your program:
public class MyExampleMapper
extends ExampleMapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
String result = System.getProperty("job.examplemapper");
if (result.equals("true")) {
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
}
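For what it's worth, a minimal driver sketch showing how the subclass would be wired into a job; ExampleDriver and the argument positions are made up for illustration:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class ExampleDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "example");
        job.setJarByClass(MyExampleMapper.class);
        job.setMapperClass(MyExampleMapper.class); // ExampleMapper's static block runs when the class is initialized in the task JVM
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}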

Serialize a generic field from a Java object to JSON

I have a generic field in User.java. I want the name of T's type to be used in the JSON.
public class User<T> {
public enum Gender {MALE, FEMALE};
private T field;
private Gender _gender;
private boolean _isVerified;
private byte[] _userImage;
public T getField() { return field; }
public boolean isVerified() { return _isVerified; }
public Gender getGender() { return _gender; }
public byte[] getUserImage() { return _userImage; }
public void setField(T f) { field = f; }
public void setVerified(boolean b) { _isVerified = b; }
public void setGender(Gender g) { _gender = g; }
public void setUserImage(byte[] b) { _userImage = b; }
}
and mapper class is:
public class App
{
public static void main( String[] args ) throws JsonParseException, JsonMappingException, IOException
{
ObjectMapper mapper = new ObjectMapper();
Name n = new Name();
n.setFirst("Harry");
n.setLast("Potter");
User<Name> user = new User<Name>();
user.setField(n);
user.setGender(Gender.MALE);
user.setVerified(false);
mapper.writeValue(new File("user1.json"), user);
}
}
and the json output is :
{"field":{"first":"Harry","last":"Potter"},"gender":"MALE","verified":false,"userImage":null}
In the output, I want Name to appear in place of field. How do I do that? Any help?
I think what you are asking for is not Jackson's default behavior. The JSON key is taken from the field name, not from the field's type, so you should rename the field or do some string processing to get what you want.
private T field;
change the above to this:
private T name;
You need a custom serializer to do that. That's a runtime data transformation and Jackson has no support for data transformation other than with a custom serializer (well, there's wrapping/unwrapping of value, but let's not go there). Also, you will need to know in advance every type of transformation you want to apply inside your serializer. The following works:
public class UserSerializer extends JsonSerializer<User<?>> {
private static final String USER_IMAGE_FIELD = "userImage";
private static final String VERIFIED_FIELD = "verified";
private static final String FIELD_FIELD = "field";
private static final String NAME_FIELD = "name";
@Override
public void serialize(User<?> value, JsonGenerator jgen, SerializerProvider provider) throws IOException,
JsonProcessingException {
jgen.writeStartObject();
if (value.getField() instanceof Name) {
jgen.writeFieldName(NAME_FIELD);
} else {
jgen.writeFieldName(FIELD_FIELD);
}
jgen.writeObject(value.getField());
jgen.writeStringField("gender", value.getGender().name());
jgen.writeBooleanField(VERIFIED_FIELD, value.isVerified());
if (value.getUserImage() == null) {
jgen.writeNullField(USER_IMAGE_FIELD);
} else {
jgen.writeBinaryField(USER_IMAGE_FIELD, value.getUserImage());
}
jgen.writeEndObject();
}
}
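The serializer also has to be registered so Jackson will actually use it. One way, sketched here assuming Jackson 2.x package names, is to annotate the class:
import com.fasterxml.jackson.databind.annotation.JsonSerialize;

@JsonSerialize(using = UserSerializer.class)
public class User<T> {
    // ... fields and accessors as above ...
}
With Jackson 1.x the same annotation lives in org.codehaus.jackson.map.annotate instead.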

Send multiple arguments to reducer-MapReduce

I've written code that does something similar to SQL GROUP BY.
The dataset I took is here:
250788681419,20090906,200937,200909,619,SUNDAY,WEEKEND,ON-NET,MORNING,OUTGOING,VOICE,25078,PAY_AS_YOU_GO_PER_SECOND_PSB,SUCCESSFUL-RELEASEDBYSERVICE,17,0,1,21.25,635-10-112-30455
public class MyMap extends Mapper<LongWritable, Text, Text, DoubleWritable> {
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
{
String line = value.toString();
String[] attribute=line.split(",");
double rs=Double.parseDouble(attribute[17]);
String comb=new String();
comb=attribute[5].concat(attribute[8].concat(attribute[10]));
context.write(new Text(comb),new DoubleWritable (rs));
}
}
public class MyReduce extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
throws IOException, InterruptedException {
double sum = 0;
Iterator<DoubleWritable> iter=values.iterator();
while (iter.hasNext())
{
double val=iter.next().get();
sum = sum+ val;
}
context.write(key, new DoubleWritable(sum));
};
}
The mapper sends the 17th field as its value to the reducer, which sums it. Now I also want to sum the 14th field; how do I send that to the reducer as well?
If your data types are the same, then creating an ArrayWritable subclass should work for this. The class should resemble:
public class DblArrayWritable extends ArrayWritable
{
public DblArrayWritable()
{
super(DoubleWritable.class);
}
}
Your mapper class then looks like:
public class MyMap extends Mapper<LongWritable, Text, Text, DblArrayWritable>
{
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
{
String line = value.toString();
String[] attribute=line.split(",");
DoubleWritable[] values = new DoubleWritable[2];
values[0] = new DoubleWritable(Double.parseDouble(attribute[14]));
values[1] = new DoubleWritable(Double.parseDouble(attribute[17]));
String comb = attribute[5].concat(attribute[8].concat(attribute[10]));
DblArrayWritable outValue = new DblArrayWritable();
outValue.set(values);
context.write(new Text(comb), outValue);
}
}
In your reducer you should now be able to iterate over the values of the DblArrayWritable.
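As a sketch of what that reducer could look like (Hadoop imports omitted, as in the snippets above; the index comments assume the order used in the mapper):
public class MyReduce extends Reducer<Text, DblArrayWritable, Text, DoubleWritable> {
    @Override
    protected void reduce(Text key, Iterable<DblArrayWritable> values, Context context)
            throws IOException, InterruptedException {
        double sum14 = 0;
        double sum17 = 0;
        for (DblArrayWritable array : values) {
            Writable[] fields = array.get();             // [0] = attribute 14, [1] = attribute 17
            sum14 += ((DoubleWritable) fields[0]).get();
            sum17 += ((DoubleWritable) fields[1]).get();
        }
        context.write(key, new DoubleWritable(sum14 + sum17)); // or emit the two sums separately
    }
}
Remember that the driver would also need job.setMapOutputValueClass(DblArrayWritable.class).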
Based on your sample data, however, it looks like they may be separate types. You may be able to implement an ObjectArrayWritable class that would do the trick, but I'm not certain of this and I can't see much to support it. If it works, the class would be:
public class ObjArrayWritable extends ArrayWritable
{
public ObjArrayWritable()
{
super(Object.class);
}
}
You could handle this by simply concatenating the values and passing them as Text to the reducer which would then split them again.
Another option is to implement your own Writable class. Here's a sample of how that could work:
public static class PairWritable implements Writable
{
private Double myDouble;
private String myString;
// TODO :- Override the Hadoop serialization/Writable interface methods
@Override
public void readFields(DataInput in) throws IOException {
myDouble = in.readDouble();
myString = in.readUTF();
}
@Override
public void write(DataOutput out) throws IOException {
out.writeDouble(myDouble);
out.writeUTF(myString);
}
//End of Implementation
//Getter and Setter methods for the myDouble and myString variables
public void set(Double d, String s) {
myDouble = d;
myString = s;
}
public Double getDouble() {
return myDouble;
}
public String getString() {
return myString;
}
}
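A short usage sketch for this pair (fragments only; putting attribute 14 into the double and attribute 17 into the string is an arbitrary choice for illustration):
// In the mapper: fill and emit the pair.
PairWritable pair = new PairWritable();
pair.set(Double.parseDouble(attribute[14]), attribute[17]);
context.write(new Text(comb), pair);

// In the reducer, with Iterable<PairWritable> values: unpack each pair again.
double sum14 = 0, sum17 = 0;
for (PairWritable p : values) {
    sum14 += p.getDouble();
    sum17 += Double.parseDouble(p.getString());
}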

Enum value implementing Writable interface of Hadoop

Suppose I have an enumeration:
public enum SomeEnumType implements Writable {
A(0), B(1);
private int value;
private SomeEnumType(int value) {
this.value = value;
}
@Override
public void write(final DataOutput dataOutput) throws IOException {
dataOutput.writeInt(this.value);
}
@Override
public void readFields(final DataInput dataInput) throws IOException {
this.value = dataInput.readInt();
}
}
I want to pass an instance of it as a part of some other class instance.
Equals would not work, because it does not consider the enum's inner variable; besides, all enum instances are fixed at compile time and cannot be created elsewhere.
Does this mean I cannot send enums over the wire in Hadoop, or is there a solution?
My normal and preferred solution for enums in Hadoop is to serialize them through their ordinal value.
public class EnumWritable implements Writable {
static enum EnumName {
ENUM_1, ENUM_2, ENUM_3
}
private int enumOrdinal;
// never forget your default constructor in Hadoop Writables
public EnumWritable() {
}
public EnumWritable(Enum<?> arbitraryEnum) {
this.enumOrdinal = arbitraryEnum.ordinal();
}
public int getEnumOrdinal() {
return enumOrdinal;
}
@Override
public void readFields(DataInput in) throws IOException {
enumOrdinal = in.readInt();
}
@Override
public void write(DataOutput out) throws IOException {
out.writeInt(enumOrdinal);
}
public static void main(String[] args) {
// use it like this:
EnumWritable enumWritable = new EnumWritable(EnumName.ENUM_1);
// let Hadoop do the write and read stuff
EnumName yourDeserializedEnum = EnumName.values()[enumWritable.getEnumOrdinal()];
}
}
Obviously it has drawbacks: ordinals can change, so if you exchange ENUM_2 with ENUM_3 and read a previously serialized file, you will get back the wrong enum.
So if you know the enum class beforehand, you can write the name of your enum and use it like this:
enumInstance = EnumName.valueOf(in.readUTF());
This uses slightly more space, but it is safer against changes to the order of your enum constants.
The full example would look like this:
public class EnumWritable implements Writable {
static enum EnumName {
ENUM_1, ENUM_2, ENUM_3
}
private EnumName enumInstance;
// never forget your default constructor in Hadoop Writables
public EnumWritable() {
}
public EnumWritable(EnumName e) {
this.enumInstance = e;
}
public EnumName getEnum() {
return enumInstance;
}
@Override
public void write(DataOutput out) throws IOException {
out.writeUTF(enumInstance.name());
}
@Override
public void readFields(DataInput in) throws IOException {
enumInstance = EnumName.valueOf(in.readUTF());
}
public static void main(String[] args) {
// use it like this:
EnumWritable enumWritable = new EnumWritable(EnumName.ENUM_1);
// let Hadoop do the write and read stuff
EnumName yourDeserializedEnum = enumWritable.getEnum();
}
}
WritableUtils has convenience methods that make this easier.
WritableUtils.writeEnum(dataOutput,enumData);
enumData = WritableUtils.readEnum(dataInput,MyEnum.class);
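Embedded in a containing Writable this could look like the following sketch; the wrapper class and its payload field are made up, and SomeEnumType is the enum from the question:
public class RecordWritable implements Writable {
    private SomeEnumType type;
    private IntWritable payload = new IntWritable();

    @Override
    public void write(DataOutput out) throws IOException {
        WritableUtils.writeEnum(out, type);              // stores the enum constant's name
        payload.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        type = WritableUtils.readEnum(in, SomeEnumType.class);
        payload.readFields(in);
    }
}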
I don't know anything about Hadoop, but based on the documentation of the interface, you could probably do it like that:
public void readFields(DataInput in) throws IOException {
// do nothing
}
public static SomeEnumType read(DataInput in) throws IOException {
int value = in.readInt();
if (value == 0) {
return SomeEnumType.A;
}
else if (value == 1) {
return SomeEnumType.B;
}
else {
throw new IOException("Invalid value " + value);
}
}
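Usage inside a containing Writable would then look roughly like this (the status field is made up):
// write side: the enum writes itself through its Writable implementation
status.write(out);
// read side: the static factory replaces readFields
status = SomeEnumType.read(in);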
