MultipleOutputFormat in Hadoop - Java

I'm a newbie in Hadoop and I'm trying out the WordCount program.
To try out multiple output files, I use MultipleOutputs. This link helped me in doing it: http://hadoop.apache.org/common/docs/r0.19.0/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
In my driver class I had
MultipleOutputs.addNamedOutput(conf, "even",
        org.apache.hadoop.mapred.TextOutputFormat.class, Text.class,
        IntWritable.class);
MultipleOutputs.addNamedOutput(conf, "odd",
        org.apache.hadoop.mapred.TextOutputFormat.class, Text.class,
        IntWritable.class);
and my reduce class became this:
public static class Reduce extends MapReduceBase implements
        Reducer<Text, IntWritable, Text, IntWritable> {
    MultipleOutputs mos = null;
    public void configure(JobConf job) {
        mos = new MultipleOutputs(job);
    }
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        if (sum % 2 == 0) {
            mos.getCollector("even", reporter).collect(key, new IntWritable(sum));
        } else {
            mos.getCollector("odd", reporter).collect(key, new IntWritable(sum));
        }
        //output.collect(key, new IntWritable(sum));
    }
    @Override
    public void close() throws IOException {
        // TODO Auto-generated method stub
        mos.close();
    }
}
Things worked, but I get a LOT of files (one odd and one even for every map-reduce).
The question is: how can I have just two output files (odd and even), so that every odd output of every map-reduce gets written into that odd file, and the same for even?

Each reducer uses an OutputFormat to write its records, so you get one set of odd and even files per reducer. This is by design, so that each reducer can perform its writes in parallel.
If you want just a single odd file and a single even file, you'll need to set mapred.reduce.tasks to 1. But performance will suffer, because all the mappers will be feeding into a single reducer.
Another option is to change the process that reads these files to accept multiple input files, or to write a separate process that merges these files together.
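For example, in the old (mapred) API the question uses, forcing a single reducer looks roughly like this; a sketch in which everything except the setNumReduceTasks line mirrors a standard WordCount driver and is assumed, not taken from the question:
// Sketch: old-API driver forcing a single reduce task so that only one
// "odd" and one "even" file are produced. WordCount, Map and Reduce are
// placeholder class names, and the paths are placeholders too.
JobConf conf = new JobConf(WordCount.class);
conf.setJobName("wordcount");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(Map.class);
conf.setReducerClass(Reduce.class);
conf.setNumReduceTasks(1); // equivalent to setting mapred.reduce.tasks=1
MultipleOutputs.addNamedOutput(conf, "even", TextOutputFormat.class, Text.class, IntWritable.class);
MultipleOutputs.addNamedOutput(conf, "odd", TextOutputFormat.class, Text.class, IntWritable.class);
FileInputFormat.setInputPaths(conf, new Path("input"));
FileOutputFormat.setOutputPath(conf, new Path("output"));
JobClient.runJob(conf);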

I wrote a class for doing this.
Just use it in your job:
job.setOutputFormatClass(HdMultipleFileOutputFormat.class);
This is my class:
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
/**
 * TextOutputFormat extension which enables writing the mapper/reducer's output in multiple files.<br>
 * <p>
 * <b>WARNING</b>: The number of different folders shouldn't be large for one mapper, since we keep a
 * {@link RecordWriter} instance per folder name.
 * </p>
 * <p>
 * In this class the folder name is defined by the written entry's key.<br>
 * To change this behavior, simply extend this class, override the
 * {@link HdMultipleFileOutputFormat#getFolderNameExtractor()} method and create your own
 * {@link FolderNameExtractor} implementation.
 * </p>
 *
 * @author ykesten
 *
 * @param <K> - Keys type
 * @param <V> - Values type
 */
public class HdMultipleFileOutputFormat<K, V> extends TextOutputFormat<K, V> {
    private String folderName;
    private class MultipleFilesRecordWriter extends RecordWriter<K, V> {
        private Map<String, RecordWriter<K, V>> fileNameToWriter;
        private FolderNameExtractor<K, V> fileNameExtractor;
        private TaskAttemptContext job;
        public MultipleFilesRecordWriter(FolderNameExtractor<K, V> fileNameExtractor, TaskAttemptContext job) {
            fileNameToWriter = new HashMap<String, RecordWriter<K, V>>();
            this.fileNameExtractor = fileNameExtractor;
            this.job = job;
        }
        @Override
        public void write(K key, V value) throws IOException, InterruptedException {
            String fileName = fileNameExtractor.extractFolderName(key, value);
            RecordWriter<K, V> writer = fileNameToWriter.get(fileName);
            if (writer == null) {
                writer = createNewWriter(fileName, fileNameToWriter, job);
                if (writer == null) {
                    throw new IOException("Unable to create writer for path: " + fileName);
                }
            }
            writer.write(key, value);
        }
        @Override
        public void close(TaskAttemptContext context) throws IOException, InterruptedException {
            for (Entry<String, RecordWriter<K, V>> entry : fileNameToWriter.entrySet()) {
                entry.getValue().close(context);
            }
        }
    }
    private synchronized RecordWriter<K, V> createNewWriter(String folderName,
            Map<String, RecordWriter<K, V>> fileNameToWriter, TaskAttemptContext job) {
        try {
            this.folderName = folderName;
            RecordWriter<K, V> writer = super.getRecordWriter(job);
            this.folderName = null;
            fileNameToWriter.put(folderName, writer);
            return writer;
        } catch (Exception e) {
            e.printStackTrace();
            return null;
        }
    }
    @Override
    public Path getDefaultWorkFile(TaskAttemptContext context, String extension) throws IOException {
        Path path = super.getDefaultWorkFile(context, extension);
        if (folderName != null) {
            String newPath = path.getParent().toString() + "/" + folderName + "/" + path.getName();
            path = new Path(newPath);
        }
        return path;
    }
    @Override
    public RecordWriter<K, V> getRecordWriter(TaskAttemptContext job) throws IOException, InterruptedException {
        return new MultipleFilesRecordWriter(getFolderNameExtractor(), job);
    }
    public FolderNameExtractor<K, V> getFolderNameExtractor() {
        return new KeyFolderNameExtractor<K, V>();
    }
    public interface FolderNameExtractor<K, V> {
        public String extractFolderName(K key, V value);
    }
    private static class KeyFolderNameExtractor<K, V> implements FolderNameExtractor<K, V> {
        public String extractFolderName(K key, V value) {
            return key.toString();
        }
    }
}
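For reference, wiring this into a driver only requires swapping the output format; a minimal sketch assuming a new-API (org.apache.hadoop.mapreduce) job, with placeholder class and path names:
// Hypothetical driver snippet: only the setOutputFormatClass line is
// specific to this answer, the rest is generic job setup.
Job job = Job.getInstance(new Configuration(), "multi-folder-output");
job.setJarByClass(MyDriver.class);                     // placeholder driver class
job.setMapperClass(MyMapper.class);                    // placeholder mapper
job.setReducerClass(MyReducer.class);                  // placeholder reducer
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setOutputFormatClass(HdMultipleFileOutputFormat.class);
FileInputFormat.addInputPath(job, new Path("input"));  // placeholder paths
FileOutputFormat.setOutputPath(job, new Path("output"));
System.exit(job.waitForCompletion(true) ? 0 : 1);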

Multiple output files will be generated based on the number of reducers.
You can use hadoop dfs -getmerge to merge the outputs into a single local file.
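If you would rather do the merge from Java, Hadoop 1.x/2.x also ships FileUtil.copyMerge, which concatenates every file under a directory into one target file; a sketch with placeholder paths (note that copyMerge was removed in Hadoop 3):
// Sketch: merge all part files of a job's output directory into one file.
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
FileUtil.copyMerge(fs, new Path("job-output"),         // source directory
                   fs, new Path("merged/result.txt"),  // single destination file
                   false,                              // do not delete the source
                   conf, null);                        // no separator string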

You may try to change the output file name (the reducer output), since HDFS supports append operations only; it will then collect all temp-r-0000x files (partitions) from all reducers and put them together in one file.
Here is the class you need to create, which overrides methods in TextOutputFormat:
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class CustomNameMultipleFileOutputFormat<K, V> extends TextOutputFormat<K, V> {
    private String folderName;
    private class MultipleFilesRecordWriter extends RecordWriter<K, V> {
        private Map<String, RecordWriter<K, V>> fileNameToWriter;
        private FolderNameExtractor<K, V> fileNameExtractor;
        private TaskAttemptContext job;
        public MultipleFilesRecordWriter(FolderNameExtractor<K, V> fileNameExtractor, TaskAttemptContext job) {
            fileNameToWriter = new HashMap<String, RecordWriter<K, V>>();
            this.fileNameExtractor = fileNameExtractor;
            this.job = job;
        }
        @Override
        public void write(K key, V value) throws IOException, InterruptedException {
            String fileName = "[FOLDER_NAME_INCLUDING_SUB_DIRS]"; // fileNameExtractor.extractFolderName(key, value);
            RecordWriter<K, V> writer = fileNameToWriter.get(fileName);
            if (writer == null) {
                writer = createNewWriter(fileName, fileNameToWriter, job);
                if (writer == null) {
                    throw new IOException("Unable to create writer for path: " + fileName);
                }
            }
            writer.write(key, value);
        }
        @Override
        public void close(TaskAttemptContext context) throws IOException, InterruptedException {
            for (Entry<String, RecordWriter<K, V>> entry : fileNameToWriter.entrySet()) {
                entry.getValue().close(context);
            }
        }
    }
    private synchronized RecordWriter<K, V> createNewWriter(String folderName,
            Map<String, RecordWriter<K, V>> fileNameToWriter, TaskAttemptContext job) {
        try {
            this.folderName = folderName;
            RecordWriter<K, V> writer = super.getRecordWriter(job);
            this.folderName = null;
            fileNameToWriter.put(folderName, writer);
            return writer;
        } catch (Exception e) {
            e.printStackTrace();
            return null;
        }
    }
    @Override
    public Path getDefaultWorkFile(TaskAttemptContext context, String extension) throws IOException {
        Path path = super.getDefaultWorkFile(context, extension);
        if (folderName != null) {
            String newPath = path.getParent().toString() + "/" + folderName + "/[ONE_FILE_NAME]";
            path = new Path(newPath);
        }
        return path;
    }
    @Override
    public RecordWriter<K, V> getRecordWriter(TaskAttemptContext job) throws IOException, InterruptedException {
        return new MultipleFilesRecordWriter(getFolderNameExtractor(), job);
    }
    public FolderNameExtractor<K, V> getFolderNameExtractor() {
        return new KeyFolderNameExtractor<K, V>();
    }
    public interface FolderNameExtractor<K, V> {
        public String extractFolderName(K key, V value);
    }
    private static class KeyFolderNameExtractor<K, V> implements FolderNameExtractor<K, V> {
        public String extractFolderName(K key, V value) {
            return key.toString();
        }
    }
}
Then the Reducer/Mapper:
public static class ExtraLabReducer extends Reducer<CustomKeyComparable, Text, CustomKeyComparable, Text> {
    MultipleOutputs multipleOutputs;
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        multipleOutputs = new MultipleOutputs(context);
    }
    @Override
    public void reduce(CustomKeyComparable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text d : values) {
            multipleOutputs.write("batta", key, d, "[EXAMPLE_FILE_NAME]");
        }
    }
    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        multipleOutputs.close();
    }
}
Then in the job config:
Job job = new Job(getConf(), "ExtraLab");
job.setJarByClass(ExtraLab.class);
job.setMapperClass(ExtraLabMapper.class);
job.setReducerClass(ExtraLabReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(DoubleWritable.class);
job.setMapOutputKeyClass(CustomKeyComparable.class);
job.setMapOutputValueClass(Text.class);
job.setInputFormatClass(TextInputFormat.class);
//job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
//adding one more reducer
job.setNumReduceTasks(2);
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
MultipleOutputs.addNamedOutput(job,"batta", CustomNameMultipleFileOutputFormat.class,CustomKeyComparable.class,Text.class);
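The snippet above stops before the output path and job submission; presumably the driver ends with something like this (a sketch, not part of the original answer, and the args index is a guess):
// Assumed continuation: MultipleOutputs/LazyOutputFormat still need the
// job's base output directory, and the job has to be submitted.
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);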

Related

Add a layer of mapping on a child class

I am trying to write a class that works as a type of map. The children of this class have layers of mapping on top of basic functionality. Something like the below:
public interface MyMap<K, V> {
public V get(K key);
}
public interface Client<K, V> {
public V fetch(K key);
}
public class ComplexKey<T> {
private T _key;
public ComplexKey(T key) {
_key = key;
}
T getKey() {
return _key;
}
}
public class BasicMyMap<K, V> implements MyMap<K, V> {
private final Client<K, V> _client;
public BasicMyMap(Client<K, V> client) {
_client = client;
}
@Override
public V get(K key) {
return _client.fetch(key);
}
}
/**
 *
 * @param <MK> mapped key
 * @param <K> key
 * @param <V> value
 */
public class ComplexKeyMyMap<MK extends ComplexKey, K, V> extends BasicMyMap<MK, V> {
private Function<K, MK> _mapper;
public ComplexKeyMyMap(Client<MK, V> client, Function<K, MK> mapper) {
super(client);
_mapper = mapper;
}
public V get(K rawKey) {
return super.get(_mapper.apply(rawKey));
}
}
public static void main(String[] args) {
BasicMyMap<String, String> basicMyMap = new BasicMyMap<>(key -> "success");
assert "success".equals(basicMyMap.get("testing"));
ComplexKeyMyMap<ComplexKey, String, String> complexKeyMyMap = new ComplexKeyMyMap<>(key -> "success", (Function<Object, ComplexKey>) ComplexKey::new);
assert "success".equals(complexKeyMyMap.get("testing"));
}
In addition to the key mapping, I would like to add a layer for mapping the value that is returned as well.
So the questions are:
What is the common approach to this problem? I have encountered this pattern multiple times and have not found a great solution.
How can I achieve this such that the users of these classes can rely on just the MyMap interface definition?
Thanks for the help.

Extends a class that extends the Hadoop's Mapper

This is an example of a Map class [1] from Hadoop that extends the Mapper class. [3] is Hadoop's Mapper class.
I want to create my MyExampleMapper that extends ExampleMapper, which in turn extends Hadoop's Mapper [2]. I do this because I just want to set a property in ExampleMapper so that, when I create MyExampleMapper or other examples, I don't have to set the property myself, because I have extended ExampleMapper. Is it possible to do this?
[1] Example mapper
import org.apache.hadoop.mapreduce.Mapper;
public class ExampleMapper
extends Mapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
[2] What I want
import org.apache.hadoop.mapreduce.Mapper;
public class MyExampleMapper
extends ExampleMapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
String result = System.getProperty("job.examplemapper");
if (result.equals("true")) {
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
}
public class ExampleMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
extends Mapper{
System.setProperty("job.examplemapper", "true");
}
[3] This is the Hadoop's Mapper class
public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
public Mapper() {
}
protected void setup(Mapper.Context context) throws IOException, InterruptedException {
}
protected void map(KEYIN key, VALUEIN value, Mapper.Context context) throws IOException, InterruptedException {
context.write(key, value);
}
protected void cleanup(Mapper.Context context) throws IOException, InterruptedException {
}
public void run(Mapper.Context context) throws IOException, InterruptedException {
this.setup(context);
try {
while(context.nextKeyValue()) {
this.map(context.getCurrentKey(), context.getCurrentValue(), context);
}
} finally {
this.cleanup(context);
}
}
public class Context extends MapContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
public Context(Configuration var1, TaskAttemptID conf, RecordReader<KEYIN, VALUEIN> taskid, RecordWriter<KEYOUT, VALUEOUT> reader, OutputCommitter writer, StatusReporter committer, InputSplit reporter) throws IOException, InterruptedException {
super(conf, taskid, reader, writer, committer, reporter, split);
}
}
}
import org.apache.hadoop.mapreduce.Mapper;
public class ExampleMapper<T, X, Y, Z> extends Mapper<T, X, Y, Z> {
static {
System.setProperty("job.examplemapper", "true");
}
}
Then extend it in your program:
public class MyExampleMapper
extends ExampleMapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
String result = System.getProperty("job.examplemapper");
if (result.equals("true")) {
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
}
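Nothing special is needed in the driver for this; a minimal sketch (job name, key/value classes and the rest of the setup are assumptions) just registers the subclass as the mapper, and the static block runs when the mapper class is loaded in each task JVM:
// Sketch: register the subclass as the mapper; ExampleMapper's static
// initializer sets job.examplemapper before map() is ever called.
Job job = Job.getInstance(new Configuration(), "example-mapper-job");
job.setJarByClass(MyExampleMapper.class);
job.setMapperClass(MyExampleMapper.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);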

Hadoop mapreduce custom writable static context

I'm working on a university homework assignment and we have to use Hadoop MapReduce for it. I'm trying to create a new custom Writable, as I want to output key-value pairs as (key, (doc_name, 1)).
public class Detector {
private static final Path TEMP_PATH = new Path("temp");
private static final String LENGTH = "gramLength";
private static final String THRESHOLD = "threshold";
public class Custom implements Writable {
private Text document;
private IntWritable count;
public Custom(){
setDocument("");
setCount(0);
}
public Custom(String document, int count) {
setDocument(document);
setCount(count);
}
@Override
public void readFields(DataInput in) throws IOException {
// TODO Auto-generated method stub
document.readFields(in);
count.readFields(in);
}
@Override
public void write(DataOutput out) throws IOException {
document.write(out);
count.write(out);
}
public int getCount() {
return count.get();
}
public void setCount(int count) {
this.count = new IntWritable(count);
}
public String getDocument() {
return document.toString();
}
public void setDocument(String document) {
this.document = new Text(document);
}
}
public static class NGramMapper extends Mapper<Text, Text, Text, Text> {
private int gramLength;
private Pattern space_pattern=Pattern.compile("[ ]");
private StringBuilder gramBuilder= new StringBuilder();
@Override
protected void setup(Context context) throws IOException, InterruptedException{
gramLength=context.getConfiguration().getInt(LENGTH, 0);
}
public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
String[] tokens=space_pattern.split(value.toString());
for(int i=0;i<tokens.length;i++){
gramBuilder.setLength(0);
if(i+gramLength<=tokens.length){
for(int j=i;j<i+gramLength;j++){
gramBuilder.append(tokens[j]);
gramBuilder.append(" ");
}
context.write(new Text(gramBuilder.toString()), key);
}
}
}
}
public static class OutputReducer extends Reducer<Text, Text, Text, Custom> {
public void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
for (Text val : values) {
context.write(key,new Custom(val.toString(),1));
}
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
conf.setInt(LENGTH, Integer.parseInt(args[0]));
conf.setInt(THRESHOLD, Integer.parseInt(args[1]));
// Setup first MapReduce phase
Job job1 = Job.getInstance(conf, "WordOrder-first");
job1.setJarByClass(Detector.class);
job1.setMapperClass(NGramMapper.class);
job1.setReducerClass(OutputReducer.class);
job1.setMapOutputKeyClass(Text.class);
job1.setMapOutputValueClass(Text.class);
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(Custom.class);
job1.setInputFormatClass(WholeFileInputFormat.class);
FileInputFormat.addInputPath(job1, new Path(args[2]));
FileOutputFormat.setOutputPath(job1, new Path(args[3]));
boolean status1 = job1.waitForCompletion(true);
if (!status1) {
System.exit(1);
}
}
}
When I compile the code to a class file I get this error:
Detector.java:147: error: non-static variable this cannot be referenced from a static context
context.write(key,new Custom(val.toString(),1));
I followed different tutorials about custom Writables and my solution is the same as the others. Any suggestions?
Static fields and methods are shared with all instances. They are for values which are specific to the class rather than to a specific instance. Stay away from them as much as possible.
To solve your problem, you need to instantiate an instance (create an object) of your class so the runtime can reserve memory for the instance, or change the member you are accessing so that it has static access (not recommended!).
The keyword this references the current instance (hence the name) and not anything static, which should instead be referenced by the class name. You are using it in a static context, which is not allowed.
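In this specific code the error appears because Custom is declared as a non-static inner class of Detector, so new Custom(...) inside the static nested OutputReducer would need an enclosing Detector instance. Declaring the writable as a static nested class (a one-keyword change to the code above) is the usual fix:
// Making the nested writable static removes the implicit reference to an
// enclosing Detector instance, so the static OutputReducer (and Hadoop's
// reflection-based instantiation) can create it directly.
public static class Custom implements Writable {
    // fields, constructors, readFields/write and the getters/setters
    // stay exactly as in the question's code above
}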

How to sort word count program by value or count?

How do I sort my wordcount output by count/value rather than by the key?
In the normal case, the output is
hi 2
hw 3
wr 1
r 3
but the desired output is
wr 1
hi 2
hw 3
r 3
My code is:
public class sortingprog {
public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, IntWritable, Text> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, OutputCollector<IntWritable, Text> output, Reporter reporter) throws IOException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
output.collect(one,word);
}
}
}
public static class Reduce extends MapReduceBase implements Reducer<IntWritable,Text, IntWritable, Text> {
public void reduce(Iterator<IntWritable> key, Text value, OutputCollector<IntWritable, Text> output, Reporter reporter) throws IOException {
int sum=0;
while (key.hasNext()) {
sum+=key.next().get();
}
output.collect(new IntWritable(sum),value);
}
@Override
public void reduce(IntWritable arg0, Iterator<Text> arg1,
OutputCollector<IntWritable, Text> arg2, Reporter arg3)
throws IOException {
// TODO Auto-generated method stub
}
}
public static class GroupComparator extends WritableComparator {
protected GroupComparator() {
super(IntWritable.class, true);
}
@SuppressWarnings("rawtypes")
@Override
public int compare(WritableComparable w1, WritableComparable w2) {
IntWritable v1 = (IntWritable) w1;
IntWritable v2 = (IntWritable) w2;
return -1 * v1.compareTo(v2);
}
}
public static void main(String[] args) throws Exception {
JobConf conf = new JobConf(sortingprog.class);
conf.setJobName("wordcount");
conf.setOutputKeyClass(IntWritable.class);
conf.setOutputValueClass(Text.class);
conf.setMapperClass(Map.class);
conf.setReducerClass(Reduce.class);
conf.setOutputValueGroupingComparator(GroupComparator.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
JobClient.runJob(conf);
}
}
What you are looking for is called "secondary sort". Here you can find two tutorials on how to achieve a sort by value in your MapReduce:
http://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/
http://codingjunkie.net/secondary-sort/
You need to do the following:
Create a custom WritableComparable which uses both fields.
In the compareTo method, provide the logic for comparing the custom writable. This is called later to sort the keys, and it is the key to the whole implementation: in compareTo, just use the second field so that the values are compared.
public class CustomPair implements WritableComparable<CustomPair> {
    private String fld1;
    private int fld2;
    public CustomPair(String fld1, int fld2) {
        this.fld1 = fld1; // e.g. "wr"
        this.fld2 = fld2; // e.g. 1
    }
    @Override
    public int compareTo(CustomPair other) {
        // compare on the second field (the count), as in the original sketch
        return Integer.compare(other.fld2, this.fld2);
    }
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(fld1);
        out.writeInt(fld2);
    }
    // You have to implement the rest of the methods.
}
Let me know if you need additional help.
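For completeness, the "rest of the methods" referred to in that comment would presumably include at least a no-argument constructor (Hadoop instantiates keys by reflection) and a readFields that mirrors write(); a sketch assuming the field names from the constructor above:
// Sketch of the members hinted at by "implement the rest of the methods":
// readFields must deserialize in exactly the order write() serializes.
public CustomPair() {
}

@Override
public void readFields(DataInput in) throws IOException {
    fld1 = in.readUTF();
    fld2 = in.readInt();
}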

Send multiple arguments to reducer-MapReduce

I've written code which does something similar to SQL GROUP BY.
The dataset I took is here:
250788681419,20090906,200937,200909,619,SUNDAY,WEEKEND,ON-NET,MORNING,OUTGOING,VOICE,25078,PAY_AS_YOU_GO_PER_SECOND_PSB,SUCCESSFUL-RELEASEDBYSERVICE,17,0,1,21.25,635-10-112-30455
public class MyMap extends Mapper<LongWritable, Text, Text, DoubleWritable> {
public void map(LongWritable key, Text value, Context context) throws IOException
{
String line = value.toString();
String[] attribute=line.split(",");
double rs=Double.parseDouble(attribute[17]);
String comb=new String();
comb=attribute[5].concat(attribute[8].concat(attribute[10]));
context.write(new Text(comb),new DoubleWritable (rs));
}
}
public class MyReduce extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
protected void reduce(Text key, Iterator<DoubleWritable> values, Context context)
throws IOException, InterruptedException {
double sum = 0;
Iterator<DoubleWritable> iter=values.iterator();
while (iter.hasNext())
{
double val=iter.next().get();
sum = sum+ val;
}
context.write(key, new DoubleWritable(sum));
};
}
The Mapper sends the 17th field as its value to the reducer to sum it. Now I also want to sum the 14th field; how do I send it to the reducer?
If your data types are the same, then creating an ArrayWritable class should work for this. The class should resemble:
public class DblArrayWritable extends ArrayWritable
{
public DblArrayWritable()
{
super(DoubleWritable.class);
}
}
Your mapper class then looks like:
public class MyMap extends Mapper<LongWritable, Text, Text, DblArrayWritable>
{
public void map(LongWritable key, Text value, Context context) throws IOException
{
String line = value.toString();
String[] attribute=line.split(",");
DoubleWritable[] values = new DoubleWritable[2];
values[0] = new DoubleWritable(Double.parseDouble(attribute[14]));
values[1] = new DoubleWritable(Double.parseDouble(attribute[17]));
String comb = attribute[5].concat(attribute[8].concat(attribute[10]));
DblArrayWritable outValue = new DblArrayWritable();
outValue.set(values);
context.write(new Text(comb), outValue);
}
}
In your reducer you should now be able to iterate over the values of the DblArrayWritable.
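For example, a reducer consuming those arrays could look roughly like this (a sketch; the class name and the comma-separated Text output are my own choices, and the slot order follows the mapper above):
// Sketch: sum attribute[14] and attribute[17] separately on the reduce side.
// ArrayWritable.get() returns the stored Writable[] in the order it was set.
public class DblArraySumReducer extends Reducer<Text, DblArrayWritable, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<DblArrayWritable> values, Context context)
            throws IOException, InterruptedException {
        double sum14 = 0, sum17 = 0;
        for (DblArrayWritable value : values) {
            Writable[] fields = value.get();
            sum14 += ((DoubleWritable) fields[0]).get();   // attribute[14]
            sum17 += ((DoubleWritable) fields[1]).get();   // attribute[17]
        }
        context.write(key, new Text(sum14 + "," + sum17));
    }
}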
Based on your sample data however it looks like they may be separate types. You may be able to implement an ObjectArrayWritable class that would do the trick, but I'm not certain of this and I can't see much to support it. If it works the class would be:
public class ObjArrayWritable extends ArrayWritable
{
public ObjArrayWritable()
{
super(Object.class);
}
}
You could handle this by simply concatenating the values and passing them as Text to the reducer which would then split them again.
Another option is to implement your own Writable class. Here's a sample of how that could work:
public static class PairWritable implements Writable {
    private Double myDouble;
    private String myString;
    // Hadoop serialization (Writable interface) methods
    @Override
    public void readFields(DataInput in) throws IOException {
        myDouble = in.readDouble();
        myString = in.readUTF();
    }
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeDouble(myDouble);
        out.writeUTF(myString);
    }
    // End of implementation
    // Getter and setter methods for the myDouble and myString fields
    public void set(Double d, String s) {
        myDouble = d;
        myString = s;
    }
    public Double getDouble() {
        return myDouble;
    }
    public String getString() {
        return myString;
    }
}
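On the map side the pair would then be filled in and emitted roughly like this (a sketch reusing the field split from the earlier mapper; which attribute goes into which slot is an assumption):
// Sketch: pack one double and one string field into a single value.
PairWritable outValue = new PairWritable();
outValue.set(Double.parseDouble(attribute[17]), attribute[14]);
context.write(new Text(comb), outValue);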
