I am trying to run the open-source kNN join MapReduce algorithm hbrj on Hadoop 2.6.0, installed on my laptop (OS X) as a single-node cluster in pseudo-distributed mode. This is the code.
Mapper, reducer and the main driver:
public class RPhase2 extends Configured implements Tool
{
public static class MapClass extends MapReduceBase
implements Mapper<LongWritable, Text, IntWritable, RPhase2Value>
{
public void map(LongWritable key, Text value,
OutputCollector<IntWritable, RPhase2Value> output,
Reporter reporter) throws IOException
{
String line = value.toString();
String[] parts = line.split(" +");
// key format <rid1>
IntWritable mapKey = new IntWritable(Integer.valueOf(parts[0]));
// value format <rid2, dist>
RPhase2Value np2v = new RPhase2Value(Integer.valueOf(parts[1]), Float.valueOf(parts[2]));
System.out.println("############### key: " + mapKey.toString() + " np2v: " + np2v.toString());
output.collect(mapKey, np2v);
}
}
public static class Reduce extends MapReduceBase
implements Reducer<IntWritable, RPhase2Value, NullWritable, Text>
{
int numberOfPartition;
int knn;
class Record {...}
class RecordComparator implements Comparator<Record> {...}
public void configure(JobConf job)
{
numberOfPartition = job.getInt("numberOfPartition", 2);
knn = job.getInt("knn", 3);
System.out.println("########## configuring!");
}
public void reduce(IntWritable key, Iterator<RPhase2Value> values,
OutputCollector<NullWritable, Text> output,
Reporter reporter) throws IOException
{
//initialize the pq
RecordComparator rc = new RecordComparator();
PriorityQueue<Record> pq = new PriorityQueue<Record>(knn + 1, rc);
System.out.println("Phase 2 is at reduce");
System.out.println("########## key: " + key.toString());
// For each record we have a reduce task
// value format <rid1, rid2, dist>
while (values.hasNext())
{
RPhase2Value np2v = values.next();
int id2 = np2v.getFirst().get();
float dist = np2v.getSecond().get();
Record record = new Record(id2, dist);
pq.add(record);
if (pq.size() > knn)
pq.poll();
}
while(pq.size() > 0)
{
output.collect(NullWritable.get(), new Text(key.toString() + " " + pq.poll().toString()));
//break; // only output the first record
}
} // reduce
} // Reducer
public int run(String[] args) throws Exception {
JobConf conf = new JobConf(getConf(), RPhase2.class);
conf.setJobName("RPhase2");
conf.setMapOutputKeyClass(IntWritable.class);
conf.setMapOutputValueClass(RPhase2Value.class);
conf.setOutputKeyClass(NullWritable.class);
conf.setOutputValueClass(Text.class);
conf.setMapperClass(MapClass.class);
conf.setReducerClass(Reduce.class);
int numberOfPartition = 0;
List<String> other_args = new ArrayList<String>();
for(int i = 0; i < args.length; ++i)
{
try {
if ("-m".equals(args[i])) {
//conf.setNumMapTasks(Integer.parseInt(args[++i]));
++i;
} else if ("-r".equals(args[i])) {
conf.setNumReduceTasks(Integer.parseInt(args[++i]));
} else if ("-p".equals(args[i])) {
numberOfPartition = Integer.parseInt(args[++i]);
conf.setInt("numberOfPartition", numberOfPartition);
} else if ("-k".equals(args[i])) {
int knn = Integer.parseInt(args[++i]);
conf.setInt("knn", knn);
System.out.println(knn + "~ hi");
} else {
other_args.add(args[i]);
}
conf.setNumReduceTasks(numberOfPartition * numberOfPartition);
//conf.setNumReduceTasks(1);
} catch (NumberFormatException except) {
System.out.println("ERROR: Integer expected instead of " + args[i]);
return printUsage();
} catch (ArrayIndexOutOfBoundsException except) {
System.out.println("ERROR: Required parameter missing from " + args[i-1]);
return printUsage();
}
}
FileInputFormat.setInputPaths(conf, other_args.get(0));
FileOutputFormat.setOutputPath(conf, new Path(other_args.get(1)));
JobClient.runJob(conf);
return 0;
}
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new RPhase2(), args);
}
} // RPhase2
When I run this, the mapper completes successfully, but then the job terminates abruptly and the reducer is never instantiated. Moreover, no errors are ever printed (not even in the log files). I know this also because the print statements in the reducer's configure() never get printed. Output:
15/06/15 14:00:37 INFO mapred.LocalJobRunner: map task executor complete.
15/06/15 14:00:38 INFO mapreduce.Job: map 100% reduce 0%
15/06/15 14:00:38 INFO mapreduce.Job: Job job_local833125918_0001 completed successfully
15/06/15 14:00:38 INFO mapreduce.Job: Counters: 20
File System Counters
FILE: Number of bytes read=12505456
FILE: Number of bytes written=14977422
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=11408
HDFS: Number of bytes written=8724
HDFS: Number of read operations=216
HDFS: Number of large read operations=0
HDFS: Number of write operations=99
Map-Reduce Framework
Map input records=60
Map output records=60
Input split bytes=963
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=14
Total committed heap usage (bytes)=1717567488
File Input Format Counters
Bytes Read=2153
File Output Format Counters
Bytes Written=1645
What I have done so far:
I have been looking at similar questions, and I found that the most frequent problem is not setting the map output classes when the map output types differ from the reducer output types, which is done in the code above: conf.setMapOutputKeyClass(Class); conf.setMapOutputValueClass(Class).
In another post I found a suggestion to change reduce(..., Iterator <...>, ...) to reduce(..., Iterable <...>, ...), which gave me trouble compiling: I could no longer use the .hasNext() and .next() methods, and I got this error:
error: Reduce is not abstract and does not override abstract method reduce(IntWritable,Iterator,OutputCollector,Reporter) in Reducer
If anyone has any hints or suggestions on what I can try in order to find the issue, I would be very appreciative!
Just a note: I have posted a question about this problem before (Hadoop kNN join algorithm stuck at map 100% reduce 0%), but it did not get enough attention, so I wanted to re-ask it from a different perspective. You can use that link for more details on my log files.
I have figured out the problem, and it was something silly. If you notice in the code above, numberOfPartition is set to 0 before the arguments are read, and the number of reducers is set to numberOfPartition * numberOfPartition. I, as the user, did not change the number-of-partitions parameter (mostly because I simply copy-pasted the argument line from the provided README), and that is why the reducer never even started.
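For anyone who hits the same thing, one way to make this failure loud instead of silent is to set the reducer count once, after the argument loop, and bail out if it is still zero. This is only a sketch against the run() method above (reusing its printUsage() helper), not the project's official fix:
// after the argument-parsing loop in run():
conf.setInt("numberOfPartition", numberOfPartition);
conf.setNumReduceTasks(numberOfPartition * numberOfPartition);
if (conf.getNumReduceTasks() < 1) {
    System.out.println("ERROR: " + conf.getNumReduceTasks()
            + " reduce tasks configured; did you forget -p <numberOfPartition>?");
    return printUsage();
}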
Related
Looking for some help please.
The MapReduce job executes but no output is produced. It is a simple program to count the total number of words in a file. I started very simply, to ensure that it works, with a txt file which has one row with the following content:
tiny country second largest country second tiny food exporter second
second second
Unfortunately it does not; any suggestion about where to look next would be appreciated. I have cut and pasted the last bit of the output log.
File System Counters
FILE: Number of bytes read=890
FILE: Number of bytes written=947710
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=1
Map output records=1
Map output bytes=87
Map output materialized bytes=95
Input split bytes=198
Combine input records=0
Combine output records=0
Reduce input groups=1
Reduce shuffle bytes=95
Reduce input records=1
Reduce output records=1
Spilled Records=2
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=7
Total committed heap usage (bytes)=468713472
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=82
File Output Format Counters
Bytes Written=97
Process finished with exit code 0
public class Map extends Mapper<LongWritable, Text, Text,
IntWritable>{
@Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String line = value.toString();
String[] datas = line.split("\t");
for(String data: datas) {
Text outputKey = new Text(data);
IntWritable outputValue = new IntWritable();
context.write(outputKey, outputValue);
}
}
}
public class Reduce extends Reducer<Text, IntWritable, Text,
IntWritable> {
@Override
public void reduce(final Text outputKey,
final Iterable<IntWritable> values,
final Context context)
throws IOException, InterruptedException {
int sum = 0;
for(IntWritable value : values)
{
sum += value.get();
}
context.write(outputKey, new IntWritable(sum));
}
}
public class Main extends Configured implements Tool {
@Override
public int run(String[] args) throws Exception {
Job job = Job.getInstance(getConf());
job.setJobName("WordCount");
job.setJarByClass(Main.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
Path inputFilePath = new Path("/Users/francesco/input/input.txt");
Path outputFilePath = new Path("/Users/francesco/output/first");
FileInputFormat.addInputPath(job, inputFilePath);
FileOutputFormat.setOutputPath(job, outputFilePath);
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String[] args) throws Exception{
int exitCode = ToolRunner.run(new Main(), args);
System.exit(exitCode);
}
}
You don't set any IntWritable value to emit in your mapper:
IntWritable outputValue = new IntWritable();
You need to replace it with:
IntWritable outputValue = new IntWritable(1);
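For completeness, here is roughly what the mapper looks like with that change applied. This is only a sketch: it also assumes a whitespace split ("\s+") instead of the tab split in the question, since the sample line is space-separated.
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1); // the count emitted per word

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // assumption: split on whitespace, since the sample input is space-separated
        for (String word : value.toString().split("\\s+")) {
            context.write(new Text(word), ONE);
        }
    }
}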
I have two mapper classes which simply create key-value pairs; my main logic is supposed to be in the reducer part. I am trying to compare data from two different text files.
My mapper class is
public static class Map extends
Mapper<LongWritable, Text, Text, Text> {
private String ky,vl="a";
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
String tokens[] = line.split("\t");
vl = tokens[1].trim();
ky = tokens[2].trim();
//sending key-value pairs to the reducer
context.write(new Text(ky),new Text(vl));
}
}
My second mapper is
public static class Map2 extends
Mapper<LongWritable, Text, Text, Text> {
private String ky2,vl2 = "a";
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
String tokens[] = line.split("\t");
vl2 = tokens[1].trim();
ky2 = tokens[2].trim();
//sending key-value pairs to the reducer
context.write(new Text(ky2),new Text(vl2));
}
}
Reducer class is
public static class Reduce extends
Reducer<Text, Text, Text, Text> {
private String rslt = "l";
public void reduce(Text key, Iterator<Text> values,Context context) throws IOException, InterruptedException {
int count = 0;
while(values.hasNext()){
count++;
}
rslt = Integer.toString(count);
if(count>1){
context.write(key,new Text(rslt));
}
}
}
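One thing worth double-checking against the counters below: with the new org.apache.hadoop.mapreduce API, reduce() takes an Iterable rather than an Iterator, so a method with the signature above does not override Reducer.reduce and Hadoop runs the default identity reduce instead (which would explain Reduce output records = 10783). A minimal sketch of the expected signature, keeping the counting idea and assuming the usual Text/IOException imports:
public static class Reduce extends org.apache.hadoop.mapreduce.Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        int count = 0;
        for (Text ignored : values) { // the for-each advances the iterator; hasNext() alone never does
            count++;
        }
        if (count > 1) {
            context.write(key, new Text(Integer.toString(count)));
        }
    }
}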
And my main method is
Configuration conf = new Configuration();
Job job = new Job(conf);
job.setJarByClass(CompareTwoFiles.class);
job.setJobName("Compare Two Files and Identify the Difference");
FileOutputFormat.setOutputPath(job, new Path(args[2]));
job.setReducerClass(Reduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
MultipleInputs.addInputPath(job, new Path(args[0]),
TextInputFormat.class, Map.class);
MultipleInputs.addInputPath(job, new Path(args[1]),
TextInputFormat.class, Map2.class);
job.waitForCompletion(true);
Output:
File System Counters
FILE: Number of bytes read=361621
FILE: Number of bytes written=1501806
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=552085
HDFS: Number of bytes written=150962
HDFS: Number of read operations=28
HDFS: Number of large read operations=0
HDFS: Number of write operations=5
Map-Reduce Framework
Map input records=10783
Map output records=10783
Map output bytes=150962
Map output materialized bytes=172540
Input split bytes=507
Combine input records=0
Combine output records=0
Reduce input groups=7985
Reduce shuffle bytes=172540
Reduce input records=10783
Reduce output records=10783
Spilled Records=21566
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=12
Total committed heap usage (bytes)=928514048
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=150962
Hadoop n00b here.
I have installed Hadoop 2.6.0 on a server where I have stored twelve json files I want to perform MapReduce operations on. These files are large, ranging from 2-5 gigabytes each.
The structure of the JSON files is an array of JSON objects. Snippet of two objects below:
[{"campus":"Gløshaugen","building":"Varmeteknisk og Kjelhuset","floor":"4. etasje","timestamp":1412121618,"dayOfWeek":3,"hourOfDay":2,"latitude":63.419161638078066,"salt_timestamp":1412121602,"longitude":10.404867443910122,"id":"961","accuracy":56.083199914753536},{"campus":"Gløshaugen","building":"IT-Vest","floor":"2. etasje","timestamp":1412121612,"dayOfWeek":3,"hourOfDay":2,"latitude":63.41709424828986,"salt_timestamp":1412121602,"longitude":10.402167488838765,"id":"982","accuracy":7.315199988880896}]
I want to perform MapReduce operations based on the fields building and timestamp, at least in the beginning until I get the hang of this, e.g. mapReduce the data where building equals a parameter and timestamp is greater than X and less than Y. The relevant fields I need after the reduce process are latitude and longitude.
I know there are different tools (Hive, HBase, Pig, Spark, etc.) you can use with Hadoop that might make this easier, but my boss wants an evaluation of the MapReduce performance of standalone Hadoop.
So far I have created the main class triggering the map and reduce classes, implemented what I believe is a start in the map class, but I'm stuck on the reduce class. Below is what I have so far.
public class Hadoop {
public static void main(String[] args) throws Exception {
try {
Configuration conf = new Configuration();
Job job = new Job(conf, "maze");
job.setJarByClass(Hadoop.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setInputFormatClass(KeyValueTextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
Path inPath = new Path("hdfs://xxx.xxx.106.23:50070/data.json");
FileInputFormat.addInputPath(job, inPath);
boolean result = job.waitForCompletion(true);
System.exit(result ? 0 : 1);
}catch (Exception e){
e.printStackTrace();
}
}
}
Mapper:
public class Map extends org.apache.hadoop.mapreduce.Mapper{
private Text word = new Text();
public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
try {
JSONObject jo = new JSONObject(value.toString());
String latitude = jo.getString("latitude");
String longitude = jo.getString("longitude");
long timestamp = jo.getLong("timestamp");
String building = jo.getString("building");
StringBuilder sb = new StringBuilder();
sb.append(latitude);
sb.append("/");
sb.append(longitude);
sb.append("/");
sb.append(timestamp);
sb.append("/");
sb.append(building);
sb.append("/");
context.write(new Text(sb.toString()),value);
}catch (JSONException e){
e.printStackTrace();
}
}
}
Reducer:
public class Reducer extends org.apache.hadoop.mapreduce.Reducer{
private Text result = new Text();
protected void reduce(Text key, Iterable<Text> values, org.apache.hadoop.mapreduce.Reducer.Context context) throws IOException, InterruptedException {
}
}
UPDATE
private static String BUILDING;
private static int tsFrom;
private static int tsTo;
public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
try {
JSONArray ja = new JSONArray(key.toString());
StringBuilder sb;
for(int n = 0; n < ja.length(); n++)
{
JSONObject jo = ja.getJSONObject(n);
String latitude = jo.getString("latitude");
String longitude = jo.getString("longitude");
int timestamp = jo.getInt("timestamp");
String building = jo.getString("building");
if (BUILDING.equals(building) && timestamp < tsTo && timestamp > tsFrom) {
sb = new StringBuilder();
sb.append(latitude);
sb.append("/");
sb.append(longitude);
context.write(new Text(sb.toString()), value);
}
}
}catch (JSONException e){
e.printStackTrace();
}
}
@Override
public void configure(JobConf jobConf) {
System.out.println("configure");
BUILDING = jobConf.get("BUILDING");
tsFrom = Integer.parseInt(jobConf.get("TSFROM"));
tsTo = Integer.parseInt(jobConf.get("TSTO"));
}
This works for a small data set. Since I am working with LARGE json files, I get a Java heap space exception. Since I am not familiar with Hadoop, I'm having trouble understanding how MapReduce can read the data without getting an OutOfMemoryError.
If you simply want a list of LONG/LAT under the constraint of building = something and timestamp = something else, this is a simple filter operation; for this you do not need a reducer. In the mapper you should check whether the current JSON satisfies the condition, and only then write it out to the context. If it fails to satisfy the condition, you don't want it in the output.
The output should be LONG/LAT (no building/timestamp, unless you want them there as well)
If no reducer is present, the output of the mappers is the output of the job, which in your case is sufficient.
As for the code:
your driver should pass the building ID and the timestamp range to the mapper, using the job configuration. Anything you put there will be available to all your mappers.
Configuration conf = new Configuration();
conf.set("Building", "123");
conf.set("TSFROM", "12300000000");
conf.set("TSTO", "12400000000");
Job job = new Job(conf);
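Since no reducer is needed for this filter, you would presumably also switch the reduce phase off in the driver (a small addition, not part of the original snippet):
job.setNumReduceTasks(0); // map output becomes the job output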
your mapper class needs to implement JobConfigurable.configure; in there you will read from the configuration object into local static variables
private static String BUILDING;
private static Long tsFrom;
private static Long tsTo;
public void configure(JobConf job) {
BUILDING = job.get("Building");
tsFrom = Long.parseLong(job.get("TSFROM"));
tsTo = Long.parseLong(job.get("TSTO"));
}
Now, your map function needs to check:
if (BUILDING.equals(building) && timestamp < tsTo && timestamp > tsFrom) {
sb = new StringBuilder();
sb.append(latitude);
sb.append("/");
sb.append(longitude);
context.write(new Text(sb.toString()), value);
}
This means any rows belonging to other buildings, or with timestamps outside the range, will not appear in the result.
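If you are on the new org.apache.hadoop.mapreduce API (which the driver in the question uses), the equivalent of JobConfigurable.configure is the setup(Context) hook. Below is a minimal sketch under that assumption; the class name FilterMapper is made up, the configuration keys match the ones above, and the JSON handling is purely illustrative:
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.json.JSONException;
import org.json.JSONObject;

public class FilterMapper extends Mapper<Text, Text, Text, Text> {

    private String building;
    private long tsFrom;
    private long tsTo;

    @Override
    protected void setup(Context context) {
        // read the filter parameters the driver put into the job configuration
        building = context.getConfiguration().get("Building");
        tsFrom = Long.parseLong(context.getConfiguration().get("TSFROM"));
        tsTo = Long.parseLong(context.getConfiguration().get("TSTO"));
    }

    @Override
    public void map(Text key, Text value, Context context)
            throws IOException, InterruptedException {
        try {
            JSONObject jo = new JSONObject(value.toString());
            long timestamp = jo.getLong("timestamp");
            if (building.equals(jo.getString("building"))
                    && timestamp > tsFrom && timestamp < tsTo) {
                // emit only latitude/longitude; no reducer is needed for a filter
                String latLong = jo.getDouble("latitude") + "/" + jo.getDouble("longitude");
                context.write(new Text(latLong), value);
            }
        } catch (JSONException e) {
            // skip records this sketch cannot parse
        }
    }
}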
I have a MapReduce program that runs perfectly in stand-alone mode, but when I run it on the Hadoop cluster at my school, an exception happens in the reducer. I have no clue what exception it is. I know this because when I put a try/catch in the reducer, the job passes but produces empty output; when I don't have the try/catch, the job fails. Since it is a school cluster, I do not have access to the job trackers or other files; anything I can find out has to be found programmatically. Is there a way I can find out what exception happened on Hadoop at run time?
Following are snippets of my code
public static class RowMPreMap extends MapReduceBase implements
Mapper<LongWritable, Text, Text, Text> {
private Text keyText = new Text();
private Text valText = new Text();
public void map(LongWritable key, Text value,
OutputCollector<Text, Text> output, Reporter reporter)
throws IOException {
// Input: (lineNo, lineContent)
// Split each line using seperator based on the dataset.
String line[] = null;
line = value.toString().split(Settings.INPUT_SEPERATOR);
keyText.set(line[0]);
valText.set(line[1] + "," + line[2]);
// Output: (userid, "movieid,rating")
output.collect(keyText, valText);
}
}
public static class RowMPreReduce extends MapReduceBase implements
Reducer<Text, Text, Text, Text> {
private Text valText = new Text();
public void reduce(Text key, Iterator<Text> values,
OutputCollector<Text, Text> output, Reporter reporter)
throws IOException {
// Input: (userid, List<movieid, rating>)
float sum = 0.0F;
int totalRatingCount = 0;
ArrayList<String> movieID = new ArrayList<String>();
ArrayList<Float> rating = new ArrayList<Float>();
while (values.hasNext()) {
String[] movieRatingPair = values.next().toString().split(",");
movieID.add(movieRatingPair[0]);
Float parseRating = Float.parseFloat(movieRatingPair[1]);
rating.add(parseRating);
sum += parseRating;
totalRatingCount++;
}
float average = ((float) sum) / totalRatingCount;
for (int i = 0; i < movieID.size(); i++) {
valText.set("M " + key.toString() + " " + movieID.get(i) + " "
+ (rating.get(i) - average));
output.collect(null, valText);
}
// Output: (null, <M userid, movieid, normalizedrating>)
}
}
The exception happens in the above reducer. Below is the configuration:
public void normalizeM() throws IOException, InterruptedException {
JobConf conf1 = new JobConf(UVDriver.class);
conf1.setMapperClass(RowMPreMap.class);
conf1.setReducerClass(RowMPreReduce.class);
conf1.setJarByClass(UVDriver.class);
conf1.setMapOutputKeyClass(Text.class);
conf1.setMapOutputValueClass(Text.class);
conf1.setOutputKeyClass(Text.class);
conf1.setOutputValueClass(Text.class);
conf1.setKeepFailedTaskFiles(true);
conf1.setInputFormat(TextInputFormat.class);
conf1.setOutputFormat(TextOutputFormat.class);
FileInputFormat.addInputPath(conf1, new Path(Settings.INPUT_PATH));
FileOutputFormat.setOutputPath(conf1, new Path(Settings.TEMP_PATH + "/"
+ Settings.NORMALIZE_DATA_PATH_TEMP));
JobConf conf2 = new JobConf(UVDriver.class);
conf2.setMapperClass(ColMPreMap.class);
conf2.setReducerClass(ColMPreReduce.class);
conf2.setJarByClass(UVDriver.class);
conf2.setMapOutputKeyClass(Text.class);
conf2.setMapOutputValueClass(Text.class);
conf2.setOutputKeyClass(Text.class);
conf2.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(conf2, new Path(Settings.TEMP_PATH + "/"
+ Settings.NORMALIZE_DATA_PATH_TEMP));
FileOutputFormat.setOutputPath(conf2, new Path(Settings.TEMP_PATH + "/"
+ Settings.NORMALIZE_DATA_PATH));
Job job1 = new Job(conf1);
Job job2 = new Job(conf2);
JobControl jobControl = new JobControl("jobControl");
jobControl.addJob(job1);
jobControl.addJob(job2);
job2.addDependingJob(job1);
handleRun(jobControl);
}
I caught the exception in the reducer and wrote the stack trace to a file on the file system. I know this is the dirtiest possible way of doing this, but I have no other option at this point. The following is the code, in case it helps anyone in the future; put it in the catch block.
String valueString = "";
while (values.hasNext()) {
valueString += values.next().toString();
}
StringWriter sw = new StringWriter();
e.printStackTrace(new PrintWriter(sw));
String exceptionAsString = sw.toString();
Path pt = new Path("errorfile");
FileSystem fs = FileSystem.get(new Configuration());
BufferedWriter br = new BufferedWriter(new OutputStreamWriter(fs.create(pt,true)));
br.write(exceptionAsString + "\nkey: " + key.toString() + "\nvalues: " + valueString);
br.close();
Suggestions for doing this in a cleaner way are welcome.
On a side note, I eventually found it is a NumberFormatException. Counters would not have helped me identify this. Later I realized that the input is being split in a different fashion in stand-alone mode and on the cluster; I have yet to find the reason.
Even if you don't have access to the server, you can get the counters for a job:
Counters counters = job.getCounters();
and dump the set of counters to your local console. These counters will show, among other things, the counts for the number of records input to and written from the mappers and reducers. The counters that have value zero indicate the problem location in your workflow. You can instrument your own counters to help debug/monitor the flow.
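A rough sketch of that, staying with the old mapred API that the jobs above already use (the enum and its counter name are made up for this example, and JobClient.runJob is used instead of JobControl just to show where the counters come from):
// a hypothetical counter group for this job
enum NormalizeCounters { MALFORMED_RECORDS }

// inside the reducer (for example in a catch block), bump the counter via the Reporter:
reporter.incrCounter(NormalizeCounters.MALFORMED_RECORDS, 1);

// in the driver, after running the job, read the counters back and print them:
RunningJob running = JobClient.runJob(conf1);
long malformed = running.getCounters().getCounter(NormalizeCounters.MALFORMED_RECORDS);
System.out.println("Malformed records: " + malformed);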
I'm new to Hadoop.
I have a MapReduce job which is supposed to get its input from HDFS and write the output of the reducer to HBase. I haven't found any good example.
Here's the code. The error when running this example is "Type mismatch in map, expected ImmutableBytesWritable received IntWritable".
Mapper Class
public static class AddValueMapper extends Mapper < LongWritable,
Text, ImmutableBytesWritable, IntWritable > {
/* input <key, line number : value, full line>
* output <key, log key : value >*/
public void map(LongWritable key, Text value,
Context context)throws IOException,
InterruptedException {
byte[] key;
int value, pos = 0;
String line = value.toString();
String p1 , p2 = null;
pos = line.indexOf("=");
//Key part
p1 = line.substring(0, pos);
p1 = p1.trim();
key = Bytes.toBytes(p1);
//Value part
p2 = line.substring(pos +1);
p2 = p2.trim();
value = Integer.parseInt(p2);
context.write(new ImmutableBytesWritable(key),new IntWritable(value));
}
}
Reducer Class
public static class AddValuesReducer extends TableReducer<
ImmutableBytesWritable, IntWritable, ImmutableBytesWritable> {
public void reduce(ImmutableBytesWritable key, Iterable<IntWritable> values,
Context context) throws IOException, InterruptedException {
long total =0;
// Loop values
while(values.iterator().hasNext()){
total += values.iterator().next().get();
}
// Put to HBase
Put put = new Put(key.get());
put.add(Bytes.toBytes("data"), Bytes.toBytes("total"), Bytes.toBytes(total));
context.write(key, put);
}
}
I had a similar job with HDFS only, and it worked fine.
Edited 18-06-2013: the college project finished successfully two years ago. For the job configuration (driver part) check the correct answer.
Here is the code which will solve your problem
Driver
HBaseConfiguration conf = HBaseConfiguration.create();
Job job = new Job(conf,"JOB_NAME");
job.setJarByClass(yourclass.class);
job.setMapperClass(yourMapper.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
FileInputFormat.setInputPaths(job, new Path(inputPath));
TableMapReduceUtil.initTableReducerJob(TABLE,
yourReducer.class, job);
job.setReducerClass(yourReducer.class);
job.waitForCompletion(true);
Mapper & Reducer
class yourMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
// @Override map()
}
class yourReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
// @Override reduce()
}
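For concreteness, the reduce() inside yourReducer could be filled in much like the question's own reducer. This is only a sketch against the older HBase client API; the "data"/"total" column family and qualifier come from the question:
class yourReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        long total = 0;
        for (IntWritable value : values) {
            total += value.get();
        }
        byte[] row = Bytes.toBytes(key.toString());
        Put put = new Put(row);
        // write the sum into data:total for this row key
        put.add(Bytes.toBytes("data"), Bytes.toBytes("total"), Bytes.toBytes(total));
        context.write(new ImmutableBytesWritable(row), put);
    }
}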
Not sure why the HDFS version works: normally you have to set the input format for the job, and FileInputFormat is an abstract class. Perhaps you left some lines out, such as:
job.setInputFormatClass(TextInputFormat.class);
The best and fastest way to bulk load data into HBase is to use HFileOutputFormat and the CompleteBulkLoad utility.
You will find sample code here:
Hope this will be useful :)
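A rough sketch of that approach, assuming the older HBase mapreduce API used elsewhere in this thread; the table name and output directory are placeholders:
// configure the job to write HFiles; the mapper must emit ImmutableBytesWritable keys
// and Put (or KeyValue) values for this to work
HTable table = new HTable(conf, "your_table");                 // placeholder table name
HFileOutputFormat.configureIncrementalLoad(job, table);        // wires partitioner, reducer and output format
FileOutputFormat.setOutputPath(job, new Path("/tmp/hfiles"));  // placeholder output dir
job.waitForCompletion(true);
// then load the generated HFiles into the table, e.g. with the completebulkload tool:
//   hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/hfiles your_table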
public void map(LongWritable key, Text value,
Context context)throws IOException,
InterruptedException {
Change this to ImmutableBytesWritable, IntWritable.
I am not sure... hope it works.