Error when using MultithreadMapper - java

I have the same problem as described in this question (Type mismatch in key from map when replacing Mapper with MultithreadMapper), but the answer there does not work for me.
The error message I get looks like this:
13/09/17 10:37:38 INFO mapred.JobClient: Task Id : attempt_201309170943_0006_m_000000_0, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1019)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:690)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Here is my main method:
public static int main(String[] args) throws Exception {
    Configuration config = new Configuration();
    if (args.length != 5) {
        System.out.println("Invalid Arguments");
        print_usage();
        throw new IllegalArgumentException();
    }
    config.set("myfirstdata", args[0]);
    config.set("myseconddata", args[1]);
    config.set("mythirddata", args[2]);
    config.set("mykeyattribute", "GK");
    config.setInt("myy", 50);
    config.setInt("myx", 49);
    // additional attributes
    config.setInt("myobjectid", 1);
    config.setInt("myplz", 3);
    config.setInt("mygenm", 4);
    config.setInt("mystnm", 6);
    config.setInt("myhsnr", 7);
    config.set("mapred.textoutputformat.separator", ";");

    Job job = new Job(config);
    job.setJobName("MySample");

    // set the map output types for the job
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    // set the final output types for the job
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    MultithreadedMapper.setMapperClass(job, MyMapper.class);
    job.setReducerClass(MyReducer.class);
    // In our case, the combiner is the same as the reducer. This is possible
    // for reducers that are both commutative and associative.
    job.setCombinerClass(MyReducer.class);

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    TextInputFormat.setInputPaths(job, new Path(args[3]));
    TextOutputFormat.setOutputPath(job, new Path(args[4]));

    job.setJarByClass(MySampleDriver.class);
    MultithreadedMapper.setNumberOfThreads(job, 2);
    return job.waitForCompletion(true) ? 0 : 1;
}
The mapper code looks like this:
public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    ...

    /**
     * Sets up mapper with filter geometry provided as argument[0] to the jar
     */
    @Override
    public void setup(Context context) {
        ...
    }

    @Override
    public void map(LongWritable key, Text val, Context context)
            throws IOException, InterruptedException {
        ...
        // We know that the first line of the CSV is just headers, so at byte
        // offset 0 we can just return
        if (key.get() == 0)
            return;

        String line = val.toString();
        String[] values = line.split(";");
        float latitude = Float.parseFloat(values[latitudeIndex]);
        float longitude = Float.parseFloat(values[longitudeIndex]);
        ...
        // Create our Point directly from longitude and latitude
        Point point = new Point(longitude, latitude);
        IntWritable one = new IntWritable();
        if (...) {
            int name = ...
            one.set(name);
            String out = ...
            context.write(new Text(out), one);
        } else {
            String out = ...
            context.write(new Text(out), new IntWritable(-1));
        }
    }
}

You forgot to set the job's mapper class. You need to add job.setMapperClass(MultithreadedMapper.class); to your code.
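For reference, a minimal sketch of the multithreaded wiring in the driver, reusing the MyMapper and thread count from the question: the job-level mapper is the MultithreadedMapper wrapper, and the real map logic is registered through its static helpers. Without the first line the job falls back to the default identity Mapper, which passes the LongWritable file offsets straight through as keys, and that is exactly the type mismatch in the error.

// Wrap MyMapper in MultithreadedMapper: the wrapper is the job's mapper,
// the actual map logic and thread count are set via the static helpers.
job.setMapperClass(MultithreadedMapper.class);
MultithreadedMapper.setMapperClass(job, MyMapper.class);
MultithreadedMapper.setNumberOfThreads(job, 2);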

Related

Custom partitioner in Hadoop: error java.lang.NoSuchMethodException: <init>()

I am trying to write a custom partitioner that allocates each unique key to a single reducer. This was after the default HashPartitioner failed:
Alternative to the default hashpartioner provided with hadoop
I keep getting the following error. From what I can tell after some research, it has something to do with the constructor not receiving its arguments, but in this context, with Hadoop, aren't the arguments passed automatically by the framework? I can't find an error in the code.
18/04/20 17:06:51 INFO mapred.JobClient: Task Id : attempt_201804201340_0007_m_000000_1, Status : FAILED
java.lang.RuntimeException: java.lang.NoSuchMethodException: biA3pipepart$parti.<init>()
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:587)
This is my partitioner:
public class Parti extends Partitioner<Text, Text> {
    String partititonkey;
    int result = 0;

    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        String partitionKey = key.toString();
        if (numPartitions >= 9) {
            if (partitionKey.charAt(0) == '0') {
                if (partitionKey.charAt(2) == '0')
                    result = 0;
                else if (partitionKey.charAt(2) == '1')
                    result = 1;
                else
                    result = 2;
            } else if (partitionKey.charAt(0) == '1') {
                if (partitionKey.charAt(2) == '0')
                    result = 3;
                else if (partitionKey.charAt(2) == '1')
                    result = 4;
                else
                    result = 5;
            } else if (partitionKey.charAt(0) == '2') {
                if (partitionKey.charAt(2) == '0')
                    result = 6;
                else if (partitionKey.charAt(2) == '1')
                    result = 7;
                else
                    result = 8;
            }
        } else {
            result = 0;
        }
        return result;
    } // close method
} // close class
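As an aside, the branch ladder could be collapsed into a small formula. This is only a sketch and assumes every key has the exact form "d,d" with digits 0 to 2, as in the keys listed further below:

// Hypothetical simplification: map "r,c" to partition r * 3 + c,
// falling back to partition 0 when fewer than 9 reducers are configured.
int row = partitionKey.charAt(0) - '0';
int col = partitionKey.charAt(2) - '0';
return (numPartitions >= 9) ? (row * 3 + col) : 0;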
My mapper signature
public static class JoinsMap extends Mapper<LongWritable,Text,Text,Text>{
public void Map(LongWritable key, Text value, Context context) throws IOException, InterruptedException{
My reducer signature
public static class JoinsReduce extends Reducer<Text,Text,Text,Text>{
public void Reduce (Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
main class:
public static void main(String[] args) throws Exception {
    Configuration conf1 = new Configuration();
    Job job1 = new Job(conf1, "biA3pipepart");
    job1.setJarByClass(biA3pipepart.class);
    job1.setNumReduceTasks(9); //***
    job1.setOutputKeyClass(Text.class);
    job1.setOutputValueClass(Text.class);
    job1.setMapperClass(JoinsMap.class);
    job1.setReducerClass(JoinsReduce.class);
    job1.setInputFormatClass(TextInputFormat.class);
    job1.setOutputFormatClass(TextOutputFormat.class);
    job1.setPartitionerClass(Parti.class); //+++
    // inputs to map.
    FileInputFormat.addInputPath(job1, new Path(args[0]));
    // single output from reducer.
    FileOutputFormat.setOutputPath(job1, new Path(args[1]));
    job1.waitForCompletion(true);
}
The keys emitted by the Mapper are the following:
0,0
0,1
0,2
1,0
1,1
1,2
2,0
2,1
2,2
and the Reducer only writes keys and values it receives.
SOLVED
I just added static to my Parti class, like the mapper and reducer classes, as suggested in a comment by user238607.
public static class Parti extends Partitioner<Text, Text> {
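This works because Hadoop's ReflectionUtils.newInstance creates the partitioner through a no-argument constructor. A non-static inner class has no such constructor: its constructor implicitly takes the enclosing instance, which is exactly what the biA3pipepart$parti.<init>() message is complaining about. A minimal standalone sketch (hypothetical class names) that reproduces the difference:

public class Outer {
    class InnerPartitioner { }          // compiler generates InnerPartitioner(Outer) only
    static class NestedPartitioner { }  // has a genuine no-arg constructor

    public static void main(String[] args) throws Exception {
        // What ReflectionUtils.newInstance effectively needs: a no-arg constructor.
        NestedPartitioner ok = NestedPartitioner.class.getDeclaredConstructor().newInstance();
        // The next line throws java.lang.NoSuchMethodException: Outer$InnerPartitioner.<init>()
        InnerPartitioner broken = InnerPartitioner.class.getDeclaredConstructor().newInstance();
    }
}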

mapreduce to read hive table and write to hdfs location with context

I am looking for a MapReduce program that reads from a Hive table and writes the first column value of each record to an HDFS location. It should contain only a map phase, not a reduce phase.
Below is the mapper
public class Map extends Mapper<WritableComparable, HCatRecord, NullWritable, IntWritable> {

    protected void map(WritableComparable key,
                       HCatRecord value,
                       org.apache.hadoop.mapreduce.Mapper<WritableComparable, HCatRecord,
                               NullWritable, IntWritable>.Context context)
            throws IOException, InterruptedException {
        // The group table from /etc/group has name, 'x', id
        // groupname = (String) value.get(0);
        int id = (Integer) value.get(1);
        // Just select and emit the name and ID
        context.write(null, new IntWritable(id));
    }
}
Main class
public class mapper1 {

    public static void main(String[] args) throws Exception {
        mapper1 m = new mapper1();
        m.run(args);
    }

    public void run(String[] args) throws IOException, Exception, InterruptedException {
        Configuration conf = new Configuration();
        // Get the input and output table names as arguments
        String inputTableName = args[0];
        // Assume the default database
        String dbName = "xademo";

        Job job = new Job(conf, "UseHCat");
        job.setJarByClass(mapper1.class);
        HCatInputFormat.setInput(job, dbName, inputTableName);
        job.setMapperClass(Map.class);
        // An HCatalog record as input
        job.setInputFormatClass(HCatInputFormat.class);
        // Mapper emits a string as key and an integer as value
        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath((JobConf) conf, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
Is there anything wrong with this code?
It is giving a NumberFormatException for the string "5s", and I am not sure where that value comes from. The error points to the line HCatInputFormat.setInput().
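As a side note on the map-only requirement: nothing in the posted driver actually disables the reduce phase. A map-only job is normally requested explicitly, for example (a sketch against the driver above):

job.setNumReduceTasks(0); // map-only job: mapper output goes straight to the output format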

Using MultipleOutputs without context.write results in empty files

I don't know how to use the MultipleOutputs class correctly. I'm using it to create multiple output files. The following is a snippet from my driver class:
Configuration conf = new Configuration();
Job job = Job.getInstance(conf);
job.setJarByClass(CustomKeyValueTest.class);//class with mapper and reducer
job.setOutputKeyClass(CustomKey.class);
job.setOutputValueClass(Text.class);
job.setMapOutputKeyClass(CustomKey.class);
job.setMapOutputValueClass(CustomValue.class);
job.setMapperClass(CustomKeyValueTestMapper.class);
job.setReducerClass(CustomKeyValueTestReducer.class);
job.setInputFormatClass(TextInputFormat.class);
Path in = new Path(args[1]);
Path out = new Path(args[2]);
out.getFileSystem(conf).delete(out, true);
FileInputFormat.setInputPaths(job, in);
FileOutputFormat.setOutputPath(job, out);
MultipleOutputs.addNamedOutput(job, "islnd" , TextOutputFormat.class, CustomKey.class, Text.class);
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
MultipleOutputs.setCountersEnabled(job, true);
boolean status = job.waitForCompletion(true);
and in the reducer, I used MultipleOutputs like this:
private MultipleOutputs<CustomKey, Text> multipleOutputs;

@Override
public void setup(Context context) throws IOException, InterruptedException {
    multipleOutputs = new MultipleOutputs<>(context);
}

@Override
public void reduce(CustomKey key, Iterable<CustomValue> values, Context context) throws IOException, InterruptedException {
    ...
    multipleOutputs.write("islnd", key, pop, key.toString());
    //context.write(key, pop);
}

public void cleanup() throws IOException, InterruptedException {
    multipleOutputs.close();
}
}
When I use context.write, I get output files with data in them. But when I remove context.write, the output files are empty. I don't want to call context.write because it creates the extra file part-r-00000. As stated here (last paragraph of the class description), I used LazyOutputFormat to avoid the part-r-00000 file, but it still didn't work.
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
This means: if you are not creating any output, don't create empty files.
Can you please look at the Hadoop counters and check
1. map.output.records
2. reduce.input.groups
3. reduce.input.records
to verify whether your mappers are sending any data to the reducers (see the sketch below).
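A quick way to read those counters from the driver once the job has finished, a sketch assuming the standard TaskCounter enum from org.apache.hadoop.mapreduce:

import org.apache.hadoop.mapreduce.TaskCounter;

// after boolean status = job.waitForCompletion(true);
long mapOut = job.getCounters().findCounter(TaskCounter.MAP_OUTPUT_RECORDS).getValue();
long redGroups = job.getCounters().findCounter(TaskCounter.REDUCE_INPUT_GROUPS).getValue();
long redIn = job.getCounters().findCounter(TaskCounter.REDUCE_INPUT_RECORDS).getValue();
System.out.printf("map output=%d, reduce groups=%d, reduce input=%d%n", mapOut, redGroups, redIn);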
Code with an integration test for MultipleOutputs is at:
http://bytepadding.com/big-data/map-reduce/multipleoutputs-in-map-reduce/

MapReduce HBase NullPointerException

I am a beginner at big data. As a first step I want to try out how MapReduce works with HBase. The scenario is summing the field uas in my HBase table with MapReduce, grouped by the date that forms the first part of the row key. Here is my table:
Hbase::Table - test
ROW COLUMN+CELL
10102010#1 column=cf:nama, timestamp=1418267197429, value=jonru
10102010#1 column=cf:quiz, timestamp=1418267197429, value=\x00\x00\x00d
10102010#1 column=cf:uas, timestamp=1418267197429, value=\x00\x00\x00d
10102010#1 column=cf:uts, timestamp=1418267197429, value=\x00\x00\x00d
10102010#2 column=cf:nama, timestamp=1418267180874, value=jonru
10102010#2 column=cf:quiz, timestamp=1418267180874, value=\x00\x00\x00d
10102010#2 column=cf:uas, timestamp=1418267180874, value=\x00\x00\x00d
10102010#2 column=cf:uts, timestamp=1418267180874, value=\x00\x00\x00d
10102012#1 column=cf:nama, timestamp=1418267156542, value=jonru
10102012#1 column=cf:quiz, timestamp=1418267156542, value=\x00\x00\x00\x0A
10102012#1 column=cf:uas, timestamp=1418267156542, value=\x00\x00\x00\x0A
10102012#1 column=cf:uts, timestamp=1418267156542, value=\x00\x00\x00\x0A
10102012#2 column=cf:nama, timestamp=1418267166524, value=jonru
10102012#2 column=cf:quiz, timestamp=1418267166524, value=\x00\x00\x00\x0A
10102012#2 column=cf:uas, timestamp=1418267166524, value=\x00\x00\x00\x0A
10102012#2 column=cf:uts, timestamp=1418267166524, value=\x00\x00\x00\x0A
My code looks like this:
public class TestMapReduce {

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Configuration config = HBaseConfiguration.create();
        Job job = new Job(config, "Test");
        job.setJarByClass(TestMapReduce.TestMapper.class);

        Scan scan = new Scan();
        scan.setCaching(500);
        scan.setCacheBlocks(false);

        TableMapReduceUtil.initTableMapperJob(
                "test",
                scan,
                TestMapReduce.TestMapper.class,
                Text.class,
                IntWritable.class,
                job);
        TableMapReduceUtil.initTableReducerJob(
                "test",
                TestReducer.class,
                job);
        job.waitForCompletion(true);
    }

    public static class TestMapper extends TableMapper<Text, IntWritable> {

        @Override
        protected void map(ImmutableBytesWritable rowKey, Result columns, Mapper.Context context) throws IOException, InterruptedException {
            System.out.println("mulai mapping");
            try {
                // get row key
                String inKey = new String(rowKey.get());
                // get new key having date only
                String onKey = new String(inKey.split("#")[0]);
                // get value of the uas column
                byte[] bUas = columns.getValue(Bytes.toBytes("cf"), Bytes.toBytes("uas"));
                String sUas = new String(bUas);
                Integer uas = new Integer(sUas);
                // emit date and uas values
                context.write(new Text(onKey), new IntWritable(uas));
            } catch (RuntimeException ex) {
                ex.printStackTrace();
            }
        }
    }

    public class TestReducer extends TableReducer {

        public void reduce(Text key, Iterable values, Reducer.Context context) throws IOException, InterruptedException {
            try {
                int sum = 0;
                for (Object test : values) {
                    System.out.println(test.toString());
                    sum += Integer.parseInt(test.toString());
                }
                Put inHbase = new Put(key.getBytes());
                inHbase.add(Bytes.toBytes("cf"), Bytes.toBytes("sum"), Bytes.toBytes(sum));
                context.write(null, inHbase);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
I get the following error:
Exception in thread "main" java.lang.NullPointerException
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:451)
at org.apache.hadoop.util.Shell.run(Shell.java:424)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:656)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:745)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:728)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:421)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:281)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:125)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:348)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1313)
at TestMapReduce.main(TestMapReduce.java:97)
Java Result: 1
Help me please :)
Let's look at this part of your code:
byte[] bUas = columns.getValue(Bytes.toBytes("cf"), Bytes.toBytes("uas"));
String sUas = new String(bUas);
For the current key you are trying to get the value of column uas from column family cf. Since this is a non-relational DB, it is quite possible that this key has no value for that column. In that case the getValue method returns null, and the String constructor that accepts a byte[] cannot handle null, so it throws a NullPointerException. A quick fix would look like this:
byte[] bUas = columns.getValue(Bytes.toBytes("cf"), Bytes.toBytes("uas"));
String sUas = bUas == null ? "" : new String(bUas);
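A slightly stricter variant, a sketch assuming the cf:uas cells really hold 4-byte integers written with Bytes.toBytes(int) (which is what the \x00\x00\x00d values in the scan output suggest): skip rows that lack the column and decode the bytes directly instead of round-tripping through a String.

byte[] bUas = columns.getValue(Bytes.toBytes("cf"), Bytes.toBytes("uas"));
if (bUas == null) {
    return; // this row has no cf:uas cell, nothing to sum
}
int uas = Bytes.toInt(bUas); // decode the 4-byte integer value
context.write(new Text(onKey), new IntWritable(uas));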

job.setOutputKeyClass and setOutputValueClass in the driver do not match the reducer's context.write types, yet the program still runs fine. How?

Driver code:
public class WcDriver {

    public static void main(String[] args) throws IOException,
            InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "WcDriver");
        job.setJarByClass(WcDriver.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(WcMapper.class);
        job.setReducerClass(WcReducer.class);
        job.waitForCompletion(true);
    }
}
Reducer code
public class WcReducer extends Reducer<Text, LongWritable, Text, String> {

    @Override
    public void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        String key1 = null;
        int total = 0;
        for (LongWritable value : values) {
            total += value.get();
            key1 = key.toString();
        }
        context.write(new Text(key1), "ABC");
    }
}
Here, in the driver class I have set job.setOutputKeyClass(Text.class) and job.setOutputValueClass(LongWritable.class), but in the reducer class I am writing a String: context.write(new Text(key1), "ABC");. I think there should be an error while running the program because the output types do not match, and also the reducer's key should implement WritableComparable and its value should implement Writable. Strangely, the program runs fine. I do not understand why there is no exception.
Try this:
// job.setOutputFormatClass(TextOutputFormat.class);
Comment out this line and you will surely get a casting exception.
This is because TextOutputFormat assumes LongWritable as the key and Text as the value. If you do not define the output format class, the job expects the default Writable behaviour; but once you do set it, it implicitly converts the output to the given types.
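For context, a simplified sketch of what TextOutputFormat's line writer effectively does (this is a gist, not the real Hadoop source): it never compares the runtime types of the key and value against job.setOutputKeyClass/setOutputValueClass, it just writes their textual form, which is why a plain String value slips through without an exception.

import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Gist of TextOutputFormat's LineRecordWriter (simplified, not the actual implementation):
// write key.toString(), a separator ("\t" by default), value.toString(), then a newline.
class GistLineWriter {
    private final DataOutputStream out;
    private final String separator;

    GistLineWriter(DataOutputStream out, String separator) {
        this.out = out;
        this.separator = separator;
    }

    void write(Object key, Object value) throws IOException {
        out.write(key.toString().getBytes(StandardCharsets.UTF_8));
        out.write(separator.getBytes(StandardCharsets.UTF_8));
        out.write(value.toString().getBytes(StandardCharsets.UTF_8));
        out.write('\n');
    }
}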
Try this:
// job.setOutputValueClass(LongWritable.class); // if you comment out this line you get an error
This call only defines the key/value pair; by default it depends on the output format, which here is text, so no error is raised.
