Hadoop - How to extract a taskId from mapred.JobConf?

Is it possible to create a valid *mapreduce*.TaskAttemptID from *mapred*.JobConf?
The background
I need to write a FileInputFormatAdapter for an ExistingFileInputFormat. The problem is that the Adapter needs to extend mapred.InputFormat and the Existing format extends mapreduce.InputFormat.
I need to build a mapreduce.TaskAttemptContextImpl so that I can instantiate the ExistingRecordReader. However, I can't create a valid TaskAttemptID: the taskId comes out as null.
So how can I get the taskId, jobId, etc. from a mapred.JobConf?
In particular, in the Adapter's getRecordReader I need to do something like:
public org.apache.hadoop.mapred.RecordReader<NullWritable, MyWritable> getRecordReader(
org.apache.hadoop.mapred.InputSplit split, JobConf job, Reporter reporter) throws IOException {
SplitAdapter splitAdapter = (SplitAdapter) split;
final Configuration conf = job;
/*************************************************/
//The problem is here, "mapred.task.id" is not in the conf
/*************************************************/
final TaskAttemptID taskId = TaskAttemptID.forName(conf.get("mapred.task.id"));
final TaskAttemptContext context = new TaskAttemptContextImpl(conf, taskId);
try {
return new RecordReaderAdapter(new ExistingRecordReader(
splitAdapter.getMapRedeuceSplit(),
context));
} catch (InterruptedException e) {
throw new RuntimeException("Failed to create record-reader.", e);
}
}
This code throws an exception:
Caused by: java.lang.NullPointerException
at org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl.<init>(TaskAttemptContextImpl.java:44)
at org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl.<init>(TaskAttemptContextImpl.java:39)
'super(conf, taskId.getJobID());' is throwing the exception, most likely because taskId is null.

I found the answer by looking through HiveHBaseTableInputFormat. Since my solution targets Hive, this works perfectly:
TaskAttemptContext tac = ShimLoader.getHadoopShims().newTaskAttemptContext(
job.getConfiguration(), reporter);

Related

Test for thrown exception in constructor method not behaving as expected

I have the following code that reads data from a csv file that I am trying to write a unit test for. I am unsure of how to go about it.
public class BudgetTags implements BudgetTagsList{
// State variables
private Set<String> tags = new TreeSet<>();
private String tag_file_path;
public BudgetTags(String tag_file_path){
//Retrieve tags from tag file
this.tag_file_path = tag_file_path;
this.retrieveTags();
}
public void retrieveTags() {
String line = "";
try{
// Begin reading each line
BufferedReader br = new BufferedReader(new FileReader(this.tag_file_path ));
while((line = br.readLine()) != null){
String[] row = line.split(",");
this.tags.add(row[0]); //Assume correct file format
}
br.close();
} catch (IOException e){
System.out.println("Fatal exception: "+ e.getMessage());
}
}
}
Note that the method retrieveTags(); is not allowing me to specify an additional FileNotFoundException since it extends IOException. It is being tested in the following manner:
@Test
@DisplayName("File name doesn't exist")
void testRetrieveTag3() {
String path = "test\\no_file.csv";
//Instantiate new tags object
BudgetTags tags = new BudgetTags(path);
IOException thrown = assertThrows(IOException.class, () -> tags.retrieveTags());
}
The file at path does not exist, so I expect the test to catch the IOException (although I would prefer a FileNotFoundException). When I run this test, I get an AssertionFailedError. How can I restructure the test so that it catches the FileNotFoundException when a new tags object is instantiated, given that retrieveTags() is called from the constructor?

The method does not actually throw the exception; it catches it. What you need to test is that your catch block gets executed. If all you do on catching the exception is print the error, you can capture System.out in the test and assert on the printed message.
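The idea of asserting on the printed message can be sketched in plain Java as follows. This is a minimal sketch, not the asker's class: readAndReport is a hypothetical stand-in for retrieveTags(), and captureStdout is a hypothetical helper that temporarily redirects System.out into a buffer.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintStream;

final class StdoutCaptureSketch {

    // Hypothetical stand-in for retrieveTags(): catches the IOException
    // and prints it, mirroring the behaviour described in the question.
    static void readAndReport(String path) {
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            while (br.readLine() != null) {
                // file contents are irrelevant for this sketch
            }
        } catch (IOException e) {
            System.out.println("Fatal exception: " + e.getMessage());
        }
    }

    // Runs the action with System.out redirected to a buffer
    // and returns everything that was printed.
    static String captureStdout(Runnable action) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        System.setOut(new PrintStream(buffer, true));
        try {
            action.run();
        } finally {
            System.setOut(original); // always restore the real stream
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        String printed = captureStdout(() -> readAndReport("no_such_dir/no_file.csv"));
        // In a real test this would be an assertion on the captured text.
        System.out.println(printed.startsWith("Fatal exception: "));
    }
}
```

In a JUnit test, the last line would become an assertTrue on the captured string. Restoring the stream in a finally block keeps one test's redirection from leaking into the next.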

Your assertThrows test is failing because it's impossible for the constructor to throw an IOException. First, it's a checked exception, which means both the constructor and the method would require a throws IOException clause. Second, you catch the exception; it's not thrown out of the method.
Based on your test, it should look more like this:
public class BudgetTags implements BudgetTagsList {
private final Set<String> tags = new TreeSet<>();
private String tagFilePath;
public BudgetTags(String tagFilePath) throws IOException {
this.tagFilePath = tagFilePath;
retrieveTags(); // can throw IOException
}
public void retrieveTags() throws IOException {
// note: use try-with-resources to handle closing the reader
try (BufferedReader br = new BufferedReader(new FileReader(tagFilePath))) {
String line;
while ((line = br.readLine()) != null) {
String[] row = line.split(",");
tags.add(row[0]);
}
}
// don't catch the exception; your test indicates you want it
// thrown out to the caller
}
}
class BudgetTagsTests {
@Test
@DisplayName("File does not exist")
void testRetrieveTags3() {
String tagFilePath = "test/no_file.csv";
// note: we need to test on the constructor call, because you call
// 'retrieveTags()' in it.
assertThrows(FileNotFoundException.class, () -> new BudgetTags(tagFilePath));
}
}
By passing FileNotFoundException.class, the test will fail if any other IOException is thrown.
You should not be catching the IOException the way you are, anyway. Yes, you log it, which means if you look at the logs you'll be aware that something went wrong. But other code won't know something went wrong. To that code, it will appear as if there were simply no tags in the file. By throwing the IOException out to the caller of retrieveTags(), you're letting the caller react to the exception as needed. And if the call succeeds, but the tags are empty, then it knows the file exists but simply had no tags.
Also, you say:
Note that the method retrieveTags(); is not allowing me to specify an additional FileNotFoundException since it extends IOException.
I'm not sure what exactly you tried from that statement, but it is possible to catch more specific exceptions even though you're also catching the more general exception. It's just that the order of the catch blocks matters:
try {
somethingThatThrowsIOException();
} catch (FileNotFoundException ex) {
// do something special for when the file doesn't exist
} catch (IOException ex) {
// handle general exception
}
The more specific exception must be caught before the more general exception.
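As a runnable illustration of that ordering, here is a small sketch. The class name, method name, and file path are made up for the example; only the catch ordering comes from the answer.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

final class CatchOrderSketch {

    // Returns a label saying which catch block handled the failure.
    static String classify(String path) {
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            br.readLine();
            return "ok";
        } catch (FileNotFoundException ex) {
            // The more specific type comes first.
            return "missing file";
        } catch (IOException ex) {
            // Any other I/O failure ends up here.
            return "other I/O error";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("no_such_dir/no_file.csv")); // prints "missing file"
    }
}
```

Swapping the two catch blocks does not compile: the compiler rejects the FileNotFoundException block because the IOException block before it has already caught that exception.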

How to create mock CsvExceptions to use with csvToBean.getCapturedExceptions()

I am trying to write some unit tests to see if a logging method gets called for csv exceptions. The flow goes something like this:
CsvToBean is used to parse some info and each bean that is produced has some work done on it.
After all this, CsvToBean.getCapturedExceptions().forEach() is used to process the exceptions.
How do I create some of these exceptions for testing?
public void parseAndSaveReportToDB(Reader reader, String reportFileName,ItemizedActivityRepository iaRepo,
ICFailedRecordsRepository icFailedRepo,
String reportCols) throws Exception {
try {
CsvToBean<ItemizedActivity> csvToBean = new CsvToBeanBuilder<ItemizedActivity>(reader).withType(ItemizedActivity.class).withThrowExceptions(false).build();
csvToBean.parse().forEach(itmzActvty -> {
itmzActvty.setReportFileName(reportFileName);
String liteDesc = itmzActvty.getBalanceTransactionDescription();
if (liteDesc.contains(":")) {
liteDesc = liteDesc.substring(liteDesc.indexOf(":")+1).trim();
}
itmzActvty.setLiteDescription(liteDesc);
itmzActvty.setAmount(convertCentToDollar(itmzActvty.getAmount()));
iaRepo.save(itmzActvty);
});
log.info("Successfully saved report data in DB");
csvToBean.getCapturedExceptions().forEach(csvExceptionObj -> logFailedRecords(reportFileName, csvExceptionObj, icFailedRepo, reportCols));
reader.close();
} catch (Exception ex) {
log.error("Exception when saving report data to DB", ex);
throw ex;
}
}
In this code I need to trigger the logFailedRecords method. To do so I need to fill the captured exceptions queue with an exception. I don't know how to get an exception in there.
What I have so far is not much, since I keep hitting walls:
@Test
public void testParseAndSaveReportToDBWithExceptions() throws Exception {
// CsvException csvExceptionObject = new CsvException("testException");
CsvToBean<ItemizedActivity> csvToBean = mock(CsvToBean.class);//<ItemizedActivity>(reader).withType(ItemizedActivity.class).withThrowExceptions(false).build().class);
BufferedReader reader = mock(BufferedReader.class);
ReportingMetadata rmd = this.getReportingMetadata();
verify(this.reportsUtil).parseAndSaveReportToDB(reader,"test.csv",
this.iaRepo,this.icFailedRepo,rmd.getReportCols());
// System.out.println(csvToBean.getCapturedExceptions().toString());
}

null pointer exception in getstrings method hadoop

I am getting a NullPointerException from the conf.getStrings() call in my driver class. The driver class is invoked from my custom website.
Below are the driver class details:
@SuppressWarnings("unchecked")
public void doGet(HttpServletRequest request,
HttpServletResponse response)
throws ServletException, IOException
{
Configuration conf = new Configuration();
//conf.set("fs.default.name", "hdfs://localhost:54310");
//conf.set("mapred.job.tracker", "localhost:54311");
//conf.set("mapred.jar","/home/htcuser/Desktop/ResumeLatest.jar");
Job job = new Job(conf, "ResumeSearchClass");
job.setJarByClass(HelloForm.class);
job.setJobName("ResumeParse");
job.setInputFormatClass(FileInputFormat.class);
FileInputFormat.addInputPath(job, new Path("hdfs://localhost:54310/usr/ResumeDirectory"));
job.setMapperClass(ResumeMapper.class);
job.setReducerClass(ResumeReducer.class);
job.setMapOutputKeyClass(IntWritable.class);
job.setSortComparatorClass(ReverseComparator.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(Text.class);
job.setOutputFormatClass(FileOutPutFormat.class);
FileOutputFormat.setOutputPath(job, new Path("hdfs://localhost:54310/usr/output" + System.currentTimeMillis()));
long start = System.currentTimeMillis();
var = job.waitForCompletion(true) ? 0 : 1;
I get the NullPointerException from the following two lines of code:
String[] keytextarray=conf.getStrings("Keytext");
for(int i=0;i<keytextarray.length;i++) //GETTING NULL POINTER EXCEPTION HERE IN keytextarray.length
{
//some code here
}
if(var==0)
{
RequestDispatcher dispatcher = request.getRequestDispatcher("/Result.jsp");
dispatcher.forward(request, response);
long finish= System.currentTimeMillis();
System.out.println("Time Taken "+(finish-start));
}
}
I have removed some unrelated code from the driver method above.
Below is the RecordWriter class, where I call conf.setStrings() in the write() method to set the values:
public class RecordWrite extends org.apache.hadoop.mapreduce.RecordWriter<IntWritable, Text> {
TaskAttemptContext context1;
Configuration conf;
public RecordWrite(DataOutputStream output, TaskAttemptContext context)
{
out = output;
conf = context.getConfiguration();
HelloForm.context1=context;
try {
out.writeBytes("result:\n");
out.writeBytes("Name:\t\t\t\tExperience\t\t\t\t\tPriority\tPriorityCount\n");
} catch (IOException e) {
e.printStackTrace();
}
}
public RecordWrite() {
// TODO Auto-generated constructor stub
}
@Override
public void close(TaskAttemptContext context) throws IOException,
InterruptedException
{
out.close();
}
int z=0;
@Override
public void write(IntWritable value,Text key) throws IOException,
InterruptedException
{
conf.setStrings("Keytext", key1string); //setting values here
conf.setStrings("valtext", valuestring);
String[] keytext=key.toString().split(Pattern.quote("^"));
//some code here
}
}
I suspect this NullPointerException happens because I call conf.getStrings() after the job has completed (job.waitForCompletion(true)). Please help me fix this issue.
If the code above is not the correct way to pass values from the RecordWriter to the driver class, please let me know how to do it.
I also tried storing the values from the RecordWriter in a custom static class and reading them from that class in the driver; that again gives a NullPointerException when I run the code on a cluster.

If you already have the values of key1string and valuestring in the driver (job) class, try setting them there rather than in RecordWriter.write().
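The reason the driver sees null is that each task works on its own copy of the job configuration, and values set on the task side are not sent back to the driver's Configuration object. The following is a plain-Java analogy, not Hadoop code: java.util.Properties stands in for Configuration, and copyForTask is a hypothetical stand-in for job submission.

```java
import java.util.Properties;

final class ConfigCopySketch {

    // Hypothetical stand-in for job submission: the task receives
    // its own copy of the driver's settings.
    static Properties copyForTask(Properties driverConf) {
        Properties taskConf = new Properties();
        taskConf.putAll(driverConf);
        return taskConf;
    }

    public static void main(String[] args) {
        Properties driverConf = new Properties();
        driverConf.setProperty("job.name", "ResumeParse");

        Properties taskConf = copyForTask(driverConf);
        taskConf.setProperty("Keytext", "a,b"); // set on the task side only

        // The driver's object never sees the task-side change, so the lookup
        // returns null and must be guarded before using the result.
        String value = driverConf.getProperty("Keytext");
        String[] keys = (value == null) ? new String[0] : value.split(",");
        System.out.println(keys.length); // prints 0
    }
}
```

The same guard applies in the driver from the question: check the result of conf.getStrings("Keytext") for null before reading its length.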

Hadoop: NullPointerException with Custom InputFormat

I've developed a custom InputFormat for Hadoop (including a custom InputSplit and a custom RecordReader) and I'm experiencing a rare NullPointerException.
These classes are going to be used for querying a third-party system which exposes a REST API for retrieving records. Thus, I took inspiration from DBInputFormat, which is a non-HDFS InputFormat as well.
The error I get is the following:
Error: java.lang.NullPointerException at
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:524)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:762)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
I've searched the code for MapTask (2.1.0 version of Hadoop) and I've seen the problematic part is the initialization of the RecordReader:
472 NewTrackingRecordReader(org.apache.hadoop.mapreduce.InputSplit split,
473 org.apache.hadoop.mapreduce.InputFormat<K, V> inputFormat,
474 TaskReporter reporter,
475 org.apache.hadoop.mapreduce.TaskAttemptContext taskContext)
476 throws InterruptedException, IOException {
...
491 this.real = inputFormat.createRecordReader(split, taskContext);
...
494 }
...
519 @Override
520 public void initialize(org.apache.hadoop.mapreduce.InputSplit split,
521 org.apache.hadoop.mapreduce.TaskAttemptContext context
522 ) throws IOException, InterruptedException {
523 long bytesInPrev = getInputBytes(fsStats);
524 real.initialize(split, context);
525 long bytesInCurr = getInputBytes(fsStats);
526 fileInputByteCounter.increment(bytesInCurr - bytesInPrev);
527 }
Of course, the relevant parts of my code:
# MyInputFormat.java
public static void setEnvironmnet(Job job, String host, String port, boolean ssl, String APIKey) {
backend = new Backend(host, port, ssl, APIKey);
}
public static void addResId(Job job, String resId) {
Configuration conf = job.getConfiguration();
String inputs = conf.get(INPUT_RES_IDS, "");
if (inputs.isEmpty()) {
inputs += resId;
} else {
inputs += "," + resId;
}
conf.set(INPUT_RES_IDS, inputs);
}
@Override
public List<InputSplit> getSplits(JobContext job) {
// resulting splits container
List<InputSplit> splits = new ArrayList<InputSplit>();
// get the Job configuration
Configuration conf = job.getConfiguration();
// get the inputs, i.e. the list of resource IDs
String input = conf.get(INPUT_RES_IDS, "");
String[] resIDs = StringUtils.split(input);
// iterate on the resIDs
for (String resID: resIDs) {
splits.addAll(getSplitsResId(resID, job.getConfiguration()));
}
// return the splits
return splits;
}
@Override
public RecordReader<LongWritable, Text> createRecordReader(InputSplit split, TaskAttemptContext context) {
if (backend == null) {
logger.info("Unable to create a MyRecordReader, it seems the environment was not properly set");
return null;
}
// create a record reader
return new MyRecordReader(backend, split, context);
}
# MyRecordReader.java
@Override
public void initialize(InputSplit split, TaskAttemptContext context) throws IOException, InterruptedException {
// get start, end and current positions
MyInputSplit inputSplit = (MyInputSplit) this.split;
start = inputSplit.getFirstRecordIndex();
end = start + inputSplit.getLength();
current = 0;
// query the third-party system for the related resource, seeking to the start of the split
records = backend.getRecords(inputSplit.getResId(), start, end);
}
# MapReduceTest.java
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new MapReduceTest(), args);
System.exit(res);
}
@Override
public int run(String[] args) throws Exception {
Configuration conf = this.getConf();
Job job = Job.getInstance(conf, "MapReduce test");
job.setJarByClass(MapReduceTest.class);
job.setMapperClass(MyMap.class);
job.setCombinerClass(MyReducer.class);
job.setReducerClass(MyReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setInputFormatClass(MyInputFormat.class);
MyInputFormat.addResId(job, "ca73a799-9c71-4618-806e-7bd0ca1911f4");
MyInputFormat.setEnvironmnet(job, "my.host.com", "443", true, "my_api_key");
FileOutputFormat.setOutputPath(job, new Path(args[0]));
return job.waitForCompletion(true) ? 0 : 1;
}
Any ideas about what is wrong?
BTW, which is the "good" InputSplit the RecordReader must use, the one given to the constructor or the one given in the initialize method? Anyway I've tried both options and the resulting error is the same :)

The way I read your stack trace, real is null on line 524.
But don't take my word for it. Slip an assert or System.out.println in there and check the value of real yourself.
NullPointerException almost always means you dotted off something you didn't expect to be null. Some libraries and collections will throw it at you as their way of saying "this can't be null".
Error: java.lang.NullPointerException at
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:524)
To me this reads as: in the org.apache.hadoop.mapred package the MapTask class has an inner class NewTrackingRecordReader with an initialize method that threw a NullPointerException at line 524.
524 real.initialize( blah, blah) // I actually stopped reading after the dot
this.real was set on line 491.
491 this.real = inputFormat.createRecordReader(split, taskContext);
Assuming you haven't left out any more closely scoped reals that are masking the this.real then we need to look at inputFormat.createRecordReader(split, taskContext); If this can return null then it might be the culprit.
Turns out it will return null when backend is null.
@Override
public RecordReader<LongWritable, Text> createRecordReader(
InputSplit split,
TaskAttemptContext context) {
if (backend == null) {
logger.info("Unable to create a MyRecordReader, " +
"it seems the environment was not properly set");
return null;
}
// create a record reader
return new MyRecordReader(backend, split, context);
}
It looks like setEnvironmnet is supposed to set backend
# MyInputFormat.java
public static void setEnvironmnet(
Job job,
String host,
String port,
boolean ssl,
String APIKey) {
backend = new Backend(host, port, ssl, APIKey);
}
backend must be declared somewhere outside setEnvironmnet (or you'd be getting a compiler error).
If backend hasn't been set to something non-null upon construction and setEnvironmnet was not called before createRecordReader then you should expect to get exactly the NullPointerException you got.
UPDATE:
As you've noted, since setEnvironmnet() is static backend must be static as well. This means that you must be sure other instances aren't setting it to null.

Solved. The problem was that the backend variable is declared static, i.e. it belongs to the Java class, so any other object changing that variable (e.g. to null) affects all the other objects of the same class.
Now setEnvironmnet stores the host, port, SSL usage and API key in the job configuration (the same way addResId already did with the resource ID); when createRecordReader is invoked, it reads this configuration and creates the backend object.
Thanks to CandiedOrange, who put me on the right path!
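The fix described above can be sketched roughly like this. It is a plain-Java sketch under stated assumptions, not the asker's actual code: a Map stands in for the Hadoop Configuration (where the real calls would be conf.set and conf.get), the configuration key names are invented, and Backend is a minimal stand-in for the asker's class.

```java
import java.util.HashMap;
import java.util.Map;

final class BackendFromConfigSketch {

    // Hypothetical configuration keys; the original post does not name them.
    static final String HOST = "myinput.host";
    static final String PORT = "myinput.port";
    static final String SSL = "myinput.ssl";
    static final String API_KEY = "myinput.apikey";

    // Minimal stand-in for the asker's Backend class.
    static final class Backend {
        final String baseUrl;
        final String apiKey;

        Backend(String host, String port, boolean ssl, String apiKey) {
            this.baseUrl = (ssl ? "https" : "http") + "://" + host + ":" + port;
            this.apiKey = apiKey;
        }
    }

    // Driver side: store plain strings in the configuration, not in a static field.
    static void setEnvironment(Map<String, String> conf, String host, String port,
                               boolean ssl, String apiKey) {
        conf.put(HOST, host);
        conf.put(PORT, port);
        conf.put(SSL, Boolean.toString(ssl));
        conf.put(API_KEY, apiKey);
    }

    // Task side: rebuild the backend from the configuration when a reader is created.
    static Backend backendFrom(Map<String, String> conf) {
        String host = conf.get(HOST);
        if (host == null) {
            throw new IllegalStateException("environment was not set in the configuration");
        }
        return new Backend(host, conf.get(PORT),
                Boolean.parseBoolean(conf.get(SSL)), conf.get(API_KEY));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setEnvironment(conf, "my.host.com", "443", true, "my_api_key");
        System.out.println(backendFrom(conf).baseUrl); // prints https://my.host.com:443
    }
}
```

Throwing when the settings are missing, instead of returning null from createRecordReader, would also have surfaced the original problem with a clearer message than the NullPointerException inside MapTask.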

Android File manipulation tests with JUNIT

I am trying to test file manipulation in my app. First of all, I want to check that whenever I call a function that reads the file, the function throws an exception because the file isn't there.
However, I don't understand how to achieve this. This is the code I wrote, but it doesn't run: plain JUnit says the file path wasn't found, and Android JUnit says the test could not be run.
The folder /data/data/example.triage/files/ already exists on the virtual device.
@Before
public void setUp() throws Exception {
dr = new DataReader();
dw = new DataWriter();
DefaultValues.file_path_folder = "/data/data/example.triage/files/";
}
@After
public void tearDown() throws Exception {
dr = null;
dw = null;
// Remove the patients file we may create in a test.
dr.removeFile(DefaultValues.patients_file_path);
}
@Test
public void readHealthCardsNonExistentPatientsFile() {
try {
List<String> healthcards = dr.getHealthCardsofPatients();
fail("The method didn't generate an Exception when the file wasn't found.");
} catch (Exception e) {
assertTrue(e.getClass().equals(FileNotFoundException.class));
}
}

It doesn't look like you are checking for the exception in a way that correlates with the JUnit API.
Have you tried to make the call:
@Test (expected = Exception.class)
public void tearDown() {
// code that throws an exception
}
I don't think you want the setup() function to be able to generate an exception, since it is called before all other test cases.
Here's another way to test exceptions:
Exception occurred = null;
try
{
// Some action that is intended to produce an exception
}
catch (Exception exception)
{
occurred = exception;
}
assertNotNull(occurred);
assertTrue(occurred instanceof /* desired exception type */);
assertEquals(/* expected message */, occurred.getMessage());
So I would make your setup() code not throw an exception and move the exception-generating code to a test method, using an appropriate way to test for it.
