Hadoop MRUnit IllegalStateException when using Hadoop-The Definitive Guide code - java

I am studying Hadoop from the Definitive Guide book and tried to execute this piece of code, which results in an error. It is the example from Chapter 5. Link to the GitHub code:
source:
https://github.com/tomwhite/hadoop-book/blob/master/ch05/src/main/java/v1/MaxTemperatureMapper.java
public class MaxTemperatureMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15, 19);
        int airTemperature = Integer.parseInt(line.substring(87, 92));
        context.write(new Text(year), new IntWritable(airTemperature));
    }
}
test:
https://github.com/tomwhite/hadoop-book/blob/master/ch05/src/test/java/v1/MaxTemperatureMapperTest.java
public class MaxTemperatureMapperTest {

    @Test
    public void processesValidRecord() throws IOException, InterruptedException {
        Text value = new Text("0043011990999991950051518004+68750+023550FM-12+0382" +
                "99999V0203201N00261220001CN9999999N9-00111+99999999999");
        new MapDriver<LongWritable, Text, Text, IntWritable>()
                .withMapper(new MaxTemperatureMapper())
                .withInputValue(value)
                .withOutput(new Text("1950"), new IntWritable(-11))
                .runTest();
    }
}
The error I'm getting is the following:
java.lang.IllegalStateException: No input was provided
at org.apache.hadoop.mrunit.MapDriverBase.preRunChecks(MapDriverBase.java:286)
at org.apache.hadoop.mrunit.mapreduce.MapDriver.run(MapDriver.java:142)
at org.apache.hadoop.mrunit.TestDriver.runTest(TestDriver.java:640)
at org.apache.hadoop.mrunit.TestDriver.runTest(TestDriver.java:627)
at book.hadoopdefinitiveguide.chap5.examples.MaxTemperatureMapperTest.processesValidRecord(MaxTemperatureMapperTest.java:12)
This happens to be the first code I'm executing in Hadoop and it is throwing this error. Any help is appreciated. Thanks in advance.

As @Thomas Junblut pointed out, you need to specify a key along with the value.
But (assuming you're using MRUnit 1.0 or higher), withInputKey/withInputValue are deprecated. You should instead use withInput(K1 key, V1 val), which specifies both the key and the value in one call, instead of withInputValue(..).withInputKey(..). So you'd have something like this:
new MapDriver<LongWritable, Text, Text, IntWritable>()
.withMapper(new MaxTemperatureMapper())
.withInput(new LongWritable(), value)
.withOutput(new Text("1950"), new IntWritable(-11))
.runTest();
The new LongWritable() is just an arbitrary key.
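For reference, a complete version of the fixed test with the imports it needs might look like the sketch below (the MRUnit and JUnit package names are the standard ones; the mapper is assumed to be on the classpath):
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public class MaxTemperatureMapperTest {

    @Test
    public void processesValidRecord() throws IOException, InterruptedException {
        Text value = new Text("0043011990999991950051518004+68750+023550FM-12+0382" +
                "99999V0203201N00261220001CN9999999N9-00111+99999999999");
        // withInput replaces the deprecated withInputKey/withInputValue pair
        new MapDriver<LongWritable, Text, Text, IntWritable>()
                .withMapper(new MaxTemperatureMapper())
                .withInput(new LongWritable(), value)
                .withOutput(new Text("1950"), new IntWritable(-11))
                .runTest();
    }
}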
EDIT
So after further testing, it's not a problem with your code (except for the deprecation, but that isn't the cause).
I ran the same test using the code I posted and got the exact same error. I had tested with some old practice project of mine, and it seemed to be a problem with the classes not being built correctly. I created a whole new project, rewrote (copy-pasted) the mapper class, created a new test case, saved everything, ran it, and it worked fine. Try doing the same. BTW, I'm on Eclipse using the Eclipse Hadoop plugin to create the MR project.

Related

Updating pre-existing documents in mongoDB java driver when you've changed document structure

I've got a database of playerdata that has some pre-existing fields from previous versions of the program. Example out-dated document:
{
"playername": "foo"
}
but a player document generated under the new version would look like this:
{
"playername": "bar",
"playercurrency": 20
}
the issue is that if I try to query playercurrency on foo I get a NullPointerException because playercurrency doesn't exist for foo. I want to add the playercurrency field to foo without disturbing any other data that could be stored in foo. I've tried some code using $exists. Example:
players.updateOne(new Document("playername", "foo"), new Document("$exists", new Document("playername", "")));
players.updateOne(new Document("playername", "foo"), new Document("$exists", new Document("playercurrency", 20)));
My thought is that it updates only playercurrency because it doesn't exist, and it would leave playername alone because it exists. I might be using $exists horribly wrong, and if so please do let me know, because this is one of my first MongoDB projects and I would like to learn as much as I possibly can.
Do you have to do this with Java? Whenever I add a new field that I want to be required, I just use the command line to migrate all existing documents. This will loop through all players that don't have a playercurrency and set it to 0 (change to whatever default you want):
db.players.find({playercurrency:null}).forEach(function(player) {
player.playercurrency = 0; // or whatever default value
db.players.save(player);
});
This will result in you having the following documents:
{
"playername" : "foo",
"playercurrency" : 0
}
{
"playername" : "bar",
"playercurrency" : 20
}
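If you do want to run the same one-off migration from Java rather than the mongo shell, a rough equivalent using the driver's updateMany with the Filters/Updates helpers might look like this (a sketch that assumes the same players collection object used in the question and a default of 0):
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Updates;
import org.bson.Document;

public class PlayerMigration {

    // Adds "playercurrency" with a default of 0 to every document that does
    // not have the field yet; existing fields are left untouched.
    static void addDefaultCurrency(MongoCollection<Document> players) {
        players.updateMany(
                Filters.exists("playercurrency", false),
                Updates.set("playercurrency", 0));
    }
}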
So I know that it is normally frowned upon to answer your own question, but since nobody really posted what I ended up doing, I would like to take this time to thank @Mark Watson for answering and ultimately guiding me to finding my answer.
Since checking whether a certain field is null doesn't work in the MongoDB Java driver, I needed to find a different way to know when something is primed for an update. So after a little bit of research I stumbled upon this question, which helped me come up with this code:
private static void updateValue(final String name, final Object defaultValue, final UUID key) {
    if (!exists(name, key)) {
        // The field is missing, so $set it on the matching player document
        FindIterable<Document> iterable = players.find(new Document("_id", key));
        iterable.forEach(new Block<Document>() {
            @Override
            public void apply(Document document) {
                players.updateOne(new Document("_id", key), new Document("$set", new Document(name, defaultValue)));
            }
        });
    }
}

private static boolean exists(String name, UUID key) {
    // Counts documents for this player that already contain the field
    Document query = new Document(name, new Document("$exists", true)).append("_id", key);
    return players.count(query) == 1;
}
Obviously this is a little specialized to what I wanted to do, but with small revisions it can be easily changed to work with anything you might need. Make sure to replace players with your Collection object.
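For example, ensuring the currency field exists before reading it could look like the snippet below (the UUID here is only a placeholder for a real player's _id, and the field name and default are the ones from this thread):
// Hypothetical usage of the helper above
UUID playerId = UUID.fromString("00000000-0000-0000-0000-000000000000");
updateValue("playercurrency", 0, playerId);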

Files.createSymbolicLink() (java.nio.file) doesn't override existing symbol link and doesn't throw exception

Based on the Javadoc, it should throw FileAlreadyExistsException if the link already exists. But in actual testing, when running the following two lines, both of them return "/tmp/ln1" and no exception is thrown, and "ln1" still points to "/tmp/dir1". This behavior doesn't seem to follow the documentation. Is it a JDK bug?
Is there a way to override the old link, like the command line does:
ln -nfs from to
Files.createSymbolicLink(Paths.get("/tmp/ln1"), Paths.get("/tmp/dir1"))
Files.createSymbolicLink(Paths.get("/tmp/ln1"), Paths.get("/tmp/dir2"))
I use JDK 1.7 on Linux. When I try those two statements, the first one creates the symbolic link and the second one throws a FileAlreadyExistsException.
If you want to override the old link, you should delete the old link before you create a new link, like this:
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Test {

    public static void main(String[] args) throws IOException {
        String link = "/tmp/ln1";
        // create first symbolic link
        deleteIfExists(link);
        Files.createSymbolicLink(Paths.get(link), Paths.get("/tmp/dir1"));
        // create second symbolic link
        deleteIfExists(link);
        Files.createSymbolicLink(Paths.get(link), Paths.get("/tmp/dir2"));
    }

    private static void deleteIfExists(String filePath) {
        File file = new File(filePath);
        if (file.exists()) {
            file.delete();
        }
    }
}
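A small variant, in case it's useful: java.nio's Files.deleteIfExists removes the link itself (not its target) and also works when the old link is dangling, which the File.exists() check above can miss because it follows the link. A minimal sketch:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Relink {

    // Re-points a symbolic link, removing any existing link first.
    static void relink(String link, String target) throws IOException {
        Path linkPath = Paths.get(link);
        Files.deleteIfExists(linkPath);  // deletes the link, not the target
        Files.createSymbolicLink(linkPath, Paths.get(target));
    }
}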

Sending mapper output to different reducer

I am new to Hadoop and I am now working with Java mapper/reducer code. While working, I came across a problem: I have to pass the output of a mapper class to two different reducer classes. Is that possible? Also, can we send two different outputs from the same mapper class? Can anyone tell me?
I've been trying to do the same. Based on what I found, we cannot have mapper output sent to two reducers. But you could perform the work you wanted two reducers to do in a single reducer by differentiating the tasks inside it. The reducer can select the task based on some key criteria. I must warn you I'm new to Hadoop, so this may not be the best answer.
The mapper will generate keys of the form <Your_Original_Key>-TASK_XXXX. The reducer will then invoke different methods to process each TASK_XXXX.
I think it is better to have the task name at the end of the key to ensure effective partitioning.
As for your second question, I believe you can send multiple outputs from the same mapper class to the reducer. This post may be of interest to you: Can Hadoop mapper produce multiple keys in output?
The map method would look like
@Override
protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, Text>.Context context) throws IOException, InterruptedException {
    // do stuff 1
    Text outKey1 = new Text(<Your_Original_Key> + "-TASK1");
    context.write(outKey1, task1OutValues);
    // do stuff 2
    Text outKey2 = new Text(<Your_Original_Key> + "-TASK2");
    context.write(outKey2, task2OutValues);
}
and reduce method
@Override
protected void reduce(Text inKey, Iterable<Text> values, Reducer<Text, Text, Text, Text>.Context context) throws IOException, InterruptedException {
    String key = inKey.toString();
    if (key.matches(".*-TASK1$")) {
        processTask1(values, context);
    } else if (key.matches(".*-TASK2$")) {
        processTask2(values, context);
    }
}
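The processTask1/processTask2 helpers aren't shown here; a hypothetical shape for them, assuming they take the Context so each branch can emit its own output, might be:
// Hypothetical helpers (not part of the original answer): each handles the
// records routed to it by the key suffix and writes its own output.
private void processTask1(Iterable<Text> values, Reducer<Text, Text, Text, Text>.Context context)
        throws IOException, InterruptedException {
    for (Text value : values) {
        // TASK1-specific processing goes here
        context.write(new Text("TASK1"), value);
    }
}

private void processTask2(Iterable<Text> values, Reducer<Text, Text, Text, Text>.Context context)
        throws IOException, InterruptedException {
    for (Text value : values) {
        // TASK2-specific processing goes here
        context.write(new Text("TASK2"), value);
    }
}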

Conflict Between StaticWriter and Writer Classes

As of our recent move from WSED 5.2 to RAD 7.5, an application-wrecking bug has appeared in our code.
RAD 7.5 flags the following errors, all at the class declaration (public class FoStringWriter extends StringWriter implements FoWriter {):
- Exception IOException in throws clause of Writer.append(CharSequence, int, int) is not compatible with StringWriter.append(CharSequence, int, int)
- Exception IOException in throws clause of Writer.append(char) is not compatible with StringWriter.append(char)
- Exception IOException in throws clause of Writer.append(CharSequence) is not compatible with StringWriter.append(CharSequence)
Every piece of literature I have been able to find on the web points to this being a 'bug' in Eclipse, yet my development machine should already have the most recent version of the Eclipse software on it. So I am left not knowing what to make of this error. Is there a fix from IBM that I simply haven't yet updated to? Or is there a code fix that could rectify this error?
public class FoStringWriter extends StringWriter implements FoWriter {

    public void filteredWrite(String str) throws IOException {
        FoStringWriter.filteredWrite(str, this);
    }

    public static void filteredWrite(String str, StringWriter writer) throws IOException {
        if (str == null) str = "";
        TagUtils tagUtils = TagUtils.getInstance();
        str = tagUtils.filter(str);
        HashMap dictionary = new HashMap();
        dictionary.put("&#", "&#");
        str = GeneralUtils.translate(str, dictionary);
        writer.write(str);
    }
}
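(FoWriter itself isn't shown in the question; judging only from the filteredWrite method used above, it presumably declares something like the following guess, which the real interface may extend with more methods.)
import java.io.IOException;

// Hypothetical reconstruction of the interface referenced above
public interface FoWriter {
    void filteredWrite(String str) throws IOException;
}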
Editor's note:
The process that this runs creates PDF documents for our app. In WSED 5.5 it worked; it had a few errors, but nothing that stopped the PDF from being written.
Slap me in the forehead, because this is another "error" solved by a seemingly completely obvious answer.
Just by adding the listed methods that throw the "errors", I can eliminate the errors thrown when calling this class.
Copying them straight from StringWriter actually worked, without editing them in any way.
public StringWriter append(char c) {
    write(c);
    return this;
}

public StringWriter append(CharSequence csq) {
    if (csq == null)
        write("null");
    else
        write(csq.toString());
    return this;
}

public StringWriter append(CharSequence csq, int start, int end) {
    CharSequence cs = (csq == null ? "null" : csq);
    write(cs.subSequence(start, end).toString());
    return this;
}
I'm both pleased that this worked, and at the same time frustrated that it was so glaringly simple a fix that it took nearly a full week to figure out.
I suppose the reason behind this error was likely a conflict in implementation. FoStringWriter extends StringWriter, which itself extends Writer, and both classes have their own "append" methods that override one another. By declaring them explicitly on FoStringWriter, the error is resolved.

Get Specific data from MapReduce

I have the following file as input, which consists of 10000 lines like the following:
250788965731,20090906,200937,200909,621,SUNDAY,WEEKEND,ON-NET,MORNING,OUTGOING,VOICE,25078,PAY_AS_YOU_GO_PER_SECOND_PSB,SUCCESSFUL-RELEASEDBYSERVICE,5,0,1,6.25,635-10-104-40163.
I have to print the first column if the 18th column is less than 10 and the 9th column is MORNING. I wrote the following code, but I'm not getting any output; the output file is empty.
public static class MyMap extends Mapper<LongWritable, Text, Text, DoubleWritable> {
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        String[] day = line.split(",");
        double day1 = Double.parseDouble(day[17]);
        if (day[8] == "MORNING" && day1 < 10.0)
        {
            context.write(new Text(day[0]), new DoubleWritable(day1));
        }
    }
}

public static class MyReduce extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    public void reduce(Text key, Iterator<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        String no = values.toString();
        double no1 = Double.parseDouble(no);
        if (no1 > 10.0)
        {
            context.write(key, new DoubleWritable(no1));
        }
    }
}
Please tell me what I did wrong. Is the flow correct?
I can see a few problems.
First, in your Mapper, you should use .equals() instead of == when comparing Strings. Otherwise you're just comparing references, and the comparison will fail even if the String objects' content is the same. It might happen to succeed because of Java String interning, but I would avoid relying on that if that was the original intent.
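For example, the condition in your map method would become something like this:
// Compare string contents, not references
if ("MORNING".equals(day[8]) && day1 < 10.0) {
    context.write(new Text(day[0]), new DoubleWritable(day1));
}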
In your Reducer, I am not sure what you want to achieve, but there are a few wrong things that I can spot anyway. The values parameter should be an Iterable<DoubleWritable>, so you should iterate over it and apply whatever condition you need to each individual value. Here is how I would rewrite your Reducer:
public static class MyReduce extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    @Override
    public void reduce(Text key, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        for (DoubleWritable val : values) {
            if (val.get() > 10.0) {
                context.write(key, val);
            }
        }
    }
}
But the overall logic doesn't make much sense. If all you want to do is print the first column when the 18th column is less than 10 and the 9th column is MORNING, then you could use a NullWritable as the output key of your mapper and write column 1 (day[0]) as your output value. You probably don't even need a Reducer in this case, which you can tell Hadoop with job.setNumReduceTasks(0);.
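A minimal sketch of that map-only variant (the class names here are made up for illustration; the column indices follow the question):
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FirstColumnJob {

    // Nested like the original MyMap; emits day[0] for every record where the
    // 9th column is MORNING and the 18th column is less than 10. With
    // job.setNumReduceTasks(0) the mapper output is written directly.
    public static class FirstColumnMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] day = value.toString().split(",");
            if ("MORNING".equals(day[8]) && Double.parseDouble(day[17]) < 10.0) {
                context.write(NullWritable.get(), new Text(day[0]));
            }
        }
    }
}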
One thing that got me thinking: if your input is only 10k lines, do you really need a Hadoop job for this? It seems to me a simple shell script (for example with awk) would be enough for this small dataset.
Hope that helps!
I believe this is a mapper-only job, as your data already has the values you want to check.
Your mapper emits values with day1 < 10.0, while your reducer only emits values where day1 > 10.0, hence none of the values would be output by your reducers.
So I think your reducer should look like this:
String no=values.toString();
double no1=Double.parseDouble(no);
if(no1 < 10.0)
{
context.write(key,new DoubleWritable(no1) );
}
I think that should get your desired output.
