Skip a record in LoadFunc.getNext() - java

I'm extending LoadFunc. In the getNext() method I'd like to skip returning a tuple under certain conditions, so that I load only a sample of the data file. I tried returning null for the rows I don't want, but the problem is that loading stops after the first null Tuple is returned.
Does anyone know of a way to do this? Should I do it in a different method?
Thanks in advance.

(Assuming you mean LoadFunc in Pig ... )
I would suggest writing a new method that does what you want, so that you don't break the documented behaviour of the original getNext() method.
You should look at the source of the Pig classes that extend LoadFunc and see how they implement getNext(), for example TextLoader.
From there it should be fairly trivial to do what you're trying to do.
Edit, to offer a little more detailed help:
(This uses TextLoader as an example.)
The getNext() method reads from a RecordReader. It calls RecordReader.nextKeyValue() to advance to the next record; if that returns true (meaning a record was read), it calls RecordReader.getCurrentValue() to retrieve the value. Because Pig treats a null return from getNext() as the end of the input, skipping a record means looping inside getNext() until you reach one you want to keep.
Let's say you only wanted every fifth record as a sample. In getNext():
int count = 0;
Text myText = null;
while (myRecordReader.nextKeyValue())
{
    if (count == 4) // the fifth record read in this call
    {
        myText = (Text) myRecordReader.getCurrentValue();
        break;
    }
    count++;
}
if (myText != null) // we didn't hit the end; we have a record
{
    ... // create the tuple
    return myTuple;
}
else
    return null; // end of input: tells Pig there are no more tuples
(corrected my silly off-by-one mistake)
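To make the sampling loop concrete without a Pig or Hadoop dependency, here is a minimal sketch. FakeRecordReader and EveryNthSampler are hypothetical stand-ins, not Pig or Hadoop classes; they only mimic the nextKeyValue()/getCurrentValue() pattern described above. In a real LoadFunc the returned value would be turned into a Tuple. The important part is that null is returned only once the reader is exhausted.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a Hadoop RecordReader: nextKeyValue() advances,
// getCurrentValue() returns the record it advanced to.
class FakeRecordReader {
    private final Iterator<String> source;
    private String current;

    FakeRecordReader(List<String> lines) {
        this.source = lines.iterator();
    }

    boolean nextKeyValue() {
        if (!source.hasNext()) {
            return false;
        }
        current = source.next();
        return true;
    }

    String getCurrentValue() {
        return current;
    }
}

// Sketch of the sampling logic that would live inside getNext():
// keep reading until the n-th record, and return null only at end of input.
class EveryNthSampler {
    private final FakeRecordReader reader;
    private final int n;

    EveryNthSampler(FakeRecordReader reader, int n) {
        this.reader = reader;
        this.n = n;
    }

    String getNext() {
        int count = 0;
        while (reader.nextKeyValue()) {
            count++;
            if (count == n) {
                return reader.getCurrentValue(); // a real LoadFunc would build a Tuple here
            }
        }
        return null; // input exhausted: Pig treats null as "no more tuples"
    }
}
```

With twelve input lines and n = 5, successive calls return the 5th record, the 10th record, and then null; the two trailing records that never complete a group of five are skipped.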

Related

Best way to check if an object is null to prepare for mapping

I'm trying to map one object to another, and I'm having trouble deciding on the best practice for checking whether the object I'm mapping from is null.
1 -
public DTOIntIdentityDocument mapIdentityDocument(Identitydocument in) {
if (in == null) {
return null;
} else {
DTOIntIdentityDocument out = new DTOIntIdentityDocument();
out.setDocumentType(this.mapDocumentType(in.getDocumenttype()));
out.setDocumentNumber(in.getDocumentnumber());
return out;
}
}
2 -
public DTOIntIdentityDocument mapIdentityDocument(Identitydocument in) {
DTOIntIdentityDocument out = null;
if (in != null) {
out = new DTOIntIdentityDocument();
out.setDocumentType(this.mapDocumentType(in.getDocumenttype()));
out.setDocumentNumber(in.getDocumentnumber());
}
return out;
}
Any ideas on what the best practice is here?
Obviously, this boils down to style, so there are no hard rules that tell us which version is "best". If all the code your team writes follows scheme 1, then that is the best code for you.
Having said that, I prefer a simple initial guard, followed by the code computing the "real" result, like this:
if (in == null)
return null;
DTOIntIdentityDocument out = new DTOIntIdentityDocument();
out.setDocumentType(this.mapDocumentType(in.getDocumenttype()));
out.setDocumentNumber(in.getDocumentnumber());
return out;
You want to write code that is easy to read and understand. Your version 1 has that else block, which doesn't actually need to be its own block with an additional level of indentation. On the other hand, your second snippet uses three different layers of abstraction: a simple assignment, an if block, and a simple return. That is definitely "more complex" than option 1 or the modified code above. But note that option 2 has its advantages, too: if you want (or have) to trace or log the result of the method, with option 2 you add a single trace(out) right before the return statement.
And for the record: when you go "hardcore" clean code, the method would finally read:
if (in == null)
return null;
return createDocumentFrom(in);
or something like that. Meaning: you push the code that actually creates and configures the result object into its own private method, and that method doesn't need to worry about a null parameter being passed in!
Finally, the ideal solution does not have to worry about null parameters at all, simply because you avoid null like the plague. Not always possible, but always desirable!
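A self-contained sketch of the guard-clause style with an extracted helper, plus a java.util.Optional variant for callers that would rather not receive null at all. The classes below are simplified, hypothetical versions of the question's types, and mapDocumentType is a made-up placeholder because the real mapping is not shown.

```java
import java.util.Optional;

// Simplified, hypothetical versions of the question's classes.
class Identitydocument {
    private final String documenttype;
    private final String documentnumber;

    Identitydocument(String documenttype, String documentnumber) {
        this.documenttype = documenttype;
        this.documentnumber = documentnumber;
    }

    String getDocumenttype() { return documenttype; }
    String getDocumentnumber() { return documentnumber; }
}

class DTOIntIdentityDocument {
    private String documentType;
    private String documentNumber;

    void setDocumentType(String documentType) { this.documentType = documentType; }
    void setDocumentNumber(String documentNumber) { this.documentNumber = documentNumber; }
    String getDocumentType() { return documentType; }
    String getDocumentNumber() { return documentNumber; }
}

class IdentityDocumentMapper {
    // Guard clause first, then delegate the "real" work.
    DTOIntIdentityDocument mapIdentityDocument(Identitydocument in) {
        if (in == null) {
            return null;
        }
        return createDocumentFrom(in);
    }

    // Optional variant: an absent input yields Optional.empty() instead of null.
    Optional<DTOIntIdentityDocument> mapIdentityDocumentOptional(Identitydocument in) {
        return Optional.ofNullable(in).map(this::createDocumentFrom);
    }

    // Only ever called with a non-null argument, so no null checks here.
    private DTOIntIdentityDocument createDocumentFrom(Identitydocument in) {
        DTOIntIdentityDocument out = new DTOIntIdentityDocument();
        out.setDocumentType(mapDocumentType(in.getDocumenttype()));
        out.setDocumentNumber(in.getDocumentnumber());
        return out;
    }

    // Hypothetical placeholder for the question's document-type mapping.
    private String mapDocumentType(String type) {
        return type == null ? null : type.toUpperCase();
    }
}
```

The Optional version moves the "is it there?" decision to the caller's type signature, which is one way of keeping null out of the rest of the code.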
Alternatively, do the null check at the call site, so the mapper can assume a non-null argument:
if (in != null) {
    DTOIntIdentityDocument doc = mapIdentityDocument(in); // use doc here
}
public DTOIntIdentityDocument mapIdentityDocument(Identitydocument in) {
    DTOIntIdentityDocument out = new DTOIntIdentityDocument();
    out.setDocumentType(this.mapDocumentType(in.getDocumenttype()));
    out.setDocumentNumber(in.getDocumentnumber());
    return out;
}

Searching Array list of objects

I have an ArrayList in a class, called realtorList. From my main class I store objects with realtor data in realtorList.
My data is stored in a text file, and I start by reading in the first line.
This is the first element in the realtorList after I store the first line of data.
[Realtor{licenseNumber=AA1111111, firstName=Anna, lastName=Astrid, phoneNumber=111-111-1111, commission=0.011}]
When I read the next line of data from the input file, I need to check whether its licenseNumber (AA1111111 above) already exists in realtorList. I am having trouble figuring out how to do this.
For example, if the next realtor's license number is AA1111111, how do I check realtorList for AA1111111, which in this example does exist?
A really simple way to do this would be to keep a String ArrayList alongside it (for example, one called licenses) and use an if statement with indexOf to check whether that license value is already in the list. Since the licenses ArrayList holds only the license strings, it can be searched directly with indexOf.
An example would be:
private boolean checkLicense (String licenseNumber) {
int i = licenses.indexOf(licenseNumber);
if(i == -1) {
return false;
} else {
return true;
}
}
Similar code works in one of my projects where a dynamic List of motors for a robot checks to see if there's already a motor with the listed port before adding a new one.
Another method could use a for loop for a linear search, such as:
private boolean checkLicense(String licenseNumber) {
    for (int i = 0; i < realtorList.size(); i++) { // check every element, including the last
        if (licenseNumber.equals(realtorList.get(i).getLicenseNumber())) {
            return true;
        }
    }
    return false;
}
This performs a linear search through each object until it finds a match. (Because it relies on returning early, it needs to be in its own method, like the example above.)
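A runnable sketch of both ideas, assuming a minimal, hypothetical Realtor class with only a licenseNumber field. The linear search uses a stream with anyMatch; the second method swaps the parallel ArrayList of licenses for a HashSet, a different technique from the answers above, which gives average constant-time lookups and whose add() already reports duplicates.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal, hypothetical Realtor with just the field we search on.
class Realtor {
    private final String licenseNumber;

    Realtor(String licenseNumber) {
        this.licenseNumber = licenseNumber;
    }

    String getLicenseNumber() {
        return licenseNumber;
    }
}

class RealtorRegistry {
    private final List<Realtor> realtorList = new ArrayList<>();
    // Side index of license numbers for fast duplicate checks.
    private final Set<String> licenses = new HashSet<>();

    // Linear search over every Realtor in the list.
    boolean containsLicenseLinear(String licenseNumber) {
        return realtorList.stream()
                .anyMatch(r -> r.getLicenseNumber().equals(licenseNumber));
    }

    // Set.add returns false if the license was already present.
    boolean addIfNew(Realtor realtor) {
        if (!licenses.add(realtor.getLicenseNumber())) {
            return false; // duplicate license: skip this line of input
        }
        realtorList.add(realtor);
        return true;
    }

    int size() {
        return realtorList.size();
    }
}
```

Reading the file would then call addIfNew for each parsed line; a false result means the license number was already seen.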

Loop Turning into an Infinite Loop

A segment of my code is triggering an infinite while loop, and I'm not sure why. I've used the same loop before in this program to add friends to a linked list and it worked fine, so I don't understand why it turns into an infinite loop now.
while (!a.equals("*")){
curr = friendlist.getUsers().getFront();
while (curr!=null){
if (curr.getData().getName().equals(a)){ //why is it not removing friends?
d.removeFriend(curr.getData());
}
curr = curr.getNext();
}
System.out.println("Add a friend by typing in their name. Enter * to end. ");
a = in.nextLine();
}
The above code accesses the following segment from another class:
public void removeFriend(User u){
if (friendsList.isEmpty()){
System.out.println("Empty list, cannot remove.");
}
else{
Node c = friendsList.getFront();
while (c.getNext()!=null){
if (c.getNext().getData().equals(u)){ //condition: if the data is the same
c.setNext(c.getNext().getNext()); //change the link
c.getNext().setData(null); //set the next data to null (cut the link)
friendsList.setSize(friendsList.size()-1);
c = c.getNext();
}
}
}
}
Why is the code not running properly?
As another poster has mentioned, you are invoking the getNext() method twice in one code block.
Here's what I presume will work for you:
while (c != null && c.getNext() != null) {
    Node next = c.getNext(); // call getNext() once and reuse the result
    if (next.getData().oldestFriend().getBirthYear() > c.getData().oldestFriend().getBirthYear()) {
        a = next.getData();
    }
    c = next; // always advance; a "continue" here would skip this line and loop forever
}
Why don't you do the following? Right now it looks like you're calling that same method three times!
Instead, store whatever getNext() returns in one variable, then use that local variable however you like: analyse it, compare it, and so on.
Do you know exactly where the infinite loop is? Maybe put a System.out.println("loop") before curr.getNext() and c.getNext() to see which one is failing?
Would add this as a comment, but I'm not yet allowed to :(
What are the semantics of allUsers.getFront()? Does it just fetch the head, or is it more like a pop operation?
If it is a fetch, there might be an issue with the recursive call of oldestFriend() inside the method oldestFriend(), though in that case I would expect a StackOverflowError.
Change your while loop to:
while (c.hasNext()) { // assumes your Node class has a hasNext() method
    Node oldC = c;
    c = c.getNext();
    if (c.getData().oldestFriend().getBirthYear() > oldC.getData().oldestFriend().getBirthYear()) {
        a = c.getData();
    }
}
Call only getNext() if there is next, and only once.
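For the removeFriend method in the question, the loop only executes c = c.getNext() inside the if, so whenever the next node does not match, c never moves and the while condition never changes. It also clears the data of the node after the removed one (c.getNext().setData(null) runs after relinking, and throws a NullPointerException if the removed node was last), and it never compares the head node. Below is a minimal sketch of a removal loop that always advances; Node and FriendList are simplified, hypothetical stand-ins for the question's classes.

```java
// Simplified, hypothetical singly linked list node.
class Node<T> {
    T data;
    Node<T> next;

    Node(T data, Node<T> next) {
        this.data = data;
        this.next = next;
    }
}

class FriendList<T> {
    private Node<T> front;
    private int size;

    void addFront(T value) {
        front = new Node<>(value, front);
        size++;
    }

    int size() {
        return size;
    }

    // Removes the first node whose data equals u. The cursor advances on
    // every iteration, so the loop always terminates.
    boolean remove(T u) {
        if (front == null) {
            return false;
        }
        if (front.data.equals(u)) { // the head has no predecessor, handle it first
            front = front.next;
            size--;
            return true;
        }
        Node<T> c = front;
        while (c.next != null) {
            if (c.next.data.equals(u)) {
                c.next = c.next.next; // unlink the matching node
                size--;
                return true;
            }
            c = c.next; // advance unconditionally
        }
        return false;
    }

    boolean contains(T u) {
        for (Node<T> c = front; c != null; c = c.next) {
            if (c.data.equals(u)) {
                return true;
            }
        }
        return false;
    }
}
```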

Updating method to remove loop

This is how I understand the getUser method below:
Return a User object or null.
Get a Set of users and assign it to userSet.
If the set is not empty, begin iterating over it, but return the first user in the set.
Here is the method:
private User getUser(UserDet arg)
{
Set<User> userSet = arg.getUsers(User.class);
if (CollectionUtils.isNotEmpty(userSet))
{
for (User user : userSet)
{
return user;
}
}
return null;
}
I think I could replace the method with this :
private User getUser(UserDet arg)
{
Set<User> userSet = arg.getUsers(User.class);
if (CollectionUtils.isNotEmpty(userSet))
{
return userSet.iterator().next();
}
else {
return null;
}
}
This new method removes the loop and just returns the first element in the set, same as the original implementation. Is it correct?
Yes. It's pretty much the same thing, as a for-each loop is syntactic sugar for using an Iterator from an Iterable.
Note, however, that you don't need the nonempty check in the first variant, since the loop won't iterate in the case of an empty set anyway.
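To illustrate the "syntactic sugar" point, here is a small sketch showing the for-each version next to roughly what it expands to (an explicit Iterator loop). FirstElement and its methods are hypothetical names, generic over the element type instead of User. Both return null for an empty set because the loop body never runs.

```java
import java.util.Iterator;
import java.util.Set;

class FirstElement {
    // What the original for-each version does.
    static <T> T firstWithForEach(Set<T> set) {
        for (T item : set) {
            return item; // returns on the first iteration
        }
        return null; // the loop body never runs for an empty set
    }

    // Roughly what the compiler turns the for-each loop into.
    static <T> T firstWithIterator(Set<T> set) {
        for (Iterator<T> it = set.iterator(); it.hasNext(); ) {
            T item = it.next();
            return item;
        }
        return null;
    }
}
```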
Yes, both are the same. In the first implementation, control returns from the function on the first iteration of the loop, so the loop ends there.
Yes, it is correct. If the set is guaranteed to be non-null, I'd even remove the CollectionUtils.isNotEmpty check and use the Iterator's hasNext method instead.
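A minimal sketch of the hasNext() approach, with an explicit null check added in case the set is not guaranteed to be non-null. UserLookup and firstOrNull are hypothetical names, and the helper is generic rather than tied to User. Keep in mind that "first" means first in the set's iteration order: unspecified for HashSet, insertion order for LinkedHashSet, sorted order for TreeSet.

```java
import java.util.Iterator;
import java.util.Set;

class UserLookup {
    // Null-safe "first element" without CollectionUtils:
    // hasNext() doubles as the emptiness check.
    static <T> T firstOrNull(Set<T> set) {
        if (set == null) {
            return null;
        }
        Iterator<T> it = set.iterator();
        return it.hasNext() ? it.next() : null;
    }
}
```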
It seems to be correct, but it will only make the method a bit easier to read, it will not optimize it in terms of performance. Still I think the change is good and you should do it.
Yes, it does pretty much the same thing, but if your spec says to start iterating, then maybe you should; this method might be extended in the future.
BTW: it is a good convention for a method to have only one return statement (i.e. you can create a variable to be returned, assign it null at the beginning, and assign a user to it inside your loop).
Yes. Both the methods return the first element in the set. The first method seems to have been written for something else previously and changed then keeping the for loop intact.
In any case, the second method you're proposing won't give any significant performance benefit, but it is a better way to write it than the first one.
So if UserDet#getUsers(Class) never returns null (but an empty Set when no user can be found), the shortest (and in my opinion most readable) form is:
private User getUser(UserDet arg) {
Set<User> userSet = arg.getUsers(User.class);
return userSet.isEmpty() ? null : userSet.iterator().next();
}
I would do this: no loop, and a null check on top.
private User getUser(UserDet arg) {
Set<User> userSet = arg.getUsers(User.class);
if (userSet != null && userSet.size() > 0) {
return userSet.iterator().next();
}
return null;
}

Need help stepping through a java iterator

Say I have already created an iterator called "iter" and an ArrayList called "database". I want to look through the ArrayList and see whether any element is equal to a String called "test". If it is, I would like to add the element to another list.
while(iter.hasNext()) {
if(database.next() == test) {
database.next().add(another_list);
}
}
What am I doing wrong? I'm completely new to iterators in java. Do I need to write my own iterator class? Any code examples would be greatly appreciated. Thanks
The problem with your code is that every time you call .next(), it advances the iterator forward to the next position. This means that this code
if(database.next() == test) {
database.next().add(another_list);
}
Won't work as intended, because the first call to database.next() will not give back the same value as the second call to database.next(). To fix this, you'll want to make a temporary variable to hold on to the new value, as seen here:
while (iter.hasNext()) {
    /* type */ curr = iter.next(); // call next() once and keep the value
    if (curr.equals(test)) {       // compare contents with equals, not ==
        another_list.add(curr);    // add the element to the other list
    }
}
(Filling in the real type of what's being iterated over in place of /* type */)
In many cases, though, you don't need to use iterators explicitly. Most Collection types implement the Iterable interface, in which case you can just write:
/* container */ c;
for (/* type */ curr : c) {
    if (curr.equals(test)) {
        another_list.add(curr);
    }
}
Hope this helps!
if (database.contains(test))
{
    another_list.add(test);
}
You can use the built-in method contains(...).
You should use equals(...) for data comparisons.
Look at the Javadoc to see whether there is already a method for your purpose.
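A runnable sketch of the corrected loop, in both the explicit-Iterator and for-each forms, assuming database is a List<String>. MatchCollector and its method names are hypothetical. It uses Objects.equals so a null element does not throw, and the sample input includes new String("test"), which == would not match but equals does.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

class MatchCollector {
    // Explicit Iterator: call next() exactly once per iteration.
    static List<String> collectWithIterator(List<String> database, String test) {
        List<String> anotherList = new ArrayList<>();
        Iterator<String> iter = database.iterator();
        while (iter.hasNext()) {
            String curr = iter.next();        // advance once, keep the value
            if (Objects.equals(curr, test)) { // compare contents, not references
                anotherList.add(curr);        // add the element to the other list
            }
        }
        return anotherList;
    }

    // Same thing with the enhanced for loop.
    static List<String> collectWithForEach(List<String> database, String test) {
        List<String> anotherList = new ArrayList<>();
        for (String curr : database) {
            if (Objects.equals(curr, test)) {
                anotherList.add(curr);
            }
        }
        return anotherList;
    }
}
```

Unlike a single contains(...) check, these loops add every matching element, so the result can hold more than one copy.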
