Debugging Challenge in regards to TreeSet

So this was going to be my question, but I actually figured out the problem while I was writing it. Perhaps this will be useful for others (I will remove the question if it's a duplicate or is deemed inappropriate for this site). I know of two possible solutions to my problem, but perhaps someone will come up with a better one than I thought of.
I don't understand why TreeSet isn't removing the first element here. The size of my TreeSet is supposed to stay bounded, but appears to grow without bound.
Here is what I believe to be the relevant code:
This code resides inside of a double for loop. NUM_GROUPS is a static final int which is set to 100. newGroups is a TreeSet<TeamGroup> object which is initialized (with no elements) before the double for loop (the variables group and team are from the two for-each loops).
final TeamGroup newGroup = new TeamGroup(group, team);
newGroups.add(newGroup);
System.err.println("size of newGroups: " + newGroups.size());
if (newGroups.size() > NUM_GROUPS) {
System.err.println("removing first from newGroups");
newGroups.remove(newGroups.first());
System.err.println("new size of newGroups: "
+ newGroups.size());
}
I included my debugging statements to show that the problem really does appear to happen. I get the following types of output:
size of newGroups: 44011
removing first from newGroups
new size of newGroups: 44011
You see that although the if statement is clearly being entered, the size of the TreeSet<TeamGroup> newGroups isn't being decremented. It would seem to me that the only way for this to happen is if the remove call doesn't remove anything. But how can it fail to remove the result of a call to first(), which should definitely be an element in the TreeSet?
Here is the compareTo method in my TeamGroup class (score is an int which could very reasonably be the same for many different TeamGroup objects hence why I use the R_ID field as a tie-breaker):
public int compareTo(TeamGroup o) {
// sorts low to high so that when you pop off of the TreeSet object, the
// lowest value gets popped off (and keeps the highest values).
if (o.score == this.score)
return this.R_ID - o.R_ID;
return this.score - o.score;
}
Here is the equals method for my TeamGroup class:
@Override
public boolean equals(final Object o) {
return this.R_ID == ((TeamGroup) o).R_ID;
}
...I'm not worried about a ClassCastException here because this is specifically pertaining to my above problem where I never try to compare a TeamGroup object with anything but another TeamGroup object--and this is definitely not the problem (at least not a ClassCastException problem).
The R_ID's are supposed to be unique and I guarantee this by the following:
private static final double WIDTH = (double) Integer.MAX_VALUE
- (double) Integer.MIN_VALUE;
private static final Map<Integer, Integer> MAPPED_IDS =
new HashMap<Integer, Integer>(50000);
...
public final int R_ID = TeamGroup.getNewID();
...
private static int getNewID() {
int randID = randID();
while (MAPPED_IDS.get(randID) != null) {
randID = randID();
}
MAPPED_IDS.put(randID, randID);
return randID;
}
private static int randID() {
return (int) (Integer.MIN_VALUE + Math.random() * WIDTH);
}

The problem is here:
return this.R_ID - o.R_ID;
It should be:
return Integer.compare(this.R_ID, o.R_ID);
Taking the difference of two int or Integer values works if the values are both guaranteed to be non-negative. However, in your example, you are using ID values across the entire range of int / Integer and that means that the subtraction can lead to overflow ... and an incorrect result for compareTo.
The incorrect implementation leads to situations where the compareTo method is not transitive; i.e. there are integers I1, I2 and I3 where the compareTo method says that I1 < I2 and I2 < I3, but also I3 < I1. When you plug this into TreeSet, elements get inserted into the tree in the wrong place, and strange behaviours happen. Precisely what happens is hard to predict: it depends on the objects that are inserted and the order in which they are inserted.
TreeSet.first() should definitely return an object which belongs to the set, right?
Probably ...
So then why can it not remove this object?
Probably because it can't find it ... because of the broken compareTo.
To understand exactly what is going on, you would need to single-step through the TreeSet code.
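The overflow is easy to reproduce in isolation. The following is a minimal sketch, with a class name and method name made up for illustration (they are not from the question). It compares three int values by subtraction, shows the resulting cycle, and then repeats the last comparison with Integer.compare:

```java
// Illustrative sketch; the class and method names are hypothetical.
class SubtractionOverflowDemo {

    // The broken approach: the difference can overflow the int range.
    static int bySubtraction(int a, int b) {
        return a - b;
    }

    public static void main(String[] args) {
        int low = Integer.MIN_VALUE;
        int mid = 0;
        int high = Integer.MAX_VALUE;

        // low < mid and mid < high, as expected ...
        System.out.println(bySubtraction(low, mid) < 0);    // true
        System.out.println(bySubtraction(mid, high) < 0);   // true
        // ... but MIN_VALUE - MAX_VALUE wraps around to 1,
        // so low is reported as greater than high.
        System.out.println(bySubtraction(low, high));       // 1

        // Integer.compare never overflows.
        System.out.println(Integer.compare(low, high) < 0); // true
    }
}
```

With random IDs spread over the whole int range, as in the question, pairs like these are common, so the tree ordering becomes inconsistent quickly.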

Related

Negative and positive return values of compare and compareTo

I read that the rule for the return value of these methods is that for obj1.compareTo(obj2) for example, if obj2 is under obj1 in the hierarchy, the return value is negative and if it's on top of obj1, then it's positive (and if it's equal then it's 0). However, in my class I saw examples where Math.signum was used in order to get -1 (for negative) and 1 (for positive) in the compareTo method.
Is there any reason for that?
EDIT:
Here is the code I meant:
Comparator comp = new Comparator() {
public int compare(Object obj1, Object obj2) {
Book book1 = (Book) obj1;
Book book2 = (Book) obj2;
int order = book1.getAuthor().compareTo(book2.getAuthor());
if (order == 0) {
order = (int) Math.signum(book1.getPrice() - book2.getPrice());
}
return order;
}
};

Is there any reason for using Math.signum
Yes there is.
order = (int) Math.signum(book1.getPrice() - book2.getPrice());
Suppose you replace the above line with this
order = (int)(book1.getPrice() - book2.getPrice());
Now let us assume
book1.getPrice() returns 10.50
book2.getPrice() returns 10.40
If you do not use signum, you will not get any compile-time or run-time error, but the value of order will be 0 because the cast truncates the difference of about 0.10 to 0. This implies that book1 is equal to book2, which is logically false.
But if you use signum, the value of order will be 1, which implies book1 > book2.
It must be mentioned, however, that you should never assume that a compare function only returns values between -1 and 1.
You can read official document for comparator http://docs.oracle.com/javase/7/docs/api/java/util/Comparator.html.
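A simpler way to get the same result is Double.compare, which needs neither the cast nor signum. The sketch below assumes a Book class with getAuthor() returning a String and getPrice() returning a double; the question does not show that class, so its shape here is a guess, and BookOrdering is a made-up holder name.

```java
// Hypothetical Book class; only the two getters used in the question are assumed.
class Book {
    private final String author;
    private final double price;

    Book(String author, double price) {
        this.author = author;
        this.price = price;
    }

    String getAuthor() { return author; }
    double getPrice() { return price; }
}

class BookOrdering {
    // Author first, then price. Double.compare returns a correctly signed int
    // without any truncating cast.
    static final java.util.Comparator<Book> BY_AUTHOR_THEN_PRICE = (b1, b2) -> {
        int order = b1.getAuthor().compareTo(b2.getAuthor());
        if (order == 0) {
            order = Double.compare(b1.getPrice(), b2.getPrice());
        }
        return order;
    };

    public static void main(String[] args) {
        Book cheap = new Book("Knuth", 10.40);
        Book pricey = new Book("Knuth", 10.50);
        // A plain (int) cast of 10.50 - 10.40 would give 0 here.
        System.out.println(BY_AUTHOR_THEN_PRICE.compare(pricey, cheap) > 0); // true
    }
}
```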

Any negative number will do to show that a < b. And any positive number will show that a > b. -1 and 1 serve that purpose just fine. There's no sense of being "more less than" or "more greater than"; they are binary attributes. The reason that any negative (or positive) value is permitted is probably historical; for integers it's common to implement the comparator by simple subtraction.

No.
PS: Frequent error in implementation is to use subtraction
public int compareTo(Object o) {
OurClass other = (OurClass)o; //Skip type check
return this.intField - other.intField;
}
It is wrong because if you call new OurClass(Integer.MIN_VALUE).compareTo(new OurClass(Integer.MAX_VALUE)) the subtraction overflows and the result has the wrong sign. Math.signum is probably a (failed) attempt to deal with this problem.

The only reason I can see is that if you want to compare two ints for example (a and b), and you write
return a - b;
it might overflow. If you convert them to doubles and use (int)Math.signum( (double)a - (double)b ), you will definitely avoid that. But there are simpler ways of achieving the same effect, Integer.compare( a, b) for example.

immutable objects and lazy initialization.

http://www.javapractices.com/topic/TopicAction.do?Id=29
Above is the article which I am looking at. It says that immutable objects greatly simplify your program, since they:
allow hashCode to use lazy initialization, and to cache its return value
Can anyone explain what the author is trying to say in the above line?
Is my class immutable if it is marked final but its instance variables are not final, or, vice versa, if my instance variables are final but the class is a normal one?

As explained by others, because the state of the object won't change the hashcode can be calculated only once.
The easy solution is to precalculate it in the constructor and place the result in a final variable (which guarantees thread safety).
If you want to have a lazy calculation (hashcode only calculated if needed) it is a little more tricky if you want to keep the thread safety characteristics of your immutable objects.
The simplest way is to declare a private volatile int hash; and run the calculation if it is 0. You will get laziness except for objects whose hashcode really is 0 (1 in 4 billion if your hash method is well distributed).
Alternatively you could couple it with a volatile boolean but need to be careful about the order in which you update the two variables.
Finally for extra performance, you can use the methodology used by the String class which uses an extra local variable for the calculation, allowing to get rid of the volatile keyword while guaranteeing correctness. This last method is error prone if you don't fully understand why it is done the way it is done...
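The first two options can be sketched as follows. The class names and the hash formula are placeholders chosen for illustration, and equals is left out for brevity:

```java
// Illustrative sketch; class names and the hash formula are hypothetical.
final class EagerPoint {
    private final int x, y;
    private final int hash; // computed once in the constructor

    EagerPoint(int x, int y) {
        this.x = x;
        this.y = y;
        this.hash = 31 * x + y;
    }

    @Override
    public int hashCode() {
        return hash;
    }
}

final class LazyPoint {
    private final int x, y;
    private volatile int hash; // 0 means "not computed yet"

    LazyPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            h = 31 * x + y; // recomputed on every call if the real hash is 0
            hash = h;
        }
        return h;
    }
}
```

Two threads may both compute the value in LazyPoint, but because the object is immutable they compute the same result, so the duplicate work is harmless.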

If your object is immutable it can't change its state and therefore its hashcode can't change. That allows you to calculate the value once, when you need it, and to cache it, since it will always stay the same. It's in fact a very bad idea to implement your own hashCode function based on mutable state since e.g. HashMap assumes that the hash can't change and it will break if it does change.
The benefit of lazy initialization is that hashcode calculation is delayed until it is required. Many objects don't need it at all, so you save some calculations. Expensive hash calculations, such as those on long Strings, benefit from that in particular.
class FinalObject {
private final int a, b;
public FinalObject(int value1, int value2) {
a = value1;
b = value2;
}
// not calculated at the beginning - lazy once required
private int hashCode;
@Override
public int hashCode() {
int h = hashCode; // read
if (h == 0) {
h = a + b; // calculation
hashCode = h; // write
}
return h; // return local variable instead of second read
}
}
Edit: as pointed out by @assylias, using unsynchronized / non-volatile code is only guaranteed to work if there is only one read of hashCode, because every consecutive read of that field could return 0 even though the first read could already see a different value. The above version fixes the problem.
Edit2: replaced with more obvious version, slightly less code but roughly equivalent in bytecode
public int hashCode() {
int h = hashCode; // only read
return h != 0 ? h : (hashCode = a + b);
// ^- just a (racy) write to hashCode, no read
}

What that line means is, since the object is immutable, then the hashCode has to only be computed once. Further, it doesn't have to be computed when the object is constructed - it only has to be computed when the function is first called. If the object's hashCode is never used then it is never computed. So the hashCode function can look something like this:
@Override public int hashCode(){
synchronized (this) {
if (!this.computedHashCode) {
this.hashCode = expensiveComputation();
this.computedHashCode = true;
}
}
return this.hashCode;
}

And to add to other answers.
An immutable object cannot be changed. For primitive types such as int, the final keyword is enough to prevent the value from changing. For object references, final only prevents the reference from being reassigned; the object itself has to be made immutable by its own implementation:
The following code would result in a compilation error, because you are trying to change a final reference/pointer to an object.
final MyClass m = new MyClass();
m = new MyClass();
However this code would work.
final MyClass m = new MyClass();
m.changeX();
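To make the object itself immutable, the class has to rule out any state change after construction. A minimal sketch of one common pattern follows; the Temperature class is invented for illustration and does not come from the question:

```java
// Hypothetical example class, not from the question.
final class Temperature {          // final class: no subclass can add mutable state
    private final double celsius;  // final field: assigned exactly once

    Temperature(double celsius) {
        this.celsius = celsius;
    }

    double getCelsius() {
        return celsius;
    }

    // "Modification" returns a new object instead of changing this one.
    Temperature plus(double delta) {
        return new Temperature(celsius + delta);
    }
}
```

A final class whose fields are non-final and can be changed through setters is not immutable, and final fields alone do not help if they refer to mutable objects that are exposed to callers.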

Pre-condition vs Post-condition in java? [duplicate]

For example I have the following code:
public class Calc {
final int PI = 3.14; //is this an invariant?
private int calc(int a, int b){
return a + b;
//would the parameters be pre-conditions and the return value be a post-condition?
}
}
I am just confused on what exactly these terms mean? The code above is what I think it is, however can anyone point me into the right direction with my theory?

Your code is in a contract with other bits and pieces of code. The pre-condition is essentially what must be met initially in order for your code to guarantee that it will do what it is supposed to do.
For example, a binary search would have the pre-condition that the thing you are searching through must be sorted.
On the other hand, the post-condition is what the code guarantees if the pre-condition is satisfied. For example, in the situation of the binary search, we are guaranteed to find the location of what we were searching for, or return -1 in the case where we don't find anything.
The pre-condition is almost like another thing on top of your parameters. They don't usually affect code directly, but it's useful when other people are using your code, so they use it correctly.
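The binary search contract described above can be written down as documentation on the method. This is a sketch with a made-up class name; it returns -1 when the key is missing, as in the description, which differs from java.util.Arrays.binarySearch:

```java
// Illustrative sketch; the class name is hypothetical.
class ContractDemo {

    /**
     * Precondition:  sorted is in ascending order.
     * Postcondition: returns an index i with sorted[i] == key,
     *                or -1 if key does not occur in sorted.
     */
    static int binarySearch(int[] sorted, int key) {
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2; // avoids overflow of lo + hi
            if (sorted[mid] < key) {
                lo = mid + 1;
            } else if (sorted[mid] > key) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] data = {1, 3, 5, 7, 9};
        System.out.println(binarySearch(data, 7)); // 3
        System.out.println(binarySearch(data, 4)); // -1
    }
}
```

If the caller breaks the precondition by passing an unsorted array, the method makes no promise about its result.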

An invariant is a condition that must hold both before and after a call to a method, so it acts as a combined precondition and postcondition. A precondition has to be fulfilled before a method can run, and a postcondition has to hold afterwards.
Java has no built-in mechanism for checking such conditions (apart from the assert statement), but here's a little example.
public class Calc {
private int value = 0;
private boolean isValid() {
return value >= 0;
}
// this method has the validity as invariant. It's true before and after a successful call.
public void add(int val) {
// precondition
if(!isValid()) {
throw new IllegalStateException();
}
// actual "logic"
value += val;
// postcondition
if(!isValid()) {
throw new IllegalStateException();
}
}
}
As you can see the conditions can be violated. In this case you (normally) use exceptions in Java.

private int calc(int a, int b){
return a + b;
//would the parameters be pre-conditions and the return value be a post-condition?
}
Is a function that takes two int and returns an int, which is the summation of a and b.
You would normally call the calc function in main as
public static void main(String[] args)
{
int a = 3, b = 4;
int sum = calc(a, b);
}
when you do that, a copy of a and b is passed to calc but the original values of a and b are not affected by the calc function as parameters are passed by value in Java.

A precondition is something that has to be true about the parameters that a function takes. So it isn't enough to say what the variables are, but you need to say something about their nature. For example, a and b must be integers. A post condition states what must be true after the function completes. In your example, it would be the fact that your function must produce the sum of a and b. The precondition and post condition can actually result in two methods, especially in a language like Java. What if you had a precondition that stated simply "The two parameters must be numerical". Then you would have to account for not only integers, but floating points.
Hope that helps.

Just a word of warning: assigning a floating-point literal (3.14) to an int will not compile, and casting it to an int would truncate it to 3. You might want to declare it as a float instead:
final float PI = 3.14f;
final means that the variable can no longer be changed.
a and b are just parameters that you pass into calc(). Before, they can be called whatever you want them to be, but inside calc() you can refer to them as a and b.
So you can have this:
int foo = 5;
int bar = 7;
int sum = calc(foo, bar); //12

Overriding equals without custom class

I'm trying to store a set of possible choices and eliminate duplicates, so I'm storing the choices I've made in a HashSet. I have two pieces of data for each step, and both must match an existing entry for the step to be considered a duplicate (e.g. [2,0], [0,2], [2,2] would all be new steps, but then going to [2,2] again would be a duplicate).
I believe I need to override equals in order to properly determine if the step is already in the HashSet but I am not using a custom class, just an array of Integers, so most of what I've found isn't applicable (to my knowledge). This seems like it may be useful, suggesting the possibility of subclassing HashSet but I'd like to avoid that if possible. I was hoping the equals method I have commented out would work but it was never called. Do I need to override hashCode() as well? I know they go hand in hand. Do I just need to go write my own class or is there another method of doing what I want?
import java.util.HashSet;
public class EP2 {
public static long count = 0;
public static HashSet<Integer []> visits = new HashSet<Integer []>();
//@Override
//public boolean equals(Object j){
// return true;
//}
public static void main(String[] args) {
int position = 0;
int depth = 0;
walk(position, depth);
System.out.println(count);
}
public static void walk(int position, int depth){
count++;
Integer[] specs = new Integer[2];
specs[0] = position;
specs[1] = depth;
visits.add(specs);
Integer[] specL = new Integer[]{position - 1, depth+1};
Integer[] specR = new Integer[]{position + 1, depth+1};
//doesn't avoid [0,2] duplicates
if(depth < 2){
if(!visits.contains(specL)){
walk(position - 1, depth+1); //walk left
}
if(!visits.contains(specR)){
walk(position + 1, depth+1); //walk right
}
}
}
}

In Java, hashCode() and equals(Object) go together. If you override one, you should override the other. When Java looks up an object in a HashSet, it first computes the hashCode to determine which bucket the object might be found in. Then it uses equals(Object) to see whether the set has the object. Additionally, changing an object that's in a HashSet will lead to problems, as it might end up in the wrong bucket, and never be found again.
You may want to write your own immutable class, Position, that contains a constructor, a position and depth variables, getters, equals(Object), and hashCode(). The members of the Integer[] arrays have meaning, so you should probably state those explicitly.
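A minimal sketch of such a class is shown below. The field names follow the question; the rest (the class layout and the PositionDemo driver) is one possible way to write it, not the only one:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Sketch of the suggested value class; field names follow the question.
final class Position {
    private final int position;
    private final int depth;

    Position(int position, int depth) {
        this.position = position;
        this.depth = depth;
    }

    int getPosition() { return position; }
    int getDepth() { return depth; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Position)) return false;
        Position other = (Position) o;
        return position == other.position && depth == other.depth;
    }

    @Override
    public int hashCode() {
        // Must be consistent with equals: equal objects give equal hashes.
        return Objects.hash(position, depth);
    }
}

class PositionDemo {
    public static void main(String[] args) {
        Set<Position> visits = new HashSet<>();
        visits.add(new Position(0, 2));
        System.out.println(visits.contains(new Position(0, 2))); // true
        System.out.println(visits.add(new Position(0, 2)));      // false, already present
    }
}
```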

The problem is that equals() for an array checks whether the two arrays are the same instance, and hashCode() for an array is identity-based as well. In your case, they are different instances. See a good question and answers here.
Your HashSet only calls equals() on elements whose hash matches, so contains() returns false unless you pass the very same array instance that was added.
Changing the array to a List would probably work, since it checks that all containing elements are equal to one another. For Integers, this of course works.
I would however, implement my own class. If you don't have any such restrictions, you should do so.
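The difference between the two element types can be seen in a short sketch. The class and the visit helper are made up for illustration:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch; the class and method names are hypothetical.
class ListKeyDemo {

    // Records a step and reports whether it was new.
    static boolean visit(Set<List<Integer>> visits, int position, int depth) {
        return visits.add(Arrays.asList(position, depth));
    }

    public static void main(String[] args) {
        Set<List<Integer>> visits = new HashSet<>();
        System.out.println(visit(visits, 0, 2)); // true, first visit
        System.out.println(visit(visits, 0, 2)); // false, duplicate detected

        // Arrays compare by identity, so an equal-looking array is not found.
        Set<Integer[]> arrayVisits = new HashSet<>();
        arrayVisits.add(new Integer[] {0, 2});
        System.out.println(arrayVisits.contains(new Integer[] {0, 2})); // false
    }
}
```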

If you're simply trying to check for duplicates, write a custom method to check if a Set contains an int[].
// Requires: import java.util.HashSet;
public static boolean contains(HashSet<Integer []> set, int[] step) {
    for (Integer[] inner : set) {
        if (inner.length != step.length) {
            continue; // different length, cannot be equal
        }
        boolean allEqual = true;
        for (int i = 0; i < inner.length; i++) {
            if (!inner[i].equals(step[i])) {
                allEqual = false; // a single mismatch rules this array out
                break;
            }
        }
        if (allEqual) {
            return true;
        }
    }
    return false;
}
You should follow your rules. For example, if you know the size of the arrays is always going to be 2, then maybe you don't need to do the check and can quickly just check each value at the same index of each array.
You would call this method any time you wanted to add something to the Set.

How to keep a "things done" count in a recursive algorithm in Java?

I have a recursive algorithm which steps through a string, character by character, and parses it to create a tree-like structure. I want to be able to keep track of the character index the parser is currently at (for error messages as much as anything else) but am not keen on implementing something like a tuple to handle multiple returned types.
I tried using an Integer type, declared outside the method and passed into the recursive method, but because Integer is immutable, recursive call increments are "forgotten" when I return. (The increment makes the local copy of the passed-by-value object reference point at a new Integer object.)
Is there a way to get something similar to work which won't pollute my code?

Since you've already discovered the pseudo-mutable integer "hack," how about this option:
Does it make sense for you to make a separate Parser class? If you do this, you can store the current state in a member variable. You probably need to think about how you're going to handle any thread safety issues, and it might be overkill for this particular application, but it might work for you.
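A minimal sketch of that idea follows. The grammar here (nested parentheses) is only a stand-in, since the question does not show the real one; the point is that every recursive call advances the same index field, which is then available for error messages:

```java
// Minimal sketch; the grammar (nested parentheses) is a stand-in for the real one.
class Parser {
    private final String input;
    private int index = 0; // current character position, shared by all recursive calls

    Parser(String input) {
        this.input = input;
    }

    // Parses a sequence of groups like "(()())" and returns the maximum nesting depth.
    int parse() {
        int depth = parseGroups();
        if (index < input.length()) {
            throw new IllegalStateException(
                    "Unexpected '" + input.charAt(index) + "' at index " + index);
        }
        return depth;
    }

    private int parseGroups() {
        int maxDepth = 0;
        while (index < input.length() && input.charAt(index) == '(') {
            index++;                   // consume '('
            int inner = parseGroups(); // recursion advances the shared index
            if (index >= input.length() || input.charAt(index) != ')') {
                throw new IllegalStateException("Expected ')' at index " + index);
            }
            index++;                   // consume ')'
            maxDepth = Math.max(maxDepth, inner + 1);
        }
        return maxDepth;
    }
}
```

Creating one Parser instance per input string keeps the state confined to a single call chain, which sidesteps the thread-safety concern as long as instances are not shared.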

It's kind of a hack, but sometimes I use an AtomicInteger, which is mutable, to do things like this. I've also seen cases where an int[] of size 1 is passed in.
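For illustration, the AtomicInteger variant might look like this. The walk method is a made-up placeholder for the real recursive parser:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch; the class and method names are hypothetical.
class CounterDemo {

    // Placeholder recursion that counts every character it visits.
    static void walk(String s, int pos, AtomicInteger counter) {
        if (pos >= s.length()) {
            return;
        }
        counter.incrementAndGet(); // mutates the shared object, visible to the caller
        walk(s, pos + 1, counter);
    }

    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger();
        walk("hello", 0, counter);
        System.out.println(counter.get()); // 5
    }
}
```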

The current solution I am using is:
int[] counter = {0};
and then pass it to the recursive algorithm:
public List<Thing> doIt (String aString, int[] counter) { ... }
and when I want to increment it:
counter[0]++;
Not super elegant, but it works...

Integers are immutable, which means that incrementing one creates a new Integer object rather than changing the one the caller holds. The reference you pass as an argument is itself copied, so reassigning it inside the method does not affect the caller's variable.
To get the behavior you're looking for, you can write your own class which is like Integer, only mutable. Then just pass it to the recursive function; it is incremented within the recursion, and when you access it again after the recursion is over it will still hold its new value.
Edit: Note that using an int[] array is a variation on this method... In Java, an array is a mutable object, so changes made to its elements through the passed reference are visible to the caller.

You could just use a static int class variable that gets incremented each time your doIt method is called.

You could also do:
private int recurse (int i) {
if (someConditionkeepOnGoing) {
i = recurse(i+1);
}
return i;
}

To be honest I would recode the function to make it a linear algorithm that uses a loop. This way you have no chance of running out of stack space if you are stepping through an extremely large string. Also, you would not need the extra parameter just to keep track of the count.
This also would probably have the result of making the algorithm faster because it does not need to make a function call for every character.
Unless of course there is a specific reason it needs to be recursive.

One possibility I can think of is to store the count in a member variable of the class. This of course assumes that the public doIt method is only called by a single thread.
Another option is to refactor the public method to call a private helper method. The private method takes the list as a parameter and returns the count. For example:
public List<Thing> doIt(String aString) {
List<Thing> list = new ArrayList<Thing>();
int count = doItHelper(aString, list, 0);
// ...
return list;
}
private int doItHelper(String aString, List<Thing> list, int count) {
// ...
// do something that updates count
count = doItHelper(aString, list, count);
// ...
return count;
}
This assumes that you can do the error handling in the public doIt method, since the count variable isn't actually passed back to the caller. If you need to do that, you could of course throw an exception:
public List<Thing> doIt(String aString) throws SomeCustomException {
List<Thing> list = new ArrayList<Thing>();
int count = doItHelper(aString, list, 0);
// ...
if (someErrorOccurred) {
throw new SomeCustomException("Error occurred at character index " + count, count);
}
return list;
}
It's difficult to know whether that will help without knowing more about how your algorithm actually works.
