Un-overriding hashCode - java

I have the following situation: I have many BSTs, and I want to merge isomorphic subtrees to save space.
I am hashing Binary Search Tree nodes into a "unique table" - basically a hash of BST nodes.
Nodes that have the same left and right child and the same key have the same hash code, and I have overridden equals for the node class appropriately.
Everything works, except that computing the hash is expensive - it involves computing the hash for the child nodes.
I would like to cache the hashed value for a node. The problem I have is the natural way of doing this, a HashMap from nodes to integers, will itself call the hash function on the nodes.
I've gotten around this by declaring a new field in the nodes, which I use to store the hash code. However, I feel this is not the right solution.
What I really want is to map nodes to their hash codes using a hash that is based on the node's address. I thought I could do this by creating a HashMap and casting the nodes to Object, which would then invoke Object's hashCode method, but this didn't work (inserts into the map still call the node's hashCode and equals methods).
I would appreciate insight into the best way of implementing the node-to-hash-code cache. I've attached code below illustrating what's going on.
import java.util.Set;
import java.util.HashSet;
import java.util.Map;
import java.util.HashMap;
class Bst {
int key;
String name;
Bst left;
Bst right;
public Bst( int k, String name, Bst l, Bst r ) {
this.key = k;
this.name = name;
this.left = l;
this.right = r;
}
public String toString() {
String l = "";
String r = "";
if ( left != null ) {
l = left.toString();
}
if ( right != null ) {
r = right.toString();
}
return key + ":" + name + ":" + l + ":" + r;
}
@Override
public boolean equals( Object o ) {
System.out.println("calling Bst's equals");
if ( o == null ) {
return false;
}
if ( !(o instanceof Bst) ) {
return false;
}
Bst n = (Bst) o;
if ( n == null || n.key != key ) {
return false;
} else if ( n.left != null && left == null || n.right != null && right == null ||
n.left == null && left != null || n.right == null && right != null ) {
return false;
} else if ( n.left != null && n.right == null ) {
return n.left.equals( left );
} else if ( n.left != null && n.right != null ) {
return n.left.equals( left ) && n.right.equals( right );
} else if ( n.left == null && n.right != null ) {
return n.right.equals( right );
} else {
return true;
}
}
@Override
public int hashCode() {
// the real hash function is more complex, entails
// calling hashCode on children if they are not null
System.out.println("calling Bst's hashCode");
return key;
}
}
public class Hashing {
static void p(String s) { System.out.println(s); }
public static void main( String [] args ) {
Set<Bst> aSet = new HashSet<Bst>();
Bst a = new Bst(1, "a", null, null );
Bst b = new Bst(2, "b", null, null );
Bst c = new Bst(3, "c", null, null );
Bst d = new Bst(1, "d", null, null );
a.left = b;
a.right = c;
d.left = b;
d.right = c;
aSet.add( a );
if ( aSet.contains( d ) ) {
p("d is a member of aSet");
} else {
p("d is not a member of aSet");
}
if ( a.equals( d ) ) {
p("a and d are equal");
} else {
p("a and d are not equal");
}
// now try casts to objects to avoid calling Bst's HashCode and equals
Set<Object> bSet = new HashSet<Object>();
Object foo = new Bst( a.key, a.name, a.left, a.right );
Object bar = new Bst( a.key, a.name, a.left, a.right );
bSet.add( foo );
p("added foo");
if ( bSet.contains( bar ) ) {
p("bar is a member of bSet");
} else {
p("bar is not a member of bSet");
}
}
}

Storing the hash in a field in the node feels like exactly the right solution to me. It's also what java.lang.String uses for its own hash code. Aside from anything else, it means that you can't possibly end up with cache entries for objects which can otherwise be collected, etc.
If you really want the value of hashCode that would be returned by the implementation in Object, you can use System.identityHashCode though. You shouldn't rely on this - or any other hash code - being unique though.
One other point: your tree is mutable at the moment by virtue of the fields being package access. If you cache the hash code the first time you call it, you won't "notice" if it would have changed due to fields changing. Basically you shouldn't change a node after you've used its hash code.
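A minimal sketch of the cached-field approach, assuming nodes are made immutable so the stored hash can never go stale. The `Node` and `UniqueTable` names are illustrative and not taken from the question; because children already carry their own cached hash, computing a parent's hash costs O(1).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable tree node; the hash is computed once, at construction.
final class Node {
    final int key;
    final Node left;
    final Node right;
    private final int hash; // cached: all fields are final, so it cannot go stale

    Node(int key, Node left, Node right) {
        this.key = key;
        this.left = left;
        this.right = right;
        // Children already carry their cached hash, so this is O(1) per node.
        int h = 31 * 17 + key;
        h = 31 * h + (left == null ? 0 : left.hash);
        h = 31 * h + (right == null ? 0 : right.hash);
        this.hash = h;
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Node)) return false;
        Node n = (Node) o;
        // Comparing the cached hashes first rejects most unequal nodes cheaply.
        return hash == n.hash && key == n.key
                && eq(left, n.left) && eq(right, n.right);
    }

    private static boolean eq(Node a, Node b) {
        return a == null ? b == null : a.equals(b);
    }
}

// Hypothetical "unique table": maps each node to its canonical instance.
class UniqueTable {
    private final Map<Node, Node> table = new HashMap<Node, Node>();

    // Returns the canonical instance for a structurally equal node.
    Node intern(Node n) {
        Node existing = table.get(n);
        if (existing != null) return existing;
        table.put(n, n);
        return n;
    }
}
```

Interning two structurally equal trees through `UniqueTable.intern` hands back the same instance both times, which is the sharing the question is after.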

Java's built-in IdentityHashMap does what you're describing.
That said, Jon Skeet's answer sounds more like the right way to go.
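For illustration, a small sketch of how IdentityHashMap behaves with a class that overrides hashCode and equals. The `Key` class here is hypothetical; the map compares keys with == and uses System.identityHashCode, so the overridden methods are not consulted.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class IdentityCacheDemo {
    // Hypothetical key type whose equals/hashCode are structural.
    static final class Key {
        final int value;
        Key(int value) { this.value = value; }
        @Override public int hashCode() { return value; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).value == value;
        }
    }

    public static void main(String[] args) {
        Map<Key, Integer> cache = new IdentityHashMap<Key, Integer>();
        Key a = new Key(1);
        Key b = new Key(1); // equal to a, but a different object
        cache.put(a, 42);
        System.out.println(cache.containsKey(a)); // true
        System.out.println(cache.containsKey(b)); // false
    }
}
```

Note that such a map keeps its keys strongly reachable, which is one reason the field-based cache is usually the simpler choice.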

Storing the hash in a field is effectively "caching" the value so that it does not have to be recomputed every time.
It's not necessarily a bad practice, but you have to make sure that you are clearing/recomputing it correctly whenever there is a change, which can be daunting if you have to notify of a change up or down a complex graph or tree.
If you want to use a hash code computed by the JVM (often loosely described as being based on the object's "RAM address", although its value is implementation-specific), you can use System.identityHashCode(x), which returns the same value that the default Object.hashCode would.

What I really want is to to map nodes to their hash codes using a hash which uses the node's address.
What do you mean by the node's address? There is no such concept in Java, and there is no unique identifier for objects that I know of, like the physical address in non VM based languages e.g. C++. References in Java are not memory addresses, and objects may be relocated in memory anytime by the GC.
I thought I could do this by making HashMap, and casting the nodes to object, which would then invoke the hashCode method on objects, but this didn't work
Indeed: hashCode is virtual and is overridden in your node class, so the subclass implementation will always be called, regardless of the static type of the reference you have.
I am afraid any attempt to use a map to cache hash values bumps into the same chicken and egg problem, that - as you mention - the map needs the hash value itself first.
I don't see any better way than caching the hash values within the nodes as you did. I originally added that the cached values must be invalidated whenever the child nodes change, but that was wrong: as Jon's answer points out, changing the hash code of an object after it is stored in a map breaks the map's internal integrity, so it must not happen.

Related

Comparing two large lists in java

I have two ArrayLists with 1000 objects in each of them. I need to remove all elements in list 1 that are also in list 2. Currently I am running two nested loops, which results in 1000 x 1000 operations in the worst case.
List<DataClass> dbRows = object1.get("dbData");
List<DataClass> modifiedData = object1.get("dbData");
List<DataClass> dbRowsForLog = object2.get("dbData");
for (DataClass newDbRows : dbRows) {
boolean found=false;
for (DataClass oldDbRows : dbRowsForLog) {
if (newDbRows.equals(oldDbRows)) {
found=true;
modifiedData.remove(oldDbRows);
break;
}
}
}
public class DataClass{
private int categoryPosition;
private int subCategoryPosition;
private Timestamp lastUpdateTime;
private String lastModifiedUser;
// + so many other variables
public boolean equals(Object o) {
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass()) {
return false;
}
DataClass dataClassRow = (DataClass) o;
return categoryPosition == dataClassRow.categoryPosition
&& subCategoryPosition == dataClassRow.subCategoryPosition && (lastUpdateTime.compareTo(dataClassRow.lastUpdateTime)==0?true:false)
&& stringComparator(lastModifiedUser,dataClassRow.lastModifiedUser);
}
public String toString(){
return "DataClass[categoryPosition="+categoryPosition+",subCategoryPosition="+subCategoryPosition
+",lastUpdateTime="+lastUpdateTime+",lastModifiedUser="+lastModifiedUser+"]";
}
public static boolean stringComparator(String str1, String str2){
return (str1 == null ? str2 == null : str1.equals(str2));
}
public int hashCode() {
int hash = 7;
hash = 31 * hash + (int) categoryPosition;
hash = 31 * hash + (int) subCategoryPosition;
hash = 31 * hash + (lastModifiedUser == null ? 0 : lastModifiedUser.hashCode());
return hash;
}
}
The best workaround I could think of is to create two sets of strings by calling the toString() method of DataClass and compare the strings. That would take 1000 (for making set 1) + 1000 (for making set 2) + 1000 (searching in the set) = 3000 operations. I am stuck on Java 7. Is there any better way to do this? Thanks.
Let Java's built-in collection classes handle most of the optimization for you by taking advantage of a HashSet. Its contains method runs in expected O(1) time, provided hashCode spreads the elements well. I would highly recommend looking up how it achieves this because it's very interesting.
List<DataClass> a = object1.get("dbData");
HashSet<DataClass> b = new HashSet<>(object2.get("dbData"));
a.removeAll(b);
return a;
And it's all done for you.
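The same idea as a self-contained sketch that compiles on Java 7. The `ListDifference` class and `difference` method are illustrative names; the input list is copied first so the caller's list is left untouched, which the snippet above does not do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ListDifference {
    // Returns the elements of 'first' that do not occur in 'second', keeping order.
    static <T> List<T> difference(List<T> first, List<T> second) {
        Set<T> toRemove = new HashSet<T>(second); // one pass to build the set
        List<T> result = new ArrayList<T>(first); // copy so the input is untouched
        result.removeAll(toRemove);               // one contains() lookup per element
        return result;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("x", "y", "z", "y");
        List<String> b = Arrays.asList("y", "q");
        System.out.println(difference(a, b)); // prints [x, z]
    }
}
```

This relies on the element type having consistent equals and hashCode implementations, as String does here.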
EDIT: caveat
In order for this to work, DataClass needs to override Object::hashCode consistently with equals. Otherwise, you can't use any of the hash-based collection algorithms.
EDIT 2: implementing hashCode
An object's hash code does not need to change every time an instance variable changes. The hash code only needs to reflect the instance variables that determine equality.
For example, imagine each object had a unique field private final UUID id. In this case, you could determine if two objects were the same by simply testing the id value. Fields like lastUpdateTime and lastModifiedUser would provide information about the object, but two instances with the same id would refer to the same object, even if the lastUpdateTime and lastModifiedUser of each were different.
The point is that if you really want to optimize this, include as few fields as possible in the hash computation. From your example, it seems like categoryPosition and subCategoryPosition might be enough.
Whatever fields you choose to include, the simplest way to compute a hash code from them is to use Objects::hash rather than running the numbers yourself.
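A sketch of what that could look like, using a hypothetical cut-down `Row` class in place of DataClass. The hash uses only a subset of the fields that equals compares, which is still consistent: equal objects always produce equal hashes. `Objects.hash` and `Objects.equals` are available from Java 7.

```java
import java.util.Objects;

// Hypothetical cut-down version of the asker's class.
final class Row {
    private final int categoryPosition;
    private final int subCategoryPosition;
    private final String lastModifiedUser;

    Row(int categoryPosition, int subCategoryPosition, String lastModifiedUser) {
        this.categoryPosition = categoryPosition;
        this.subCategoryPosition = subCategoryPosition;
        this.lastModifiedUser = lastModifiedUser;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Row r = (Row) o;
        return categoryPosition == r.categoryPosition
                && subCategoryPosition == r.subCategoryPosition
                && Objects.equals(lastModifiedUser, r.lastModifiedUser); // null-safe
    }

    @Override
    public int hashCode() {
        // Only a subset of the fields used in equals; that is allowed.
        return Objects.hash(categoryPosition, subCategoryPosition);
    }
}
```

The trade-off is that rows differing only in lastModifiedUser collide in the same bucket, so this works best when the two position fields already distinguish most rows.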
This is a set difference operation, A - B (retain only the elements of set A that are not in set B).
If using a Set is fine, then we can do it as below. We could use an ArrayList in place of the Set as well, but then each remove/retain check has to scan the entire other list.
Set<DataClass> a = new HashSet<>(object1.get("dbData"));
Set<DataClass> b = new HashSet<>(object2.get("dbData"));
a.removeAll(b);
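If ordering is needed, use a TreeSet (which requires DataClass to be Comparable, or a Comparator to be supplied).
If possible, have object1.get("dbData") and object2.get("dbData") return sets directly; that skips the creation of one more intermediate collection.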

Equivalent subtree

I have two trees. The tree Node is defined as
class Node{
String treeId;
String type; //Each node has type which has fixed value. For example, its color: RED, BLANK, GREEN
Set<Node> children;
String ref; //The ref is a string and allowed value are "0", "1",..."10". The value is null if it is not leaf.
};
For leaf, the children set is empty.
I am wondering whether there is existing efficient work on how to identify equivalent subtrees of two given trees. Equivalence is defined as:
1) The leaves of both subtrees are leaves of the original tree.
2) The leaves of both subtrees have the same ref value.
3) For non-leaf nodes, equivalence means both nodes have the same type and equivalent children.
Thanks. It would be better if there were a Java library addressing this problem.
The input should be two tree roots, while the output is the Node that is the root of the equivalent subtree. The tree's height is about 100 and it has more than 500 nodes.
What I did for now is add a new field to class Node.
class Cache{
Map<String, Set<String>> map = new LinkedHashMap<String, Set<String>>();
}
The key of the map is a Node id, while the value is the set of refs that the node with this id can reach. The Cache is initialized when the Node is initialized.
During the isEquivalent comparison phase, check whether the ref sets of the two roots overlap. Return false if they do not.
I think this can help reduce the number of comparisons.
I am not sure about the requirement "1) Both subtree leaves are leaves of original tree", as it seems to conflict with "how to identify equivalent subtree for two given tree". Otherwise, the following recursive method should cover the other two conditions. The haveSameOriginalTree(r1, r2) method may be implemented to satisfy the first condition, which I couldn't understand. r1 and r2 are the roots of the two subtrees that need to be checked for equivalence.
boolean areEquivalent(Node r1, Node r2)
{
    // per the question, a leaf has an empty children set
    if(r1.children.isEmpty() && r2.children.isEmpty())
    {
        // compare Strings with equals, not ==
        return haveSameOriginalTree(r1, r2) && r1.ref.equals(r2.ref);
    }
    if(r1.children.isEmpty() || r2.children.isEmpty()) // exactly one is a leaf
    {
        return false;
    }
    // if here then both must be non-leaf nodes
    if(!r1.type.equals(r2.type))
    {
        return false;
    }
    if(r1.children.size() != r2.children.size())
    {
        return false;
    }
    // a Set has no index access, so walk both sets with iterators;
    // this assumes both sets iterate in a consistent order (e.g. a sorted set)
    Iterator<Node> it1 = r1.children.iterator();
    Iterator<Node> it2 = r2.children.iterator();
    while(it1.hasNext())
    {
        if(!areEquivalent(it1.next(), it2.next()))
        {
            return false;
        }
    }
    return true;
}
Let me know what you think.
Update
Here is an iterative version of the above solution. It uses a stack data structure that is allocated on the heap rather than on the function call stack, so it is not hugely different from the recursive version but still better. Also, since we only hold references to Nodes (rather than copying the whole object), this shouldn't add much memory overhead if we are already loading the original tree into memory.
boolean areEquivalent(Node r1, Node r2)
{
    Stack<Node> s1 = new Stack<Node>();
    Stack<Node> s2 = new Stack<Node>();
    Node n1, n2;
    s1.push(r1);
    s2.push(r2);
    while(true)
    {
        if(s1.size() != s2.size())
        {
            return false;
        }
        if(s1.isEmpty()) // if both stacks are empty then we've traversed both trees without failure.
        {
            return true;
        }
        n1 = s1.pop();
        n2 = s2.pop();
        if(!areEquivalentNodes(n1, n2))
        {
            return false;
        }
        for(Node child : n1.children)
        {
            s1.push(child);
        }
        for(Node child : n2.children)
        {
            s2.push(child);
        }
    }
}
// only checks the two nodes are equivalent. their childrens' equivalence will be handled by other calls to this method.
boolean areEquivalentNodes(Node n1, Node n2)
{
    if(n1.children.size() != n2.children.size())
    {
        return false;
    }
    if(n1.children.isEmpty()) // if both are leaf nodes...
    {
        if(!n1.ref.equals(n2.ref)) // compare Strings with equals, not !=
        {
            return false;
        }
    }
    else // both are non-leaf
    {
        if(!n1.type.equals(n2.type))
        {
            return false;
        }
        // the condition that children of non-leaf nodes be equivalent will be covered by subsequent calls to this method...
    }
    return true;
}
Please note that both solutions expect children of two equivalent nodes in the same order. If children are not ordered then we will need to sort them before calling above code.
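One way to remove the dependence on child order, sketched here as a different technique from the code above: build a canonical string signature per subtree and sort the child signatures before joining them. `TreeNode` and `Signatures` are hypothetical names, the "same original tree" condition is ignored, and the sketch assumes that type and ref values contain no brackets or commas.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the question's Node class.
class TreeNode {
    String type;
    String ref; // non-null only for leaves
    Set<TreeNode> children = new HashSet<TreeNode>();

    TreeNode(String type, String ref) {
        this.type = type;
        this.ref = ref;
    }
}

class Signatures {
    // Canonical string for the subtree rooted at n; child order does not matter.
    static String signature(TreeNode n) {
        if (n.children.isEmpty()) {
            return "L(" + n.ref + ")"; // leaves are compared by ref only
        }
        List<String> parts = new ArrayList<String>();
        for (TreeNode child : n.children) {
            parts.add(signature(child));
        }
        Collections.sort(parts); // sorting removes dependence on set iteration order
        return "N(" + n.type + parts + ")";
    }

    static boolean areEquivalent(TreeNode a, TreeNode b) {
        return signature(a).equals(signature(b));
    }
}
```

Signatures could also be cached per node or hashed to find all equivalent subtree pairs across two trees, at the cost of extra memory for the strings.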
Let me know if this is better.

Java - map.get don't work, but element is in map?

So... all is in code:
// get vector...
SignVector v = ...;
//print to console: [1058, 5, 820 in flat]
System.out.println(v);
//size: 1
System.out.println("size: " + signs.size());
//check all elements...
for (Entry<SignVector, FakeSign> entry : signs.entrySet())
{
// get key
SignVector key = entry.getKey();
//print to console: [1058, 5, 820 in flat] (YaY! it's that key! like v)
System.out.println(key);
if (key.equals(v))
{
// print: "YaY: "
System.out.println("YaY: [1058, 5, 820 in flat]"+key);
}
}
//So second test... just get it from map: null
System.out.println(signs.get(v));
Why does that return null?
The Javadoc says that map.get uses key.equals(k), so why does my loop find the right object while map.get returns null?
Map:
private final Map<SignVector, FakeSign> signs = new HashMap<>();
Equals method from SignVector, for @home user
@Override
public boolean equals(Object obj)
{
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
SignVector other = (SignVector) obj;
// w can't be null so I skip that
System.out.print(w.getName() + ", " + other.w.getName() + ", " + (w.getName().equals(other.w.getName()))); // this same
if (!w.getName().equals(other.w.getName()))
return false;
if (x != other.x)
return false;
if (y != other.y)
return false;
if (z != other.z)
return false;
return true;
}
But this method works well and always returns what I want; x, y and z are ints, and w is a custom object.
The Javadoc is a bit misleading, but it relies on the fact that if you implement equals, you should also implement hashCode consistently. As the doc states:
Many methods in Collections Framework interfaces are defined in terms
of the equals method. For example, the specification for the
containsKey(Object key) method says: "returns true if and only if this
map contains a mapping for a key k such that (key==null ? k==null :
key.equals(k))."
This specification should not be construed to imply
that invoking Map.containsKey with a non-null argument key will cause
key.equals(k) to be invoked for any key k.
Implementations are free to
implement optimizations whereby the equals invocation is avoided, for
example, by first comparing the hash codes of the two keys. (The
Object.hashCode() specification guarantees that two objects with
unequal hash codes cannot be equal.)
More generally, implementations
of the various Collections Framework interfaces are free to take
advantage of the specified behavior of underlying Object methods
wherever the implementor deems it appropriate.
Let's take a look at the underlying implementation of get for a HashMap.
public V get(Object key) {
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}
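You can see that it uses the hash code of the key to find the candidate entries in the table and only THEN uses equals to determine which value to return. If SignVector does not override hashCode, two equal keys will generally have different hash codes, so the lookup most likely lands on an empty bucket (or finds no entry with a matching hash), the equals check is never reached, and get returns null.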
Override hashCode in your SignVector class to be consistent with equals and everything should work fine.
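A sketch of what a consistent pair could look like. This `SignVector` is a hypothetical stand-in for the asker's class, with the world object reduced to its name; the hash is built from exactly the fields that equals compares.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the asker's class; the world is reduced to its name.
final class SignVector {
    private final String worldName;
    private final int x, y, z;

    SignVector(String worldName, int x, int y, int z) {
        this.worldName = worldName;
        this.x = x;
        this.y = y;
        this.z = z;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        SignVector other = (SignVector) obj;
        return worldName.equals(other.worldName)
                && x == other.x && y == other.y && z == other.z;
    }

    @Override
    public int hashCode() {
        // Built from the same fields as equals, so equal vectors share a hash.
        return Objects.hash(worldName, x, y, z);
    }
}

class SignLookupDemo {
    public static void main(String[] args) {
        Map<SignVector, String> signs = new HashMap<SignVector, String>();
        signs.put(new SignVector("flat", 1058, 5, 820), "sign");
        // A different but equal key now finds the entry.
        System.out.println(signs.get(new SignVector("flat", 1058, 5, 820))); // prints sign
    }
}
```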
From the javadocs:
If this map permits null values, then a return value of null does not necessarily indicate that the map contains no mapping for the key; it's also possible that the map explicitly maps the key to null. The containsKey operation may be used to distinguish these two cases.
Unless you share with us how you built the map, we can't help you if this is the case. The code you shared should otherwise be working just fine.
http://docs.oracle.com/javase/7/docs/api/java/util/Map.html#get%28java.lang.Object%29

Tree Implementation with Generic Types in Java

First, I've searched about the usage of generic types in Java; however, the answers I've found were either way too simple or too complicated. So here is my exact question.
I have three classes respectively PerfectTreeControl, Tree and Entry.
Tree has
public class Tree<K> {
public Entry <K> root;
Entry has
public class Entry<K> {
public K element;
public Entry<K> parent, left_child, right_child;
public Entry(K element) {
this.element = element;
}
public Entry(K element, Entry<K> left, Entry<K> right) {
left_child = left;
right_child = right;
this.element = element;
}
I am trying to understand the difference between Entry parent and Entry <K> parent. I know that K element can be used as an Integer, a String or whatever I want, but does the same thing go for the object? I tried to use Entry variables without a type parameter; the compiler only warned that Entry is a raw type and should be parameterized, and it still works without error.
My second question is about checking whether a tree is perfect or not. Here is some of the code I've tried so far:
public class PerfectTreeControl {
public static boolean isPerfect(Tree<String> tree) {
Tree t1 = new Tree();
if( t1.isFull( tree.root ) ) {
int depth = t1.height(tree.root);
return t1.everyLeafHasSameDepth(tree.root, depth);
}
else
return false;
}
}
public class Tree<K> {
public Entry <K> root;
public boolean isLeaf(Entry e) {
return e.left_child == null &&
e.right_child == null;
}
public int height(Entry e) {
if( e == null ||
e.left_child == null &&
e.right_child == null )
return 0;
int left = height( e.left_child );
int right = height( e.right_child );
return 1 + Math.max(left, right);
}
public boolean isFull(Entry base) {
if( isLeaf(base) )
return true;
else
if( base.left_child != null && base.right_child != null ) {
return isFull(base.left_child) &&
isFull(base.right_child);
} else {
return false;
}
}
public int depth(Entry e) {
if( e == root ) {
return 0;
} else {
return 1 + depth(e.parent);
}
}
public boolean everyLeafHasSameDepth(Entry base, int depth) {
if( base == null )
return false;
else if(isLeaf(base) )
return depth( base ) == depth;
else {
return
everyLeafHasSameDepth(base.left_child, depth) &&
everyLeafHasSameDepth(base.right_child, depth);
}
}
Entry class (I wrote it at the top of the page). As you can see, the isPerfect method in the PerfectTreeControl class takes a Tree<String> tree as a parameter, and I have no idea what it is. In the Tree class, I tried Entry with and without a type parameter, and again there was no difference. The code won't work properly, and I am totally confused.
Generics in Java are, fundamentally, a way to name a particular class within an object without knowing which class it is until that object is declared. This is useful because it allows the compiler to enforce consistency among references to that class.
More concretely, in your class Entry<K>, any time you reference K, the Java compiler will enforce that all references of type K are, in fact, treated as type K. For instance, if you create an object of type Entry<String>, the element member of that object must be of type String, the parent member must be of type Entry<String>, etc. If you had a method that returned a K, the compiler would recognize that the return value is String. If the compiler sees an inconsistency here - say, if you try to set member's value to an Integer - it will complain.
Keep in mind that the qualities I describe in the example above all apply to the particular Entry<String> object that you've defined. If you instead define an Entry<Integer>, without changing your Entry class, the consistency is enforced within that new object - except this time with K meaning Integer.
If you create an object without specifying a type argument for K, you are using a "raw type". This prevents the compiler from enforcing consistency rules and it will assume that the type of K is Object. This means you'll have to start worrying about casting, which can be tedious to do properly.
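A small sketch contrasting the two, using a cut-down `Entry` class modelled on the one in the question; `RawTypeDemo` is an illustrative name. The raw-type lines compile only with warnings, which are suppressed here on purpose.

```java
// Cut-down version of the question's Entry class.
class Entry<K> {
    K element;
    Entry<K> parent;

    Entry(K element) {
        this.element = element;
    }
}

class RawTypeDemo {
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String demo() {
        Entry<String> typed = new Entry<String>("a");
        String s = typed.element;      // no cast needed: the compiler knows K is String
        // typed.element = 42;         // would not compile: Integer is not a String

        Entry raw = new Entry("b");    // raw type: compiles, but with a warning
        raw.element = 42;              // accepted, because K is treated as Object
        Object o = raw.element;        // only Object is known here; a cast would be needed
        return s + " " + o;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints a 42
    }
}
```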
To check if a tree is perfect (which is stronger than being full, where every node merely has zero or two children), the most intuitive approach is a recursive one. The recursive rule to use in this scenario is "if a tree's children are perfect and have the same depth, the tree is perfect."
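That rule can be sketched as a single recursive pass that returns the height of a perfect subtree, or -1 as soon as the rule is violated. `BinaryNode` and `PerfectCheck` are hypothetical names, not the asker's Tree and Entry classes, and no parent pointers are needed.

```java
// Hypothetical minimal node type for the sketch.
class BinaryNode<K> {
    K element;
    BinaryNode<K> left, right;

    BinaryNode(K element, BinaryNode<K> left, BinaryNode<K> right) {
        this.element = element;
        this.left = left;
        this.right = right;
    }
}

class PerfectCheck {
    // Returns the height (in nodes) of the subtree if it is perfect, or -1 if not.
    // An empty subtree counts as perfect with height 0.
    static <K> int perfectHeight(BinaryNode<K> node) {
        if (node == null) return 0;
        int l = perfectHeight(node.left);
        int r = perfectHeight(node.right);
        // Both children must be perfect and have the same height.
        if (l < 0 || r < 0 || l != r) return -1;
        return l + 1;
    }

    static <K> boolean isPerfect(BinaryNode<K> root) {
        return perfectHeight(root) >= 0;
    }
}
```

A node with exactly one child fails the check because its two subtree heights differ (1 versus 0), so fullness does not need to be tested separately.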

Strange Java HashMap behavior - can't find matching object

I've been encountering some strange behavior when trying to find a key inside a java.util.HashMap, and I guess I'm missing something. The code segment is basically:
HashMap<Key, Value> data = ...
Key k1 = ...
Value v = data.get(k1);
boolean bool1 = data.containsKey(k1);
for (Key k2 : data.keySet()) {
boolean bool2 = k1.equals(k2);
boolean bool3 = k2.equals(k1);
boolean bool4 = k1.hashCode() == k2.hashCode();
break;
}
That strange for loop is there because for a specific execution I happen to know that data contains only one item at this point and it is k1, and indeed bool2, bool3 and bool4 will be evaluated to true in that execution. bool1, however, will be evaluated to false, and v will be null.
Now, this is part of a bigger program - I could not reproduce the error on a smaller sample - but still it seems to me that no matter what the rest of the program does, this behavior should never happen.
EDIT: I have manually verified that the hash code does not change between the time the object was inserted into the map and the time it was queried. I'll keep checking this avenue, but is there any other option?
This behavior could happen if the hash code of the key were changed after it was inserted into the map.
Here's an example with the behavior you described:
import java.util.HashMap;
public class Key
{
int hashCode = 0;
@Override
public int hashCode() {
return hashCode;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Key other = (Key) obj;
return hashCode == other.hashCode;
}
public static void main(String[] args) throws Exception {
HashMap<Key, Integer> data = new HashMap<Key, Integer>();
Key k1 = new Key();
data.put(k1, 1);
k1.hashCode = 1;
boolean bool1 = data.containsKey(k1);
for (Key k2 : data.keySet()) {
boolean bool2 = k1.equals(k2);
boolean bool3 = k2.equals(k1);
boolean bool4 = k1.hashCode() == k2.hashCode();
System.out.println("bool1: " + bool1);
System.out.println("bool2: " + bool2);
System.out.println("bool3: " + bool3);
System.out.println("bool4: " + bool4);
break;
}
}
}
From the API description of the Map interface:
Note: great care must be exercised if
mutable objects are used as map keys.
The behavior of a map is not specified
if the value of an object is changed
in a manner that affects equals
comparisons while the object is a key
in the map. A special case of this
prohibition is that it is not
permissible for a map to contain
itself as a key. While it is
permissible for a map to contain
itself as a value, extreme caution is
advised: the equals and hashCode
methods are no longer well defined on
such a map.
Also, there are very specific requirements on the behavior of equals() and hashCode() for types used as Map keys. Failure to follow the rules here will result in all sorts of undefined behavior.
If you're certain the hash code does not change between the time the key is inserted and the time you do the contains check, then there is something seriously wrong somewhere. Are you sure you're using a java.util.HashMap and not a subclass of some sort? Do you know what implementation of the JVM you are using?
Here's the source code for java.util.HashMap.getEntry(Object key) from Sun's 1.6.0_20 JVM:
final Entry<K,V> getEntry(Object key) {
int hash = (key == null) ? 0 : hash(key.hashCode());
for (Entry<K,V> e = table[indexFor(hash, table.length)];
e != null;
e = e.next) {
Object k;
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
}
return null;
}
As you can see, it retrieves the hashCode, goes to the corresponding slot in the table, then does an equals check on each element in that slot. If this is the code you're running and the hash code of the key has not changed, then it must be doing an equals check which must be failing.
The next step would be for you to give us some more code or context - the hashCode and equals methods of your Key class at a minimum.
Alternatively, I would recommend hooking up to a debugger if you can. Watch what bucket your key is hashed to, and step through the containsKey check to see where it's failing.
Is this application multi-threaded? If so, another thread could change the data between the data.containsKey(k1) call and the data.keySet() call.
If equals() returns true for two objects, then hashCode() must return the same value for both. If equals() returns false, hashCode() is not required to return different values, but doing so wherever possible improves hash table performance.
For Reference:
http://www.ibm.com/developerworks/java/library/j-jtp05273.html
Perhaps the Key class looks like
class Key
{
boolean equals = false ;
public boolean equals ( Object oth )
{
try
{
return ( equals ) ;
}
finally
{
equals = true ;
}
}
}
