Compound String key in HashMap - java

We are storing a String key in a HashMap that is a concatenation of three String fields and a boolean field. Problem is duplicate keys can be created if the delimiter appears in the field value.
So to get around this, based on advice in another post, I'm planning on creating a key class which will be used as the HashMap key:
class TheKey {
public final String k1;
public final String k2;
public final String k3;
public final boolean k4;
public TheKey(String k1, String k2, String k3, boolean k4) {
this.k1 = k1; this.k2 = k2; this.k3 = k3; this.k4 = k4;
}
public boolean equals(Object o) {
TheKey other = (TheKey) o;
//return true if all four fields are equal
}
public int hashCode() {
return ???;
}
}
My questions are:
What value should be returned from hashCode(). The map will hold a total of about 30 values. Of those 30, there are about 10 distinct values of k1 (some entries share the same k1 value).
To store this key class as the HashMap key, does one only need to override the equals() and hashCode() methods? Is anything else required?

Just hashCode and equals should be fine. The hashCode could look something like this:
public int hashCode() {
int hash = 17;
hash = hash * 31 + k1.hashCode();
hash = hash * 31 + k2.hashCode();
hash = hash * 31 + k3.hashCode();
hash = hash * 31 + (k4 ? 0 : 1);
return hash;
}
That's assuming none of the keys can be null, of course. Typically you could use 0 as the "logical" hash code for a null reference in the above equation. Two useful methods for compound equality/hash code which needs to deal with nulls:
public static boolean equals(Object o1, Object o2) {
if (o1 == o2) {
return true;
}
if (o1 == null || o2 == null) {
return false;
}
return o1.equals(o2);
}
public static int hashCode(Object o) {
return o == null ? 0 : o.hashCode();
}
Using the latter method in the hash algorithm at the start of this answer, you'd end up with something like:
public int hashCode() {
int hash = 17;
hash = hash * 31 + ObjectUtil.hashCode(k1);
hash = hash * 31 + ObjectUtil.hashCode(k2);
hash = hash * 31 + ObjectUtil.hashCode(k3);
hash = hash * 31 + (k4 ? 0 : 1);
return hash;
}
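On Java 7 and later, the standard java.util.Objects class provides null-safe equivalents of the two helpers above, so a hand-written ObjectUtil may not be needed. A minimal sketch of the key class under that assumption (the field names follow the question; the class name CompoundKey is only illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative key class; the name CompoundKey is an assumption, not from the question.
final class CompoundKey {
    final String k1;
    final String k2;
    final String k3;
    final boolean k4;

    CompoundKey(String k1, String k2, String k3, boolean k4) {
        this.k1 = k1;
        this.k2 = k2;
        this.k3 = k3;
        this.k4 = k4;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CompoundKey)) return false; // also rejects null
        CompoundKey other = (CompoundKey) o;
        return k4 == other.k4
                && Objects.equals(k1, other.k1)
                && Objects.equals(k2, other.k2)
                && Objects.equals(k3, other.k3);
    }

    @Override
    public int hashCode() {
        // Objects.hash treats null elements as 0 and boxes the boolean.
        return Objects.hash(k1, k2, k3, k4);
    }
}

class CompoundKeyDemo {
    public static void main(String[] args) {
        Map<CompoundKey, String> map = new HashMap<>();
        map.put(new CompoundKey("a", null, "c", true), "value");
        // A distinct but equal key instance finds the entry.
        System.out.println(map.get(new CompoundKey("a", null, "c", true)));
        // Changing only the boolean yields a different key.
        System.out.println(map.containsKey(new CompoundKey("a", null, "c", false)));
    }
}
```

Whether Objects.hash or the explicit multiply-by-31 loop is preferable is mostly a matter of taste for a map of about 30 entries.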

In Eclipse you can generate hashCode and equals by Alt-Shift-S h.

Ask Eclipse 3.5 to create the hashcode and equals methods for you :)

This is how a well-formed key class with equals and hashCode should look (generated with IntelliJ IDEA, with null checks enabled):
class TheKey {
public final String k1;
public final String k2;
public final String k3;
public final boolean k4;
public TheKey(String k1, String k2, String k3, boolean k4) {
this.k1 = k1;
this.k2 = k2;
this.k3 = k3;
this.k4 = k4;
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
TheKey theKey = (TheKey) o;
if (k4 != theKey.k4) return false;
if (k1 != null ? !k1.equals(theKey.k1) : theKey.k1 != null) return false;
if (k2 != null ? !k2.equals(theKey.k2) : theKey.k2 != null) return false;
if (k3 != null ? !k3.equals(theKey.k3) : theKey.k3 != null) return false;
return true;
}
@Override
public int hashCode() {
int result = k1 != null ? k1.hashCode() : 0;
result = 31 * result + (k2 != null ? k2.hashCode() : 0);
result = 31 * result + (k3 != null ? k3.hashCode() : 0);
result = 31 * result + (k4 ? 1 : 0);
return result;
}
}

The implementation of your hashCode() doesn't matter much unless you make it super stupid. You could very well just return the sum of all the strings' hash codes (truncated to an int), but you should keep the following in mind:
If your hash code implementation is slow, consider caching it in the instance. Depending on how long your key objects stick around and how they are used with the hash table, you may not want to spend longer than necessary calculating the same value over and over again. If you stick with Jon's implementation of hashCode() there is probably no need for it, as String already caches its hash code for you.
This is more of a general piece of advice, however: since the mid-90s I've seen quite a few developers get stung by slow (and, even worse, changing) hashCode() implementations.
Don't be sloppy when you create the equals() implementation. Your equals() above will be both inefficient and flawed. First of all, you don't need to compare the values if the objects have different hash codes. You should also return false (rather than throw a NullPointerException) if you get null as the argument.
The rules are simple, this page will walk you through them.
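The caching idea mentioned above could look roughly like this for an immutable key. This is only a sketch: the class name CachedKey is hypothetical, and precomputing the value in the constructor is just one option.

```java
import java.util.Objects;

// Hypothetical immutable key that computes its hash code once.
final class CachedKey {
    private final String k1;
    private final String k2;
    private final int hash; // safe to cache because the fields never change

    CachedKey(String k1, String k2) {
        this.k1 = k1;
        this.k2 = k2;
        this.hash = Objects.hash(k1, k2);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CachedKey)) return false; // false for null, no exception
        CachedKey other = (CachedKey) o;
        return Objects.equals(k1, other.k1) && Objects.equals(k2, other.k2);
    }

    @Override
    public int hashCode() {
        return hash; // no recomputation on each lookup
    }
}
```

Caching is only safe here because every field that feeds the hash is final; with mutable fields the cached value could go stale.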
Edit:
I have to ask one more thing... You say "Problem is duplicate keys can be created if the delimiter appears in the field value". Why is that?
If the format is key+delimiter+key+delimiter+key it really doesn't matter if there are one or more delimiters in the keys unless you get really unlucky with a combination of two keys and in that case you probably should have selected another delimiter (there are quite a few to choose from in unicode).
Anyway, Jon is right in his comment below... Don't do caching "until you've proven it's a good thing". It is a good practice always.
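To make the delimiter concern from the question concrete: with plain concatenation, two different field combinations can produce the same key string when a field ends or starts with the delimiter. A small illustration, assuming '|' as the delimiter (the helper name concatKey is made up for this example):

```java
class DelimiterCollisionDemo {
    // Hypothetical helper mirroring the concatenated-key approach from the question.
    static String concatKey(String k1, String k2, String k3, boolean k4) {
        return k1 + "|" + k2 + "|" + k3 + "|" + k4;
    }

    public static void main(String[] args) {
        String first = concatKey("a|", "b", "c", true);  // "a||b|c|true"
        String second = concatKey("a", "|b", "c", true); // "a||b|c|true"
        // Different field values, same key string.
        System.out.println(first.equals(second)); // prints "true"
    }
}
```

Picking a delimiter that can never occur in the data avoids this, but that guarantee has to hold for every future value as well.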

Have you taken a look at the specifications of hashCode()? Perhaps this will give you a better idea of what the function should return.

I do not know if this is an option for you, but the Apache Commons Collections library provides a MultiKeyMap implementation.

For the hashCode, you could instead use something like
k1.hashCode() ^ k2.hashCode() ^ k3.hashCode() ^ Boolean.valueOf(k4).hashCode()
XOR is entropy-preserving, and this incorporates k4's hashCode in a much better way than the previous suggestions. Just having one bit of information from k4 means that if all your composite keys have identical k1, k2, k3 and only differing k4s, your hash codes will all be identical and you'll get a degenerate HashMap.
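One property of a plain XOR combination worth keeping in mind is that it is symmetric: swapping the values of two fields gives the same hash code, whereas the multiply-by-31 scheme from the earlier answers is order-sensitive. A small sketch comparing the two (both helper methods are illustrative, and k4 is hashed via Boolean.valueOf because a primitive boolean has no hashCode() method):

```java
class XorHashDemo {
    // XOR combination as suggested above.
    static int xorHash(String k1, String k2, String k3, boolean k4) {
        return k1.hashCode() ^ k2.hashCode() ^ k3.hashCode() ^ Boolean.valueOf(k4).hashCode();
    }

    // Multiply-and-add combination from the earlier answers.
    static int polyHash(String k1, String k2, String k3, boolean k4) {
        int hash = 17;
        hash = hash * 31 + k1.hashCode();
        hash = hash * 31 + k2.hashCode();
        hash = hash * 31 + k3.hashCode();
        hash = hash * 31 + (k4 ? 0 : 1);
        return hash;
    }

    public static void main(String[] args) {
        // Swapping k1 and k2 collides under XOR...
        System.out.println(xorHash("a", "b", "c", true) == xorHash("b", "a", "c", true));   // true
        // ...but not under the multiply-and-add scheme.
        System.out.println(polyHash("a", "b", "c", true) == polyHash("b", "a", "c", true)); // false
    }
}
```

For a map of about 30 entries either scheme is likely to be adequate; the difference matters more when fields often hold each other's values.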

I thought your main concern was speed (based on your original post)? Why don't you just make sure you use a separator which does not occur in your (handful of) field values? Then you can just create a String key using concatenation and do away with all this 'key-class' hocus pocus. Smells like serious over-engineering to me.

Related

Comparing two large lists in java

I have two ArrayLists with 1000 objects in each of them. I need to remove all elements from list 1 that are also in list 2. Currently I am running two nested loops, which results in 1000 x 1000 operations in the worst case.
List<DataClass> dbRows = object1.get("dbData");
List<DataClass> modifiedData = object1.get("dbData");
List<DataClass> dbRowsForLog = object2.get("dbData");
for (DataClass newDbRows : dbRows) {
boolean found=false;
for (DataClass oldDbRows : dbRowsForLog) {
if (newDbRows.equals(oldDbRows)) {
found=true;
modifiedData.remove(oldDbRows);
break;
}
}
}
public class DataClass{
private int categoryPosition;
private int subCategoryPosition;
private Timestamp lastUpdateTime;
private String lastModifiedUser;
// + so many other variables
public boolean equals(Object o) {
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass()) {
return false;
}
DataClass dataClassRow = (DataClass) o;
return categoryPosition == dataClassRow.categoryPosition
&& subCategoryPosition == dataClassRow.subCategoryPosition && (lastUpdateTime.compareTo(dataClassRow.lastUpdateTime)==0?true:false)
&& stringComparator(lastModifiedUser,dataClassRow.lastModifiedUser);
}
public String toString(){
return "DataClass[categoryPosition="+categoryPosition+",subCategoryPosition="+subCategoryPosition
+",lastUpdateTime="+lastUpdateTime+",lastModifiedUser="+lastModifiedUser+"]";
}
public static boolean stringComparator(String str1, String str2){
return (str1 == null ? str2 == null : str1.equals(str2));
}
public int hashCode() {
int hash = 7;
hash = 31 * hash + (int) categoryPosition;
hash = 31 * hash + (int) subCategoryPosition;
hash = 31 * hash + (lastModifiedUser == null ? 0 : lastModifiedUser.hashCode());
return hash;
}
}
The best workaround I could think of is to create two sets of strings by calling the toString() method of DataClass and compare the strings. That would result in 1000 (for building set 1) + 1000 (for building set 2) + 1000 (for searching in the set) = 3000 operations. I am stuck on Java 7. Is there any better way to do this? Thanks.
Let Java's built-in collection classes handle most of the optimization for you by taking advantage of a HashSet. Its contains method runs in O(1) time on average. I would highly recommend looking up how it achieves this because it's very interesting.
List<DataClass> a = object1.get("dbData");
HashSet<DataClass> b = new HashSet<>(object2.get("dbData"));
a.removeAll(b);
return a;
And it's all done for you.
EDIT: caveat
In order for this to work, DataClass needs to override Object::hashCode in a way that is consistent with equals. Otherwise, you can't use any of the hash-based collection algorithms.
EDIT 2: implementing hashCode
An object's hash code does not need to change every time an instance variable changes. The hash code only needs to reflect the instance variables that determine equality.
For example, imagine each object had a unique field private final UUID id. In this case, you could determine if two objects were the same by simply testing the id value. Fields like lastUpdateTime and lastModifiedUser would provide information about the object, but two instances with the same id would refer to the same object, even if the lastUpdateTime and lastModifiedUser of each were different.
The point is that if you really want to optimize this, include as few fields as possible in the hash computation. From your example, it seems like categoryPosition and subCategoryPosition might be enough.
Whatever fields you choose to include, the simplest way to compute a hash code from them is to use Objects::hash rather than running the numbers yourself.
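A sketch of how this might fit together on Java 7. The Row class below is a hypothetical stand-in for DataClass with only two fields; both equals and hashCode use the same fields so the two stay consistent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for DataClass, reduced to two fields.
final class Row {
    final int categoryPosition;
    final int subCategoryPosition;

    Row(int categoryPosition, int subCategoryPosition) {
        this.categoryPosition = categoryPosition;
        this.subCategoryPosition = subCategoryPosition;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Row)) return false;
        Row other = (Row) o;
        return categoryPosition == other.categoryPosition
                && subCategoryPosition == other.subCategoryPosition;
    }

    @Override
    public int hashCode() {
        return Objects.hash(categoryPosition, subCategoryPosition);
    }
}

class RemoveAllDemo {
    // Returns the elements of current that do not occur in old.
    static List<Row> difference(List<Row> current, List<Row> old) {
        List<Row> result = new ArrayList<>(current); // copy, so the input stays untouched
        Set<Row> toRemove = new HashSet<>(old);      // hash lookups instead of a nested loop
        result.removeAll(toRemove);
        return result;
    }

    public static void main(String[] args) {
        List<Row> current = Arrays.asList(new Row(1, 1), new Row(1, 2), new Row(2, 1));
        List<Row> old = Arrays.asList(new Row(1, 2));
        System.out.println(difference(current, old).size()); // prints 2
    }
}
```

Copying into a new ArrayList keeps the original list intact, which also avoids modifying a list while another loop is still iterating over it.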
This is a set difference A - B (retain only the elements of set A that are not in set B).
If using a Set is acceptable, we can do it as shown below. An ArrayList could be used in place of the Set as well, but then every remove/retain check has to scan the entire other list.
Set<DataClass> a = new HashSet<>(object1.get("dbData"));
Set<DataClass> b = new HashSet<>(object2.get("dbData"));
a.removeAll(b);
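If sorted order is needed, use a TreeSet (which requires DataClass to be Comparable, or a Comparator to be supplied); to keep insertion order, use a LinkedHashSet.
If possible, have object1.get("dbData") and object2.get("dbData") return sets directly, which avoids creating one more intermediate collection.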

Java integer pair in set [duplicate]

The following code is not giving me the result I'm expecting:
public static void main (String[] args) {
Set<Pair> objPair = new LinkedHashSet<Pair>();
objPair.add(new Pair(1, 0));
System.out.println("Does the pair (1, 0) exists already? "+objPair.contains(new Pair(1, 0)));
}
private static class Pair {
private int source;
private int target;
public Pair(int source, int target) {
this.source = source;
this.target = target;
}
}
The result will be:
Does the pair (1, 0) exists already? false
I can't understand why it's not working.
Or maybe I'm using the "contains" method wrong (or for the wrong reasons).
There is also another issue,
if I add the same value twice, it will be accepted, even being a set
objPair.add(new Pair(1, 0));
objPair.add(new Pair(1, 0));
It won't accept/recognize the class Pair I've created?
Thanks in Advance.
You need to override the hashCode and equals methods in your Pair class. LinkedHashSet (and other Java collections that use hash codes) will use them to store and find your Pair objects.
Without your own equals() implementation, Java considers two Pair objects equal only if they are the exact same object, and new, by definition, always creates a new object. In your case, you want Pair objects to be considered equal if they have the same values for source and target; to do this, you need to tell Java how it should test Pair objects for equality. To make hash-based collections work the way you expect, you also need to generate a hash code that is consistent with equals. Loosely speaking, that means equal objects must generate the same hash code, and unequal objects should ideally generate different hash codes.
Most IDEs will generate decent hashCode() and equals() methods for you. Mine generated this:
@Override
public int hashCode() {
int hash = 3;
hash = 47 * hash + this.source;
hash = 47 * hash + this.target;
return hash;
}
@Override
public boolean equals(Object obj) {
if (obj == null) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
final Pair other = (Pair) obj;
if (this.source != other.source) {
return false;
}
if (this.target != other.target) {
return false;
}
return true;
}
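With those two methods in place, the example from the question should behave as expected. A condensed, self-contained version for checking this (the class is named IntPair here only to keep the example separate from the question's Pair, and it uses Objects.hash instead of the IDE-generated arithmetic):

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

final class IntPair {
    private final int source;
    private final int target;

    IntPair(int source, int target) {
        this.source = source;
        this.target = target;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        IntPair other = (IntPair) obj;
        return source == other.source && target == other.target;
    }

    @Override
    public int hashCode() {
        return Objects.hash(source, target);
    }
}

class IntPairDemo {
    public static void main(String[] args) {
        Set<IntPair> pairs = new LinkedHashSet<>();
        pairs.add(new IntPair(1, 0));
        pairs.add(new IntPair(1, 0)); // ignored: an equal element is already present
        System.out.println(pairs.contains(new IntPair(1, 0))); // prints "true"
        System.out.println(pairs.size());                      // prints 1
    }
}
```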

Java - map.get don't work, but element is in map?

So... all is in code:
// get vector...
SignVector v = ...;
//print to console: [1058, 5, 820 in flat]
System.out.println(v);
//size: 1
System.out.println("size: " + signs.size());
//check all elements...
for (Entry<SignVector, FakeSign> entry : signs.entrySet())
{
// get key
SignVector key = entry.getKey();
//print to console: [1058, 5, 820 in flat] (YaY! it's that key! like v)
System.out.println(key);
if (key.equals(v))
{
// print: "YaY: "
System.out.println("YaY: [1058, 5, 820 in flat]"+key);
}
}
//So second test... just get it from map: null
System.out.println(signs.get(v));
Why does that return null?
The JavaDocs say that map.get uses key.equals(k), so why does my loop find the matching object while map.get returns null?
Map:
private final Map<SignVector, FakeSign> signs = new HashMap<>()
Equals method from SignVector, for @home user
@Override
public boolean equals(Object obj)
{
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
SignVector other = (SignVector) obj;
// w can't be null so I skip that
System.out.print(w.getName() + ", " + other.w.getName() + ", " + (w.getName().equals(other.w.getName()))); // this same
if (!w.getName().equals(other.w.getName()))
return false;
if (x != other.x)
return false;
if (y != other.y)
return false;
if (z != other.z)
return false;
return true;
}
But this method works fine and always returns what I expect; x, y and z are ints, and w is a custom object.
The javadoc is a bit misleading, but it's relying on the fact that if you implement equals, you should also implement hashcode to be consistent. As the doc states:
Many methods in Collections Framework interfaces are defined in terms
of the equals method. For example, the specification for the
containsKey(Object key) method says: "returns true if and only if this
map contains a mapping for a key k such that (key==null ? k==null :
key.equals(k))."
This specification should not be construed to imply
that invoking Map.containsKey with a non-null argument key will cause
key.equals(k) to be invoked for any key k.
Implementations are free to
implement optimizations whereby the equals invocation is avoided, for
example, by first comparing the hash codes of the two keys. (The
Object.hashCode() specification guarantees that two objects with
unequal hash codes cannot be equal.)
More generally, implementations
of the various Collections Framework interfaces are free to take
advantage of the specified behavior of underlying Object methods
wherever the implementor deems it appropriate.
Let's take a look at the underlying implementation of get for a HashMap.
public V get(Object key) {
if (key == null)
return getForNullKey();
int hash = hash(key.hashCode());
for (Entry<K,V> e = table[indexFor(hash, table.length)];
e != null;
e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
return e.value;
}
return null;
}
You see that it uses the hash code of the key to find the candidate entries in the table and only THEN uses equals to determine which value it has to return. Since the key presumably hashes to a different bucket whose entry is null, the for loop is skipped and get returns null.
Override hashCode in your SignVector class to be consistent with equals and everything should work fine.
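A hashCode consistent with the equals method shown in the question might look like the following sketch. The World class is a hypothetical placeholder for the custom type of w (only getName() is known from the question), and the field types are assumed from the description.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical placeholder for the custom type of w.
final class World {
    private final String name;
    World(String name) { this.name = name; }
    String getName() { return name; }
}

final class SignVector {
    private final World w;
    private final int x, y, z;

    SignVector(World w, int x, int y, int z) {
        this.w = w; this.x = x; this.y = y; this.z = z;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        SignVector other = (SignVector) obj;
        return w.getName().equals(other.w.getName())
                && x == other.x && y == other.y && z == other.z;
    }

    @Override
    public int hashCode() {
        // Uses exactly the values that equals compares.
        return Objects.hash(w.getName(), x, y, z);
    }
}

class SignVectorDemo {
    public static void main(String[] args) {
        Map<SignVector, String> signs = new HashMap<>();
        signs.put(new SignVector(new World("flat"), 1058, 5, 820), "sign");
        // An equal key built separately now finds the entry.
        System.out.println(signs.get(new SignVector(new World("flat"), 1058, 5, 820))); // prints "sign"
    }
}
```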
From the javadocs:
If this map permits null values, then a return value of null does not necessarily indicate that the map contains no mapping for the key; it's also possible that the map explicitly maps the key to null. The containsKey operation may be used to distinguish these two cases.
Unless you share with us how you built the map, we can't help you if this is the case. The code you shared should otherwise be working just fine.
http://docs.oracle.com/javase/7/docs/api/java/util/Map.html#get%28java.lang.Object%29
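The distinction described in the quoted javadoc can be seen in a few lines. The helper name describe is made up for this illustration:

```java
import java.util.HashMap;
import java.util.Map;

class NullValueDemo {
    // Distinguishes "mapped to null" from "no mapping at all".
    static String describe(Map<String, String> map, String key) {
        if (!map.containsKey(key)) {
            return "no mapping";
        }
        String value = map.get(key);
        return value == null ? "mapped to null" : value;
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("present", null); // key exists, value is null

        System.out.println(map.get("present"));       // null
        System.out.println(map.get("absent"));        // null as well
        System.out.println(describe(map, "present")); // mapped to null
        System.out.println(describe(map, "absent"));  // no mapping
    }
}
```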

Correct way to implement Map<MyObject,ArrayList<MyObject>>

I was asked this in interview. using Google Guava or MultiMap is not an option.
I have a class
public class Alpha
{
String company;
int local;
String title;
}
I have many instances of this class (in order of millions). I need to process them and at the end find the unique ones and their duplicates.
e.g.
instance1 --> instance1, instance5, instance7 (instance1 has instance5 and instance7 as duplicates)
instance2 --> instance2 (no duplicates for instance 2)
My code works fine
declare datastructure
HashMap<Alpha,ArrayList<Alpha>> hashmap = new HashMap<Alpha,ArrayList<Alpha>>();
Add instances
for (Alpha x : arr)
{
ArrayList<Alpha> list = hashmap.get(x); ///<<<<---- doubt about this. comment#1
if (list == null)
{
list = new ArrayList<Alpha>();
hashmap.put(x, list);
}
list.add(x);
}
Print instances and their duplicates.
for (Alpha x : hashmap.keySet())
{
ArrayList<Alpha> list = hashmap.get(x); //<<< doubt about this. comment#2
System.out.println(x + "<---->");
for(Alpha y : list)
{
System.out.print(y);
}
System.out.println();
}
Question: My code works, but why? when I do hashmap.get(x); (comment#1 in code). it is possible that two different instances might have same hashcode. In that case, I will add 2 different objects to the same List.
When I retrieve, I should get a List which has 2 different instances. (comment#2) and when I iterate over the list, I should see at least one instance which is not duplicate of the key but still exists in the list. I don't. Why?. I tried returning constant value from my hashCode function, it works fine.
If you want to see my implementation of equals and hashCode,let me know.
Bonus question: Any way to optimize it?
Edit:
@Override
public boolean equals(Object obj) {
if (obj==null || obj.getClass()!=this.getClass())
return false;
if (obj==this)
return true;
Alpha guest = (Alpha)obj;
return guest.getLocal()==this.getLocal()
&& guest.getCompany() == this.getCompany()
&& guest.getTitle() == this.getTitle();
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + (title==null?0:title.hashCode());
result = prime * result + local;
result = prime * result + (company==null?0:company.hashCode());
return result;
}
it is possible that two different instances might have same hashcode
Yes, but the hashCode method is only used to identify the bucket in which the element is stored. Two or more keys can have the same hash code, which is why they are also compared using equals.
From Map#containsKey javadoc:
Returns true if this map contains a mapping for the specified key. More formally, returns true if and only if this map contains a mapping for a key k such that (key==null ? k==null : key.equals(k)). (There can be at most one such mapping.)
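This can be checked with a deliberately bad hash function. In the sketch below (the class CollidingKey is made up for the demonstration) every key has the same hash code, yet unequal keys still end up as separate entries because equals decides the final match:

```java
import java.util.HashMap;
import java.util.Map;

final class CollidingKey {
    private final String name;

    CollidingKey(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        return o instanceof CollidingKey && name.equals(((CollidingKey) o).name);
    }

    @Override
    public int hashCode() {
        return 42; // every instance lands in the same bucket
    }
}

class CollisionDemo {
    static Map<CollidingKey, Integer> build() {
        Map<CollidingKey, Integer> map = new HashMap<>();
        map.put(new CollidingKey("a"), 1);
        map.put(new CollidingKey("b"), 2);
        map.put(new CollidingKey("a"), 3); // equal to the first key: replaces its value
        return map;
    }

    public static void main(String[] args) {
        Map<CollidingKey, Integer> map = build();
        System.out.println(map.size());                     // prints 2
        System.out.println(map.get(new CollidingKey("a"))); // prints 3
        System.out.println(map.get(new CollidingKey("b"))); // prints 2
    }
}
```

A constant hash code stays correct but throws away the speed advantage, since every lookup then has to compare against all keys in the one bucket.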
Some enhancements to your current code:
Program to interfaces. Declare the variable as Map and instantiate it with HashMap; do the same with List and ArrayList.
Compare Strings, and objects in general, using the equals method. == compares references, while equals compares the data stored in the object, depending on the implementation of this method. So, change the code in Alpha#equals:
public boolean equals(Object obj) {
if (obj==null || obj.getClass()!=this.getClass())
return false;
if (obj==this)
return true;
Alpha guest = (Alpha)obj;
return guest.getLocal() == this.getLocal()
&& guest.getCompany().equals(this.getCompany())
&& guest.getTitle().equals(this.getTitle());
}
When iterating over all the key/value pairs of a map, use Map#entrySet instead; this saves the time spent in Map#get (since that is supposed to be O(1) you won't save much, but it is still better):
for (Map.Entry<Alpha, List<Alpha>> entry : hashmap.entrySet()) {
List<Alpha> list = entry.getValue();
System.out.println(entry.getKey() + "<---->");
for(Alpha y : list) {
System.out.print(y);
}
System.out.println();
}
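Putting these suggestions together, the grouping step might be written as follows. This is a sketch: it declares the fields final and uses Objects.equals so that a null company or title does not cause a NullPointerException, which goes slightly beyond the code above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class Alpha {
    final String company;
    final int local;
    final String title;

    Alpha(String company, int local, String title) {
        this.company = company;
        this.local = local;
        this.title = title;
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == this) return true;
        if (obj == null || obj.getClass() != getClass()) return false;
        Alpha other = (Alpha) obj;
        return local == other.local
                && Objects.equals(company, other.company)
                && Objects.equals(title, other.title);
    }

    @Override
    public int hashCode() {
        return Objects.hash(company, local, title);
    }
}

class GroupDuplicates {
    // Maps each distinct instance to the list of all instances equal to it.
    static Map<Alpha, List<Alpha>> group(List<Alpha> items) {
        Map<Alpha, List<Alpha>> groups = new HashMap<>();
        for (Alpha item : items) {
            List<Alpha> list = groups.get(item);
            if (list == null) {
                list = new ArrayList<>();
                groups.put(item, list);
            }
            list.add(item);
        }
        return groups;
    }
}
```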
Use equals along with hashCode to solve the collision state.
Steps:
First compare on the basis of title in hashCode()
If the title is same then look into equals() based on company name to resolve the collision state.
Sample code
class Alpha {
String company;
int local;
String title;
public Alpha(String company, int local, String title) {
this.company = company;
this.local = local;
this.title = title;
}
@Override
public int hashCode() {
return title.hashCode();
}
@Override
public boolean equals(Object obj) {
if (obj instanceof Alpha) {
return this.company.equals(((Alpha) obj).company);
}
return false;
}
}
...
Map<Alpha, ArrayList<Alpha>> hashmap = new HashMap<Alpha, ArrayList<Alpha>>();
hashmap.put(new Alpha("a", 1, "t1"), new ArrayList<Alpha>());
hashmap.put(new Alpha("b", 2, "t1"), new ArrayList<Alpha>());
hashmap.put(new Alpha("a", 3, "t1"), new ArrayList<Alpha>());
System.out.println("Size : "+hashmap.size());
Output
Size : 2

Strange Java HashMap behavior - can't find matching object

I've been encountering some strange behavior when trying to find a key inside a java.util.HashMap, and I guess I'm missing something. The code segment is basically:
HashMap<Key, Value> data = ...
Key k1 = ...
Value v = data.get(k1);
boolean bool1 = data.containsKey(k1);
for (Key k2 : data.keySet()) {
boolean bool2 = k1.equals(k2);
boolean bool3 = k2.equals(k1);
boolean bool4 = k1.hashCode() == k2.hashCode();
break;
}
That strange for loop is there because for a specific execution I happen to know that data contains only one item at this point and it is k1, and indeed bool2, bool3 and bool4 will be evaluated to true in that execution. bool1, however, will be evaluated to false, and v will be null.
Now, this is part of a bigger program - I could not reproduce the error on a smaller sample - but still it seems to me that no matter what the rest of the program does, this behavior should never happen.
EDIT: I have manually verified that the hash code does not change between the time the object was inserted into the map and the time it was queried. I'll keep checking this avenue, but is there any other option?
This behavior could happen if the hash code of the key were changed after it was inserted into the map.
Here's an example with the behavior you described:
public class Key
{
int hashCode = 0;
@Override
public int hashCode() {
return hashCode;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Key other = (Key) obj;
return hashCode == other.hashCode;
}
public static void main(String[] args) throws Exception {
HashMap<Key, Integer> data = new HashMap<Key, Integer>();
Key k1 = new Key();
data.put(k1, 1);
k1.hashCode = 1;
boolean bool1 = data.containsKey(k1);
for (Key k2 : data.keySet()) {
boolean bool2 = k1.equals(k2);
boolean bool3 = k2.equals(k1);
boolean bool4 = k1.hashCode() == k2.hashCode();
System.out.println("bool1: " + bool1);
System.out.println("bool2: " + bool2);
System.out.println("bool3: " + bool3);
System.out.println("bool4: " + bool4);
break;
}
}
}
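If a key's state really has to change, one way to stay out of trouble is to remove the entry before the change and put it back afterwards, so the map always stores the key under its current hash code. A minimal sketch with a hypothetical MutableKey class and a made-up rekey helper:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mutable key; its hash code depends on a field that can change.
final class MutableKey {
    int id;

    MutableKey(int id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey && id == ((MutableKey) o).id;
    }

    @Override
    public int hashCode() {
        return id;
    }
}

class RekeyDemo {
    // Changes the key's id while keeping the map consistent.
    // Assumes the key is currently present in the map.
    static <V> void rekey(Map<MutableKey, V> map, MutableKey key, int newId) {
        V value = map.remove(key); // remove while the old hash code is still valid
        key.id = newId;
        map.put(key, value);       // re-insert under the new hash code
    }

    public static void main(String[] args) {
        Map<MutableKey, Integer> data = new HashMap<>();
        MutableKey k1 = new MutableKey(0);
        data.put(k1, 1);
        rekey(data, k1, 1);
        System.out.println(data.containsKey(k1)); // prints "true"
    }
}
```

Making the key immutable avoids the problem altogether and is usually the simpler choice.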
From the API description of the Map interface:
Note: great care must be exercised if
mutable objects are used as map keys.
The behavior of a map is not specified
if the value of an object is changed
in a manner that affects equals
comparisons while the object is a key
in the map. A special case of this
prohibition is that it is not
permissible for a map to contain
itself as a key. While it is
permissible for a map to contain
itself as a value, extreme caution is
advised: the equals and hashCode
methods are no longer well defined on
such a map.
Also, there are very specific requirements on the behavior of equals() and hashCode() for types used as Map keys. Failure to follow the rules here will result in all sorts of undefined behavior.
If you're certain the hash code does not change between the time the key is inserted and the time you do the contains check, then there is something seriously wrong somewhere. Are you sure you're using a java.util.HashMap and not a subclass of some sort? Do you know what implementation of the JVM you are using?
Here's the source code for java.util.HashMap.getEntry(Object key) from Sun's 1.6.0_20 JVM:
final Entry<K,V> getEntry(Object key) {
int hash = (key == null) ? 0 : hash(key.hashCode());
for (Entry<K,V> e = table[indexFor(hash, table.length)];
e != null;
e = e.next) {
Object k;
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
}
return null;
}
As you can see, it retrieves the hashCode, goes to the corresponding slot in the table, then does an equals check on each element in that slot. If this is the code you're running and the hash code of the key has not changed, then it must be doing an equals check which must be failing.
The next step would be for you to give us some more code or context - the hashCode and equals methods of your Key class at a minimum.
Alternatively, I would recommend hooking up to a debugger if you can. Watch what bucket your key is hashed to, and step through the containsKey check to see where it's failing.
Is this application multi-threaded? If so, another thread could change the data between the data.containsKey(k1) call and the data.keySet() call.
If equals() returns true for two objects, then hashCode() must return the same value for both. If equals() returns false, hashCode() is not required to return different values, although distinct values improve hash table performance.
For Reference:
http://www.ibm.com/developerworks/java/library/j-jtp05273.html
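A concrete case of unequal objects sharing a hash code, which the contract permits, can be found in the standard String class:

```java
import java.util.HashSet;
import java.util.Set;

class SameHashDemo {
    public static void main(String[] args) {
        String a = "Aa";
        String b = "BB";
        System.out.println(a.equals(b));                  // false
        System.out.println(a.hashCode() == b.hashCode()); // true, both are 2112

        Set<String> set = new HashSet<>();
        set.add(a);
        set.add(b);
        System.out.println(set.size()); // 2: the collision is resolved by equals
    }
}
```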
Perhaps the Key class looks like
class Key
{
boolean equals = false ;
public boolean equals ( Object oth )
{
try
{
return ( equals ) ;
}
finally
{
equals = true ;
}
}
}
