Setting key type in HashMap, how?

Setting key type in HashMap, how? - java

hi
I want to create a HashMap (java) that stores Expression, a little object i've created.
How do I choose what type of key to use? What's the difference for me between integer and String? I guess i just don't fully understand the idea behind HashMap so i'm not sure what keys to use.
Thanks!

Java HashMap relies on two things:
the hashCode() method, which returns an integer that is generated from the key and used inside the map
the equals(..) method, which should be consistent to the hash calculated, this means that if two keys has the same hashcode than it is desiderable that they are the same element.
The specific requirements, taken from Java API doc are the following:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
If you don't provide any kind of specific implementation, then the memory reference of the object is used as the hashcode. This is usually good in most situations but if you have for example:
Expression e1 = new Expression(2,4,PLUS);
Expression e2 = new Expression(2,4,PLUS);
(I don't actually know what you need to place inside your hashmap so I'm just guessing)
Then, since they are two different object although with same parameters, they will have different hashcodes. This could be or not be a problem for your specific situation.
In case it isn't just use the hasmap without caring about these details, if it is you will need to provide a better way to compute the hashcode and equality of your Expression class.
You could do it in a recursive way (by computing the hashcode as a result of the hashcodes of children) or in a naive way (maybe computing the hashcode over a toString() representation).
Finally, if you are planning to use just simple types as keys (like you said integers or strings) just don't worry, there's no difference. In both cases two different items will have the same hashcode. Some examples:
assert(new String("hello").hashCode() == new String("hello").hashCode());
int x = 123;
assert(new Integer(x).hashCode() == new Integer(123).hashCode());
Mind that the example with strings is not true in general, like I explained you before, it is just because the hashcode method of strings computes the value according to the content of the string itself.

The key is what you use to identify objects. You might have a situation where you want to identify numbers by their name.
Map<String,Integer> numbersByName = new HashMap<String,Integer>();
numbersByName.put("one",Integer.valueOf(1));
numbersByName.put("two",Integer.valueOf(2));
numbersByName.put("three",Integer.valueOf(3));
... etc
Then later you can get them out by doing
Integer three = numbersByName.get("three");
Or you might have a need to go the other way. If you know you're going to have integer values, and want the names, you can map integers to strings
Map<String,Integer> numbersByValue = new HashMap<String,Integer>();
numbersByValue.put(Integer.valueOf(1),"one");
numbersByValue.put(Integer.valueOf(2),"two");
numbersByValue.put(Integer.valueOf(3),"three");
... etc
And get it out
String three = numbersByValue.get(Integer.valueOf(3));

Keys and their associated values are both objects. When you get something from a HashMap, you have to cast it to the actual type of object it represents (we can do this because all objects in Java inherit the Object class). So, if your keys are strings and your values are Integers, you would do something like:
Integer myValue = (Integer)myMap.get("myKey");
However, you can use Java generics to tell the compiler that you're only going to be using Strings and Integers:
HashMap<String,Integer> myMap = new HashMap<String,Integer>();
See http://download.oracle.com/javase/1.4.2/docs/api/java/util/HashMap.html for more details on HashMap.

If you do not want to look up the expressions, why do you want them to store in a map?
But if you want to, then the key is that item you use for lookup.

Related

How can I take into consideration the object itself when calculating a hash for an object in Java?

I was working on some algorithmic problems when I got to this and it seemed interesting to me. If I have two lists (so two different objects), with the same values, the hashcode is the same. After some reading, I understand that this is how it should behave. For example:
List<String> lst1 = new LinkedList<>(Arrays.asList("str1", "str2"));
List<String> lst2 = new LinkedList<>(Arrays.asList("str1", "str2"));
System.out.println(lst1.hashCode() + " " + lst2.hashCode());
...........
Result: 2640541 2640541
My purpose would be to differentiate between lst1 and lst2 in a list for example.
Is there a structure (like a HashSet for example) that takes into consideration the actual object and not only the values inside the object when calculating the hashcode for something?

Yes, you can use java's java.util.IdentityHashMap, or guava's identity hash set.
The hashes of the two lists must be equal, because the objects are equal. But the identity map and set above are based on the identity of the list objects, not their hash.

If I have two lists (so two different objects), with the same values, the hashcode is the same. After some reading, I understand that this is how it should behave.
Yes, this is part of the specification of java.util.List.
Is there a structure (like a HashSet for example) that takes into consideration the actual object and not only the values inside the object when calculating the hashcode for something?
My purpose would be to differentiate between lst1 and lst2 in a list for example
It is unclear what "in a list" means here. For example, Collection.contains() and List.equals() are defined in terms or members' equals() methods, and likewise the behavior of List.remove(Object). Although distinct objects, your two Lists will compare equal to each other, so those methods will not distinguish between them, neither directly nor as members of another list. You can always compare them for reference equality (==), however, to determine that they are not the same object despite being equals() each other.
As far as a collection that takes members' object identity into account, you could consider java.util.IdentityHashMap. Two such maps having keys and associated values that are pairwise equals() each other but not identical will not compare equals() to each other. Such sets will typically have different hash codes than each other, though that cannot be guaranteed. Note well, however, the warnings throughout the documentation of IdentityHashMap that although it implements the Map API, many of the behavioral details are inconsistent with the requirements of that interface.
Note also that
most of the above is relevant only for collections whose members are of a type that overrides equals() and hashCode(). The implementations of or inherited from Object differentiate between objects on a reference-equality basis, so the ordinary collections classes have no surprises for you there.
identical string literals are not required to represent distinct objects, so the lst1 and lst2 in your example code may in fact contain identical elements, in the reference equality sense.

Not generally in collections, because you generally want two collections with all the same items to be equal (which is why they implement it like this- equals will return true and the hash codes are the same).
You can subclass a list and have it not do that, it would just not be widely useful and would cause a lot of confusion if other programmers read your code. In that case, you'd just want equals to return the result of == and hashCode to return the integer value of the reference (the same thing that Object.equals does).

How to avoid duplicate strings in Java?

I want to be able to add specific words from a text into a vector. Now the problem is I want to avoid adding duplicate strings. The first thing that comes to my mind is to compare all strings before adding them, as the amount of entries grow, this becomes really inefficient solution. The only "time efficient" solution that I can think of is unordered_multimap container that has included in C++11. I couldn't find a Java equivalent of it. I was thinking to add strings to the map and at the end just copying all entries to the vector, in that way it would be a lot more efficient than the first solution. Now I wonder whether there is any Java library that does what I want? If not is there any C++ unordered_multimap container equivalent in Java that I couldn't find?

You can use a Set<String> Collection. It does not allow duplicates. You can choose then as implementantion:
1) HashSet if you do not care about the order of elements (Strings).
2) LinkedHashSet if you want to keep the elements in the inserting order.
3) TreeSet if you want the elements to be sorted.
For example:
Set<String> mySet = new TreeSet<String>();
mySet.add("a_String");
...
Vector is "old-fashioned" in Java. You had better avoid it.

You can use a set (java.util.Set):
Set<String> i_dont_allow_duplicates = new HashSet<String>();
i_dont_allow_duplicates.add(my_string);
i_dont_allow_duplicates.add(my_string); // wont add 'my_string' this time.
HashSet will do the job most effeciently and if you want to keep insertion order then you can use LinkedHashSet.

Use a Set. A HashSet will do fine if you do not need to preserve order. A LinkedHashSet works if you need that.

You should consider using a Set:
A collection that contains no duplicate elements. More formally, sets
contain no pair of elements e1 and e2 such that e1.equals(e2), and at
most one null element. As implied by its name, this interface models
the mathematical set abstraction.
HashSet should be good for your use:
HashSet class implements the Set interface, backed by a hash table
(actually a HashMap instance). It makes no guarantees as to the
iteration order of the set; in particular, it does not guarantee that
the order will remain constant over time. This class permits the null
element.
So simply define a Set like this and use it appropriately:
Set<String> myStringSet = new HashSet<String>();

Set<String> set = new HashSet<String>();
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified.
This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.

HashMap (Java) giving wrong results

I have created a hash map in which each entry corresponds to 3 values
Key object values ( which are two in number)
I have created a class ,whose object i create and store the results in a hash map
This is my code below in which i compare my incoming data with the previous values in the hash map.If the same data comes then i just increment the counter of that data. I have taken the print statements in the the for loop . though the two strings match but still my code never comes in the if loop for increment the counter.Why?
for(i=1;i<=hMap.size();i++)
{
String skey = Integer.toString(i);
if(hMap.get(skey).olddata==comingdata)
{
hMap.get(skey).counter= hMap.get(skey).counter+1;
}
}

You haven't given nearly enough information about the types involved, but, but I strongly suspect that this is the problem:
if(hMap.get(skey).olddata==comingdata)
That will be comparing references, rather than for equality, if olddata and comingdata are references of some kind. (EDIT: By the sounds of it, they're string references.)
My guess is that you want:
String skey = Integer.toString(i);
if(hMap.get(skey).olddata.equals(comingdata))
{
hMap.get(skey).counter= hMap.get(skey).counter+1;
}
Or rather more efficiently, avoiding pointless lookups:
WhateverType value = hMap.get(Integer.toString(i));
if (value.olddata.equals(comingdata))
{
value.counter++;
}
I'd also suggest that if you're always going to look up by an integer, why not use an Integer key instead of always converting the integer into a string?
Additionally, it's worth following Java naming conventions, and you should make your fields private if they're not already.
If none of this helps, please post more code. The chances of the problem being in HashMap rather than in your code are incredibly small.

It's not clear the type of olddata, but maybe you should compare the values using equals():
if (hMap.get(skey).olddata.equals(comingdata))
In Java, == is used for either comparing primitive data types for equality or comparing object types for identity. If you need to compare two object types for equality, then you must use the equals() method, which is defined for all objects since it's inherited from the Object class, being aware that you also must override equals() and hashCode() in your class, providing implementations meaningful for that class.

You don't compare objects with == in Java unless you're trying to see if they have the same reference value.
if (hMap.get(skey).olddata.equals(comingdata)) {
...
You also shouldn't be exposing olddata like that; it should be available via a getter; e.g. getOldData()

It should be
if(hMap.get(skey).olddata.equals(comingdata))

Do you actually mean comingdata.equals(hMap.get(skey).olddata)? Furthermore be aware that equals(Object) and hashCode() must be correclty implemented.

How to make class usable in different HashMaps in Java

I have a class Attribute which has 2 variables say int a,b;
I want to use class Attribute in two different HashSet.
The first hash set considers objects as equal when the value of a is same.
But the second hash set considers objects as equal when the value of b is same.
I know if I override the equals method the hashset will use the overriden version of equals to compare two objects but in this case I would need two different implementations of equals()
One way is to create two subclasses of attribute and provide them with different equals method but I want to know if there is a better way to do it such that I dont have to create subclass of Attribute.
Thanks.

One possible solution is to not use HashSet, but use TreeSet instead. It's the same Set interface, but there is a TreeSet constructor that lets you pass in a Comparator. That way you could leave the Attribute class unchanged- just create two different comparators and use it like
Set<Attribute> setA = new TreeSet<Attribute>(comparatorForA);
Set<Attribute> setB = new TreeSet<Attribute>(comparatorForB);
The comparator takes care of the equality check (e.g. if compare returns 0, the objects are equal)

Unfortunately there's no "Equalizer" class that can override the equals logic. There is such a thing for sorting, where you can either use natural sorting based on the Comparable implementation or provide your own Comparator. I've actually wondered why there's no such thing for equality checks.
Since the semantics of equality are defined by a class and could be considered a trait of that class, the two subclasses approach seems the most natural. Maybe someone knows a useful pattern for doing this in a more simple manner, but I've never encountered it.
EDIT: just thought of something... you could use two Map instances, like HashMap, with the first one using a as key and the second using b as key. It'd let you detect collisions. You could then simply link the attribute to the associated instance.

I did some thing different, Instead of using the HashSet, I have used HashMap where I have used int a as a key in first HashMap and the object is stored as value.
And in the other HashMap I have kept the key as int b and the object as value.
This provides me a way to Hash on both the variables a and b so I dont have to make any sub classes.
And also, I get O(1) time instead of O(log n). But I know I am paying the price by using some more memory but my main concern was time so I chose HashMap over TreeSet.
Thank you all for your comments and suggestions.

It would be very easy to modify HashMap and HashSet to accept hashing and equality-testing strategies.
public interface Hasher {
int hashCode(Object o);
}
public interface Equalizer {
int areEqual(Object o1, Object o2);
}

A simple solution is to bypass HashSet and use HashMap directly. For the first, store each Attribute using its a property as the key, and for the other use b.

I can propose a bit hacky but lesser effort solution :)
Swap the values of a and b when storing in second hashset so that uniqueness is defined by value of b and then when reading the class from hashset then swap the value of a and b again to retain the original state. So the same equals/hascode methods will serve the purpose.

Consistent hashcode for an object in java

In java Is it possible to get consistent hash code for an object when we are running the application multiple times

Sure. If it is a String for example, then String.hashCode() gives a consistent hashcode each time you run the application.
You only get into trouble if the hashcode incorporates something other than the values of the object's component fields; e.g. an identity hashcode. And of course, this means that the object class needs to override Object.hashcode() at some point, because that method gives you an identity hashcode.
FOLLOW UP
Judging from comments on other answers, the OP still seems to be pursuing the illusory goal of a unique hash function; i.e. some function that will map (for example) any String to a hashcode that is unique for all possible Strings.
Unfortunately this is impossible in the general case, and in this case. Furthermore, it is a simple matter to construct a proof that a String to int hash function that generates unique int values is mathematically impossible. (I won't bore you with the details ... but the basis of the proof is that there are more String values than int values.)
In fact, the only situation where such a hash function is possible is when the set of all possible values of input type has a size that is no greater than the number of possible values of the integer type. There are hash functions that will map a byte, char, short or int to a unique int, but a hash function that maps long values to unique int values is impossible.

It depends on implementation on hashCode() method of Object
It can also be
public int hashCode() {
return 1;
}

No, not for objects in general. Objects with their own hashcode method will probably be consistent across runs.

Implement/override the public int hashCode() method all objects have?

You have to decide what makes the object the same. Usually it is based on the content of one or more fields. In this case, you should make the hashCode based on these fields. (And equals())
However, I would suggest you shouldn't rely on the hashCode being the same between runs of the application. This is highly likely to break when you change code and very hard to fix when it does. e.g. if you add/remove a field which is part of the hashCode or change the way the hashCode is calculated or anything ti depends on, the hashCode will change.
What are you trying to do? This sounds like a problem where a different solution would be better.

Looking in the contract of hashCode:
Whenever it is invoked on the same object more than once during an
execution of a Java application, the
hashCode method must consistently
return the same integer, provided no
information used in equals comparisons
on the object is modified. This
integer need not remain consistent
from one execution of an application
to another execution of the same
application.
If two objects are equal according to the equals(Object) method, then
calling the hashCode method on each of
the two objects must produce the same
integer result.
It is not required that if two objects are unequal according to the
equals(java.lang.Object) method, then
calling the hashCode method on each of
the two objects must produce distinct
integer results. However, the
programmer should be aware that
producing distinct integer results for
unequal objects may improve the
performance of hashtables.
So it is not guaranteed that the hashCode is euqal between invocations. In reality, there are be quite some hashCode implementations that return the same value across invocations: String and all types used for boxing (like Integer) have a consistent return value for hashCode. Objects that only combine member hashCodes where each member has a consistent return value also feature this consistency. So, in practice it should be rather common to have a hashCode return value that is consistent accross invocations.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.