Improve if statement in Loop through Hashmap - java

I've created a loop that goes through my HashMap. In it, I check whether the name of the current key (A) is equal to the name of the key that might be added (B). The hash codes of key A and key B aren't necessarily equal when their names are. Therefore I check whether they are equal by converting them to strings (with an override of .equals()). The code works, but there must be a cleaner and easier way to do this.
This is my current code:
for (HashMap.Entry<Identifier, SetInterface<BigInteger>> entry : idAndSet.entrySet()) {
    if (entry.getKey().isEqual(identifier)) {
        factor = entry.getValue();
        return factor;
    }
}

The hash codes of key A and key B aren't necessarily equal when their names are.
That is not a good idea. Any class that acts as a key should override equals and hashCode. And it would be a good idea to make the class immutable as well (otherwise you could end up with some difficult debugging to do).
Once you do that, you can just do
Map<Identifier, Object> map...;
Object value = map.get(id);
// or as of Java 8+
Object value = map.getOrDefault(id, someDefaultValue);
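To make the advice above concrete, here is a minimal sketch of what such a key class could look like. The Identifier below is only a hypothetical stand-in for the class in the question (its real fields are not shown there), and I am assuming that equality should be based on the name alone.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the question's Identifier class: immutable, with
// equals() and hashCode() both derived from the name.
final class Identifier {
    private final String name;

    Identifier(String name) {
        this.name = Objects.requireNonNull(name);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Identifier)) return false;
        return name.equals(((Identifier) o).name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }
}

class IdentifierLookupDemo {
    // Looks up a key that is equal to, but not the same instance as, the stored key.
    static String lookup() {
        Map<Identifier, String> idAndSet = new HashMap<>();
        idAndSet.put(new Identifier("alpha"), "set for alpha");
        return idAndSet.get(new Identifier("alpha"));
    }

    public static void main(String[] args) {
        System.out.println(lookup());
    }
}
```

With equals() and hashCode() consistent, the loop over entrySet() is no longer needed: a plain get() finds the entry.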

You could misuse the Map.computeIfPresent method.
factor = map.computeIfPresent(identifier, (k,v) -> v);
return factor;
The method returns the value associated with the specified key, or null if there is none.

Related

How to implement own hashing function for strings?

So this is the default algorithm that generates the hashcode for Strings:
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
However, I wanna use something different and much more simple like adding the ASCII values of each character and then adding them all up.
How do I make it so that it uses the algorithm I created, instead of using the default one when I use the put() method for hashtables?
As of now I don't know what to do other than implementing a hash table from scratch.
Create a new class, and use String type field in it. For example:
import java.util.Objects;

public class MyString {
    private final String value;

    public MyString(String value) {
        this.value = value;
    }

    public String getValue() {
        return value;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        MyString myString = (MyString) o;
        return Objects.equals(value, myString.value);
    }

    @Override
    public int hashCode() {
        // use your own implementation
        return value.codePoints().sum();
    }
}
Add equals() and hashCode() methods with the @Override annotation.
Note: here hashCode() simply sums the code points of the characters, which for ASCII text are the ASCII values.
After that, you will be able to use new class objects in the desired data structure. Here you can find a detailed explanation of these methods and a contract between equals() and hashCode().
However, I wanna use something different and much more simple like adding the ASCII values of each character and then adding them all up.
This is an extremely bad idea if you care at all about hash table efficiency. What you're thinking of as an overly-complicated hashing function is actually designed to give a uniform distribution of hash values throughout the entire 32-bit (or whatever) range. That gives the best possibility of uniformly distributing the hash keys (after you mod by the hash table size) in your buckets.
Your simple method of adding up the ASCII values of the individual characters has multiple flaws. First, you're limited in the range of values you can reasonably expect to generate. The highest value you can create is 255*n, where n is the length of the key. If your key is 10 characters in length, then you can't possibly generate more than 2,550 unique hash values. But there are 255^10 possible 10-character strings. Your collision rate will be very high.
The second problem is that anagrams generate the same hash value. "stop," "spot," and "tops" all generate the same hash value and will hash to the same bucket. Again, this will greatly affect your collision rate.
It's unclear to me why you want to replace the hashing function. If you're thinking it will result in better performance, you should think again. Sure, it will make generating the hash value faster, but it will result in very skewed key distribution, and correspondingly terrible hash table performance.
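The anagram problem is easy to check for yourself. The following is only a small sketch; sumHash is my own name for the character-sum hash proposed in the question.

```java
class SumHashDemo {
    // The hash proposed in the question: the sum of the character values.
    static int sumHash(String s) {
        return s.chars().sum();
    }

    public static void main(String[] args) {
        // Anagrams share the character sum, while String.hashCode() tells them apart.
        for (String word : new String[] {"stop", "spot", "tops"}) {
            System.out.println(word + ": sum=" + sumHash(word) + ", default=" + word.hashCode());
        }
    }
}
```

All three words get the same character sum, so with that hash they would always land in the same bucket.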

HashMap: Is there a way to search by hashcode and not by key?

I may sound completely wrong but:
I've read as many posts as I could find here about HashMap and hashcode . I did not get what I was looking for exactly. I am going to try to be as precise as I can.
Let's say I have a huge hashmap where :
keys are of type my_struct and values as well
Now, my_struct is made of 2 Lists that can have big sizes (so one entry has a respectable size on its own).
Keys and values have a special relation: values are for sure already keys in the dictionary (something like an ancestor - descendant relation).
I was wondering whether instead of storing values of my_struct , I could store an int and then using this "int as a key" to search for the relative entry. In pseudocode I could describe it as such:
HashMap<my_struct, int> h = new HashMap<>();
......
my_struct descendant = value;
int id = a(value); // returns an id for this value
h.put(ancestor, id);
...
// after some time I want to find the corresponding value of id
int key = h.getValue(ancestor); // == id
if (h.contains(b(key))) {
    ...
}
So basically I am looking for :
a method : a() that turns a mystruct -->int
a method : b() that turns an int ---> my struct
Of course, both should be a 1-1 functions.
After reading Java8 documentation a() must be int hashCode() but what about b() is there something in Java?
HashMap: Is there a way to search by hashcode and not by key?
Literally ... no.
OK, so I am assuming that this is a complete and accurate description of your real problem:
So basically I am looking for :
a method : a() that maps a my_struct --> int
a method : b() that maps an int --> my_struct
Of course, both should be a 1-1 functions.
Got that.
After reading Java8 documentation a() must be int hashCode()
That is incorrect. hashCode() is not 1-1. In general, multiple objects can have the same hashCode(). Note that even identity hashcodes (as returned by Object.hashCode) are not guaranteed unique.
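A quick way to convince yourself that hashCode() is not 1-1 is to look at two short strings that are known to collide. This is just an illustrative sketch; the class and method names are mine.

```java
class HashCollisionDemo {
    // True when two different strings nevertheless share a hash code.
    static boolean sameHashDifferentStrings(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // "Aa" and "BB" both hash to 65*31 + 97 = 66*31 + 66 = 2112.
        System.out.println("Aa".hashCode() + " " + "BB".hashCode());
        System.out.println(sameHashDifferentStrings("Aa", "BB"));
    }
}
```

Since two different keys can share a hash code, there is no way to map a hash code back to a single object.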
You could implement a my_struct.hashCode method that returns a unique integer, but the only practical way to do this would be to allocate a unique number when you create each my_struct instance, and store it in a field of the object. And that has the problem that your my_struct.equals method has to return true if and only if the my_struct instances are the same instance.
But if you can live with those limitations, then a() can indeed be my_struct.hashCode.
If you generate the numbers for the my_struct objects sequentially starting at zero, you can add all of the my_struct instances to an ArrayList<my_struct> as you create them, then you can implement b() as theList.get(int).
In code (not thread-safe!!):
public class my_struct {
    private static int next = 0;
    private static List<my_struct> all = new ArrayList<>();
    private int ordinal;
    // other fields

    public my_struct(...) {
        // initialize fields
        this.ordinal = next++;
        all.add(this);
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof my_struct &&
            ((my_struct) other).ordinal == this.ordinal;
    }

    @Override
    public int hashCode() {
        return ordinal;
    }

    // Implements b(); the method name fromOrdinal is a placeholder.
    public static my_struct fromOrdinal(int ordinal) {
        return all.get(ordinal);
    }
}
But you should also be able to see that you don't have to use the ordinal as the hashcode and implement hashCode() and equals(Object) as above. It depends on what else you are doing with these structs.
Note that this is not the same as using an IdentityHashMap.
You might take a look at IdentityHashMap, which does not rely on equals and hashes only on the identity hash code. This means that you can have several keys that would otherwise be considered equal: keys are considered equal only if they are the same reference.
Check out IdentityHashMap in the JavaDoc
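Here is a small sketch of the difference, using two distinct String objects with the same contents as keys. The class and method names are made up for the illustration.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class IdentityMapDemo {
    // Puts two equal-but-distinct String objects into the given map
    // and reports how many entries it ends up with.
    static int sizeAfterTwoEqualKeys(Map<String, Integer> map) {
        String first = new String("key");
        String second = new String("key"); // equals(first), but a different reference
        map.put(first, 1);
        map.put(second, 2);
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println("HashMap size: " + sizeAfterTwoEqualKeys(new HashMap<>()));
        System.out.println("IdentityHashMap size: " + sizeAfterTwoEqualKeys(new IdentityHashMap<>()));
    }
}
```

The HashMap treats the two keys as one entry, while the IdentityHashMap keeps both because it compares references.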
Since you are in charge of assigning ids, you can fake the other side of the relationship by keeping my_struct objects in an array list, and using their array index as the id.
This way, mapping from my_struct to id becomes a hash lookup, while mapping from id to my_struct becomes an array list look-up.
List<my_struct> mystructs = new ArrayList<my_struct>();
... add items to mystructs
Map<my_struct,Integer> toId = new HashMap<my_struct,Integer>();
for (int i = 0 ; i != mystructs.size() ; i++) {
    toId.put(mystructs.get(i), i);
}
Going from my_struct to id:
int id = toId.get(my_struct_object);
Going from id to my_struct:
my_struct my_struct_object = mystructs.get(id);

Why does HashSet allow equal items if hashcodes are different?

The HashSet class has an add(Object o) method, which is not inherited from another class. The Javadoc for that method says the following:
Adds the specified element to this set if it is not already present. More formally, adds the specified element e to this set if this set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already contains the element, the call leaves the set unchanged and returns false.
In other words, if two objects are equal, then the second object will not be added and the HashSet will remain the same. However, I've discovered that this is not true if objects e and e2 have different hashcodes, despite the fact that e.equals(e2). Here is a simple example:
import java.util.HashSet;
import java.util.Iterator;
import java.util.Random;
public class BadHashCodeClass {
    /**
     * A hashcode that will randomly return an integer, so it is unlikely to be the same
     */
    @Override
    public int hashCode(){
        return new Random().nextInt();
    }

    /**
     * An equal method that will always return true
     */
    @Override
    public boolean equals(Object o){
        return true;
    }

    public static void main(String... args){
        HashSet<BadHashCodeClass> hashSet = new HashSet<>();
        BadHashCodeClass instance = new BadHashCodeClass();
        System.out.println("Instance was added: " + hashSet.add(instance));
        System.out.println("Instance was added: " + hashSet.add(instance));
        System.out.println("Elements in hashSet: " + hashSet.size());
        Iterator<BadHashCodeClass> iterator = hashSet.iterator();
        BadHashCodeClass e = iterator.next();
        BadHashCodeClass e2 = iterator.next();
        System.out.println("Element contains e and e2 such that (e==null ? e2==null : e.equals(e2)): " + (e==null ? e2==null : e.equals(e2)));
    }
}
The results from the main method are:
Instance was added: true
Instance was added: true
Elements in hashSet: 2
Element contains e and e2 such that (e==null ? e2==null : e.equals(e2)): true
As the example above clearly shows, HashSet was able to add two elements where e.equals(e2).
I'm going to assume that this is not a bug in Java and that there is in fact some perfectly rational explanation for why this is. But I can't figure out what exactly. What am I missing?
I think what you're really trying to ask is:
"Why does a HashSet add objects with unequal hash codes even if they claim to be equal?"
The distinction between my question and the question you posted is that you're assuming this behavior is a bug, and therefore you're getting grief for coming at it from that perspective. I think the other posters have done a thoroughly sufficient job of explaining why this is not a bug, however they have not addressed the underlying question.
I will try to do so here; I would suggest rephrasing your question to remove the accusations of poor documentation / bugs in Java so you can more directly explore why you're running into the behavior you're seeing.
The equals() documentation states (emphasis added):
Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.
The contract between equals() and hashCode() isn't just an annoying quirk in the Java specification. It provides some very valuable benefits in terms of algorithm optimization. By being able to assume that a.equals(b) implies a.hashCode() == b.hashCode() we can do some basic equivalence tests without needing to call equals() directly. In particular, the invariant above can be turned around - a.hashCode() != b.hashCode() implies a.equals(b) will be false.
If you look at the code for HashMap (which HashSet uses internally), you'll notice an inner static class Entry, defined like so:
static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    int hash;
    ...
}
HashMap stores the key's hash code along with the key and value. Because a hash code is expected to not change over the time a key is stored in the map (see Map's documentation, "The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.") it is safe for HashMap to cache this value. By doing so, it only needs to call hashCode() once for each key in the map, as opposed to every time the key is inspected.
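The flip side of caching the hash is that a key must not change its hash code while it is in the map. The sketch below illustrates this with a mutable List as the key; the class name is made up, and the result shown is what I would expect from the standard OpenJDK HashMap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MutatedKeyDemo {
    // Inserts a mutable key, changes it afterwards, and then looks it up again.
    static boolean foundAfterMutation() {
        Map<List<String>, String> map = new HashMap<>();
        List<String> key = new ArrayList<>();
        key.add("a");
        map.put(key, "value");   // the entry remembers the hash of ["a"]
        key.add("b");            // the key's hashCode() is now different
        return map.containsKey(key);
    }

    public static void main(String[] args) {
        System.out.println("Found after mutation: " + foundAfterMutation());
    }
}
```

The lookup fails even though the very same object is still stored in the map, because the hash computed for the lookup no longer matches the cached one.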
Now let's look at the implementation of put(), where we see these cached hashes being taken advantage of, along with the invariant above:
public V put(K key, V value) {
    ...
    int hash = hash(key);
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            // Replace existing element and return
        }
    }
    // Insert new element
}
In particular, notice that the conditional only ever calls key.equals(k) if the hash codes are equal and the key isn't the exact same object, due to short-circuit evaluation. By the contract of these methods, it should be safe for HashMap to skip this call. If your objects are incorrectly implemented, these assumptions being made by HashMap are no longer true, and you will get back unusable results, including "duplicates" in your set.
Note that your claim "HashSet ... has an add(Object o) method, which is not inherited from another class" is not quite correct. While its parent class, AbstractSet, does not implement this method, the parent interface, Set, does specify the method's contract. The Set interface is not concerned with hashes, only equality, therefore it specifies the behavior of this method in terms of equality with (e==null ? e2==null : e.equals(e2)). As long as you follow the contracts, HashSet works as documented, but avoids actually doing wasteful work whenever possible. As soon as you break the rules however, HashSet cannot be expected to behave in any useful way.
Consider also that if you attempted to store objects in a TreeSet with an incorrectly implemented Comparator, you would similarly see nonsensical results. I documented some examples of how a TreeSet behaves when using an untrustworthy Comparator in another question: how to implement a comparator for StringBuffer class in Java for use in TreeSet?
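As a small illustration of the TreeSet case, the following sketch uses a deliberately broken Comparator that never reports two elements as equal. The names are made up for the example.

```java
import java.util.Comparator;
import java.util.TreeSet;

class BadComparatorDemo {
    // Adds the same string twice to a TreeSet whose comparator never returns 0.
    static int sizeAfterAddingTwice() {
        Comparator<String> broken = (a, b) -> 1; // violates the Comparator contract
        TreeSet<String> set = new TreeSet<>(broken);
        set.add("x");
        set.add("x");
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println("Elements in TreeSet: " + sizeAfterAddingTwice());
    }
}
```

Just like the HashSet in the question, the TreeSet ends up holding a "duplicate", because it trusts the comparator to decide what counts as equal.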
You've violated the contract of equals/hashCode basically:
From the hashCode() docs:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
and from equals:
Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.
HashSet relies on equals and hashCode being implemented consistently - the Hash part of the name HashSet basically implies "This class uses hashCode for efficiency purposes." If the two methods are not implemented consistently, all bets are off.
This shouldn't happen in real code, because you shouldn't be violating the contract in real code...
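For contrast, here is a sketch of a class that keeps the contract, with equals() and hashCode() derived from the same fields. Point is just a hypothetical example class, not something from the question.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class: equals() and hashCode() use the same fields.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point other = (Point) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

class ConsistentHashDemo {
    static int distinctCount() {
        Set<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        set.add(new Point(1, 2)); // equal to the first point, so it is rejected
        set.add(new Point(3, 4));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println("Elements in hashSet: " + distinctCount());
    }
}
```

Here the set behaves exactly as the Javadoc of add() describes.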
@Override
public int hashCode(){
    return new Random().nextInt();
}
You are returning a different hash code for the same object every time it is evaluated. Obviously you will get wrong results.
The add() method is as follows:
public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}
and put() is
public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}
Notice that the hash is calculated first. In your case it is different on every call, which is why the object is added again. equals() comes into the picture only if the hashes of two objects are the same, i.e. a collision has occurred. Since the hashes are different here, equals() is never executed:
if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
Read more on what short-circuiting is. Since e.hash == hash is false, nothing else is evaluated.
I hope this helps.
Because hashCode() is implemented so badly, each add() looks for an equal element in a random bucket. If you returned a constant value from hashCode() instead, the set would not let you add a second element.
It is not required that hash codes be different for all elements! It is only required that no two elements are equal.
hashCode is used first to find the hash bucket the object should occupy. If the hash codes are different, the objects are assumed to be not equal. If the hash codes are equal, then the equals() method is used to determine equality. The use of hashCode is an efficiency mechanism.
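To illustrate that equal hash codes are allowed, here is a sketch with a hypothetical key class whose hashCode() always returns the same constant. That is legal (although slow, since every element lands in the same bucket), and the set still relies on equals() to tell elements apart.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical key class: constant hash code, name-based equality.
final class ConstantHashKey {
    private final String name;

    ConstantHashKey(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ConstantHashKey && name.equals(((ConstantHashKey) o).name);
    }

    @Override
    public int hashCode() {
        return 42; // same value for every instance: valid, but all keys share one bucket
    }
}

class ConstantHashDemo {
    static int sizeAfterAdds() {
        Set<ConstantHashKey> set = new HashSet<>();
        set.add(new ConstantHashKey("a"));
        set.add(new ConstantHashKey("b")); // same hash, not equal: added
        set.add(new ConstantHashKey("a")); // same hash and equal: rejected
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println("Elements in hashSet: " + sizeAfterAdds());
    }
}
```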
And...
Your hash code implementation violates the contract that it should not change unless the object's identifying fields change.

Hash table Java insert

I am new to Java and I am trying to learn about hash tables. I want to insert objects into my hash table and then be able to print all the objects from the hash table at the end. I am not sure I am doing this right because I have read that I need to override the get() method or hashCode() method but I am not sure why.
I am passing in String objects of student names. When I run the debugger after my inserts, it shows the key as "null" and the indexes of my inserts are at random places in the hash table. Ex. 1, 6, 10
This is how I have been adding. Can anyone tell me if this is correct and do I actually need to override things?
Thanks in advance!
CODE
Hashtable<String,String> hashTable=new Hashtable<String,String>();
hashTable.put("Donald", "Trump");
hashTable.put("Mike", "Myers");
hashTable.put ("Jimmer", "Markus");
You are doing things correctly. Remember, a Hashtable is not a direct-access structure. You can't "get the third item from a Hashtable", for example. There is no real meaning to the term "index" when you're talking about a Hashtable: numerical indexes of items mean nothing.
A Hashtable guarantees that it will hold key-value pairs for you in a way that makes it very fast to look up a value based on a key (for example: given Donald, you will get Trump very quickly). Of course, certain conditions have to be fulfilled for this to work right, but for your simple String-to-String example, that works.
You should read more about hash tables in general, to see how they really work behind the scenes.
EDIT (as per OP's request): you are asking about storing Student instances in your Hashtable. As I mentioned above, certain conditions have to be addressed for a Hashtable to work correctly. Those conditions are concerning the key part, not the value part.
If your Student instance is the value, and a simple String is the key, then there's nothing special for you to do, because the String class already satisfies all of the conditions required for a proper Hashtable key.
If your Student instance is the key, then the following conditions must be met:
Inside Student, you must override the hashCode method in such a way that subsequent invocations of hashCode will return exactly the same value. In other words, the expression x.hashCode() == x.hashCode() must always be true.
Inside Student, you must override the equals method in such a way that it will only return true for two identical instances of Student, and return false otherwise.
These conditions are enough for Student to function as a proper Hashtable key. You can further optimize things by writing a better hashCode implementation (read about it... it's quite long to type in here), but as long as you answer the aforementioned two, you're good to go.
Example:
class Student {
    private String name;
    private String address;

    @Override
    public int hashCode() {
        // Assuming 'name' and 'address' are not null, for simplification here.
        return name.hashCode() + address.hashCode();
    }

    @Override
    public boolean equals (Object other) {
        if (!(other instanceof Student)) {
            return false;
        }
        if (other == this) {
            return true;
        }
        Student otherStudent = (Student) other;
        return name.equals(otherStudent.name) && address.equals(otherStudent.address);
    }
}
Try this code:
Hashtable<String,String> hashTable=new Hashtable<String,String>();
hashTable.put("Donald", "16 years old");
hashTable.put("Mike", "20 years old");
hashTable.put ("Jimmer", "18 years old");
Enumeration<String> studentsNames;
String str;
// Show all students in hash table.
studentsNames = hashTable.keys();
while(studentsNames.hasMoreElements()) {
    str = studentsNames.nextElement();
    // txt is presumably a text component defined elsewhere in the program
    txt.append("\n"+str + ": " + hashTable.get(str));
}
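If you would rather not depend on a UI component, a self-contained variant could look like the sketch below. It iterates over entrySet() instead of keys(); the class and method names are made up, and the copy into a TreeMap is only there to get a predictable print order.

```java
import java.util.Hashtable;
import java.util.Map;
import java.util.TreeMap;

class HashtablePrintDemo {
    // Builds one "key: value" line per entry. The entries are copied into a
    // TreeMap only so that the output is sorted by key; a Hashtable itself
    // promises no particular iteration order.
    static String describe(Map<String, String> table) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> entry : new TreeMap<>(table).entrySet()) {
            out.append(entry.getKey()).append(": ").append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Hashtable<String, String> hashTable = new Hashtable<>();
        hashTable.put("Donald", "Trump");
        hashTable.put("Mike", "Myers");
        hashTable.put("Jimmer", "Markus");
        System.out.print(describe(hashTable));
    }
}
```

This also shows why the positions seen in the debugger look random: the order in which a Hashtable stores entries has nothing to do with insertion order.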

Key existence check in HashMap

Is checking for key existence in HashMap always necessary?
I have a HashMap with say a 1000 entries and I am looking at improving the efficiency.
If the HashMap is being accessed very frequently, then checking for the key existence at every access will lead to a large overhead. Instead if the key is not present and hence an exception occurs, I can catch the exception. (when I know that this will happen rarely). This will reduce accesses to the HashMap by half.
This might not be a good programming practice, but it will help me reduce the number of accesses. Or am I missing something here?
[Update] I do not have null values in the HashMap.
Do you ever store a null value? If not, you can just do:
Foo value = map.get(key);
if (value != null) {
    ...
} else {
    // No such key
}
Otherwise, you could just check for existence if you get a null value returned:
Foo value = map.get(key);
if (value != null) {
    ...
} else {
    // Key might be present...
    if (map.containsKey(key)) {
        // Okay, there's a key but the value is null
    } else {
        // Definitely no such key
    }
}
You won't gain anything by checking that the key exists. This is the code of HashMap:
@Override
public boolean containsKey(Object key) {
    Entry<K, V> m = getEntry(key);
    return m != null;
}

@Override
public V get(Object key) {
    Entry<K, V> m = getEntry(key);
    if (m != null) {
        return m.value;
    }
    return null;
}
Just check if the return value for get() is different from null.
A better way is to use the containsKey method of HashMap. Tomorrow somebody may add a null value to the map, so you should differentiate between a key being absent and a key being present with a null value.
Do you mean that you've got code like
if(map.containsKey(key)) doSomethingWith(map.get(key))
all over the place? Then you should simply check whether map.get(key) returned null and that's it.
By the way, HashMap doesn't throw exceptions for missing keys, it returns null instead. The only case where containsKey is needed is when you're storing null values, to distinguish between a null value and a missing value, but this is usually considered bad practice.
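A short sketch of both points: get() does not throw for a missing key, and containsKey() is only needed to tell a stored null apart from an absent key. The classify helper is a made-up name for the example.

```java
import java.util.HashMap;
import java.util.Map;

class MissingKeyDemo {
    // Distinguishes the three possible situations for a key.
    static String classify(Map<String, String> map, String key) {
        String value = map.get(key); // returns null for a missing key, no exception
        if (value != null) {
            return "present";
        }
        return map.containsKey(key) ? "present with null value" : "absent";
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("a", "1");
        map.put("b", null);
        for (String key : new String[] {"a", "b", "c"}) {
            System.out.println(key + ": " + classify(map, key));
        }
    }
}
```

If the map never holds null values, the second lookup is never needed and a single get() is enough.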
Just use containsKey() for clarity. It's fast and keeps the code clean and readable. The whole point of HashMaps is that the key lookup is fast, just make sure the hashCode() and equals() are properly implemented.
if(map.get(key) != null || (map.get(key) == null && map.containsKey(key)))
You can also use the computeIfAbsent() method in the HashMap class.
In the following example, map stores a list of transactions (integers) that are applied to the key (the name of the bank account). To add 2 transactions of 100 and 200 to checking_account you can write:
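HashMap<String, ArrayList<Integer>> map = new HashMap<>();
// add() returns a boolean, so the calls cannot be chained
ArrayList<Integer> transactions = map.computeIfAbsent("checking_account", key -> new ArrayList<>());
transactions.add(100);
transactions.add(200);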
This way you don't have to check to see if the key checking_account exists or not.
If it does not exist, one will be created and returned by the lambda expression.
If it exists, then the value for the key will be returned by computeIfAbsent().
Really elegant! 👍
I usually use the idiom
Object value = map.get(key);
if (value == null) {
    value = createValue(key);
    map.put(key, value);
}
This means you only hit the map twice if the key is missing
If the key class is your own, make sure its hashCode() and equals() methods are implemented.
Access to a HashMap should basically be O(1), but with a badly implemented hashCode() it becomes O(n), because values whose keys have the same hash are stored in a linked list.
Jon Skeet's answer addresses both scenarios (maps with and without null values) efficiently.
I would like to add something about the number of entries and the efficiency concern.
I have a HashMap with say a 1.000 entries and I am looking at improving
the efficiency. If the HashMap is being accessed very frequently, then
checking for the key existence at every access will lead to a large
overhead.
A map with 1,000 entries is not a huge map, and neither is one with 5,000 or 10,000 entries.
Maps are designed for fast retrieval at such sizes.
This assumes that the hashCode() of the map keys provides a good distribution.
If you can use an Integer as the key type, do it.
Its hashCode() method is very efficient, and distinct int values never share a hash code:
public final class Integer extends Number implements Comparable<Integer> {
    ...
    @Override
    public int hashCode() {
        return Integer.hashCode(value);
    }

    public static int hashCode(int value) {
        return value;
    }
    ...
}
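You can check this behaviour directly. The snippet below is only a tiny sketch with a made-up class name.

```java
class IntegerHashDemo {
    // An Integer's hash code is simply its int value.
    static boolean hashEqualsValue(int v) {
        return Integer.valueOf(v).hashCode() == v;
    }

    public static void main(String[] args) {
        for (int v : new int[] {0, 42, -7, 1000}) {
            System.out.println(v + " -> " + Integer.valueOf(v).hashCode());
        }
    }
}
```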
If you have to use another built-in type for the key, such as String, which is often used in maps, you may get some collisions. But with one thousand to a few thousand objects in the map you should see very few, as String.hashCode() provides a good distribution.
If you use a custom type, override hashCode() and equals() correctly and make sure that hashCode() provides a fair distribution.
Item 9 of Effective Java covers this.
Here's a post that details the way.
Since Java 8, you can simply use getOrDefault (note that var requires Java 10 or later):
var item = mapObject.getOrDefault(key, null);
if (item != null) {
    // key is present with a non-null value
}
