JVM optimisation of hashCode() on List


Imagine a simple case:
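import java.util.ArrayList;
import java.util.List;
class B {
    public final String text;
    public B(String text) {
        this.text = text;
    }
}
class A {
    private List<B> bs = new ArrayList<B>();
    // Linear scan: returns the first B whose text matches, or null if none does.
    public B getB(String text) {
        for (B b : bs) {
            if (b.text.equals(text)) {
                return b;
            }
        }
        return null;
    }
    // [getter/setter]
}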
Imagine that for each instance of A, the List<B> is large and we need to call getB(String) often. However assume that it is also possible for the list to change (add/remove element, or even being reassigned).
At this stage, the average complexity of getB(String) is O(n). In order to improve that, I was wondering if we could use some clever caching.
Imagine we cache the List<B> in a Map<String, B> where the key is B.text. That would improve the performance but it won't work if the list is changed (new element or deleted element) or reassigned (A.bs points to a new reference).
To go around that I thought that, along with the Map<String, B>, we could store a hash of the list bs. When we call getB(String) method, we compute the hash of the list bs. If the hash hasn't changed, we fetch the result from the map, if it has we reload the map.
The problem is that computing the hash of a java.util.List goes through all the elements of the list and computes their hashes, which is at least O(n).
Question
What I'd like to know is whether the JVM will be faster at computing the hash for the List than going through my loop in the getB(String) method. Maybe that depends on the implementation of hashCode() for B. If so, what kind of things could work? In a nutshell, I'd like to know whether this is stupid or could bring some performance improvement.

You seem to believe, without explaining why, that it is essential to keep the list structure as well. The only plausible reason for this is that you need the order of the collection to stay consistent. If you switch to a "plain" map, the order of the values is no longer guaranteed; for example, it is not kept in the order in which you added the items to the map.
If you need both to keep the order (list behaviour) and access individual items using a key, you can use a LinkedHashMap, which essentially joins the behaviour of a LinkedList and a HashMap. Even if LinkedHashMap.values() returns a collection and not a list, the list behaviour is guaranteed within the collection.
Another issue with your question is that you cannot use the list's hash code to safely detect changes. If the hash code has changed, you can indeed be sure that the list has changed as well. If two hash codes are identical, however, you still cannot be sure that the lists are actually identical. For example, with String's hashCode(), the hash codes for "1a" and "2B" are identical.
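To make the collision risk concrete, here is a small sketch (the class and method names are made up for illustration). Two different single-element lists end up with the same List.hashCode(), so a cache that only compares hash codes would wrongly treat them as the same list.

```java
import java.util.Arrays;
import java.util.List;

class HashCollisionDemo {
    // Two distinct lists whose only elements collide under String.hashCode().
    static List<String> first() { return Arrays.asList("1a"); }
    static List<String> second() { return Arrays.asList("2B"); }

    public static void main(String[] args) {
        List<String> a = first();
        List<String> b = second();
        System.out.println(a.hashCode() == b.hashCode()); // prints "true"
        System.out.println(a.equals(b));                  // prints "false"
    }
}
```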

If so what kind of things could work?
Simply put: don't let anything else mutate your list without you knowing about it. I suspect you currently have something like:
public List<B> getAllBs() {
    return bs;
}
... and a similar setter. If you stop doing that, and instead just have appropriate mutation methods, then you can make sure that your code is the only code to mutate the list... which means you can either remember that your map is "dirty" or just mutate the map at the same time that you mutate the list.
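A minimal sketch of that idea, assuming every B in the list has a distinct text (otherwise the map would need to hold lists of Bs). The names IndexedA, addB, removeB and byText are hypothetical, not from the question; B is nested here only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IndexedA {
    static final class B {
        final String text;
        B(String text) { this.text = text; }
    }

    private final List<B> bs = new ArrayList<>();
    private final Map<String, B> byText = new HashMap<>();

    // All mutation goes through these methods, so the map never goes stale.
    public void addB(B b) {
        bs.add(b);
        byText.put(b.text, b);
    }

    public boolean removeB(B b) {
        boolean removed = bs.remove(b);
        if (removed) {
            byText.remove(b.text);
        }
        return removed;
    }

    // Expected O(1) lookup; returns null if absent, like the original loop.
    public B getB(String text) {
        return byText.get(text);
    }

    // Callers only get a read-only view, so they cannot bypass addB/removeB.
    public List<B> getBs() {
        return Collections.unmodifiableList(bs);
    }

    public static void main(String[] args) {
        IndexedA a = new IndexedA();
        a.addB(new B("x"));
        System.out.println(a.getB("x").text); // prints "x"
    }
}
```

With this shape there is no need to hash the list at all: the map is updated in the same place the list is.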

You could implement your own class IndexedBArrayList which extends ArrayList<B>.
Then you add this functionality to it:
A private HashMap<String, B> index
All mutator methods of ArrayList are overridden to keep this index hash map updated in addition to calling the corresponding super-method.
A new public B getByString(String) method which uses the hash map

From your description it does not seem that you need a List<B>.
Replace the List with a HashMap. If you need to search for Bs the best data structure is the hashmap and not the list.

Related

ArrayMap put method pushes elements in strange order

I am using ArrayMap for the first time in my project and I thought it works just like an array. I expected that when I use the .put method it inserts the element at the next index.
But in my case this is not true - after I added all elements one by one the first element I added ended up at index 4 which is kind of strange.
Here are the first three steps which I add elements:
1 - Salads:
2 - Soups:
3 - Appetizers:
So somehow on second step "Soup" element was inserted in index 0 instead of 1 as I was expecting, but strangely on third step "Appetizers" was inserted as expected after "Soup".
This is the code I am using to push key and value pair:
ArrayMap<String, DMType> addElement(String typeKey, DMType type) {
    ArrayMap<String, DMType> types = new ArrayMap<>();
    types.put(typeKey, type);
    return types;
}
Am I missing something about the behavior of ArrayMap?

Yeah, the name is misleading, but unlike arrays, ArrayMap does not guarantee order.
ArrayMap is a generic key->value mapping data structure that is
designed to be more memory efficient than a traditional HashMap.
ArrayMap is actually a Map:
public class ArrayMap extends SimpleArrayMap implements Map
If you want the Map functionality with order guaranteed, use LinkedHashMap instead.
LinkedHashMap defines the iteration ordering, which is normally the
order in which keys were inserted into the map (insertion-order).
documentation

I thought it works just like an array
No, it works like a map, because it is a map. It is similar to a HashMap, but more memory efficient for smaller data sets.
Its order shouldn't and doesn't matter. Under the hood, it is implemented using
an array, which has an order since arrays do. This inherently gives the ArrayMap an order, but that is not part of its API anyway. Just like which memory slot your Java objects are in, you shouldn't care about the order here either.

It doesn't work as an array: the name says Map, and the documentation clearly states that it behaves as a generic key->value mapping that is more memory-efficient than the traditional HashMap implementation.
Actually I don't see why you care about the order compared to the insertion one. Data is private inside the class and you have no way to obtain the element by the index, so you are basically wondering about a private implementation which is irrelevant for its usage.
If you really want to understand how it stores its data you should take a look at the source code.

ArrayMap does NOT work like an Array, instead, it works like a HashMap with performance optimizations.
The internal sequence of the key-value pair is not guaranteed as it is NOT part of the contract.
In your case, what you really want to use is probably an ArrayList<Element>, where the Element class is defined like this:
public class Element{
private final String typeKey;
private final DMType type;
public Element(String typeKey, DMType type){
this.typeKey = typeKey;
this.type = type;
}
}
If you don't want a new Class just to store the result, and you want to keep the sequence, you can use a LinkedHashMap<String, DMType>. As the document specifies:
Class LinkedHashMap
Hash table and linked list implementation of the Map interface, with predictable iteration order. This implementation differs from HashMap in that it maintains a doubly-linked list running through all of its entries. This linked list defines the iteration ordering, which is normally the order in which keys were inserted into the map (insertion-order). Note that insertion order is not affected if a key is re-inserted into the map. (A key k is reinserted into a map m if m.put(k, v) is invoked when m.containsKey(k) would return true immediately prior to the invocation.)
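As a quick illustration of the quoted behaviour, here is a sketch that uses plain String values as a stand-in for DMType (which is not shown in the question). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MenuOrderDemo {
    // Returns the keys in iteration order after inserting them one by one.
    static List<String> insertionOrder() {
        Map<String, String> types = new LinkedHashMap<>();
        types.put("Salads", "type-1");
        types.put("Soups", "type-2");
        types.put("Appetizers", "type-3");
        types.put("Salads", "type-1b"); // re-inserting a key keeps its original position
        return new ArrayList<>(types.keySet());
    }

    public static void main(String[] args) {
        System.out.println(insertionOrder()); // prints "[Salads, Soups, Appetizers]"
    }
}
```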

Can I change the inner structure of objects in a HashTable while iterating over it?

Like the title says: can I change the inner structure of objects in a Hashtable while iterating over its keys? I know I can't change the Map itself, or at least that it is risky to do so, but despite Google searches I haven't found any clear or simple answer as to whether or not it is OK to change the attributes of the objects themselves in the hash map. My gut feeling says no, since this would probably change the hash, but it would be good to know for certain. I am also interested in replacing the value for the keys while iterating over them. Is this possible?
Apologies if this has been answered a lot of times before.
To be short, will these two methods work as expected?
import java.awt.image.BufferedImage;
import java.util.Hashtable;
import javax.swing.JSlider;
public class Manager {
    private Hashtable<MyClassA, BufferedImage> ht1;
    private Hashtable<MyClassB, JSlider> ht2;
    private void method1() {
        for (MyClassB mcb : ht2.keySet()) {
            mcb.calculateStuff(ht2.get(mcb).getValue());
            // calculateStuff() doesn't change anything, but if it takes long, the JSliders might be
            // changed by the user or a timer, resulting in a new hashCode(), and potentially problems.
        }
    }
    private void method2() {
        for (MyClassA mca : ht1.keySet()) {
            mca.changeInnerStructureOfA(); // Changes the fields of the object mca.
            ht1.put(mca, mca.calculateNewImage()); // Replaces the value stored for the same key.
        }
    }
}

It is not allowed to mutate keys of a hash-based container in any situation, not only while iterating over the container. The reason for this is that any mutation that changes the value of hash function leaves your container in an invalid state, when the hashed key is sitting in the hash bucket that does not correspond to the key's hash value.
This is the reason behind a strong recommendation of using only immutable classes as keys in hash-based containers.
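A small sketch of what goes wrong, using a made-up Key class whose hashCode() depends on a mutable field. After the mutation the entry is still in the map, but it cannot be found under either the old or the new key state.

```java
import java.util.HashMap;
import java.util.Map;

class MutableKeyDemo {
    // Hypothetical key type whose hash code depends on a mutable field.
    static final class Key {
        int id;
        Key(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }
        @Override public int hashCode() { return Integer.hashCode(id); }
    }

    // Returns true if the entry can still be found after its key was mutated.
    static boolean foundAfterMutation() {
        Map<Key, String> map = new HashMap<>();
        Key key = new Key(1);
        map.put(key, "value");
        key.id = 2; // the entry stays in the bucket chosen for hash code 1
        // Looking up id 2 searches the wrong bucket; looking up id 1 finds the
        // right bucket, but equals() fails because the stored key now has id 2.
        return map.containsKey(key) || map.containsKey(new Key(1));
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // prints "false"
    }
}
```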
I am also interested in replacing the value for the keys while iterating over them. Is this possible?
No, this is not possible. In order to replace a key in a container with another key you need to remove the item first, and then re-insert it back with the new key. This, however, would trigger concurrent modification exception.
If you need to replace a significant number of keys, the best approach would be to make a new hash container and populate it with key-value pairs as you iterate the original container.
If you need to replace only a small number of keys, make a list of objects describing the change (old key, new key, value), populate the list as you iterate the original container, and then walk the list of changes to make the alterations to the original container.
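The second approach could look roughly like this. It is only a sketch with String keys and a made-up rule (upper-casing the key). If two old keys map to the same new key, the later one overwrites the earlier one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class RekeyDemo {
    // Replaces each key by its upper-case form without touching the map while iterating it.
    static Map<String, Integer> upperCaseKeys(Map<String, Integer> map) {
        // Pass 1: only record which keys need to change.
        List<String> oldKeys = new ArrayList<>();
        for (String key : map.keySet()) {
            if (!key.equals(key.toUpperCase(Locale.ROOT))) {
                oldKeys.add(key);
            }
        }
        // Pass 2: iteration is over, so remove/put cannot cause a ConcurrentModificationException.
        for (String oldKey : oldKeys) {
            Integer value = map.remove(oldKey);
            map.put(oldKey.toUpperCase(Locale.ROOT), value);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("B", 2);
        System.out.println(upperCaseKeys(map).keySet());
    }
}
```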

Is it possible/required to speed up HashMap operations on same entry?

Suppose I wish to check HashMap entry and then replace it:
if( check( hashMap.get(key) ) ) {
hashMap.put(key, newValue);
}
This will cause the search procedure inside HashMap to run twice: once for get and once for put. This looks inefficient. Is it possible to modify the value of an already found entry of the Map?
UPDATE
I know I can make a wrapper and I know there are problems with mutating an entry. But the question is WHY? Maybe HashMap remembers the last search to speed up a repeated one? Why are there no methods for such an operation?

EDIT: I've just discovered that you can modify the entry, via Map.Entry.setValue (and the HashMap implementation is mutable). It's a pain to get the entry for a particular key though, and I can't remember ever seeing anyone do this. You can get a set of the entries, but you can't get the entry for a single key, as far as I can tell.
There's one evil way of doing it - declare your own subclass of HashMap within the java.util package, and create a public method which just delegates to the package-private existing method:
package java.util;
// Please don't actually do this...
public class BadMap<K, V> extends HashMap<K, V> {
public Map.Entry<K, V> getEntryPublic(K key) {
return getEntry(key);
}
}
That's pretty nasty though.
You wouldn't normally modify the entry - but of course you can change data within the value, if that's a mutable type.
I very much doubt that this is actually a performance bottleneck though, unless you're doing this a heck of a lot. You should profile your application to prove to yourself that this is a real problem before you start trying to fine-tune something which is probably not an issue.
If it does turn out to be an issue, you could change (say) a Map<Integer, String> into a Map<Integer, AtomicReference<String>> and use the AtomicReference<T> as a simple mutable wrapper type.
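A rough sketch of that wrapper idea, where a non-empty check stands in for the question's check(...) and the class and method names are invented. The map is searched once, and the replacement happens inside the wrapper.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

class SingleLookupDemo {
    // Replaces the value for key if the current value is non-empty; returns whether it did.
    static boolean replaceIfNonEmpty(Map<Integer, AtomicReference<String>> map,
                                     int key, String newValue) {
        AtomicReference<String> ref = map.get(key); // the only map lookup
        if (ref == null) {
            return false; // no entry for this key
        }
        String current = ref.get();
        if (current != null && !current.isEmpty()) { // stand-in for check(...)
            ref.set(newValue); // updates the wrapper in place, no put() needed
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<Integer, AtomicReference<String>> map = new HashMap<>();
        map.put(1, new AtomicReference<>("old"));
        replaceIfNonEmpty(map, 1, "new");
        System.out.println(map.get(1).get()); // prints "new"
    }
}
```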

Too much information for a comment on your question. Check the documentation for HashMap.
This implementation provides constant-time performance for the basic
operations (get and put), assuming the hash function disperses the
elements properly among the buckets. Iteration over collection views
requires time proportional to the "capacity" of the HashMap instance
(the number of buckets) plus its size (the number of key-value
mappings). Thus, it's very important not to set the initial capacity
too high (or the load factor too low) if iteration performance is
important.
Constant time means that a single get or put takes the same amount of time regardless of how many entries the map holds [O(1)]. If you perform these operations in a loop, the total time grows linearly with the number of iterations [O(n)].

You can change the entry if it is mutable. One example of where you might do this is
private final Map<String, List<String>> map = new LinkedHashMap<>();
public void put(String key, String value) {
List<String> list = map.get(key);
if (list == null)
map.put(key, list = new ArrayList<>());
list.add(value);
}
This allows you to update a value, but you can't find and replace a value in one operation.

Take a look at trove ( http://trove4j.sourceforge.net/ ), their maps do have several methods that might be what you want:
adjustOrPut
putIfAbsent
I don't know how this is implemented internally, but I would guess that since trove is made to be highly performant, there will be only one lookup.

Java Collection For Ensuring Uniqueness While Providing References

I have a structure like this:
public class Foo
{
public int A ;
public int B ;
public int C ;
}
I need to add these to a collection one-by-one in such a way that I end up with no more than one copy where A, B, and C are all equal. I also need references to the objects for another class, like this:
public class Bar
{
public Foo A ;
public Foo B ;
public Foo C ;
}
I tried using a TreeSet<Foo>, which worked to ensure uniqueness, but I cannot get a reference back out of a TreeSet (only a boolean of whether or not it is/was in the set), so I can't pass that reference on to Bar. I tried using a TreeMap<Foo, Integer> along with an ArrayList<Foo>, and that works to ensure uniqueness and to allow me to get references to the objects, but it wastes a massive amount of time and memory to maintain the ArrayList and the Integers.
I need a way to say "If this Foo is not yet in the collection, add it; Otherwise, give me the Foo already in the collection instead of the one I created to check for its presence in the collection.".
(It just occurred to me that I could do something like TreeMap<Foo, Foo>, and that would do what I want, but it still seems like a waste, even though it's nowhere near as much of one, so I'll continue with this question in hope of enlightenment.)
(And yes, I did implement Comparable to do the uniqueness-check in the trees; That part works already.)

I would use e.g. a TreeMap<Foo, Foo> object. When you put a new Foo in the map, specify it as both the key and the value. This lets you use get to return the Foo already in the collection. Note that you have to handle the case of a Foo already being in the map yourself.
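Something along these lines, as a sketch. The nested Foo is a stand-in for the question's class with an assumed compareTo over the three ints, and FooPool and intern are hypothetical names.

```java
import java.util.TreeMap;

class FooPool {
    // Stand-in for the question's Foo, ordered by a, then b, then c.
    static final class Foo implements Comparable<Foo> {
        final int a, b, c;
        Foo(int a, int b, int c) { this.a = a; this.b = b; this.c = c; }
        @Override public int compareTo(Foo o) {
            int r = Integer.compare(a, o.a);
            if (r == 0) r = Integer.compare(b, o.b);
            if (r == 0) r = Integer.compare(c, o.c);
            return r;
        }
    }

    private final TreeMap<Foo, Foo> pool = new TreeMap<>();

    // Returns the instance already stored for an equal Foo,
    // or stores the candidate and returns it.
    Foo intern(Foo candidate) {
        Foo existing = pool.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    public static void main(String[] args) {
        FooPool pool = new FooPool();
        Foo first = pool.intern(new Foo(1, 2, 3));
        Foo second = pool.intern(new Foo(1, 2, 3));
        System.out.println(first == second); // prints "true"
    }
}
```

The value slot only costs one extra reference per entry, since key and value point to the same object.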

A solution in "Sorted collection in Java" by Neil Coffey gave me what I need: using an ArrayList<Foo> and always calling Collections.binarySearch to get either the index of the element already in the list, or the point at which the element should be inserted into the list.
This maintains a constantly-sorted list at O(log n) time like a tree, but allows retrieval of existing instances at the same time. Unfortunately, it has O(n) insertion time, but that isn't the end of the world in this case, though it's still suboptimal.
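For reference, a minimal sketch of that binarySearch approach. The helper name intern is made up, and the list is assumed to be kept sorted by only ever inserting through this method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedListPool {
    // Returns the element already in the sorted list that compares equal to
    // candidate, or inserts candidate at its sorted position and returns it.
    static <T extends Comparable<? super T>> T intern(List<T> sorted, T candidate) {
        int index = Collections.binarySearch(sorted, candidate);
        if (index >= 0) {
            return sorted.get(index); // found: hand back the stored instance
        }
        // Not found: binarySearch returned -(insertion point) - 1.
        sorted.add(-index - 1, candidate);
        return candidate;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        intern(list, "b");
        intern(list, "a");
        intern(list, "b");
        System.out.println(list); // prints "[a, b]"
    }
}
```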

To ensure uniqueness in a Set, you need to override equals() and hashCode() so that two instances of Foo with the same A, B, C are equal according to equals().
Ideally, anything you put in a Set should be immutable (i.e. your three ints should be final). From the documentation:
Great care must be exercised if mutable objects are used as set
elements. The behavior of a set is not specified if the value of an
object is changed in a manner that affects equals comparisons while
the object is an element in the set.
Unfortunately, Set doesn't provide any method that allows you to get the actual instance - you would need a Map or another collection as you have already tried.
Update another approach would be to create your own modified version of TreeSet based on the JDK source code to add a method to obtain the instance you need (extending the standard TreeSet won't do what you need because the relevant fields are private, unless you use reflection to access them).

Apparently a TreeSet is based on a TreeMap, which makes this approach redundant, but I thought I'd comment on it anyway for completeness.
If a copy of a Foo object exists in the TreeSet (e.g. as reported by contains) then you can retrieve the copy using the tailSet and first methods.
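A sketch of that retrieval, with a hypothetical helper named stored. It relies on tailSet(probe) being inclusive, so when the set contains an equal element, that element is the first one in the tail view.

```java
import java.util.TreeSet;

class TreeSetLookupDemo {
    // Returns the stored element that compares equal to probe, or null if there is none.
    static <T extends Comparable<T>> T stored(TreeSet<T> set, T probe) {
        if (!set.contains(probe)) {
            return null;
        }
        return set.tailSet(probe).first(); // inclusive view, so the match comes first
    }

    public static void main(String[] args) {
        TreeSet<String> set = new TreeSet<>();
        String original = new String("abc");
        set.add(original);
        // A distinct but equal String is used as the probe.
        System.out.println(stored(set, new String("abc")) == original); // prints "true"
    }
}
```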

Duplicate values in the Set collection?

Is it possible to allow duplicate values in the Set collection?
Is there any way to make the elements unique and have some copies of them?
Is there any functions for Set collection for having duplicate values in it?

Ever considered using a java.util.List instead?
Otherwise I would recommend a Multiset from Google Guava (the successor to Google Collections, which this answer originally recommended -ed.).

The very definition of a Set disallows duplicates. I think perhaps you want to use another data structure, like a List, which will allow dups.
Is there any way to make the elements unique and have some copies of them?
If for some reason you really do need to store duplicates in a set, you'll either need to wrap them in some kind of holder object, or else override equals() and hashCode() of your model objects so that they do not evaluate as equivalent (and even that will fail if you are trying to store references to the same physical object multiple times).
I think you need to re-evaluate what you are trying to accomplish here, or at least explain it more clearly to us.

From the javadocs:
"sets contain no pair of elements e1
and e2 such that e1.equals(e2), and at
most one null element"
So if your objects were to override .equals() so that it returns false for whatever objects you intend to store, then you could store them separately in a Set (you should also override hashCode()).
However, the very definition of a Set in Java is,
"A collection that contains no
duplicate elements. "
So you're really better off using a List or something else here. Perhaps a Map, if you'd like to store duplicate values based on different keys.
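A tiny sketch of the difference, adding the same three values to a HashSet and an ArrayList (the class and method names are just for illustration):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SetVsListDemo {
    // Adds "a", "b", "a" to both collections and returns {set size, list size}.
    static int[] sizes() {
        Set<String> set = new HashSet<>();
        List<String> list = new ArrayList<>();
        for (String s : new String[] {"a", "b", "a"}) {
            set.add(s);  // the second "a" is rejected; add() returns false
            list.add(s); // the list keeps every element
        }
        return new int[] {set.size(), list.size()};
    }

    public static void main(String[] args) {
        int[] sizes = sizes();
        System.out.println(sizes[0] + " " + sizes[1]); // prints "2 3"
    }
}
```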

Sun's view on "bags" (AKA multisets):
We are extremely sympathetic to the desire for type-safe collections. Rather than adding a "band-aid" to the framework that enforces type-safety in an ad hoc fashion, the framework has been designed to mesh with all of the parameterized-types proposals currently being discussed. In the event that parameterized types are added to the language, the entire collections framework will support compile-time type-safe usage, with no need for explicit casts. Unfortunately, this won't happen in the 1.2 release. In the meantime, people who desire runtime type safety can implement their own gating functions in "wrapper" collections surrounding JDK collections.
(source; note it is old and possibly obsolete -ed.)
Apart from Google's collections API, you can use Apache Commons Collections.
Apache Commons Collections:
http://commons.apache.org/collections/
Javadoc for Bag

I don't believe that you can have duplicate values within a set. A set is defined as a collection of unique values. You may be better off using an ArrayList.

These sound like interview questions, so I'll answer them like interview questions...
Is it possible to allow duplicate values in the Set collection?
Yes, but it requires that the person implementing the Set violate the design contract upon which Set is built. Basically, I could write a class that implements Set and doesn't enforce Set's promises.
In addition, other violations are possible. I could use a Set implementation that relies upon Java's hashCode() contract. Then, if I provided an object that violates that contract, I might be able to place two objects into the set which are equal but yield different hash codes (because they might never be checked for equality against each other, being in different hash bucket chains).
Is there any way to make the elements unique and have some copies of them?
It basically depends on how you define uniqueness. If an object's uniqueness is determined by its value, then one can have multiple copies of the same unique object; however, if the object's uniqueness is determined by its instance, then by definition it would not be possible to have multiple copies of the same object. You could however have multiple references to them.
Is there any functions for Set collection for having duplicate values in it?
The Set interface doesn't have any functions for detecting or reporting duplicates. However, it extends the Collection interface, which List extends as well, so it is possible to pass duplicates into a Set; a properly implemented Set will simply ignore the duplicates and present one copy of every element determined to be unique.

I don't think so. The only way would be to use a List. You can also play tricks with equals(), hashCode() or compareTo(), but it is going to be awkward.

No chance: you cannot have duplicate values in a Set.
If you want duplicates, you can try an ArrayList.

As mentioned choose the right collection for the task and likely a List will be what you need. Messing with the equals(), hashcode() or compareTo() to break identity is generally a bad idea simply to wedge an instance into the wrong collection to start with. Worse yet it may break code in other areas of the application that depend on these methods producing valid comparison results and be very difficult to debug or track down such errors.

I was also asked this question in an interview. I think the answer is that, of course, a Set will not allow duplicate elements, and an ArrayList or another collection should be used instead. However, overriding equals() for the type of the object being stored in the set lets you manipulate the comparison logic, and hence you may be able to store duplicate elements in the Set. It's more of a hack that allows non-unique elements in the Set, and of course it is not recommended in production-level code.

You can do so by overriding hashCode() as given below:
import java.util.HashSet;
import java.util.Set;
public class Test {
    static int a = 0;
    // Deliberately broken: returns a different hash code on every call.
    @Override
    public int hashCode() {
        a++;
        return a;
    }
    public static void main(String[] args) {
        Set<Test> s = new HashSet<Test>();
        Test t1 = new Test();
        Test t2 = t1; // same object, second reference
        s.add(t1);
        s.add(t2);
        System.out.println(s);
        System.out.println("--Done--");
    }
}

Well, In this case we are trying to break the purpose of specific collection. If we want to allow duplicate records simply use list or multimap.

A Set stores unique values; if you want to store duplicate values, use a List. If you still want duplicate values in a Set, create a Set of ArrayLists, so that the duplicate elements live inside the list:
Set<ArrayList<String>> s = new HashSet<ArrayList<String>>();
ArrayList<String> arr = new ArrayList<String>();
arr.add("First");
arr.add("Second");
arr.add("Third");
arr.add("Fourth");
arr.add("First");
s.add(arr);

You can use a TreeMap instead:
The key is the element you wish to store
and the value is the frequency of that element.
Insertion and removal require custom handling.
Insertion: check whether the map already contains the element; if yes, increment its frequency, otherwise insert it with frequency 1. O(log N)
Removal: if the element's frequency is 1, remove it; otherwise decrease the frequency by 1. O(log N)
More details can be found in the Javadoc for TreeMap.
The overall time complexity stays the same as for TreeSet, O(log N), but is worse than HashSet's O(1).
firstEntry() -> returns the entry for the smallest element, time complexity O(log N)
lastEntry() -> returns the entry for the greatest element, time complexity O(log N)
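The insertion and removal rules above might be sketched like this. CountingBag and its method names are made up for the example, and it assumes elements with a natural ordering.

```java
import java.util.TreeMap;

class CountingBag<E extends Comparable<E>> {
    // element -> number of copies currently in the bag
    private final TreeMap<E, Integer> counts = new TreeMap<>();

    // Insertion: start at 1 or increment the existing frequency.
    void add(E e) {
        counts.merge(e, 1, Integer::sum);
    }

    // Removal: drop the entry when the last copy goes, otherwise decrement.
    boolean remove(E e) {
        Integer n = counts.get(e);
        if (n == null) {
            return false;
        }
        if (n == 1) {
            counts.remove(e);
        } else {
            counts.put(e, n - 1);
        }
        return true;
    }

    int count(E e) {
        return counts.getOrDefault(e, 0);
    }

    // Smallest element currently in the bag, or null if the bag is empty.
    E smallest() {
        return counts.isEmpty() ? null : counts.firstKey();
    }

    public static void main(String[] args) {
        CountingBag<String> bag = new CountingBag<>();
        bag.add("a");
        bag.add("a");
        bag.add("b");
        System.out.println(bag.count("a")); // prints "2"
    }
}
```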

import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
public class SET {
    public static void main(String[] args) {
        // AB does not override equals()/hashCode(), so all three instances are kept.
        Set<AB> set = new HashSet<AB>();
        set.add(new AB(10, "pawan@email"));
        set.add(new AB(10, "pawan@email"));
        set.add(new AB(10, "pawan@email"));
        Iterator<AB> it = set.iterator();
        while (it.hasNext()) {
            AB o = it.next();
            System.out.println(o);
        }
    }
}
class AB {
    int id;
    String email;
    public AB() {
        System.out.println("DC");
    }
    AB(int id, String email) {
        this.id = id;
        this.email = email;
    }
    @Override
    public String toString() {
        return "" + id + "\t" + email;
    }
}
