String IdentityHashMap vs HashMap performance - java

IdentityHashMap is a special Map implementation in Java that compares object references instead of calling equals(), and uses System.identityHashCode() instead of hashCode(). In addition, it uses a linear-probe hash table instead of chained Entry lists.
Map<String, String> map = new HashMap<>();
Map<String, String> iMap = new IdentityHashMap<>();
Does that mean that for String keys IdentityHashMap will usually be faster, if tuned correctly?
See this example:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class Dictionary {
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader("/usr/share/dict/words"));
        String line;
        ArrayList<String> list = new ArrayList<String>();
        while ((line = br.readLine()) != null) {
            list.add(line);
        }
        br.close();
        System.out.println("list.size() = " + list.size());

        Map<String, Integer> iMap = new IdentityHashMap<>(list.size());
        Map<String, Integer> hashMap = new HashMap<>(list.size());

        long iMapTime = 0, hashMapTime = 0;
        long time;
        for (int i = 0; i < list.size(); i++) {
            time = System.currentTimeMillis();
            iMap.put(list.get(i), i);
            time = System.currentTimeMillis() - time;
            iMapTime += time;

            time = System.currentTimeMillis();
            hashMap.put(list.get(i), i);
            time = System.currentTimeMillis() - time;
            hashMapTime += time;
        }
        System.out.println("iMapTime = " + iMapTime + " hashMapTime = " + hashMapTime);
    }
}
I tried a very basic performance check. I am reading dictionary words (~235K) and pushing them into both maps. It prints:
list.size() = 235886
iMapTime = 101 hashMapTime = 617
I think this is too good an improvement to ignore, unless I am doing something wrong here.

How does IdentityHashMap<String,?> work?
To make IdentityHashMap<String,?> work for arbitrary strings, you'll have to String.intern() both the keys you put() and potential keys you pass to get(). (Or use an equivalent mechanism.)
Note: contrary to what @m3th0dman's answer states, you don't need to intern() the values.
Either way, interning a string ultimately requires looking it up in some kind of hash table of already interned strings. So unless you had to intern your strings for some other reason anyway (and thus already paid the cost), you won't get much of an actual performance boost out of this.
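A minimal sketch (not from the original answer) of why the interning matters for lookups:
Map<String, Integer> iMap = new IdentityHashMap<>();
iMap.put("word".intern(), 1);                  // key is the pooled instance

String other = new String("word");             // equal value, different identity
System.out.println(iMap.get(other));           // null - identity lookup misses
System.out.println(iMap.get(other.intern()));  // 1    - same pooled instance as the key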
So why does the test show that you can?
Where your test is unrealistic is that you keep the exact list of keys you used with put() and you iterate across them one by one, in list order. (Note: the same could be achieved by inserting the elements into a LinkedHashMap and simply calling iterator() on its entry set.)
What's the point of IdentityHashMap then?
There are scenarios where it is guaranteed (or practically guaranteed) that object identity is the same as equals(). Imagine trying to implement your own ThreadLocal class, for example; you'd probably write something like this:
public final class ThreadLocal<T> {
    private final IdentityHashMap<Thread, T> valueMap;
    ...
    public T get() {
        return valueMap.get(Thread.currentThread());
    }
}
That works because you know threads have no notion of equality beyond identity. The same goes if your map keys are enum values, and so on.

You will see significantly faster performance from IdentityHashMap; however, that comes at a substantial cost.
You must be absolutely sure that you will never ever have objects added to the map that have the same value but different identities.
That's hard to guarantee both now and for the future, and a lot of people make mistaken assumptions.
For example:
String t1 = "test";
String t2 = "test";
t1 == t2 will return true, because both literals refer to the same interned String instance.
String t1 = "test";
String t2 = new String("test");
t1 == t2 will return false.
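A quick sketch of how that difference bites an IdentityHashMap but not a HashMap:
Map<String, Integer> hashMap = new HashMap<>();
Map<String, Integer> identityMap = new IdentityHashMap<>();

String t1 = "test";               // literal, interned
String t2 = new String("test");   // equal value, distinct instance

hashMap.put(t1, 1);
identityMap.put(t1, 1);

System.out.println(hashMap.get(t2));      // 1    - equals()-based lookup succeeds
System.out.println(identityMap.get(t2));  // null - identity-based lookup misses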
Overall, my recommendation is this: unless you critically need the performance boost, know exactly what you are doing, and heavily lock down and comment access to the class, using IdentityHashMap opens you up to a massive risk of very hard-to-track-down bugs in the future.

Technically you can do something like this to make sure you have the same instance of the string representation:
import java.util.IdentityHashMap;
import java.util.Map;

public class StringIdentityHashMap extends IdentityHashMap<String, String>
{
    @Override
    public String put(String key, String value)
    {
        return super.put(key.intern(), value.intern());
    }

    @Override
    public void putAll(Map<? extends String, ? extends String> m)
    {
        m.entrySet().forEach(entry -> put(entry.getKey().intern(), entry.getValue().intern()));
    }

    @Override
    public String get(Object key)
    {
        if (!(key instanceof String)) {
            throw new IllegalArgumentException();
        }
        return super.get(((String) key).intern());
    }

    //implement the rest of the methods in the same way
}
But this won't help you very much, since intern() itself has to check whether the given String already exists in the string pool (an equals()-style comparison), so you end up with roughly the performance of a typical HashMap.
This will, however, only help you improve memory usage, not CPU. There is no way to achieve better CPU usage and still be sure your program is correct (without possibly relying on some internal knowledge of the JVM, which might change), because a String may or may not be in the string pool, and you cannot know whether it is without (at least implicitly) calling equals().

Interestingly, IdentityHashMap can be SLOWER. I am using Class objects as keys, and seeing a ~50% performance INCREASE with HashMap over IdentityHashMap.
IdentityHashMap and HashMap are different internally, so if the equals() method of your keys is really fast, HashMap seems better.

Related

TreeSet Comparator

I have a TreeSet and a custom comparator.
I get the values from the server according to changes in the stock.
For example: if time=0 the server will send all the entries in the stock (unsorted);
if time=200 the server will send the entries added or deleted after time 200 (unsorted).
On the client side I am sorting the entries. My question is: which is more efficient?
1) fetch all entries first and then call the addAll method, or
2) add them one by one?
There can be millions of entries.
Update:
private static Map<Integer, KeywordInfo> hashMap = new HashMap<Integer, KeywordInfo>();

private static final Comparator<Integer> comparator = new Comparator<Integer>() {
    public int compare(Integer o1, Integer o2) {
        int integerCompareValue = o1.compareTo(o2);
        if (integerCompareValue == 0) return integerCompareValue;
        KeywordInfo k1 = hashMap.get(o1);
        KeywordInfo k2 = hashMap.get(o2);
        if (null == k1.getKeyword()) {
            if (null == k2.getKeyword())
                return integerCompareValue;
            else
                return -1;
        } else {
            if (null == k2.getKeyword())
                return 1;
            else {
                int compareString = AlphaNumericCmp.COMPARATOR.compare(
                        k1.getKeyword().toLowerCase(), k2.getKeyword().toLowerCase());
                //int compareString = k1.getKeyword().compareTo(k2.getKeyword());
                if (compareString == 0)
                    return integerCompareValue;
                return compareString;
            }
        }
    }
};

// the comparator must be declared before sortedSet, otherwise the field initializer
// would be an illegal forward reference
private static Set<Integer> sortedSet = new TreeSet<Integer>(comparator);
Now there is an event handler which gives me an ArrayList of updated entries; after adding them to my hashMap I am calling:
final Map<Integer, KeywordInfo> mapToReturn = new SubMap<Integer, KeywordInfo>(sortedSet, hashMap);
I think your bottleneck is probably more network-related than CPU-related. A bulk operation fetching all the new entries at once would be more network-efficient.
With regard to CPU, the time required to populate a TreeSet does not differ significantly between multiple add() calls and addAll(). The reason is that TreeSet relies on AbstractCollection's addAll() (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b27/java/util/AbstractCollection.java#AbstractCollection.addAll%28java.util.Collection%29), which in turn creates an iterator and calls add() multiple times.
So, my advice on the CPU side is: choose the way that keeps your code cleaner and more readable. That is probably achieved with addAll().
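For reference, AbstractCollection.addAll() boils down to roughly this (paraphrased and slightly simplified from the linked source):
public boolean addAll(Collection<? extends E> c) {
    boolean modified = false;
    Iterator<? extends E> it = c.iterator();   // creates an iterator...
    while (it.hasNext()) {
        if (add(it.next()))                    // ...and calls add() for every element
            modified = true;
    }
    return modified;
}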
In general there is less memory overhead when data is stored as it is being loaded. This should be time-efficient too, perhaps using small buffers; memory allocation costs time as well.
However, time both solutions in a separate prototype. You really have to test with huge numbers, since network traffic costs a lot too. That is a bit like test-driven development, and it adds to QA both quantitative statistics and confidence in the correctness of the implementation.
The actual implementation of addAll() just iterates and adds the elements one by one anyway, so adding one by one will be faster if you do it right. And I don't think this behaviour will change in the near future.
For your problem, a stateful Comparator may help.
// snippet, may not work as-is
public class NaturalComparator<T> implements Comparator<T> {
    private boolean anarchy = false;
    private final Comparator<T> parentComparator;

    NaturalComparator(Comparator<T> parent) {
        this.parentComparator = parent;
    }

    public void setAnarchy(boolean anarchy) {
        this.anarchy = anarchy;
    }

    public int compare(T a, T b) {
        if (anarchy) return 1;
        else return parentComparator.compare(a, b);
    }
}
...
NaturalComparator<Integer> naturalComparator = new NaturalComparator<Integer>(comparator);
Set<Integer> sortedSet = new TreeSet<Integer>(naturalComparator);
naturalComparator.setAnarchy(true);
sortedSet.addAll(sorted);
naturalComparator.setAnarchy(false);

Should I cache System.getProperty("line.separator")?

Consider this method:
@Override
public String toString()
{
    final StringBuilder sb = new StringBuilder();
    for (final Room room : map)
    {
        sb.append(room.toString());
        sb.append(System.getProperty("line.separator")); // THIS IS IMPORTANT
    }
    return sb.toString();
}
System.getProperty("line.separator") can be called many times.
Should I cache this value with public final static String lineSeperator = System.getProperty("line.separator")
and later use only lineSeperator?
Or System.getProperty("line.separator") is as fast as using a static field?
I see your question as presenting a false dichotomy. I would neither call getProperty every time, nor declare a static field for it. I'd simply extract it to a local variable in toString.
@Override
public String toString()
{
    final StringBuilder sb = new StringBuilder();
    final String newline = System.getProperty("line.separator");
    for (final Room room : map) sb.append(room.toString()).append(newline);
    return sb.toString();
}
BTW I have benchmarked the call. The code:
public class GetProperty
{
    static char[] ary = new char[1];

    @GenerateMicroBenchmark public void everyTime() {
        for (int i = 0; i < 100_000; i++) ary[0] = System.getProperty("line.separator").charAt(0);
    }

    @GenerateMicroBenchmark public void cache() {
        final char c = System.getProperty("line.separator").charAt(0);
        for (int i = 0; i < 100_000; i++) ary[0] = (char) (c | ary[0]);
    }
}
The results:
Benchmark                Mode  Thr  Cnt  Sec    Mean  Mean error  Units
GetProperty.cache        thrpt   1    3    5  10.318       0.223  ops/msec
GetProperty.everyTime    thrpt   1    3    5   0.055       0.000  ops/msec
The cached approach is more than two orders of magnitude faster.
Do note that the overall impact of the getProperty call, compared with all that string building, is very, very unlikely to be noticeable.
You do not need to fear that the line separator will change while your code is running, so I see no reason against caching it.
Caching a value is certainly faster than executing a call over and over, but the difference will probably be negligible.
If you have become aware of a performance problem that you know relates to this, yes.
If you haven't, then no, the lookup is unlikely to have enough overhead to matter.
This would fall under either or both of the general categories "micro-optimization" and "premature optimization." :-)
But if you're worried about efficiency, you probably have a much bigger opportunity in that your toString method is regenerating the string every time. If toString will be called a lot, rather than caching the line terminator, cache the generated string, and clear that whenever your map of rooms changes. E.g.:
@Override
public String toString()
{
    if (cachedString == null)
    {
        final StringBuilder sb = new StringBuilder();
        final String ls = System.getProperty("line.separator");
        for (final Room room : map)
        {
            sb.append(room.toString());
            sb.append(ls);
        }
        cachedString = sb.toString();
    }
    return cachedString;
}
...and when your map changes, do
cachedString = null;
That's a lot more bang for the buck (the buck being the overhead of an extra field). Granted, it's per-instance rather than per-class, so (see the earlier comment about efficiency) only do it if you have a good reason to.
Since it's so easy to do, why not? At the very least, the implementation of System.getProperty() will have to do a hash table lookup (even if cached internally) to find the property you are requesting, and then the virtual method getString() will be called on the resulting Object. None of these is very expensive, but they will need to be done multiple times. Not to mention the many temporary Strings that will be created and need GCing afterwards.
If you move this out to the top of your loop and reuse the same value, you avoid all of these problems. So why not?
If the system property is guaranteed to remain constant during the application's lifetime, it can be cached, but in general you lose the feature of a property, which is that changing it changes the behavior.
For instance, a text generator could use the property to generate text for Windows or for Linux and allow the property to be changed dynamically in the application, so why not?
In general, caching a property makes the setProperty function useless.
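A tiny sketch of the behaviour that caching gives up (the class name is made up for illustration):
public class SeparatorDemo {
    private static final String CACHED = System.getProperty("line.separator");

    public static void main(String[] args) {
        System.setProperty("line.separator", "\r\n");         // change it at runtime
        String fresh = System.getProperty("line.separator");  // sees the new value
        System.out.println(CACHED.equals(fresh));             // false on platforms where the default was "\n"
    }
}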

Hashmap vs Array performance

Is it (performance-wise) better to use arrays or HashMaps when the indexes of the array are known? Keep in mind that the 'objects array/map' in the example is just an example; in my real project it is generated by another class, so I can't use individual variables.
ArrayExample:
SomeObject[] objects = new SomeObject[2];
objects[0] = new SomeObject("Obj1");
objects[1] = new SomeObject("Obj2");

void doSomethingToObject(String Identifier){
    SomeObject object;
    if(Identifier.equals("Obj1")){
        object = objects[0];
    } else if(Identifier.equals("Obj2")){
        object = objects[1];
    }
    //do stuff
}
HashMapExample:
HashMap objects = new HashMap();
objects.put("Obj1", new SomeObject());
objects.put("Obj2", new SomeObject());

void doSomethingToObject(String Identifier){
    SomeObject object = (SomeObject) objects.get(Identifier);
    //do stuff
}
The HashMap one looks much much better but I really need performance on this so that has priority.
EDIT: Well, arrays it is then; suggestions are still welcome.
EDIT: I forgot to mention, the size of the array/HashMap is always the same (6).
EDIT: It appears that HashMaps are faster:
Array: 128ms
Hash: 103ms
When using fewer cycles the HashMap was even twice as fast.
test code:
import java.util.HashMap;
import java.util.Random;

public class Optimizationsest {
    private static Random r = new Random();
    private static HashMap<String, SomeObject> hm = new HashMap<String, SomeObject>();
    private static SomeObject[] o = new SomeObject[6];
    private static String[] Indentifiers = {"Obj1", "Obj2", "Obj3", "Obj4", "Obj5", "Obj6"};
    private static int t = 1000000;

    public static void main(String[] args){
        CreateHash();
        CreateArray();
        long loopTime = ProcessArray();
        long hashTime = ProcessHash();
        System.out.println("Array: " + loopTime + "ms");
        System.out.println("Hash: " + hashTime + "ms");
    }

    public static void CreateHash(){
        for(int i = 0; i <= 5; i++){
            hm.put("Obj" + (i + 1), new SomeObject());
        }
    }

    public static void CreateArray(){
        for(int i = 0; i <= 5; i++){
            o[i] = new SomeObject();
        }
    }

    public static long ProcessArray(){
        StopWatch sw = new StopWatch();
        sw.start();
        for(int i = 1; i <= t; i++){
            checkArray(Indentifiers[r.nextInt(6)]);
        }
        sw.stop();
        return sw.getElapsedTime();
    }

    private static void checkArray(String Identifier) {
        SomeObject object;
        if(Identifier.equals("Obj1")){
            object = o[0];
        } else if(Identifier.equals("Obj2")){
            object = o[1];
        } else if(Identifier.equals("Obj3")){
            object = o[2];
        } else if(Identifier.equals("Obj4")){
            object = o[3];
        } else if(Identifier.equals("Obj5")){
            object = o[4];
        } else if(Identifier.equals("Obj6")){
            object = o[5];
        } else {
            object = new SomeObject();
        }
        object.kill();
    }

    public static long ProcessHash(){
        StopWatch sw = new StopWatch();
        sw.start();
        for(int i = 1; i <= t; i++){
            checkHash(Indentifiers[r.nextInt(6)]);
        }
        sw.stop();
        return sw.getElapsedTime();
    }

    private static void checkHash(String Identifier) {
        SomeObject object = (SomeObject) hm.get(Identifier);
        object.kill();
    }
}
HashMap uses an array underneath so it can never be faster than using an array correctly.
Random.nextInt() is many times slower than what you are testing; even using an array lookup to pick the test identifier is going to bias your results.
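One way to reduce that bias (a sketch reusing t, r, Indentifiers, StopWatch and checkArray from the question's test code) is to precompute the identifiers before starting the stopwatch:
String[] picks = new String[t];
for (int i = 0; i < t; i++) {
    picks[i] = Indentifiers[r.nextInt(6)];   // random choice done outside the timed section
}

StopWatch sw = new StopWatch();
sw.start();
for (int i = 0; i < t; i++) {
    checkArray(picks[i]);                    // or checkHash(picks[i]) for the map run
}
sw.stop();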
The reason your array benchmark is so slow is due to the equals comparisons, not the array access itself.
HashTable is usually much slower than HashMap because it does much the same thing but is also synchronized.
A common problem with micro-benchmarks is the JIT, which is very good at removing code which doesn't do anything. If you are not careful, you will only be testing whether you have confused the JIT enough that it cannot work out that your code doesn't do anything.
This is one of the reasons you can write micro-benchmarks which outperform C++ systems: Java is a simpler language and easier to reason about, and thus it is easier to detect code which does nothing useful. This can lead to tests which show that Java does "nothing useful" much faster than C++ ;)
Arrays, when the indexes are known, are faster (HashMap uses an array of linked lists behind the scenes, which adds a bit of overhead on top of the array accesses, not to mention the hashing operations that need to be done).
And FYI, HashMap<String,SomeObject> objects = new HashMap<String,SomeObject>(); means you won't have to cast.
For the example shown, HashTable wins, I believe. The problem with the array approach is that it doesn't scale. I imagine you want to have more than two entries in the table, and the condition branch tree in doSomethingToObject will quickly get unwieldy and slow.
Logically, HashMap is definitely a fit in your case. From a performance standpoint it also wins, since in the case of arrays you will need to do a number of String comparisons (in your algorithm), while in a HashMap you just use the hash code if the load factor is not too high. Both the array and the HashMap will need to be resized if you add many elements, but in the case of the HashMap you will also need to redistribute the elements. In that use case the HashMap loses.
Arrays will usually be faster than Collections classes.
PS. You mentioned HashTable in your post. HashTable has even worse performance than HashMap. I assume your mention of HashTable was a typo:
"The HashTable one looks much much better"
The example is strange. The key problem is whether your data is dynamic. If it is, you could not write your program that way (as in the array case). In other words, comparing your array and hash implementations is not fair. The hash implementation works for dynamic data, but the array implementation does not.
If you only have static data (6 fixed objects), an array or a hash just works as a data holder. You could even define static objects (see the sketch below).
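A minimal sketch of the 'static objects' idea for the fixed Obj1..Obj6 case (assuming SomeObject has a no-argument constructor, as in the question's test code):
enum Slot {
    OBJ1, OBJ2, OBJ3, OBJ4, OBJ5, OBJ6;

    private final SomeObject object = new SomeObject();

    SomeObject get() {
        return object;
    }
}
// usage: Slot.OBJ3.get() instead of an if-chain or a map lookup on the String "Obj3"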

ThreadLocal value access across different threads

Given that a ThreadLocal variable holds different values for different threads, is it possible to access the value of one ThreadLocal variable from another thread?
I.e. in the example code below, is it possible in t1 to read the value of TLocWrapper.tlint from t2?
public class Example
{
    public static void main (String[] args)
    {
        Tex t1 = new Tex("t1"), t2 = new Tex("t2");
        new Thread(t1).start();
        try
        {
            Thread.sleep(100);
        }
        catch (InterruptedException e)
        {}
        new Thread(t2).start();
        try
        {
            Thread.sleep(1000);
        }
        catch (InterruptedException e)
        {}
        t1.kill = true;
        t2.kill = true;
    }

    private static class Tex implements Runnable
    {
        final String name;

        Tex (String name)
        {
            this.name = name;
        }

        public boolean kill = false;

        public void run ()
        {
            TLocWrapper.get().tlint.set(System.currentTimeMillis());
            while (!kill)
            {
                // read value of tlint from TLocWrapper
                System.out.println(name + ": " + TLocWrapper.get().tlint.get());
            }
        }
    }
}

class TLocWrapper
{
    public ThreadLocal<Long> tlint = new ThreadLocal<Long>();

    static final TLocWrapper self = new TLocWrapper();

    static TLocWrapper get ()
    {
        return self;
    }

    private TLocWrapper () {}
}
As Peter says, this isn't possible. If you want this sort of functionality, then conceptually what you really want is just a standard Map<Thread, Long> - where most operations will be done with a key of Thread.currentThread(), but you can pass in other threads if you wish.
However, this likely isn't a great idea. For one, holding a reference to moribund threads is going to mess up GC, so you'd have to go through the extra hoop of making the key type WeakReference<Thread> instead. And I'm not convinced that a Thread is a great Map key anyway.
So once you go beyond the convenience of the baked-in ThreadLocal, perhaps it's worth questioning whether using a Thread object as the key is the best option. It might be better to give each thread a unique ID (a String or an int, if they don't already have natural keys that make more sense) and simply use that to key the map. I realise your example is contrived, but you could do the same thing with a Map<String, Long>, using keys of "t1" and "t2".
It would also arguably be clearer since a Map represents how you're actually using the data structure; ThreadLocals are more like scalar variables with a bit of access-control magic than a collection, so even if it were possible to use them as you want it would likely be more confusing for other people looking at your code.
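A minimal sketch of that Map<String, Long> alternative (the class and method names are made up for illustration; "t1" is the thread name from the question):
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SharedPerThreadValues
{
    private static final Map<String, Long> VALUES = new ConcurrentHashMap<>();

    static void record(String threadId)
    {
        VALUES.put(threadId, System.currentTimeMillis());
    }

    static Long read(String threadId)
    {
        return VALUES.get(threadId); // readable from any thread, unlike a ThreadLocal
    }

    public static void main(String[] args) throws InterruptedException
    {
        Thread t1 = new Thread(() -> record("t1"));
        t1.start();
        t1.join();
        System.out.println("t1's value, read from main: " + read("t1"));
    }
}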
Based on the answer by Andrzej Doyle, here is a full working solution:
ThreadLocal<String> threadLocal = new ThreadLocal<String>();
threadLocal.set("Test"); // do this in otherThread
Thread otherThread = Thread.currentThread(); // get a reference to the otherThread somehow (this is just for demo)
Field field = Thread.class.getDeclaredField("threadLocals");
field.setAccessible(true);
Object map = field.get(otherThread);
Method method = Class.forName("java.lang.ThreadLocal$ThreadLocalMap").getDeclaredMethod("getEntry", ThreadLocal.class);
method.setAccessible(true);
WeakReference entry = (WeakReference) method.invoke(map, threadLocal);
Field valueField = Class.forName("java.lang.ThreadLocal$ThreadLocalMap$Entry").getDeclaredField("value");
valueField.setAccessible(true);
Object value = valueField.get(entry);
System.out.println("value: " + value); // prints: "value: Test"
All the previous comments still apply of course - it's not safe!
But for debugging purposes it might be just what you need - I use it that way.
I wanted to see what was in ThreadLocal storage, so I extended the above example to show me. Also handy for debugging.
Field field = Thread.class.getDeclaredField("threadLocals");
field.setAccessible(true);
Object map = field.get(Thread.currentThread());
Field table = Class.forName("java.lang.ThreadLocal$ThreadLocalMap").getDeclaredField("table");
table.setAccessible(true);
Object tbl = table.get(map);
int length = Array.getLength(tbl);
for (int i = 0; i < length; i++) {
    Object entry = Array.get(tbl, i);
    Object value = null;
    String valueClass = null;
    if (entry != null) {
        Field valueField = Class.forName("java.lang.ThreadLocal$ThreadLocalMap$Entry").getDeclaredField("value");
        valueField.setAccessible(true);
        value = valueField.get(entry);
        if (value != null) {
            valueClass = value.getClass().getName();
        }
        Logger.getRootLogger().info("[" + i + "] type[" + valueClass + "] " + value);
    }
}
It is only possible if you place the same value in a field which is not a ThreadLocal and access that instead. A ThreadLocal is, by definition, local to the thread.
The ThreadLocalMap CAN be accessed via reflection: Thread.class.getDeclaredField("threadLocals"), setAccessible(true), and so on.
Do not do that, though. The map is expected to be accessed by the owning thread only and accessing any value of a ThreadLocal is a potential data race.
However, if you can live with the said data races, or better, just avoid them, here is the simplest solution: extend Thread and define whatever you need there, that's it:
class ThreadX extends Thread {
    int extraField1;
    String blah2; // and so on
}
That's a decent solution that doesn't rely on WeakReferences, but it requires that you create the threads yourself. You can then set a value like this: ((ThreadX) Thread.currentThread()).extraField1 = 22;
Make sure you do not introduce data races while accessing the fields, so you might need volatile, synchronized and so on.
Overall, a plain Map is a terribad idea: never keep references to objects you do not manage/own explicitly, especially when it comes to Thread, ThreadGroup, Class, ClassLoader... WeakHashMap<Thread, Object> is slightly better; however, you need to access it exclusively (i.e. under a lock), which might hamper performance in a heavily multithreaded environment. WeakHashMap is not the fastest thing in the world.
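If you do go the WeakHashMap route, it at least needs external synchronization; a minimal sketch (the class name is made up for illustration):
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

public class WeakPerThreadMap {
    // WeakHashMap is not thread-safe, so wrap it; every access then goes through a single lock
    private static final Map<Thread, Object> PER_THREAD =
            Collections.synchronizedMap(new WeakHashMap<Thread, Object>());

    public static void main(String[] args) {
        PER_THREAD.put(Thread.currentThread(), 42L);
        System.out.println(PER_THREAD.get(Thread.currentThread()));
        // an entry becomes collectable once its Thread key is no longer strongly referenced
    }
}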
A ConcurrentMap keyed by weak references would be better, but then you need a WeakReference that implements equals and hashCode...

Should you check if the map containsKey before using ConcurrentMap's putIfAbsent

I have been using Java's ConcurrentMap for a map that can be used from multiple threads. The putIfAbsent is a great method and is much easier to read/write than using standard map operations. I have some code that looks like this:
ConcurrentMap<String, Set<X>> map = new ConcurrentHashMap<String, Set<X>>();
// ...
map.putIfAbsent(name, new HashSet<X>());
map.get(name).add(Y);
Readability-wise this is great, but it does require creating a new HashSet every time, even if the key is already in the map. I could write this:
if (!map.containsKey(name)) {
    map.putIfAbsent(name, new HashSet<X>());
}
map.get(name).add(Y);
With this change it loses a bit of readability but does not need to create the HashSet every time. Which is better in this case? I tend to side with the first one since it is more readable. The second would perform better and may be more correct. Maybe there is a better way to do this than either of these.
What is the best practice for using a putIfAbsent in this manner?
Concurrency is hard. If you are going to bother with concurrent maps instead of straightforward locking, you might as well go for it. Indeed, don't do lookups more than necessary.
Set<X> set = map.get(name);
if (set == null) {
    final Set<X> value = new HashSet<X>();
    set = map.putIfAbsent(name, value);
    if (set == null) {
        set = value;
    }
}
(Usual stackoverflow disclaimer: Off the top of my head. Not tested. Not compiled. Etc.)
Update: Java 8 added the computeIfAbsent default method to ConcurrentMap (and to Map, which is kind of interesting because that default implementation would be wrong for ConcurrentMap). (And Java 7 added the "diamond operator" <>.)
Set<X> set = map.computeIfAbsent(name, n -> new HashSet<>());
(Note, you are responsible for the thread-safety of any operations of the HashSets contained in the ConcurrentMap.)
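If those per-key sets will themselves be mutated from multiple threads, one option (a sketch, assuming Java 8+; map, name, X and Y are the placeholders from the question) is to store concurrent sets instead of plain HashSets:
ConcurrentMap<String, Set<X>> map = new ConcurrentHashMap<>();
// ConcurrentHashMap.newKeySet() returns a concurrent Set, so the add() after
// computeIfAbsent is thread-safe as well
map.computeIfAbsent(name, n -> ConcurrentHashMap.newKeySet()).add(Y);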
Tom's answer is correct as far as API usage goes for ConcurrentMap. An alternative that avoids using putIfAbsent is to use the computing map from the Google Collections/Guava MapMaker, which auto-populates the values with a supplied function and handles all the thread-safety for you. It actually only creates one value per key, and if the create function is expensive, other threads asking for the same key will block until the value becomes available.
Edit: as of Guava 11, MapMaker is deprecated and being replaced with the Cache/LocalCache/CacheBuilder stuff. This is a little more complicated in its usage but basically isomorphic.
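A minimal sketch of the CacheBuilder form (assuming Guava 11+ on the classpath; the class name AutoPopulatingSets is made up for illustration):
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import java.util.HashSet;
import java.util.Set;

public class AutoPopulatingSets<X> {
    // getUnchecked() creates the value at most once per key; other threads asking for
    // the same key block while it is being computed
    private final LoadingCache<String, Set<X>> cache = CacheBuilder.newBuilder()
            .build(new CacheLoader<String, Set<X>>() {
                @Override
                public Set<X> load(String key) {
                    return new HashSet<>();
                }
            });

    public Set<X> setFor(String name) {
        return cache.getUnchecked(name);
    }
}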
You can use MutableMap.getIfAbsentPut(K, Function0<? extends V>) from Eclipse Collections (formerly GS Collections).
The advantage over calling get(), doing a null check, and then calling putIfAbsent() is that we'll only compute the key's hashCode once, and find the right spot in the hashtable once. In ConcurrentMaps like org.eclipse.collections.impl.map.mutable.ConcurrentHashMap, the implementation of getIfAbsentPut() is also thread-safe and atomic.
import org.eclipse.collections.impl.map.mutable.ConcurrentHashMap;
...
ConcurrentHashMap<String, MyObject> map = new ConcurrentHashMap<>();
map.getIfAbsentPut("key", () -> someExpensiveComputation());
The implementation of org.eclipse.collections.impl.map.mutable.ConcurrentHashMap is truly non-blocking. While every effort is made not to call the factory function unnecessarily, there's still a chance it will be called more than once during contention.
This fact sets it apart from Java 8's ConcurrentHashMap.computeIfAbsent(K, Function<? super K,? extends V>). The Javadoc for this method states:
The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple...
Note: I am a committer for Eclipse Collections.
By keeping a pre-initialized value for each thread you can improve on the accepted answer:
Set<X> initial = new HashSet<X>();
...
Set<X> set = map.putIfAbsent(name, initial);
if (set == null) {
    set = initial;
    initial = new HashSet<X>();
}
set.add(Y);
I recently used this with AtomicInteger map values rather than Set.
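For reference, a sketch of that AtomicInteger variant, mirroring the pre-initialized-value pattern above (counters and name are illustrative placeholders):
ConcurrentMap<String, AtomicInteger> counters = new ConcurrentHashMap<>();
AtomicInteger initial = new AtomicInteger();
...
AtomicInteger counter = counters.putIfAbsent(name, initial);
if (counter == null) {
    counter = initial;              // our pre-built instance won the race
    initial = new AtomicInteger();  // replace the consumed pre-built instance
}
counter.incrementAndGet();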
In 5+ years, I can't believe no one has mentioned or posted a solution that uses ThreadLocal to solve this problem; and several of the solutions on this page are not threadsafe and are just sloppy.
Using ThreadLocals for this specific problem isn't only considered best practice for concurrency; it also minimizes garbage/object creation during thread contention. Also, it's incredibly clean code.
For example:
private final ThreadLocal<HashSet<X>> threadCache = new ThreadLocal<HashSet<X>>() {
    @Override
    protected HashSet<X> initialValue() {
        return new HashSet<X>();
    }
};

private final ConcurrentMap<String, Set<X>> map = new ConcurrentHashMap<String, Set<X>>();
And the actual logic...
// minimize object creation during thread contention
final Set<X> cached = threadCache.get();
Set<X> data = map.putIfAbsent("foo", cached);
if (data == null) {
    // reset the cached value in the ThreadLocal
    threadCache.set(new HashSet<X>());
    data = cached;
}
// make sure that the access to the set is thread safe
synchronized (data) {
    data.add(object);
}
My generic approximation:
public class ConcurrentHashMapWithInit<K, V> extends ConcurrentHashMap<K, V> {
    private static final long serialVersionUID = 42L;

    public V initIfAbsent(final K key) {
        V value = get(key);
        if (value == null) {
            value = initialValue();
            final V x = putIfAbsent(key, value);
            value = (x != null) ? x : value;
        }
        return value;
    }

    protected V initialValue() {
        return null;
    }
}
And as an example of use:
public static void main(final String[] args) throws Throwable {
    ConcurrentHashMapWithInit<String, HashSet<String>> map =
            new ConcurrentHashMapWithInit<String, HashSet<String>>() {
                private static final long serialVersionUID = 42L;

                @Override
                protected HashSet<String> initialValue() {
                    return new HashSet<String>();
                }
            };
    map.initIfAbsent("s1").add("chao");
    map.initIfAbsent("s2").add("bye");
    System.out.println(map.toString());
}
