Task is to keep track of some running processes. Keeping that information in memory is just fine, so I'm using a concurrent hash map to store that data:
ConcurrentHashMap<String, ProcessMetaData> RUNNING_PROCESSES = new ConcurrentHashMap();
It's all good and fine with safely putting new objects in the map, problem is that state of those processes change so I have to update ProcessMetaData from time to time. I made ProcessMetaData immutable and use ConcurrentHashMap's compute() method to update values, but now the problem is ProcessMetaData gets more complicated and keeping it immutable gets hardly manageable. The question is - as long as I only update ProcessMetaData in atomic (as per javadoc) compute() method - the object may be mutable and overall things will still be thread-safe? Is my assumption correct?
As long as you only access the value within the function passed to compute, modifications made in that function are safe.
This, however, is a pointless theoretical view. The purpose of storing values into a collection or map, is to eventually retrieve and use them. And this is where the problems start.
The compute method returns the result value just like get returns the currently stored value. Once a caller starts using that value, this use may be concurrent to subsequent compute operations on the map. The get method may even retrieve the value while a compute operation is in progress. Allowing non-blocking retrieval operation is one of ConcurrentHashMap’s main features. Therefore, all kind of race conditions may occur.
So, using a mutable object and modifying an already stored value in compute is only safe, when you use the map as write-only memory, which is a far-fetched scenario. It might work when you use a different thread safe mechanism to ensure that all updates have been completed before starting to read the map, but your use case seems to be different.
Related
I have one doubt. What will happen if I get from map at same time when I am putting to map some data?
What I mean is if map.get() and map.put() are called by two separate processes at the same time. Will get() wait until put() has been executed?
It depends on which Map implementation you are using.
For example, ConcurrentHashMap supports full concurrency, and get() will not wait for put() to get executed, and stated in the Javadoc :
* <p> Retrieval operations (including <tt>get</tt>) generally do not
* block, so may overlap with update operations (including
* <tt>put</tt> and <tt>remove</tt>). Retrievals reflect the results
* of the most recently <em>completed</em> update operations holding
* upon their onset.
Other implementations (such as HashMap) don't support concurrency and shouldn't be used by multiple threads at the same time.
It might throw ConcurrentModificationException- not sure about it. It is always better to use synchronizedMap.This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method.This is best done at creation time, to prevent accidental unsynchronized access to the map:
Map map = Collections.synchronizedMap(new HashMap(...));
Map is an interface, so the answer depends on the implementation you're using.
Generally speaking, the simpler implementations of this interface, such as HashMap and TreeMap are not thread safe. If you don't have some synchronization built around them, concurrently puting and geting will result in an undefined behavior - you may get the new value, you may get the old one, bust most probably you'd just get a ConcurrentModificationException, or something worse.
If you want to handle the same Map from different threads, either use one of the implementations of a ConcurrentMap (e.g., a ConcurrentHashMap), which guarantees a happens-before-sequence (i.e., if the get was fired before the put, you'd get the old value, even if the put is ongoing, and vise-versa), or synchronize the Map's access (e.g., by calling Collections#synchronizedMap(Map).
Background:
I have a large thread-pool in java each process has some internal state.
I would like to gather some global information about the states -- to do that I have an associative commutative aggregation function (e.g. sum -- mine needs to be plug-able though).
The solution needs to have a fixed memory consumption and be log-free in best case not disturbing the pool at all. So no thread should need to require a log (or enter a synchronized area) when writing to the data-structure. The aggregated value is only read after the threads are done, so I don't need an accurate value all the time. Simply collecting all values and aggregate them after the pool is done might lead to memory problems.
The values are going to be more complex datatypes so I cannot use AtomicInteger etc.
My general Idea for the solution:
Have a log-free collection where all threads put their updates to. I don't even need the order of the events.
If it gets to big run the aggregation function on it (compacting it) while the threads continue filling it.
My question:
Is there a data structure that allows for something like that or do I need to implement it from scratch? I couldn't find anything that directly matches my problem. If I have to implement from scratch what would be a good non-blocking collection class to start from?
If the updates are infrequent (relatively speaking) and the aggregation function is fast, I would recommend aggregrating every time:
State myState;
AtomicReference<State> combinedState;
do
{
State original = combinedState.get();
State newCombined = Aggregate(original, myState);
} while(!combinedState.compareAndSet(original, newCombined));
I don't quite understand the question but I would, at first sight, suggest an IdentityHashMap where keys are (references to) your thread objects and values are where your thread objects write their statistics.
An IdentityHashMap only relies on reference equality, as such there would never be any conflict between two thread objects; you could pass a reference to that map to each thread (which would then call .get(this) on the map to get a reference to the collecting data structure), which would then collect the data it wants. Otherwise you could just pass a reference to the collecting data structure to the thread object.
Such a map is inherently thread safe for your use case, as long as you create the key/value pair for that thread before starting the thread, and because no thread object will ever modify the map anyway since they won't have a referece to it. With some management smartness you can even remove entries from this map, even if the map is not even thread-safe, once the thread is done with its work.
When all is done, you have a map whose values contains all the data collected.
Hope this helps... Reading the question again, in any case...
The ConcurrentHashMap provides thread-safe but the docs state:
" However, even though all operations are thread-safe, retrieval operations do not entail locking"
So from this I understand that getting or setting a key and value are thread-safe, but modifying the actual VALUE of any given key isn't (by value I actaully mean the value or state of that object).
I'm just confused on how this works, at the moment I think things work like this.
The ConcurrentHashMap only gaurantees the key's are thread-safe in terms setting/getting them. But the object you put inside the map has to gaurd for concurrency by itself.
Is this correct?
But the object you put inside the map has to gaurd for concurrency by itself.
Your understanding is correct.
From the documentation:
However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access.
What the above is also saying is that there is no built-in mechanism for automatic locking of the hash map while the reading takes place. In particular, this means that get() operations can overlap with concurrent modifications performed by other threads.
The document goes on to explain the concurrency semantics:
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset. For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries. Similarly, Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration.
What you say is true by default -- there would be no way for the map to enforce the thread safety of either its keys or its values since these are objects that come from outside. What you read about the retrieval of objects, however, has nothing to do with that fact. The map doesn't block while retrieving a value so another update may be happening at the same time (these operation can overlap).
The basic idea of the ConcurrentHashMap is that only modifications use a lock, while retrieval-only operations don't. This is possible because the entire data structure and the operations on it are defined in a way that allows get() to only ever see a "consistent enough" state of the map to do its work. If there's currently an insert operation in progress, then get() either sees the result or it doesn't, but it won't ever see a partial result or even temporarily invalid data.
I have a instance of a object which performs very complex operation.
So in the first case I create an instance and save it it my own custom cache.
From next times whatever thread comes if he finds that a ready made object is already present in the cache they take it from the cache so as to be good in performance wise.
I was worried about what if two threads have the same instance. IS there a chance that the two threads can corrupt each other.
Map<String, SoftReference<CacheEntry<ClassA>>> AInstances= Collections.synchronizedMap(new HashMap<String, SoftReference<CacheEntry<ClassA>>>());
There are many possible solutions:
Use an existing caching solution like EHcache
Use the Spring framework which got an easy way to cache results of a method with a simple #Cacheable annotation
Use one of the synchronized maps like ConcurrentHashMap
If you know all keys in advance, you can use a lazy init code. Note that everything in this code is there for a reason; change anything in get() and it will break eventually (eventually == "your unit tests will work and it will break after running one year in production without any problem whatsoever").
ConcurrentHashMap is most simple to set up but it has simple way to say "initialize the value of a key once".
Don't try to implement the caching by yourself; multithreading in Java has become a very complex area with Java 5 and the advent of multi-core CPUs and memory barriers.
[EDIT] yes, this might happen even though the map is synchronized. Example:
SoftReference<...> value = cache.get( key );
if( value == null ) {
value = computeNewValue( key );
cache.put( key, value );
}
If two threads run this code at the same time, computeNewValue() will be called twice. The method calls get() and put() are safe - several threads can try to put at the same time and nothing bad will happen, but that doesn't protect you from problems which arise when you call several methods in succession and the state of the map must not change between them.
Assuming you are talking about singletons, simply use the "demand on initialization holder idiom" to make sure your "check" works across all JVM's. This will also make sure all threads which are requesting the same object concurrently wait till the initialization is over and be given back only valid object instance.
Here I'm assuming you want a single instance of the object. If not, you might want to post some more code.
Ok If I understand your problem correctly, you are worried that 2 objects changing the state of the shared object will corrupt each other.
The short answer is yes they will.
If the object is expensive in creation but is needed in a read only manner. I suggest you make it immutable, this way you get the benefit of it being fast in access and at the same time thread safe.
If the state should be writable but you don't actually need threads to see each others updates. You can simply load the object once in an immutable cache and just return copies to anyone who asks for the object.
Finally if your object needs to be writable and shared (for other reasons than it just being expensive to create). Then my friend you need to handle thread safety, I don't know your case but you should take a look at the synchronized keyword, Locks and java 5 concurrency features, Atomic types. I am sure one of them will satisfy your need and I sincerely wish that your case is one of the first 2 :)
If you only have a single instance of the Object, have a quick look at:
Thread-safe cache of one object in java
Other wise I can't recommend the google guava library enough, in particular look at the MapMaker class.
I want to make my class thread-safe without large overhead.
The instances will be seldom used concurrently, but it may happen.
Most of the class is immutable, there's only one mutable member used as a cache:
private volatile SoftReference<Map<String, Something>> cache
= new SoftReference(null);
which gets assigned in the constructor (not shared) like
Map<String, Something> tmp = new HashMap<String, Something>();
tmp.put("a", new Something("a");
tmp.put("b", new Something("b");
cache = new SoftReference(tmp);
After the assignment, the map gets never modified.
It's no problem, when two threads compute the cache in parallel, since the value will be the same.
The additional overhead of the word done twice is acceptable.
When a thread wouldn't see the value computed by another tread, it'd compute it unnecessary, and this is acceptable.
This wouldn't happen because of volatile.
When a thread sees value computed by another tread, it's fine.
The only possible problem would be a thread seeing inconsistent state (e.g. a partly filled map).
Can this happen?
Notes:
I really want the whole map being softly referenced, there's no use for a map using soft keys or values here.
I know about ConcurrentHashMap and will maybe use it anyway, but I'm curious, if using volatile only works.
The only possible problem would be a
thread seeing inconsistent state (e.g.
a partly filled map). Can this happen?
No. Actions performed within a thread must be performed as if they had been executed in order. Writing a volatile variable happens-before any read of that value. Hence, initialization of the map happens-before any thread reading the reference to the map from the field.
The problem with using a soft reference is that you can lose the whole map/cache after a GC. This means the performance of your application can be hit very hard. You are better off using a cache with an eviction policy so that you never have this problem.
The volatile doesn't make any operation safe here.
You haven't shown all your code, perhaps we could offer some suggestion on how you could improve your code e.g. your sample code should compile ;)