Can I use the standard collections classes (as opposed to the concurrent ones) as long as I ensure the code makes no data changes on multiple threads? The code that I'm talking about is completely under my control, and I'm not mutating the collections after the initial (single-threaded) population phase.
I know that some classes such as DateFormat are not thread-safe because they store intermediate state as they are being used. Are the collections (ArrayList, TreeMap, etc.) safe though?
Collections are generally safe for concurrent reading, assuming they are safely published. Apart from that, I'd also recommend wrapping the collections in the unmodifiable wrappers (such as Collections.unmodifiableList) and keeping the elements in them immutable (but you probably already knew this).
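As a rough illustration of that advice, here is a minimal sketch (the class and field names are invented): populate a list once on a single thread, wrap it, and publish it through a final field so readers see a fully constructed, effectively immutable collection:
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class ReadOnlyLookup {
    // The final field gives safe publication: threads that obtain a reference
    // to a fully constructed ReadOnlyLookup see the fully populated list.
    private final List<String> names;

    public ReadOnlyLookup(Collection<String> source) {
        List<String> tmp = new ArrayList<>(source);      // single-threaded population
        this.names = Collections.unmodifiableList(tmp);  // any later add() will throw
    }

    public boolean contains(String name) {
        return names.contains(name);                     // concurrent reads, no locking needed
    }
}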
Yes. In the Java API docs, each collection that is not thread-safe carries a warning similar to this one from TreeMap:
Note that this implementation is not synchronized. If multiple threads access a map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with an existing key is not a structural modification.)
Emphasis mine. As long as there are zero structural modifications, you should be just fine without external synchronization.
The collections are safe for read and your use case (initialize once then use) is fine.
The thing to be careful of, if you try to extend this pattern, is that even a single thread modifying the collections (or the objects inside them) can have consequences for reader threads.
No, you must consider two cases:
Multiple threads (as you wrote).
The same thread modifying the collection during an open iteration (see the sketch below).
If you handle both of these, you are fine using the standard collections.
Regards.
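The second case bites even in single-threaded code. A small sketch (the list contents are made up): removing through the list while iterating it throws ConcurrentModificationException, whereas removing through the iterator is allowed:
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SingleThreadIteration {
    public static void main(String[] args) {
        List<String> names = new ArrayList<>(List.of("alice", "bob", "carol"));

        // This would throw ConcurrentModificationException on the next iteration step:
        // for (String name : names) {
        //     if (name.equals("alice")) names.remove(name);
        // }

        // Structural changes during iteration must go through the iterator itself:
        Iterator<String> it = names.iterator();
        while (it.hasNext()) {
            if (it.next().equals("alice")) {
                it.remove();   // allowed: keeps the iterator consistent
            }
        }
        System.out.println(names); // [bob, carol]
    }
}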
Can we use HashMap's containsKey() method without synchronizing in a multi-threaded environment?
Note: threads are only going to read the HashMap. The map is initialized once, and is never modified again.
It really depends on how/when your map is accessed.
Assuming the map is initialized once, and never modified again, then methods that don't modify the internal state like containsKey() should be safe.
In this case though, you should make sure your map really is immutable, and is published safely.
Now if in your particular case the state does change during the course of your program, then no, it is not safe.
From the documentation:
Note that this implementation is not synchronized.
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
In this case, you should use ConcurrentHashMap, or synchronize externally.
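For the initialize-once, read-only case from the question, a rough sketch (class and method names are invented) that builds the map up front and exposes only an unmodifiable view, with a ConcurrentHashMap shown as the alternative for state that can still change:
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class Lookup {
    // Initialized once, then only read: an unmodifiable view over a private copy,
    // published through a final field.
    private final Map<String, Integer> readOnly;

    // Used instead when the contents can still change while other threads read.
    private final Map<String, Integer> mutable = new ConcurrentHashMap<>();

    public Lookup(Map<String, Integer> initial) {
        this.readOnly = Collections.unmodifiableMap(new HashMap<>(initial));
    }

    public boolean hasKey(String key) {
        return readOnly.containsKey(key);   // safe: no structural modification ever happens
    }

    public void record(String key, int value) {
        mutable.put(key, value);            // safe: ConcurrentHashMap handles the locking
    }
}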
You shouldn't look at a single method this way. A HashMap is not meant to be used in a multi-threaded setup.
Having said that, the one exception would be: a map that gets created once (single threaded), and afterwards is "read" only. In other words: if a map doesn't get changed anymore, then you can have as many threads reading it as you want.
From that point of view, just calling containsKey() shouldn't cause a problem. The problem arises when the information that this method relies on changes over time.
No, it is not thread-safe for any operations. You need to synchronise all access or use something like ConcurrentHashMap.
My favourite production system troubleshooting horror story is when we found that HashMap.get went into an infinite loop locking up 100% CPU forever because of missing synchronisation. This happened because the linked lists that were used within each bucket got into an inconsistent state. The same could happen with containsKey.
You should be safe if no one modifies the HashMap after it has been initially published, but better use an implementation that guarantees this explicitly (such as ImmutableMap or, again, a ConcurrentMap).
No. (No it is not. Not at all. 30 characters?)
It's complicated, but, mostly, no.
The spec of HashMap makes no guarantees whatsoever. It therefore reserves the right to blast yankee doodle dandy from your speakers if you try: You're just not supposed to use it that way.
... however, in practice, whilst the API of HashMap makes no guarantees, it generally works out. But mind the horror story in @Thilo's answer.
... buuut, the Java Memory Model works like this: you should assume that each thread gets its own individual copy of each and every field across the entire heap of the VM, and that these individual copies are synced up at indeterminate times. That means all sorts of code simply isn't going to work right: you add an entry to the map from one thread, and when you then access that map from another thread you may not see it, even though a lot of time has passed; that is theoretically possible. Also, internally, the map uses multiple fields, and presumably these fields must be consistent with each other or you'll get weird behaviour (exceptions and wrong results); the JMM makes no guarantees about that consistency either. The way out of this dilemma is that the JMM offers 'happens-before' relationships, which guarantee that changes have been synced up. Using the synchronized keyword is one easy way to establish such a relationship.
Why not use a ConcurrentHashMap, which has all the bells and whistles built in and does in fact guarantee that adding an entry from thread A and then querying it via containsKey from thread B will get you a consistent answer? (That answer might still be 'no, that key is not in the map', because perhaps thread B got there slightly before thread A, and there's no way for you to know; but it won't throw any exceptions, or suddenly return false for things you added ages ago.)
So, whilst it's complicated, the answer is basically: Don't do that; either use a synchronized guard, or, probably the better choice: ConcurrentHashMap.
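If a plain HashMap must stay in place, a minimal sketch of the 'synchronized guard' option might look like this (the class name and lock field are invented for illustration); every access, reads included, goes through the same lock, so the happens-before guarantee applies:
import java.util.HashMap;
import java.util.Map;

public class GuardedMap {
    private final Object lock = new Object();
    private final Map<String, String> map = new HashMap<>();

    public void put(String key, String value) {
        synchronized (lock) {            // writers and readers share one lock,
            map.put(key, value);         // which establishes happens-before between them
        }
    }

    public boolean containsKey(String key) {
        synchronized (lock) {            // reads must take the same lock, or the
            return map.containsKey(key); // JMM gives no visibility guarantee
        }
    }
}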
No. Read the bold part of the HashMap documentation:
Note that this implementation is not synchronized.
So you should handle it:
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
And suggested solutions:
This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method
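Following that suggestion, a minimal sketch (the map contents are invented) of the wrapping approach, including the extra lock the Javadoc requires when iterating over the wrapped map:
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SynchronizedWrapExample {
    public static void main(String[] args) {
        // Wrap at creation time so no unsynchronized reference escapes.
        Map<String, Integer> scores = Collections.synchronizedMap(new HashMap<>());

        scores.put("alice", 3);                          // individual calls are synchronized
        System.out.println(scores.containsKey("alice")); // so is this one

        // Iteration is a multi-step operation, so it must hold the map's own lock.
        synchronized (scores) {
            for (Map.Entry<String, Integer> e : scores.entrySet()) {
                System.out.println(e.getKey() + " -> " + e.getValue());
            }
        }
    }
}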
@user7294900 is right.
If your application does not modify the HashMap structurally after it has been built (in a thread-safe way), and only invokes the containsKey method, it is thread-safe.
For instance, I've used HashMap like this:
import java.util.HashMap;
import java.util.Map;
import javax.annotation.PostConstruct;
import org.springframework.stereotype.Component;

@Component
public class SpringSingletonBean {

    private Map<String, String> map = new HashMap<>();

    public void doSomething() {
        // read-only access after the map has been initialized
        if (map.containsKey("aaaa")) {
            // do something
        }
    }

    @PostConstruct
    public void init() {
        // do something to initialize the map (runs once, before any concurrent use)
    }
}
It works well.
In an application which uses a BlockingQueue, I am facing a new requirement that can only be implemented by iterating over the elements present in the queue (to provide info about the current status of the elements in there).
According to the API Javadoc, only the queueing methods of a BlockingQueue implementation are required to be thread-safe. Other API methods (e.g. those inherited from the Collection interface) may not be safe to use concurrently, though I am not sure whether this also applies to mere read access...
Can I safely use iterator() WITHOUT altering the producer/consumer threads which may normally interact with the queue at any time? I have no need for a 100% consistent iteration (it does not matter whether I see elements added/removed while iterating the queue), but I don't want to end up with nasty ConcurrentModificationExceptions.
Note that the application is currently using a LinkedBlockingQueue, but I am free to choose any other (unbounded) BlockingQueue implementation (including free open-source third-party implementations). Also, I don't want to rely on things that may break in the future, so I want a solution that is OK according to the API and does not just merely happen to work with the current JRE.
Actually, the Java 8 javadoc for BlockingQueue states this:
BlockingQueue implementations are thread-safe.
Nothing in the javadoc says [1] that this only applies to the methods specified in the BlockingQueue API itself.
Can I safely use iterator() WITHOUT altering the producer/consumer threads which may normally interact with the queue at any time?
Basically, yes. The Iterator's behavior in the face of concurrent modifications is specified in the implementation class javadocs. For LinkedBlockingQueue, the javadoc specifies that the Iterator returned by iterator() is weakly consistent. That means (for example) that your application won't get a ConcurrentModificationException if the queue is modified while it is iterating, but the iteration is not guaranteed to see all queue entries.
[1] The javadoc mentions that the bulk operations may be non-atomic, but non-atomic does not mean non-thread-safe. What it means here is that some other thread may observe the queue in a state where some entries have been added (or removed, or whatever) and others haven't.
@John Vint warns:
Keep in mind, this is as of Java 8 and can change.
If Oracle decided to alter the behavior specified in the javadoc, that would be an impediment to migration. Past history shows that Sun / Oracle avoid doing that kind of thing.
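To make the weakly consistent behaviour concrete, a minimal sketch (the task names and counts are invented): a producer keeps offering to a LinkedBlockingQueue while the main thread iterates it to report status, and no ConcurrentModificationException is thrown:
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueStatusReport {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        Thread producer = new Thread(() -> {
            for (int i = 0; i < 1_000; i++) {
                queue.offer("task-" + i);   // normal producer activity continues
            }
        });
        producer.start();

        // Weakly consistent iteration: never throws ConcurrentModificationException,
        // but is not guaranteed to see elements added while we iterate.
        int seen = 0;
        for (String task : queue) {
            seen++;                         // just count whatever snapshot we observe
        }
        System.out.println("status snapshot saw " + seen + " queued tasks");

        producer.join();
    }
}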
Yes, you can iterate over the entire queue. Looking at the LinkedBlockingQueue and ArrayBlockingQueue implementations, you do have a side effect. When constructing and operating the Iterator, there are three places where a full lock is acquired:
During construction
When invoking next()
When invoking remove()
Keep in mind, this is as of Java 8 and can change.
So yes, you do get to iterate safely, but you will affect the performance of puts and offers.
Now for your question, does BlockingQueue offer safe iteration? The answer there is: it depends on the implementation. There could be a future BlockingQueue implementation that throws an UnsupportedOperationException.
I have one doubt. What will happen if I get from a map at the same time as I am putting some data into it?
What I mean is, what if map.get() and map.put() are called by two separate processes at the same time? Will get() wait until put() has been executed?
It depends on which Map implementation you are using.
For example, ConcurrentHashMap supports full concurrency, and get() will not wait for put() to be executed, as stated in the Javadoc:
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset.
Other implementations (such as HashMap) don't support concurrency and shouldn't be used by multiple threads at the same time.
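To make that concrete, here is a minimal sketch (the key name, loop counts, and thread structure are invented): a reader calling get() on a ConcurrentHashMap never blocks on a concurrent put(); it simply sees the result of some completed update:
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class GetDuringPut {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("counter", 0);

        Thread writer = new Thread(() -> {
            for (int i = 1; i <= 100_000; i++) {
                map.put("counter", i);              // updates do not lock readers out
            }
        });

        Thread reader = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) {
                Integer value = map.get("counter"); // never blocks; sees some completed put
                if (value == null) {
                    throw new IllegalStateException("cannot happen once the key exists");
                }
            }
        });

        writer.start();
        reader.start();
        writer.join();
        reader.join();
        System.out.println("final value: " + map.get("counter"));
    }
}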
It might throw a ConcurrentModificationException (not sure about it). It is always better to use Collections.synchronizedMap. This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:
Map map = Collections.synchronizedMap(new HashMap(...));
Map is an interface, so the answer depends on the implementation you're using.
Generally speaking, the simpler implementations of this interface, such as HashMap and TreeMap, are not thread-safe. If you don't have some synchronization built around them, concurrently putting and getting will result in undefined behavior: you may get the new value, you may get the old one, but most probably you'd just get a ConcurrentModificationException, or something worse.
If you want to handle the same Map from different threads, either use one of the implementations of ConcurrentMap (e.g., a ConcurrentHashMap), which guarantees a happens-before relationship (i.e., if the get happens before the put, you'd get the old value, even if the put is ongoing, and vice versa), or synchronize the Map's access (e.g., by calling Collections#synchronizedMap(Map)).
I have a function which inserts into an ArrayList the strings passed as a parameter. This function can be accessed by different threads:
public void adding(String newStringForEachInvocation){
arrayList.add(newStringForEachInvocation);
}
I want to keep the add method usable concurrently, and my doubt is: if two threads have two different strings, is it possible for them to compete for the same bucket?
Another alternative is using a BlockingQueue, but would that also amount to mutual exclusion for threads competing for the same bucket, or not?
Yes, ArrayList is not thread-safe, and all accesses to the list must thus be synchronized if it's accessed by multiple threads (explicitly, and/or by wrapping it using Collections.synchronizedList()). Anything could happen if you're not doing it (data corruption, exceptions, etc.).
There are alternative, non-blocking List implementations, like CopyOnWriteArrayList. But depending on the use case it could be faster or slower than using a synchronized list.
Use Collections.synchronizedList; every unitary operation on that list will be synchronized:
http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#synchronizedList(java.util.List)
Be careful though: if you are going to perform more than one operation on that list, like an iteration, use a synchronized block to ensure the integrity of the list, as specified in the javadoc:
It is imperative that the user manually synchronize on the returned list when iterating over it
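Putting the two answers together, a minimal sketch (the class name and the countStartingWith method are made up for illustration) of the asker's adding() method backed by a synchronized list, with the extra lock held during iteration:
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class StringCollector {
    private final List<String> arrayList =
            Collections.synchronizedList(new ArrayList<>());

    public void adding(String newStringForEachInvocation) {
        arrayList.add(newStringForEachInvocation);   // each add() is synchronized internally
    }

    public int countStartingWith(String prefix) {
        int count = 0;
        synchronized (arrayList) {                   // required for the multi-step iteration
            for (String s : arrayList) {
                if (s.startsWith(prefix)) {
                    count++;
                }
            }
        }
        return count;
    }
}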
If my requirements dictate that most of my accesses to the list are reads, and modifications, if any, are minimal, why can't I just do one of the following:
synchronize the modifyList method and use an ArrayList; all reads from the ArrayList will be unsynchronized
or
inside modifyList, wrap the list with Collections.synchronizedList(arrayList)
or
use a CopyOnWriteArrayList (not sure what it buys me here)
Why would I use any of these? Which is better?
For 1 & 2, I'm not sure what you're trying to accomplish by only synchronizing writes. If there are potential readers who might be iterating the list, or looking things up by index, then synchronizing only the writes accomplishes nothing. The readers will still be able to read while writes are in progress and may see dirty data or get exceptions (ConcurrentModificationException or IndexOutOfBoundsException).
You would need to synchronize both your reads and writes if you want 'safe' iterating and getting while other threads make changes. At which point, you may as well have just used a Vector.
CopyOnWriteArrayList is purpose-built for what you want to do. It buys you safe, synchronization-free iterators, while substantially increasing the cost of writes. It also has the advantage of doing what you want (or what it seems you want from the terse question :) ) entirely within the Java SE API, which reduces 'surprise' for future developers.
(do note that if you have multi-step processes involving reads with 'get' even using CopyOnWriteArrayList may not be entirely safe. You need to evaluate what your code actually does and if an interleaving modification would break the method that is getting.)
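As a rough sketch of that trade-off (the class and method names are invented): reads and iteration on a CopyOnWriteArrayList need no locking at all, while every write copies the backing array:
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class MostlyReadList {
    private final List<String> entries = new CopyOnWriteArrayList<>();

    public void modifyList(String entry) {
        entries.add(entry);            // each write copies the underlying array
    }

    public void printAll() {
        // The iterator works over a snapshot: no locks, and never a
        // ConcurrentModificationException, even if modifyList runs concurrently.
        for (String e : entries) {
            System.out.println(e);
        }
    }
}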
Another solution would be to use a ReentrantReadWriteLock, so you can use read locks for read operations (which don't block other reads) and a write lock for when you're writing (which will block until there are no readers, and won't allow any new read locks until it's released).
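A minimal sketch of that approach, reusing the modifyList idea from the question (the containsValue method here is invented for illustration):
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadMostlyList {
    private final List<String> list = new ArrayList<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public void modifyList(String value) {
        lock.writeLock().lock();       // exclusive: waits for all readers to finish
        try {
            list.add(value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public boolean containsValue(String value) {
        lock.readLock().lock();        // shared: many readers may hold this at once
        try {
            return list.contains(value);
        } finally {
            lock.readLock().unlock();
        }
    }
}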