Concurrently iterating over a BlockingQueue

In an application which uses a BlockingQueue, I am facing a new requirement that can only be implemented by iterating over the elements present in the queue (to provide info about the current status of the elements in there).
According to the API Javadoc only the queueing methods of a BlockingQueue implementation are required to be thread-safe. Other API methods (eg. those inherited from the Collection interface) may not be used concurrently, though I am not sure whether this also applies to mere read access...
Can I safely use iterator() WITHOUT altering the producer/consumer threads which may normally interact with the queue at any time? I have no need for a 100% consistent iteration (it does not matter whether I see elements added/removed while iterating the queue), but I don't want to end up with nasty ConcurrentModificationExceptions.
Note that the application is currently using a LinkedBlockingQueue, but I am free to choose any other (unbounded) BlockingQueue implementation (including free open-source third-party implementations). Also, I don't want to rely on things that may break in the future, so I want a solution that is OK according to the API and does not just merely happen to work with the current JRE.

Actually, the Java 8 javadoc for BlockingQueue states this:
BlockingQueue implementations are thread-safe.
Nothing in the javadoc says [1] that this only applies to the methods specified in the BlockingQueue API itself.
Can I safely use iterator() WITHOUT altering the producer/consumer threads which may normally interact with the queue at any time?
Basically, yes. The Iterator's behavior in the face of concurrent modifications is specified in the implementation class javadocs. For LinkedBlockingQueue, the javadoc specifies that the Iterator returned by iterator() is weakly consistent. That means (for example) that your application won't get a ConcurrentModificationException if the queue is modified while it is iterating, but the iteration is not guaranteed to see all queue entries.
[1] The javadoc mentions that the bulk operations may be non-atomic, but non-atomic does not mean non-thread-safe. What it means here is that some other thread may observe the queue in a state where some entries have been added (or removed, or whatever) and others haven't.
@John Vint warns:
Keep in mind, this is as of Java 8 and can change.
If Oracle decided to alter the behavior specified in the javadoc, that would be an impediment to migration. Past history shows that Sun / Oracle avoid doing that kind of thing.
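To illustrate the weakly consistent iterator described above, here is a minimal sketch. The class name QueueSnapshot and the method snapshot are made up for this example; only LinkedBlockingQueue and its iterator() come from the JDK. It assumes you only need a best-effort view of the queue contents, not an exact one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical helper, not part of the JDK.
class QueueSnapshot {
    // Copies whatever the queue's iterator yields into a list.
    // Elements added or removed during the walk may or may not appear.
    static <T> List<T> snapshot(BlockingQueue<T> queue) {
        List<T> seen = new ArrayList<>();
        for (T element : queue) {   // implicitly calls queue.iterator()
            seen.add(element);
        }
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                queue.offer(i);
            }
        });
        producer.start();
        // Iterating while the producer is still running: LinkedBlockingQueue
        // documents its iterator as weakly consistent, so this does not throw
        // ConcurrentModificationException.
        List<Integer> partial = snapshot(queue);
        producer.join();
        System.out.println("saw " + partial.size() + " of " + queue.size());
    }
}
```

The number of elements seen by the status snapshot varies from run to run, which is the "not 100% consistent" behaviour the question says is acceptable.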

Yes, you can iterate over the entire queue. Looking at LinkedBlockingQueue and ArrayBlockingQueue implementations you do have a side effect. When constructing and operating the Iterator there are three places where full locks are acquired.
During construction
When invoking next()
When invoking remove()
Keep in mind, this is as of Java 8 and can change.
So, yes, you do get to iterate safely, but you will affect the performance of puts and offers.
Now for your question, does BlockingQueue offer safe iteration? The answer there is it depends on the implementation. There could be a future BlockingQueue implementation that will throw an UnsupportedOperationException.

Related

ConcurrentHashMap in Java locking mechanism for computeIfPresent

I'm using Java 8 and would like to know if the computeIfPresent operation of the ConcurrentHashMap does lock the whole table/map or just the bin containing the key.
From the documentation of the computeIfPresent method:
Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map
This looks like the whole map is locked when invoking this method for a key. Why does the whole map have to be locked if a value of a certain key is updated? Wouldn't it be better to just lock the bin containing the key/value pair?
Judging by implementation (Oracle JDK 1.8.0_101), just the corresponding bin is locked. This does not contradict the documentation snippet you've cited, since it mentions that some update operations may be blocked, not necessarily all. Of course, it'd be clearer if the docs stated explicitly what gets locked, but that'd be leaking implementation details to what is de facto a part of the interface.
If you look at the source code of ConcurrentHashMap#computeIfPresent, you'll notice that the synchronisation is made directly on the node itself.
So an operation will only block if you attempt to update any node that is being computed. You should not have any problems with other nodes.
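As a small sketch of what the documentation's advice means in practice: the class HitCounter and its method names below are invented for illustration. The remapping function is short and touches only its own key, so other threads updating unrelated keys should normally not be held up for long.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical example class, not from the question.
class HitCounter {
    private final ConcurrentHashMap<String, Integer> hits = new ConcurrentHashMap<>();

    // Makes the key known to the counter, starting at zero.
    void register(String key) {
        hits.putIfAbsent(key, 0);
    }

    // Atomically increments the count for a registered key.
    // Returns the new count, or null if the key was never registered.
    // The lambda is short and does not update any other mapping of the map.
    Integer increment(String key) {
        return hits.computeIfPresent(key, (k, count) -> count + 1);
    }
}
```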
From my understanding, the synchronization directly on nodes is actually the major add-on of ConcurrentHashMap vs old Hashtable.
If you look at the source code of Hashtable, you'll notice that the synchronization is much wider. A contrario, any synchronization in ConcurrentHashMap happens directly on nodes.
The end of the Hashtable documentation also suggests that:
[...] Hashtable is synchronized. If a thread-safe implementation is not
needed, it is recommended to use HashMap in place of Hashtable. If a
thread-safe highly-concurrent implementation is desired, then it is
recommended to use ConcurrentHashMap in place of Hashtable.

Are Java collections safe if read but not modified on multiple threads?

Can I use the standard Collections classes (as opposed to the concurrent ones) as long as I ensure the code makes no data changes on multiple threads. The code that I'm talking about is completely under my control, and I'm not mutating it after the initial (single-threaded) population phase.
I know that some classes such as DateFormat are not threadsafe because they store intermediate states as they are being used. Are the collections (ArrayList, TreeMap, etc.) safe though?
Collections are generally safe for concurrent reading, assuming they are safely published. Apart from that, I'd also recommend the collections are wrapped with the unmodifiable wrappers (such as Collections.unmodifiableList) and that the elements in them are immutable (but you probably already knew this).
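A minimal sketch of "populate once, publish safely, then only read", assuming the elements themselves are immutable (Strings here). The class name CountryCodes is hypothetical. The final field is what gives safe publication once the constructor has finished, and the unmodifiable wrapper turns accidental writes into exceptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder class for a read-only list shared between threads.
class CountryCodes {
    // final: the fully populated list is visible to any thread that
    // obtains a reference to this object after construction.
    private final List<String> codes;

    CountryCodes(List<String> source) {
        // Defensive copy, so later changes to 'source' cannot leak in,
        // wrapped so that no caller can modify the shared list.
        this.codes = Collections.unmodifiableList(new ArrayList<>(source));
    }

    List<String> codes() {
        return codes;
    }
}
```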
Yes. In the Java API docs, each non-threadsafe collection has a warning similar to this one in TreeMap:
Note that this implementation is not synchronized. If multiple threads
access a map concurrently, and at least one of the threads modifies
the map structurally, it must be synchronized externally. (A
structural modification is any operation that adds or deletes one or
more mappings; merely changing the value associated with an existing
key is not a structural modification.)
Emphasis mine. As long as there are zero structural modifications, you should be just fine without external synchronization.
The collections are safe for read and your use case (initialize once then use) is fine.
The thing to be careful of if you try and extend this is that even if only one thread is modifying collections or the objects inside collections then that can have consequences for reader threads.
No, you must check two kinds:
Multiple Threads (as you wrote).
Same Thread in a current, open Iteration.
If you check these two, you are fine to use standard collections.
Regards.

Reorder queue in Java's ThreadPoolExecutor [duplicate]

Possible Duplicate:
Java Executors: how can I set task priority?
I have a ThreadPoolExecutor built using a LinkedBlockingDeque and I want to manipulate the underlying queue, however reading this in the documentation makes me very nervous.
Queue maintenance
Method getQueue() allows access to the work queue for purposes of monitoring and debugging. Use of this method for any other purpose is strongly discouraged. Two supplied methods, remove(java.lang.Runnable) and purge() are available to assist in storage reclamation when large numbers of queued tasks become cancelled.
Specifically I want to be able to
Check the queue to see if an element already exists. I assume this is fine as no locking should be necessary to just view the elements in the queue.
I want to reorder the queue based on some signal. This can obviously be troublesome. I was wondering if there is a preferred way to do this so that I won't mess up the queue for other uses.
Thanks
getQueue() will always return the exact BlockingQueue<Runnable> that you pass into the ThreadPoolExecutor.
The worry with the documentation is that you could easily run into issues with double-running if you cannot guarantee the thread safety of the BlockingQueue. If you use a PriorityBlockingQueue, and only use remove and add (or, more directly, offer), then you will be safe, and you can even do it directly from the getQueue().
In other words, whenever your signal tells you that some Runnable's priority has changed, you should remove it and check the result of the remove (true if removed), and re-add it only if it was actually removed. You are not guaranteed that something won't be picked up in between those operations, but you are at least guaranteed that you will not double-run the Runnable, which could easily happen if done with contains -> remove -> add.
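The remove-then-re-add step could be sketched roughly like this. PrioritizedTask and Reprioritizer are hypothetical names, and the sketch assumes the executor was built with a PriorityBlockingQueue<Runnable> and that tasks are handed over with execute() rather than submit(), since submit() wraps them in a FutureTask that is not Comparable.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical task type: immutable priority, lower value runs first.
final class PrioritizedTask implements Runnable, Comparable<PrioritizedTask> {
    final int priority;
    private final Runnable body;

    PrioritizedTask(int priority, Runnable body) {
        this.priority = priority;
        this.body = body;
    }

    @Override
    public void run() {
        body.run();
    }

    @Override
    public int compareTo(PrioritizedTask other) {
        return Integer.compare(priority, other.priority);
    }

    // A new instance is created instead of mutating the priority in place,
    // because changing the ordering key of a queued element would corrupt the heap.
    PrioritizedTask withPriority(int newPriority) {
        return new PrioritizedTask(newPriority, body);
    }
}

final class Reprioritizer {
    // Returns true only if the task was still queued and has been re-added.
    static boolean reprioritize(BlockingQueue<Runnable> queue, PrioritizedTask task, int newPriority) {
        if (!queue.remove(task)) {
            return false; // already taken by a worker (or never queued): do not re-add
        }
        return queue.offer(task.withPriority(newPriority));
    }
}
```

Because the re-add only happens when remove() returned true, a task that a worker has already taken is never queued a second time.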
Either that, or you can write your own implementation of a BlockingQueue that uses a Comparator (like the PriorityBlockingQueue) that finds the highest priority whenever asked for new data. This sounds like a lot more work given the various interfaces involved.

Java concurrency - use which technique to achieve safety?

I have a list of personId. There are two API calls to update it (add and remove):
public void add(String newPersonName) {
    if (personNameIdMap.get(newPersonName) != null) {
        myPersonId.add(personNameIdMap.get(newPersonName));
    } else {
        // get the id from Twitter and add to the list
    }
    // make an API call to Twitter
}

public void delete(String personName) {
    if (personNameIdMap.get(personName) != null) {
        myPersonId.remove(personNameIdMap.get(personName));
    } else {
        // wrong person name
    }
    // make an API call to Twitter
}
I know there can be concurrency problem. I read about 3 solutions:
1) synchronize the method
2) use Collections.synchronizedList()
3) use CopyOnWriteArrayList
I am not sure which one to prefer to prevent the inconsistency.
1) synchronize the method
2) use Collections.synchronizedList
3) CopyOnWriteArrayList
All will work, it's a matter of what kind of performance / features you need.
Method #1 and #2 are blocking methods. If you synchronize the methods, you handle concurrency yourself. If you wrap a list in Collections.synchronizedList, it handles it for you. (IMHO #2 is safer -- just be sure to use it as the docs say, and don't let anything access the raw list that is wrapped inside the synchronizedList.)
CopyOnWriteArrayList is one of those weird things that has use in certain applications. It's a non-blocking quasi-immutable list, namely, if Thread A iterates through the list while Thread B is changing it, Thread A will iterate through a snapshot of the old list. If you need non-blocking performance, and you are rarely writing to the list, but frequently reading from it, then perhaps this is the best one to use.
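The snapshot behaviour can be seen in a few lines. SnapshotDemo is a made-up class name, and for simplicity the sketch uses a single thread: the iterator plays the role of Thread A and the add() call plays the role of Thread B.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class.
class SnapshotDemo {
    // Returns how many elements an iterator sees when the list
    // is modified after the iterator was created.
    static int countSeenByOldIterator() {
        List<String> names = new CopyOnWriteArrayList<>(Arrays.asList("ann", "bob"));
        Iterator<String> it = names.iterator(); // bound to the current backing array
        names.add("cid");                       // the write copies the array; 'it' is unaffected
        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        return seen; // the iterator still walks the two-element snapshot
    }
}
```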
edit: There are at least two other options:
4) use Vector instead of ArrayList; Vector implements List and is already synchronized. However, it's generally frowned upon, as it's considered an old-school class (it has been there since Java 1.0!), and should be equivalent to #2.
5) access the List serially from only one thread. If you do this, you're guaranteed not to have any concurrency problems with the List itself. One way to do this is to use Executors.newSingleThreadExecutor and queue up tasks one-by-one to access the list. This moves the resource contention from your list to the ExecutorService; if the tasks are short, it may be fine, but if some are lengthy they may cause others to block longer than desired.
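A rough sketch of option 5, with a hypothetical class ConfinedList: the plain ArrayList is only ever touched by the executor's single worker thread, so the list itself needs no locking. Exceptions are wrapped in IllegalStateException purely to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical wrapper that confines a list to one thread.
class ConfinedList {
    // Only accessed from tasks running on the single executor thread.
    private final List<String> items = new ArrayList<>();
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Queues the write; tasks run one at a time, in submission order.
    void add(String item) {
        executor.execute(() -> items.add(item));
    }

    // Queues a read behind all earlier writes and waits for its result.
    int size() {
        Callable<Integer> task = items::size;
        try {
            return executor.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    // Lets the worker thread exit once the queued tasks are done.
    void shutdown() {
        executor.shutdown();
    }
}
```

Callers of size() block until every task queued before it has finished, which is the contention trade-off mentioned above.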
In the end, you need to think about concurrency at the application level: treat thread safety as a requirement, then find the simplest design that gives you the performance you need.
On a side note, you're calling personNameIdMap.get(newPersonName) twice in add() and delete(). This suffers from concurrency problems if another thread modifies personNameIdMap between the two calls in each method. You're better off doing
PersonId id = personNameIdMap.get(newPersonName);
if (id != null) {
    myPersonId.add(id);
} else {
    // something else
}
Collections.synchronizedList is the easiest to use and probably the best option. It simply wraps the underlying list with synchronized. Note that multi-step operations (e.g. a for loop over the list) still need to be synchronized by you.
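For example, a sketch along these lines (PersonIds is a hypothetical class, with Long standing in for whatever id type the question uses): single calls such as add() are already synchronized by the wrapper, while the loop has to hold the list's own lock, as the Collections.synchronizedList javadoc requires for iteration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class; Long is an assumed stand-in for the real id type.
class PersonIds {
    // The raw ArrayList is never exposed; all access goes through the wrapper.
    private final List<Long> ids = Collections.synchronizedList(new ArrayList<>());

    void add(long id) {
        ids.add(id);                    // single call: synchronized by the wrapper
    }

    void remove(long id) {
        ids.remove(Long.valueOf(id));   // remove by value, not by index
    }

    // Iteration is a multi-step operation, so it must hold the list's lock
    // for its whole duration.
    long sum() {
        synchronized (ids) {
            long total = 0;
            for (long id : ids) {
                total += id;
            }
            return total;
        }
    }
}
```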
Some quick things
Don't synchronize the method unless you really need to - It just locks the entire object until the method completes; hardly a desirable effect
CopyOnWriteArrayList is a very specialized list that most likely you wouldn't want since you have an add method. It's essentially a normal ArrayList, but each time something is added the whole array is rebuilt, a very expensive task. It's thread-safe, but not really the desired result.
1) Synchronized is the old way of working with threads. Avoid it in favor of new idioms mostly expressed in the java.util.concurrent package.
2) See 1.
3) A CopyOnWriteArrayList has fast reads and slow writes. If you're making a lot of changes to it, it might start to drag on your performance.
Concurrency isn't about an isolated choice of what mechanism or type to use in a single method. You'll need to think about it from a higher level to understand all of its impacts.
Are you making changes to personNameIdMap within those methods, or any other data structures access to which should also be synchronized? If so, it may be easiest to mark the methods as synchronized; otherwise, you might consider using Collections.synchronizedList to get a synchronized view of myPersonId and then doing all list operations through that synchronized view. Note that you should not manipulate myPersonId directly in this case, but do all accesses solely through the list returned from the Collections.synchronizedList call.
Either way, you have to make sure that there can never be a situation where a read and a write or two writes could occur simultaneously to the same unsynchronized data structure. Data structures documented as thread-safe or returned from Collections.synchronizedList, Collections.synchronizedMap, etc. are exceptions to this rule, so calls to those can be put anywhere. Non-synchronized data structures can still be used safely inside methods declared to be synchronized, however, because such methods are guaranteed by the JVM to never run at the same time, and therefore there could be no concurrent reading / writing.
In your case from the code that you posted, all 3 ways are acceptable. However, there are some specific characteristics:
#3: This should have the same effect as #2 but may run faster or slower depending on the system and workload.
#1: This way is the most flexible. Only with #1 can you make the add() and delete() methods more complex. For example, if you need to read or write multiple items in the list, then you cannot use #2 or #3, because some other thread can still see the list being half updated.
Java concurrency (multi-threading) :
Concurrency is the ability to run several programs or several parts of a program in parallel. If a time-consuming task can be performed asynchronously or in parallel, this improves the throughput and the interactivity of the program.
Java supports concurrent programming through threads, immutability, the executor framework (thread pools), futures, callables and the fork-join framework.

synchronize versus Collection.synchronizedList versus CopyOnWriteArrayList

If my requirements dictate that most of my accesses to the list are for reading and modifications if any are going to be minimal, why can't I just do either of the following
1) synchronize the modifyList method and use ArrayList; all reads from the ArrayList will be unsynchronized
or
2) inside modifyList, do a Collections.synchronizedList(arrayList)
or
3) use CopyOnWriteArrayList (not sure what it buys here)
Why would I use any of these? Which is better?
For 1 & 2, I'm not sure what you're trying to accomplish by only synchronizing writes. If there are potential readers who might be iterating the list, or who are looking things up by index, then only synchronizing writes achieves nothing. The readers will still be able to read while writes are in progress and may see dirty data or get exceptions (ConcurrentModificationException or IndexOutOfBoundsException).
You would need to synchronize both your reads and writes if you want 'safe' iterating and getting while other threads make changes. At which point, you may as well have just used a Vector.
CopyOnWriteArrayList is purpose-built for what you want to do. It buys safe synchronization-free iterators, while substantially increasing the cost of writes. It also has the advantage of doing what you want (or what it seems you want from the terse question :) ), entirely encapsulated within the JavaSE API, which reduces 'surprise' for future developers.
(do note that if you have multi-step processes involving reads with 'get' even using CopyOnWriteArrayList may not be entirely safe. You need to evaluate what your code actually does and if an interleaving modification would break the method that is getting.)
Another solution would be to use ReentrantReadWriteLock so you can use read-only locks on read operations (which don't block other reads) and a write lock for when you're writing (which will block until there are no reads, and won't allow any read locks until it's released).
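A minimal sketch of that approach, wrapping a plain ArrayList. The class name ReadMostlyList is hypothetical, and only a few operations are shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical wrapper around a plain ArrayList.
class ReadMostlyList<T> {
    private final List<T> items = new ArrayList<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Writers take the exclusive write lock.
    void add(T item) {
        lock.writeLock().lock();
        try {
            items.add(item);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers share the read lock, so they do not block each other.
    T get(int index) {
        lock.readLock().lock();
        try {
            return items.get(index);
        } finally {
            lock.readLock().unlock();
        }
    }

    int size() {
        lock.readLock().lock();
        try {
            return items.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

As with CopyOnWriteArrayList, a sequence such as size() followed by get() is still two separate locked steps, so a write can land in between them.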
