Which concurrent List for multiple writers one reader? - java

I'm implementing a logging for multiple threads, each will write into a List. After all threads have finished I will then dump the contents of the List into a file. Which concurrent List implementation should I use?
I'm considering the ConcurrentLinkedQueue.
The writing will be concurrent, but the reading will be done by one thread, after all other threads have finished writing.
I could use a List for each thread but then I would have the overhead of managing multiple Lists and I'm not sure it is worth it. Another option would be a synchronized List.
Bonus question: How do I make the last thread dump the list into the file? See Multiple threads arrive, the last should do the processing.

Please - don't.
Java has a logging framework already built in, and it is fully customizable to create different handlers and formatters. If you don't like the standard Java logging framework for whatever reason there are still many others available. All these frameworks handle concurrency.
With the time you are spending on creating a framework that has already existed and perfected for a long time you could have created stuff that added something new to the world.

If you can in fact guarantee that any attempt to read the queue will happen "once all other threads have finished writing" then you do not need a concurrent queue at all.
All you need is a regular, run-of-the-mill, non-concurrent queue, and a factory of "log entry consumers" which dispenses a log entry consumer to each thread that needs to write to the queue.
The use of a consumer hides the queue, so the only thing that threads can do is append to the queue. The implementation of the consumer simply guarantees synchronized access to the insertion point of queue.
The queue then does not need to be synchronized because if it is only to be read "after all other threads have finished writing" then obviously, there will be no concurrent access happening while it is being read.
That having been said:
Are you sure about the "after all other threads have finished writing" part?
How do you define "all other threads have finished writing"?
How do you know that "all other threads have finished writing"?
Why is it important to you that the list is only read after "all other threads have finished writing"?

Related

Java, why need to use synchronization? instead of using a single thread?

While reading about Java synchronized, I just wondered, if the processing should be in synchronization, why not just creating a single thread (not main thread) and process one by one instead of creating multiple threads.
Because, by 'synchronized', all other threads will be just waiting except single running thread. It seems like the only single thread is working in the time.
Please advise me what I'm missing it.
I would very appreciate it if you could give some use cases.
I read an example, that example about accessing bank account from 2 ATM devices. but it makes me more confused, the blocking(Lock) should be done by the Database side, I think. and I think the 'synchronized' would not work in between multiple EC2 instances.
If my thinking is wrong, please fix me.
If all the code you run with several threads is within a synchronized block, then indeed it makes no difference vs. using a single thread.
However in general your code contains parts which can be run on several threads in parallel and parts which can't. The latter need synchronization but not the former. By using several threads you can speed up the "parallelisable" bits.
Let's consider the following use-case :
Your application is a internet browser game. Every player has a score and can click a button. Every time a player clicks the button, their score is increased and their opponent's is decreased. The first player to reach 10 wins.
As per the nature of the game, and to single a unique winner, you have to consider the two counters increase (and the check for the winner) atomically.
You'll have each player send clickEvents on their own thread and every event will be translated into the increase of the owner's counter, the check on whether the counter reached 10 and the decrease of the opponent's counter.
This is very easily done by synchronizing the method which handles modifying the counters : every concurrent thread will try to obtain the lock, and when they do, they'll execute the code (and finally release the lock).
The locking mechanism is pretty lightweight and only requires a single keyword of code.
If we follow your suggestion to implement another thread that will handle the execution, we'd have to implement the whole thread management logic (more code), to initialize that Thread (more resource) and even so, to guarantee fairness in the handling of events, you still need a way for your client threads to pass the event to your executor thread. The only way I see to do so, is to implement a BlockingQueue, which is also synchronized to prevent the race condition that naturally occurs when trying to add elements from two other thread.
I honnestly don't see a way to resolve this very simple use-case without synchronization (or implementing your own locking algorithm that basically does the same).
You can have a single thread and process one-by-one (and this is done), but there are considerable overheads in doing so and it does not remove the need for synchronization.
You are in a situation where you are starting with multiple threads (for example, you have lots of simultaneous web sessions). You want to do a part of the processing in a single thread - let's say updating some common structure with some new data. You need to pass the new data to the single thread - how do you get it there? You would have to use some kind of message queue (or an equivalent thing) and have the single thread pick requests off the message queue and that would have have to be synchronized anyway, plus there is the overhead of managing the queue, plus the issue that you need to get a reply back from the single thread asynchronously. So you are back to square one.
This technique is used where the processing you need to do is considerable and you don't want to block your main threads for a long time.
In summary: having a single thread does not remove the need for synchronization.

Does a 'blocking' queue defeat the very purpose of multi threading

The ArrayBlockingQueue will block the producer thread if the queue is full and it will block the consumer thread if the queue is empty.
Does not this concept of blocking goes against the very idea of multi threading? if I have a 'main' thread and let us say I want to delegate all 'Logging' activities to another thread. So Basically inside my main thread,I create a Runnable to log the output and I put the Runnable on an ArrayBlockingQueue. The whole purpose of doing this is have the 'main' thread return immediately without wasting any time in an expensive logging operation.
But if the queue is full, the main thread will be blocked and will wait until a spot is available. So how does it help us?
The queue doesn't block out of spite, it blocks to introduce an additional quality into the system. In this case, it's prevention of starvation.
Picture a set of threads, one of which produces work units really fast. If the queue were to be allowed unbounded growth, potentially, the "rapid producer" queue could hog all the producing capacity. Sometimes, prevention of such side-effects is more important than having all threads unblocked.
I think this is the designer's decision. If he chose blocking mode ArrayBlockingQueue provides it with put method. If the desiner dont want blocking mode ArrayBlockingQueue has offer method which will return false when queue is full but then he needs to decide what to do with regected logging event.
In your example I would consider blocking to be a feature: It prevents an OutOfMemoryError.
Generally speaking, one of your threads is just not fast enough to cope with the assigned load. So the others must slow down somehow in order not to endanger the whole application.
On the other hand, if the load is balanced, the queue will not block.
Blocking is a necessary function of multithreading. You must block to have synchronized access to data. It does not defeat the purpose of multithreading.
I would suggest throwing an exception when the producer attempts to submit an item to a queue which is full. There are methods to test if the capacity is full beforehand I believe.
This would allow the invoking code to decide how it wants to handle a full queue.
If execution order when processing items from the queue is unimportant, I recommend using a threadpool (known as an ExecutorService in Java).
It depends on the nature of your multi threading philosophy. For those of us who favour Communicating Sequential Processes a blocking queue is nearly perfect. In fact, the ideal would be one where no message can be put into the queue at all unless the receiver is ready to receive it.
So no, I don't think that a blocking queue goes against the very purpose of multi-threading. In fact, the scenario that you describe (the main thread eventually getting stalled) is a good illustration of the major problem with the actor-model of multi-threading; you've no idea whether or not it will deadlock / block, and you can't exhaustively test for it either.
In contrast, imagine a blocking queue that is zero messages deep. That way for the system to work at all you'd have to find a way to ensure that the logger is always guaranteed to be able to receive a message from the main thread. That's CSP. It might mean that in your hypothetical logger thread you have to have application defined buffering (as opposed to some framework developer's best guess of how deep a FIFO should be), a fast I/O subsystem, checks for keeping up, ways of dealing with falling behind, etc. In short it doesn't let you get away with it, you're forced to address every aspect of your system's performance.
That is of course harder, but that way you end up with a system that's definitely OK rather than the questionable "maybe" that you have if your blocking queues are an unknown number of messages deep.
It sounds like you have the general idea right of why you'd use something like an ArrayBlockingQueue to talk between threads.
Having a blocking queue gives you the option to do something different in case something goes wrong with your background worker threads, rather than blindly adding more requests to the queue. If there is room in the queue, there is no blocking.
For your specific use case, though, I would use ExecutorService rather than reading/writing queues directly, which creates a pool of background worker threads:
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ExecutorService.html
pool = Executors.newFixedThreadPool(poolSize);
pool.submit(myRunnable);
A multithreaded program is non-deterministic insofar as you can't say beforehand: n producer actions will take exactly as long as m consumer actions. Therefore, synchronization between n producers and m consumers is necessary in every case.
You'll want to choose the queue size so that the number of active producers and consumers is maximized most of the time. But the thread model of java does not guarantee that any consumer will run unless it is the only unblocked thread. (Yet, of course, on multi-core CPUs it is very likely that the consumer will run).
You have to make a choice about what to do when a Queue is full. In the case of an Array Blocking queue, that choice is to wait.
Another option would be to just throw away new Objects if the queue was full; you can achieve this with offer.
You have to make a trade-off.

Java, blocked queues

I've been reading about blocking queues and certain questions appeared. All the examples that i've read demonstrated only situations where there are only one consumer and one producer thread. The question is: suppose we have 1 producer and 3 consumers and in the current moment all consumers are called take() method but the queue is empty so they are all waiting for appearing first element. Which of the consumer threads will take the first element when it will appear? The consumer thread which called take() first?
I don't know if you can tell. The real question is: why do you need to know? All listeners should be equivalent. It should not matter which one handles a request. If you have to know, you designed and implemented it incorrectly.
check ArrayBlockingQueue(int capacity, boolean fair) if fair is true,then the queue accesses for threads blocked on insertion or removal, are processed in FIFO order.
Which of the consumer threads will take the first element when it will appear? The consumer thread which called take() first?
This is tied to the blocking queue implementation as well as the JVM in question but the short answer is most likely yes. Each of the threads will be waiting on a condition and the first thread in the wait queue will be awoken when the condition is signaled.
That said, you should not depend on this functionality since it is very dependent on the particulars of the blocking queue in question as well as the JVM and OS version.
I agree with duffymo, the idea of having multiple threads waiting indefinitely for some new elements to pop up in the queue does not sound very well structured.
Also, if you need to know which one of the consumers remove the element, that makes me think that the consumers are actually doing different things, giving life to different ouputs on different scenarios, depending on the order with which the consumers perform the take(). If that is the case you might want to have different queues for the different threads.
If you are not planning to change your code, what about having the threads to perform a poll on regular basis?

How to access the underlying queue of a ThreadpoolExecutor in a thread safe way

The getQueue() method provides access to the underlying blocking queue in the ThreadPoolExecutor, but this does not seem to be safe.
A traversal over the queue returned by this function might miss updates made to the queue by the ThreadPoolExecutor.
"Method getQueue() allows access to the work queue for purposes of monitoring and debugging. Use of this method for any other purpose is strongly discouraged."
What would you do if you wanted to traverse the workQueue used by the ThreadPoolExecutor? Or is there an alternate approach?
This is a continuation of..
Choosing a data structure for a variant of producer consumer problem
Now, I am trying the multiple producer multiple consumer, but I want to use some existing threadpool, since I don't want to manage the threadpool myself, and also I want a callback when ThreadPoolExecutor has finished executing some task alongwith the ability to examine in a thread safe way the "inprogress transactions" data structure.
You can override the beforeExecute and afterExecute methods to let you know that a task has started and finished. You can override execute() to know when a task is added.
The problem you have is that the Queue is not designed to be queried and a task can be consumed before you see it. One way around this is to create you own implementation of a Queue (perhaps overriding/wrapping a ConcurrentLinkedQueue)
BTW: The queue is thread-safe, however it is not guaranteed you will see every entry.
A ConcurrentLinkedQueue.iterator() is documented as
Returns an iterator over the elements in this queue in proper sequence. The returned iterator is a "weakly consistent" iterator that will never throw ConcurrentModificationException, and guarantees to traverse elements as they existed upon construction of the iterator, and may (but is not guaranteed to) reflect any modifications subsequent to construction.
If you wish to copy the items in the queue and ensure that what you have in the queue has not been executed, you might try this:
a) Introduce the ability to pause and resume execution. See: http://download.oracle.com/javase/1,5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.html
b) first pause the queue, then copy the queue, then resume the queue.
And then i have my own question. The problem i see is that while you execute your "Runnable", that "Runnable" is not placed in the queue, but a FutureTask "wrapper", and i cannot find any way to determine just which one of my runnables i'm looking at. So, grabbing and examining the queue is pretty useless. Does anybody know aht i missed there?
If you are following Jon Skeet's advice in your accepted answer from your previous question, then you'll be controlling access to your queues via locks. If you acquire a lock on the in-progress queue then you can guarantee that a traversal will not miss any items in it.
The problem with this of course is that while you are doing the traverse all other operations on the queue (other producers and consumers trying to access it) will block, which could have a pretty dire effect on performance.

What collection supports multiple simultaneous insertions?

We are developing a Java application with several worker threads. These threads will have to deliver a lot of computation results to our UI thread. The order in which the results are delivered does not matter.
Right now, all threads simply push their results onto a synchronized Stack - but this means that every thread must wait for the other threads before results can be delivered.
Is there a data structure that supports simultaneous insertions with each insertion completing in constant time?
Thanks,
Martin
ConcurrentLinkedQueue is designed for high contention. Producers enqueue stuff on one end and consumers collect elements at the other end, so everything will be processed in the order it's added.
ArrayBlockingQueue is a better for lower contention, with lower space overhead.
Edit: Although that's not what you asked for. Simultaneuos inserts? You may want to give every thread one output queue (say, an ArrayBlockingQueue) and then have the UI thread poll the separate queues. However, I'd think you'll find one of the two above Queue implementations sufficient.
Right now, all threads simply push
their results onto a synchronized
Stack - but this means that every
thread must wait for the other threads
before results can be delivered.
Do you have any evidence indicating that this is actually a problem? If the computation performed by those threads is even the least little bit complex (and you don't have literally millions of threads), then lock contention on the result stack is simply a non-issue because when any given thread delivers its results, all others are most likely busy doing their computations.
Take a step back and evaluate whether performance is the key design consideration here. Don't think, know: does profiling back it up?
If not, I'd say a bigger concern is clarity and readability of design, and not introducing new code to maintain. It just so happens that, if you're using Swing, there is a library for doing exactly what you're trying to do, called SwingWorker.
Take a look at java.util.concurrent.ConcurrentLinkedQueue, java.util.concurrent.ConcurrentHashMap or java.util.concurrent.ConcurrentSkipListSet. They might do what you need. ConcurrentSkipListSet, for instance, claims to have "expected average log(n) time cost for the contains, add and remove operations and their variants. Insertion, removal, and access operations safely execute concurrently by multiple threads."
Two other patterns you might want to look at are
each thread has its own collection, when polled it returns the collection and creates a new one, so the collection only holds the pending items between polls. The thread needs to protect operations on its collection, but there is no contention between threads. This is blocking (each thread cannot add to its collection while the UI thread pulls updates from it), but can reduce contention (no contention between threads).
each thread has its own collection, and appends the results to a common queue which is protected using a Lock.tryLock(). The thread continues processing if it fails to acquire the lock. This makes it less likely that a thread will block waiting for the shared queue.

Categories

Resources