Letting a thread wait vs stopping and starting - java

I have a consumer thread blocking on removing from a queue.
There are going to be periods during which I know nothing will be added to the queue.
My question is: is it worth adding the complexity of managing when to start/stop the thread, or should I just leave it waiting until queue starts getting elements again?

If the concurrent queue implementation that you're using is worth its salt, then the thread will not be busy-waiting for long. Some implementations spin briefly for performance reasons, but after that the thread blocks and does not consume CPU cycles. At that point the difference between a stopped thread and a blocked thread is more or less meaningless.
Use a concurrent queue. See Which concurrent Queue implementation should I use in Java?
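For illustration, here is a minimal sketch of such a consumer (the class, queue type, and method names are made up for the example): the thread simply blocks inside take() during quiet periods and costs essentially nothing while it waits.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueConsumer implements Runnable {
    private final BlockingQueue<String> queue;

    QueueConsumer(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                // take() blocks without burning CPU until an element arrives,
                // so the thread can simply sit here during quiet periods
                String item = queue.take();
                process(item);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore interrupt status and exit
        }
    }

    private void process(String item) {
        System.out.println("Consumed: " + item);
    }
}
Started with something like new Thread(new QueueConsumer(new LinkedBlockingQueue<>())).start(), this thread can be left alone indefinitely; stopping and restarting it would only save the small footprint of an idle, blocked thread.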

When dealing with multithreading, it's best practice to act only when you have an actual performance problem. Otherwise I would just leave it as it is to avoid trouble.

I don't think there is a big impact on performance, since the thread is blocked (waiting inactively) rather than spinning. It could make sense if the thread is holding expensive resources that could be released for that time. I would keep this as simple as possible; especially in a concurrent environment, complexity can lead to strange errors.

Related

Why prefer wait/notify to while cycle?

I have some misunderstanding about the advantages of wait/notify. As I understand it, the processor core will do nothing helpful in either case, so what's the reason to write complex wait/notify blocks instead of just waiting in a cycle?
I'm clear that wait/notify will not steal processor time in the case when two threads are executed on only one core.
"Waiting in a cycle" is most commonly referred to as a "busy loop" or "busy wait":
while ( ! condition()) {
// do nothing
}
workThatDependsOnConditionBeingTrue();
This is very disrespectful to other threads or processes that may need CPU time (it takes 100% of that core's time if it can). So there is another variant:
while ( ! condition()) {
sleepForShortInterval();
// do nothing
}
workThatDependsOnConditionBeingTrue();
The small sleep in this variant will drop CPU usage dramatically, even if it is ~100ms long, which should not be noticeable unless your application is real-time.
Note that there will generally be a delay between the moment the condition actually becomes true and the moment sleepForShortInterval() ends. If you sleep longer, to be more polite to others, the delay will increase. This is generally unacceptable in real-time scenarios.
The nice way to do this, assuming that whatever condition() is checking is being changed from another thread, is to have the other thread wake you up when it finishes whatever you are waiting for. Cleaner code, no wasted CPU, and no delays.
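A minimal sketch of that approach with wait/notify (the lock and flag names are illustrative; workThatDependsOnConditionBeingTrue is the same placeholder as above):
private final Object lock = new Object();
private boolean conditionIsTrue = false;   // guarded by 'lock'

// waiting thread
synchronized (lock) {
    while (!conditionIsTrue) {     // loop also guards against spurious wakeups
        lock.wait();               // releases 'lock' and sleeps until notified
    }
    workThatDependsOnConditionBeingTrue();   // still holding the lock, so the flag cannot flip underneath us
}

// signalling thread, once it has made the condition true
synchronized (lock) {
    conditionIsTrue = true;
    lock.notifyAll();              // wakes up the waiting thread(s)
}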
Of course, it's quicker to implement a busy wait, and it may be justified for quick'n'dirty situations.
Beware that, in a multithreaded scenario where condition() can be changed back to false as well as to true, you will need to protect the code between the while loop and workThatDependsOnConditionBeingTrue() so that other threads cannot change the condition's value at precisely that point (this is called a race condition, and it is very hard to debug after the fact).
I think you almost answered your own question by saying
I'm clear that wait/notify will not steal processor time in case.
The only thing I would add is that this is true irrespective of one core or multiple cores. wait/notify won't keep the CPU in a busy-wait situation, as opposed to a while loop or a periodic check.
What's the reason not to run the core but wait? There's no helpful work in either case, and you're unable to use the core while it's in the waiting state.
I think you are looking at it from a single-application perspective, where only one application with one thread is running. Think of it from a real-world application perspective (web/app servers or standalone applications), where many threads are running and competing for CPU cycles, and you can see the advantage of wait/notify. You would definitely not want even a single thread to busy-wait and burn CPU cycles.
Even if it is a single application/thread running on the system, there are always OS processes and their related services competing for CPU cycles. You don't want to starve them because your application is busy-waiting in a while loop.
Quoting from Gordon's comment:
By waiting in a cycle, as you suggest, you are constantly checking whether the thing you are waiting for has finished, which is wasteful, and if you use sleeps you are just guessing with the timing; whereas with wait/notify you sit idle until the process you are waiting on tells you it is finished.
In general, your application is not the only one running on the CPU. Using non-spinning waiting is, first of all, an act of courtesy towards the other processes and threads competing for the CPU to do useful work. The CPU scheduler cannot know a priori whether your thread is going to do something useful or just spin on a false flag, so it cannot tune itself accordingly unless you tell it you don't want to be run because there's nothing for you to do.
Indeed, busy-waiting is faster than putting the thread to sleep, which is why wait() is usually implemented in a hybrid way: it first spins for a while, and then actually goes to sleep.
Besides, it's not just waiting in a loop: you still need to synchronize access to the resources you're spinning on, otherwise you'll fall victim to race conditions.
If you feel the need of a simpler interface, you might also consider using CyclicBarrier, CountDownLatch or a SynchronousQueue.
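For instance, a CountDownLatch lets one thread wait idly until another signals that the work is done. A hedged sketch, reusing the placeholder from above:
import java.util.concurrent.CountDownLatch;

CountDownLatch conditionMet = new CountDownLatch(1);

// waiting thread: blocks idly, no spinning and no polling interval to tune
conditionMet.await();
workThatDependsOnConditionBeingTrue();

// other thread, once the condition has been made true
conditionMet.countDown();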

Are there any disadvantages of using a thread pool?

I know the thread pool is a good thing because it can reuse threads and thus save the cost of creating new threads. But my question is, are there any disadvantages of using a thread pool? In which situation is using a thread pool not as good as using just individual threads?
In which situation is using a thread pool not as good as using just individual threads?
The only time I can think of is when you have a single thread that only needs to do a single task for the life of your program. Something like a background thread attached to a permanent cache, or similar. That's about the only time I fork a thread directly as opposed to using an ExecutorService. Even then, using Executors.newSingleThreadExecutor() would be fine. The overhead of the thread pool itself is maybe a bit more logic and some memory, but it's very hard to see a pressing downside.
Certainly anytime you need multiple threads to perform tasks, a thread pool is warranted. What the ExecutorService code does is reduce the amount of code you need to write to manage the threads. The improvements in readability and code maintainability are a big win.
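A hedged sketch of how little thread-management code that leaves you with (the pool size and tasks are invented for illustration; it uses ExecutorService, Executors and TimeUnit from java.util.concurrent):
ExecutorService pool = Executors.newFixedThreadPool(4);    // 4 worker threads, reused for every task
for (int i = 0; i < 100; i++) {
    final int taskId = i;
    pool.submit(() -> System.out.println("task " + taskId));   // tasks queue up and run on the pooled threads
}
pool.shutdown();                              // stop accepting new tasks
pool.awaitTermination(1, TimeUnit.MINUTES);   // wait for the submitted tasks to drain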
A thread pool is suitable only for operations that take a short time to complete. Thread-pool threads are not suitable for long-running operations, as that can easily lead to thread starvation.
If you require your thread to have a specific priority, then a thread-pool thread is not suitable.
You have tasks that cause the thread to block for long periods of time. The thread pool has a maximum number of threads, so a large number of blocked thread pool threads might prevent tasks from starting.
You've got a bunch of different answers here. I think one reason for that is the question is incomplete. You are asking for "disadvantages of using a thread pool," but you didn't say, disadvantages compared to what?
A thread pool solves a particular problem. There are other problems where "thread" or "threads" is part of the solution, but "thread pool" is not. "Thread pool" usually is the answer, when the question is, how to achieve parallel execution of many, short-lived, CPU-intensive tasks, on a multi-processor system.
Threads are useful, even on a uni-processor, for other purposes. The first question I ask about any long-running thread, for example, is "what does it wait for." Threads are an excellent tool for organizing a program that has to wait for different kinds of event. You would not use a thread pool for that, though.
In addition to Gray's answer.
Another use case is if you are using thread-locals, using the thread as the key of some kind of hash table, or using a stateful custom Thread implementation. In this case you have to take care of cleaning up that state when a particular task has finished using the thread, even if the task failed. Otherwise surprises are possible: the next task that runs on a thread carrying stale state can start behaving incorrectly.
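For example, a task that sets a ThreadLocal on a pooled thread should clear it in a finally block, otherwise the next task reusing that thread inherits the stale value. A sketch, where REQUEST_ID and handleRequest are illustrative names:
private static final ThreadLocal<String> REQUEST_ID = new ThreadLocal<>();

Runnable task = () -> {
    REQUEST_ID.set("req-42");        // state bound to whichever pooled thread runs this task
    try {
        handleRequest();
    } finally {
        REQUEST_ID.remove();         // clean up even on failure, or the next task on this thread inherits it
    }
};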
Thread pools of limited size are dangerous if the tasks running on them exchange information via blocking queues; this may cause thread starvation: What is starvation?. A good rule is to never use a blocking operation in tasks running on a thread pool.
Threads are better when you don't plan to stop using the thread, for instance in an infinite loop. Thread pools are best when doing many tasks that don't all happen at the same time. Especially when the tasks are short, the relative overhead of creating a new thread per task is larger, so reusing pooled threads pays off more and keeps the code clearer.
It depends on the situation in which you are going to use the thread pool. For example, if your system does not need to perform tasks in parallel, a thread pool would be of no use; it would keep unnecessary threads ready for work that will never come. In such cases you can use a SingleThreadExecutor anyway. Check this link if you haven't already; it may clarify things: Thread Pool Pattern

Does a 'blocking' queue defeat the very purpose of multi threading

The ArrayBlockingQueue will block the producer thread if the queue is full and it will block the consumer thread if the queue is empty.
Doesn't this concept of blocking go against the very idea of multithreading? Let us say I have a 'main' thread and I want to delegate all 'Logging' activities to another thread. So basically, inside my main thread I create a Runnable to log the output, and I put the Runnable on an ArrayBlockingQueue. The whole purpose of doing this is to have the 'main' thread return immediately without wasting any time on an expensive logging operation.
But if the queue is full, the main thread will be blocked and will wait until a spot is available. So how does it help us?
The queue doesn't block out of spite, it blocks to introduce an additional quality into the system. In this case, it's prevention of starvation.
Picture a set of threads, one of which produces work units really fast. If the queue were allowed to grow without bound, that "rapid producer" could potentially hog all of the available capacity. Sometimes, preventing such side effects is more important than keeping all threads unblocked.
I think this is the designer's decision. If they choose blocking mode, ArrayBlockingQueue provides it with the put method. If the designer doesn't want blocking mode, ArrayBlockingQueue has the offer method, which returns false when the queue is full, but then they need to decide what to do with the rejected logging event.
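A sketch of the two options (LogEvent, event and handleRejected are just placeholders; put() can also throw InterruptedException, omitted here for brevity):
ArrayBlockingQueue<LogEvent> queue = new ArrayBlockingQueue<>(1000);

// blocking mode: put() waits until space frees up
queue.put(event);

// non-blocking mode: offer() returns false immediately when the queue is full,
// and the producer must decide what to do with the rejected event
if (!queue.offer(event)) {
    handleRejected(event);   // e.g. drop it, count it, or log synchronously
}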
In your example I would consider blocking to be a feature: It prevents an OutOfMemoryError.
Generally speaking, one of your threads is just not fast enough to cope with the assigned load. So the others must slow down somehow in order not to endanger the whole application.
On the other hand, if the load is balanced, the queue will not block.
Blocking is a necessary function of multithreading. You must block to have synchronized access to data. It does not defeat the purpose of multithreading.
I would suggest throwing an exception when the producer attempts to submit an item to a queue which is full. I believe there are methods to test beforehand whether the queue is at capacity.
This would allow the invoking code to decide how it wants to handle a full queue.
If execution order when processing items from the queue is unimportant, I recommend using a threadpool (known as an ExecutorService in Java).
It depends on the nature of your multi threading philosophy. For those of us who favour Communicating Sequential Processes a blocking queue is nearly perfect. In fact, the ideal would be one where no message can be put into the queue at all unless the receiver is ready to receive it.
So no, I don't think that a blocking queue goes against the very purpose of multi-threading. In fact, the scenario that you describe (the main thread eventually getting stalled) is a good illustration of the major problem with the actor-model of multi-threading; you've no idea whether or not it will deadlock / block, and you can't exhaustively test for it either.
In contrast, imagine a blocking queue that is zero messages deep. That way for the system to work at all you'd have to find a way to ensure that the logger is always guaranteed to be able to receive a message from the main thread. That's CSP. It might mean that in your hypothetical logger thread you have to have application defined buffering (as opposed to some framework developer's best guess of how deep a FIFO should be), a fast I/O subsystem, checks for keeping up, ways of dealing with falling behind, etc. In short it doesn't let you get away with it, you're forced to address every aspect of your system's performance.
That is of course harder, but that way you end up with a system that's definitely OK rather than the questionable "maybe" that you have if your blocking queues are an unknown number of messages deep.
It sounds like you have the general idea right of why you'd use something like an ArrayBlockingQueue to talk between threads.
Having a blocking queue gives you the option to do something different in case something goes wrong with your background worker threads, rather than blindly adding more requests to the queue. If there is room in the queue, there is no blocking.
For your specific use case, though, I would use ExecutorService rather than reading/writing queues directly, which creates a pool of background worker threads:
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ExecutorService.html
ExecutorService pool = Executors.newFixedThreadPool(poolSize);
pool.submit(myRunnable);
A multithreaded program is non-deterministic insofar as you can't say beforehand: n producer actions will take exactly as long as m consumer actions. Therefore, synchronization between n producers and m consumers is necessary in every case.
You'll want to choose the queue size so that the number of active producers and consumers is maximized most of the time. But the thread model of Java does not guarantee that any particular consumer will run unless it is the only unblocked thread. (Of course, on multi-core CPUs it is very likely that the consumer will run.)
You have to make a choice about what to do when a queue is full. In the case of an ArrayBlockingQueue, that choice is to wait.
Another option would be to just throw away new Objects if the queue was full; you can achieve this with offer.
You have to make a trade-off.

How to avoid both deadlocks and using too many threads?

Using Executors.newFixedThreadPool(int nThreads) is a nice way to minimize the overhead of creating too many threads, but it may lead to a deadlock if all the pooled threads are waiting for another job which is itself waiting for a free thread from the pool. Sometimes the problem can be solved by using multiple thread pools, but sometimes it can't. I'm looking for something that behaves similarly to newFixedThreadPool, except that when all pooled threads are blocked, the pool should grow despite its predefined bound. Is there something like this?
Actually, the deadlock is not that important here. The real problem is "how to manage the number of running threads" rather than their total number. This can also be interesting when trying to keep the CPU fully utilized without creating needlessly many threads.
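To make the deadlock concrete, here is a minimal sketch (the pool size and tasks are invented for illustration) in which the pool's only thread blocks on a sub-task that can never be scheduled:
ExecutorService pool = Executors.newFixedThreadPool(1);

Future<String> outer = pool.submit(() -> {
    Future<String> inner = pool.submit(() -> "done");   // needs a free pooled thread to ever run
    return inner.get();   // blocks the pool's only thread waiting for 'inner': deadlock
});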
If you have a contention issue, this is a design problem. If you want a quick fix as you described, you will only be curing the symptoms, not the underlying sickness.
You should instead refactor your design to eliminate deadlock using some other means.
It's generally a bad idea to have threads in a pool blocked waiting for other threads in the same thread pool.
I would try to change the design to a non-blocking one. If a thread needs the result of another operation that is being processed by the same executor I would have it submit a task back to the executor to run after the second operation completes. Or place an object into a queue to be picked up later when the other job finishes.
Alternatively you can do what Swing does with modal dialogs and have the thread that is about to block start up a child thread to keep processing requests until the parent thread unblocks. This is tricky to get right though and would require you to manually manage the threads which is a lot less safe than using an Executor.
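A sketch of the non-blocking approach from the first suggestion, using CompletableFuture so the follow-up work is scheduled back onto the pool instead of a thread blocking on get() (loadData and processData are illustrative names):
ExecutorService pool = Executors.newFixedThreadPool(4);

CompletableFuture
    .supplyAsync(() -> loadData(), pool)                  // the operation we would otherwise block on
    .thenAcceptAsync(data -> processData(data), pool);    // follow-up runs on the pool when it completes, nothing blocks waiting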
Executors.newCachedThreadPool(): a cached thread pool will check whether there are any available threads. If there are, the pool reuses one; if there aren't, it creates a new thread. The threads' keep-alive time is 60 seconds, so after 60 seconds of idleness the extra threads are terminated.

How expensive is synchronization?

I am writing a networking application using the java.nio api. My plan is to perform I/O on one thread, and handle events on another. To do this though, I need to synchronize reading/writing so that a race condition is never met.
Bearing in mind that I need to handle thousands of connections concurrently, is synchronization worth it, or should I use a single thread for I/O and event handling?
What kind of event handling are you doing? Where is the likely bottleneck? Do you even have a bottleneck?
Start with the simplest implementation and optimize away bottlenecks once you know them.
If you find that your network IO thread isn't reading fast enough because it spends too much time on event handling, then create a buffer queue, synchronize on it, and have an event-handling thread work through the queue.
You might want to set a limit on the size of the queue so you don't end up running out of memory though. If the network thread is about to overfill the queue, have it wait until there's more space.
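A sketch of that hand-off with a bounded queue (the capacity, element type, and handle method are made up): the network thread's put() blocks only when the queue is full, and the event-handling thread drains it with take().
BlockingQueue<ByteBuffer> events = new ArrayBlockingQueue<>(1024);

// network I/O thread
events.put(buffer);          // blocks only when the handler has fallen 1024 buffers behind

// event-handling thread
while (true) {
    ByteBuffer next = events.take();   // waits idly while there is nothing to handle
    handle(next);
}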
Premature optimization isn't fun for anyone.
However, to answer your question, synchronization between two threads is not likely going to be a bottleneck and you shouldn't worry about its overhead.
I think the efficiency of it will depend on how granular the synchronized section is.
