I have a bit of an issue with an application running multiple Java threads.
The application runs a number of working threads that peek continuously at an input queue and if there are messages in the queue they pull them out and process them.
Alongside those working threads there is a verification thread scheduled to check, at a fixed period, whether the host (on which the application runs) is still in "good shape" to run the application. This thread updates an AtomicBoolean value, which the working threads check before they start peeking to see if the host is OK.
My problem is that under high CPU load the verification thread takes longer because it has to compete with all the other threads. If the AtomicBoolean is not updated within a certain period, it is automatically set to false, which causes a nasty bottleneck.
My initial approach was to increase the priority of the verification thread, but digging into it deeper I found that this is not a guaranteed behavior and an algorithm shouldn't rely on thread priority to function correctly.
Anyone got any alternative ideas? Thanks!
Instead of peeking into a regular queue data structure, use the java.util.concurrent package's LinkedBlockingQueue.
What you can do is run a pool of threads (for example an ExecutorService fixed thread pool, with a number of workers of your choice) and have each worker call LinkedBlockingQueue.take().
If a message arrives at the queue, it is fed to one of the waiting threads (yeah, take does block the thread until there is something to be fed with).
Java API Reference for Linked Blocking Queue's take method
HTH.
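A minimal sketch of that setup, assuming String messages and a pool of four workers. The class and method names are made up for illustration, and the counter stands in for real processing:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BlockingWorkers {
    /** Starts `workers` threads that block in take(); returns how many messages were processed. */
    static int process(int workers, int messages) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicInteger processed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(messages);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        queue.take();                // parks the thread while the queue is empty
                        processed.incrementAndGet(); // stand-in for real message processing
                        done.countDown();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // shutdownNow() ends the worker loop
                }
            });
        }
        try {
            for (int i = 0; i < messages; i++) {
                queue.put("message-" + i);
            }
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println("processed " + process(4, 100) + " messages");
    }
}
```

Because idle workers are parked inside take() instead of polling, they should leave more CPU time for the verification thread when the queue is empty.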
One old-school approach to throttling the rate of work, which does not use a health check thread at all (and so bypasses these problems), is to block or reject requests to add to the queue if the queue is longer than, say, 100. This applies dynamic back pressure onto the clients generating the load, slowing them down when the worker threads are overloaded.
This approach was added to the Java 1.5 library, see java.util.concurrent.ArrayBlockingQueue. Its put(o) method blocks if the queue is full.
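A rough sketch of the blocking variant, assuming a capacity of 100 and a single consumer. The class and method names are invented for the example:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BackPressure {
    /** Moves `items` values through a bounded queue; returns the deepest backlog the consumer saw. */
    static int maxDepth(int capacity, int items) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < items; i++) {
                    queue.put(i); // blocks whenever the queue already holds `capacity` items
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        int deepest = 0;
        try {
            for (int consumed = 0; consumed < items; consumed++) {
                deepest = Math.max(deepest, queue.size());
                queue.take(); // stand-in for a worker processing one message
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return deepest;
    }

    public static void main(String[] args) {
        System.out.println("deepest backlog: " + maxDepth(100, 10_000));
    }
}
```

However fast the producer is, the backlog cannot grow past the capacity, because put(o) makes the producer wait for the consumer.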
Are you using the Executor framework (from Java's concurrency package)? If not, give it a shot. You could try using a ScheduledExecutorService for the verification thread.
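Something along these lines might work as a starting point. HostHealthMonitor and the check passed to start() are hypothetical; the real probe would come from the application. Note that this only takes care of the scheduling, it does not by itself guarantee the check gets CPU time under load:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

class HostHealthMonitor {
    private final AtomicBoolean hostOk = new AtomicBoolean(true);
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    /** Runs `check` (a hypothetical application-supplied probe) every `periodMillis`. */
    void start(BooleanSupplier check, long periodMillis) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                hostOk.set(check.getAsBoolean());
            } catch (RuntimeException e) {
                hostOk.set(false); // an uncaught exception would cancel all future runs
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Workers call this before pulling the next message. */
    boolean isHostOk() {
        return hostOk.get();
    }

    void stop() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        HostHealthMonitor monitor = new HostHealthMonitor();
        monitor.start(() -> Runtime.getRuntime().freeMemory() > 0, 100);
        Thread.sleep(300);
        System.out.println("host ok: " + monitor.isHostOk());
        monitor.stop();
    }
}
```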
More threads do not mean better performance. For CPU-bound work, throughput usually peaks at roughly one thread per core: about 2 threads on a dual core, about 4 on a quad core, and so on; beyond that it tends to get worse. So be careful how many threads you use.
You can put the worker threads to sleep after they perform their work, so that other threads get a turn. I believe Thread.yield() is meant for this too, but it is only a hint that the scheduler is free to ignore.
If you want your thread to run continuously, I would suggest creating two main threads, thread A and B. Use A for the verification thread, and from B, create the other threads. Therefore thread A gets more execution time.
It seems you need to use Condition variables. Peeking continuously burns CPU cycles.
http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/locks/Condition.html
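For illustration, here is a small hand-rolled mailbox built on a Condition (Mailbox is a made-up name). It uses awaitUninterruptibly() to keep the sketch short; code that needs to be cancellable would normally use await() and handle InterruptedException:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class Mailbox<T> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();
    private final Queue<T> items = new ArrayDeque<>();

    void put(T item) {
        lock.lock();
        try {
            items.add(item);
            notEmpty.signal(); // wake one waiting consumer
        } finally {
            lock.unlock();
        }
    }

    T take() {
        lock.lock();
        try {
            while (items.isEmpty()) {
                // Releases the lock and parks the thread; no CPU is spent peeking.
                notEmpty.awaitUninterruptibly();
            }
            return items.remove();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Mailbox<String> box = new Mailbox<>();
        Thread consumer = new Thread(() -> System.out.println("got " + box.take()));
        consumer.start();
        box.put("hello");
        consumer.join();
    }
}
```

This is essentially what the blocking queues in java.util.concurrent already do internally, so in practice one of those is usually the simpler choice.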
I'm trying to run a number of jobs concurrently using Java's ForkJoinPool. The main task (which is already running in the pool) spawns all the jobs and then does a series of joins. I was sure that a task calling join would free the thread it is running in, but it seems like it is actually blocked on it, and therefore it is "wasting" the thread, i.e., since the number of threads equals the number of CPU cores, one core will be inactive.
I know that if I run invokeAll instead, then the first of the sub-jobs gets to run in the same thread, and indeed this works. However, this seems sub-optimal, because if the first task is actually a very fast one, I have the same problem: one of the threads is blocked waiting on join. There are more jobs than threads, so I would rather another one of the jobs gets started.
I can try to work around all this manually, but it's not so nice, and it seems like I am redoing what ForkJoinPool is supposed to do.
So the question is: am I understanding ForkJoinPool wrong? Or, if what I'm saying is correct, is there a simple way to utilize the threads more efficiently?
ForkJoinPool is designed to prevent you having to think about thread utilization in this way. The 'work stealing' algorithm ensures that each thread is always busy so long as there are tasks in the queue.
Check out these notes for a high-level discussion:
https://www.dre.vanderbilt.edu/~schmidt/cs891f/2018-PDFs/L4-ForkJoinPool-pt3.pdf
To see the ugly details go down the rabbit hole of the ForkJoinPool#awaitJoin source.
Roughly, if I'm reading the (very complex) code correctly: When a thread joins a sub-task, it attempts to complete that task itself, otherwise if the sub-task's worker queue is non-empty (i.e. it is also depending on other tasks), the joining thread repeatedly attempts to complete one of those tasks, via ForkJoinPool#tryHelpStealer, whose Javadoc entry provides some insight:
Tries to locate and execute tasks for a stealer of the given
task, or in turn one of its stealers, Traces currentSteal ->
currentJoin links looking for a thread working on a descendant
of the given task and with a non-empty queue to steal back and
execute tasks from. The first call to this method upon a
waiting join will often entail scanning/search, (which is OK
because the joiner has nothing better to do), but this method
leaves hints in workers to speed up subsequent calls. The
implementation is very branchy to cope with potential
inconsistencies or loops encountering chains that are stale,
unknown, or so long that they are likely cyclic.
Notice that ForkJoinTask does not extend Thread, so 'blocking' of the join operation means something different here than usual. It doesn't mean that the underlying thread is in a blocked state, rather it means that the computation of the current task is held up further up the call stack while join goes off and attempts to resolve the tree of sub-tasks impeding progress.
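As a small illustration of the usual fork/compute/join pattern (RangeSum and its threshold are invented for the example, not taken from the question):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class RangeSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // arbitrary cut-off for this sketch
    private final long from; // inclusive
    private final long to;   // exclusive

    RangeSum(long from, long to) {
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (long i = from; i < to; i++) {
                sum += i;
            }
            return sum;
        }
        long mid = from + (to - from) / 2;
        RangeSum left = new RangeSum(from, mid);
        RangeSum right = new RangeSum(mid, to);
        left.fork();                     // make the left half available to idle workers
        long rightSum = right.compute(); // run the right half in the current thread
        // If nobody has taken `left` yet, join() tends to run it here rather than park.
        return left.join() + rightSum;
    }

    static long sum(long n) {
        return ForkJoinPool.commonPool().invoke(new RangeSum(0, n));
    }

    public static void main(String[] args) {
        System.out.println(sum(1_000_000)); // prints 499999500000
    }
}
```

The joining task is held up, but the worker thread underneath it keeps executing sub-tasks, which is the distinction described above.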
I am developing a Java application that has two threads:
A producer thread that feeds an ArrayBlockingQueue at a frequency of 10 kHz (it is really C code called through JNI).
A consumer thread that takes data from the queue using the take method and then processes it (you can't assume the processing time is always the same). Because I am using take, this thread can block if no data is available in the queue.
I would like to know how I can monitor or profile the consumer thread to find out how much time it spends waiting or blocked.
I am not interested in answers such as taking timestamps with System.currentTimeMillis() and computing differences. I want to know how to analyze the whole thread lifetime and sum up how much time it has spent in each thread state, if this is possible.
How do you do this kind of monitoring?
Thanks in advance!
Any decent Java profiler can separate statistics by thread, even the otherwise rather basic JVisualVM included with the JDK. Its thread view shows what each thread has been doing over time, and the same information can also be displayed in a table.
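Outside a profiler, the JDK's own management API can report similar totals from inside the application. This is a different technique from the profiler view: a sketch using ThreadMXBean, where the class name ThreadWaitStats and the demo in main are assumptions made for illustration. Times are only accumulated from the moment contention monitoring is enabled, and both values are -1 while it is off:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class ThreadWaitStats {
    private static final ThreadMXBean MX = ManagementFactory.getThreadMXBean();

    /** Turns on timing of blocked/waiting periods; returns false if the JVM cannot do it. */
    static boolean enable() {
        if (!MX.isThreadContentionMonitoringSupported()) {
            return false;
        }
        MX.setThreadContentionMonitoringEnabled(true);
        return true;
    }

    /** Returns {blockedMillis, waitedMillis} since enable(), or null if the thread is not alive. */
    static long[] blockedAndWaitedMillis(Thread thread) {
        ThreadInfo info = MX.getThreadInfo(thread.getId());
        return info == null ? null : new long[] {info.getBlockedTime(), info.getWaitedTime()};
    }

    public static void main(String[] args) throws InterruptedException {
        enable();
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(16);
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    queue.take(); // waits here while the queue is empty
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "consumer");
        consumer.setDaemon(true);
        consumer.start();
        for (int i = 0; i < 5; i++) {
            Thread.sleep(50); // slow producer, so the consumer spends time waiting
            queue.put(i);
        }
        long[] stats = blockedAndWaitedMillis(consumer);
        System.out.println("blocked ms: " + stats[0] + ", waited ms: " + stats[1]);
    }
}
```

"Blocked" here means waiting to enter a monitor, while time parked inside take() is reported as "waited".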
The ArrayBlockingQueue will block the producer thread if the queue is full and it will block the consumer thread if the queue is empty.
Doesn't this concept of blocking go against the very idea of multithreading? Say I have a 'main' thread and I want to delegate all logging activities to another thread. So inside my main thread, I create a Runnable to log the output and put it on an ArrayBlockingQueue. The whole purpose of doing this is to have the main thread return immediately without wasting any time on an expensive logging operation.
But if the queue is full, the main thread will be blocked and will wait until a spot is available. So how does it help us?
The queue doesn't block out of spite, it blocks to introduce an additional quality into the system. In this case, it's prevention of starvation.
Picture a set of threads, one of which produces work units really fast. If the queue were allowed to grow without bound, the "rapid producer" could potentially hog all the processing capacity. Sometimes, prevention of such side effects is more important than having all threads unblocked.
I think this is the designer's decision. If they choose blocking mode, ArrayBlockingQueue provides it through the put method. If the designer doesn't want blocking mode, ArrayBlockingQueue has the offer method, which returns false when the queue is full, but then they need to decide what to do with the rejected logging event.
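A sketch of the non-blocking choice, assuming that dropping and counting rejected events is acceptable for logging. NonBlockingLogger is a hypothetical class, and the background thread that would drain the queue with take() is left out:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

class NonBlockingLogger {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    NonBlockingLogger(int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Never blocks the caller: returns false and counts the event when the queue is full. */
    boolean log(String event) {
        boolean accepted = queue.offer(event);
        if (!accepted) {
            dropped.incrementAndGet(); // one possible policy: drop the event, keep a count
        }
        return accepted;
    }

    long droppedCount() {
        return dropped.get();
    }

    public static void main(String[] args) {
        NonBlockingLogger logger = new NonBlockingLogger(2);
        for (int i = 0; i < 3; i++) {
            System.out.println("accepted: " + logger.log("event " + i));
        }
        System.out.println("dropped: " + logger.droppedCount()); // prints "dropped: 1"
    }
}
```

Swapping offer for put in log() would give the blocking behaviour instead, so the policy lives in a single line.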
In your example I would consider blocking to be a feature: It prevents an OutOfMemoryError.
Generally speaking, one of your threads is just not fast enough to cope with the assigned load. So the others must slow down somehow in order not to endanger the whole application.
On the other hand, if the load is balanced, the queue will not block.
Blocking is a necessary function of multithreading. You must block to have synchronized access to data. It does not defeat the purpose of multithreading.
I would suggest throwing an exception when the producer attempts to submit an item to a full queue. The queue's add method already does this (it throws IllegalStateException when the queue is full), and remainingCapacity() lets you check for free space beforehand.
This would allow the invoking code to decide how it wants to handle a full queue.
If execution order when processing items from the queue is unimportant, I recommend using a threadpool (known as an ExecutorService in Java).
It depends on the nature of your multi threading philosophy. For those of us who favour Communicating Sequential Processes a blocking queue is nearly perfect. In fact, the ideal would be one where no message can be put into the queue at all unless the receiver is ready to receive it.
So no, I don't think that a blocking queue goes against the very purpose of multi-threading. In fact, the scenario that you describe (the main thread eventually getting stalled) is a good illustration of the major problem with the actor-model of multi-threading; you've no idea whether or not it will deadlock / block, and you can't exhaustively test for it either.
In contrast, imagine a blocking queue that is zero messages deep. That way for the system to work at all you'd have to find a way to ensure that the logger is always guaranteed to be able to receive a message from the main thread. That's CSP. It might mean that in your hypothetical logger thread you have to have application defined buffering (as opposed to some framework developer's best guess of how deep a FIFO should be), a fast I/O subsystem, checks for keeping up, ways of dealing with falling behind, etc. In short it doesn't let you get away with it, you're forced to address every aspect of your system's performance.
That is of course harder, but that way you end up with a system that's definitely OK rather than the questionable "maybe" that you have if your blocking queues are an unknown number of messages deep.
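In Java, the closest thing to a zero-depth queue that I know of is java.util.concurrent.SynchronousQueue, where a put or offer only succeeds once a receiver is there to take the message. A small sketch of that behaviour (Rendezvous and handOff are names made up for this example):

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

class Rendezvous {
    /** Tries to hand one message over; succeeds only if a receiver shows up within the timeout. */
    static boolean handOff(boolean startReceiver, long timeoutMillis) {
        SynchronousQueue<String> channel = new SynchronousQueue<>(); // holds zero messages
        if (startReceiver) {
            Thread receiver = new Thread(() -> {
                try {
                    channel.take(); // hypothetical logger thread, ready to receive
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            receiver.setDaemon(true);
            receiver.start();
        }
        try {
            return channel.offer("log line", timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("with receiver: " + handOff(true, 1000));
        System.out.println("without receiver: " + handOff(false, 100)); // prints false
    }
}
```

With nothing buffered in between, a sender that cannot hand off finds out immediately that the receiver is not keeping up.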
It sounds like you have the general idea right of why you'd use something like an ArrayBlockingQueue to talk between threads.
Having a blocking queue gives you the option to do something different in case something goes wrong with your background worker threads, rather than blindly adding more requests to the queue. If there is room in the queue, there is no blocking.
For your specific use case, though, I would use an ExecutorService, which creates a pool of background worker threads, rather than reading/writing queues directly:
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ExecutorService.html
ExecutorService pool = Executors.newFixedThreadPool(poolSize);
pool.submit(myRunnable);
A multithreaded program is non-deterministic insofar as you can't say beforehand: n producer actions will take exactly as long as m consumer actions. Therefore, synchronization between n producers and m consumers is necessary in every case.
You'll want to choose the queue size so that the number of active producers and consumers is maximized most of the time. But the thread model of Java does not guarantee that any consumer will run unless it is the only unblocked thread. (Yet, of course, on multi-core CPUs it is very likely that the consumer will run.)
You have to make a choice about what to do when a queue is full. In the case of an ArrayBlockingQueue's put method, that choice is to wait.
Another option would be to just throw away new objects if the queue is full; you can achieve this with offer.
You have to make a trade-off.
I am running a genome assembly program (Trinity, http://trinityrnaseq.sourceforge.net/, if interested) on one of the XSEDE resources. The hardware limits the number of threads to 2500, which the program always wants to exceed... Is there an easy way to limit the number of threads executed? I have tried -XX:ParallelGCThreads=16, but this seems to introduce new errors.
So, is there a runtime command to limit the total number of threads?
Use an Executor or ExecutorService. Does what bragboy suggests but it's built in to Java.
You can use a custom queue that runs as a separate process and enforces the thread limit. The advantage of this is that you can choose either to cap the number of threads or to keep adding more. You will probably have an addToQueue(Thread t) method and a consumer that consumes all these threads. The queue will know how many threads are actively running. The daemon process will fire the consume() method of this queue whenever the number of running threads is within the limit. After every thread finishes its job, it reports back to the queue. The queue that you maintain can be a priority queue if you feel there should be a priority on the running tasks. This not only removes the dependency on the JVM but also makes your program look cleaner.
I need to make a program with a limited amount of threads (currently using newFixedThreadPool) but I have the problem that all threads get created from start, filling up memory at alarming rate.
I wish to prevent this. Threads should only be created shortly before they are executed.
e.g.: I call the program and instruct it to use 2 threads in the pool. The program should create & launch the first 2 Threads immediately (obviously), create the next 2 to wait for the previous 2, and at that point wait until one or both of the first 2 ended executing.
I thought about extending executor or FixedThreadPool or such. However I have no clue on how to start there and doubt it is the best solution. Easiest would have my main Thread sleeping on intervals, which is not really good either...
Thanks in advance!
Have you tried taking a look at ThreadPoolExecutor ? Using the right constructor parameters, you could easily tweak the number and keep-alive time of the created threads.
Looking at the details in your post...
I call the program and instruct it to use 2 threads in the pool. The program should create & launch the first 2 Threads immediately (obviously), create the next 2 to wait for the previous 2, and at that point wait until one or both of the first 2 ended executing.
Your problem is much more about synchronizing task execution than about pooling threads. From what you say here, you want 2 threads executing any number of tasks; if you don't want 100 jobs running at the same time, don't create a pool of 100 threads...
I would suggest either using a BlockingQueue to control your Runnables, or creating a pool of 2 threads using a ThreadPoolExecutor and feeding it all your tasks. It will execute them when threads are available.
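One way to sketch the second option, assuming the memory problem comes from creating every job up front. BoundedPool is a made-up name, and the bounded queue plus CallerRunsPolicy is just one possible way to slow the submitting loop down:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedPool {
    /** Runs `jobs` trivial jobs on `threads` workers; returns how many completed. */
    static int runJobs(int threads, int jobs) {
        AtomicInteger completed = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(threads),          // at most `threads` jobs waiting
                new ThreadPoolExecutor.CallerRunsPolicy()); // full queue: submitter runs the job
        for (int i = 0; i < jobs; i++) {
            // Each job object is created only when the loop reaches it, so at most
            // a handful of pending jobs exist at any moment.
            pool.execute(completed::incrementAndGet);
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed " + runJobs(2, 50) + " jobs");
    }
}
```

With two workers and two queue slots, the loop cannot get more than a few jobs ahead of the pool, which keeps the number of live job objects small.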
Does that fit what you are trying to achieve here?
I don't think you should manipulate the thread pool implementation. If you create the threads shortly before execution, you lose the main benefit of the pool, that recycles your threads.
Maybe you should reduce the maximum number of threads in the pool. If you instruct the pool to create too many of them, the total out-of-heap memory used for their stack spaces will consume all available memory. I assume that this is the kind of OutOfMemoryError you have (?).
If you're looking at this from a performance perspective, then it's best to take the hit in memory when you first start up the application than constantly get bombarded with allocating and deallocating memory while the program is running.
If it's using too much memory when you start the application, then it will still be too much memory later. You should throttle down the size of the thread pool.
There are additional benefits to using a thread pool, such as if you lose a thread along the way, the thread pool will automatically create a new one to replace it, keeping your thread pool at a constant size.
If this isn't the type of benefit that you're looking for, then you may wish to handle the threads in memory manually, and avoid the thread pool.