ArrayBlockingQueue with priority waiting list - java

I currently have a Spring dispatcher ensuring various concurrency limitation policies based on bounded queues.
Basically, multiple request types are handled, some memory-expensive, others less so, and the request threads that happen to hit the memory-expensive tasks put a token in a bounded blocking queue (ArrayBlockingQueue), so that only N of them actually run while the others wait.
Now, the waiting list is internally managed by a ReentrantLock, which in turn leverages a Condition implementation found in AbstractQueuedSynchronizer that uses a linked list and notifies the longest-waiting thread when a token is removed from the queue.
Now I need a different behavior, so that the list maintained by the Condition is sorted by a user defined priority too (straight one, no counter-starvation measures needed for lower priority requests).
Unfortunately the classes in question have a wall of "final" declarations making it hard to inject this seemingly small behavioral change.
Is there any concurrent data structure out there providing the behavior I'm looking for, or that would allow customization?
Alternatively, any suggestions for implementing it without rewriting ArrayBlockingQueue/ReentrantLock/Condition from scratch?
Note: I am really looking for a bounded blocking queue with priority in the waiting list. Other approaches requiring a redesign of the whole application, secondary execution thread pools and the like are unfortunately not feasible (time and material limitations).
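For illustration only, here is a minimal sketch of the behavior being asked for, built on the public ReentrantLock/Condition API rather than by modifying the JDK classes. It is not a drop-in ArrayBlockingQueue replacement: PriorityGate, acquire, release and waitingCount are hypothetical names, and the sketch assumes that a plain permit counter is enough to stand in for the token queue. Each waiter gets its own Condition, and a released permit is handed directly to the highest-priority waiter (FIFO among equal priorities).

```java
import java.util.PriorityQueue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper: a counting gate whose blocked callers are served by priority.
class PriorityGate {
    private static final class Waiter implements Comparable<Waiter> {
        final int priority;   // higher value is served first
        final long seq;       // arrival order, breaks ties FIFO
        final Condition cond; // private condition so only this waiter is woken
        boolean granted;

        Waiter(int priority, long seq, Condition cond) {
            this.priority = priority;
            this.seq = seq;
            this.cond = cond;
        }

        @Override
        public int compareTo(Waiter other) {
            int byPriority = Integer.compare(other.priority, priority);
            return byPriority != 0 ? byPriority : Long.compare(seq, other.seq);
        }
    }

    private final ReentrantLock lock = new ReentrantLock();
    private final PriorityQueue<Waiter> waiters = new PriorityQueue<>();
    private int permits;
    private long nextSeq;

    PriorityGate(int permits) {
        this.permits = permits;
    }

    void acquire(int priority) throws InterruptedException {
        lock.lock();
        try {
            if (permits > 0) { // free permit implies nobody is waiting
                permits--;
                return;
            }
            Waiter self = new Waiter(priority, nextSeq++, lock.newCondition());
            waiters.add(self);
            try {
                while (!self.granted) {
                    self.cond.await();
                }
            } catch (InterruptedException e) {
                if (self.granted) {
                    // The permit was already handed over: keep it, restore the flag.
                    Thread.currentThread().interrupt();
                    return;
                }
                waiters.remove(self);
                throw e;
            }
        } finally {
            lock.unlock();
        }
    }

    void release() {
        lock.lock();
        try {
            Waiter next = waiters.poll(); // highest priority, oldest first
            if (next == null) {
                permits++;
            } else {
                next.granted = true;      // direct hand-off, permit count unchanged
                next.cond.signal();
            }
        } finally {
            lock.unlock();
        }
    }

    int waitingCount() {
        lock.lock();
        try {
            return waiters.size();
        } finally {
            lock.unlock();
        }
    }
}
```

A request thread would call acquire(priority) before the memory-expensive work and release() in a finally block afterwards. As requested in the question, there is no anti-starvation measure: low-priority callers can wait indefinitely.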

Related

Project loom: what makes the performance better when using virtual threads?

To give some context here, I have been following Project Loom for some time now. I have read The state of Loom. I have done asynchronous programming.
Asynchronous programming (provided by Java NIO) returns the thread to the thread pool when the task waits, and it goes to great lengths to not block threads. This gives a large performance gain: we can now handle many more requests, as they are not directly bound by the number of OS threads. But what we lose here is the context. The same task is now NOT associated with just one thread. All the context is lost once we dissociate tasks from threads. Exception traces do not provide very useful information and debugging is difficult.
In comes Project Loom with virtual threads that become the single unit of concurrency. And now you can perform a single task on a single virtual thread.
It's all fine until now, but the article goes on to state, with Project Loom:
A simple, synchronous web server will be able to handle many more requests without requiring more hardware.
I don't understand how we get performance benefits with Project Loom over asynchronous APIs. The asynchronous APIs make sure not to keep any thread idle. So, what does Project Loom do to make it more efficient and performant than asynchronous APIs?
EDIT
Let me re-phrase the question. Let's say we have an http server that takes in requests and does some crud operations with a backing persistent database. Say, this http server handles a lot of requests - 100K RPM. Two ways of implementing this:
1) The HTTP server has a dedicated pool of threads. When a request comes in, a thread carries the task up until it reaches the DB, wherein the task has to wait for the response from DB. At this point, the thread is returned to the thread pool and goes on to do the other tasks. When DB responds, it is again handled by some thread from the thread pool and it returns an HTTP response.
2) The HTTP server just spawns virtual threads for every request. If there is an IO, the virtual thread just waits for the task to complete. And then returns the HTTP Response. Basically, there is no pooling business going on for the virtual threads.
Given that the hardware and the throughput remain the same, would any one solution fare better than the other in terms of response times or handling more throughput?
My guess is that there would not be any difference w.r.t performance.
We don't get benefit over asynchronous API. What we potentially will get is performance similar to asynchronous, but with synchronous code.
The answer by @talex puts it crisply. Adding further to it.
Loom is more about a native concurrency abstraction, which additionally helps one write asynchronous code. Given that it's a VM-level abstraction rather than just a code-level one (like what we have been doing until now with CompletableFuture etc.), it lets one implement asynchronous behavior with reduced boilerplate.
With Loom, a more powerful abstraction is the savior. We have seen repeatedly how an abstraction with syntactic sugar lets one write programs more effectively, whether it was functional interfaces in JDK 8 or for-comprehensions in Scala.
With Loom, there is no need to chain multiple CompletableFutures (to save on resources); one can write the code synchronously instead. With each blocking operation encountered (ReentrantLock, I/O, JDBC calls), the virtual thread gets parked. And because these are lightweight threads, the context switch is far cheaper than for kernel threads.
When blocked, the actual carrier-thread (that was running the run-body of the virtual thread), gets engaged for executing some other virtual-thread's run. So effectively, the carrier-thread is not sitting idle but executing some other work. And comes back to continue the execution of the original virtual-thread whenever unparked. Just like how a thread-pool would work. But here, you have a single carrier-thread in a way executing the body of multiple virtual-threads, switching from one to another when blocked.
We get the same behavior (and hence performance) as manually written asynchronous code, but instead avoiding the boiler-plate to do the same thing.
Consider the case of a web-framework, where there is a separate thread-pool to handle i/o and the other for execution of http requests. For simple HTTP requests, one might serve the request from the http-pool thread itself. But if there are any blocking (or) high CPU operations, we let this activity happen on a separate thread asynchronously.
This thread would collect the information from an incoming request, spawn a CompletableFuture, and chain it with a pipeline (read from database as one stage, followed by computation from it, followed by another stage to write back to the database, web service calls etc). Each one is a stage, and the resultant CompletableFuture is returned to the web-framework.
When the resultant future is complete, the web-framework uses the results to be relayed back to the client. This is how Play-Framework and others, have been dealing with it. Providing an isolation between the http thread handling pool, and the execution of each request. But if we dive deeper in this, why is it that we do this?
One core reason is to use the resources effectively, particularly around blocking calls. Hence we chain with thenApply etc. so that no thread is blocked on any activity, and we do more with fewer threads.
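As a rough sketch of the kind of staged pipeline described above (the names AsyncPipeline, loadUser, score and save are made up for illustration and stand in for a database read, a computation and a write-back), the chaining might look like this:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

class AsyncPipeline {
    // Hypothetical stand-ins for a database read, a computation and a write-back.
    static String loadUser(int id) {
        return "user-" + id;
    }

    static int score(String user) {
        return user.length();
    }

    static String save(int score) {
        return "saved:" + score;
    }

    // Each stage runs when the previous one completes; no thread waits in between.
    static CompletableFuture<String> handle(int id, ExecutorService pool) {
        return CompletableFuture
                .supplyAsync(() -> loadUser(id), pool)     // stage 1: read
                .thenApply(AsyncPipeline::score)           // stage 2: compute
                .thenApplyAsync(AsyncPipeline::save, pool) // stage 3: write back
                .exceptionally(t -> "error:" + t);         // failure from any stage lands here
    }
}
```

The framework would then attach its own callback to the returned future instead of blocking on it.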
This works great, but it is quite verbose. Debugging is indeed painful, and if one of the intermediary stages results in an exception, the control flow goes haywire, resulting in further code to handle it.
With Loom, we write synchronous code, and let someone else decide what to do when blocked. Rather than sleep and do nothing.
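A minimal sketch of that style, assuming JDK 21 or later where virtual threads are a final feature. VirtualThreadPerRequest, handle and serve are hypothetical names, and the sleep merely stands in for a blocking JDBC or HTTP call:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class VirtualThreadPerRequest {
    // Hypothetical blocking handler: plain synchronous code, no callbacks.
    static String handle(int requestId) throws InterruptedException {
        Thread.sleep(50); // parks the virtual thread; the carrier thread is freed
        return "response-" + requestId;
    }

    static List<String> serve(int requests) throws Exception {
        List<String> responses = new ArrayList<>();
        // One new virtual thread per submitted task; nothing is pooled.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final int id = i;
                futures.add(executor.submit(() -> handle(id)));
            }
            for (Future<String> future : futures) {
                responses.add(future.get());
            }
        }
        return responses;
    }
}
```

Stack traces and debugger views stay tied to one request per thread, which is the context the question says is lost with the asynchronous style.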
The http server has a dedicated pool of threads ....
How big of a pool? (Number of CPUs)*N + C? With N>1 one can fall back to anti-scaling, as lock contention extends latency, whereas N=1 can under-utilize available bandwidth. There is a good analysis here.
The http server just spawns...
That would be a very naive implementation of this concept. A more realistic one would strive for collecting from a dynamic pool which kept one real thread for every blocked system call + one for every real CPU. At least that is what the folks behind Go came up with.
The crux is to keep the {handlers, callbacks, completions, virtual threads, goroutines : all PEAs in a pod} from fighting over internal resources; thus they do not lean on system-based blocking mechanisms until absolutely necessary. This falls under the banner of lock avoidance, and might be accomplished with various queuing strategies (see libdispatch), etc. Note that this leaves the PEA divorced from the underlying system thread, because they are internally multiplexed between them. This is your concern about divorcing the concepts. In practice, you pass around your favourite language's abstraction of a context pointer.
As 1 indicates, there are tangible results that can be directly linked to this approach; and a few intangibles. Locking is easy -- you just make one big lock around your transactions and you are good to go. That doesn't scale; but fine-grained locking is hard. Hard to get working, hard to choose the fineness of the grain. When to use { locks, CVs, semaphores, barriers, ... } is obvious in textbook examples; a little less so in deeply nested logic. Lock avoidance makes that, for the most part, go away, and be limited to contended leaf components like malloc().
I maintain some skepticism, as the research typically shows a poorly scaled system, which is transformed into a lock avoidance model, then shown to be better. I have yet to see one which unleashes some experienced developers to analyze the synchronization behavior of the system, transform it for scalability, then measure the result. But even if that were a win, experienced developers are a rare(ish) and expensive commodity; the heart of scalability is really financial.

LinkedBlockingQueue's thread-safety with many producers in a producer-consumer scenario

I'm trying to model a situation in Java in which many producers (at least 2) access the same LinkedBlockingQueue at a fixed rate. They produce, put, and then start over again.
I was wondering whether this could eventually lead to race conditions between those producers which try to gain write access on the queue at the same time. Are java.util.concurrent.BlockingQueue's implementations already set up to handle such an issue, or should I manually create mutexes in order to avoid this kind of problems?
Thank you for your attention.
Java's blocking queues are thread-safe for single operations such as take and put, but bulk operations such as addAll are not necessarily performed atomically.
So in your case the answer is no: you do not need to handle the thread-safety yourself, unless you want the producers to produce multiple products and put them all in one atomic operation.
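To make that concrete, here is a small sketch (ManyProducers and sumOfConsumed are hypothetical names, and the capacity of 16 is arbitrary) in which several producers call put on the same LinkedBlockingQueue with no extra locking, while the calling thread consumes everything and sums it:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ManyProducers {
    // Each producer puts 0..itemsEach-1; the caller takes every item and sums them.
    static long sumOfConsumed(int producers, int itemsEach) throws InterruptedException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(16);
        List<Thread> threads = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            Thread producer = new Thread(() -> {
                try {
                    for (int i = 0; i < itemsEach; i++) {
                        queue.put(i); // blocks while the queue is full; no mutex needed
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            producer.start();
            threads.add(producer);
        }
        long sum = 0;
        for (int i = 0; i < producers * itemsEach; i++) {
            sum += queue.take(); // blocks while the queue is empty
        }
        for (Thread producer : threads) {
            producer.join();
        }
        return sum;
    }
}
```

If an element were lost or duplicated the sum would be off (or the consumer would hang), so the expected total of producers * itemsEach * (itemsEach - 1) / 2 is a cheap sanity check.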

Are regular Queues inappropriate to use when multithreading in Java?

I am trying to add asynchronous output to my program.
Currently, I have an eventManager class that gets notified each frame of the position of any of the moveable objects currently present in the main loop (It's rendering a scene; some objects change from frame to frame, others are static and present in every frame). I am looking to record the state of each frame so I can add in the functionality to replay the scene.
This means that I need to store the changing information from frame to frame, and either hold it in memory or write it to disk for later retrieval and parsing.
I've done some timing experiments, and recording the state of each object to memory increased the time per frame by about 25% (not to mention the possibility of eventually hitting a memory limit). Directly writing each frame to disk takes (predictably) even longer, close to twice as long as not recording the frames at all.
Needless to say, I'd like to implement multithreading so that I won't lose frames per second in my main rendering loop because the process is constantly writing to disk.
I was wondering whether it was okay to use a regular queue for this task, or if I needed something more dedicated like the queues discussed in this question.
In my situation, there is only one producer (the main thread), and one consumer (the thread I want to asynchronously write to disk). The producer will never remove from the queue, and the consumer will never add to it - so do I need a specialized queue at all?
Is there an advantage to using a more specialized queue anyway?
Yes, a regular Queue is inappropriate. Since you have two threads you need to worry about boundary conditions like an empty queue, full queue (assuming you need to bound it for memory considerations), or anomalies like visibility.
A LinkedBlockingQueue is best suited for your application. The put and take methods use different locks so you will not have lock contention. The take method will automatically block the consumer writing to disk if it somehow magically caught up with the producer rendering frames.
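A rough sketch of that single-producer, single-consumer setup follows. FrameRecorder and its methods are hypothetical names, frames are simplified to strings, the in-memory list stands in for the disk file, and the capacity of 1024 and the end marker are arbitrary choices:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class FrameRecorder {
    private static final String END = "__END__"; // hypothetical end-of-stream marker
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(1024);
    private final List<String> written = new ArrayList<>(); // stands in for the disk file
    private final Thread writer = new Thread(this::drain, "frame-writer");

    void start() {
        writer.start();
    }

    // Called by the render loop; blocks only if the writer has fallen 1024 frames behind.
    void record(String frameState) throws InterruptedException {
        queue.put(frameState);
    }

    // Signals the writer to finish, waits for it, and returns what was written.
    List<String> stop() throws InterruptedException {
        queue.put(END);
        writer.join();
        return written;
    }

    private void drain() {
        try {
            // take() blocks while the queue is empty, so the writer never spins.
            for (String frame = queue.take(); !END.equals(frame); frame = queue.take()) {
                written.add(frame); // a real writer would serialize to disk here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

If blocking the render loop is never acceptable, offer() could be used in record instead of put(), at the cost of dropping frames when the queue is full.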
It sounds like you don't need a special queue, but if you want the thread removing from the queue to wait until there's something to get, try the BlockingQueue. It's in the java.util.concurrent package, so it's threadsafe for sure. Here are some relevant quotes from that page:
A Queue that additionally supports operations that wait for the queue
to become non-empty when retrieving an element, and wait for space to
become available in the queue when storing an element.
...
BlockingQueue implementations are designed to be used primarily for
producer-consumer queues, but additionally support the Collection
interface.
...
BlockingQueue implementations are thread-safe.
As long as you're already profiling your code, try dropping a BlockingQueue in there and see what happens!
Good luck!
I don't think it will matter much.
If you have 25% overhead serializing a state in memory, that will still be there with a queue.
Disk will be even more expensive.
The queue blocking mechanism will be cheap in comparison.
One thing to watch for is your queue growing out of control: disk is slow no matter what, if it can't consume queue events fast enough you're in trouble.

Best way to configure a Threadpool for a Java RIA client app

I have a Java client which accesses our server side over HTTP, making several small requests to load each new page of data. We maintain a thread pool to handle all non-UI processing: any background client-side tasks and any tasks which need to make a connection to the server. I've been looking into some performance issues and I'm not certain we've got our thread pool set up as well as possible. Currently we use a ThreadPoolExecutor with a core pool size of 8, and we use a LinkedBlockingQueue for the work queue, so the max pool size is ignored. No doubt there's no simple "do this in all situations" answer, but are there any best practices? My thinking at the moment is:
1) I'll switch to using a SynchronousQueue instead of a LinkedBlockingQueue so the pool can grow to the max pool size figure.
2) I'll set the max pool size to be unlimited.
Basically my current fear is that occasional performance issues on the server side are causing unrelated client side processing to halt due to the upper limit on the thread pool size. My fear with unbounding it is the additional hit on managing those threads on the client, possibly just the better of 2 evils.
Any suggestions, best practices or useful references?
Cheers,
Robin
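The queue behavior described in the question can be checked with a small sketch. PoolGrowth and poolSizeAfter are hypothetical names, and the core size of 2 and maximum of 16 are arbitrary illustration values rather than the poster's configuration. Every task blocks on a latch, so the pool size read after submission reflects only how ThreadPoolExecutor reacts to the work queue it was given:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolGrowth {
    // Submits blocking tasks and reports how many threads the pool created.
    static int poolSizeAfter(int tasks, BlockingQueue<Runnable> workQueue) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool =
                new ThreadPoolExecutor(2, 16, 30, TimeUnit.SECONDS, workQueue);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep the worker busy until measured
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getPoolSize();
        } finally {
            release.countDown(); // let the tasks finish
            pool.shutdown();
        }
    }
}
```

With six tasks, an unbounded LinkedBlockingQueue leaves the pool at its core size of 2 because extra tasks are queued, while a SynchronousQueue grows it to 6 because each hand-off that finds no idle worker starts a new thread.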
It sounds like you'd probably be better off limiting the queue size: does your application still behave properly when there are many requests queued (is it acceptable for all tasks to be queued for a long time, are some more important than others)? What happens if there are still queued tasks left and the user quits the application? If the queue grows very large, is there a chance that the server will catch up (soon enough) to hide the problem completely from the user?
I'd say create one queue for requests whose response is needed to update the user interface, and keep its queue very small. If this queue gets too big, notify the user.
For real background tasks keep a separate pool, with a longer queue, but not infinite. Define graceful behavior for this pool when it grows or when the user wants to quit but there are tasks left, what should happen?
In general, network latencies are easily orders of magnitude higher than anything that can be happening with regard to memory allocation or thread management on the client side. So, as a general rule, if you are running into a performance bottleneck, look first and foremost to the networking link.
If the issue is that your server simply can not keep up with the requests from the clients, bumping up the threads on the client side is not going to help matters: you'll simply progress from having 8 threads waiting to get a response to more threads waiting (and you may even aggravate the server side issues by increasing its load due to higher number of connections it is managing).
Both of the concurrent queues in JDK are high performers; the choice really boils down to usage semantics. If you have non-blocking plumbing, then it is more natural to use the non-blocking queue. If you don't, then using the blocking queues makes more sense. (You can always specify Integer.MAX_VALUE as the limit.) If FIFO processing is not a requirement, make sure you do not specify fair ordering, as that will entail a substantial performance hit.
As alphazero said, if you've got a bottleneck, your number of client side waiting jobs will continue to grow regardless of what approach you use.
The real question is how you want to deal with the bottleneck. Or more correctly, how you want your users to deal with the bottleneck.
If you use an unbounded queue, then you don't get feedback that the bottleneck has occurred. And in some applications, this is fine: if the user is kicking off asynchronous tasks, then there's no need to report a backlog (assuming it eventually clears). However, if the user needs to wait for a response before doing the next client-side task, this is very bad.
If you use LinkedBlockingQueue.offer() on a bounded queue, then you'll immediately get a response that says the queue is full, and can take action such as disabling certain application features, popping a dialog, whatever. This will, however, require more work on your part, particularly if requests can be submitted from multiple places. I'd suggest, if you don't have it already, you create a GUI-aware layer over the server queue to provide common behavior.
And, of course, never ever call LinkedBlockingQueue.put() from the event thread (unless you don't mind a hung client, that is).
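A tiny sketch of the offer() behavior on a bounded queue (BackpressureDemo and submitThree are hypothetical names; the capacity of 2 and the 10 ms timeout are arbitrary):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class BackpressureDemo {
    // Returns the result of three offers against a queue that holds only two items.
    static boolean[] submitThree() throws InterruptedException {
        LinkedBlockingQueue<String> requests = new LinkedBlockingQueue<>(2);
        boolean first = requests.offer("load page 1");
        boolean second = requests.offer("load page 2");
        // Timed variant: waits briefly for space, then gives up instead of hanging.
        boolean third = requests.offer("load page 3", 10, TimeUnit.MILLISECONDS);
        return new boolean[] { first, second, third };
    }
}
```

The first two offers succeed and the third returns false, which is the point where a GUI-aware layer could disable a feature or show a dialog.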
Why not create an unbounded queue, but reject tasks (and maybe even inform the user that the server is busy (app dependent!)) when the queue reaches a certain size? You can then log this event and find out what happened on the server side for the backup to occur. Additionally, unless you are connecting to multiple remote servers, there is probably not much point in having more than a couple of threads in the pool, although this does depend on your app, what it does and who it talks to.
Having an unbounded pool is usually dangerous as it generally doesn't degrade gracefully. Better to log the problem, raise an alert, prevent further actions being queued and figure out how to scale the server side, if the problem is there, to prevent this happening again.
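One way this could look, as a sketch only: a ThreadPoolExecutor with a fixed number of threads, a bounded ArrayBlockingQueue and the standard AbortPolicy, so that overflow surfaces as a RejectedExecutionException the client can log and react to. BoundedPool and countRejected are hypothetical names, and the sizes (2 threads, 3 queue slots) are arbitrary:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPool {
    // Submits blocking tasks and counts how many the saturated pool refuses.
    static int countRejected(int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(3),
                new ThreadPoolExecutor.AbortPolicy());
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // simulate a slow server response
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // log, raise an alert, tell the user the server is busy
                }
            }
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return rejected;
    }
}
```

With eight slow tasks, two run, three wait in the queue and the remaining three are rejected instead of piling up silently.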

Is there a pattern for this queueing system, and example Java code?

I have a component that I wish to write and it's the kind of thing that feels like a common pattern. I was hoping to find the common name for the pattern if there is one, and examples of how to go about implementing it.
I have a service that queues requests and processes them one at a time. I have a number of client threads which make the requests. The key is that the calling threads must block until their own particular request is serviced.
E.g. if there are 10 threads, all making a request, then the 10th thread will block for longest while it waits for its request to make it to the front of the queue, and to be processed. In brief pseudocode, a call would be as simple as:
service.processMessage(myMessage); /* block whilst it enqueues, waits, */
/* processes and returns */
I know what you're thinking - why bother having threads at all? Let's just say there are design constraints well outside my control.
Also, this should run on JavaME, which means an infuriating subset of real Java, and no swanky external libraries.
If you do not have any requirements on the total ordering of handling requests (i.e., you don't mind arbitrarily mixing requests from different threads independent of the order they "arrive" in), you could simply make processMessage() synchronized, I guess.
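A minimal sketch of that suggestion, using only language features that do not depend on java.util.concurrent. MessageService and processedCount are hypothetical names, and the body of processMessage is a placeholder for the real work:

```java
class MessageService {
    private int processed;

    // Only one caller runs this at a time; the others block on the monitor
    // until it is free, then process their own message and return.
    public synchronized String processMessage(String message) {
        processed++;
        return "handled #" + processed + ": " + message;
    }

    public synchronized int processedCount() {
        return processed;
    }
}
```

Note that the monitor gives no ordering guarantee among blocked callers, which matches the caveat above: requests are serialized, but not necessarily in arrival order.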
