There is a list of N resources, each of which can be queried by at most one thread at a time.
There are several threads that need to do the same thing at approximately the same time: query each of the resources (each thread has a different query), in arbitrary order, and collect the responses.
If each thread loops over the resources in the same order, from 0 to N-1, then they will probably have to wait for each other, which is not efficient.
I thought of letting each thread loop over the resources in a random permutation, but this seems too complex and also not very efficient. For example, with 2 resources and 2 threads, in half the cases they will choose the same order and wait for each other.
Is there a simple and more efficient way to solve this?
My answer is that there is not. You have no control over the threads or the order in which they access the resources and, most of all (which is the entire point of synchronizing), you don't know how they will be scheduled for execution. Even trying to steer them toward free resources would have unpredictable results, because the last thread created could be scheduled first.
The only thing that comes to my mind is partitioning, that is, dividing the resources and/or threads into groups that access each other. Note that this is not a simple issue: as you said, having fewer threads and fewer resources increases the chance that a thread tries to access an already locked resource.
Maybe you should think the other way around. The key is to prevent starvation of the resources, i.e. minimize the time the resource is not processing requests when it can. Generally queues are useful here.
Given that you have at most 8 different resources, I'd give each resource its own thread and a queue. The querying threads will first put all their requests onto these queues and then wait, while the resource consumes from this queue. This way, the resource will be saturated with requests and starvation is minimized.
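A minimal sketch of this queue-per-resource idea, assuming each resource can be modelled as a function from query to response. The class name ResourceDispatcher and the method queryAll are made up for illustration; a single-threaded executor stands in for "a thread and a queue" per resource.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical helper: one single-threaded executor per resource, so each
// resource serves at most one query at a time and drains its own FIFO queue.
class ResourceDispatcher<Q, R> implements AutoCloseable {
    private final List<Function<Q, R>> resources;
    private final List<ExecutorService> workers = new ArrayList<>();

    ResourceDispatcher(List<Function<Q, R>> resources) {
        this.resources = resources;
        for (int i = 0; i < resources.size(); i++) {
            workers.add(Executors.newSingleThreadExecutor());
        }
    }

    // Called by a querying thread: enqueue the query on every resource first,
    // then wait for the responses, which are returned in resource order.
    List<R> queryAll(Q query) {
        List<CompletableFuture<R>> pending = new ArrayList<>();
        for (int i = 0; i < resources.size(); i++) {
            Function<Q, R> resource = resources.get(i);
            pending.add(CompletableFuture.supplyAsync(
                    () -> resource.apply(query), workers.get(i)));
        }
        List<R> responses = new ArrayList<>();
        for (CompletableFuture<R> future : pending) {
            responses.add(future.join());
        }
        return responses;
    }

    @Override
    public void close() {
        workers.forEach(ExecutorService::shutdown);
    }
}
```

Because every querying thread enqueues all of its requests before waiting, no thread ever blocks on a busy resource while another resource sits idle.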
Just remove the resource from the list when it's in use, so each thread does:
remove resource from list
process it
put it back
Obviously, access to the list needs to be synchronized. This way you will never have 2 threads trying to use the same resource.
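One possible variation on this idea, sketched under assumptions: instead of a shared list, each resource gets its own ReentrantLock, and every thread keeps a private queue of the resources it still has to visit, taking whichever one is currently free. FreeResourceVisitor and visitAll are hypothetical names, and the query is represented as a callback that receives the resource index.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.IntConsumer;

class FreeResourceVisitor {
    // Visits every resource index exactly once, preferring resources that are free.
    static void visitAll(List<ReentrantLock> locks, IntConsumer query) {
        Deque<Integer> remaining = new ArrayDeque<>();
        for (int i = 0; i < locks.size(); i++) {
            remaining.add(i);
        }
        int busyInARow = 0;
        while (!remaining.isEmpty()) {
            int index = remaining.poll();
            ReentrantLock lock = locks.get(index);
            boolean acquired = lock.tryLock();
            if (!acquired && busyInARow >= remaining.size()) {
                // Everything still to visit was busy: block here instead of spinning.
                lock.lock();
                acquired = true;
            }
            if (acquired) {
                try {
                    query.accept(index);
                } finally {
                    lock.unlock();
                }
                busyInARow = 0;
            } else {
                remaining.add(index); // try this one again later
                busyInARow++;
            }
        }
    }
}
```

A thread holds at most one lock at a time, so this cannot deadlock; the fallback to a blocking lock() only avoids burning CPU when every remaining resource is taken.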
I currently have a Spring dispatcher ensuring various concurrency limitation policies based on bounded queues.
Basically, multiple request types are handled, some memory-expensive, others less so, and the request threads that happen to hit the memory-expensive tasks put a token in a bounded blocking queue (ArrayBlockingQueue), so that only N of them end up actually running while the others end up waiting.
Now, the waiting list is internally managed by a ReentrantLock, which in turn leverages a Condition implementation found in AbstractQueuedSynchronizer that uses a linked list and notifies the longest-waiting thread when a token is removed from the queue.
Now I need a different behavior, so that the list maintained by the Condition is sorted by a user defined priority too (straight one, no counter-starvation measures needed for lower priority requests).
Unfortunately the classes in question have a wall of "final" declarations making it hard to inject this seemingly small behavioral change.
Is there any concurrent data structure out there providing the behavior I'm looking for, or that would allow customization?
Alternatively, are there suggestions for implementing it without rewriting ArrayBlockingQueue/ReentrantLock/Condition from scratch?
Note: I'm really looking for a bounded blocking queue with priority in the waiting list. Other approaches that require a redesign of the whole application, secondary execution thread pools and the like are unfortunately not feasible (time and material limitations).
So let me give you an idea of what I'm trying to do:
I've got a program that records statistics, lots and lots of them, but it records them as they happen one at a time and puts them into an ArrayList, for example:
Please note this is an example, I'm not recording these stats, I'm just simplifying it a bit
User clicks -> Add user_click to array
User clicks -> Add user_click to array
Key press -> Add key_press to array
After each event (clicks, key presses, etc.) it checks the size of the ArrayList. If it is > 150, the following happens:
A new thread is created
That thread is given a copy of the ArrayList
The original ArrayList is .clear()'ed
The new thread combines similar items so user_click would now be one item with a quantity of 2, instead of 2 items with a quantity of 1 each
The thread processes the data to a MySQL db
I would love to find a better approach to this, although this works just fine. The issue with thread pools and processing immediately is that there would be literally thousands of MySQL queries per day without combining them first.
Is there a better way to accomplish this? Is my method okay?
The other thing to keep in mind is the thread where events are fired and recorded can't be slowed down so I don't really want to combine items in the main thread.
If you've got code examples that would be great; if not, just an idea of a good way to do this would be awesome as well!
For anyone interested, this project is hosted on GitHub, the main thread is here, the queue processor is here and please forgive my poor naming conventions and general code cleanliness, I'm still(always) learning!
The logic described seems pretty good, with two adjustments:
Don't copy the list and clear the original. Send the original and create a new list for future events. This eliminates the O(n) processing time of copying the entries.
Don't create a new thread each time. Events are delayed anyway, since you're collecting them, so timeliness of writing to database is not your major concern. Two choices:
Start a single thread up front, then use a BlockingQueue to send list from thread 1 to thread 2. If thread 2 is falling behind, the lists will simply accumulate in the queue until thread 2 can catch up, without delaying thread 1, and without overloading the system with too many threads.
Submit the job to a thread pool, e.g. using an Executor. This would allow multiple (but limited number of) threads to process the lists, in case processing is slower than event generation. Disadvantage is that events may be written out of order.
For the purpose of separation of concern and reusability, you should encapsulate the logic of collecting events, and sending them to thread in blocks for processing, in a separate class, rather than having that logic embedded in the event-generation code.
That way you can easily add extra features, e.g. a timeout for flushing pending events before reaching normal threshold (150), so events don't sit there too long if event generation slows down.
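As a rough sketch of the first choice (a single long-lived consumer thread fed through a BlockingQueue), wrapped in its own class as suggested: the names EventBatcher, record, combine and runConsumer are invented here, events are simplified to strings, and the database write is left to a caller-supplied callback.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

class EventBatcher {
    private final int threshold;
    private final BlockingQueue<List<String>> batches = new LinkedBlockingQueue<>();
    private List<String> current = new ArrayList<>();

    EventBatcher(int threshold) {
        this.threshold = threshold;
    }

    // Called from the event thread only; never copies or combines anything.
    void record(String event) {
        current.add(event);
        if (current.size() > threshold) {
            batches.add(current);         // hand over the full list as-is
            current = new ArrayList<>();  // fresh list instead of copy + clear
        }
    }

    // Collapses equal events into one entry with a quantity.
    static Map<String, Integer> combine(List<String> batch) {
        Map<String, Integer> counts = new HashMap<>();
        for (String event : batch) {
            counts.merge(event, 1, Integer::sum);
        }
        return counts;
    }

    // Loop for the single consumer thread; exits when the thread is interrupted.
    void runConsumer(Consumer<Map<String, Integer>> sink) {
        try {
            while (true) {
                sink.accept(combine(batches.take()));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The consumer would be started once, for example with new Thread(() -> batcher.runConsumer(counts -> { /* write counts to MySQL */ })).start(), and a timed flush could later be added inside this class without touching the event-generation code.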
I designed a Java application. A friend suggested using multi-threading; he claims that running my application as several threads will decrease the run time significantly.
In my main class, I carry several operations that are out of our scope to fill global static variables and hash maps to be used across the whole life time of the process. Then I run the core of the application on the entries of an array list.
for (int customerID : customers) {
    ConsumerPrinter consumerPrinter = new ConsumerPrinter();
    consumerPrinter.runPE(docsPath, outputPath, customerID);
    System.out.println("Customer with CustomerID:" + customerID + " Done");
}
For each iteration of this loop, the XMLs of the given customer are fetched from the machine and parsed, and calculations are performed on the parsed data. Later, the processed results are written to a text file (fetched and written data can reach up to several gigabytes at most and 50 MB on average). More than one iteration can write to the same file.
Should I make this piece of code multi-threaded so each group of customers are taken in an independent thread?
How can I know the most optimal number of threads to run?
What are the best practices to take into consideration when implementing multi-threading?
Should I make this piece of code multi-threaded so each group of customers are taken in an independent thread?
Yes, multi-threading can reduce your processing time. While iterating over your list you can spawn a new thread for each iteration and do the customer processing in it. But you need proper synchronization: if processing two customers requires an operation on the same resource, you must synchronize that operation to avoid race conditions or memory inconsistency issues.
How can I know the most optimal number of threads to run?
You cannot really know without actually measuring the processing time for n customers with different numbers of threads. It will depend on the number of cores your processor has and on what processing actually takes place for each customer.
What are the best practices to take into consideration when implementing multi-threading?
The first and foremost criterion is that you must have multiple cores and your OS must support multi-threading. Almost every system does nowadays, but it is still a good criterion to check. Secondly, you must analyze all the possible scenarios that may lead to a race condition. Every resource that you know will be shared among multiple threads must be thread-safe. You must also look out for possible memory inconsistency issues (for example, by declaring shared variables volatile or guarding them with locks). Finally, there are some things that you cannot predict or analyze until you actually run test cases, like deadlocks (you need to analyze a thread dump) or memory leaks (you need to analyze a heap dump).
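A hedged sketch of how the customer loop could be handed to a bounded thread pool instead of spawning one thread per customer. CustomerRunner and runAll are illustrative names, and the per-customer work (runPE in the question) is passed in as a callback, so nothing here assumes how ConsumerPrinter behaves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.IntConsumer;

class CustomerRunner {
    // Processes every customer on a pool of `threads` workers and waits for all of them.
    static void runAll(List<Integer> customers, int threads, IntConsumer processCustomer) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Void>> pending = new ArrayList<>();
            for (int customerID : customers) {
                pending.add(CompletableFuture.runAsync(
                        () -> processCustomer.accept(customerID), pool));
            }
            // join() rethrows the first failure, wrapped in a CompletionException.
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
    }
}
```

A reasonable starting point for `threads` is Runtime.getRuntime().availableProcessors(), then measure with other values as described above. Since several customers can write to the same output file, that write still has to be synchronized inside the callback.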
The idea of multi-threading is to move some heavy processing into another, let's say, "block of memory".
Any UI updates have to be done on the main/default thread, like printing messages or inflating a view, for example. You can ask the app to draw a bitmap, download images from the internet or run a heavy validation/loop block on a separate thread; imagine that you are creating a second, short-lived app to handle those tasks for you.
Remember, you can ask the app to download/draw an image on another thread, but you have to print this image on the screen on the main thread.
This is commonly used to load a large bitmap on a separate thread, do the math to resize this large image and then, on the main thread, inflate/print/paint/show the smaller version of that image to the user.
In your case, I don't know how heavy the runPE() method is or what it does. You could try to create another thread for it, but the rest should stay on the main thread, which is the main process of your UI.
You could optimize your loop by placing the "ConsumerPrinter consumerPrinter = new ConsumerPrinter();" before the "for(...)". Provided the object keeps no per-customer state, it does not change dynamically, so you can move it out of the loop to avoid creating the same object each time the loop restarts : )
While plain Java multi-threading (java.util.concurrent) can be used, as other answers have discussed, consider also alternative programming approaches to multi-threading, such as the actor model. The actor model still uses threads underneath, but much of the complexity is handled by the actor framework rather than directly by you, the programmer. In addition, there is less (or no) need to reason about synchronizing on shared state between threads because of the way programs using the actor model are structured.
See Which Actor model library/framework for Java? for a discussion of popular actor model libraries.
I have a fixed number n of identical resources that need to be shared between n or more threads. Whenever a thread needs to use a resource, it can take any available one, use it for an indeterminate amount of time (i.e. usage times are not uniform) and then release it.
What is a good Java data structure to manage this scenario? I can only think of one way to do it, which is by using a LinkedBlockingQueue and the take and put operations as locking and releasing a resource, respectively. I'd just like a suggestion from the concurrency experts.
For those who are curious: The resources that need to be shared are identical copies of a non-reentrant FORTRAN library for computing multivariate normal CDFs and moments. Spectacular numerical library, but written in an age where thread-safe code wasn't something to be worried about. In this case we make n copies of the library, where n = Runtime.getRuntime().availableProcessors() .
EDIT: I don't want to create the overhead of threads to execute this library. It is already being called from multiple threads; the calling threads should just be able to lock a resource and get on with it.
UPDATE: See https://stackoverflow.com/a/19039878/586086 for the motivation and the implementation.
The pattern you're describing is a resource pool. A thread-safe queue is a reasonable way to handle the situation when the resources are fairly simple, though you might also consider a pool library such as pool4j.
Create a singleton class with a fixed list of resources, an associated flag marking each resource as available or unavailable, and 2 synchronized methods, something like:
synchronized Resource getResource() {
    // find an available resource, mark it as unavailable and return it
}
synchronized void returnResource(Resource r) {
    // find the matching resource on the list and mark it as available
}
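For comparison, here is a small sketch of the queue-based pool mentioned in the question, where take and add on a blocking queue play the role of locking and releasing. ResourcePool and withResource are hypothetical names; the pool is generic, so nothing is assumed about the library copies themselves.

```java
import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

class ResourcePool<T> {
    private final BlockingQueue<T> free;

    // `resources` must be non-empty; all of them start out available.
    ResourcePool(Collection<T> resources) {
        free = new ArrayBlockingQueue<>(resources.size(), false, resources);
    }

    // Borrows any free resource (blocking until one is available), runs the
    // work on the calling thread, and always returns the resource afterwards.
    <R> R withResource(Function<T, R> work) {
        T resource;
        try {
            resource = free.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a resource", e);
        }
        try {
            return work.apply(resource);
        } finally {
            free.add(resource); // cannot overflow: capacity equals the number of resources
        }
    }
}
```

Unlike the flag-based version, a caller that finds every resource busy simply blocks in take() until another thread returns one, and no extra threads are created to run the work.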
I am trying to add asynchronous output to my program.
Currently, I have an eventManager class that gets notified each frame of the position of any of the moveable objects currently present in the main loop (It's rendering a scene; some objects change from frame to frame, others are static and present in every frame). I am looking to record the state of each frame so I can add in the functionality to replay the scene.
This means that I need to store the changing information from frame to frame, and either hold it in memory or write it to disk for later retrieval and parsing.
I've done some timing experiments, and recording the state of each object to memory increased the time per frame by about 25% (not to mention the possibility of eventually hitting a memory limit). Directly writing each frame to disk takes (predictably) even longer, close to twice as long as not recording the frames at all.
Needless to say, I'd like to implement multithreading so that I won't lose frames per second in my main rendering loop because the process is constantly writing to disk.
I was wondering whether it was okay to use a regular queue for this task, or if I needed something more dedicated like the queues discussed in this question.
In my situation, there is only one producer (the main thread), and one consumer (the thread I want to asynchronously write to disk). The producer will never remove from the queue, and the consumer will never add to it - so do I need a specialized queue at all?
Is there an advantage to using a more specialized queue anyway?
Yes, a regular Queue is inappropriate. Since you have two threads you need to worry about boundary conditions like an empty queue, full queue (assuming you need to bound it for memory considerations), or anomalies like visibility.
A LinkedBlockingQueue is best suited for your application. The put and take methods use different locks so you will not have lock contention. The take method will automatically block the consumer writing to disk if it somehow magically caught up with the producer rendering frames.
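A minimal single-producer, single-consumer sketch along these lines, with assumptions spelled out: FrameRecorder is an invented name, a frame's state is simplified to a String, the disk write is a caller-supplied callback, and the queue is bounded so memory cannot grow without limit.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

class FrameRecorder {
    private static final Object STOP = new Object(); // sentinel that ends the writer loop
    private final BlockingQueue<Object> queue;
    private final Thread writer;

    FrameRecorder(int capacity, Consumer<String> writeToDisk) {
        queue = new LinkedBlockingQueue<>(capacity);
        writer = new Thread(() -> {
            try {
                // take() blocks whenever the writer has caught up with the renderer.
                for (Object item = queue.take(); item != STOP; item = queue.take()) {
                    writeToDisk.accept((String) item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();
    }

    // Called from the render loop; never blocks. Returns false if the queue is
    // full, i.e. the frame was dropped because the writer has fallen behind.
    boolean record(String frameState) {
        return queue.offer(frameState);
    }

    // Lets the writer finish everything queued so far, then stops it.
    void close() {
        try {
            queue.put(STOP);
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Using offer keeps the frame rate steady at the cost of dropped frames when the disk is too slow; swapping it for put would keep every frame but stall the render loop instead.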
It sounds like you don't need a special queue, but if you want the thread removing from the queue to wait until there's something to get, try the BlockingQueue. It's in the java.util.concurrent package, so it's threadsafe for sure. Here are some relevant quotes from that page:
A Queue that additionally supports operations that wait for the queue
to become non-empty when retrieving an element, and wait for space to
become available in the queue when storing an element.
...
BlockingQueue implementations are designed to be used primarily for
producer-consumer queues, but additionally support the Collection
interface.
...
BlockingQueue implementations are thread-safe.
As long as you're already profiling your code, try dropping a BlockingQueue in there and see what happens!
Good luck!
I don't think it will matter much.
If you have 25% overhead serializing a state in memory, that will still be there with a queue.
Disk will be even more expensive.
The queue blocking mechanism will be cheap in comparison.
One thing to watch for is your queue growing out of control: disk is slow no matter what, and if the consumer can't drain queue events fast enough, you're in trouble.