ThreadPoolExecutor - Specifying Which Thread Handles a Given Task - java

Is there a good way to implement an execution policy that determines which thread will handle a given task based on some identification scheme? Or is this even a good approach?
I have a requirement to process one to many files, which I will receive in interleaved chunks. As the chunks arrive, I want to make a task out of processing each chunk. The catch is that I do not have the luxury of making the processing code thread-safe, so once a thread in the pool has processed a chunk from a file, I need that same thread to process the rest of that file. I don't care if a thread is processing several files at once, but I cannot have more than one thread from a pool processing the same file at once.
The book "Java Concurrency in Practice" states that you can use execution policies to determine "in what thread will a task be executed?", but I do not grasp how.
Thanks

Well, you could write your own ThreadPoolExecutor - but in general there's no way of doing this. The whole point of a thread pool is that you just throw work at it, without caring which thread gets which task. It sounds like you'll need to manage the threads yourself in this case, keeping a map of which thread is handling which file.
Do you know when a file has been finished? If not, you're going to potentially have problems with an ever-growing map...

A good idea might be a Thread per file:
Map<String, MyThreadImplementer> fileToThreadMap = new HashMap<>();

class MyThreadImplementer implements Runnable {
    private final int maxNumParts;
    // chunkList is appended to by the receiving thread, so it must be thread-safe
    private final List<FileChunk> chunkList = Collections.synchronizedList(new ArrayList<>());
    private final List<FileChunk> doneChunks = new ArrayList<>();

    public MyThreadImplementer(int maxNumberOfParts) {
        maxNumParts = maxNumberOfParts;
    }

    public void run() {
        while (doneChunks.size() < maxNumParts) {
            try {
                Thread.sleep(100);   // poll periodically for newly arrived chunks
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            // process each chunk in the list and move it to doneChunks
            for (FileChunk chunk : new ArrayList<>(chunkList)) {
                process(chunk);      // the non-thread-safe processing code
                chunkList.remove(chunk);
                doneChunks.add(chunk);
            }
        }
    }

    private void process(FileChunk chunk) { /* ... */ }
}
But you'd need to be careful you don't process 1000 files, and thereby create 1000 threads.

You say that you "do not have the luxury of making the processing code thread-safe", but this does not imply that you need to map files to specific threads. It just means that you can't start processing the next chunk from a file until the last chunk from that file has finished processing.
Taking advantage of java.util.concurrent, you could maintain a Map<String, LinkedBlockingQueue<FileChunk>> (assuming filename as key) in the main thread and assign each chunk to the queue for the respective file as chunks come in. Then have one Runnable blocking on each queue.
That way, only one thread at a time would be processing any given file. And you wouldn't need to directly mess with threads or maintain multiple thread pools.
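Here is a minimal sketch of that per-file-queue idea. The FileChunk type, its "last chunk" flag, the pool choice and the process() method are all assumptions for illustration; the flag is only there so the map can be cleaned up when a file is done:
import java.util.Map;
import java.util.concurrent.*;

/** Sketch only: FileChunk, its "last" flag and process() are stand-ins for your own code. */
public class ChunkDispatcher {

    public static final class FileChunk {
        final String file; final byte[] data; final boolean last;
        FileChunk(String file, byte[] data, boolean last) {
            this.file = file; this.data = data; this.last = last;
        }
    }

    private final Map<String, BlockingQueue<FileChunk>> queues = new ConcurrentHashMap<>();
    // A cached pool keeps one live worker per in-flight file; a fixed pool would
    // instead cap how many files can be processed concurrently.
    private final ExecutorService pool = Executors.newCachedThreadPool();

    /** Called from the receiving thread as interleaved chunks arrive. */
    public void dispatch(FileChunk chunk) {
        queues.computeIfAbsent(chunk.file, name -> {
            BlockingQueue<FileChunk> q = new LinkedBlockingQueue<>();
            pool.submit(() -> drain(name, q));   // exactly one consumer per file
            return q;
        }).add(chunk);
    }

    /** Runs on one pool thread; no other thread ever touches this file's chunks. */
    private void drain(String name, BlockingQueue<FileChunk> q) {
        try {
            FileChunk c;
            do {
                c = q.take();        // blocks until the next chunk of this file arrives
                process(c);          // the non-thread-safe processing code
            } while (!c.last);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            queues.remove(name);     // assumes no chunk arrives after the last one
        }
    }

    private void process(FileChunk c) { /* ... */ }
}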

Related

Which concurrent List for multiple writers one reader?

I'm implementing a logging for multiple threads, each will write into a List. After all threads have finished I will then dump the contents of the List into a file. Which concurrent List implementation should I use?
I'm considering the ConcurrentLinkedQueue.
The writing will be concurrent, but the reading will be done by one thread, after all other threads have finished writing.
I could use a List for each thread but then I would have the overhead of managing multiple Lists and I'm not sure it is worth it. Another option would be a synchronized List.
Bonus question: How do I make the last thread dump the list into the file? See Multiple threads arrive, the last should do the processing.
Please - don't.
Java has a logging framework already built in, and it is fully customizable so you can create different handlers and formatters. If you don't like the standard Java logging framework for whatever reason, there are still many others available. All these frameworks handle concurrency.
With the time you would spend re-creating a framework that has existed and been refined for a long time, you could have built something that adds something new to the world.
If you can in fact guarantee that any attempt to read the queue will happen "once all other threads have finished writing" then you do not need a concurrent queue at all.
All you need is a regular, run-of-the-mill, non-concurrent queue, and a factory of "log entry consumers" which dispenses a log entry consumer to each thread that needs to write to the queue.
The use of a consumer hides the queue, so the only thing that threads can do is append to the queue. The implementation of the consumer simply guarantees synchronized access to the insertion point of the queue.
The queue then does not need to be synchronized because if it is only to be read "after all other threads have finished writing" then obviously, there will be no concurrent access happening while it is being read.
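A minimal sketch of that consumer-factory idea, assuming String log entries; the class and method names here are invented for illustration:
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

/** Illustrative only: the class and method names here are invented. */
public class LogSink {
    private final Queue<String> entries = new ArrayDeque<>();   // plain, non-concurrent queue
    private final Object writeLock = new Object();

    /** Factory method: each writer thread gets a consumer; the queue itself is never exposed. */
    public Consumer<String> newWriter() {
        return entry -> {
            synchronized (writeLock) {          // only the insertion point is synchronized
                entries.add(entry);
            }
        };
    }

    /** Called once, after all writer threads have finished; no locking is needed then. */
    public Iterable<String> drain() {
        return entries;
    }
}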
That having been said:
Are you sure about the "after all other threads have finished writing" part?
How do you define "all other threads have finished writing"?
How do you know that "all other threads have finished writing"?
Why is it important to you that the list is only read after "all other threads have finished writing"?

Creating multiple threads for writing files to Disk, Java

I have a piece of code which writes an object to disk as and when the object is put into the LinkedBlockingQueue.
As of now, this is single-threaded. I need to make it multi-threaded, as the contents are being written to different files on disk, and therefore there is no harm in writing them independently.
I am not sure if I can use a ThreadPool here, as I don't know when the object will be placed on the queue! Now if I decide to have a fixedThreadPool of 5 threads, how do I distribute it among multiple objects?
Any suggestions are highly appreciated.
Here is my existing code. I want to spawn a new thread as and when I get a new object in the queue.
How do I distribute it among multiple objects?
Well, you don't have to worry about task distribution. All you need to do is submit a Runnable or Callable (which describes your task), and it will be handed to an idle thread in the pool; if all the threads are busy processing, the new task will wait in the queue.
Below is what you can try...
1) Create a thread pool that suits your needs best.
ExecutorService es = Executors.newFixedThreadPool(desiredNoOfThreads);
2) As you already have the queue in place -
while (true) {
    // poll the queue and check for a non-null value
    MyObject obj = queue.poll(1, TimeUnit.SECONDS);   // MyObject is a placeholder for your queued type
    if (obj != null) {
        // wrap the write in a Runnable (or Callable) and submit it to the ExecutorService
        es.submit(() -> writeToDisk(obj));            // writeToDisk is your existing write logic
    }
    // (handle InterruptedException from poll() as appropriate)
}
Generally speaking, if your files are on the same physical device you will get no performance benefit, since storage devices work synchronously on read/write operations. So you will get your threads blocked on I/O, which can lead to poorer speed and will definitely waste threads that could be doing useful work.

Java Thread storing

So, I have a loop where I create thousands of threads which process my data.
I checked, and storing a Thread slows down my app.
This is from my loop:
Record r = new Record(id, data, outPath, debug);
//r.start();
threads.add(r);
//id is 4 digits
//data is something like 500 chars long
It stops my for loop for a while (it takes a second or more for one run; that's too much!).
Only init > duration: 0:00:06.369
With adding thread to ArrayList > duration: 0:00:07.348
Questions:
what is the best way of storing Threads?
how to make Threads faster?
should I create Threads and run them with a special executor, meaning for example 10 at once, then the next 10, etc.? (if yes, then how?)
Consider that having a very high number of threads is not very useful.
At most, you can execute in parallel a number of threads equal to the number of cores of your CPU.
The best is to reuse existing threads. To do that you can use the Executor framework.
For example, to create an Executor that internally handles at most 10 threads you can do the following:
List<Record> records = ...;
ExecutorService executor = Executors.newFixedThreadPool(10);
for (Record r : records) {
    executor.submit(r);
}
// At the end stop the executor
executor.shutdown();
With code similar to this you can also submit many thousands of commands (Runnable implementations), but no more than 10 threads will be created.
I'm guessing that it is not the .add method that is really slowing you down. My guess is that the hundreds of Threads running in parallel are what is really the problem. Of course a simple command like "add" will be queued in the pipeline and can take long to be executed, even if the execution itself is fast. It is also possible that your data structure has an add method that is O(n).
Possible solutions for this:
* Find a real wait-free solution for this. E.g. prioritising threads.
* Add them all to your data-structure before executing them
While it is possible to work like this, it is strongly discouraged to create more than a handful of Threads for tasks like this. You should use the Thread Executor, as David Lorenzo already pointed out.
I have a loop where I create thousands of threads...
That's a bad sign right there. Creating threads is expensive.
Presumably your program creates thousands of threads because it has thousands of tasks to perform. The trick is to de-couple the threads from the tasks: create just a few threads, and re-use them.
That's what a thread pool does for you.
Learn about the java.util.concurrent.ThreadPoolExecutor class and related classes (e.g., Future). It implements a thread pool, and chances are very likely that it provides all of the features that you need.
If your needs are simple enough, you can use one of the static methods in java.util.concurrent.Executors to create and configure a thread pool. (e.g., Executors.newFixedThreadPool(N) will create a new thread pool with exactly N threads.)
If your tasks are all compute bound, then there's no reason to have any more threads than the number of CPUs in the machine. If your tasks spend time waiting for something (e.g., waiting for commands from a network client), then the decision of how many threads to create becomes more complicated: It depends on how much of what resources those threads use. You may need to experiment to find the right number.
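For example, here is a minimal sketch of the Executors/Future combination mentioned above; the task body (squaring an id) is just a placeholder for real work:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);   // N = 4 worker threads
        List<Future<Integer>> results = new ArrayList<>();

        for (int i = 0; i < 1000; i++) {       // thousands of tasks, still only 4 threads
            final int id = i;
            // the Callable body below is just a placeholder for real work
            results.add(pool.submit(() -> id * id));
        }

        long sum = 0;
        for (Future<Integer> f : results) {
            sum += f.get();                    // blocks until that particular task is done
        }
        pool.shutdown();
        System.out.println("sum = " + sum);
    }
}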

Consuming from many queues

I have a large number of state machines. Occasionally, a state machine will need to be moved from one state to another, which may be cheap or expensive and may involve DB reads and writes and so on.
These state changes occur because of incoming commands from clients, and can occur at any time.
I want to parallelise the workload. I want a queue saying 'move this machine from this state to this state'. Obviously the commands for any one machine need to be performed in sequence, but I can be moving many machines forward in parallel if I have many threads.
I could have a thread per state machine, but the number of state machines is data-dependent and may be many hundreds or thousands; I don't want a dedicated thread per state machine, I want a pool of some sort.
How can I have a pool of workers but ensure that the commands for each state machine are processed strictly sequentially?
UPDATE: so imagine the Machine instance has a list of outstanding commands. When an executor in the thread pool has finished consuming a command, it puts the Machine back into the thread-pool's task queue if it has more outstanding commands. So the question is, how to atomically put the Machine into the thread pool when you append the first command? And ensure this is all thread safe?
I suggest this scenario:
Create a thread pool, probably of fixed size, with Executors.newFixedThreadPool.
Create some structure (probably a HashMap) which holds one Semaphore for each state machine. Those semaphores will have a value of 1 and will be fair semaphores, to keep the sequence.
In the Runnable which will do the job, just call semaphore.acquire() on the semaphore of its state machine at the beginning of the run method, and semaphore.release() at the end.
With the size of the thread pool you control the level of parallelism.
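A rough sketch of what that might look like; using String machine ids as map keys and plain Runnable commands are assumptions made here for illustration:
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

/** Sketch of the approach above; machine ids and Runnable commands are assumptions. */
public class MachineWorkers {
    private final ExecutorService pool = Executors.newFixedThreadPool(8);   // level of parallelism
    private final ConcurrentHashMap<String, Semaphore> locks = new ConcurrentHashMap<>();

    public void submit(String machineId, Runnable command) {
        // one fair, single-permit semaphore per state machine
        Semaphore lock = locks.computeIfAbsent(machineId, id -> new Semaphore(1, true));
        pool.submit(() -> {
            try {
                lock.acquire();        // at most one pool thread works on this machine at a time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            try {
                command.run();         // perform the state transition
            } finally {
                lock.release();
            }
        });
    }
}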
I suggest another approach. Instead of using a thread pool just to move states in a state machine, use a thread pool for everything, including doing the work. After doing some work that results in a state change, the state-change event should be added to the queue. After the state change is processed, another do-work event should be added to the queue.
Assuming that the state transitions are work-driven, and vice versa, out-of-sequence processing is not possible.
The idea of storing semaphores in a special map is very dangerous. The map will have to be synchronized (adding/removing objects is not thread-safe) and there is a relatively large overhead in doing the lookups (possibly synchronizing on the map) and then using the semaphore.
Besides, if you want to use a multithreaded architecture in your application, I think that you should go all the way. Mixing different architectures may prove troublesome later on.
Have a thread ID per machine. Spawn the desired number of threads. Have all the threads greedily process messages from the global queue. Each thread locks the current message's machine for its own exclusive use (until it's done processing the current message and all messages on its queue), and the other threads put messages for that machine on its internal queue.
EDIT: Handling message pseudo-code:
void handle(message)
    targetMachine = message.targetMachine
    if (targetMachine.thread != null)
        targetMachine.thread.addToQueue(message);
    else
        targetMachine.thread = this;
        process(message);
        processAllQueueMessages();
        targetMachine.thread = null;
Handling message Java code: (I may be overcomplicating things slightly, but this should be thread-safe)
/* class ThreadClass */
void handle(Message message) throws InterruptedException
{
    // get targetMachine from message
    targetMachine.mutexInc.acquire();                       // blocking
    targetMachine.messages++;
    boolean acquired = targetMachine.mutex.tryAcquire();    // non-blocking
    if (acquired)
        targetMachine.threadID = this.ID;
    targetMachine.mutexInc.release();
    if (!acquired)
        // can put this before release, it may speed things up
        threads[targetMachine.threadID].addToQueue(message);
    else
    {
        process(message);
        targetMachine.messages--;
        while (true)
        {
            while (!queue.empty())
            {
                process(queue.pop());
                targetMachine.messages--;
            }
            targetMachine.mutexInc.acquire();               // blocking
            if (targetMachine.messages > 0)
            {
                targetMachine.mutexInc.release();
                Thread.sleep(1);
            }
            else
                break;
        }
        targetMachine.mutex.release();
        targetMachine.mutexInc.release();
    }
}

java: Patterns for Monitoring worker threads?

Excuse the lack of knowledge on multithreaded apps; I am new to the field.
Is there a pattern or common used methodology for monitoring the 'job completion' or 'job status' of worker threads from a monitor (a class that acts as a monitor)?
What I have currently done is create a list of workers and create one thread for each worker. After all threads have started, I loop over the worker list and "check their status" by making a call to a method.
At that time I couldn't come up with a different solution, but being new to the field, I don't know if this is the way to go, or if there are other solutions or patterns that I should study.
Depending on what you want, there are many ways that you can do this.
If you just want to wait until all the threads finish (i.e. all you care about is having everything finish before moving on), you can use Thread.join():
try {
    for (Thread t : threadsIWaitOn)
        t.join();
} catch (InterruptedException iex) {
    /* ... handle error ... */
}
If you want a more fine-grained control over the thread status and want to be able, at any time, to know what threads are doing, you can use the Thread.getState() function. This returns a Thread.State value that describes whether the thread is running, blocked, new, etc., and the Javadoc specifically says that it's designed for monitoring the state of a thread rather than trying to synchronize on it. This might be what you want to do.
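For example, a minimal sketch of a polling loop; "workers" here stands for whatever List<Thread> you already keep:
// Minimal sketch: a monitor loop polling each worker's state.
// "workers" stands for the List<Thread> you already keep.
for (Thread t : workers) {
    Thread.State s = t.getState();   // NEW, RUNNABLE, BLOCKED, WAITING, TIMED_WAITING or TERMINATED
    System.out.println(t.getName() + " is currently " + s);
}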
If you want even more information than that - say, how to get a progress indicator for each thread that counts up from 0 to 100 as the thread progresses - then another option might be to create a Map from Threads to AtomicIntegers associating each thread with a counter, then pass the AtomicInteger into the constructor of each thread. That way, each thread can continuously increment the counters, and you can have another thread that continuously polls the progress.
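A small, self-contained sketch of that counter idea; here the AtomicInteger is handed to each worker via lambda capture rather than a constructor, which amounts to the same thing:
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ProgressDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Thread, AtomicInteger> progress = new ConcurrentHashMap<>();

        for (int i = 0; i < 3; i++) {
            AtomicInteger counter = new AtomicInteger(0);
            Thread worker = new Thread(() -> {
                for (int p = 1; p <= 100; p++) {
                    // ... one unit of real work would go here ...
                    counter.set(p);              // report progress, 0..100
                }
            });
            progress.put(worker, counter);
            worker.start();
        }

        // The monitoring thread (here simply the main thread) polls the counters.
        while (progress.keySet().stream().anyMatch(Thread::isAlive)) {
            progress.forEach((t, c) -> System.out.println(t.getName() + ": " + c.get() + "%"));
            Thread.sleep(100);
        }
        progress.forEach((t, c) -> System.out.println(t.getName() + " finished at " + c.get() + "%"));
    }
}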
In short, you have a lot of options based on what it is that you're trying to accomplish. Hopefully something in here helps out!
Use a thread pool and Executor; then you get a Future<> and can poll for completion, plus some more nice stuff too. I can recommend this book to you: Java Concurrency in Practice.
Try to use some kind of synchronization. For example, wait on some kind of monitor or semaphore until the job is done, or whatever you need.
