I have a problem controlling a queue of tasks. My application reads parts of data from a MySQL database, processes them and shows the results on screen. There are about 20 tasks in the queue, but only one should be processed at a time. The user can also cancel the current queue and "ask" for another part of the data from the database. Right now I'm using ThreadPoolExecutor with the "discard oldest" policy, because I remove the "oldest" part of the data from the queue and replace it with the "new" part when the user wants it. My problem is that even though corePoolSize is set to 1, sometimes there are 4 threads running at once. I also don't know how to cleanly empty the queue of not-yet-started tasks. Right now I call getQueue() and remove all items from it, but I have read that this is a bad way to do it. Finally, before adding a new task to the queue I want to check that no such task (with the same part of data) is already in the queue, and I have no idea how to achieve that.
To summarize:
1. Is there any way to force ThreadPoolExecutor to run just 1 task at once?
2. How to check queue to prevent duplicates of tasks (they are distinguished by String value)?
3. How to remove cleanly 'old' tasks from queue?
Or maybe I should use some other mechanism?
Any help is appreciated.
1. Is there any way to force ThreadPoolExecutor to run just 1 task at once?
Executors.newSingleThreadExecutor()
2. How to check queue to prevent duplicates of tasks (they are distinguished by String value)?
Keep your own list of task names. Add the task's name to this list before sending the task to the executor, and remove the name after the task has executed.
3. How to remove cleanly 'old' tasks from queue?
When you add a task to an executor you get a Future object. You can use that Future to cancel the task. Or you can shut down the executor and create a new one.
Perhaps using Executors.newSingleThreadExecutor() will give you a better guarantee that there is exactly one thread in the pool.
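The three suggestions above can be combined into one small class. The following is only a sketch under stated assumptions: the class name `SingleWorkerQueue` and its methods are hypothetical, tasks are identified by a `String` key as in the question, and a task that is already running is allowed to finish rather than being interrupted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

// Hypothetical helper, not part of the JDK.
class SingleWorkerQueue {
    // Exactly one worker thread, so at most one task runs at a time.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    // Tasks that are queued or running, keyed by the String that identifies them.
    private final Map<String, FutureTask<Void>> pending = new ConcurrentHashMap<>();

    /** Queues the work unless a task with the same key is already pending. */
    boolean submit(String key, Runnable work) {
        FutureTask<Void> task = new FutureTask<>(() -> {
            try {
                work.run();
            } finally {
                pending.remove(key); // allow the same key to be submitted again
            }
        }, null);
        if (pending.putIfAbsent(key, task) != null) {
            return false; // duplicate: same key is already queued or running
        }
        executor.execute(task);
        return true;
    }

    /** Prevents queued tasks from starting and forgets their keys. */
    void cancelPending() {
        for (Map.Entry<String, FutureTask<Void>> e : pending.entrySet()) {
            // cancel(false) never interrupts: a task that is already running
            // finishes normally, although its Future then reports "cancelled".
            if (e.getValue().cancel(false)) {
                pending.remove(e.getKey(), e.getValue());
            }
        }
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

A cancelled task stays in the executor's internal queue until the worker reaches it, at which point it is skipped immediately, so there is no need to touch `getQueue()`.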
Looking for an approach to solve a multi threading problem.
I have N tasks, say 100. I need to run these 100 tasks using a limited number of threads, say 4. Each task is huge, so I don't want to create all the tasks up front. Each task should be created only when a free thread is available in the pool. Is there a recommended solution for this?
You could use a BlockingQueue to define the tasks. Have one thread create the tasks and add them to the queue using put, which blocks until there's space in the queue. Then have each worker thread just pull the next task off of the queue. The queue's blocking nature will basically force that first thread (that's defining the tasks) to not get too far ahead of the workers.
This is really just a case of the producer-consumer pattern, where the thing being produced and consumed is a request to do some work.
You'll need to specify some way for the whole thing to finish once all of the work is done. One way to do this is to put N "poison pills" on the queue when the generating thread has created all of the tasks. These are special tasks that just tell the worker thread to exit (rather than doing some work and then asking for the next item). Since each thread can only read at most one poison pill (because it exits after it reads it), and you put N poison pills in the queue, you'll ensure that each of your N threads will see exactly one poison pill.
Note that if the task-generating thread consumes resources, like a database connection to read tasks from, those resources will be held until all of the tasks have been generated -- which could be a while! That's not generally a good idea, so this approach isn't a good one in those cases.
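A minimal sketch of the bounded queue plus poison pills described above. The class and method names are hypothetical, the calling thread plays the role of the task-generating thread, and each "task" just increments a counter so the result can be checked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class illustrating the producer-consumer pattern with poison pills.
class PoisonPillDemo {
    // Sentinel task: a worker that takes it exits instead of asking for more work.
    private static final Runnable POISON_PILL = () -> { };

    /** Runs taskCount tasks on workerCount threads and returns how many tasks ran. */
    static int runAll(int taskCount, int workerCount) throws InterruptedException {
        // Small capacity, so the producer cannot get far ahead of the workers.
        BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(workerCount);
        AtomicInteger completed = new AtomicInteger();

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        Runnable task = queue.take(); // blocks until a task is available
                        if (task == POISON_PILL) {
                            return; // each worker consumes exactly one pill
                        }
                        task.run();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            workers.add(worker);
        }

        // Producer: each task is created only when there is room in the queue.
        for (int i = 0; i < taskCount; i++) {
            queue.put(completed::incrementAndGet); // put() blocks while the queue is full
        }
        // One pill per worker, queued after all real tasks.
        for (int i = 0; i < workerCount; i++) {
            queue.put(POISON_PILL);
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return completed.get();
    }
}
```

Because the queue is FIFO and the pills are added last, every real task is taken before any worker sees a pill.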
If you can get the number of active threads in the thread pool at a given point in time, you can solve your problem. To do that, use ThreadPoolExecutor#getActiveCount. Once you have the number of active threads, you can decide whether or not to create a task.
ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(5);
executor.getActiveCount();
Note: ExecutorService does not provide a getActiveCount method; you have to use ThreadPoolExecutor.
ThreadPoolExecutor#getActiveCount: Returns the approximate number of threads that are actively executing tasks.
I have a database table called jobs and some producer service inserting data to this table. I need to create a consumer service to process this data.
I have a server with 8 cores and 16 threads, so I create a thread pool with 16 threads.
ExecutorService executorService = Executors.newFixedThreadPool(16);
I will fetch 16 records from the database and distribute them to the consumer threads. After all threads have completed their jobs, I will fetch another 16 records. (I really don't know whether my solution is efficient or not.)
How can I distribute these tasks to the consumer threads? Do I need to use a BlockingQueue?
The executor service has a queue that buffers your tasks when no thread is available.
You need to write another thread that periodically submits tasks to the executor service and also handles the completion strategy, i.e. what to do if the executor service's queue is full.
The javadocs for ExecutorService might come in handy here.
Implement the work as Callables, put them into a collection, call executorService.invokeAll(<Collection of Callable>), and check the returned Futures for completion.
Or just use executorService.submit(<task>)
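As a rough illustration of the invokeAll route, here is a sketch in which squaring an integer stands in for processing a record. The class name, the pool size of 4 and the squaring itself are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo: squaring a number stands in for "process one record".
class InvokeAllDemo {
    static List<Integer> squares(List<Integer> inputs) throws Exception {
        ExecutorService executorService = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (Integer n : inputs) {
                tasks.add(() -> n * n);
            }
            // invokeAll blocks until every task has completed and returns
            // the Futures in the same order as the task list.
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : executorService.invokeAll(tasks)) {
                results.add(future.get()); // rethrows a task failure as ExecutionException
            }
            return results;
        } finally {
            executorService.shutdown();
        }
    }
}
```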
There is no need to "batch" the records. Just submit them to the executor service.
If you are concerned that you might overwhelm the JVM's heap by filling the executor service's queue, then create your executor service using (for example) an ArrayBlockingQueue as the work queue. That will cause the executor to reject requests if the work queue gets too long. Various other strategies are possible.
If you are going to do fancy things with your ExecutorService, I recommend that you read the javadocs for ThreadPoolExecutor. The API is rich and complex, and warrants thorough reading before you choose a specific implementation approach.
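A sketch of the bounded work queue mentioned above. The pool and queue sizes are arbitrary, the class name is hypothetical, and `CallerRunsPolicy` is used here as one of the "other strategies": instead of rejecting, the submitting thread runs the task itself when the queue is full, which slows submission down.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; sizes are chosen only for illustration.
class BoundedExecutorDemo {
    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,                        // fixed-size pool
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity), // at most queueCapacity waiting tasks
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Submits recordCount trivial tasks and returns how many were executed. */
    static int processAll(int recordCount) throws InterruptedException {
        ThreadPoolExecutor pool = newBoundedPool(4, 8);
        AtomicInteger processed = new AtomicInteger();
        for (int i = 0; i < recordCount; i++) {
            pool.execute(processed::incrementAndGet); // stands in for processing one record
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return processed.get();
    }
}
```

With the default `AbortPolicy` the same executor would throw RejectedExecutionException once the queue is full, which is the behaviour described in the answer.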
While that approach will work, it may still leave some unused computation power, assuming that 16 is the optimal number of threads.
I'd rather use a pull-based approach, in which threads "pull" entries to process:
Option 1: Retrieve all records and use a parallel stream:
List<Record> allValues = //fetch
allValues.parallelStream().forEach(...do your processing...);
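//You can even have a better version that reads data from the result set as needed.
//Note: Stream.generate is unbounded, so the pipeline must stop once next() returns false.
Stream.generate(() -> {
    try {
        resultSet.next();
        return resultSet.getObject(1); //Read/create the value from the record
    } catch (SQLException e) {
        throw new IllegalStateException(e); //a Supplier cannot throw checked exceptions
    }
});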
Option 2: Put all the data retrieved from the DB into a queue of some sort, then create Runnable (or Callable) implementations that process the queue (they loop, so each thread stays busy until the queue is exhausted). You can then launch those tasks on the executor service:
Queue<Object> records; //Create the queue of records (use a thread-safe one, e.g. ConcurrentLinkedQueue)
ExecutorService es; // Instantiate the executor service with desired capacity
//Execute the runnable that processes the queue. Only ending when there's nothing on the queue.
for (int i = 0; i < 16; i++) {
    es.execute(() -> {
        //poll() returns null once the queue is empty, so a single call replaces
        //the separate isEmpty() check and avoids the race between check and poll.
        Object next;
        while ((next = records.poll()) != null) {
            //process
        }
    });
}
I would recommend using a dedicated ArrayBlockingQueue whose size is twice the number of available processors (2x16=32, in your case).
A separate thread reads records from the database and puts them in the queue. If the queue is full, that thread waits until space for the next record becomes available. If records are processed faster than the reading thread can retrieve them from the database, several reading threads can be used, all running the same read-put loop.
Consumer threads simply take next records from the queue and process them, in a loop.
Append:
An alternative approach is to wrap each record in a Runnable (or Callable) and submit it to the executor service. One smaller drawback is that an additional wrapper object has to be created. The greater drawback is that the executor's input queue may become overloaded. Then, depending on what kind of queue is used, either the executor throws RejectedExecutionException or the queue consumes all available memory. A dedicated ArrayBlockingQueue, by contrast, simply suspends the producer threads when it is full.
I have a scenario in which the same task gets submitted to an ExecutorService multiple times. I want to avoid that. Is there a way to do it?
My Task class has a constructor that takes a String.
Task task1 = new Task ("A");
Then I execute this task:
executor.execute (task1);
Then I create another task with same string.
Task task2 = new Task ("A");
Let's say I cannot prevent this from happening.
Now I execute this task:
executor.execute (task2);
I want only one of these tasks to be executed, since both tasks are similar in nature.
How?
At first, I would have implemented the queue interface myself and passed my queue to the executor service. My implementation would use a HashSet to remember which tasks it has seen, next to a regular collection that holds the tasks as the queue requires. Adding to my queue would therefore involve checking the HashSet first. Maybe a LinkedHashSet that rolls off the eldest entries...
However, the executor service's submit() doesn't rely entirely on the queue. See the javadoc of ThreadPoolExecutor. The queue could refuse the task, but the executor could spawn a thread instead and still accept the task. Besides, this approach implies that you can intervene in the executor's construction anyway.
So, presuming you control the executor's class, it seems you must extend an executor service and keep the knowledge there instead of in a custom queue. You only need to override submit() and throw the rejection exception.
Of course, this reveals the rejection to the submitter, and you should deal with that gracefully. You cannot hide it, because you cannot return a Future if there was no submission, unless you wire up the old Future to the new one (your knowledge map would then contain Futures). This may cause difficulties, though, since every Future has a 'done' state and is cancelable... Your own Future would be delegating to the original Futures. I think task rejection is simpler.
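A minimal sketch of the rejection idea. It deviates from the text in one respect, which is an assumption of this example: instead of subclassing an executor and overriding submit(), it wraps an existing ExecutorService and takes the identifying String as an explicit argument. The class name `DedupExecutor` is hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical wrapper, not part of the JDK.
class DedupExecutor {
    private final ExecutorService delegate;
    // Keys of tasks that have been accepted and have not finished yet.
    private final Set<String> pending = ConcurrentHashMap.newKeySet();

    DedupExecutor(ExecutorService delegate) {
        this.delegate = delegate;
    }

    /** Submits the task, or throws RejectedExecutionException if the key is already pending. */
    Future<?> submit(String key, Runnable task) {
        if (!pending.add(key)) {
            throw new RejectedExecutionException("Duplicate task: " + key);
        }
        try {
            return delegate.submit(() -> {
                try {
                    task.run();
                } finally {
                    pending.remove(key); // the key may be submitted again from now on
                }
            });
        } catch (RejectedExecutionException e) {
            pending.remove(key); // the delegate refused the task, so forget the key
            throw e;
        }
    }
}
```

One limitation of this sketch: if the returned Future is cancelled before the task starts, the wrapper never runs and the key stays in the set, so a real implementation would need to handle cancellation as well.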
I have a thread pool of m threads. Let's say m is fixed at 10. Then there are n queues, where n can become large (100'000 or more). Every queue holds tasks to be executed by those m threads. Now, very importantly, every queue must be worked off sequentially, task by task. This is required to make sure that tasks are executed in the order in which they were added to the queue. Otherwise the data could become inconsistent (the same as with, say, JMS queues).
So the question is now how to make sure that the tasks in those n queues are processed by the available m threads in a way that no task added to the same queue can be executed "at the same time" by different threads.
I tried to solve this problem myself and figured out that it is quite demanding. Java ThreadPoolExecutor is nice, but you would have to add quite a bit of functionality that is not easy to develop. So the question is whether anyone knows of some framework or system for Java that already solves this problem?
Update
Thanks to Adrian and Tanmay for their suggestions. The number of queues may be very large (100'000 or more), so one thread per queue is unfortunately not possible, although it would be simple and easy. I will look into the fork/join framework. It looks like an interesting path to pursue.
My current first iteration solution is to have a global queue to which all tasks are added (using a JDK8 TransferQueue, which has very little locking overhead). Tasks are wrapped into a queue stub with the lock of the queue and its size. The queue itself does not exist physically, only its stub.
An idle thread first needs to obtain a token before it can access the global queue (the token would be a single element in a blocking queue, e.g. JDK8 TransferQueue). Then it does a blocking take on the global queue. When a task was obtained, it checks whether the queue lock of the task's queue stub is down. Actually, I think just using an AtomicBoolean would be sufficient and create less lock contention than a lock or synchronized block.
When the queue lock is obtained, the token is returned to the global queue and the task is executed. If it is not obtained, the task is added to a 2nd level queue and another blocking take from the global queue is done. Threads need to check whether the 2nd level queue is empty and take a task from it to be executed as well.
This solution seems to work. However, the token every thread needs to acquire before being allowed to access the global queue and the 2nd level queue looks like a bottleneck. I believe it will create high lock contention. So, I'm not so happy with this. Maybe I start with this solution and elaborate on it.
Update 2
All right, here now the "best" solution I have come up with so far. The following queues are defined:
Ready Queue (RQ): Contains all tasks that can be executed immediately by any thread in the thread pool
Entry Queue (EQ): Contains all tasks the user wants to be executed as well as internal admin tasks. The EQ is a priority queue. Admin tasks have highest priority.
Channel Queues (CQ): For every channel there is an internal channel queue that is used to preserve the ordering of the tasks, i.e. to make sure tasks are executed sequentially in the order they were added to EQ
Scheduler: Dedicated thread that takes tasks from EQ. If the task is a user task, it is added to the CQ of the channel the task was added to. If the head of the CQ equals the just inserted user task, it is also added to the RQ (but remains in the CQ) so that it is executed as soon as the next thread of the thread pool becomes available.
When a user task has finished execution, an internal task TaskFinished is added to EQ. When it is executed by the scheduler, the head is taken from the associated CQ. If the CQ is not empty after the take, the next task is peeked (but not removed) from the CQ and added to the RQ. The TaskFinished tasks have higher priority than user tasks.
In my opinion this approach contains no logical errors. Note that EQ and RQ need to be synchronized. I prefer using TransferQueue from JDK8, which is very fast, and where checking whether it is empty and polling the head item are also very fast. The CQs need not be synchronized, as they are only ever accessed by the Scheduler.
So far I'm quite happy with this solution. What worries me is whether the Scheduler could turn into a bottleneck. If there are many more tasks in the EQ than it can handle, the EQ might grow and build up a backlog. Any opinions about that would be appreciated :-)
You can use Fork Join Framework if you are working in Java 7 or Java 8.
You can create a RecursiveTask from the first element popped from each queue.
Remember to give each RecursiveTask a reference to its corresponding queue.
Invoke all of them at once (in a loop or stream).
Then, at the end of the compute method (after processing of a task is completed), create another RecursiveTask by popping the next element from the corresponding queue and call invoke on it.
Notes:
Each task will be responsible for extracting new element from the queue, so all tasks from the queue would be executed sequentially.
There should be a new RecursiveTask created and invoked separately for each element in the queues. This ensures that some queues do not hog the threads and starvation is avoided.
Using an ExecutorService is also a viable option, but IMO ForkJoin's API is friendlier for your use case.
Hope this helps.
One simple solution would be to create a task whenever an element is added to an empty queue. This task would be responsible for only that queue and would end when the queue has been worked off. Ensure that the Queue implementations are thread-safe and the task stops after removing the last element.
EDIT: These tasks should be added to a ThreadPoolExecutor with an internal queue, for example one created by Executors.newFixedThreadPool, which will work off the tasks in parallel with a limited number of threads.
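The idea of starting one draining task whenever an element is added to an empty queue might look like the sketch below. The class name `SerialChannel` is hypothetical, one instance stands for one of the n queues, and swallowing task exceptions is a simplification made for the example.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;

// Hypothetical class: one instance per queue, all instances sharing one thread pool.
class SerialChannel {
    private final Queue<Runnable> tasks = new ArrayDeque<>(); // guarded by "this"
    private final Executor pool;
    private boolean draining; // true while a drain task is queued or running; guarded by "this"

    SerialChannel(Executor pool) {
        this.pool = pool;
    }

    /** Adds a task; starts a drain task if none is active for this channel. */
    synchronized void add(Runnable task) {
        tasks.add(task);
        if (!draining) {
            draining = true;
            pool.execute(this::drain);
        }
    }

    /** Runs this channel's tasks one after another and ends when the queue is empty. */
    private void drain() {
        while (true) {
            Runnable next;
            synchronized (this) {
                next = tasks.poll();
                if (next == null) {
                    draining = false; // the next add() will start a new drain task
                    return;
                }
            }
            try {
                next.run(); // run outside the lock so add() is never blocked by a task
            } catch (RuntimeException e) {
                // Simplification: ignore the failure so the rest of the channel still runs.
            }
        }
    }
}
```

Since at most one drain task exists per channel at any time, tasks of the same channel never run concurrently, while different channels are worked off in parallel by the pool. A long channel does occupy its thread until it is empty.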
Alternatively, just divide the queues among a fixed number of threads:
public class QueueWorker implements Runnable {
    // should be unique and < NUM_THREADS:
    int threadId;

    QueueWorker(int threadId) {
        this.threadId = threadId;
    }

    @Override
    public void run() {
        // queues (a List of queues) and NUM_THREADS are assumed to be shared fields defined elsewhere
        int currentQueueIndex = threadId;
        while (true) {
            Queue currentQueue = queues.get(currentQueueIndex);
            // execute tasks until empty
            currentQueueIndex += NUM_THREADS;
            if (currentQueueIndex >= queues.size()) {
                currentQueueIndex = threadId; // wrap around to this thread's first queue
            }
        }
    }
}
I'm writing an application in Java which uses ExecutorService for running multiple threads.
I wish to submit multiple tasks (thousands at a time) to the executor as Callables and, when they are done, retrieve their results. My approach is that each time I call submit() I get a Future, which I store in an ArrayList. Later I pass the list to a thread that keeps iterating over it, calling future.get() with a timeout to see whether the task has completed. Is this the right approach, or is it too inefficient?
EDIT --- More info ---
Another problem is that each Callable takes different amount of processing time. So if I simply take the first element out of the List and call get() on it, it will block while results of others may become available and the program will not know. That is why I need to keep iterating with timeouts.
thanks in advance
Is this the right approach, or is it too inefficient?
This is not the correct approach per se. You are needlessly iterating over the ArrayList to check for task completion.
There is a simple alternative: use CompletionService. You can easily wrap your existing executor in it. From the JavaDocs:
Producers submit tasks for execution. Consumers take completed tasks
and process their results in the order they complete.
In essence, CompletionService provides a way to get the result back simply by calling take(). It is a blocking function and the caller will block until the results are available.
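A short sketch of how that could look with ExecutorCompletionService. The class name, the method name and the pool size of 4 are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class.
class CompletionDemo {
    /** Runs all tasks and returns their results in the order in which they finished. */
    static List<Integer> collectInCompletionOrder(List<Callable<Integer>> tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Wrap the existing executor; completed Futures are queued internally.
            CompletionService<Integer> completionService = new ExecutorCompletionService<>(pool);
            for (Callable<Integer> task : tasks) {
                completionService.submit(task);
            }
            List<Integer> results = new ArrayList<>();
            for (int i = 0; i < tasks.size(); i++) {
                // take() blocks until some task has finished, whichever it is,
                // so get() on the returned Future returns immediately.
                results.add(completionService.take().get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

No polling with timeouts is needed: a slow task at the front of the list no longer hides the results of faster tasks behind it.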
Note that the call to Future.get will block until the answer is available. So you don't need another thread to iterate over the array list.
See here https://blogs.oracle.com/CoreJavaTechTips/entry/get_netbeans_6