I have a Java program, and one section of it is compute-intensive, like this:
for i = 1 :512
COMPUTE INTENSIVE SECTION
end
I want to split it across multiple threads so that it runs faster.
The COMPUTE INTENSIVE SECTION is not order-dependent: running i=1 first or i=5 first gives the same result.
Can anybody give me a general guide on how to do this?
Thanks indeed!
Happy Thanksgiving!
You should read the Concurrency Trail of the Java Tutorial. Especially Executors and Thread Pools should be relevant for you.
Basically, you create a thread pool (which is an Executor) through one of the factory methods in the Executors class and submit Runnable instances to it:
for(int i = 0; i < 512; i++){
executor.execute(new Runnable(){public void run(){
// your heavy code goes here
}});
}
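A fuller version of that loop might look like the sketch below. It assumes the 512 iterations really are independent. heavyWork is a hypothetical stand-in for the compute-intensive section, and sizing the pool to the number of available processors is only a guess at a sensible default, not something taken from the question.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLongArray;

class ParallelLoop {
    // Hypothetical stand-in for the compute-intensive section.
    static long heavyWork(int i) {
        long acc = 0;
        for (int k = 0; k < 100_000; k++) {
            acc += ((long) i * k) % 7;
        }
        return acc;
    }

    // Runs heavyWork for i = 0..n-1 on a fixed pool and returns one result per index.
    static AtomicLongArray runAll(int n) throws InterruptedException {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        AtomicLongArray results = new AtomicLongArray(n); // safe to write from several threads
        for (int i = 0; i < n; i++) {
            final int index = i; // a lambda can only capture effectively final variables
            executor.execute(() -> results.set(index, heavyWork(index)));
        }
        executor.shutdown(); // no new tasks are accepted; submitted ones still run
        if (!executor.awaitTermination(10, TimeUnit.MINUTES)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLongArray results = runAll(512);
        System.out.println("computed " + results.length() + " results");
    }
}
```

The shutdown() followed by awaitTermination() pair is what makes the calling thread wait for all 512 tasks before it reads the results.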
Sounds like a thread pool would be good. Basically, you whip up a collection of N different threads, then request them in a loop. The request blocks until a thread is available.
ExecutorService pool = Executors.newFixedThreadPool(10); // 10 threads in the pool
List<Callable<Foo>> collectionOfCallables = new ArrayList<Callable<Foo>>();
for (...) {
Callable<Foo> callable = new Callable<Foo>() { public Foo call() { COMPUTE INTENSIVE SECTION } };
collectionOfCallables.add(callable);
}
List<Future<Foo>> results = pool.invokeAll(collectionOfCallables); // blocks until every task has completed
pool.shutdown(); // stop accepting new tasks
pool.awaitTermination(5, TimeUnit.MINUTES); // blocks till everything is done or 5 minutes have passed.
With the Futures you don't really need to await termination. Calling get() on a Future blocks until the corresponding task is done (or canceled).
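The point about get() can be seen in a small sketch. The square method is a hypothetical placeholder for the real computation, and the pool size of 4 is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class InvokeAllExample {
    // Hypothetical placeholder for the compute-intensive section.
    static int square(int i) {
        return i * i;
    }

    // Computes square(1) + ... + square(n), one task per value.
    static long sumOfSquares(int n) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final int value = i;
                tasks.add(() -> square(value));
            }
            long sum = 0;
            // invokeAll returns only after every task has completed,
            // so get() does not have to wait here.
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                sum += f.get(); // a failure inside a task surfaces as ExecutionException
            }
            return sum;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumOfSquares(512));
    }
}
```

No awaitTermination is needed to collect the results; shutdown() is only there so the pool's threads do not keep the JVM alive.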
Look at any Java multi-threading tutorial, either the official one:
http://download.oracle.com/javase/tutorial/essential/concurrency/index.html
or some of the others, e.g.:
Very nice in my opinion - http://www.ibm.com/developerworks/java/tutorials/j-threads/section2.html
Short one - http://www.tutorialspoint.com/java/java_multithreading.htm
Fairly succinct, and covers a bit more than the basics - http://www.vogella.de/articles/JavaConcurrency/article.html
Similar to Sean Patrick Floyd's answer, but a bit less verbose with a lambda expression:
ExecutorService es = Executors.newCachedThreadPool();
for(int i = 0; i < 512; i++){
es.execute(() -> {
// code goes here
});
}
If you can split your compute-intensive work recursively into smaller subtasks, ForkJoinPool is ideal for you.
If your server has an 8-core CPU, you can set the pool size to 8:
ForkJoinPool forkJoinPool = new ForkJoinPool(8);
OR
you can use a fixed thread pool ExecutorService by moving the compute-intensive task into a Runnable or Callable, as below
ExecutorService executorService = Executors.newFixedThreadPool(8);
Future<?> future = executorService.submit(new Runnable() {
public void run() {
System.out.println("Your compute intensive task");
}
});
future.get(); //returns null if the task has finished correctly.
There is one advantage with ForkJoinPool: work stealing. Each worker thread has its own task queue, and idle threads steal tasks from the queues of busy threads.
Java 8 added one more API to Executors: newWorkStealingPool
If you need to wait for completion of all tasks, you can use invokeAll() on ExecutorService.
Have a look at this article by Benjamin for advanced concurrent APIs using Java 8
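To make the recursive splitting concrete, here is a minimal sketch that sums an array with a RecursiveTask. The class name RangeSum and the THRESHOLD value are assumptions for illustration. Summing is only a stand-in for the real compute-intensive work.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class RangeSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // assumed cut-off; tune for real workloads
    private final long[] data;
    private final int from;
    private final int to; // exclusive

    RangeSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: do the work directly.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        RangeSum left = new RangeSum(data, from, mid);
        RangeSum right = new RangeSum(data, mid, to);
        left.fork();                      // schedule the left half asynchronously
        long rightSum = right.compute();  // work on the right half in this thread
        return left.join() + rightSum;    // wait for the left half and combine
    }

    static long parallelSum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool(); // parallelism defaults to the available processors
        try {
            return pool.invoke(new RangeSum(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println(parallelSum(data));
    }
}
```

Forking one half and computing the other in the current thread keeps that thread busy instead of having it only wait on join().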
Related
Is there a Java class such that:
1. Executable tasks can be added via an id, where all tasks with the same id are guaranteed to never run concurrently
2. The number of threads can be limited to a fixed amount
A naive solution of a Map would easily solve (1), but it would be difficult to manage (2). Similarly, all thread pooling classes that I know of will pull from a single queue, meaning (1) is not guaranteed.
Solutions involving external libraries are welcome.
For each id, you need a SerialExecutor, described in the documentation of java.util.concurrent.Executor. All serial executors delegate work to a ThreadPoolExecutor with given corePoolSize.
An optimized version of SerialExecutor can be found in my code samples.
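One way to wire that up is sketched below. This is not the optimized version mentioned above. The inner class follows the SerialExecutor example from the Executor Javadoc, while KeyedSerialExecutor and its execute(id, task) method are hypothetical names introduced here for illustration.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class KeyedSerialExecutor {
    // Same structure as the SerialExecutor example in the Executor Javadoc.
    private static final class SerialExecutor implements Executor {
        private final Queue<Runnable> tasks = new ArrayDeque<>();
        private final Executor delegate;
        private Runnable active;

        SerialExecutor(Executor delegate) {
            this.delegate = delegate;
        }

        @Override
        public synchronized void execute(Runnable r) {
            tasks.add(() -> {
                try {
                    r.run();
                } finally {
                    scheduleNext(); // hand the next queued task to the pool
                }
            });
            if (active == null) {
                scheduleNext();
            }
        }

        private synchronized void scheduleNext() {
            if ((active = tasks.poll()) != null) {
                delegate.execute(active);
            }
        }
    }

    private final Map<String, SerialExecutor> perId = new ConcurrentHashMap<>();
    private final Executor pool;

    KeyedSerialExecutor(Executor pool) {
        this.pool = pool;
    }

    // Tasks with the same id run one after another; different ids may run concurrently.
    void execute(String id, Runnable task) {
        perId.computeIfAbsent(id, k -> new SerialExecutor(pool)).execute(task);
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4); // caps the total thread count
        KeyedSerialExecutor executor = new KeyedSerialExecutor(pool);
        CountDownLatch done = new CountDownLatch(3);
        executor.execute("ID-1", () -> { System.out.println("ID-1 first"); done.countDown(); });
        executor.execute("ID-2", () -> { System.out.println("ID-2"); done.countDown(); });
        executor.execute("ID-1", () -> { System.out.println("ID-1 second"); done.countDown(); });
        // Queued tasks reach the pool only after their predecessor finishes,
        // so wait for them before shutting the pool down.
        done.await();
        pool.shutdown();
    }
}
```

Note that this sketch never removes entries from the map, so it only suits a bounded set of ids.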
If you don't find something that does this out of the box, it shouldn't be hard to roll your own. One thing you could do is to wrap each task in a simple class that reads from a queue that is unique per id, e.g.:
public static class SerialCaller<T> implements Callable<T> {
private final BlockingQueue<Callable<T>> delegates;
public SerialCaller(BlockingQueue<Callable<T>> delegates) {
this.delegates = delegates;
}
public T call() throws Exception {
return delegates.take().call();
}
}
It should be easy to maintain a map of ids to queues for submitting tasks. That satisfies condition (1), and then you can look for simple solutions to condition (2), such as Executors.newFixedThreadPool.
I think that the simplest solution is to just have a separate queue for each index and a separate executor (with one thread) for each queue.
The only thing you could achieve with a more complex solution would be to use fewer threads, but if the number of indexes is small and bounded that's probably not worth the effort.
Yes, there is such a library now: https://github.com/jano7/executor
int maxTasks = 10;
ExecutorService underlyingExecutor = Executors.newFixedThreadPool(maxTasks);
KeySequentialBoundedExecutor executor = new KeySequentialBoundedExecutor(maxTasks, underlyingExecutor);
Runnable task = new Runnable() {
@Override
public void run() {
// do something
}
};
executor.execute(new KeyRunnable<>("ID-1", task)); // execute the task by the underlying executor
executor.execute(new KeyRunnable<>("ID-2", task)); // execution is not blocked by the task for ID-1
executor.execute(new KeyRunnable<>("ID-1", task)); // execution starts when the previous task for ID-1 completes
I want to process a large number of independent lines in parallel. In the following code I'm creating a pool of NUM_THREAD threads, each holding POOL_SIZE lines.
Each thread is started, and I then wait for each thread using 'join'.
I guess this is bad practice, as a finished thread will have to wait for its siblings in the pool.
What would be the correct way to implement this code? Which classes should I use?
Thanks !
class FasterBin extends Thread
{
private List<String> dataRows=new ArrayList<String>();
private Object result=null;
@Override
public void run()
{
for(String s:dataRows)
{
//Process item here (....)
}
}
}
(...)
List<FasterBin> threads=new Vector<FasterBin>();
String line;
Iterator<String> iter=(...);
for(;;)
{
while(threads.size()< NUM_THREAD)
{
FasterBin bin=new FasterBin();
while(
bin.dataRows.size() < POOL_SIZE &&
iter.hasNext()
)
{
nRow++;
bin.dataRows.add(iter.next());
}
if(bin.dataRows.isEmpty()) break;
threads.add(bin);
}
if(threads.isEmpty()) break;
for(FasterBin t:threads)
{
t.start();
}
for(FasterBin t:threads)
{
t.join();
}
for(FasterBin t:threads)
{
save(t.result);// ## do something with the result (save into a db etc...)
}
threads.clear();
}
finally
{
while(!threads.isEmpty())
{
FasterBin b=threads.remove(threads.size()-1);
try {
b.interrupt();
}
catch (Exception e)
{
}
}
}
Do NOT do all this by yourself! It is extremely hard to get 1) robust and 2) right.
Instead, rewrite your code to create a lot of Runnables or Callables, and use a suitable ExecutorService to process them with the behaviour you want.
Note that this stays inside the current JVM. If you have more than one JVM available (on multiple machines) I would recommend opening a new question.
java.util.concurrent.ThreadPoolExecutor.
ThreadPoolExecutor x=new ScheduledThreadPoolExecutor(10);
x.execute(runnable);
See this for an overview: Java API for util.concurrent
Direct use of Threads is actually discouraged - look at the package java.util.concurrent, you'll find there ThreadPools and Futures which should be used instead.
Thread.join doesn't mean that the thread waits for others; it means your main thread waits for one of the threads in the list to die. In this case your main thread waits for the slowest worker thread to finish. I don't see a problem with this approach.
Yes, in some sense, a finished thread does have to wait for its siblings in the pool: when a thread finishes, it stops, and does not help the other threads finish sooner. Better said, the whole job waits for the thread that works the longest.
This is because each thread has exactly one task. It is better to create many tasks, far more than the number of threads, and put them all in a single queue. Let all worker threads take their tasks from that queue in a loop. Then the difference in finishing time between threads would be roughly the time to execute one task, which is small because the tasks are small.
You can start the pool of working threads yourself, or you can wrap each task in a Runnable and submit them to a standard thread pool - this makes no difference.
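The standard-pool variant could be sketched as follows, with one small task per line instead of one big batch per thread. LineProcessor and process are hypothetical names. Taking the line length is a placeholder for the real per-line work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class LineProcessor {
    // Hypothetical per-line work; replace with the real processing.
    static int process(String line) {
        return line.length();
    }

    // Submits one task per line; the pool's internal queue feeds whichever thread is free.
    static List<Integer> processAll(List<String> lines, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String line : lines) {
                Callable<Integer> task = () -> process(line);
                futures.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // results come back in input order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = Arrays.asList("alpha", "beta", "gamma", "delta");
        System.out.println(processAll(lines, 2));
    }
}
```

If a single line is very cheap to process, grouping a few dozen lines per task would reduce the scheduling overhead while keeping the tasks small.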
I'm using Spring's ThreadPoolTaskExecutor to execute my tasks.
I want to divide my tasks into several groups, where each group has a different maximum number of allowed threads.
For example, something like this:
for (MyTask myTask : myTaskList){
threadPoolTaskExecutor.setMaxThreadsForGroup(myTask.getGroupName(), myTask.getMaxThreads());
threadPoolTaskExecutor.execute(myTask, myTask.getGroupName());
}
Somehow the threadPoolTaskExecutor should know to allow only myTask.getMaxThreads() threads for each group named myTask.getGroupName(), and the total number of threads across all tasks should not exceed what is defined for threadPoolTaskExecutor in applicationContext.xml.
Is it possible to do this in a simple way?
Thanks
I can see two ways of doing this. The first (and simpler) way is to create a Map<String, ExecutorService> which maps your group names to a specific executor with a max thread limit. This doesn't satisfy your requirement to have a max number of threads overall, but I would argue that this requirement might be an unreasonable one since you can never have total control over the number of threads running in your Java application anyway.
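The first approach might be sketched like this. It uses plain java.util.concurrent rather than Spring's ThreadPoolTaskExecutor, and GroupedExecutors with its methods is a hypothetical helper, not an existing API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class GroupedExecutors {
    private final Map<String, ExecutorService> executors = new ConcurrentHashMap<>();

    // Runs the task on the group's own pool. The pool is created on first use,
    // so maxThreads only takes effect the first time a group name is seen.
    void execute(String groupName, int maxThreads, Runnable task) {
        executors
            .computeIfAbsent(groupName, g -> Executors.newFixedThreadPool(maxThreads))
            .execute(task);
    }

    // Lets already submitted tasks finish, then releases the threads of every group.
    void shutdownAll() {
        for (ExecutorService executor : executors.values()) {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        GroupedExecutors grouped = new GroupedExecutors();
        for (int i = 0; i < 6; i++) {
            final int n = i;
            grouped.execute("reports", 2, () -> System.out.println("report " + n));
            grouped.execute("emails", 1, () -> System.out.println("email " + n));
        }
        grouped.shutdownAll(); // queued tasks still run before the pools terminate
    }
}
```

As noted above, this caps each group separately but not the overall total, which is at most the sum of the group limits.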
The second way, while more complex, gives you more control. You can have a single executor with a max pool size, and instead of submitting your jobs to it directly, you submit worker tasks which take the real tasks off a BlockingQueue and process them until there are none left. The number of worker tasks you submit will be equal to the group thread limit. The pseudo-code might look like this:
ExecutorService executor = ...
int groupThreadLimit = 3;
final BlockingQueue<Runnable> groupTaskQueue = ...;
// Add all your tasks to the groupTaskQueue.
for(int i = 0; i < groupThreadLimit; i++) {
executor.execute(new Runnable() {
public void run() {
while(true) {
Runnable r = groupTaskQueue.poll(); // non-blocking; returns null when the queue is empty
if(r == null) {
return; // All tasks complete or being processed. Queue empty.
}
r.run();
}
}
});
}
The only slight drawback with this technique is that once the worker tasks start, they won't yield until all the subtasks are complete. This might be bad if you have a fair usage policy and want to avoid starvation.
I need to submit a number of tasks and then wait until all of their results are available. Each task adds a String to a Vector (which is synchronized by default). Then I need to start a new task for each result in the Vector, but only once all the previous tasks have finished.
I want to use a Java Executor; in particular I tried Executors.newFixedThreadPool(100) in order to use a fixed number of threads (I have a variable number of tasks, anywhere from 10 to 500), but I'm new to executors and I don't know how to wait for task termination.
This is something like a pseudocode of what my program needs to do:
ExecutorService e = Executors.newFixedThreadPool(100);
while(true){
/*do something*/
for(...){
<start task>
}
<wait for all task termination>
for each String in result{
<start task>
}
<wait for all task termination>
}
I can't call e.shutdown() because I'm in a while(true) loop and I need to reuse the ExecutorService...
Can you help me? Can you suggest me a guide/book about java executors?
The ExecutorService gives you a mechanism to execute multiple tasks simultaneously and get a collection of Future objects back (representing the asynchronous computation of the task).
Collection<Callable<Object>> tasks = new LinkedList<Callable<Object>>();
//populate tasks
for (Future<Object> f : executorService.invokeAll(tasks)) { //invokeAll() blocks until ALL tasks in the collection complete
f.get();
}
If you have Runnables instead of Callables, you can easily turn a Runnable into a Callable<Object> using the method:
Callable<?> c = Executors.callable(runnable);
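Applied to the two-phase loop from the question, this could look like the sketch below. The pool is never shut down between phases, because invokeAll itself does the waiting. The TwoPhase class and the upper-casing and length tasks are made-up placeholders for the real work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class TwoPhase {
    // Runs both phases on the given pool and leaves the pool usable afterwards.
    static List<Integer> run(ExecutorService pool, List<String> inputs)
            throws InterruptedException, ExecutionException {
        // Phase 1: each task produces one String.
        List<Callable<String>> first = new ArrayList<>();
        for (String input : inputs) {
            first.add(() -> input.toUpperCase());
        }
        List<String> intermediate = new ArrayList<>();
        for (Future<String> f : pool.invokeAll(first)) { // returns once all phase-1 tasks are done
            intermediate.add(f.get());
        }

        // Phase 2: starts only after every phase-1 result is available.
        List<Callable<Integer>> second = new ArrayList<>();
        for (String s : intermediate) {
            second.add(() -> s.length());
        }
        List<Integer> results = new ArrayList<>();
        for (Future<Integer> f : pool.invokeAll(second)) {
            results.add(f.get());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // The same pool serves several rounds, as in the while(true) loop.
            for (int round = 0; round < 3; round++) {
                System.out.println(run(pool, Arrays.asList("ab", "cde", "f")));
            }
        } finally {
            pool.shutdown(); // only once, when the program is really finished
        }
    }
}
```

Collecting the phase-1 results from the Futures also removes the need for the shared Vector.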
Can you suggest me a guide/book about java executors??
I can answer this part:
Java Concurrency in Practice by Brian Goetz (with Tim Peierls, Joshua Bloch, Joseph Bowbeer, David Holmes and Doug Lea) is most likely your best bet.
It's not only about executors though, but instead covers java.util.concurrent package in general, as well as basic concurrency concepts and techniques, and some advanced topics such as the Java memory model.
Rather than submitting Runnables or Callables to an Executor directly and storing the corresponding Future return values I'd recommend using a CompletionService implementation to retrieve each Future when it completes. This approach decouples the production of tasks from the consumption of completed tasks, allowing for example new tasks to originate on a producer thread over a period of time.
Collection<Callable<Result>> workItems = ...
ExecutorService executor = Executors.newSingleThreadExecutor();
CompletionService<Result> compService = new ExecutorCompletionService<Result>(executor);
// Add work items to Executor.
for (Callable<Result> workItem : workItems) {
compService.submit(workItem);
}
// Consume results as they complete (this would typically occur on a different thread).
for (int i=0; i<workItems.size(); ++i) {
Future<Result> fut = compService.take(); // Will block until a result is available.
Result result = fut.get(); // Extract result; this will not block.
}
When you submit to an executor service, you'll get a Future object back.
Store those objects in a collection, and then call get() on each in turn. get() blocks until the underlying job completes, and so the result is that calling get() on each will complete once all underlying jobs have finished.
e.g.
Collection<Future> futures = ...
for (Future f : futures) {
Object result = f.get();
// maybe do something with the result. This could be a
// genericised Future<T>
}
System.out.println("Tasks completed");
Once all these have completed, then begin your second submission. Note that this might not be an optimal use of your thread pool, since it will become dormant, and then you're re-populating it. If possible try and keep it busy doing stuff.
ExecutorService executor = ...
//submit tasks
executor.shutdown(); // previously submitted tasks are executed,
// but no new tasks will be accepted
while(!executor.awaitTermination(1, TimeUnit.SECONDS))
;
There's no easy way to do what you want without creating a custom ExecutorService.
This is how I normally iterate over a collection:
for(Iterator iterator = collectionthing.iterator(); iterator.hasNext();){
I believe most of us do this. I wonder, is there a better approach than iterating sequentially? Is there a Java library that can run this in parallel on a multi-core CPU? =)
Looking forward to feedback from you all.
Java's multithreading is quite low level in this respect. The best you could do is something like this:
ExecutorService executor = Executors.newFixedThreadPool(10);
for (final Object item : collectionThingy) {
executor.submit(new Runnable() {
@Override
public void run() {
// do stuff with item
}
});
}
executor.shutdown();
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
This is Java 6 code. If running on Java 5, drop the @Override annotation (in Java 5 it cannot be applied to methods that implement an interface method, but in Java 6 it can).
What this does is create a task for each item in the collection. A thread pool (size 10) is created to run those tasks. You can replace that with anything you want. Lastly, the thread pool is shut down and the code blocks until all the tasks have finished.
The last call, awaitTermination, throws InterruptedException, which you will need to catch or declare. ExecutionException only comes into play if you call get() on the Futures returned by submit.
In most cases, the added complexity wouldn't be worth the potential performance gain. However, if you needed to process a Collection in multiple threads, you could possibly use Executors to do this, which would run all the tasks in a pool of threads:
int numThreads = 4;
ExecutorService threadExecutor = Executors.newFixedThreadPool(numThreads);
for(Iterator iterator = collectionthing.iterator(); iterator.hasNext();){
Runnable runnable = new CollectionThingProcessor(iterator.next());
threadExecutor.execute(runnable);
}
As part of the fork-join framework JDK7 should (although not certain) have parallel arrays. This is designed to allow efficient implementation of certain operations across arrays on many-core machines. But just cutting the array into pieces and throwing it at a thread pool will also work.
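The "cut the array into pieces" approach could be sketched as follows. ChunkedSum is a hypothetical name, summing stands in for whatever per-element work is needed, and using one thread per chunk is an assumption made for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkedSum {
    // Splits data into at most `chunks` contiguous pieces and sums each piece on its own thread.
    // `chunks` must be at least 1.
    static long sum(int[] data, int chunks) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            int chunkSize = (data.length + chunks - 1) / chunks; // ceiling division
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int start = 0; start < data.length; start += chunkSize) {
                final int from = start;
                final int to = Math.min(start + chunkSize, data.length);
                tasks.add(() -> {
                    long partial = 0;
                    for (int i = from; i < to; i++) {
                        partial += data[i];
                    }
                    return partial;
                });
            }
            long total = 0;
            for (Future<Long> f : pool.invokeAll(tasks)) { // waits for all chunks
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[1_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data, 4));
    }
}
```

Each task only reads its own slice of the array, so no locking is needed.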
Sorry, Java does not have this sort of language-level support for automatic parallelism. If you want it, you will have to implement it yourself using libraries and threads.