I have this "ugly" Java code that I need to convert to idiomatic Kotlin coroutines, and I can't quite figure out how.
Thread[] pool = new Thread[2 * Runtime.getRuntime().availableProcessors()];
for (int i = 0; i < pool.length; i++)
    pool[i] = new Thread() {
        public void run() {
            int y;
            while ((y = yCt.getAndIncrement()) < out.length) putLine(y, out[y]);
        }
    };
for (Thread t : pool) t.start();
for (Thread t : pool) t.join();
I think it is possible to implement this using runBlocking, but how do I deal with the availableProcessors count?
I'll make some assumptions here:
putLine() is a CPU-intensive operation, not an I/O one. I assume this because it is executed with 2 * CPU cores threads, a count usually used for CPU-intensive tasks.
We just need to execute putLine() for each item in out. From the above code it is not clear whether e.g. yCt is initially 0.
out isn't huge, e.g. millions of items.
You aren't looking for a 1:1 translation of the code to Kotlin, but rather for its idiomatic equivalent.
Then the solution is really simple:
coroutineScope {
    out.forEachIndexed { index, item ->
        launch(Dispatchers.Default) { putLine(index, item) }
    }
}
A few words of explanation:
The Dispatchers.Default coroutine dispatcher is meant specifically for CPU-bound computations, and its number of threads depends on the number of CPU cores. We don't need to create our own threads, because coroutines provide a suitable thread pool.
We don't handle a queue of tasks manually, because coroutines are lightweight and we can simply schedule a separate coroutine per item - they will be queued automatically.
coroutineScope() waits for its children, so we don't need to wait for all the asynchronous tasks manually. Any code placed below coroutineScope() will execute only after all tasks have finished.
There are some differences in behavior between the Java/threads and Kotlin/coroutines code:
Dispatchers.Default uses a number of threads equal to the number of CPU cores by default, not 2 * CPU cores.
In the coroutines solution, if any task fails, the whole operation throws an exception. In the original code, errors are ignored and the application continues in a potentially inconsistent state.
In the coroutines solution the thread pool is shared with other components of the application. This may or may not be the desired behavior.
Related
I have a simple method which is multithreaded as follows:
int processors = Runtime.getRuntime().availableProcessors();
detectors[] theCores = new detectors[processors];
for (int i = 0; i < processors; i++) {
    theCores[i] = new detectors();
    theCores[i].start();
}
for (int i = 0; i < processors; i++) {
    try { // Waits for completion of all cores
        theCores[i].join();
    } catch (InterruptedException IntExp) {}
}
Using .join() means that finished threads will pause until all threads have completed. Because the threads are not identical in speed, some finish before others, and there is sometimes quite a significant gap between the time the fastest thread takes to finish the method and the time the slowest one takes. Is there any way to assign the cores that are finished and waiting to help out the remaining threads in their execution?
Read about Fork/Join in Java 7
The fork/join framework is distinct because it uses a work-stealing algorithm. Worker threads that run out of things to do can steal tasks from other threads that are still busy.
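For a feel of how that looks in code, here is a minimal sketch (the SumTask class, the THRESHOLD value, and the sample data are inventions for this illustration): the work is split recursively until the chunks are small, and a forked half sits in the worker's deque where an idle worker can steal it.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1000; // arbitrary cutoff for direct computation
    private final long[] data;
    private final int from, to;

    public SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) { // small enough: compute directly
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                          // queue the left half; a free worker may steal it
        return right.compute() + left.join(); // compute the right half here, then combine
    }

    public static void main(String[] args) {
        long[] data = new long[1000000];
        java.util.Arrays.fill(data, 1L);
        long total = new ForkJoinPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(total); // prints 1000000
    }
}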
You can let the OS do this for you: instead of scheduling as many threads as there are processors/cores available, start slightly more threads. If one thread finishes, the OS will give the remaining threads all the remaining CPU power. Depending on the type of work, a load factor of 2 (threads = 2 * cores) can be a good choice. This should give you 100% CPU load most of the time, whereas with a number of threads lower than the number of cores, the CPU load is often less than that, especially if the threads are memory- or I/O-intensive. In that case you should increase the factor even more.
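As a concrete sketch of that load-factor-2 setup with a standard executor (the OversubscribedPool class, the runAll helper, and the one-hour timeout are inventions for this example):

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class OversubscribedPool {
    // 2 * cores worker threads; the OS keeps the cores busy even when
    // some workers block or finish early.
    public static void runAll(List<Runnable> tasks) throws InterruptedException {
        int threads = 2 * Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (Runnable task : tasks) {
            pool.execute(task); // tasks beyond 2 * cores wait in the pool's queue
        }
        pool.shutdown();                          // stop accepting new tasks
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for everything to finish
    }
}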
"to help out the remaining threads in their execution", the work each thread does should be split in lesser parts (usually named tasks). Otherwise it is impossible to parallelize monolitic job coded as run() method. All the tasks are put in a common queue and each worker thread then takes next task from it. Such a construction is called thread pool and Java runtime library has several implementations for it: http://docs.oracle.com/javase/tutorial/essential/concurrency/pools.html
In case tasks have mutual data dependencies, more sophisticated facilities should be used: a fork/join pool (mentioned by endriu_l), CompletableFuture from Java 8, or dataflow and actor libraries (they are numerous and easy to find; I will only mention my own df4j2).
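To give one concrete flavor of the CompletableFuture option, here is a minimal sketch (the stage bodies are placeholders invented for this example): the second stage is scheduled automatically once the first stage's result is available, so the dependency is expressed without any manual coordination.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class DependentTasks {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        CompletableFuture<String> result =
                CompletableFuture.supplyAsync(() -> "raw data")  // first task (placeholder)
                        .thenApplyAsync(s -> s.toUpperCase());   // runs only after the first completes
        System.out.println(result.get()); // prints RAW DATA
    }
}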
I have a long-running task which consists of 2 parts. The first part is an intensive I/O operation (and almost no CPU); the second part is an intensive CPU operation. I would like 2 threads running this task so that the CPU part of the task in one thread overlaps with the I/O part of the task running in the other thread. In other words, I would like to run the CPU-intensive part in thread #1 while thread #2 runs the I/O operation, and vice versa, so I utilize maximum CPU and I/O.
Is there some generic solution in Java for more than 2 threads?
Make a class which extends Thread. Now make two objects of that class and handle the I/O part and the CPU part in two separate functions.
Take a look at the Executors class:

// Create a thread pool
ExecutorService executorService = Executors.newFixedThreadPool(10);

// Submit a task to the pool
executorService.execute(new Runnable() {
    public void run() {
        System.out.println("Asynchronous task");
    }
});

// Initiate an orderly shutdown once the submitted tasks complete
executorService.shutdown();
This may help you:
Task Execution and Scheduling
I do not think that is possible. The scheduling of threads is fundamentally handled by the operating system. The OS decides which thread is run on which logical CPU. On the level of the application you can only give some hints to the OS scheduler, like priority, but you cannot force a certain scheduling.
It might be possible with languages like C or C++ by invoking OS-specific APIs, but at the abstraction level of Java you cannot force that behaviour.
Splitting the work into 2 threads is an artificial constraint, and like any artificial constraint, it can only limit the level of parallelism. If two parts are logically sequential (e.g. the I/O work must precede the CPU-intensive work in order to provide data), then they should be executed sequentially on the same thread. If you have several independent tasks, they should be executed on different threads. Problems may arise if you have thousands of threads and they eat too much memory. Then you have to split your work into tasks and run those tasks on a thread pool (executor service); a sketch of that follows below. This is a more complicated approach, as you may need to coordinate the starting of your tasks, and there are no standard means to do so. One solution for coordinating small tasks is the actor execution model, but it is impossible to say beforehand whether the actor model fits your needs.
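Here is a sketch of that task-per-item approach (the PerItemTasks class and the readFromDisk/crunch helpers are placeholders standing in for the question's I/O and CPU parts): each item's two parts stay sequential on one thread, while the pool overlaps different items, which gives the CPU/I-O overlap the question asks for without any explicit pairing of threads.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PerItemTasks {
    public static void process(Iterable<String> items) {
        // Slightly more threads than cores, since each task spends part of its time on I/O
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors() + 1);
        for (final String item : items) {
            pool.submit(new Runnable() {
                public void run() {
                    byte[] data = readFromDisk(item); // I/O part (placeholder)
                    crunch(data);                     // CPU part (placeholder)
                }
            });
        }
        pool.shutdown();
    }

    private static byte[] readFromDisk(String item) { return item.getBytes(); }
    private static void crunch(byte[] data) { /* CPU-intensive work here */ }
}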
If I have 2 CPUs and schedule 1000 tasks for the fork / join framework to work on, will the tasks be executed in a maximum of 2 at a time, or will more tasks be executed in parallel on the same CPU? (say, maybe one task is waiting for I/O, in which case the CPU would become idle and another thread could run)
If you do not include any restriction yourself, none will be applied and Java will fork as many threads as it can (maybe all 1000, depending on system restrictions). This is not ideal. If you're doing a computation which is likely to have some I/O time but not be I/O bound even at large amounts of concurrent processing, you might be able to justify running one more thread than the available number of CPUs. Running all 1000 at once would not be wise.
If I have 2 CPUs and schedule 1000 tasks for the fork / join framework to work on, will the tasks be executed in a maximum of 2 at a time, or will more tasks be executed in parallel on the same CPU?
If you have a dual core CPU, you can only actually execute 2 threads at once.
According to the ForkJoin documentation:
A ForkJoinPool is constructed with a given target parallelism level; by default, equal to the number of available processors. The pool attempts to maintain enough active (or available) threads by dynamically adding, suspending, or resuming internal worker threads, even if some tasks are stalled waiting to join others. However, no such adjustments are guaranteed in the face of blocked IO or other unmanaged synchronization.
So it will probably run them two at a time on your 2 CPUs, possibly four at a time if the CPUs are hyperthreaded (I'm not certain). If you aren't happy with the default level of parallelism, you can request a specific level by calling the ForkJoinPool constructor that takes the parallelism level as a parameter.
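For reference, a minimal sketch of the two constructions (the 2 * cores value here is just an illustration, not a recommendation from the docs):

import java.util.concurrent.ForkJoinPool;

// Default: parallelism equals Runtime.getRuntime().availableProcessors()
ForkJoinPool defaultPool = new ForkJoinPool();

// Explicit parallelism, e.g. 2 * cores if the tasks occasionally block
ForkJoinPool widerPool =
        new ForkJoinPool(2 * Runtime.getRuntime().availableProcessors());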
Is hyperthreading enabled on the CPU? If so, you may run 2+ processes at the same time.
Hyper-threading works by duplicating certain sections of the processor—those that store the architectural state—but not duplicating the main execution resources. This allows a hyper-threading processor to appear as two "logical" processors to the host operating system, allowing the operating system to schedule two threads or processes simultaneously.
Hyper-threading on Wikipedia
I made a test to verify this:
import java.util.concurrent.*;

public class Test {
    private static class TestAction extends RecursiveAction {
        private final int i;

        public TestAction(int i) {
            this.i = i;
        }

        protected void compute() {
            if (i == 0) {
                // Root task: fan out six child tasks and return
                invokeAll(new TestAction(1), new TestAction(2), new TestAction(3),
                          new TestAction(4), new TestAction(5), new TestAction(6));
                return;
            }
            System.out.println(i + " start");
            try { Thread.sleep(2000); } catch (Exception e) { }
            System.out.println(i + " end");
        }
    }

    public static void main(String[] args) {
        new ForkJoinPool().invoke(new TestAction(0));
    }
}
The results of that running with the reference Oracle implementation is:
1 start
6 start <- wait 2 seconds
1 end
2 start
6 end
5 start <- wait 2 seconds
2 end
3 start
5 end
4 start <- wait 2 seconds
4 end
3 end
The same behavior is consistent on both Linux and Mac OS X.
So the answer to the question is: yes, the tasks will be executed on exactly the number of CPUs specified by the parallelism parameter (or the total available CPUs by default). If CPU time becomes available and the tasks simply block waiting for something, then the framework will do nothing automatically to run other tasks.
Since the documentation I've seen so far is pretty vague about what exactly the framework is supposed to do if the CPU is free, this could be an implementation detail.
By default, the Fork/Join framework tries to maintain the number of threads equal to one less than the number of cores (on a single-core machine, one thread is still created). You can see this code in the makeCommonPool method of the ForkJoinPool class.
If you think that this under-utilises your CPU, you can provide a custom value for parallelism.
But most interestingly, there is a way to make ForkJoinPool create more threads when the current thread occupying the CPU blocks on I/O. All you have to do is implement the code that actually blocks on I/O inside the block method of a ForkJoinPool.ManagedBlocker implementation, and pass that ManagedBlocker object to the static managedBlock method of the ForkJoinPool class. When this is done, ForkJoinPool checks whether the current thread calling the method is an instance of ForkJoinWorkerThread. If it is, the ForkJoinPool compensates by creating new threads which can take over the CPU.
ForkJoinPool fjp = ForkJoinPool.commonPool();
Runnable task = new Runnable() {
    public void run() {
        // Some CPU-intensive code
        try {
            ForkJoinPool.managedBlock(new ForkJoinPool.ManagedBlocker() {
                public boolean isReleasable() {
                    // Return true if blocking is unnecessary (or no longer necessary)
                    return false;
                }
                public boolean block() throws InterruptedException {
                    // Do the I/O operation here.
                    // Return true if all blocking code has finished execution;
                    // return false if more blocking code is yet to execute.
                    return true;
                }
            });
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Some more CPU-intensive code here
    }
};
fjp.submit(task);
I have a for loop where the computation at iteration i does not depend on the computations done in previous iterations.
I want to parallelize the for loop (my code is in Java) so that the computation of multiple iterations can run concurrently on multiple processors. Should I create a thread for the computation of each iteration, i.e. should the number of threads created equal the number of iterations (the number of iterations in the for loop is large)? How do I do this?
Here's a small example that you might find helpful to get started with parallelization. It assumes that:
You create an Input object that contains the input for each iteration of your computation.
You create an Output object that contains the output from computing the input of each iteration.
You want to pass in a list of inputs and get back a list of outputs all at once.
Your input is a reasonable chunk of work to do, so overhead isn't too high.
If your computation is really simple then you'll probably want to consider processing the items in batches; you could do that by putting, say, 100 of them in each Input. The example uses as many threads as there are processors in your system. If you're dealing with purely CPU-intensive tasks then that's probably the number you want. You'd want to go higher if the tasks are blocked waiting for something else (disk, network, database, etc.).
public List<Output> processInputs(List<Input> inputs)
        throws InterruptedException, ExecutionException {

    int threads = Runtime.getRuntime().availableProcessors();
    ExecutorService service = Executors.newFixedThreadPool(threads);

    List<Future<Output>> futures = new ArrayList<Future<Output>>();
    for (final Input input : inputs) {
        Callable<Output> callable = new Callable<Output>() {
            public Output call() throws Exception {
                Output output = new Output();
                // process your input here and compute the output
                return output;
            }
        };
        futures.add(service.submit(callable));
    }

    service.shutdown();

    List<Output> outputs = new ArrayList<Output>();
    for (Future<Output> future : futures) {
        outputs.add(future.get());
    }
    return outputs;
}
You should not do the thread handling manually. Instead:
Create a reasonably-sized thread pool executor service (if your computations do no I/O, use as many threads as you have cores).
Run a loop that submits each individual computation to the executor service and keeps the resulting Future objects. Note that if each computation consists of only a small amount of work, this will create a lot of overhead and possibly even be slower than a single-threaded program. In that case, submit jobs that do packets of computation, as mdma suggests.
Run a second loop that collects the results from all the Futures (it will implicitly wait until all computations have finished).
Shut down the executor service.
No, you should not create one thread for each iteration. The optimum number of threads is related to the number of processors available - too many threads, and you waste too much time context switching for no added performance.
If you're not totally attached to Java, you might want to try a parallel high-performance C system like OpenMPI. OpenMPI is suitable for this kind of problem.
Don't create the threads yourself. I recommend you use the fork/join framework (jsr166y) and create tasks that iterate over a given range of items. It will take care of the thread management for you, using as many threads as the hardware supports.
Task granularity is the main issue here. If each iteration is a relatively small computation (say, fewer than 100 operations), then executing each iteration as a separate task will introduce a lot of task-scheduling overhead. It's better to have each task accept a List of arguments to compute and return the results as a list, as sketched below. That way each task can compute 1, 10, or thousands of elements, keeping task granularity at a reasonable level that balances keeping work available against task management overhead.
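A sketch of that batching idea using a plain executor for brevity (the BatchedCompute class, the batchSize parameter, and the squaring computation are invented for this example; the same shape works with fork/join tasks):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchedCompute {
    public static List<Integer> computeAll(List<Integer> items, int batchSize)
            throws InterruptedException, ExecutionException {
        ExecutorService service = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        List<Future<List<Integer>>> futures = new ArrayList<Future<List<Integer>>>();
        for (int start = 0; start < items.size(); start += batchSize) {
            // Each task gets a whole batch, so scheduling overhead is amortized
            final List<Integer> batch =
                    items.subList(start, Math.min(start + batchSize, items.size()));
            futures.add(service.submit(new Callable<List<Integer>>() {
                public List<Integer> call() {
                    List<Integer> results = new ArrayList<Integer>();
                    for (int x : batch) {
                        results.add(x * x); // placeholder per-item computation
                    }
                    return results;
                }
            }));
        }
        service.shutdown();
        List<Integer> all = new ArrayList<Integer>();
        for (Future<List<Integer>> f : futures) {
            all.addAll(f.get()); // blocks until that batch is done
        }
        return all;
    }
}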
There is also a ParallelArray class in jsr166z, that allows repeated computation over an array. That may work for you, if the values you are computing are primitive types.
I am writing an optimization algorithm which creates about 100 threads. Currently, I start them all at once (in a for loop) and after that I tell every thread to join().
My problem is that each thread uses too much memory, so a heap space exception won't take long. I want some kind of scheduling but don't know how to implement it.
I have something like this in mind: start 10 threads, and every time one of them finishes, start a new one, so that there are always 10 threads running at a time until none are left.
Does someone have an idea or know how to realize something like this?
Thank you very much and regards from Cologne
Marco
Use a ThreadPoolExecutor with an appropriate maximum pool size.
Here's an example to get you started. First, what you'll need to import:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
And then what you need to put inside your method:
ExecutorService pool = Executors.newFixedThreadPool(10);
for (final Task task : tasks) {
    pool.execute(new Runnable() {
        @Override
        public void run() {
            task.execute();
        }
    });
}
pool.shutdown();
// awaitTermination throws InterruptedException, so the enclosing
// method needs to declare or handle it
while (!pool.awaitTermination(1, TimeUnit.SECONDS)) {
    System.out.println("Waiting for tasks to shutdown");
}
Some notes about the above:
You'll need to implement your own Task class that actually implements your algorithm.
The Task class doesn't have to have an execute method specifically (in fact, if it had a run() signature, you could just have your task implement Runnable and avoid the anonymous inner class).
You'll need to make sure that everything you use is properly synchronised. The classes in java.util.concurrent.atomic are quite good if you have shared state you need to update (e.g. if you want a counter for how many tasks you've processed).
You typically only want as many threads executing as there are cores/CPUs on your machine. Performance often goes up when the number of threads goes down. Normally you only use more threads if your tasks spend a lot of time blocked.
Instead of starting a new Thread for each new task, you are much better off to:
have a queue of tasks to execute (instead of threads to run);
use a smaller pool of threads (as mentioned by Michael) to process these tasks.
The difference in speed and memory is huge, because you don't have to start and stop a thread for each task.
The package java.util.concurrent explains everything about this.
A book would be easier to read though :-(
Consider the number of cores in the machine you will be using. Performance will be best if the number of threads you normally have running equals the number of cores. As KLE says, use a thread pool.