Is fork / join multithreaded? - java

If I have 2 CPUs and schedule 1000 tasks for the fork / join framework to work on, will the tasks be executed in a maximum of 2 at a time, or will more tasks be executed in parallel on the same CPU? (say, maybe one task is waiting for I/O, in which case the CPU would become idle and another thread could run)

If you do not impose any restriction yourself, none will be applied and Java will fork as many threads as it can (maybe all 1000, depending on system restrictions). This is not ideal. If you're doing a computation that is likely to have some IO time but not be IO bound even at large amounts of concurrency, you might be able to justify running one more thread than the number of available CPUs. Running all 1000 at once would not be wise.
If I have 2 CPUs and schedule 1000 tasks for the fork / join framework to work on, will the tasks be executed in a maximum of 2 at a time, or will more tasks be executed in parallel on the same CPU?
If you have a dual core CPU, you can only actually execute 2 threads at once.

According to the ForkJoinPool documentation:
A ForkJoinPool is constructed with a given target parallelism level; by default, equal to the number of available processors. The pool attempts to maintain enough active (or available) threads by dynamically adding, suspending, or resuming internal worker threads, even if some tasks are stalled waiting to join others. However, no such adjustments are guaranteed in the face of blocked IO or other unmanaged synchronization.
So it will probably run them two at a time on your 2 CPUs, possibly four at a time if the CPUs are hyper-threaded (I'm not certain). If you aren't happy with the default level of parallelism, you can request a specific level by calling the ForkJoinPool constructor that takes the parallelism as a parameter.
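A minimal sketch of both options (the parallelism of 4 is just an example):
import java.util.concurrent.ForkJoinPool;

public class ParallelismDemo {
    public static void main(String[] args) {
        // Default: parallelism == Runtime.getRuntime().availableProcessors()
        ForkJoinPool defaultPool = new ForkJoinPool();
        // Explicit: at most 4 tasks are computed at the same time
        ForkJoinPool fourWide = new ForkJoinPool(4);
        System.out.println(defaultPool.getParallelism());
        System.out.println(fourWide.getParallelism());
    }
}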

Is hyper-threading enabled on the CPU? If so, you may run more than 2 threads at the same time.
Hyper-threading works by duplicating certain sections of the processor—those that store the architectural state—but not duplicating the main execution resources. This allows a hyper-threading processor to appear as two "logical" processors to the host operating system, allowing the operating system to schedule two threads or processes simultaneously.
Hyper-threading - Wikipedia

I made a test to verify this:
import java.util.concurrent.*;

public class Test {
    private static class TestAction extends RecursiveAction {
        private int i;

        public TestAction(int i) {
            this.i = i;
        }

        protected void compute() {
            if (i == 0) {
                invokeAll(new TestAction(1), new TestAction(2), new TestAction(3),
                          new TestAction(4), new TestAction(5), new TestAction(6));
                return;
            }
            System.out.println(i + " start");
            try { Thread.sleep(2000); } catch (Exception e) { }
            System.out.println(i + " end");
        }
    }

    public static void main(String[] args) {
        new ForkJoinPool().invoke(new TestAction(0));
    }
}
The result of running this with the reference Oracle implementation is:
1 start
6 start <- wait 2 seconds
1 end
2 start
6 end
5 start <- wait 2 seconds
2 end
3 start
5 end
4 start <- wait 2 seconds
4 end
3 end
The same behavior is consistent on both Linux and Mac OS X.
So the answer to the question is: yes, the tasks will be executed on exactly as many threads as the parallelism parameter specifies (or all available CPUs by default). If CPU time becomes available because the tasks simply block waiting for something, the framework will do nothing automatically to run other tasks.
Since the documentation I've seen so far is pretty vague about what exactly the framework is supposed to do if the CPU is free, this could be an implementation detail.

By default, the Fork/Join framework tries to maintain the number of threads equal to one less than the number of cores (on a single-core machine, one thread is still created). You can see this code in the makeCommonPool method of the ForkJoinPool class.
If you think that this under-utilises your CPU, you can provide a custom value for parallelism.
But most interestingly, there is a way to make a ForkJoinPool create more threads when the thread currently occupying the CPU blocks on IO. All you have to do is put the code that actually blocks on IO inside the block method of a ForkJoinPool.ManagedBlocker implementation, and pass that ManagedBlocker object to the static managedBlock method of the ForkJoinPool class. When this is done, the ForkJoinPool checks whether the current thread calling the method is an instance of ForkJoinWorkerThread. If it is, the ForkJoinPool compensates by creating new threads which can take over the CPU.
ForkJoinPool fjp = ForkJoinPool.commonPool();
Runnable task = new Runnable() {
    public void run() {
        // Some CPU-intensive code
        try {
            ForkJoinPool.managedBlock(new ForkJoinPool.ManagedBlocker() {
                public boolean isReleasable() {
                    // Return true if no blocking is necessary (or it has already finished)
                    return false;
                }

                public boolean block() throws InterruptedException {
                    // Do the IO operation here.
                    // Return true if all blocking code has finished execution;
                    // return false if more blocking code is yet to execute.
                    return true;
                }
            });
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Some more CPU-intensive code here
    }
};
fjp.submit(task);

Related

How to convert Java Threads to Kotlin coroutines?

I have this "ugly" Java code that I need to convert to idiomatic Kotlin coroutines and I can't quite figure out how.
Thread[] pool = new Thread[2 * Runtime.getRuntime().availableProcessors()];
for (int i = 0; i < pool.length; i++)
    pool[i] = new Thread() {
        public void run() {
            int y;
            while ((y = yCt.getAndIncrement()) < out.length)
                putLine(y, out[y]);
        }
    };
for (Thread t : pool) t.start();
for (Thread t : pool) t.join();
I think it is possible to implement using runBlocking, but how do I deal with the availableProcessors count?
I'll make some assumptions here:
putLine() is a CPU-intensive operation, not an IO one. I assume this because it is executed using 2 * CPU cores threads, a count usually used for CPU-intensive tasks.
We just need to execute putLine() for each item in out. From the above code it is not clear if e.g. yCt is initially 0.
out isn't huge, e.g. millions of items.
You aren't looking for a 1:1 translation of the code to Kotlin, but rather its equivalent.
Then the solution is really very easy:
coroutineScope {
    out.forEachIndexed { index, item ->
        launch(Dispatchers.Default) { putLine(index, item) }
    }
}
A few words of explanation:
The Dispatchers.Default coroutine dispatcher is meant specifically for CPU-bound calculations, and its number of threads depends on the number of CPU cores. We don't need to create our own threads, because coroutines provide a suitable thread pool.
We don't handle a queue of tasks manually, because coroutines are lightweight and we can instead just schedule a separate coroutine per each item - they will be queued automatically.
coroutineScope() waits for its children, so we don't need to also manually wait for all asynchronous tasks. Any code put below coroutineScope() will be executed when all tasks finish.
There are some differences in behavior between the Java/threads and Kotlin/coroutines code:
Dispatchers.Default by default has the number of threads = CPU cores, not 2 * CPU cores.
In the coroutines solution, if any task fails, the whole operation throws an exception. In the original code, errors are ignored and the application continues in an inconsistent state.
In the coroutines solution, the thread pool is shared with other components of the application. This could be desired behavior or not.

Thread swaps in and out automatically without any yield or sleep

I thought a thread only gives up its domination (control?) when we write something like yield or sleep inside the run method. The output I expected to see from the following code is something like:
#1(5), #1(4), #1(3), #1(2), #1(1), #2(5), #2(4), #2(3), #2(2), #2(1), #3(5), #3(4), #3(3), #3(2), #3(1), #4(5), #4(4), #4(3), #4(2), #4(1), #5(5), #5(4), #5(3), #5(2), #5(1)
however, it turns out that all the threads are running at the same time.
the output:
#4(5), #2(5), #1(5), #1(4), #1(3), #1(2), #1(1), #3(5), #5(5), #3(4), #2(4), #2(3), #4(4), #2(2), #3(3), #3(2), #5(4), #3(1), #2(1), #4(3), #4(2), #4(1), #5(3), #5(2), #5(1)
I am so confused.
public class SimpleThread extends Thread {
    private int countDown = 5;
    private static int threadCount = 0;

    public SimpleThread() {
        // Store the thread name:
        super(Integer.toString(++threadCount));
        start();
    }

    public String toString() {
        return "#" + getName() + "(" + countDown + "), ";
    }

    public void run() {
        while (true) {
            System.out.print(this);
            if (--countDown == 0)
                return;
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++)
            new SimpleThread();
    }
}
I thought a thread only gives up its domination(control?) when we wrote something like yield, sleep inside the run method.
Nope, not at all. Threads run in parallel, each executing concurrently with all other threads. It is common to get garbled output when you print to System.out from several threads at the same time.
It hasn't been necessary for threads or processes to explicitly yield control since the Windows 3.x days when the operating system used cooperative multitasking. UNIX operating systems use preemptive multitasking, as does every version of Windows since Windows 95. Preemptive multitasking means the operating system can suspend a thread at any point, for example when it's used up its time slice.
Having threads run in parallel is what enables programs to take advantage of the multi-core architectures that are common today. If only one thread could run at a time there'd be no benefit to having more than one CPU.
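If you actually want the strictly sequential output you expected, you have to enforce it yourself. A minimal sketch (my own illustration, not part of the original code): start each thread and join it before creating the next one.
public class SequentialDemo {
    public static void main(String[] args) throws InterruptedException {
        for (int i = 1; i <= 5; i++) {
            final int id = i;
            Thread t = new Thread(() -> {
                for (int countDown = 5; countDown > 0; countDown--)
                    System.out.print("#" + id + "(" + countDown + "), ");
            });
            t.start();
            t.join(); // wait for this thread to finish before starting the next
        }
    }
}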
According to the specification, thread execution order is not guaranteed. Thus you cannot expect that the first started thread will finish before the second one starts.
According to Kathy Sierra's OCA/OCP Java SE 7 book (OCA/OCP Java SE 7 Programmer I & II Study Guide):
The thread scheduler is the part of the JVM (although most JVMs map Java threads directly to native threads on the underlying OS) that decides which thread should run at any given moment, and also takes threads out of the run state. Assuming a single processor machine, only one thread can actually run at a time. Only one stack can ever be executing at one time. And it's the thread scheduler that decides which thread—of all that are eligible—will actually run. When we say eligible, we really mean in the runnable state.
Any thread in the runnable state can be chosen by the scheduler to be the one and only running thread. If a thread is not in a runnable state, then it cannot be chosen to be the currently running thread. And just so we're clear about how little is guaranteed here: The order in which runnable threads are chosen to run is not guaranteed.
Although queue behavior is typical, it isn't guaranteed. Queue behavior means that when a thread has finished with its "turn," it moves to the end of the line of the runnable pool and waits until it eventually gets to the front of the line, where it can be chosen again. In fact, we call it a runnable pool, rather than a runnable queue, to help reinforce the fact that threads aren't all lined up in some guaranteed order.
How your threads are scheduled to run largely depends on the underlying OS and hardware. The OS decides when your threads get to execute and in what order.
For example, if you have multiple cores on your machine, the OS will likely try to run your threads concurrently (at the same time).
Even if you only have one core the OS will most likely try to be fair and give all your threads some time to execute since they have the same priority.

Java multithreading using .join(); Can you assign the idle thread to help still running threads?

I have a simple method which is multithreaded as follows:
int processors = Runtime.getRuntime().availableProcessors();
detectors[] theCores = new detectors[processors];
for (int i = 0; i < processors; i++) {
    theCores[i] = new detectors();
    theCores[i].start();
}
for (int i = 0; i < processors; i++) {
    try { // Waits for completion of all cores
        theCores[i].join();
    } catch (InterruptedException IntExp) {}
}
Using .join() means the main thread waits for the completion of all worker threads. Because the threads are not identical in speed, some will finish before others, and there is sometimes quite a significant gap between the time it takes the fastest thread to finish compared to the slowest. Is there any way to assign the cores that are finished and waiting to help out the remaining threads in their execution?
Read about Fork/Join in Java 7
The fork/join framework is distinct because it uses a work-stealing
algorithm. Worker threads that run out of things to do can steal tasks
from other threads that are still busy.
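For illustration, a rough sketch of how the per-thread work could be split into small steal-able tasks (ChunkedWork, the threshold of 100, and the range bounds are my own assumptions, not the asker's detectors code):
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class ChunkedWork extends RecursiveAction {
    private final int from, to;

    ChunkedWork(int from, int to) { this.from = from; this.to = to; }

    @Override
    protected void compute() {
        if (to - from <= 100) {
            // Small enough: process the items directly
            for (int i = from; i < to; i++) {
                // the actual per-item detection work goes here
            }
        } else {
            // Otherwise split in half; idle workers can steal either half
            int mid = (from + to) / 2;
            invokeAll(new ChunkedWork(from, mid), new ChunkedWork(mid, to));
        }
    }
}
// Usage: new ForkJoinPool().invoke(new ChunkedWork(0, 10_000));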
You can let the OS do this for you: instead of scheduling as many threads as there are processors/cores, start slightly more threads. If one thread finishes, the OS will give the remaining threads all the remaining CPU power. Depending on the type of work, a load factor of 2 (threads = 2 * cores) can be a good choice. This should give you 100% CPU load most of the time, whereas with a number of threads lower than the number of cores the CPU load often is less than that, especially if the threads are memory or I/O intensive. In that case you should increase the factor even more.
"to help out the remaining threads in their execution", the work each thread does should be split in lesser parts (usually named tasks). Otherwise it is impossible to parallelize monolitic job coded as run() method. All the tasks are put in a common queue and each worker thread then takes next task from it. Such a construction is called thread pool and Java runtime library has several implementations for it: http://docs.oracle.com/javase/tutorial/essential/concurrency/pools.html
In case tasks have mutual data dependencies, more sophisticated facilities should be used: a fork-join pool (mentioned by endriu_l), CompletableFuture from Java 8, or dataflow and actor libraries (they are numerous and easy to find; I will only mention my own df4j2).

Java Executor with throttling/throughput control

I'm looking for a Java Executor that allows me to specify throttling/throughput/pacing limitations, for example, no more than say 100 tasks can be processed in a second -- if more tasks get submitted they should get queued and executed later. The main purpose of this is to avoid running into limits when hitting foreign APIs or servers.
I'm wondering whether either base Java (which I doubt, because I checked) or somewhere else reliable (e.g. Apache Commons) provides this, or if I have to write my own. Preferably something lightweight. I don't mind writing it myself, but if there's a "standard" version out there somewhere I'd at least like to look at it first.
Take a look at Guava's RateLimiter:
A rate limiter. Conceptually, a rate limiter distributes permits at a
configurable rate. Each acquire() blocks if necessary until a permit
is available, and then takes it. Once acquired, permits need not be
released. Rate limiters are often used to restrict the rate at which
some physical or logical resource is accessed. This is in contrast to
Semaphore which restricts the number of concurrent accesses instead of
the rate (note though that concurrency and rate are closely related,
e.g. see Little's Law).
It's thread-safe, but still marked @Beta. Might be worth a try anyway.
You would have to wrap each call to the Executor with respect to the rate limiter. For a cleaner solution you could create some kind of wrapper for the ExecutorService, as sketched after the javadoc example below.
From the javadoc:
final RateLimiter rateLimiter = RateLimiter.create(2.0); // rate is "2 permits per second"

void submitTasks(List<Runnable> tasks, Executor executor) {
    for (Runnable task : tasks) {
        rateLimiter.acquire(); // may wait
        executor.execute(task);
    }
}
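Building on that, a possible wrapper sketch (RateLimitedExecutor is a hypothetical name of mine, not part of Guava):
import java.util.concurrent.Executor;
import com.google.common.util.concurrent.RateLimiter;

class RateLimitedExecutor implements Executor {
    private final Executor delegate;
    private final RateLimiter rateLimiter;

    RateLimitedExecutor(Executor delegate, double permitsPerSecond) {
        this.delegate = delegate;
        this.rateLimiter = RateLimiter.create(permitsPerSecond);
    }

    @Override
    public void execute(Runnable task) {
        rateLimiter.acquire(); // may block the submitting thread
        delegate.execute(task);
    }
}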
The Java Executor doesn't offer such a limitation, only limitation by the number of threads, which is not what you are looking for.
In general the Executor is the wrong place to limit such actions anyway; it should be done at the moment the thread tries to call the outside server. You can do this, for example, by having a limiting Semaphore that threads wait on before they submit their requests.
Calling Thread:
public void run() {
    // ...
    requestLimiter.acquire();
    connection.send();
    // ...
}
While at the same time you schedule a (single) secondary thread to periodically (say, every 60 seconds) release the acquired permits:
public void run() {
    // ...
    requestLimiter.drainPermits(); // make sure no more than the max are released by draining the Semaphore empty
    requestLimiter.release(MAX_NUM_REQUESTS);
    // ...
}
no more than say 100 tasks can be processed in a second -- if more
tasks get submitted they should get queued and executed later
You need to look into Executors.newFixedThreadPool(int limit). This will allow you to limit the number of threads that can execute simultaneously. If you submit more tasks than the pool size, they will be queued and executed later.
ExecutorService threadPool = Executors.newFixedThreadPool(100);
Future<?> result1 = threadPool.submit(runnable1);
Future<?> result2 = threadPool.submit(runnable2);
Future<SomeClass> result3 = threadPool.submit(callable1);
...
The snippet above shows how you would work with an ExecutorService that allows no more than 100 threads to execute simultaneously.
Update:
After going over the comments, here is what I have come up with (kinda stupid). How about manually keeping track of the tasks that are to be executed? How about storing them first in an ArrayList and then submitting them to the Executor based on how many have already been executed in the last second?
So, let's say 200 tasks have been submitted to our maintained ArrayList. We can iterate and add 100 to the Executor. When a second passes, we can add a few more based on how many have completed in the Executor, and so on.
Depending on the scenario, and as suggested in one of the previous responses, the basic functionalities of a ThreadPoolExecutor may do the trick.
But if the threadpool is shared by multiple clients and you want to throttle, to restrict the usage of each one of them, making sure that one client won't use all the threads, then a BoundedExecutor will do the work.
More details can be found in the following example:
http://jcip.net/listings/BoundedExecutor.java
Personally I found this scenario quite interesting. In my case, I wanted to stress that the interesting phase to throttle is the consuming side, as in classical producer/consumer concurrency theory. That's the opposite of some of the previously suggested answers. That is, we don't want to block the submitting thread, but to block the consuming threads based on a rate (tasks/second) policy. So, even if there are tasks ready in the queue, the executing/consuming threads may block while waiting to meet the throttle policy.
That said, I think a good candidate would be Executors.newScheduledThreadPool(int corePoolSize). This way you would need a simple queue in front of the executor (a simple LinkedBlockingQueue would suit), and then schedule a periodic task to pick actual tasks from the queue (ScheduledExecutorService.scheduleAtFixedRate). So it is not a straightforward solution, but it should perform well enough if you try to throttle the consumers as discussed before. A rough sketch follows.
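A rough sketch of that design (the class name and the per-second drain loop are my own assumptions; error handling and shutdown are omitted):
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ThrottledConsumer {
    private final LinkedBlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);

    public ThrottledConsumer(int tasksPerSecond) {
        scheduler.scheduleAtFixedRate(() -> {
            // Consume at most tasksPerSecond tasks per one-second window
            for (int i = 0; i < tasksPerSecond; i++) {
                Runnable task = queue.poll();
                if (task == null) break; // nothing left in this window
                task.run();
            }
        }, 1, 1, TimeUnit.SECONDS);
    }

    public void submit(Runnable task) {
        queue.offer(task); // never blocks the submitting thread
    }
}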
Can limit it inside Runnable:
public static Runnable throttle(Runnable realRunner, long delay) {
    Runnable throttleRunner = new Runnable() {
        // whether is waiting to run
        private boolean _isWaiting = false;
        // target time to run realRunner
        private long _timeToRun;
        // specified delay time to wait
        private long _delay = delay;
        // Runnable that has the real task to run
        private Runnable _realRunner = realRunner;

        @Override
        public void run() {
            // current time
            long now;
            synchronized (this) {
                // another thread is waiting, skip
                if (_isWaiting) return;
                now = System.currentTimeMillis();
                // update time to run
                // do not update it each time since
                // you do not want to postpone it unlimited
                _timeToRun = now + _delay;
                // set waiting status
                _isWaiting = true;
            }
            try {
                Thread.sleep(_timeToRun - now);
            } catch (InterruptedException e) {
                e.printStackTrace();
            } finally {
                // clear waiting status before run
                _isWaiting = false;
                // do the real task
                _realRunner.run();
            }
        }
    };
    return throttleRunner;
}
Taken from JAVA Thread Debounce and Throttle

How to ensure a Thread won't delay in Java?

I wrote a multi-threaded program which has two to four threads running at the same time.
One of the threads is a time-critical thread: it is called every 500 milliseconds and is not allowed to be delayed by more than 10 milliseconds. But when the other threads are under heavier load, I find that a delay of around two milliseconds occurs (I print timestamps to show it). So I worry that after running for a long time it will be delayed by more than 10 milliseconds. Apart from checking the timestamp and adjusting the looping interval to keep the delay under 10 milliseconds, is there any way to make it safe?
Thanks.
Sounds like you need Real-Time Java
If timing is critical, I use a busy wait on a core which is dedicated to that thread. This can give you << 10 microseconds of jitter most of the time. It's a bit extreme and will result in the logical thread not being used for anything else.
This is the library I use. You can use it to reserve a logical thread or a whole core. https://github.com/peter-lawrey/Java-Thread-Affinity
By using isolcpus= in grub.conf on Linux you can ensure that the logical thread or core is not used for anything else (except the 100 Hz timer and power management, which are relatively small, < 2 us delay).
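For illustration, a minimal busy-wait sketch without any library (my own; it assumes you compute an absolute deadline in nanoseconds for each 500 ms tick):
// Spin until an absolute deadline; burns the whole core,
// but avoids scheduler wake-up latency entirely.
static void spinUntil(long deadlineNanos) {
    while (System.nanoTime() < deadlineNanos) {
        // busy wait
    }
}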
You can set your threads' priorities:
myCriticalThread.setPriority(Thread.MAX_PRIORITY);
otherThread.setPriority(Thread.NORM_PRIORITY); // the default
yetAnotherThread.setPriority(Thread.MIN_PRIORITY);
It won't really guarantee anything though.
There is no guarantee that your thread isn't delayed, because the OS may decide to give other processes precedence (unless you put effort into setting up a complete real-time system, including a modified OS). That being said, for basic tasks you should use a ScheduledExecutorService like this:
class A {
    private final ScheduledExecutorService exe = Executors.newScheduledThreadPool(1);

    public void startCriticalAction(Runnable command) {
        this.exe.scheduleAtFixedRate(command, 100, 100, TimeUnit.MILLISECONDS);
    }

    public void shutdown() {
        this.exe.shutdown();
    }
}
The executor service will do its best to execute the task every 100ms. You should not develop this functionality yourself, because a lot of things can go wrong.
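A hypothetical usage of the class above, printing a timestamp so the jitter can be checked:
A scheduler = new A();
scheduler.startCriticalAction(() -> System.out.println(System.currentTimeMillis()));
// ... later, when the critical work is no longer needed:
scheduler.shutdown();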
Creep up on the timeout:
static void waitFor(long timeoutMs) throws InterruptedException {
    long wallTimeEnd = System.currentTimeMillis() + timeoutMs;
    long interval = timeoutMs / 2;
    // sleep in halving intervals while still far from the deadline
    while (interval > 10) {
        Thread.sleep(interval);
        interval = (wallTimeEnd - System.currentTimeMillis()) / 2;
    }
    // spin with sleep(0) for the last few milliseconds
    do {
        Thread.sleep(0);
        interval = wallTimeEnd - System.currentTimeMillis();
    } while (interval > 0);
}
This only wastes a core for 5-10 ms.
