Does CompletableFuture waste more time? - Java

I am learning about CompletableFuture in Java 8 and wrote a dummy test. One method uses CompletableFuture.runAsync to run a simple loop from 0 to 10 while, in parallel, the main thread runs a loop from 0 to 5. A second method just runs a single loop from 0 to 20. The runAsync method takes longer than the other one. Is that normal?
Shouldn't the asynchronous method take the same time as the other method, or less?
Here is the code:
import java.util.concurrent.CompletableFuture;

public class Sample {
    public static void main(String x[]) throws InterruptedException {
        runAsync();
        System.out.println("========== SECOND TESTS ==========");
        runSync();
    }

    static void runAsync() throws InterruptedException {
        long startTimeOne = System.currentTimeMillis();
        CompletableFuture<Void> cf = CompletableFuture.runAsync(() -> {
            for (int i = 0; i < 10L; i++) {
                System.out.println(" Async One");
            }
        });
        for (int i = 0; i < 5; i++) {
            System.out.println("two");
        }
        System.out.println("It is ready One? (1) " + cf.isDone());
        System.out.println("It is ready One? (2)" + cf.isDone());
        System.out.println("It is ready One? (3)" + cf.isDone());
        System.out.println("It is ready One? (4)" + cf.isDone());
        System.out.println("It is ready One? (5)" + cf.isDone());
        System.out.println("It is ready One? (6)" + cf.isDone());
        long estimatedTimeOne = System.currentTimeMillis() - startTimeOne;
        System.out.println("Total time async: " + estimatedTimeOne);
    }

    static void runSync() {
        long startTimeTwo = System.currentTimeMillis();
        for (int i = 0; i < 20; i++) {
            System.out.println("No async");
        }
        long estimatedTimeTwo = System.currentTimeMillis() - startTimeTwo;
        System.out.println("Total time no async: " + estimatedTimeTwo);
    }
}
The plain loop takes 1 millisecond and the runAsync version takes 54 milliseconds.

First, you are violating basic rules mentioned in How do I write a correct micro-benchmark in Java?
Most notably, you're running both approaches within the same runtime, allowing them to affect each other.
Besides that, you are getting output that is a sequence of messages, which shows the fundamental problem with your operation: you cannot print concurrently. The output system itself has to ensure that the printing ends up showing sequential behavior.
When you are performing actions that can’t run in parallel through a concurrent framework, you can’t gain performance, you can only add thread communication overhead.
Besides that, the operations are not even the same:
the action you are passing to runAsync uses 10L as the end boundary; in other words, it performs a long comparison where all the other loops use int
"It is ready One? (6)" + cf.isDone() performs two operations that do not appear in the sequential variant. First, it polls the status of the CompletableFuture, which must be done with inter-thread semantics. Second, it performs string concatenation. Both are potentially expensive operations
The async variant prints 21 messages whereas the sequential one prints 20. Even the total number of characters to print is roughly 50% higher for the async operation
These points may serve as examples of how easily you can get things wrong in a manual benchmark. But they do not affect the outcome significantly, due to the fundamental aspect mentioned before them: you can't gain a performance advantage from doing the printing asynchronously at all.
Note that the output is quite consistent in your specific case. Since the common Fork/Join thread pool has not been used before your asynchronous operation, it needs to start a new thread when you submit your job, which takes so long that the subsequent local loop printing "two" completes before the asynchronous operation even starts. The next operation, polling cf.isDone() and performing string concatenation, is on the other hand so slow that the asynchronous operation completes entirely before these six print statements finish.
When you change the code to
CompletableFuture<Void> cf = CompletableFuture.runAsync(() -> {
for (int i = 0; i < 10; i++) {
System.out.println("Async One");
}
});
for(int i = 0; i < 10; i++) {
System.out.println("two");
}
cf.join();
you still can’t get a performance advantage, but the performance difference will be much smaller. When you add a statement like
ForkJoinPool.commonPool().execute(() -> System.out.println());
at the beginning of the main method, to ensure that the thread pool does not need to be initialized within the measured method, the perceived overhead may shrink even further.
Further, you may swap the order of the runAsync(); and runSync(); method invocations in the main method, to see how first-time execution effects influence the result when you run the two methods within the same JVM.
All of this is still not enough to make it a reliable benchmark, but it should help you understand the things that go wrong when the pitfalls of micro-benchmarking are ignored.
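Putting these suggestions together, a minimal sketch (still not a reliable benchmark, just one with fewer avoidable differences; the class name and the order of the calls are only illustrative) might look like this:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ForkJoinPool;

public class FairerComparison {
    public static void main(String[] args) {
        // warm up the common pool so thread creation is not part of the measurement
        ForkJoinPool.commonPool().execute(() -> System.out.println());

        // run the sequential variant first this time, to observe first-time effects
        runSync();
        runAsync();
    }

    static void runAsync() {
        long start = System.currentTimeMillis();
        CompletableFuture<Void> cf = CompletableFuture.runAsync(() -> {
            for (int i = 0; i < 10; i++) {
                System.out.println("Async One");
            }
        });
        for (int i = 0; i < 10; i++) {
            System.out.println("two");
        }
        cf.join(); // wait for the asynchronous part instead of repeatedly polling isDone()
        System.out.println("Total time async: " + (System.currentTimeMillis() - start));
    }

    static void runSync() {
        long start = System.currentTimeMillis();
        for (int i = 0; i < 20; i++) {
            System.out.println("No async");
        }
        System.out.println("Total time no async: " + (System.currentTimeMillis() - start));
    }
}

Even with these changes the printing itself stays inherently sequential, so the asynchronous variant still cannot win; the difference just becomes less dramatic.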

Related

Rest parallel calls to service - Multithreading in Java

I have a REST API call where the maximum number of results returned by the API is 1000, starting with start page=1:
{
    "status": "OK",
    "payload": {
        "EMPList": [],
        "count": 5665
    }
}
So to get the other results I have to change to start page=2 and hit the service again, and again I will get only 1000 results.
But after the first call I want to make the remaining calls in parallel, collect the results, combine them, and send them back to the calling service in Java. Please suggest an approach; I am new to Java. I tried using Callable but it's not working.
It seems to me that ideally you should be able to configure the max count to something appropriate for your use case, but I'm assuming you aren't able to do that. Here is a simple, lock-free, multi-threading scheme that acts as a reduction operation for your two network calls:
// online runnable: https://ideone.com/47KsoS
int resultSize = 5;
int[] result = new int[resultSize * 2];

Thread pg1 = new Thread() {
    public void run() {
        System.out.println("Thread 1 Running...");
        // write numbers 1-5 to indexes 0-4
        for (int i = 0; i < resultSize; i++) {
            result[i] = i + 1;
        }
        System.out.println("Thread 1 Exiting...");
    }
};

Thread pg2 = new Thread() {
    public void run() {
        System.out.println("Thread 2 Running");
        // write numbers 6-10 to indexes 5-9
        for (int i = 0; i < resultSize; i++) {
            result[i + resultSize] = i + 1 + resultSize;
        }
        System.out.println("Thread 2 Exiting...");
    }
};

pg1.start();
pg2.start();
// ensure that pg1 execution finishes (join() throws InterruptedException in real code)
pg1.join();
// ensure that pg2 execution finishes
pg2.join();
// print result of reduction operation (needs java.util.Arrays)
System.out.println(Arrays.toString(result));
There is a very important caveat with this implementation, however. You will notice that the two threads DO NOT overlap in their memory writes. This matters: if you were to simply change the int[] result to an ArrayList<Integer>, it could lead to a catastrophic failure of the reduction operation between the two threads, called a race condition (the standard ArrayList implementation in Java is not thread-safe). Since we can guarantee how large our result will be, I would highly suggest sticking with an array for this multi-threaded implementation, as ArrayList hides a lot of implementation logic from you that you likely won't understand until you take a basic data-structures course.
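As a sketch of how the actual paged REST calls could be parallelized with a standard ExecutorService, something like the following could work. Note that fetchPage, the pool size, and the hard-coded total count are placeholders for your real REST client and the "count" field returned by the first call:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPages {

    // hypothetical stand-in for the real REST call; returns one page of results
    static List<String> fetchPage(int page) {
        return Arrays.asList("page-" + page); // dummy data instead of the real EMPList entries
    }

    public static void main(String[] args) throws Exception {
        int pageSize = 1000;
        int totalCount = 5665; // in real code this comes from the "count" field of page 1

        List<String> combined = new ArrayList<>(fetchPage(1)); // first call stays sequential
        int totalPages = (totalCount + pageSize - 1) / pageSize;

        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<List<String>>> futures = new ArrayList<>();
        for (int page = 2; page <= totalPages; page++) {
            final int p = page;
            futures.add(pool.submit(() -> fetchPage(p))); // remaining pages fetched in parallel
        }
        for (Future<List<String>> f : futures) {
            combined.addAll(f.get()); // only the main thread touches 'combined', so no race condition
        }
        pool.shutdown();

        System.out.println("Fetched " + combined.size() + " entries");
    }
}

The same "no overlapping writes" idea applies here: each task produces its own page, and only the main thread merges the results after the corresponding Future completes.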

Writing a Java program that uses threads to calculate an expression that ignores order of operations?

I am trying to author a Java program that uses threads to calculate an expression such as:
3 + 4 / 7 + 4 * 2
and outputs
Enter problem: 3 + 4 / 7 + 4 * 2
Thread-0 calculated 3+4 as 7
Thread-1 calculated 7/7 as 1
Thread-2 calculated 1+4 as 5
Thread-3 calculated 5*2 as 10
Final result: 10
In this exercise, we are ignoring order of operations. The expression is entered via user input. The goal is to get a separate thread to perform each calculation. I absolutely want each thread to perform each of the individual calculations, as I have listed above.
My honest, professional advice is don't try to use multithreading for this problem.
Learn to write clear, robust single-threaded code first. Learn how to debug it. Learn how to write the same thing in lots of different ways. It is only then that you can start to introduce the enormous complexity that is multithreading, and stand any chance of it being correct.
And learn, by reading about how to write multithreaded code correctly, what problems benefit from multithreading. This problem does not, because you need the result of the previous arithmetic operation as an input to the next.
I am only answering because of the terrible advice in comments to use global variables. Don't. This is not a good way to write multithreaded code, even in such a simple example. Even in single-threaded code, mutable global state is something which should be avoided if at all possible.
Keep your mutable state as tightly controlled as you can. Create a Runnable subclass which holds the operation you are going to perform:
class Op implements Runnable {
    final int operand1, operand2;
    final char operator;
    int result;

    Op(int operand1, char operator, int operand2) {
        // Initialize fields.
    }

    @Override public void run() {
        result = /* code to calculate `operand1 (operator) operand2` */;
    }
}
Now, you can calculate, say, 1 + 2 using:
Op op = new Op(1, '+', 2);
Thread t = new Thread(op);
t.start();
t.join();
int result = op.result;
(Or, you could have just used int result = 1 + 2;...)
So you can now use this in a loop:
String[] tokens = eqn.split(" ");
int result = Integer.parseInt(tokens[0]);
for (int i = 1; i < tokens.length; i += 2) {
    Op op = new Op(
        result,
        tokens[i].charAt(0),
        Integer.parseInt(tokens[i + 1]));
    Thread t = new Thread(op);
    t.start();
    t.join();
    result = op.result;
}
All of the mutable state is confined to the scope of the op variable. If you, say, want to run a second calculation, you don't have to worry about what previous state is still hanging around: you don't have to reset anything before another run; you can invoke this code in parallel, if you want, without interference between runs.
But all of this loop could be written more cleanly - and faster - using a simple method call:
for (int t = 1; t < tokens.length; t += 2) {
    result = method(
        result,
        tokens[t].charAt(0),
        Integer.parseInt(tokens[t + 1]));
}
Where method is a method containing /* code to calculate operand1 (operator) operand2 */.
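For completeness, here is one possible way to fill in the parts left as comments above. The class name Calculator, the apply helper, and the hard-coded equation string are only illustrative stand-ins for your own input handling:

class Op implements Runnable {
    final int operand1, operand2;
    final char operator;
    int result;

    Op(int operand1, char operator, int operand2) {
        this.operand1 = operand1;
        this.operator = operator;
        this.operand2 = operand2;
    }

    @Override public void run() {
        result = apply(operand1, operator, operand2);
    }

    // deliberately ignores operator precedence, as the exercise requires
    static int apply(int a, char op, int b) {
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            case '/': return a / b;
            default: throw new IllegalArgumentException("Unknown operator: " + op);
        }
    }
}

public class Calculator {
    public static void main(String[] args) throws InterruptedException {
        String eqn = "3 + 4 / 7 + 4 * 2"; // stands in for the user input
        String[] tokens = eqn.split(" ");
        int result = Integer.parseInt(tokens[0]);
        for (int i = 1; i < tokens.length; i += 2) {
            Op op = new Op(result, tokens[i].charAt(0), Integer.parseInt(tokens[i + 1]));
            Thread thread = new Thread(op);
            thread.start();
            thread.join(); // the next step needs this result, so wait for the thread
            System.out.println(thread.getName() + " calculated "
                    + result + tokens[i] + tokens[i + 1] + " as " + op.result);
            result = op.result;
        }
        System.out.println("Final result: " + result);
    }
}

Run with the sample input, this prints the "Thread-0 calculated 3+4 as 7" style lines from the question and "Final result: 10", while keeping all mutable state confined to each Op instance.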

Compiler ignores thread priorities

I tried to compile the example from Thinking in Java by Bruce Eckel:
import java.util.concurrent.*;

public class SimplePriorities implements Runnable {
    private int countDown = 5;
    private volatile double d; // No optimization
    private int priority;

    public SimplePriorities(int priority) {
        this.priority = priority;
    }

    public String toString() {
        return Thread.currentThread() + ": " + countDown;
    }

    public void run() {
        Thread.currentThread().setPriority(priority);
        while (true) {
            // An expensive, interruptable operation:
            for (int i = 1; i < 100000; i++) {
                d += (Math.PI + Math.E) / (double) i;
                if (i % 1000 == 0)
                    Thread.yield();
            }
            System.out.println(this);
            if (--countDown == 0) return;
        }
    }

    public static void main(String[] args) {
        ExecutorService exec = Executors.newCachedThreadPool();
        for (int i = 0; i < 5; i++)
            exec.execute(new SimplePriorities(Thread.MIN_PRIORITY));
        exec.execute(new SimplePriorities(Thread.MAX_PRIORITY));
        exec.shutdown();
    }
}
According to the book, the output has to look like:
Thread[pool-1-thread-6,10,main]: 5
Thread[pool-1-thread-6,10,main]: 4
Thread[pool-1-thread-6,10,main]: 3
Thread[pool-1-thread-6,10,main]: 2
Thread[pool-1-thread-6,10,main]: 1
Thread[pool-1-thread-3,1,main]: 5
Thread[pool-1-thread-2,1,main]: 5
Thread[pool-1-thread-1,1,main]: 5
...
But in my case the 6th thread doesn't execute its task first and the threads appear out of order. Could you please explain to me what's wrong? I just copied the source and didn't add any lines of code.
The code is working fine and produces the output from the book.
Your IDE probably has a console window with a scroll bar - just scroll up and you will see the 6th thread doing its job first.
However, the results may differ depending on the OS / JVM version. This code runs as expected for me on Windows 10 / JVM 8.
There are two issues here:
If two threads with the same priority want to write output, which one goes first?
The order of threads (with the same priority) is undefined, therefore the order of output is undefined. It is likely that a single thread is allowed to write several outputs in a row (because that's how most thread schedulers work), but it could also be completely random, or anything in between.
How many threads will a cached thread pool create?
That depends on your system. If you run on a dual-core system, creating more than 4 threads is pointless, because there will hardly be any CPU time available to execute those threads. In this scenario further tasks will be queued and executed only after earlier tasks are completed.
Hint: there is also a fixed-size thread pool; experimenting with that should change the output.
In summary, there is nothing wrong with your code; it is just wrong to assume that threads are executed in any particular order. It is even technically possible (although very unlikely) that the first task is already completed before the last task is even started. If your book says that the above order is "correct" then the book is simply wrong. On an average system that might be the most likely output, but - as above - with threads there is never any guaranteed order, unless you enforce it.
One way to influence the order is thread priorities - higher priorities will get their work done first - and you can find other concepts in the concurrent package.
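As a concrete way to try the fixed-size pool hint, you could replace main in the SimplePriorities class above with something like the following sketch; the pool size 6 is chosen only so that all six tasks get a thread immediately instead of possibly being queued:

// Drop-in replacement for main in the SimplePriorities class shown above.
public static void main(String[] args) {
    // a fixed pool with one thread per task, so none of the six tasks has to wait in the queue
    ExecutorService exec = Executors.newFixedThreadPool(6);
    for (int i = 0; i < 5; i++)
        exec.execute(new SimplePriorities(Thread.MIN_PRIORITY));
    exec.execute(new SimplePriorities(Thread.MAX_PRIORITY));
    exec.shutdown();
}

Even then, the interleaving of the output remains up to the scheduler; only the number of tasks running concurrently changes.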

How can I properly block a thread until timeout starts?

I would like to run several tasks in parallel until a certain amount of time has passed. Let us suppose those threads are CPU-heavy and/or may block indefinitely. After the timeout, the threads should be interrupted immediately, and the main thread should continue execution regardless of unfinished or still running tasks.
I've seen a lot of questions asking this, and the answers were always similar, often along the lines of "create thread pool for tasks, start it, join it on timeout"
The problem is between the "start" and "join" parts. As soon as the pool is allowed to run, it may grab CPU and the timeout will not even start until I get it back.
I have tried Executor.invokeAll, and found that it did not fully meet the requirements. Example:
long dt = System.nanoTime();
ExecutorService pool = Executors.newFixedThreadPool(4);
List<Callable<String>> list = new ArrayList<>();
for (int i = 0; i < 10; i++) {
    list.add(new Callable<String>() {
        @Override
        public String call() throws Exception {
            while (true) {
            }
        }
    });
}
System.out.println("Start at " + (System.nanoTime() - dt) / 1000000 + "ms");
try {
    pool.invokeAll(list, 3000, TimeUnit.MILLISECONDS);
}
catch (InterruptedException e) {
}
System.out.println("End at " + (System.nanoTime() - dt) / 1000000 + "ms");
Start at 1ms
End at 3028ms
This (a 27 ms delay) may not seem too bad, but an infinite loop is rather easy to break out of - the actual program easily experiences delays ten times as long. My expectation is that a timeout request is met with very high accuracy even under heavy load (I'm thinking along the lines of a hardware interrupt, which should always work).
This is a major pain in my particular program, as it needs to heed certain timeouts rather accurately (for instance, around 100 ms, if possible better). However, starting the pool often takes as long as 400 ms until I get control back, pushing past the deadline.
I'm a bit confused why this problem is almost never mentioned. Most of the answers I have seen definitely suffer from this. I suppose it may be acceptable usually, but in my case it's not.
Is there a clean and tested way to go ahead with this issue?
Edited to add:
My program involves the GC, though not on a large scale. For testing purposes, I rewrote the above example and found that the results are very inconsistent, but on average noticeably worse than before.
long dt = System.nanoTime();
ExecutorService pool = Executors.newFixedThreadPool(40);
List<Callable<String>> list = new ArrayList<>();
for (int i = 0; i < 10; i++) {
    list.add(new Callable<String>() {
        @Override
        public String call() throws Exception {
            String s = "";
            while (true) {
                s += Long.toString(System.nanoTime());
                if (s.length() > 1000000) {
                    s = "";
                }
            }
        }
    });
}
System.out.println("Start at " + (System.nanoTime() - dt) / 1000000 + "ms");
try {
    pool.invokeAll(list, 1000, TimeUnit.MILLISECONDS);
}
catch (InterruptedException e) {
}
System.out.println("End at " + (System.nanoTime() - dt) / 1000000 + "ms");
Start at 1ms
End at 1189ms
invokeAll should work just fine. However, it is vital that you write your tasks to properly respond to interrupts. When catching InterruptedException, they should exit immediately. If your code is catching IOException, each such catch-block should be preceded with something like:
} catch (InterruptedIOException e) {
    logger.log(Level.FINE, "Interrupted; exiting", e);
    return;
}
If you are using Channels, you will want to handle ClosedByInterruptException the same way.
If you perform time-consuming operations that don't catch the above exceptions, you need to check Thread.interrupted() periodically. Obviously, checking more often is better, though there will be a point of diminishing returns. (Meaning, checking it after every single statement in your task probably isn't useful.)
if (Thread.interrupted()) {
    logger.fine("Interrupted; exiting");
    return;
}
In your example code, your Callable is not checking the interrupt status at all, so my guess is that it never exits. An interrupt does not actually stop a thread; it just signals the thread that it should terminate itself on its own terms.
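As an illustration of that last point, the busy-wait Callable from the question could be made to cooperate with interruption roughly like this (a sketch; the real work would go where the comment is):

list.add(new Callable<String>() {
    @Override
    public String call() throws Exception {
        while (true) {
            // ... do a slice of real work here ...
            if (Thread.interrupted()) {
                // invokeAll's timeout cancels the task, which interrupts this thread;
                // returning lets the pool thread become free again
                return "interrupted";
            }
        }
    }
});

With tasks written like this, invokeAll still returns around the requested timeout, but the worker threads actually stop spinning afterwards instead of keeping the CPU saturated.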
Using the VM option -XX:+PrintGCDetails, I found that the GC runs more rarely, but with a far larger time delay than expected. That delay just so happens to coincide with the spikes I experienced.
A mundane and sad explanation for the observed behavior.

Fibonacci on Java ExecutorService runs faster sequentially than in parallel

I am trying out the executor service in Java, and wrote the following code to run Fibonacci (yes, the massively recursive version, just to stress out the executor service).
Surprisingly, it will run faster if I set the nThreads to 1. It might be related to the fact that the size of each "task" submitted to the executor service is really small. But still it must be the same number also if I set nThreads to 1.
To see if the access to the shared Atomic variables can cause this issue, I commented out the three lines with the comment "see text", and looked at the system monitor to see how long the execution takes. But the results are the same.
Any idea why this is happening?
BTW, I wanted to compare it with the similar implementation with Fork/Join. It turns out to be way slower than the F/J implementation.
public class MainSimpler {
    static int N = 35;
    static AtomicInteger result = new AtomicInteger(0), pendingTasks = new AtomicInteger(1);
    static ExecutorService executor;

    public static void main(String[] args) {
        int nThreads = 2;
        System.out.println("Number of threads = " + nThreads);
        executor = Executors.newFixedThreadPool(nThreads);
        Executable.inQueue = new AtomicInteger(nThreads);
        long before = System.currentTimeMillis();
        System.out.println("Fibonacci " + N + " is ... ");
        executor.submit(new FibSimpler(N));
        waitToFinish();
        System.out.println(result.get());
        long after = System.currentTimeMillis();
        System.out.println("Duration: " + (after - before) + " milliseconds\n");
    }

    private static void waitToFinish() {
        while (0 < pendingTasks.get()) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        executor.shutdown();
    }
}
class FibSimpler implements Runnable {
    int N;

    FibSimpler(int n) { N = n; }

    @Override
    public void run() {
        compute();
        MainSimpler.pendingTasks.decrementAndGet(); // see text
    }

    void compute() {
        int n = N;
        if (n <= 1) {
            MainSimpler.result.addAndGet(n); // see text
            return;
        }
        MainSimpler.executor.submit(new FibSimpler(n - 1));
        MainSimpler.pendingTasks.incrementAndGet(); // see text
        N = n - 2;
        compute(); // similar to the F/J counterpart
    }
}
Runtime (approximately):
1 thread : 11 seconds
2 threads: 19 seconds
4 threads: 19 seconds
Update:
I notice that even if I use one thread inside the executor service, the whole program will use all four cores of my machine (each core around 80% usage on average). This could explain why using more threads inside the executor service slows down the whole process, but now, why does this program use 4 cores if only one thread is active inside the executor service??
It might be related to the fact that the size of each "task" submitted
to the executor service is really small.
This is certainly the case, and as a result you are mainly measuring the overhead of context switching. When nThreads == 1, there is no context switching and thus the performance is better.
But still it must be the same number also if I set nThreads to 1.
I'm guessing you meant 'to higher than 1' here.
You are running into the problem of heavy lock contention. When you have multiple threads, the lock on the result is contended all the time. Threads have to wait for each other before they can update the result and that slows them down. When there is only a single thread, the JVM probably detects that and performs lock elision, meaning it doesn't actually perform any locking at all.
You may get better performance if you don't divide the problem into N tasks, but rather divide it into N/nThreads tasks, which can be handled simultaneously by the threads (assuming you choose nThreads to be at most the number of physical cores/threads available). Each thread then does its own work, calculating its own total and only adding that to a grand total when the thread is done. Even then, for fib(35) I expect the costs of thread management to outweigh the benefits. Perhaps try fib(1000).
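The "one task per thread, local subtotal" idea could be sketched like this. It is a generic pattern rather than a faster way to do the recursive Fibonacci itself; the iterative fib here is only a stand-in workload, and the class and variable names are illustrative:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

public class PartitionedSum {
    // iterative Fibonacci, used here only as "some work per item"
    static long fib(int n) {
        long a = 0, b = 1;
        for (int i = 0; i < n; i++) {
            long next = a + b;
            a = b;
            b = next;
        }
        return a;
    }

    public static void main(String[] args) throws Exception {
        int nThreads = Runtime.getRuntime().availableProcessors();
        int items = 40;
        AtomicLong grandTotal = new AtomicLong();

        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        List<Future<?>> futures = new ArrayList<>();
        for (int t = 0; t < nThreads; t++) {
            final int offset = t;
            futures.add(pool.submit(() -> {
                long local = 0; // each task keeps its own subtotal
                for (int i = offset; i < items; i += nThreads) {
                    local += fib(i);
                }
                grandTotal.addAndGet(local); // touch the shared variable only once per task
            }));
        }
        for (Future<?> f : futures) {
            f.get(); // wait for all tasks
        }
        pool.shutdown();

        System.out.println("Total: " + grandTotal.get());
    }
}

Each task touches the shared AtomicLong exactly once, so contention is negligible and the tasks are large enough that thread management overhead no longer dominates.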
