How to test task performance using multithreading? - java

I have some exercises, and one of them is about concurrency. This topic is new to me; I spent 6 hours on it and finally solved my problem. But my knowledge of the corresponding API is poor, so I need advice: is my solution correct, or is there a more appropriate way?
I have to implement the following interface:
public interface PerformanceTester {
    /**
     * Runs a performance test of the given task.
     * @param task which task to do performance tests on
     * @param executionCount how many times the task should be executed in total
     * @param threadPoolSize how many threads to use
     */
    public PerformanceTestResult runPerformanceTest(
            Runnable task,
            int executionCount,
            int threadPoolSize) throws InterruptedException;
}
where PerformanceTestResult contains total time (how long the whole performance test took in total), minimum time (how long the shortest single execution took) and maximum time (how long the longest single execution took).
I learned many new things today: thread pools and the types Executors, ExecutorService, Future, CompletionService, etc.
If I had a Callable task, I could do the following:
Return the current time at the end of the call() method.
Create some data structure (a Map, maybe) to store the start time and the Future object returned by fixedThreadPool.submit(task) (do this executionCount times, in a loop).
After execution, subtract the start time from the end time for every Future.
(Is this the right way in the case of a Callable task?)
But I only have a Runnable task, so I kept looking. I even created a FutureListener implements Callable<Long> that returns the time once Future.isDone() is true, but that seems a little crazy to me (I would have to double the thread count).
Eventually I noticed the CompletionService type with its interesting method take(), which "retrieves and removes the Future representing the next completed task, waiting if none are yet present", and a very nice example of using ExecutorCompletionService. Here is my solution.
public class PerformanceTesterImpl implements PerformanceTester {
    @Override
    public PerformanceTestResult runPerformanceTest(Runnable task,
            int executionCount, int threadPoolSize) throws InterruptedException {
        long totalTime = 0;
        long[] times = new long[executionCount];
        ExecutorService pool = Executors.newFixedThreadPool(threadPoolSize);
        //create list of executionCount tasks
        ArrayList<Runnable> solvers = new ArrayList<Runnable>();
        for (int i = 0; i < executionCount; i++) {
            solvers.add(task);
        }
        CompletionService<Long> ecs = new ExecutorCompletionService<Long>(pool);
        //submit tasks and save time of execution start
        for (Runnable s : solvers)
            ecs.submit(s, System.currentTimeMillis());
        //take Futures one by one in order of completing
        for (int i = 0; i < executionCount; ++i) {
            long r = 0;
            try {
                //this is saved time of execution start
                r = ecs.take().get();
            } catch (ExecutionException e) {
                e.printStackTrace();
                return null;
            }
            //put into array difference between current time and start time
            times[i] = System.currentTimeMillis() - r;
            //calculate sum in array
            totalTime += times[i];
        }
        pool.shutdown();
        //sort array to define min and max
        Arrays.sort(times);
        PerformanceTestResult performanceTestResult = new PerformanceTestResult(
                totalTime, times[0], times[executionCount - 1]);
        return performanceTestResult;
    }
}
So, what can you say? Thanks for replies.

I would use System.nanoTime() for higher-resolution timings. You might want to ignore the first 10,000 tests to ensure the JVM has warmed up.
I wouldn't bother creating a List of Runnables first and then adding them to the executor; I would just submit them to the executor directly.
Using Runnable is not a problem, as you get a Future<?> back.
Note: Timing how long the task spends in the queue can make a big difference to the timing. Instead of taking the time from when the task was created, you can have the task time itself and return a Long for the time in nanoseconds. How the timing is done should reflect the use case you have in mind.
A simple way to convert a Runnable task into one which times itself:
final Runnable run = ...
ecs.submit(new Callable<Long>() {
    public Long call() {
        long start = System.nanoTime();
        run.run();
        return System.nanoTime() - start;
    }
});
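Putting these suggestions together, a minimal sketch might look like the following. It assumes Java 8+ lambdas and an executionCount of at least 1. TimingResult and SelfTimingTester are hypothetical names, since the real PerformanceTestResult class is not shown in the question. Total time is taken here as the wall-clock time of the whole run, which is only one possible reading of the interface's Javadoc.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical result holder; stands in for the PerformanceTestResult class not shown above. */
final class TimingResult {
    final long totalNanos;
    final long minNanos;
    final long maxNanos;

    TimingResult(long totalNanos, long minNanos, long maxNanos) {
        this.totalNanos = totalNanos;
        this.minNanos = minNanos;
        this.maxNanos = maxNanos;
    }
}

final class SelfTimingTester {
    /** Assumes executionCount >= 1 and threadPoolSize >= 1. */
    static TimingResult run(final Runnable task, int executionCount, int threadPoolSize)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threadPoolSize);
        CompletionService<Long> ecs = new ExecutorCompletionService<>(pool);
        // The Callable times only its own run(), so time spent waiting in the queue is excluded.
        Callable<Long> timed = () -> {
            long start = System.nanoTime();
            task.run();
            return System.nanoTime() - start;
        };
        long wallStart = System.nanoTime();
        try {
            for (int i = 0; i < executionCount; i++) {
                ecs.submit(timed);
            }
            long min = Long.MAX_VALUE;
            long max = Long.MIN_VALUE;
            // take() hands back futures in completion order; get() yields the measured duration.
            for (int i = 0; i < executionCount; i++) {
                long elapsed = ecs.take().get();
                min = Math.min(min, elapsed);
                max = Math.max(max, elapsed);
            }
            return new TimingResult(System.nanoTime() - wallStart, min, max);
        } finally {
            pool.shutdown();
        }
    }
}
```

Tracking the minimum and maximum while draining the completion service avoids the extra array and the sort. Whether queue time should count depends on the use case, as noted above.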

There are many intricacies when writing performance tests in the JVM. You probably aren't worried about them as this is an exercise, but if you are this question might have more information:
How do I write a correct micro-benchmark in Java?
That said, there don't seem to be any glaring bugs in your code. You might want to ask this on the lower traffic code-review site if you want a full review of your code:
http://codereview.stackexchange.com

Related

Does CompletableFuture waste more time?

I am learning about Java 8's CompletableFuture. I wrote a dummy program with one method that uses CompletableFuture.runAsync to run a simple loop from 0 to 10 while, in parallel, the calling thread runs a loop from 0 to 5. A second method runs the same kind of loop from 0 to 20 sequentially. The runAsync method takes longer than the sequential one. Is that normal?
Shouldn't the asynchronous method take the same time as the other method, or less?
Here is the code.
public class Sample{
    public static void main(String x[]) throws InterruptedException {
        runAsync();
        System.out.println("========== SECOND TESTS ==========");
        runSync();
    }
    static void runAsync() throws InterruptedException {
        long startTimeOne = System.currentTimeMillis();
        CompletableFuture<Void> cf = CompletableFuture.runAsync(() -> {
            for (int i = 0; i < 10L; i++) {
                System.out.println(" Async One");
            }
        });
        for (int i = 0; i < 5; i++) {
            System.out.println("two");
        }
        System.out.println("It is ready One? (1) " + cf.isDone());
        System.out.println("It is ready One? (2)" + cf.isDone());
        System.out.println("It is ready One? (3)" + cf.isDone());
        System.out.println("It is ready One? (4)" + cf.isDone());
        System.out.println("It is ready One? (5)" + cf.isDone());
        System.out.println("It is ready One? (6)" + cf.isDone());
        long estimatedTimeOne = System.currentTimeMillis() - startTimeOne;
        System.out.println("Total time async: " + estimatedTimeOne);
    }
    static void runSync() {
        long startTimeTwo = System.currentTimeMillis();
        for (int i = 0; i < 20; i++) {
            System.out.println("No async");
        }
        long estimatedTimeTwo = System.currentTimeMillis() - startTimeTwo;
        System.out.println("Total time no async: " + estimatedTimeTwo);
    }
}
The plain loop takes 1 millisecond and the runAsync version takes 54 milliseconds.
First, you are violating basic rules mentioned in How do I write a correct micro-benchmark in Java?
Most notably, you’re running both approaches within the same runtime and allow them to affect each other.
Besides that, you are getting an output that is a sequence of messages, which is showing the fundamental problem of your operation: you can not print concurrently. The output system itself has to ensure that the printing will end up showing a sequential behavior.
When you are performing actions that can’t run in parallel through a concurrent framework, you can’t gain performance, you can only add thread communication overhead.
Besides that, the operations are not even the same:
the action you are passing to runAsync uses 10L as end boundary, in other words, is performing a long comparison where all other loops use int
"It is ready One? (6)" + cf.isDone() is performing two operations that do not appear in the sequential variant. First, polling the status of the CompletableFuture, which must be done with inter-thread semantics. Second, it bears string concatenation. Both are potentially expensive operations
The async variant is printing 21 messages whereas the sequential is printing 20. Even the total amount of characters to print is roughly 50% more in the async operation
These points may serve as examples of how easily you can do things wrong in a manual benchmark. But they do not affect the outcome significantly, due to the fundamental aspect mentioned before them. You can’t gain a performance advantage of doing the printing asynchronously at all.
Note that the output is quite consistent in your specific case. Since the common Fork/Join thread pool has not been used before your asynchronous operation, it needs to start a new thread when you submit your job, which takes so long that the subsequent local loop printing "two" completes before the asynchronous operation even starts. The next operation, polling cf.isDone() and performing string concatenation, on the other hand, is so slow, that the asynchronous operation completes entirely before these six print statements complete.
When you change the code to
CompletableFuture<Void> cf = CompletableFuture.runAsync(() -> {
    for (int i = 0; i < 10; i++) {
        System.out.println("Async One");
    }
});
for (int i = 0; i < 10; i++) {
    System.out.println("two");
}
cf.join();
you still can’t get a performance advantage, but the performance difference will be much smaller. When you add a statement like
ForkJoinPool.commonPool().execute(() -> System.out.println());
at the beginning of the main method, to ensure that the thread pool does not need to get initialized within the measured method, the perceived overhead may even reduce further.
Further, you may swap the order of the runAsync(); and runSync(); method invocations in the main method, to see how first-time execution effects influence the result when you run the two methods within the same JVM.
This all is not enough to make it a reliable benchmark but should help to understand the things that will go wrong when not understanding the pitfalls of doing a micro-benchmark.
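For contrast, here is a small sketch of work that can actually run in parallel: a CPU-bound sum with no shared output stream. The common pool is touched once before measuring, and join() ensures the asynchronous half is finished before the clock stops. AsyncSumDemo, sumRange and sumAsync are hypothetical names chosen for illustration, and the caveats above about hand-written micro-benchmarks still apply, so treat any numbers it prints as indicative only.

```java
import java.util.concurrent.CompletableFuture;

final class AsyncSumDemo {
    /** CPU-bound work that needs no shared resource: sums the integers in [from, to). */
    static long sumRange(long from, long to) {
        long sum = 0;
        for (long i = from; i < to; i++) {
            sum += i;
        }
        return sum;
    }

    /** Runs the first half on the common pool while the caller computes the second half. */
    static long sumAsync(long n) {
        long mid = n / 2;
        CompletableFuture<Long> firstHalf = CompletableFuture.supplyAsync(() -> sumRange(0, mid));
        long secondHalf = sumRange(mid, n);
        // join() waits for the asynchronous half, so the whole job is done on return.
        return firstHalf.join() + secondHalf;
    }

    public static void main(String[] args) {
        // Touch the common pool once so its start-up cost is not part of the measurement.
        CompletableFuture.runAsync(() -> { }).join();

        long n = 200_000_000L;
        long start = System.nanoTime();
        long asyncResult = sumAsync(n);
        long asyncNanos = System.nanoTime() - start;

        start = System.nanoTime();
        long syncResult = sumRange(0, n);
        long syncNanos = System.nanoTime() - start;

        System.out.println("async: " + asyncResult + " in " + asyncNanos + " ns");
        System.out.println("sync:  " + syncResult + " in " + syncNanos + " ns");
    }
}
```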

Java - Run tasks in varying time intervals

This question is for a college assignment.
I want to run a block of code every n*2 seconds (e.g. wait 1 second and run and wait 2 seconds and run and wait 4 seconds and run, etc) up to 5 times.
I currently have something like this.
int timer = 1000;
int counter = 0;
while (!condition() && counter < 5) {
    doTask();
    Thread.sleep(timer);
    timer *= 2;
    counter++;
}
Although this works, my grade benefits from not using Thread.sleep(). I figured out using a ScheduledThreadPoolExecutor with a fixed rate would be one way to go but I cannot get it to work due to the fact that the interval is not actually fixed.
This is for a theoretical Distributed System with high concurrency capabilities so what matters is the high scalability.
I could get away with Thread.sleep() if I explained in my report that there is no real benefit to avoiding it, or no viable way of doing so. Does anyone have any insight on this?
It is possible to schedule tasks with ScheduledExecutorService combined with some logic. The schedule method lets you specify a delay and the time unit it is expressed in. You can declare a variable that holds the growing delay you are after.
int timer = 1000; // initial delay in milliseconds
ScheduledExecutorService service = Executors.newSingleThreadScheduledExecutor();
Runnable runnable = new Runnable() {
    public void run()
    {
        // Move the code you want to run here
    }
};
// Schedule up to 5 runs, doubling the delay each time
if (!condition()) {
    for (int i = 0; i < 5; i++) {
        service.schedule(runnable, timer, TimeUnit.MILLISECONDS);
        timer *= 2;
    }
}
Moving your code into the runnable and then scheduling it in a for loop where the delay is doubled should get close to the effect you are going for. Note that each delay is measured from the moment schedule is called, so the runs happen roughly 1, 2, 4, 8 and 16 seconds after scheduling, and condition() is only checked once up front. Hope that helps!
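If the condition has to be re-checked before every run, as in the original loop, one option is to let the task re-schedule itself with a doubled delay. The sketch below assumes Java 8+; DoublingRetry and its parameters are hypothetical names, not part of any library. It mirrors the loop from the question: check the condition, run the task, wait, double the delay, and stop after maxAttempts runs.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

final class DoublingRetry {
    private final ScheduledExecutorService scheduler;
    private final Runnable task;
    private final BooleanSupplier condition;
    private final int maxAttempts;
    private final CompletableFuture<Integer> done = new CompletableFuture<>();

    DoublingRetry(ScheduledExecutorService scheduler, Runnable task,
                  BooleanSupplier condition, int maxAttempts) {
        this.scheduler = scheduler;
        this.task = task;
        this.condition = condition;
        this.maxAttempts = maxAttempts;
    }

    /** Starts the first attempt right away; the future completes with the number of runs made. */
    CompletableFuture<Integer> start(long initialDelayMillis) {
        scheduler.execute(() -> attempt(0, initialDelayMillis));
        return done;
    }

    private void attempt(int attemptsSoFar, long delayMillis) {
        try {
            if (condition.getAsBoolean() || attemptsSoFar >= maxAttempts) {
                done.complete(attemptsSoFar);
                return;
            }
            task.run();
            // Re-schedule instead of sleeping: no thread is blocked while waiting.
            scheduler.schedule(() -> attempt(attemptsSoFar + 1, delayMillis * 2),
                    delayMillis, TimeUnit.MILLISECONDS);
        } catch (RuntimeException e) {
            done.completeExceptionally(e);
        }
    }
}
```

A call such as new DoublingRetry(Executors.newSingleThreadScheduledExecutor(), task, cond, 5).start(1000) would then give delays of 1, 2, 4, 8 and 16 seconds between checks. Because the scheduler thread is only occupied while the task itself runs, one scheduler can serve many such retry chains, which may be the scalability argument the assignment is looking for.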

Fibonacci on Java ExecutorService runs faster sequentially than in parallel

I am trying out the executor service in Java, and wrote the following code to run Fibonacci (yes, the massively recursive version, just to stress out the executor service).
Surprisingly, it will run faster if I set the nThreads to 1. It might be related to the fact that the size of each "task" submitted to the executor service is really small. But still it must be the same number also if I set nThreads to 1.
To see if the access to the shared Atomic variables can cause this issue, I commented out the three lines with the comment "see text", and looked at the system monitor to see how long the execution takes. But the results are the same.
Any idea why this is happening?
BTW, I wanted to compare it with the similar implementation with Fork/Join. It turns out to be way slower than the F/J implementation.
public class MainSimpler {
    static int N=35;
    static AtomicInteger result = new AtomicInteger(0), pendingTasks = new AtomicInteger(1);
    static ExecutorService executor;
    public static void main(String[] args) {
        int nThreads=2;
        System.out.println("Number of threads = "+nThreads);
        executor = Executors.newFixedThreadPool(nThreads);
        Executable.inQueue = new AtomicInteger(nThreads);
        long before = System.currentTimeMillis();
        System.out.println("Fibonacci "+N+" is ... ");
        executor.submit(new FibSimpler(N));
        waitToFinish();
        System.out.println(result.get());
        long after = System.currentTimeMillis();
        System.out.println("Duration: " + (after - before) + " milliseconds\n");
    }
    private static void waitToFinish() {
        while (0 < pendingTasks.get()){
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        executor.shutdown();
    }
}
class FibSimpler implements Runnable {
    int N;
    FibSimpler (int n) { N=n; }
    @Override
    public void run() {
        compute();
        MainSimpler.pendingTasks.decrementAndGet(); // see text
    }
    void compute() {
        int n = N;
        if (n <= 1) {
            MainSimpler.result.addAndGet(n); // see text
            return;
        }
        MainSimpler.executor.submit(new FibSimpler(n-1));
        MainSimpler.pendingTasks.incrementAndGet(); // see text
        N = n-2;
        compute(); // similar to the F/J counterpart
    }
}
Runtime (approximately):
1 thread : 11 seconds
2 threads: 19 seconds
4 threads: 19 seconds
Update:
I notice that even if I use one thread inside the executor service, the whole program will use all four cores of my machine (each core around 80% usage on average). This could explain why using more threads inside the executor service slows down the whole process, but now, why does this program use 4 cores if only one thread is active inside the executor service??
It might be related to the fact that the size of each "task" submitted
to the executor service is really small.
This is certainly the case, and as a result you are mainly measuring the overhead of context switching. When nThreads == 1, there is no context switching between worker threads and thus the performance is better.
But still it must be the same number also if I set nThreads to 1.
I'm guessing you meant 'to higher than 1' here.
You are also running into heavy contention on shared state. AtomicInteger does not take a lock, but it relies on atomic hardware instructions; when multiple threads update result and pendingTasks all the time, the cache line holding them bounces between cores and the threads slow each other down. When there is only a single thread, the updates are never contended and stay cheap.
You may get better performance if you don't divide the problem into N tasks, but rather divide it into N/nThreads tasks, which can be handled simultaneously by the threads (assuming you choose nThreads to be at most the number of physical cores/threads available). Each thread then does its own work, calculating its own total and only adding that to a grand total when the thread is done. Even then, for fib(35) I expect the costs of thread management to outweigh the benefits. Perhaps try a somewhat larger input.
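One way to sketch the coarse-grained idea: expand the recursion a few levels on the calling thread, hand each resulting sub-problem to the pool as a plain sequential computation, and add the partial results up only at the end. CoarseFib, split and the splitDepth parameter are hypothetical names for illustration; the number of tasks is controlled by splitDepth rather than being exactly N/nThreads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class CoarseFib {
    /** Plain sequential recursion; runs entirely inside one thread, no shared state. */
    static long fibSeq(int n) {
        return n <= 1 ? n : fibSeq(n - 1) + fibSeq(n - 2);
    }

    /** Expands the recursion 'depth' levels and records the sub-problems found there. */
    static void split(int n, int depth, List<Integer> out) {
        if (depth == 0 || n <= 1) {
            out.add(n);
            return;
        }
        split(n - 1, depth - 1, out);
        split(n - 2, depth - 1, out);
    }

    static long fibParallel(int n, int nThreads, int splitDepth)
            throws InterruptedException, ExecutionException {
        List<Integer> subProblems = new ArrayList<>();
        split(n, splitDepth, subProblems);

        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Callable<Long>> jobs = new ArrayList<>();
            for (int sub : subProblems) {
                jobs.add(() -> fibSeq(sub));
            }
            long total = 0;
            // invokeAll blocks until every job is done; only this thread touches the total.
            for (Future<Long> partial : pool.invokeAll(jobs)) {
                total += partial.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

With a splitDepth of 3 or 4 there are at most 8 or 16 tasks, each large enough that submission overhead is small next to the computation, and no atomic variable is shared between workers.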

testing for game algorithm speed

How should I test my algorithm in terms of speed? The enhanced algorithm I made and the original algorithm search to the same depth and both give the same move; they differ only in speed.
How should I test my new algorithm, other than by just subtracting the system time at the start from the system time at the end? What I mean is that I need somewhat formal tests backed by some formulas. Should I simulate all possible moves and tally the time each algorithm (enhanced and original) takes to decide on a move? I'm quite clueless here.
I've used the below method a few times and have had success. If you are interested in multi-threaded benchmarking refer to the link at the bottom of the page.
Timing a single-threaded task using CPU, system, and user time
"User time" is the time spent running your application's own code.
"System time" is the time spent running OS code on behalf of your
application (such as for I/O).
Java 1.5 introduced the java.lang.management package to monitor the JVM. The entry point for the package is the ManagementFactory class. Its static methods return a variety of different "MXBean" objects that report JVM information. One such bean can report thread CPU and user time.
Call ManagementFactory.getThreadMXBean() to get a ThreadMXBean that describes current JVM threads. The bean's getCurrentThreadCpuTime() method returns the CPU time for the current thread. The getCurrentThreadUserTime() method returns the thread's user time. Both of these report times in nanoseconds (but see Appendix on Times and (lack of) nanosecond accuracy).
Be sure to call isCurrentThreadCpuTimeSupported() first, though. If it returns false (rare), the JVM implementation or OS does not support getting CPU or user times. In that case, you're back to using wall clock time.
import java.lang.management.*;
/** Get CPU time in nanoseconds. */
public long getCpuTime( ) {
    ThreadMXBean bean = ManagementFactory.getThreadMXBean( );
    return bean.isCurrentThreadCpuTimeSupported( ) ?
        bean.getCurrentThreadCpuTime( ) : 0L;
}
/** Get user time in nanoseconds. */
public long getUserTime( ) {
    ThreadMXBean bean = ManagementFactory.getThreadMXBean( );
    return bean.isCurrentThreadCpuTimeSupported( ) ?
        bean.getCurrentThreadUserTime( ) : 0L;
}
/** Get system time in nanoseconds (CPU time minus user time). */
public long getSystemTime( ) {
    ThreadMXBean bean = ManagementFactory.getThreadMXBean( );
    return bean.isCurrentThreadCpuTimeSupported( ) ?
        (bean.getCurrentThreadCpuTime( ) - bean.getCurrentThreadUserTime( )) : 0L;
}
These methods return the CPU, user, and system time since the thread started. To time a task after the thread has started, call one or more of these before and after the task and take the difference:
long startSystemTimeNano = getSystemTime( );
long startUserTimeNano = getUserTime( );
... do task ...
long taskUserTimeNano = getUserTime( ) - startUserTimeNano;
long taskSystemTimeNano = getSystemTime( ) - startSystemTimeNano;
Taken from http://nadeausoftware.com/articles/2008/03/java_tip_how_get_cpu_and_user_time_benchmarking#TimingasinglethreadedtaskusingCPUsystemandusertime
Here is a sample program to capture timings; you can change it to suit your needs:
package com.quicklyjava;
public class Main {
    /**
     * @param args
     * @throws InterruptedException
     */
    public static void main(String[] args) throws InterruptedException {
        // start time
        long time = System.nanoTime();
        for (int i = 0; i < 5; i++) {
            System.out.println("Sleeping Zzzz... " + i);
            Thread.sleep(1000);
        }
        long difference = System.nanoTime() - time;
        System.out.println("It took " + difference + " nano seconds to finish");
    }
}
And here is the output:
Sleeping Zzzz... 0
Sleeping Zzzz... 1
Sleeping Zzzz... 2
Sleeping Zzzz... 3
Sleeping Zzzz... 4
It took 5007507169 nano seconds to finish
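To compare two algorithms a little more formally, the CPU-time approach above can be combined with warm-up runs and an average over many measured runs. The following is a sketch under stated assumptions: CpuTimeBench, meanNanosPerRun and busyWork are hypothetical names, and the two Runnable placeholders stand in for calls that make the original and the enhanced algorithm choose a move for the same positions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class CpuTimeBench {
    /** Mean time per run in nanoseconds: CPU time if available, otherwise wall-clock time. */
    static double meanNanosPerRun(Runnable algorithm, int warmupRuns, int measuredRuns) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        boolean useCpuTime = bean.isCurrentThreadCpuTimeSupported() && bean.isThreadCpuTimeEnabled();
        // Unmeasured runs first, so the JIT compiler has a chance to optimize the code.
        for (int i = 0; i < warmupRuns; i++) {
            algorithm.run();
        }
        long start = useCpuTime ? bean.getCurrentThreadCpuTime() : System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            algorithm.run();
        }
        long end = useCpuTime ? bean.getCurrentThreadCpuTime() : System.nanoTime();
        return (double) (end - start) / measuredRuns;
    }

    /** Placeholder workload so the example runs on its own. */
    static long busyWork(int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += (long) i * i % 7;
        }
        return acc;
    }

    public static void main(String[] args) {
        // Placeholders: replace with the original and the enhanced search on the same positions.
        Runnable original = () -> busyWork(400_000);
        Runnable enhanced = () -> busyWork(200_000);

        double originalMean = meanNanosPerRun(original, 1_000, 1_000);
        double enhancedMean = meanNanosPerRun(enhanced, 1_000, 1_000);
        System.out.println("original: " + originalMean + " ns/run");
        System.out.println("enhanced: " + enhancedMean + " ns/run");
        System.out.println("speed-up: " + (originalMean / enhancedMean));
    }
}
```

Reporting the ratio of the two means over a fixed set of positions gives a single speed-up figure. Running each algorithm in its own JVM, and repeating the whole measurement several times, would make the comparison more trustworthy.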

Java strange performance inconsistency

I have a simple recursive method, a depth first search. On each call, it checks if it's in a leaf, otherwise it expands the current node and calls itself on the children.
I'm trying to make it parallel, but I notice the following strange (for me) problem.
I measure execution time with System.currentTimeMillis().
When I break the search into a number of subsearches and add the total execution time, I get a bigger number than the sequential search. I only measure execution time, no communication or sync, etc. I would expect to get the same time when I add the times of the subtasks. This happens even if I just run one task after the other, so without threads. If I just break the search into some subtasks and run the subtasks one after the other, I get a bigger time.
If I add the number of method calls for the subtasks, I get the same number as the sequential search. So, basically, in both cases I do the same number of method calls, but I get different times.
I'm guessing there's some overhead on initial method calls or something else caused by a JVM mechanism. Any ideas what could it be?
For example, one sequential search takes around 3300 ms. If I break it into 13 tasks, it takes a total time of 3500ms.
My method looks like this:
private static final int dfs(State state) {
    method_calls++;
    if (state.isLeaf()) {
        return 1;
    }
    State[] children = state.expand();
    int result = 0;
    for (int i = 0; i < children.length; i++) {
        result += dfs(children[i]);
    }
    return result;
}
Whenever I call it, I do it like this:
for (int i = 0; i < num_tasks; i++) {
    long start = System.currentTimeMillis();
    dfs(tasks[i]);
    totalTime += (System.currentTimeMillis() - start);
}
The problem is that totalTime increases with num_tasks, whereas I would expect it to stay the same because the method_calls variable stays the same.
You should average out the numbers over longer runs. Secondly the precision of currentTimeMillis may not be sufficient, you can try using System.nanoTime().
As in any programming language, every call to a procedure or method has a cost: you have to push the current environment, initialize the new one, execute the instructions, return the value on the stack, and finally restore the previous environment. Creating a thread costs even more.
I suppose that if you enlarge the search tree you will benefit from parallelization.
Adding system clock time for several threads seems a weird idea. Either you are interested in the time until processing is complete, in which case adding doesn't make sense, or in cpu usage, in which case you should only count when the thread is actually scheduled to execute.
What probably happens is that, at least part of the time, more threads are ready to execute than the system has CPU cores, and the scheduler puts one of your threads to sleep, which causes it to take longer to complete. It makes sense that this effect is exacerbated the more threads you use. Even if your program uses fewer threads than you have cores, other programs (such as your development environment) might be competing for them.
If you are interested in CPU usage, you might wish to query ThreadMXBean.getCurrentThreadCpuTime.
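A sketch of that idea with a thread pool: each task reads the CPU clock on the worker thread that actually runs it, and the per-task CPU times are added up by the caller. PerTaskCpuTime and totalCpuNanos are hypothetical names, and the code assumes the JVM supports and has enabled thread CPU time measurement (otherwise getCurrentThreadCpuTime throws or returns -1).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PerTaskCpuTime {
    /** Sum of the CPU time (in nanoseconds) each task consumed, excluding time spent descheduled. */
    static long totalCpuNanos(List<Runnable> tasks, int nThreads)
            throws InterruptedException, ExecutionException {
        // Assumption: thread CPU time measurement is supported and enabled on this JVM.
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Callable<Long>> jobs = new ArrayList<>();
            for (Runnable task : tasks) {
                jobs.add(() -> {
                    // Both readings happen on the worker thread that runs this task.
                    long start = bean.getCurrentThreadCpuTime();
                    task.run();
                    return bean.getCurrentThreadCpuTime() - start;
                });
            }
            long total = 0;
            for (Future<Long> cpuNanos : pool.invokeAll(jobs)) {
                total += cpuNanos.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Unlike summed wall-clock times, this total should not grow merely because a thread was waiting for a core, although cache effects and JIT warm-up can still make it differ between a sequential run and a split run.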
I'd expect to see Threads used. Something like this:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
public class Puzzle {
    // Updated from several worker threads, so use atomics rather than plain fields
    static final AtomicLong totalTime = new AtomicLong();
    private static final AtomicLong method_calls = new AtomicLong();
    /**
     * @param args
     */
    public static void main(String[] args) {
        final int num_tasks = 13;
        // Fill this array with the root states of the sub-searches before submitting
        final State[] tasks = new State[num_tasks];
        ExecutorService threadPool = Executors.newFixedThreadPool(5);
        for (int i = 0; i < num_tasks; i++) {
            threadPool.submit(new DfsRunner(tasks[i]));
        }
        try {
            threadPool.shutdown();
            // Raise the timeout if the searches take longer than this
            threadPool.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            System.out.println("Interrupted");
        }
        System.out.println(method_calls.get() + " Methods in " + totalTime.get() + "msecs");
    }
    static final int dfs(State state) {
        method_calls.incrementAndGet();
        if (state.isLeaf()) {
            return 1;
        }
        State[] children = state.expand();
        int result = 0;
        for (int i = 0; i < children.length; i++) {
            result += dfs(children[i]);
        }
        return result;
    }
}
With the runnable bit like this:
public class DfsRunner implements Runnable {
    private State state;
    public DfsRunner(State state) {
        super();
        this.state = state;
    }
    @Override
    public void run() {
        long start = System.currentTimeMillis();
        Puzzle.dfs(state);
        // addAndGet is atomic, unlike += on a volatile field
        Puzzle.totalTime.addAndGet(System.currentTimeMillis() - start);
    }
}
