I can't identify the issue with my parallel run timer - java

I have a program that applies a median filter to an array of over 2 million values.
I'm trying to compare run times for sequential vs parallel on the same dataset. So when I execute the program, it does 20 runs, every run is timed, and an average of the 20 times is outputted to the console.
ArrayList<Double> times = new ArrayList<>(20);//to calculate average run time
for (int run = 1; run < 21; run++) //algorithm will run 20 times
{
long startTime = System.nanoTime();
switch (method)
{
case 1: //Sequential
filt.seqFilter();
break;
case 2: //ForkJoin Framework
pool.invoke(filt); //pool is ForkJoin
break;
}
Double timeElapsed = (System.nanoTime() - startTime) / 1000000.0;
times.add(run - 1, timeElapsed);
System.out.println("Run " + run + ": " + timeElapsed + " milliseconds.");
}
times.remove(Collections.max(times)); //there's always a slow outlier
double timesSum = 0;
for (Double e : times)
{
timesSum += e;
}
double average = timesSum / 19;
System.out.println("Runtime: " + average);
filt is of type FilterObject which extends RecursiveAction. My overridden compute() method in FilterObject looks like this:
public void compute()
{
if (hi - lo <= SEQUENTIAL_THRESHOLD)
{
seqFilter();
}
else
{
FilterObject left = new FilterObject(lo, (hi + lo) / 2);
FilterObject right = new FilterObject((hi + lo) / 2, hi);
left.fork();
right.compute();
left.join();
}
}
seqFilter() processes the values between the lo and hi indices in the starting array and adds the processed values to a final array in the same positions. That's why there is no merging of arrays after left.join().
My run times for this are insanely fast for parallel - so fast that I think there must be something wrong with my timer OR with my left.join() statement. I'm getting average times of around 170 milliseconds for sequential with a filtering window of size 3 and 0.004 milliseconds for parallel. Why am I getting these values? I'm especially concerned that my join() is in the wrong place.
If you'd like to see my entire code, with all the classes and some input files, follow this link.

After some testing of your code I found the reason. It turns out that a ForkJoinPool runs a given task instance only once: after the task has completed, subsequent invoke() calls with the same instance return immediately. So you have to create a new task instance for every run.
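To make the run-once behavior concrete, here is a minimal sketch. CountingTask is a hypothetical stand-in for FilterObject (it only counts how often compute() executes), and the class and method names are made up for illustration:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicInteger;

public class TaskReuseDemo {
    // Counts how many times compute() actually executes.
    static final AtomicInteger computeCalls = new AtomicInteger();

    // Hypothetical stand-in for FilterObject: it only records that it ran.
    static class CountingTask extends RecursiveAction {
        @Override
        protected void compute() {
            computeCalls.incrementAndGet();
        }
    }

    // Invokes the SAME task instance `runs` times, like the loop in the question.
    static int reuseSameTask(ForkJoinPool pool, int runs) {
        computeCalls.set(0);
        CountingTask task = new CountingTask();
        for (int i = 0; i < runs; i++) {
            pool.invoke(task); // after the first run the task is already complete
        }
        return computeCalls.get();
    }

    // Creates a fresh task instance for every run.
    static int freshTaskPerRun(ForkJoinPool pool, int runs) {
        computeCalls.set(0);
        for (int i = 0; i < runs; i++) {
            pool.invoke(new CountingTask());
        }
        return computeCalls.get();
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool();
        System.out.println("same instance:  compute() ran " + reuseSameTask(pool, 20) + " time(s)");
        System.out.println("fresh instance: compute() ran " + freshTaskPerRun(pool, 20) + " time(s)");
        pool.shutdown();
    }
}
```

With the same instance, compute() should run once for the 20 invocations; with a fresh instance it runs 20 times. That matches the question's symptoms: one slow run (the outlier that gets removed) followed by 19 near-instant ones.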
Another problem is with the parallel (standard threads) run. You are starting the threads but never waiting for them to finish before measuring the time. I think you could use a CyclicBarrier here.
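As a rough sketch of waiting before stopping the clock, the version below uses plain Thread.join() rather than a CyclicBarrier, simply because it is the shortest way to show the idea. The class name, timeWorkers, and the sleeping workload are assumptions for illustration, not the asker's code:

```java
public class TimedThreads {
    // Starts `threads` workers running `work`, waits for ALL of them,
    // and only then reads the clock. Returns elapsed milliseconds.
    static double timeWorkers(int threads, Runnable work) {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(work);
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join(); // without this, the timer stops while workers still run
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return (System.nanoTime() - start) / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Placeholder workload: each worker just sleeps for 50 ms.
        Runnable work = () -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println("4 workers took " + timeWorkers(4, work) + " ms");
    }
}
```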
With the mentioned fixes I get roughly the same time for ForkJoin and standard threads. And it's three times faster than sequential. Seems reasonable.
P.S. You are doing a micro-benchmark. It may be useful to read the answers to this question to improve your benchmark's accuracy: How do I write a correct micro-benchmark in Java?

Related

How to run a For Loop for 60 seconds maximum irrespective of the size of the loop and complete all the iterations

for(i=1;i<list.size();i++){
//do something
//For Eg: move marker to a new position on the map
}
I want the above loop to complete all the iterations irrespective of the size of the list and also want the entire task to run for 1 minute. (60 seconds)
I don't really know if this is what you want but I hope this helps.
import java.util.concurrent.TimeUnit;
for(i=1;i<list.size();i++){
try {
TimeUnit.SECONDS.sleep(1);
} catch (InterruptedException e) {
e.printStackTrace();
}
// Execute thing you want to be executed every second
}
As explanation: you iterate through the for loop and the thread waits for one second before executing the code after the TimeUnit.SECONDS.sleep(1);.
If the list's size is 60 it would therefore take a minute for the loop to end.
Edit: It has occurred to me that it might be smarter to do a try-catch around the sleep function.
You can, for example, use System.nanoTime() to measure the duration of your loop, and then use TimeUnit.NANOSECONDS.sleep(...) to make it wait for the rest of time like this:
long start = System.nanoTime();
long desiredDuration = TimeUnit.SECONDS.toNanos(60); // 60 s in nanoseconds (60 * 1000 * 1000 ns would be only 60 ms)
// your loop goes here
long duration = System.nanoTime() - start;
if (duration < desiredDuration)
TimeUnit.NANOSECONDS.sleep(desiredDuration - duration);
The best possible solution is to compute the desired time first and then run the loop to that extent.
long finish = System.currentTimeMillis() + 60000;
while (System.currentTimeMillis() < finish) // '<' rather than '!=': the clock can skip past the exact millisecond
{
//statements;
//statements;
}
Note that spinning in a loop like this keeps the CPU occupied for the whole time. This is known as busy waiting and is undesirable in many cases, so I recommend using Thread.sleep(duration) for this purpose.
Feel free to ask if you have further questions.
To spread N amount of invocations uniformly across a minute, you'll have to set the delay in between the invocations to the value 60/(N-1). The -1 is optional but causes the first and last invocations to be exactly 60 seconds apart. (just like how a ladder with N rungs has N-1 spaces)
Of course, using sleep() with the number calculated above is not only subject to round-off errors, but also drift, because you do stuff between the delays, and that stuff also takes time.
A more accurate solution is to subtract the time at which each invocation should occur (defined by startTime + 60*i/(N-1)) from the current time. Reorder and reformulate those formulas and you can subtract the 'time that should have elapsed for the next invocation' from the already elapsed time.
Of course 'elapsed time' should be calculated using System.nanoTime() and not System.currentTimeMillis() as the latter can jump when the clock changes or the computer resumes from stand-by.
For this example I changed 60 seconds to 6 seconds so you can more easily see what's going on when you run it.
public static void main(String... args) throws Exception {
int duration = 6; // seconds
List<Double> list = IntStream.range(0, 10).mapToDouble(i->ThreadLocalRandom.current().nextDouble()).boxed().collect(Collectors.toList());
long startTime = System.nanoTime();
long elapsed = 0;
for (int i = 0; i < list.size(); i++) { // Bug fixed: start at 0, not at 1.
if (i > 0) {
long nextInvocation = TimeUnit.NANOSECONDS.convert(duration, TimeUnit.SECONDS) * i / (list.size() - 1);
long sleepAmount = nextInvocation - elapsed;
TimeUnit.NANOSECONDS.sleep(sleepAmount);
}
elapsed = System.nanoTime() - startTime;
doSomething(elapsed, list.get(i));
}
}
private static void doSomething(long elapsedNanos, Double d) {
System.out.println(elapsedNanos / 1.0e9f + "\t" + d);
}
Of course, when the task you perform per list element takes longer than 60/(N-1) seconds, you get contention and the 'elapsed time' deadlines are always exceeded. With this algorithm, the total time then simply ends up longer than a minute. However, if some earlier invocations exceed the deadline and later invocations take much less time than 60/(N-1) seconds, this algorithm will show 'catch-up' behavior: the later invocations run back to back with no delay until the schedule is met again. This can be partially solved by sleeping at least a minimum amount even when sleepAmount is less.
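The minimum-sleep idea can be sketched as below. The helper sleepNanos and the minSleepNanos parameter are hypothetical names, and the 2-second total and 50 ms floor are arbitrary values chosen so the demo finishes quickly:

```java
import java.util.concurrent.TimeUnit;

public class PacedLoop {
    // Nanoseconds to sleep before invocation i (of n, with n >= 2) so that the
    // invocations spread over totalNanos, but never less than minSleepNanos.
    static long sleepNanos(long totalNanos, int i, int n, long elapsedNanos, long minSleepNanos) {
        long target = totalNanos * i / (n - 1); // when invocation i should start
        return Math.max(minSleepNanos, target - elapsedNanos);
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 5;
        long total = TimeUnit.SECONDS.toNanos(2);
        long minSleep = TimeUnit.MILLISECONDS.toNanos(50);
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                long elapsed = System.nanoTime() - start;
                TimeUnit.NANOSECONDS.sleep(sleepNanos(total, i, n, elapsed, minSleep));
            }
            System.out.printf("invocation %d at %.3f s%n", i, (System.nanoTime() - start) / 1e9);
        }
    }
}
```

The trade-off is that a floor on the sleep limits catch-up bursts but also means a run that has fallen behind may never fully recover the lost time.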
Check out this.
long start = System.currentTimeMillis();
long end = start + 60*1000; // 60 seconds * 1000 ms/sec
int i = 0;
while (System.currentTimeMillis() < end)
{
// do something, iterate your list
i++;
if (i == list.size()) { // check size of the list if iteration is completed
// if time has not yet expired, sleep for the rest of the time
Thread.sleep(end - System.currentTimeMillis());
}
}
Do not forget to check the size of the list.

Time how long a function runs (short duration)

I'm relatively new to Java programming, and I'm running into an issue calculating the amount of time it takes for a function to run.
First some background - I've got a lot of experience with Python, and I'm trying to recreate the functionality of the Jupyter Notebook/Lab %%timeit function, if you're familiar with that. Here's a pic of it in action (sorry, not enough karma to embed yet):
Snip of Jupyter %%timeit
What it does is run the contents of the cell (in this case a recursive function) either 1k, 10k, or 100k times, and give you the average run time of the function, and the standard deviation.
My first implementation (using the same recursive function) used System.nanoTime():
public static void main(String[] args) {
long t1, t2, diff;
long[] times = new long[1000];
int t;
for (int i=0; i< 1000; i++) {
t1 = System.nanoTime();
t = triangle(20);
t2 = System.nanoTime();
diff = t2-t1;
System.out.println(diff);
times[i] = diff;
}
long total = 0;
for (int j=0; j<times.length; j++) {
total += times[j];
}
System.out.println("Mean = " + total/1000.0);
}
But the mean is wildly thrown off -- for some reason, the first iteration of the function (on many runs) takes upwards of a million nanoseconds:
Pic of initial terminal output
Every iteration after the first dozen or so takes either 395 nanos or 0 -- so there could be a problem there too... not sure what's going on!
Also -- the code of the recursive function I'm timing:
static int triangle(int n) {
if (n == 1) {
return n;
} else {
return n + triangle(n -1);
}
}
Initially I had the line n = Math.abs(n) on the first line of the function, but then I removed it because... meh. I'm the only one using this.
I tried a number of different suggestions brought up in this SO post, but they each have their own problems... which I can go into if you need.
Anyway, thank you in advance for your help and expertise!
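For reference, the %%timeit-style measurement described above (repeat the call many times, report mean and standard deviation) could be sketched roughly as follows. Two things here are assumptions rather than part of the original code: a warm-up phase, so the first slow calls are not counted, and timing calls in batches, so that each sample is much longer than the timer's resolution. The names sample, mean, stdDev and sink are made up for illustration:

```java
public class TimeIt {
    static volatile int sink; // results are stored here so the calls are not optimized away

    static int triangle(int n) {
        return n == 1 ? n : n + triangle(n - 1);
    }

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Population standard deviation.
    static double stdDev(double[] xs) {
        double m = mean(xs);
        double sq = 0;
        for (double x : xs) sq += (x - m) * (x - m);
        return Math.sqrt(sq / xs.length);
    }

    // Returns `repeats` samples, each the average nanoseconds per call over one batch.
    static double[] sample(int warmup, int repeats, int batch) {
        for (int i = 0; i < warmup; i++) sink = triangle(20); // untimed warm-up
        double[] perCall = new double[repeats];
        for (int r = 0; r < repeats; r++) {
            long t0 = System.nanoTime();
            for (int b = 0; b < batch; b++) sink = triangle(20);
            perCall[r] = (System.nanoTime() - t0) / (double) batch;
        }
        return perCall;
    }

    public static void main(String[] args) {
        double[] s = sample(100_000, 7, 100_000);
        System.out.printf("%.1f ns +/- %.1f ns per call (7 runs, 100000 loops each)%n", mean(s), stdDev(s));
    }
}
```

This is only a sketch: the JIT compiler can still distort hand-rolled timings like this, which is why a dedicated harness such as JMH is usually suggested for serious measurements.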

Why is my java program becoming gradually slower?

I recently built a Fibonacci generator that uses recursion and hashmaps to reduce complexity. I am using System.nanoTime() to keep track of the time it takes for my program to print 10000 Fibonacci numbers. It started out good with less than a second but gradually became slower and now it takes more than 4 seconds. Can someone explain why this might be happening? The code is below:
import java.util.*;
import java.math.*;
public class FibonacciGeneratorUnlimited {
static int numFibCalls = 0;
static HashMap<Integer, BigInteger> d = new HashMap<Integer, BigInteger>();
static Scanner fibNumber = new Scanner(System.in);
static BigInteger ans = new BigInteger("0");
public static void main(String[] args){
d.put(0 , new BigInteger("0"));
d.put(1 , new BigInteger("1"));
System.out.print("Enter the term:\t");
int n = fibNumber.nextInt();
long startTime = System.nanoTime();
for (int i = 0; i <= n; i++) {
System.out.println(i + " : " + fib_efficient(i, d));
}
System.out.println((double)(System.nanoTime() - startTime) / 1000000000);
}
public static BigInteger fib_efficient(int n, HashMap<Integer, BigInteger> d) {
numFibCalls += 1;
if (d.containsKey(n)) {
return (d.get(n));
} else {
ans = (fib_efficient(n-1, d).add(fib_efficient(n-2, d)));
d.put(n, ans);
return ans;
}
}
}
If you are restarting the program every time you generate a new Fibonacci sequence, then your program most likely isn't the problem. It might just be that your processor got hot after running the program a few times, or that a background process on your computer suddenly started, causing your program to slow down.
Give the JVM more memory (java -Xmx...) or cache less:
public static BigInteger fib_efficient(int n, HashMap<Integer, BigInteger> d) {
numFibCalls++;
if ((n & 3) <= 1) { // Cache two consecutive values out of every four.
BigInteger cached = d.get(n);
if (cached != null) {
return cached;
} else {
BigInteger ans = fib_efficient(n-1, d).add(fib_efficient(n-2, d));
d.put(n, ans);
return ans;
}
} else {
return fib_efficient(n-1, d).add(fib_efficient(n-2, d));
}
}
Two consecutive numbers out of every four are cached in order to stop the recursion on both branches of:
fib(n) = fib(n-1) + fib(n-2)
BigInteger isn't the nicest class where performance and memory are concerned.
"It started out good with less than a second but gradually became slower and now it takes more than 4 seconds."
What do you mean by this? Do you mean that you ran this exact same program with the same input and its run-time changed from < 1 second to > 4 seconds?
If you have the same exact code running with the same exact inputs in a deterministic algorithm...
then the differences are probably external to your code - maybe other processes are taking up more CPU on one run.
Do you mean that you increased the inputs from some value X to 10,000 and now it takes > 4 seconds?
Then that's just a matter of the algorithm taking longer with larger inputs, which is perfectly normal.
"recursion and hashmaps to reduce complexity"
That's not quite how complexity works. You have improved the best-case and the average-case, but you have done nothing to change the worst-case.
Now for some actual performance improvement advice
Stop printing out the results... that's eating up over 99% of your processing time. Seriously, though, switch out "System.out.println(i + " : " + fib_efficient(i, d))" with "fib_efficient(i,d)" and it'll execute over 100x faster.
Concatenating strings and printing to console are very expensive processes.
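A small sketch of keeping output out of the timed region: compute first, then build the text in a StringBuilder and print it once. Note that this swaps the recursive HashMap version for a plain iterative table, purely to keep the example short; the class name, fibTable, and n = 1000 are assumptions for illustration:

```java
import java.math.BigInteger;

public class FibTiming {
    // Fills a table with fib(0)..fib(n) iteratively (no recursion, no HashMap).
    static BigInteger[] fibTable(int n) {
        BigInteger[] f = new BigInteger[Math.max(n + 1, 2)];
        f[0] = BigInteger.ZERO;
        f[1] = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            f[i] = f[i - 1].add(f[i - 2]);
        }
        return f;
    }

    public static void main(String[] args) {
        int n = 1000;

        long t0 = System.nanoTime();
        BigInteger[] f = fibTable(n);          // timed: computation only
        long computeNanos = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i <= n; i++) {
            out.append(i).append(" : ").append(f[i]).append('\n');
        }
        System.out.print(out);                  // one write instead of n + 1 println calls
        long printNanos = System.nanoTime() - t1;

        System.out.println("compute: " + computeNanos / 1e9 + " s, format + print: " + printNanos / 1e9 + " s");
    }
}
```

Printing the two durations separately makes it easy to check on your own machine how much of the total is output rather than arithmetic.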
It happens because of the algorithm's complexity. Naive recursive Fibonacci takes exponential time, roughly O(2^n) calls, so the running time grows very quickly as the input gets larger, as you can see in the graph in this link. Check this answer to see a complete explanation of its complexity.
Your HashMap memoization cuts that down to O(n) calls, so the HashMap is not what slows the program down; each lookup and insert only adds a small constant cost per call, and removing it would bring back the exponential behavior. What still grows with n is the size of the BigInteger values, so each addition and each printed line costs more for later terms.

Simple concurrent Java threads--capture begin and end

Am I correctly implementing these Java threads? The goal is to have ten concurrent threads that each compute a sum from 1 to an upper bound of 22 + i. I'm trying to identify each thread by name and print it when the thread starts running, then print the result when the thread exits. Currently, all of the results print at the same time in a random order, and I am not sure whether I am correctly capturing when each thread begins and ends.
public class threads {
public static void main(String[] args) {
for(int i = 0; i < 10; i++) {
final int iCopy = i;
new Thread("" + i) {
public void run() {
int sum = 0;
int upperBound = 22;
int lowerBound = 1;
long threadID = Thread.currentThread().getId();
for (int number = lowerBound; number <= upperBound; number++){
sum = sum + number + iCopy;
}
System.out.println(threadID + " thread is running now, I and will compute the sum from 1 to " + (upperBound + iCopy) + ". The i is : " + iCopy);
System.out.println("Thread id #" + threadID + ", the "+ sum + " is done by the thread.");
}
}.start();
}
}
}
I executed your code and observed that all 10 threads run properly. Threads are scheduled in an unpredictable order, which is why you see this behavior, but I am sure that all threads are running fine and doing what you require.
In the output I also saw that the values of i do not appear in order from 0 to 9; this is probably because some threads are paused while others get to run.
Hope this helps.
Thanks.
The order the threads run in will depend entirely on the JVM being used and underlying resources.
If you have several cores (cpus) available, your code may run completely differently to a single core.
Essentially, your main loop runs to the end in a single thread, starting 10 new threads; each start() call only hands the thread to the scheduler, which decides when it actually runs. Other processors may start running those threads. Each extra thread changes the total load, so they perform slightly differently on each processor, meaning they run faster or slower and finish at different times.
Your code demonstrates this very well.
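If the goal is to see a line when each thread begins and another when it ends, one possible sketch is below. It prints the start message before doing the work, computes the sum from 1 to 22 + i (the question's loop adds iCopy on every iteration instead), and joins all threads at the end. The class name, the sumTo helper, and the "worker-i" thread names are assumptions for illustration:

```java
public class StartEndThreads {
    // Sum of the integers 1..upperBound.
    static int sumTo(int upperBound) {
        int sum = 0;
        for (int k = 1; k <= upperBound; k++) {
            sum += k;
        }
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[10];
        for (int i = 0; i < workers.length; i++) {
            final int upperBound = 22 + i;
            workers[i] = new Thread(() -> {
                String name = Thread.currentThread().getName();
                // Printed BEFORE the work, so it marks the start of the thread.
                System.out.println(name + " started; summing 1.." + upperBound);
                int sum = sumTo(upperBound);
                // Printed AFTER the work, so it marks the end of the thread.
                System.out.println(name + " finished; sum = " + sum);
            }, "worker-" + i);
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join(); // main waits until every worker has finished
        }
        System.out.println("all threads done");
    }
}
```

The start and finish lines of different threads will still interleave in an unpredictable order; only the order within a single thread is guaranteed.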

Single Threaded Program vs Multithreaded Program (measuring time elapsed)

I want to know whether a single-threaded program or a multithreaded program is the better approach for measuring elapsed time.
Below is my single threaded program that is measuring the time of our service-
private static void serviceCall() {
histogram = new HashMap<Long, Long>();
keys = histogram.keySet();
long total = 5;
long runs = total;
while (runs > 0) {
long start_time = System.currentTimeMillis();
result = restTemplate.getForObject("SOME URL",String.class);
long difference = (System.currentTimeMillis() - start_time);
Long count = histogram.get(difference);
if (count != null) {
count++;
histogram.put(Long.valueOf(difference), count);
} else {
histogram.put(Long.valueOf(difference), Long.valueOf(1L));
}
runs--;
}
for (Long key : keys) {
Long value = histogram.get(key);
System.out.println("MEASUREMENT " + key + ":" + value);
}
}
The output I get from this single-threaded program (5 calls in total) is:
MEASUREMENT 163:1
MEASUREMENT 42:3
MEASUREMENT 47:1
which means 1 call came back in 163 ms. 3 calls came back in 42 ms and so on.
I also tried a multithreaded program to measure the elapsed time, meaning hitting the service in parallel with a few threads and then measuring how long each thread takes.
Below is the code for that as well-
//create thread pool with given size
ExecutorService service = Executors.newFixedThreadPool(10);
// queue some tasks
for (int i = 0; i < 1 * 5; i++) {
service.submit(new ThreadTask(i, histogram));
}
public ThreadTask(int id, HashMap<Long, Long> histogram) {
this.id = id;
this.hg = histogram;
}
@Override
public void run() {
long start_time = System.currentTimeMillis();
result = restTemplate.getForObject("", String.class);
long difference = (System.currentTimeMillis() - start_time);
Long count = hg.get(difference);
if (count != null) {
count++;
hg.put(Long.valueOf(difference), count);
} else {
hg.put(Long.valueOf(difference), Long.valueOf(1L));
}
}
And below is the result I get from the above program-
{176=1, 213=1, 182=1, 136=1, 155=1}
One call came back in 176 ms, and so on
So my question is: why is the multithreaded program taking a lot more time than the single-threaded program above? If there is some flaw in my multithreaded program, can anyone help me improve it?
Your multi-threaded program likely makes all the requests at the same time, which puts more strain on the server and causes it to respond more slowly to every request.
As an aside, the way you are doing the update isn't threadsafe, so your count will likely be off in the multithreaded scenario given enough trials.
For instance, Thread A and B both return in 100 ms at the same time. The count in histogram for 100 is 3. A gets 3. B gets 3. A updates 3 to 4. B updates 3 to 4. A puts the value 4 in the histogram. B puts the value 4 in the histogram. You've now had 2 threads believe they incremented the count but the count in the histogram only reflects being incremented once.
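One way to avoid the lost update described above is to let the map do the read-modify-write atomically. The sketch below uses ConcurrentHashMap.merge; the class name, record, and hammer are hypothetical, and the fixed 100 ms latency stands in for real measurements:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SafeHistogram {
    static final ConcurrentHashMap<Long, Long> histogram = new ConcurrentHashMap<>();

    // Atomically adds 1 to the bucket for this latency (or creates it with 1).
    static void record(long latencyMs) {
        histogram.merge(latencyMs, 1L, Long::sum);
    }

    // Has `threads` threads each record the same latency `perThread` times
    // and returns the final count for that bucket.
    static long hammer(int threads, int perThread) {
        histogram.clear();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int k = 0; k < perThread; k++) {
                    record(100L); // stand-in for a measured 100 ms response
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        }
        return histogram.getOrDefault(100L, 0L);
    }

    public static void main(String[] args) {
        System.out.println("count for 100 ms: " + hammer(8, 1000));
    }
}
```

Because merge performs the lookup and the update as one atomic step, two threads can no longer both read 3 and both write 4.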
