This question already has answers here:
How do I write a correct micro-benchmark in Java?
I recently started learning Java and I am a bit stumped by this.
The program below measures the timing of a for loop:
public class ForLoopExample {
public static void main(String[] args) {
long startTime = System.nanoTime();
for(int a = 0; a < 10; a++) {
System.out.println(a);
}
long endTime = System.nanoTime();
System.out.println("for loop timing = " + (endTime - startTime) + " ns");
System.out.println("loop completed");
}
}
and the output timing is:
for loop timing = 853716 ns
The program determining the speed of the while loop:
public class WhileLoopExample {
public static void main(String[] args) {
int a=0;
long startTime = System.nanoTime();
while(a < 10) {
System.out.println(a);
a++;
}
long endTime = System.nanoTime();
System.out.println("while loop timing = " + (endTime - startTime) + " ns");
}
}
while loop timing = 928358 ns
Why does this happen? A detailed explanation would be appreciated.
Rerun your test with more than ten iterations, but also repeat the whole measurement several times, say 10,000 iterations in the inner loop and ten passes in the outer loop, and average the results. They should be close.
The variance is a result of the operating system and multithreading. The OS is managing many tasks other than your program, and those may take a slightly higher priority than your program. This causes slight delays in your execution.
Having a much bigger sample size should decrease the variance in the results.
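A minimal sketch of that idea (the class name, iteration counts, and the consume helper are illustrative, not from the original post); printing is deliberately left out of the timed region, since I/O would otherwise dominate the measurement:
public class LoopTimingAverage {
    public static void main(String[] args) {
        final int outerRuns = 10;
        final int innerIterations = 10_000;
        long totalNanos = 0;
        for (int run = 0; run < outerRuns; run++) {
            long start = System.nanoTime();
            for (int i = 0; i < innerIterations; i++) {
                // the work being measured; kept free of I/O on purpose
                consume(i);
            }
            long elapsed = System.nanoTime() - start;
            totalNanos += elapsed;
            System.out.println("run " + run + ": " + elapsed + " ns");
        }
        System.out.println("average: " + (totalNanos / outerRuns) + " ns per run");
    }
    // writing to a volatile field keeps the JIT from removing the loop body entirely
    private static volatile int sink;
    private static void consume(int value) {
        sink = value;
    }
}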
Related
Consider the code snippets below and the time taken to execute them -
public static void main(String[] args) {
Long startTime = System.currentTimeMillis();
long sum = 0L;
for(int i = 0; i< Integer.MAX_VALUE; i++){
sum+=i;
}
Long timeDiff = (System.currentTimeMillis() - startTime) / 1000;
System.out.println("Time Difference : " + timeDiff + "secs");
}
Output -
Time Difference : 0secs
public static void main(String[] args) {
Long startTime = System.currentTimeMillis();
Long sum = 0L;
for(int i = 0; i< Integer.MAX_VALUE; i++){
sum+=i;
}
Long timeDiff = (System.currentTimeMillis() - startTime) / 1000;
System.out.println("Time Difference : " + timeDiff + "secs");
}
Output -
Time Difference : 8secs
public static void main(String[] args) {
Long startTime = System.currentTimeMillis();
Long sum = 0L;
for(Long i = 0L; i< Integer.MAX_VALUE; i++){
sum+=i;
}
Long timeDiff = (System.currentTimeMillis() - startTime) / 1000;
System.out.println("Time Difference : " + timeDiff + "secs");
}
Output -
Time Difference : 16secs
As per my understanding, this happens because a new Long object is created every time, but I am not sure how exactly it works. I tried looking into the bytecode, but it didn't help much.
Can someone help me understand what exactly is happening internally?
Thanks in advance!
The "++" and "+=" operators are only defined for primitives.
Hence, when you apply them to a Long, an unboxing must take place before the operator is evaluated and then a boxing must take place to store the result.
The boxing probably costs more than the unboxing, since unboxing requires just a method call, while boxing requires object instantiation.
Each boxing involves the creation of a Long instance. Your loop has Integer.MAX_VALUE iterations, so the second loop creates over 2 billion Long objects (one for each sum+=i operation) while the third loop creates over 4 billion Long objects (one for each i++ operation and one for each sum+=i operation). These objects have to be instantiated and later garbage collected. That costs time.
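To make the boxing concrete, here is a rough sketch of what the compiler effectively turns sum += i into when sum is a Long (the class name and reduced iteration count are just for illustration; the exact generated code may differ slightly):
public class BoxedSumDemo {
    public static void main(String[] args) {
        Long sum = 0L;
        for (int i = 0; i < 1_000_000; i++) {
            // what "sum += i;" effectively becomes:
            long unboxed = sum.longValue();  // unboxing: a method call
            long result = unboxed + i;       // primitive addition
            sum = Long.valueOf(result);      // boxing: usually allocates a new Long
                                             // (only values in -128..127 come from a small cache)
        }
        System.out.println(sum);
    }
}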
Plausible causes:
- Too many object creations leading to GC activity at times.
- Too much boxing and unboxing between the wrapper type and the primitive.
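If you want to confirm the first cause, GC activity can be made visible with the standard -verbose:gc JVM flag (the class name below is a placeholder):
java -verbose:gc YourBenchmarkClass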
I am going to leverage a multi-core CPU architecture with a sample that calculates the sum of all the integers from 0 through Integer.MAX_VALUE. My threshold is 5,000,000 integers, so I split the range recursively until the threshold is met, and once it is reached I compute the sum. Finally, all the partial sums are accumulated to compute the final result. The problem is recursive in nature and fits nicely with the Fork/Join framework. But when I run it I get errors. Sometimes it is a StackOverflowError; other times it is like this:
Exception in thread "ForkJoinPool.commonPool-worker-7" java.lang.NoClassDefFoundError: Could not initialize class java.util.concurrent.locks.AbstractQueuedSynchronizer$Node
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1198)
Here's my code:
public class ParallelSum extends RecursiveTask<Long> {
private static final int THRESHOLD = 5000000;
private final int start;
private final int end;
public ParallelSum(int start, int end) {
super();
this.start = start;
this.end = end;
}
@Override
protected Long compute() {
// System.out.println("Start: " + start + " End: " + end);
if (end - start > THRESHOLD) {
return ForkJoinTask.invokeAll(createSubtasks()).stream().mapToLong(ForkJoinTask::join).sum();
}
return LongStream.rangeClosed(start, end).sum();
}
private Collection<RecursiveTask<Long>> createSubtasks() {
final List<RecursiveTask<Long>> dividedTasks = new ArrayList<>();
dividedTasks.add(new ParallelSum(start, end / 2));
dividedTasks.add(new ParallelSum(end / 2 + 1, end));
return dividedTasks;
}
}
And the main method is like so,
public class ParallelSumTest {
public static void main(String[] args) {
long startTime = System.currentTimeMillis();
final long sum = new ParallelSum(0, Integer.MAX_VALUE).compute();
long endTime = System.currentTimeMillis();
final long elapsedTime = endTime - startTime;
System.out.println(sum + " was computed in " + elapsedTime + " milliseconds.");
}
}
What is missing here? What am I doing wrong? Any help is appreciated.
The exception NoClassDefFoundError is most likely thrown because the JVM is running out of permanent generation space. Like the StackOverflowError, this is because the level of recursion generated by your algorithm is too deep.
There is a small error in how you create the tasks and split the work into two pieces: with end / 2 as the split point, the second subtask (end / 2 + 1, end) re-creates itself as soon as start is no longer 0, so the recursion never terminates. You should use the middle index, (start + end) / 2, instead of end / 2:
dividedTasks.add(new ParallelSum(start, (start + end) / 2));
dividedTasks.add(new ParallelSum((start + end) / 2 + 1, end));
On my machine with the adjustment above I got the following output:
2738188572099084288 was computed in 1636 milliseconds.
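One caveat worth adding (my note, not part of the original answer): for ranges that reach up to Integer.MAX_VALUE, the expression start + end can overflow int once start grows large. A sketch of the same fix written with the overflow-safe midpoint form, as a drop-in replacement for createSubtasks():
private Collection<RecursiveTask<Long>> createSubtasks() {
    final List<RecursiveTask<Long>> dividedTasks = new ArrayList<>();
    // start + (end - start) / 2 equals (start + end) / 2 but cannot overflow int
    final int mid = start + (end - start) / 2;
    dividedTasks.add(new ParallelSum(start, mid));
    dividedTasks.add(new ParallelSum(mid + 1, end));
    return dividedTasks;
}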
I have written the code below to observe the timing of a loop. Surprisingly, it gives me different values for each run.
public static void main(String[] args) {
for (int attempt = 0; attempt < 10; attempt++) {
runloop();
}
}
public static void runloop() {
long sum = 0L;
long starttime = System.nanoTime();
for (int x = 0; x < 1000000; x++) {
sum += x;
}
long end = System.nanoTime();
System.out.println("Time taken:" + (end - starttime) / 1000L);
}
Observation:
Time taken:4062
Time taken:3122
Time taken:2707
Time taken:2445
Time taken:3575
Time taken:2823
Time taken:2228
Time taken:1816
Time taken:1839
Time taken:1811
I am not able to understand why there is such a difference in the timings. What is the reason?
It could be anything:
Other processes running on your computer limiting the time given to Java
Run of the garbage collector
Loop initialization time
...
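Given that the reported times shrink steadily from run to run, JIT compilation is a likely contributor as well: the first calls run interpreted, later calls run compiled code. A minimal sketch of adding a warm-up phase before the runs you actually look at (the warm-up count is an arbitrary choice, and the warm-up output can simply be ignored):
public static void main(String[] args) {
    // warm-up: give the JIT a chance to compile runloop() before the measured runs
    for (int warmup = 0; warmup < 1000; warmup++) {
        runloop();
    }
    System.out.println("--- measured runs ---");
    for (int attempt = 0; attempt < 10; attempt++) {
        runloop();
    }
}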
I read in a couple of blogs that in Java the modulo/remainder operator is slower than bitwise AND. So I wrote the following program to test it.
public class ModuloTest {
public static void main(String[] args) {
final int size = 1024;
int index = 0;
long start = System.nanoTime();
for(int i = 0; i < Integer.MAX_VALUE; i++) {
getNextIndex(size, i);
}
long end = System.nanoTime();
System.out.println("Time taken by Modulo (%) operator --> " + (end - start) + "ns.");
start = System.nanoTime();
final int shiftFactor = size - 1;
for(int i = 0; i < Integer.MAX_VALUE; i++) {
getNextIndexBitwise(shiftFactor, i);
}
end = System.nanoTime();
System.out.println("Time taken by bitwise AND --> " + (end - start) + "ns.");
}
private static int getNextIndex(int size, int nextInt) {
return nextInt % size;
}
private static int getNextIndexBitwise(int size, int nextInt) {
return nextInt & size;
}
}
But in my runtime environment (MacBook Pro 2.9 GHz i7, 8 GB RAM, JDK 1.7.0_51) I am seeing otherwise: the bitwise AND is significantly slower, in fact twice as slow as the remainder operator.
I would appreciate it if someone could help me understand whether this is intended behavior or I am doing something wrong.
Thanks,
Niranjan
Your code reports bitwise-and being much faster on each Mac I've tried it on, both with Java 6 and Java 7. I suspect the first portion of the test on your machine happened to coincide with other activity on the system. You should try running the test multiple times to verify you aren't seeing distortions based on that. (I would have left this as a 'comment' rather than an 'answer', but apparently you need 50 reputation to do that -- quite silly, if you ask me.)
For starters, the bitwise-AND (logical conjunction) trick only works with non-negative dividends and power-of-two divisors. So, if you need negative dividends, floats, or non-powers of two, stick with the default % operator.
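A small sketch of what that restriction means: for a power-of-two divisor, x % d and x & (d - 1) agree for non-negative x, but diverge for negative x because Java's % keeps the sign of the dividend (the class name is just for illustration):
public class ModVsMaskDemo {
    public static void main(String[] args) {
        int d = 1024;           // power-of-two divisor
        int mask = d - 1;       // 1023
        System.out.println(5000 % d);      // 904
        System.out.println(5000 & mask);   // 904  -- same result for a non-negative dividend
        System.out.println(-5000 % d);     // -904 -- % keeps the dividend's sign
        System.out.println(-5000 & mask);  // 120  -- the trick breaks for negative dividends
    }
}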
My tests (with JIT warm-up and 1M random iterations), on an i7 with a ton of cores and plenty of RAM, show about 20% better performance from the bitwise operation. This can vary per run, depending on how the process scheduler runs the code.
using Scala 2.11.8 on JDK 1.8.91
4Ghz i7-4790K, 8 core AMD, 32GB PC3 19200 ram, SSD
This example in particular will always give you a wrong result. Moreover, I believe that any program calculating the modulo by a power of 2 will be faster than the bitwise AND.
Reason: when you use N % X where X is the kth power of 2, only the last k bits are considered for the modulo, whereas in the case of the bitwise AND operator the runtime actually has to visit each bit of the number in question.
Also, I would like to point out that the HotSpot JVM optimizes repetitive calculations of a similar nature (branch prediction is one example). In your case, the method using the modulo just returns the last 10 bits of the number, because 1024 is 2 to the 10th power.
Try using some prime number value for size and check the result again.
Disclaimer: micro-benchmarking is not considered good practice.
Is this method missing something?
public static void oddVSmod(){
float tests = 100000000;
oddbit(tests);
modbit(tests);
}
public static void oddbit(float tests){
for(int i=0; i<tests; i++)
if((i&1)==1) {System.out.print(" "+i);}
System.out.println();
}
public static void modbit(float tests){
for(int i=0; i<tests; i++)
if((i%2)==1) {System.out.print(" "+i);}
System.out.println();
}
With that, I used the NetBeans built-in profiler (advanced mode) to run this. I set tests up to 10×10^8, and every time it showed that modulo is faster than bitwise AND.
Thank you all for the valuable inputs.
@pamphlet: Thank you very much for the concerns, but negative comments are fine with me. I confess that I did not do proper testing as suggested by AndyG. AndyG could have used a softer tone, but it's okay; sometimes negatives help you see the positive. :)
That said, I changed my code (as shown below) so that I can run the test multiple times.
public class ModuloTest {
public static final int SIZE = 1024;
public int usingModuloOperator(final int operand1, final int operand2) {
return operand1 % operand2;
}
public int usingBitwiseAnd(final int operand1, final int operand2) {
return operand1 & operand2;
}
public void doCalculationUsingModulo(final int size) {
for(int i = 0; i < Integer.MAX_VALUE; i++) {
usingModuloOperator(i, size);
}
}
public void doCalculationUsingBitwise(final int size) {
for(int i = 0; i < Integer.MAX_VALUE; i++) {
usingBitwiseAnd(i, size);
}
}
public static void main(String[] args) {
final ModuloTest moduloTest = new ModuloTest();
final int invocationCount = 100;
// testModuloOperator(moduloTest, invocationCount);
testBitwiseOperator(moduloTest, invocationCount);
}
private static void testModuloOperator(final ModuloTest moduloTest, final int invocationCount) {
for(int i = 0; i < invocationCount; i++) {
final long startTime = System.nanoTime();
moduloTest.doCalculationUsingModulo(SIZE);
final long timeTaken = System.nanoTime() - startTime;
System.out.println("Using modulo operator // Time taken for invocation counter " + i + " is " + timeTaken + "ns");
}
}
private static void testBitwiseOperator(final ModuloTest moduloTest, final int invocationCount) {
for(int i = 0; i < invocationCount; i++) {
final long startTime = System.nanoTime();
moduloTest.doCalculationUsingBitwise(SIZE);
final long timeTaken = System.nanoTime() - startTime;
System.out.println("Using bitwise operator // Time taken for invocation counter " + i + " is " + timeTaken + "ns");
}
}
}
I called testModuloOperator() and testBitwiseOperator() in a mutually exclusive way (one at a time, with the other commented out). The results were consistent with the idea that bitwise AND is faster than the modulo operator. I ran each calculation 100 times and recorded the execution times, then removed the first five and last five recordings and used the rest to calculate the average time. Below are my test results.
Using modulo operator, the avg. time for 90 runs: 8388.89ns.
Using bitwise-AND operator, the avg. time for 90 runs: 722.22ns.
Please suggest if my approach is correct or not.
Thanks again.
Niranjan
My partner and I are attempting to program a LinkedList data structure. We have completed the data structure, and it functions properly with all required methods. We are required to perform a comparative test of the runtimes of the addFirst() method in our LinkedList class vs. the add(0, item) method of Java's ArrayList. The expected complexity of addFirst() for our LinkedList is O(1) constant, and this held true in our test. In timing the ArrayList add(0, item) method, we expected a complexity of O(N), but we again observed approximately O(1) constant. This seemed like a strange discrepancy, since we are using Java's own ArrayList. We thought there may be an issue in our timing structure, and we would be most appreciative if anyone could help us identify the problem. Our Java code for timing both methods is listed below:
public class timingAnalysis {
public static void main(String[] args) {
//timeAddFirst();
timeAddArray();
}
public static void timeAddFirst()
{
long startTime, midTime, endTime;
long timesToLoop = 10000;
int inputSize = 20000;
MyLinkedList<Long> linkedList = new MyLinkedList<Long>();
for (; inputSize <= 1000000; inputSize = inputSize + 20000)
{
// Clear the collection so we can add new random
// values.
linkedList.clear();
// Let some time pass to stabilize the thread.
startTime = System.nanoTime();
while (System.nanoTime() - startTime < 1000000000)
{ }
// Start timing.
startTime = System.nanoTime();
for (long i = 0; i < timesToLoop; i++)
linkedList.addFirst(i);
midTime = System.nanoTime();
// Run an empty loop to capture the cost of running the loop.
for (long i = 0; i < timesToLoop; i++)
{} // empty block
endTime = System.nanoTime();
// Compute the time: subtract the cost of running the empty loop
// from the cost of running the loop that calls addFirst.
// Average it over the number of iterations.
double averageTime = ((midTime - startTime) - (endTime - midTime)) / timesToLoop;
System.out.println(inputSize + " " + averageTime);
}
}
public static void timeAddArray()
{
long startTime, midTime, endTime;
long timesToLoop = 10000;
int inputSize = 20000;
ArrayList<Long> testList = new ArrayList<Long>();
for (; inputSize <= 1000000; inputSize = inputSize + 20000)
{
// Clear the collection so we can add new random
// values.
testList.clear();
// Let some time pass to stabilize the thread.
startTime = System.nanoTime();
while (System.nanoTime() - startTime < 1000000000)
{ }
// Start timing.
startTime = System.nanoTime();
for (long i = 0; i < timesToLoop; i++)
testList.add(0, i);
midTime = System.nanoTime();
// Run an empty loop to capture the cost of running the loop.
for (long i = 0; i < timesToLoop; i++)
{} // empty block
endTime = System.nanoTime();
// Compute the time: subtract the cost of running the empty loop
// from the cost of running the loop that calls add(0, i).
// Average it over the number of iterations.
double averageTime = ((midTime - startTime) - (endTime - midTime)) / timesToLoop;
System.out.println(inputSize + " " + averageTime);
}
}
}
You want to test for different values of inputSize, but you perform the operation under test timesToLoop times, which is constant. So of course it takes the same time. You should use:
for (long i = 0; i < inputSize; i++)
testList.add(0, i);
As per my knowledge, the ArrayList add operation runs in O(1) time, so the results of your experiment are correct. I think the constant time for the ArrayList add method is amortized constant time.
As per the Javadoc:
adding n elements requires O(n) time, which is why add is described as running in amortized constant time.