In the code and results below, we can see that "Traverse2" is much faster than "Traverse1", even though they traverse the same number of elements.
1. Why does this difference happen?
2. Does putting the longer iteration inside the shorter iteration give better performance?
public class TraverseTest {
    public static void main(String[] args) {
        int a[][] = new int[100][10];
        System.out.println(System.currentTimeMillis());
        // Traverse1
        for (int i = 0; i < 100; i++) {
            for (int j = 0; j < 10; j++)
                a[i][j] = 1;
        }
        System.out.println(System.currentTimeMillis());
        // Traverse2
        for (int i = 0; i < 10; i++) {
            for (int j = 0; j < 100; j++)
                a[j][i] = 2;
        }
        System.out.println(System.currentTimeMillis());
    }
}
Result:
1347116569345
1347116569360
1347116569360
If I change it to
System.out.println(System.nanoTime());
The result will be:
4888285195629
4888285846760
4888285914219
It means that putting the longer iteration inside gives better performance, which seems to conflict with cache-hit theory.
I suspect that any strangeness in the results you are seeing in this micro-benchmark is due to flaws in the benchmark itself.
For example:
Your benchmark does not take account of "JVM warmup" effects, such as the fact that the JIT compiler does not compile to native code immediately. (This only happens after the code has executed for a bit, and the JVM has measured some usage numbers to aid optimization.) The correct way to deal with this is to put the whole lot inside a loop that runs a few times, and discard any initial sets of times that look "odd" due to warmup effects.
The loops in your benchmark could in theory be optimized away. The JIT compiler might be able to deduce that they don't do any work that affects the program's output.
Finally, I'd just like to remind you that hand-optimizing like this is usually a bad idea ... unless you've got convincing evidence that it is worth your while hand-optimizing AND that this code is really where the application is spending significant time.
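The warmup-loop advice above can be sketched like this. It is only a sketch: the `workload` method is a hypothetical stand-in for the code under test, and the `sink` variable exists so the JIT cannot prove the work is unused and remove it.

```java
public class WarmupBenchmark {
    // Hypothetical workload standing in for the code being measured.
    static long workload() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) {
        int rounds = 10;
        int discard = 3;           // throw away early rounds affected by JIT warmup
        long best = Long.MAX_VALUE;
        long sink = 0;             // consume results so the loop cannot be optimized away
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            sink += workload();
            long elapsed = System.nanoTime() - start;
            if (r >= discard) best = Math.min(best, elapsed);
            System.out.println("round " + r + ": " + elapsed + " ns");
        }
        System.out.println("best post-warmup time: " + best + " ns (sink=" + sink + ")");
    }
}
```

Taking the minimum of the post-warmup rounds is a crude but common way to reduce noise from GC pauses and scheduling.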
First, always run microbenchmark tests several times in a loop. Then you'll see both times are 0, as the array sizes are too small. To get non-zero times, increase the array sizes by a factor of 100. My times are roughly 32 ms for Traverse1 and 250 ms for Traverse2.
The difference is because the processor uses cache memory: access to sequential memory addresses is much faster.
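A sketch of that effect: fill the same 1000x1000 array once in row-major order (walking each contiguous inner array) and once in column-major order (jumping between rows on every step). Absolute timings vary by machine and are subject to the warmup caveats discussed above.

```java
public class CacheOrder {
    public static void main(String[] args) {
        int n = 1000;
        int[][] a = new int[n][n];

        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                a[i][j] = 1;   // row-major: a[i][0..n-1] is one contiguous int[]
            }
        }
        t1 = System.nanoTime() - t1;

        long t2 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                a[j][i] = 2;   // column-major: every step moves to a different row
            }
        }
        t2 = System.nanoTime() - t2;

        System.out.println("row-major fill:    " + t1 + " ns");
        System.out.println("column-major fill: " + t2 + " ns");
    }
}
```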
My output (with your original code, 100i/10j vs 10i/100j):
1347118083906
1347118083906
1347118083906
You are using a time resolution that is far too coarse for such a quick calculation.
I changed both the i and j limits to 1000.
int a[][] = new int[1000][1000];
System.out.println(System.currentTimeMillis());
// Traverse1
for (int i = 0; i < 1000; i++) {
    for (int j = 0; j < 1000; j++)
        a[i][j] = 1;
}
System.out.println(System.currentTimeMillis());
// Traverse2
for (int i = 0; i < 1000; i++) {
    for (int j = 0; j < 1000; j++)
        a[j][i] = 2;
}
System.out.println(System.currentTimeMillis());
output:
1347118210671
1347118210687 //difference is 16 ms
1347118210703 //difference is 16 ms again -_-
Two possibilities:
Java HotSpot rewrites the second loop into the first form, or optimizes it by exchanging i and j.
The time resolution is still not enough.
So I changed the output to use System.nanoTime():
int a[][] = new int[1000][1000];
System.out.println(System.nanoTime());
// Traverse1
for (int i = 0; i < 1000; i++) {
    for (int j = 0; j < 1000; j++)
        a[i][j] = 1;
}
System.out.println(System.nanoTime());
// Traverse2
for (int i = 0; i < 1000; i++) {
    for (int j = 0; j < 1000; j++)
        a[j][i] = 2;
}
System.out.println(System.nanoTime());
Output:
16151040043078
16151047859993 //difference is 7800000 nanoseconds
16151061346623 //difference is 13500000 nanoseconds --->this is half speed
1. Why does this difference happen?
Note that, even leaving aside the wrong time resolution, you are comparing unequal cases: the first traversal is contiguous-access while the second is not.
If the first nested loop merely served as warm-up for the second, that would make the assumption that "the second is much faster" even more wrong.
Don't forget that a 2D array is an "array of arrays" in Java. So the right-most index addresses a contiguous area, which is faster for the first version.
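That layout is easy to observe directly: each row of a 2D array is its own int[] object, and rows need not even have the same length.

```java
public class ArrayOfArrays {
    public static void main(String[] args) {
        int[][] a = new int[100][10];
        // a itself is an array of 100 references, each to a separate int[10]
        System.out.println(a.length);      // 100
        System.out.println(a[0].length);   // 10
        // rows are independent objects; they can even be replaced individually
        a[0] = new int[50];
        System.out.println(a[0].length);   // 50
    }
}
```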
2. Does putting the longer iteration inside the shorter iteration give better performance?
for (int i = 0; i < 10; i++) {
    for (int j = 0; j < 100; j++)
        a[j][i] = 2;
}
Increasing the first index is slower because the next iteration lands kilobytes away in memory, so the cache line cannot be reused.
Absolutely not!
From my point of view, the size of the array also affects the result. For example:
public class TraverseTest {
    public static void main(String[] args) {
        int a[][] = new int[10000][2];
        System.out.println(System.currentTimeMillis());
        // Traverse1
        for (int i = 0; i < 10000; i++) {
            for (int j = 0; j < 2; j++)
                a[i][j] = 1;
        }
        System.out.println(System.currentTimeMillis());
        // Traverse2
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 10000; j++)
                a[j][i] = 2;
        }
        System.out.println(System.currentTimeMillis());
    }
}
Counting every evaluation of a loop condition, Traverse1 needs 10001 + 10000*3 = 40001 comparisons to decide whether to exit the iterations,
however Traverse2 only needs 3 + 2*10001 = 20005 comparisons.
Traverse1 needs roughly twice the number of comparisons of Traverse2.
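Counts like these can be checked by instrumenting the loop conditions. Counting every evaluation, including each loop's final failing check, gives 40001 checks for Traverse1 (10000x2) and 20005 for Traverse2 (2x10000), roughly a 2:1 ratio. A sketch:

```java
public class ComparisonCount {
    // Counts every evaluation of a loop condition in a nested outer x inner loop.
    static long conditionChecks(int outer, int inner) {
        long checks = 0;
        for (int i = 0; ; i++) {
            checks++;                  // outer condition (i < outer) evaluated
            if (!(i < outer)) break;
            for (int j = 0; ; j++) {
                checks++;              // inner condition (j < inner) evaluated
                if (!(j < inner)) break;
            }
        }
        return checks;
    }

    public static void main(String[] args) {
        System.out.println("Traverse1: " + conditionChecks(10000, 2));  // 40001
        System.out.println("Traverse2: " + conditionChecks(2, 10000));  // 20005
    }
}
```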
Related
I'm trying to make my code work differently for different values of i in for loop, but I don't know if I should make the conditional go inside the loop or just create multiple for loops for enhanced speed.
My English does seem quite inefficient at explaining things, so here's an example:
for (int i = 1; i < 31; i++) {
    if (i < 11) {
        System.out.println(3*i);
    } else if (i < 21) {
        System.out.println(2*i);
    } else {
        System.out.println(i);
    }
}
or
for (int i = 1; i < 11; i++) System.out.println(3*i);
for (int i = 11; i < 21; i++) System.out.println(2*i);
for (int i = 21; i < 31; i++) System.out.println(i);
It would really help if the reason why one of them might be better or not could be explained as well. Thank you in advance :>
Enhanced speed should not be a consideration. The differences (if any) would be negligible.
You should choose the more readable version. When using a for loop, you usually mean you wish to perform the same action N times. In your case you want to perform 3 different actions, each a different number of times (or for different values of i). Therefore it makes more sense to have 3 loops.
for (int i = 1; i < 11; i++) {
    System.out.println(3*i);
}
for (int i = 11; i < 21; i++) {
    System.out.println(2*i);
}
for (int i = 21; i < 31; i++) {
    System.out.println(i);
}
Analysis of the first, single loop:
Number of variables initialized: 1.
Number of comparisons:
1 < 31
1 < 11
2 < 31
2 < 11
and so on.
Hence for i = 1 to 10 there are 20 comparisons,
for i = 11 to 20 there are 30 comparisons,
for i = 21 to 30 there are 30 comparisons,
plus the final failed check 31 < 31, so 81 comparisons in total for the single loop.
But for
for (int i = 1; i < 11; i++) System.out.println(3*i);
for (int i = 11; i < 21; i++) System.out.println(2*i);
for (int i = 21; i < 31; i++) System.out.println(i);
the total is 33 comparisons (11 loop-condition checks per loop).
So the separate loops are better than the if-else ladder.
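Counting every condition evaluation, including each loop's final failing check, gives 81 for the if-else ladder and 33 for the three separate loops. A quick sketch to verify by instrumenting the conditions:

```java
public class LadderVsLoops {
    public static void main(String[] args) {
        // Single loop with an if-else ladder: count every condition evaluated.
        int single = 0;
        for (int i = 1; ; i++) {
            single++;                 // i < 31 evaluated
            if (!(i < 31)) break;
            single++;                 // i < 11 evaluated
            if (i < 11) continue;
            single++;                 // i < 21 evaluated
        }

        // Three separate loops: only the loop conditions are evaluated.
        int separate = 0;
        for (int i = 1; ; i++) { separate++; if (!(i < 11)) break; }
        for (int i = 11; ; i++) { separate++; if (!(i < 21)) break; }
        for (int i = 21; ; i++) { separate++; if (!(i < 31)) break; }

        System.out.println("if-else ladder: " + single);   // 81
        System.out.println("separate loops: " + separate); // 33
    }
}
```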
Making the code readable is more important. The performance difference is very small and can be ignored in most cases. Here is the experiment result on my computer:
pattern 1:
run 100000 times cost 7548 milliseconds
run 1000000 times cost 70180 milliseconds
pattern 2:
run 100000 times cost 7536 milliseconds
run 1000000 times cost 70535 milliseconds
Unless driven by performance considerations let readability lead you.
The second one is surely easier to understand. Though I'd recommend using block statements:
for (int i = 1; i < 11; i++) {
    System.out.println(3*i);
}
for (int i = 11; i < 21; i++) {
    System.out.println(2*i);
}
for (int i = 21; i < 31; i++) {
    System.out.println(i);
}
Of course you could make a formula:
for (int i = 1; i < 31; i++) {
    int fac = 3 - ((i - 1) / 10);
    System.out.println(fac * i);
}
Though that seems fairly unreadable too it might be the best approach if the equivalent were many for-loops or a number of loops you couldn't determine at compile time.
The 3 for loops are faster (not important here), as there is no longer an if-else-if at every i-step. More importantly, the three loops are far more readable because the if cascade is removed.
However, using j = i + 1, the first loop can be converted to:
final int n = 30;
for (int j = 0; j < n; j++) {
    int k = n/10 - j/10;
    System.out.println(k * (j + 1));
}
Because of the division this will probably not be faster. However, the removal of the if-cascade is an improvement. The expressions are harder to interpret while reading, but they express a bit of calculating logic that merely stating the if conditions would not: one could change 30 to 300 and everything would still make sense.
Or
for (int j = 0; j < 3; ++j) {
    for (int i = 1 + j*10; i < 11 + j*10; i++) {
        System.out.println((3-j)*i);
    }
}
public int Loop(int[] array1) {
    int result = 0;
    for (int i = 0; i < array1.length; i++) {
        for (int j = 0; j < array1.length; j++) {
            for (int k = 1; k < array1.length; k = k * 2) {
                result += j * j * array1[k] + array1[i] + array1[j];
            }
        }
    }
    return result;
}
I'm trying to find the complexity function that counts the number of arithmetic operations here. I know the complexity class would be O(n^3), but I'm having a bit of trouble counting the steps.
My reasoning so far is that I count the number of arithmetic operations which is 8, so would the complexity function just be 8n^3?
Any guidance in the right direction would be very much appreciated, thank you!
The first loop will run n times, and the second loop will run n times; however, the third loop will run log(n) times (base 2). Since you are multiplying k by two each time, the inverse operation is to take the log. Multiplying, we have O(n^2 log(n)).
If we can agree that the following is one big step:
result += j * j * array1[k] + array1[i] + array1[j]
then let's call that incrementResult.
How many times is incrementResult called here? (log n)
for (int k = 1; k < array1.length; k = k * 2) {
    // incrementResult
}
Lets call that loop3. Then how many times is loop3 called here? (n)
for (int j = 0; j < array1.length; j++) {
    // loop 3
}
Let's call that loop2. Then, how many times is loop2 called here? (n)
for (int i = 0; i < array1.length; i++) {
    // loop 2
}
Multiply all of those and you'll get your answer :)
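The product can also be confirmed empirically by counting how often the innermost statement actually runs. For n a power of two, the innermost loop makes exactly log2(n) iterations, so the total is n * n * log2(n). A sketch:

```java
public class CountInnermost {
    // Counts how many times the innermost statement of the original
    // triple loop runs, for an array of length n.
    static long countCalls(int n) {
        long calls = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 1; k < n; k = k * 2)
                    calls++;
        return calls;
    }

    public static void main(String[] args) {
        for (int n : new int[] {8, 64, 1024}) {
            long calls = countCalls(n);
            // log2(n) via the bit position of the highest set bit
            long predicted = (long) n * n * (63 - Long.numberOfLeadingZeros(n));
            System.out.println("n=" + n + "  calls=" + calls
                    + "  n*n*log2(n)=" + predicted);
        }
    }
}
```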
That depends on the loops. For instance:
for (int i = 0; i < 10; i++) {
    for (int j = 0; j < 10; j++) {
        for (int k = 0; k < 10; k++) {
            sum += i * j * k;
        }
    }
}
has complexity O(1), because the number of iterations does not depend on the input at all.
Or this:
for (int i = 0; i < n*n*n*n*n*n; i++) {
    sum += i;
}
is O(n^6), even though there is a single loop.
What really matters is how many iterations each loop makes.
In your case, it is easy to see that each iteration of the innermost loop is O(1). How many iterations are there? How many times do you need to double a number until you reach n? If x is the number of iterations, we'd exit the loop at the first x such that k = 2^x > n. Can you solve this for x?
Each iteration of the second loop will do this, so the cost of the second loop is the number of iterations (which are easier to count this time) times the cost of the inner loop.
And each iteration of the first loop will do this, so the cost of the first loop is the number of iterations (which is also easy to count) times the cost of the second loop.
Overall, the runtime is the product of 3 numbers. Can you find them?
I just wanted to know in general, is this code inefficient:
for (int i = 0; i < array.size(); i++) {
    //do something
}
as opposed to:
int x = array.size();
for (int i = 0; i < x; i++) {
    //do something
}
or is it negligible? (How about in nested for loops?)
Assuming array is an ArrayList, it makes almost no difference, since the implementation of size() merely accesses a member field:
public int size() {
    return size;
}
The second code just saves the field value in a local variable and re-uses it in the loop instead of accessing the field every time, so that's just a difference between an access to a local variable versus an access to a field (accessing a local variable is slightly faster).
You can test it yourself doing some test like below:
public static void main(String[] args) {
    ArrayList<Long> array = new ArrayList<Long>(99999);
    int i = 0;
    while (i < 99999) {
        array.add(1L);
        i++;
    }
    long ini1 = System.currentTimeMillis();
    i = 0;
    for (int j = 0; j < array.size(); j++) {
        i += array.get(j);
    }
    long end1 = System.currentTimeMillis();
    System.out.println("Time1: " + (end1 - ini1));
    long ini2 = System.currentTimeMillis();
    i = 0;
    for (int j = 0; j < 99999; j++) {
        i += array.get(j);
    }
    long end2 = System.currentTimeMillis();
    System.out.println("Time2: " + (end2 - ini2));
}
Output:
Time1: 13
Time2: 10
I think the difference is irrelevant in most applications and cases. I ran the test several times; the times vary, but the difference stays roughly constant, at least in percentage terms.
Arrays don't have a size() method; they have a length field:
for (int i = 0; i < array.length; i++) {
    //do something
}
Efficiency is O(1).
Actually, performance is almost the same if array.size() is not very big.
You can always write it like this:
for (int i = 0, x = array.length; i < x; i++) {
    //do something
}
Okay, so I don't know what Big-Oh is, because I swear my professor didn't cover it, and I need help with something I assume to be simple, ASAP. I know the answers to it, but she wants code for it, and I don't know how to turn it into code that compiles. I googled for help with this, and the results simply give the answer without an example of how to get it, or they use n = 1000 or something, but I don't see that in the prompt or what n should equal. I hope someone understands me. Advice, please? lol.
This is the prompt:
1) Approximate the value of sum after the following code fragment, in terms of variable n in Big-Oh notation.
2) Answer the estimated run time of the following program segment in Big-Oh notation:
int sum = 0;
for (int i = 1; i <= n - 3; i++) {
    for (int j = 1; j <= n + 4; j += 5) {
        sum += 2;
    }
    sum++;
}
for (int i = 1; i <= 100; i++) {
    sum++;
}
3) Answer the estimated run time of the following program segment in Big-Oh notation:
int sum = 0;
for (int i = 1; i <= n; i++) {
    sum++;
}
for (int j = 1; j <= n / 2; j++) {
    sum++;
}
I'm used to just sticking public static void main(String[] args) { in front of everything, so I did this:
public class BigO {
    public static void main(String[] args) {
    }

    public static void main(int n) {
        int sum = 0;
        for (int i = 1; i <= n - 3; i++) {
            for (int j = 1; j <= n + 4; j += 5) {
                sum += 2;
            }
            sum++;
        }
        for (int i = 1; i <= 100; i++) {
            sum++;
        }
    }
}
Of course that doesn't work.
Big O notation isn't about getting the program to work. It's about looking at the code to see how quickly the running-time of the program increases when you increase some variable (frequently the number of inputs but in this case, simply n).
Suppose that you analyse the running time of the program for successive values of n (n=1, n=2, n=3, etc.) and find that the running time is described by a linear equation like An + B. The dominant term here is An, so you ignore the B. You can also ignore the A and say that it's order O(n).
If the running time is described by An^2 + Bn + C then it's order O(n^2).
You understand the nature of the performance by analyzing the code and determining how it's looping not by actually getting the code to run.
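That said, for the fragment in part (2) you can also watch how fast sum grows as n increases: the inner loop runs about (n+4)/5 times for each of the n-3 outer iterations, so sum grows like 2n^2/5, which is O(n^2). A sketch that confirms the ratio empirically:

```java
public class GrowthCheck {
    // Computes sum for the fragment in part (2), for a given n.
    static long run(int n) {
        long sum = 0;
        for (int i = 1; i <= n - 3; i++) {
            for (int j = 1; j <= n + 4; j += 5) {
                sum += 2;
            }
            sum++;
        }
        for (int i = 1; i <= 100; i++) {
            sum++;
        }
        return sum;
    }

    public static void main(String[] args) {
        // sum / n^2 should approach 2/5 = 0.4 as n grows
        for (int n : new int[] {100, 1000, 10000}) {
            long sum = run(n);
            System.out.printf("n=%d  sum=%d  sum/n^2=%.4f%n",
                    n, sum, sum / (double) ((long) n * n));
        }
    }
}
```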
Here I wrote a test of the access speed of local, member, and volatile member variables:
public class VolatileTest {
    public int member = -100;
    public volatile int volatileMember = -100;

    public static void main(String[] args) {
        int testloop = 10;
        for (int i = 1; i <= testloop; i++) {
            System.out.println("Round:" + i);
            VolatileTest vt = new VolatileTest();
            vt.runTest();
            System.out.println();
        }
    }

    public void runTest() {
        int local = -100;
        int loop = 1;
        int loop2 = Integer.MAX_VALUE;
        long startTime;

        startTime = System.currentTimeMillis();
        for (int i = 0; i < loop; i++) {
            for (int j = 0; j < loop2; j++) {
            }
            for (int j = 0; j < loop2; j++) {
            }
        }
        System.out.println("Empty:" + (System.currentTimeMillis() - startTime));

        startTime = System.currentTimeMillis();
        for (int i = 0; i < loop; i++) {
            for (int j = 0; j < loop2; j++) {
                local++;
            }
            for (int j = 0; j < loop2; j++) {
                local--;
            }
        }
        System.out.println("Local:" + (System.currentTimeMillis() - startTime));

        startTime = System.currentTimeMillis();
        for (int i = 0; i < loop; i++) {
            for (int j = 0; j < loop2; j++) {
                member++;
            }
            for (int j = 0; j < loop2; j++) {
                member--;
            }
        }
        System.out.println("Member:" + (System.currentTimeMillis() - startTime));

        startTime = System.currentTimeMillis();
        for (int i = 0; i < loop; i++) {
            for (int j = 0; j < loop2; j++) {
                volatileMember++;
            }
            for (int j = 0; j < loop2; j++) {
                volatileMember--;
            }
        }
        System.out.println("VMember:" + (System.currentTimeMillis() - startTime));
    }
}
And here is a result on my X220 (I5 CPU):
Round:1
Empty:5
Local:10
Member:312
VMember:33378
Round:2
Empty:31
Local:0
Member:294
VMember:33180
Round:3
Empty:0
Local:0
Member:306
VMember:33085
Round:4
Empty:0
Local:0
Member:300
VMember:33066
Round:5
Empty:0
Local:0
Member:303
VMember:33078
Round:6
Empty:0
Local:0
Member:299
VMember:33398
Round:7
Empty:0
Local:0
Member:305
VMember:33139
Round:8
Empty:0
Local:0
Member:307
VMember:33490
Round:9
Empty:0
Local:0
Member:350
VMember:35291
Round:10
Empty:0
Local:0
Member:332
VMember:33838
It surprised me that access to the volatile member is 100 times slower than to the normal member. I know volatile has some notable properties, such as that a modification becomes visible to all threads immediately, and that an access to a volatile variable acts as a "memory barrier". But can all these side effects really be the main cause of a 100-times slowdown?
PS: I also ran the test on a Core 2 CPU machine. There the ratio was about 9:50, roughly 5 times slower, so this also seems related to CPU architecture. 5 times is still big, right?
The volatile members are never cached, so they are read directly from the main memory.
Access to volatile prevents some JIT optimisations. This is especially important if you have a loop which doesn't really do anything, as the JIT can optimise such loops away entirely (unless you have a volatile field). If you run the loops longer, the discrepancy should increase further.
In a more realistic test, you might expect volatile to be between 30% and 10x slower for critical code. In most real programs it makes very little difference, because the CPU is smart enough to "realise" that only one core is using the volatile field and caches it rather than using main memory.
Access to a volatile variable prevents the CPU from re-ordering the instructions before and after the access, and this generally slows down execution.
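What volatile buys you is visibility and ordering, not speed. A minimal sketch under standard Java Memory Model semantics: the volatile write to `ready` publishes the preceding plain write to `data`, so once the reader sees the flag it is guaranteed to see 42.

```java
public class VolatileVisibility {
    static volatile boolean ready = false;
    static int data = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!ready) {
                // spin until the writer's volatile flag becomes visible
            }
            // the volatile read of ready happens-before this read of data
            System.out.println("data = " + data);
        });
        reader.start();

        data = 42;     // ordinary write...
        ready = true;  // ...published by the volatile write; cannot be reordered after it
        reader.join();
    }
}
```

Without volatile on `ready`, the reader thread could legally spin forever or observe a stale `data`; that guarantee is what the extra memory traffic pays for.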
Using volatile reads from memory directly, so every CPU core sees the change on its next read of the variable. The CPU cache hierarchy is not used: neither registers nor the L1~L3 caches. Approximate read latencies:
register: 1 clock cycle
L1 cache: 4 clock cycles
L2 cache: 11 clock cycles
L3 cache: 30~40 clock cycles
memory: 100+ clock cycles
That's why your result is about 100 times slower when using volatile.