This question led me to do some testing:
public class Stack
{
    public static void main(String[] args)
    {
        Object obj0 = null;
        Object obj1 = new Object();
        long start;
        long end;
        double difference;
        double differenceAvg = 0;

        for (int j = 0; j < 100; j++)
        {
            start = System.nanoTime();
            for (int i = 0; i < 1000000000; i++)
                if (obj0 == null);
            end = System.nanoTime();
            difference = end - start;
            differenceAvg += difference;
        }
        System.out.println(differenceAvg / 100);

        differenceAvg = 0;
        for (int j = 0; j < 100; j++)
        {
            start = System.nanoTime();
            for (int i = 0; i < 1000000000; i++)
                if (null == obj0);
            end = System.nanoTime();
            difference = end - start;
            differenceAvg += difference;
        }
        System.out.println(differenceAvg / 100);

        differenceAvg = 0;
        for (int j = 0; j < 100; j++)
        {
            start = System.nanoTime();
            for (int i = 0; i < 1000000000; i++)
                if (obj1 == null);
            end = System.nanoTime();
            difference = end - start;
            differenceAvg += difference;
        }
        System.out.println(differenceAvg / 100);

        differenceAvg = 0;
        for (int j = 0; j < 100; j++)
        {
            start = System.nanoTime();
            for (int i = 0; i < 1000000000; i++)
                if (null == obj1);
            end = System.nanoTime();
            difference = end - start;
            differenceAvg += difference;
        }
        System.out.println(differenceAvg / 100);
    }
}
Tangential to the other post, it's interesting to note how much faster the comparison is when the Object we're comparing is initialized. The first two numbers in each output are from when the Object was null, and the latter two are from when the Object was initialized. I ran 21 additional executions of the program; in all 30 executions, the comparison was much faster when the Object was initialized. What's going on here?
If you move the last two loops to the beginning, you will get the same results, so the comparisons themselves are irrelevant.
It's all about JIT compiler warm-up. During the first two loops, Java starts by interpreting the bytecode. After some iterations, it determines that the code path is "hot", so it compiles it to machine code and removes the loops that have no effect; you are basically measuring System.nanoTime() and double arithmetic.
I'm not really sure why the first two loops are slow. I think that after it finds two hot paths, it decides to optimize the entire method.
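If you'd rather not hand-roll the warm-up, a harness like JMH handles it for you. Here is a minimal sketch (my own, not from the post; it assumes the org.openjdk.jmh:jmh-core dependency and JMH's annotation processor). JMH runs warm-up iterations before measuring, and returning the comparison result keeps the JIT from eliminating it as dead code:

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class NullCheckBenchmark {
    Object obj0 = null;          // mirrors the null case from the question
    Object obj1 = new Object();  // mirrors the initialized case

    @Benchmark
    public boolean nullObject() {
        return obj0 == null; // returned values can't be optimized away
    }

    @Benchmark
    public boolean initializedObject() {
        return obj1 == null;
    }
}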
I want to see how long it takes for 10,000 random integers to be sorted. Since a bubblesort sorts the array in stages, and the time could vary on each run, I want to know the total time it takes for the final sorted array to appear. So my time calculations should cover the period when each sorting pass is taking place, and when the final pass happens and the results appear, the output should tell me the time in seconds.
I have used System.currentTimeMillis(); for this task, but how would I use it so that it measures the time across the sorting stages? I placed it inside the for (int k = 0; k < numbers.length; k++) { loop because that loops through all the stages of the sorting, but my program would not output anything. How would I fix that?
Code:
class Main {
    public static void main(String[] args) {
        // Clear screen
        System.out.print("\033[H\033[2J");
        System.out.flush();

        double msStartTime = 0d;
        double msEndTime = 0d;

        // Initialize an int array variable and set the limit to 10,000
        int numbers[] = new int[10000];

        // Generate 10,000 random integers to bubblesort
        for (int x = 0; x < numbers.length; x++) {
            numbers[x] = (int) (Math.random() * 10001);
        }

        for (int i = 0; i < numbers.length; i++) {
            for (int j = i; j < numbers.length; j++) {
                if (numbers[j] < numbers[i]) {
                    int temp = numbers[j];
                    numbers[j] = numbers[i];
                    numbers[i] = temp;
                }
            }
            for (int k = 0; k < numbers.length; k++) {
                msStartTime = (double) System.currentTimeMillis();
            }
        }
        msEndTime = (double) System.currentTimeMillis();

        System.out.println("To sort an array of 10,000 integers, it takes "
                + (msEndTime - msStartTime) / 1000 + " seconds");
    }
}
I think you can use StopWatch. Here is how you can add it to Maven and use it:
https://www.baeldung.com/java-measure-elapsed-time
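For illustration, a minimal sketch of the StopWatch approach from that article (assuming the org.apache.commons:commons-lang3 dependency described on the linked page). The watch is started once before the sort and stopped once after it, which also fixes the start-time placement from the question:

import org.apache.commons.lang3.time.StopWatch;

class StopWatchSort {
    public static void main(String[] args) {
        int[] numbers = new int[10000];
        for (int x = 0; x < numbers.length; x++) {
            numbers[x] = (int) (Math.random() * 10001);
        }

        StopWatch watch = new StopWatch();
        watch.start(); // start once, before any sorting pass
        for (int i = 0; i < numbers.length; i++) {
            for (int j = i; j < numbers.length; j++) {
                if (numbers[j] < numbers[i]) {
                    int temp = numbers[j];
                    numbers[j] = numbers[i];
                    numbers[i] = temp;
                }
            }
        }
        watch.stop(); // stop once, after the final pass

        System.out.println("To sort an array of 10,000 integers, it takes "
                + watch.getTime() / 1000.0 + " seconds");
    }
}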
String conSingles = "";    // one to concatenate single chars to
String appendSingles = "";

// Testing one-char concatenation and append
long before = System.currentTimeMillis(); // static method, so no new
for (int i = 0; i < 10; i++) {
    for (int j = 0; j < 70000; j++) {
        conSingles += "j";
    }
    conSingles = ""; // Clear string
}
long after = System.currentTimeMillis();

long before2 = System.currentTimeMillis(); // static method, so no new
for (int i = 0; i < 10; i++) {
    StringBuilder appSingles = new StringBuilder(); // one to append single chars to
    for (int j = 0; j < 75000000; j++) {
        appSingles.append("j");
    }
    appendSingles = appSingles.toString();
    appendSingles = "";
}
long after2 = System.currentTimeMillis();

long total = (after - before) / 10;
long total2 = (after2 - before2) / 10;
I am comparing how many one-char strings I can put together in 1000 ms with String concatenation versus StringBuilder append. I get close to 1000 ms when I run the loop once, but when I run it ten times to take an average, I always get around 300-400 ms per round. What is it in my method that makes a single round so slow compared to ten rounds?
I just wanted to know in general, is this code inefficient:
for (int i = 0; i < array.size(); i++) {
    // do something
}
as opposed to:
int x = array.size();
for (int i = 0; i < x; i++) {
    // do something
}
or is it negligible? (How about in nested for loops?)
Assuming array is an ArrayList, there is almost no difference, since the implementation of size() merely returns a field:
public int size() {
    return size;
}
The second code just saves the field value in a local variable and re-uses it in the loop instead of accessing the field every time, so that's just a difference between an access to a local variable versus an access to a field (accessing a local variable is slightly faster).
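One caveat, sketched with my own example (not from the answer; assumes the usual java.util imports): hoisting size() into a local is only equivalent when the loop body does not change the list's size.

List<String> items = new ArrayList<>(Arrays.asList("a", "b", "c"));
for (int i = 0; i < items.size(); i++) { // size() must be re-read on each pass
    if (items.get(i).equals("b")) {
        items.remove(i); // the list shrinks, so a hoisted size would be stale
        i--;             // stay on the same index after the removal
    }
}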
You can test it yourself with something like the code below:
public static void main(String[] args) {
    ArrayList<Long> array = new ArrayList<Long>(99999);
    int i = 0;
    while (i < 99999) {
        array.add(1L);
        i++;
    }

    long ini1 = System.currentTimeMillis();
    i = 0;
    for (int j = 0; j < array.size(); j++) {
        i += array.get(j);
    }
    long end1 = System.currentTimeMillis();
    System.out.println("Time1: " + (end1 - ini1));

    long ini2 = System.currentTimeMillis();
    i = 0;
    for (int j = 0; j < 99999; j++) {
        i += array.get(j);
    }
    long end2 = System.currentTimeMillis();
    System.out.println("Time2: " + (end2 - ini2));
}
Output:
Time1: 13
Time2: 10
I think the difference is irrelevant in most applications and cases. I ran the test several times; the times vary, but the difference stays roughly "constant", at least in percentage terms.
Arrays don't have a size() method, but a length field:
for (int i = 0; i < array.length; i++) {
    // do something
}
Accessing length is O(1).
Actually, performance is almost the same if array.size() is not very big.
You can always write it like this:
for (int i = 0, x = array.length; i < x; i++) {
    // do something
}
In the code and results below, we can see that "Traverse2" is much faster than "Traverse1", even though they traverse the same number of elements.
1. How did this difference happen?
2. Does putting the longer iteration inside the shorter iteration give better performance?
public class TraverseTest
{
    public static void main(String[] args)
    {
        int a[][] = new int[100][10];
        System.out.println(System.currentTimeMillis());
        //Traverse1
        for (int i = 0; i < 100; i++)
        {
            for (int j = 0; j < 10; j++)
                a[i][j] = 1;
        }
        System.out.println(System.currentTimeMillis());
        //Traverse2
        for (int i = 0; i < 10; i++)
        {
            for (int j = 0; j < 100; j++)
                a[j][i] = 2;
        }
        System.out.println(System.currentTimeMillis());
    }
}
Result:
1347116569345
1347116569360
1347116569360
If I change it to
System.out.println(System.nanoTime());
The result will be:
4888285195629
4888285846760
4888285914219
This suggests that putting the longer iteration inside gives better performance, which seems to conflict with the theory of cache hits.
I suspect that any strangeness in the results you are seeing in this micro-benchmark is due to flaws in the benchmark itself.
For example:
Your benchmark does not take account of "JVM warm-up" effects, such as the fact that the JIT compiler does not compile to native code immediately. (This only happens after the code has been executed for a while and the JVM has gathered some usage statistics to aid optimization.) The correct way to deal with this is to put the whole lot inside a loop that runs a few times, and discard any initial sets of times that look "odd" due to warm-up effects (see the sketch after this answer).
The loops in your benchmark could in theory be optimized away. The JIT compiler might be able to deduce that they don't do any work that affects the program's output.
Finally, I'd just like to remind you that hand-optimizing like this is usually a bad idea ... unless you've got convincing evidence that it is worth your while hand-optimizing AND that this code is really where the application is spending significant time.
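A minimal sketch of that warm-up advice (my own; the sizes and round count are arbitrary). Early rounds run interpreted and should be discarded, and printing the accumulated sum keeps the JIT from discarding the loops as dead code:

public class WarmupSketch {
    public static void main(String[] args) {
        int[][] a = new int[1000][1000];
        for (int round = 0; round < 10; round++) {
            long t0 = System.nanoTime();
            long sum = 0;
            for (int i = 0; i < 1000; i++)
                for (int j = 0; j < 1000; j++)
                    sum += a[i][j];
            long t1 = System.nanoTime();
            // Using sum in output prevents dead-code elimination of the loops.
            System.out.println("round " + round + ": " + (t1 - t0) + " ns (sum=" + sum + ")");
        }
    }
}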
First, always run microbenchmark tests several times in a loop. Then you'll see both times are 0, as the array sizes are too small. To get non-zero times, increase the array sizes by a factor of 100. My times are roughly 32 ms for Traverse1 and 250 ms for Traverse2.
The difference is because the processor uses cache memory. Access to sequential memory addresses is much faster.
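To make the cache effect visible, here is a rough sketch at a larger scale (the sizes are my choice, not from the post); run it several times, as advised above, and compare the later runs:

public class TraversalOrderSketch {
    public static void main(String[] args) {
        int[][] a = new int[10000][1000];

        long t0 = System.nanoTime();
        for (int i = 0; i < 10000; i++)
            for (int j = 0; j < 1000; j++)
                a[i][j] = 1;          // row-major: contiguous within each row
        long t1 = System.nanoTime();

        for (int j = 0; j < 1000; j++)
            for (int i = 0; i < 10000; i++)
                a[i][j] = 2;          // column-major: every step jumps ~4 KB to the next row
        long t2 = System.nanoTime();

        System.out.println("row-major:    " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("column-major: " + (t2 - t1) / 1_000_000 + " ms");
    }
}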
My output (with your original code, 100i/10j vs 10i/100j):
1347118083906
1347118083906
1347118083906
You are using too coarse a time resolution for a very quick calculation.
I changed both the i and j limits to 1000:
int a[][] = new int[1000][1000];
System.out.println(System.currentTimeMillis());
//Traverse1
for (int i = 0; i < 1000; i++)
{
    for (int j = 0; j < 1000; j++)
        a[i][j] = 1;
}
System.out.println(System.currentTimeMillis());
//Traverse2
for (int i = 0; i < 1000; i++)
{
    for (int j = 0; j < 1000; j++)
        a[j][i] = 2;
}
System.out.println(System.currentTimeMillis());
Output:
1347118210671
1347118210687 //difference is 16 ms
1347118210703 //difference is 16 ms again -_-
Two possibilities:
Java HotSpot transforms the second loop into the first form, or optimizes by exchanging i and j.
The time resolution is still not enough.
So I changed the output to System.nanoTime():
int a[][] = new int[1000][1000];
System.out.println(System.nanoTime());
//Traverse1
for (int i = 0; i < 1000; i++)
{
    for (int j = 0; j < 1000; j++)
        a[i][j] = 1;
}
System.out.println(System.nanoTime());
//Traverse2
for (int i = 0; i < 1000; i++)
{
    for (int j = 0; j < 1000; j++)
        a[j][i] = 2;
}
System.out.println(System.nanoTime());
Output:
16151040043078
16151047859993 // difference is ~7,800,000 nanoseconds
16151061346623 // difference is ~13,500,000 nanoseconds ---> about half the speed
1. How did this difference happen?
Note that even setting aside the wrong time resolution, you are comparing unequal cases: the first is contiguous access while the second is not. If the first nested loops act merely as a warm-up for the second ones, that would make the assumption that "the second is much faster" even more wrong.
Don't forget that a 2D array is an "array of arrays" in Java, so the right-most index spans a contiguous area. That is faster for the first version.
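As a sketch of that layout (my own illustration): a[i] is itself an int[], so hoisting the row reference makes the contiguous inner walk explicit:

int[][] a = new int[100][10];
for (int i = 0; i < 100; i++) {
    int[] row = a[i];          // one object reference per row
    for (int j = 0; j < 10; j++) {
        row[j] = 1;            // sequential writes within one contiguous int[]
    }
}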
2. Does putting the longer iteration inside the shorter one give better performance?
for (int i = 0; i < 10; i++)
{
    for (int j = 0; j < 100; j++)
        a[j][i] = 2;
}
Incrementing the first index is slower because the next iteration lands kilobytes away, so you can no longer reuse the cache line.
Absolutely not!
In my view, the shape of the array also affects the result. For example:
public class TraverseTest {
    public static void main(String[] args)
    {
        int a[][] = new int[10000][2];
        System.out.println(System.currentTimeMillis());
        //Traverse1
        for (int i = 0; i < 10000; i++)
        {
            for (int j = 0; j < 2; j++)
                a[i][j] = 1;
        }
        System.out.println(System.currentTimeMillis());
        //Traverse2
        for (int i = 0; i < 2; i++)
        {
            for (int j = 0; j < 10000; j++)
                a[j][i] = 2;
        }
        System.out.println(System.currentTimeMillis());
    }
}
Traverse1 needs 10001 + 10000*3 = 40001 comparisons to decide whether to exit its loops: the outer condition is tested 10001 times, and the inner condition 3 times per outer iteration. Traverse2 only needs 3 + 2*10001 = 20005 comparisons.
So Traverse1 performs about twice as many loop-exit comparisons as Traverse2.
Here I wrote a test of the access speed of a local variable, a member field, and a volatile member field:
public class VolatileTest {

    public int member = -100;

    public volatile int volatileMember = -100;

    public static void main(String[] args) {
        int testloop = 10;
        for (int i = 1; i <= testloop; i++) {
            System.out.println("Round:" + i);
            VolatileTest vt = new VolatileTest();
            vt.runTest();
            System.out.println();
        }
    }

    public void runTest() {
        int local = -100;

        int loop = 1;
        int loop2 = Integer.MAX_VALUE;
        long startTime;

        startTime = System.currentTimeMillis();
        for (int i = 0; i < loop; i++) {
            for (int j = 0; j < loop2; j++) {
            }
            for (int j = 0; j < loop2; j++) {
            }
        }
        System.out.println("Empty:" + (System.currentTimeMillis() - startTime));

        startTime = System.currentTimeMillis();
        for (int i = 0; i < loop; i++) {
            for (int j = 0; j < loop2; j++) {
                local++;
            }
            for (int j = 0; j < loop2; j++) {
                local--;
            }
        }
        System.out.println("Local:" + (System.currentTimeMillis() - startTime));

        startTime = System.currentTimeMillis();
        for (int i = 0; i < loop; i++) {
            for (int j = 0; j < loop2; j++) {
                member++;
            }
            for (int j = 0; j < loop2; j++) {
                member--;
            }
        }
        System.out.println("Member:" + (System.currentTimeMillis() - startTime));

        startTime = System.currentTimeMillis();
        for (int i = 0; i < loop; i++) {
            for (int j = 0; j < loop2; j++) {
                volatileMember++;
            }
            for (int j = 0; j < loop2; j++) {
                volatileMember--;
            }
        }
        System.out.println("VMember:" + (System.currentTimeMillis() - startTime));
    }
}
And here is a result on my X220 (i5 CPU):
Round:1
Empty:5
Local:10
Member:312
VMember:33378
Round:2
Empty:31
Local:0
Member:294
VMember:33180
Round:3
Empty:0
Local:0
Member:306
VMember:33085
Round:4
Empty:0
Local:0
Member:300
VMember:33066
Round:5
Empty:0
Local:0
Member:303
VMember:33078
Round:6
Empty:0
Local:0
Member:299
VMember:33398
Round:7
Empty:0
Local:0
Member:305
VMember:33139
Round:8
Empty:0
Local:0
Member:307
VMember:33490
Round:9
Empty:0
Local:0
Member:350
VMember:35291
Round:10
Empty:0
Local:0
Member:332
VMember:33838
It surprised me that access to the volatile member is 100 times slower than to the normal member. I know volatile members have some notable properties, such as that a modification becomes visible to all threads immediately, and that an access to a volatile variable acts as a "memory barrier". But can these side effects really be the main cause of a 100-fold slowdown?
PS: I also ran the test on a Core 2 CPU machine. The ratio there is about 9:50, roughly 5 times slower. It seems this is also related to the CPU architecture. 5 times is still big, right?
The volatile members are never cached, so they are read directly from the main memory.
Access to a volatile field prevents some JIT optimisations. This is especially important if you have a loop which doesn't really do anything, as the JIT can optimise such loops away entirely (unless a volatile field is involved). If you run the loops for longer, the discrepancy should increase further.
In a more realistic test, you might expect volatile to be between 30% and 10x slower for critical code. In most real programs it makes very little difference, because the CPU is smart enough to "realise" that only one core is using the volatile field and to serve it from cache rather than main memory.
Access to a volatile variable prevents the CPU from re-ordering the instructions before and after the access, and this generally slows down execution.
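As an aside, here is a standard Java memory model sketch (my own, not from these posts) of what that ordering guarantee buys you:

class Publisher {
    int data;                  // plain field
    volatile boolean ready;    // volatile flag guarding publication

    void publish() {
        data = 42;
        ready = true;          // volatile write: prior writes cannot be reordered after it
    }

    void consume() {
        if (ready) {           // volatile read: acts as an acquire barrier
            System.out.println(data); // a reader that sees ready == true must see data == 42
        }
    }
}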
Using volatile reads from memory directly, so that every CPU core sees the change on its next read of the variable; the value is not held in a register or served from the L1~L3 caches. Typical access latencies are:
register: ~1 clock cycle
L1 cache: ~4 clock cycles
L2 cache: ~11 clock cycles
L3 cache: ~30-40 clock cycles
main memory: 100+ clock cycles
That's why your result is about 100 times slower when using volatile.