If I have a piece of code like so...
if (!hasRunMethod) {
    runMethod();
    hasRunMethod = true;
}
...and that code is executed in a loop, over and over, many times every second. Even though the guarded code only ever runs once, is this bad coding practice? If so, what should I do instead?
Quickly tested (on Java 1.8.0_05):
long start = System.nanoTime();
int run = 1_000_000;
for (int i = 0; i < run; ++i) {
    if (alwaysTrue) {
    }
}
long end = System.nanoTime();
end - start averages ~1,820,000 nanoseconds.
Compared with this:
long start = System.nanoTime();
int run = 1_000_000;
for (int i = 0; i < run; ++i) {
    // if (alwaysTrue) {
    //
    // }
}
long end = System.nanoTime();
end - start averages ~1,556,000 nanoseconds.
as an added bonus:
long start = System.nanoTime();
int run = 1_000_000;
for (int i = 0; i < run; ++i) {
    if (true) {
    }
}
long end = System.nanoTime();
end - start averages ~1,542,000 nanoseconds, essentially the same as with the check commented out.
Conclusion
if(someBool){} inside a loop has some performance impact, but it's so negligible that it's hard to imagine a bottleneck sensitive enough for it to matter.
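As for what to do instead: if the guarded call really only ever happens once, the cleanest fix is usually to hoist it out of the loop so the flag disappears entirely. A minimal sketch of the idea (the loop condition and body here are placeholder names, not from the question):

runMethod(); // the one-time call, now executed once before the loop starts
while (gameIsRunning) {
    update(); // only per-iteration work remains inside the loop
}

That said, as the numbers above show, the flag check itself is almost free, so this is about clarity more than speed.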
Related
I am trying to get familiar with Java multithreaded applications. I tried to think of a simple application that can be parallelized very well, and I thought vector addition would be a good candidate.
However, when running on my Linux server (which has 4 cores) I don't get any speed-up. The time to execute on 4, 2, or 1 threads is about the same.
Here is the code I came up with:
public static void main(String[] args) throws InterruptedException {
    final int threads = Integer.parseInt(args[0]);
    final int length = Integer.parseInt(args[1]);
    final int balk = (length / threads);
    Thread[] th = new Thread[threads];
    final double[] result = new double[length];
    final double[] array1 = getRandomArray(length);
    final double[] array2 = getRandomArray(length);
    long startingTime = System.nanoTime();
    for (int i = 0; i < threads; i++) {
        final int current = i;
        th[i] = new Thread(() -> {
            for (int k = current * balk; k < (current + 1) * balk; k++) {
                result[k] = array1[k] + array2[k];
            }
        });
        th[i].start();
    }
    for (int i = 0; i < threads; i++) {
        th[i].join();
    }
    System.out.println("Time needed: " + (System.nanoTime() - startingTime));
}
length is always a multiple of threads, and getRandomArray() creates an array of random doubles between 0 and 1.
Execution time for 1 thread: 84579446 ns
Execution time for 2 threads: 74211325 ns
Execution time for 4 threads: 89215100 ns
length = 10000000
Here is the Code for getRandomArray():
private static double[] getRandomArray(int length) {
    Random random = new Random();
    double[] array = new double[length];
    for (int i = 0; i < length; i++) {
        array[i] = random.nextDouble();
    }
    return array;
}
I would appreciate any help.
The difference is observable for the following code. Try it.
public static void main(String[] args) throws InterruptedException {
    for (int z = 0; z < 10; z++) {
        final int threads = 1;
        final int length = 100_000_000;
        final int balk = (length / threads);
        Thread[] th = new Thread[threads];
        final boolean[] result = new boolean[length];
        final boolean[] array1 = getRandomArray(length);
        final boolean[] array2 = getRandomArray(length);
        long startingTime = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            final int current = i;
            th[i] = new Thread(() -> {
                for (int k = current * balk; k < (current + 1) * balk; k++) {
                    result[k] = array1[k] | array2[k];
                }
            });
            th[i].start();
        }
        for (int i = 0; i < threads; i++) {
            th[i].join();
        }
        System.out.println("Time needed: " + (System.nanoTime() - startingTime) * 1.0 / 1000 / 1000);
        boolean x = false;
        for (boolean d : result) {
            x |= d;
        }
        System.out.println(x);
    }
}
First things first, you need to warm up your code so that you measure compiled code; the first couple of iterations take (approximately) the same time, but the following ones will differ. I also changed double to boolean because my machine doesn't have much memory: this lets me allocate a huge array, and it also makes the work more CPU-intensive.
There is a link in the comments; I suggest you read it.
From my side: if you are trying to see how your cores share work, you can give all of them a very simple task, but make them work constantly on something that is not shared across threads (basically simulating, for example, merge sort, where threads work on something complicated and only touch shared resources for a small amount of time). Using your code I did something like this; see below. In such a case you should see almost exactly a 2x and a 4x speed-up.
public static void main(String[] args) throws InterruptedException {
    for (int a = 0; a < 5; a++) {
        final int threads = 2;
        final int length = 10;
        final int balk = (length / threads);
        Thread[] th = new Thread[threads];
        System.out.println(Runtime.getRuntime().availableProcessors());
        final double[] result = new double[length];
        final double[] array1 = getRandomArray(length);
        final double[] array2 = getRandomArray(length);
        long startingTime = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            final int current = i;
            th[i] = new Thread(() -> {
                Random random = new Random();
                int meaningless = 0;
                for (int k = current * balk; k < (current + 1) * balk; k++) {
                    result[k] = array1[k] + array2[k];
                    for (int j = 0; j < 10000000; j++) {
                        meaningless += random.nextInt(10);
                    }
                }
            });
            th[i].start();
        }
        for (int i = 0; i < threads; i++) {
            th[i].join();
        }
        System.out.println("Time needed: " + ((System.nanoTime() - startingTime) * 1.0) / 1000000000 + " s");
    }
}
You see, in your code most of the time is consumed by building the big arrays, and the threads then execute very fast; their work is so quick that your time measurement is misleading, because most of the measured time is spent creating the threads. When I ran code that works on the precalculated arrays in a plain loop, like this:
long startingTime = System.nanoTime();
for (int k = 0; k < length; k++) {
    result[k] = array1[k] | array2[k];
}
System.out.println("Time needed: " + (System.nanoTime() - startingTime));
it ran twice as fast as your code with 2 threads. I hope you see what I mean in this case, and that my point becomes clear once the threads are given much more meaningless work, as above.
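Another way to see the same point is to keep thread creation out of the timed region altogether, for example with a pre-started thread pool and a latch. Here is a minimal sketch of that idea (my variation on the question's code; the array setup is elided):

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

public class PooledVectorAdd {
    public static void main(String[] args) throws InterruptedException {
        final int threads = 4;
        final int length = 10_000_000;
        final int balk = length / threads;
        final double[] result = new double[length];
        final double[] array1 = new double[length]; // fill with random values as in the question
        final double[] array2 = new double[length];

        // Create the pool and start its workers before timing begins.
        ThreadPoolExecutor pool = (ThreadPoolExecutor) Executors.newFixedThreadPool(threads);
        pool.prestartAllCoreThreads();
        final CountDownLatch done = new CountDownLatch(threads);

        long startingTime = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            final int current = i;
            pool.execute(() -> {
                for (int k = current * balk; k < (current + 1) * balk; k++) {
                    result[k] = array1[k] + array2[k];
                }
                done.countDown();
            });
        }
        done.await(); // the timed region now covers only the submitted work
        System.out.println("Time needed: " + (System.nanoTime() - startingTime));
        pool.shutdown();
    }
}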
I have written the code below to observe the timing of a loop. Surprisingly, it gives me different values on each run.
public static void main(String[] args) {
    for (int attempt = 0; attempt < 10; attempt++) {
        runloop();
    }
}

public static void runloop() {
    long sum = 0L;
    long starttime = System.nanoTime();
    for (int x = 0; x < 1000000; x++) {
        sum += x;
    }
    long end = System.nanoTime();
    System.out.println("Time taken:" + (end - starttime) / 1000L);
}
Observation:
Time taken:4062
Time taken:3122
Time taken:2707
Time taken:2445
Time taken:3575
Time taken:2823
Time taken:2228
Time taken:1816
Time taken:1839
Time taken:1811
I am not able to understand why there is such a difference in the timings.
What is the reason?
It could be any of several things (one JIT-related caveat is sketched below this list):
- other processes running on your computer, limiting the time given to Java
- runs of the garbage collector
- loop initialization time
- ...
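One caveat specific to the posted code: sum is never used after the loop, so once runloop() is JIT-compiled the whole loop is a candidate for dead-code elimination, and the later (faster) timings may not be measuring the additions at all. A minimal sketch of the same method with the result kept observable (my rewrite, not the original code):

public static void runloop() {
    long sum = 0L;
    long starttime = System.nanoTime();
    for (int x = 0; x < 1000000; x++) {
        sum += x;
    }
    long end = System.nanoTime();
    // Printing sum makes the loop's result observable, so the JIT cannot drop it.
    System.out.println("Time taken:" + (end - starttime) / 1000L + " sum=" + sum);
}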
My partner and I are attempting to program a LinkedList data structure. We have completed the data structure, and it functions properly with all required methods. We are required to compare the runtime of the addFirst() method of our LinkedList class against the add(0, item) method of Java's ArrayList. The expected complexity of addFirst() for our LinkedList is O(1) constant, and this held true in our test. In timing the ArrayList add() method we expected O(N), but we again measured approximately O(1) constant. This seemed a strange discrepancy, since we are using Java's own ArrayList. We suspect there is an issue in our timing code, and we would be most appreciative if anyone could help us identify the problem. Our Java code for timing both methods is listed below:
public class timingAnalysis {

    public static void main(String[] args) {
        //timeAddFirst();
        timeAddArray();
    }

    public static void timeAddFirst()
    {
        long startTime, midTime, endTime;
        long timesToLoop = 10000;
        int inputSize = 20000;
        MyLinkedList<Long> linkedList = new MyLinkedList<Long>();
        for (; inputSize <= 1000000; inputSize = inputSize + 20000)
        {
            // Clear the collection so we can add new random values.
            linkedList.clear();
            // Let some time pass to stabilize the thread.
            startTime = System.nanoTime();
            while (System.nanoTime() - startTime < 1000000000)
            { }
            // Start timing.
            startTime = System.nanoTime();
            for (long i = 0; i < timesToLoop; i++)
                linkedList.addFirst(i);
            midTime = System.nanoTime();
            // Run an empty loop to capture the cost of running the loop.
            for (long i = 0; i < timesToLoop; i++)
            {} // empty block
            endTime = System.nanoTime();
            // Compute the time: subtract the cost of the empty loop from the
            // cost of the loop with addFirst, and average over the iterations.
            double averageTime = ((midTime - startTime) - (endTime - midTime)) / (double) timesToLoop;
            System.out.println(inputSize + " " + averageTime);
        }
    }

    public static void timeAddArray()
    {
        long startTime, midTime, endTime;
        long timesToLoop = 10000;
        int inputSize = 20000;
        ArrayList<Long> testList = new ArrayList<Long>();
        for (; inputSize <= 1000000; inputSize = inputSize + 20000)
        {
            // Clear the collection so we can add new random values.
            testList.clear();
            // Let some time pass to stabilize the thread.
            startTime = System.nanoTime();
            while (System.nanoTime() - startTime < 1000000000)
            { }
            // Start timing.
            startTime = System.nanoTime();
            for (long i = 0; i < timesToLoop; i++)
                testList.add(0, i);
            midTime = System.nanoTime();
            // Run an empty loop to capture the cost of running the loop.
            for (long i = 0; i < timesToLoop; i++)
            {} // empty block
            endTime = System.nanoTime();
            // Compute the time: subtract the cost of the empty loop from the
            // cost of the loop with add, and average over the iterations.
            double averageTime = ((midTime - startTime) - (endTime - midTime)) / (double) timesToLoop;
            System.out.println(inputSize + " " + averageTime);
        }
    }
}
You want to test for different inputSize, but you perform the operation to test timesToLoop times, which is constant. So of course, it takes the same time. You should use:
for (long i = 0; i < inputSize; i++)
    testList.add(0, i);
As far as I know, the ArrayList add operation runs in O(1) time, so the results of your experiment are consistent with that. More precisely, the constant time for the ArrayList add method is amortized constant time.
As per the Javadoc: "The add operation runs in amortized constant time, that is, adding n elements requires O(n) time." That is why adding is described as amortized constant time.
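Note that the quoted sentence is about appending with add(e); inserting at a fixed index with add(0, e) has to shift every existing element, which is O(n) per call. A quick sketch contrasting the two (my own illustration, not from the original answer):

import java.util.ArrayList;

public class AddCostSketch {
    public static void main(String[] args) {
        for (int n = 10_000; n <= 80_000; n *= 2) {
            ArrayList<Long> appendList = new ArrayList<>();
            long start = System.nanoTime();
            for (long i = 0; i < n; i++) {
                appendList.add(i); // append: amortized O(1), occasional regrow
            }
            long appendTime = System.nanoTime() - start;

            ArrayList<Long> frontList = new ArrayList<>();
            start = System.nanoTime();
            for (long i = 0; i < n; i++) {
                frontList.add(0, i); // front insert: shifts all existing elements, O(n)
            }
            long frontTime = System.nanoTime() - start;

            System.out.println(n + ": append " + appendTime + " ns, front " + frontTime + " ns");
        }
    }
}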
I have two methods in Java (a factorial calculation, for example) and I have to test these two methods to find out which one is faster. I have the code both as recursion and as a for loop:
They are both in the same class, data.
public long FakultaetRekursiv(int n) {
    if (n == 1) {
        return 1;
    } else {
        return FakultaetRekursiv(n - 1) * n;
    }
}
public long Fakultaet(int n) {
    long x = 1; // use long: the method returns long, and an int would overflow early
    for (int i = 1; i <= n; i++) {
        x = x * i;
    }
    return x;
}
I heard currentTimeMillis() could help a little, but I don't know exactly how to use it.
Thanks.
Micro-benchmarking is hard; use the right tools, for example Caliper. Here is an example that should work for you:
import com.google.caliper.Param;
import com.google.caliper.SimpleBenchmark;

public class Benchmark extends SimpleBenchmark {

    @Param({"1", "10", "100"}) private int arg;

    public void timeFakultaet(int reps) {
        for (int i = 0; i < reps; ++i) {
            Fakultaet(arg);
        }
    }

    public void timeFakultaetRekursiv(int reps) {
        for (int i = 0; i < reps; ++i) {
            FakultaetRekursiv(arg);
        }
    }
}
The framework will run your time*() methods a lot of times; moreover, it will inject the different arg values and benchmark them separately.
Always go by the basics! Just use this to find the time taken by each of the functions:
long startTime = System.nanoTime();
methodToTime();
long endTime = System.nanoTime();
long duration = endTime - startTime;

Or, with millisecond resolution:

long start = System.currentTimeMillis();
// your code here
System.out.println(System.currentTimeMillis() - start + "ms");
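Applied to the two methods from the question, a minimal sketch might look like this (no JVM warmup, so treat the numbers as rough; it assumes the methods live in a class named data, as the question says):

public class FakultaetTiming {
    public static void main(String[] args) {
        data d = new data();

        long start = System.nanoTime();
        long loopResult = d.Fakultaet(20); // 20! still fits in a long
        long loopTime = System.nanoTime() - start;

        start = System.nanoTime();
        long recResult = d.FakultaetRekursiv(20);
        long recTime = System.nanoTime() - start;

        System.out.println("Fakultaet: " + loopResult + " in " + loopTime + " ns");
        System.out.println("FakultaetRekursiv: " + recResult + " in " + recTime + " ns");
    }
}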
You can also do it by hand:
The first method's running time can be described by the recurrence T(n) = T(n-1) + c (one multiplication per recursive call), which unfolds into the pattern:
T(n) = T(n-1) + c
     = T(n-2) + 2c
     = T(n-3) + 3c
     . . .
     = T(1) + (n-1)c
which is O(n).
Obviously, the second method is O(n) as well, so they have the same upper bound. But this kind of check is a useful sanity test before implementing a timing solution.
This question is identical to this
Two loop bodies or one (result identical)
but in my case, I use Java.
I have two loops that each run a billion times.
int a = 188, b = 144, aMax = 0, bMax = 0;
for (int i = 0; i < 1000000000; i++) {
    int t = a ^ i;
    if (t > aMax)
        aMax = t;
}
for (int i = 0; i < 1000000000; i++) {
    int t = b ^ i;
    if (t > bMax)
        bMax = t;
}
The time it takes to run these two loops on my machine is approximately 4 seconds. When I fuse them into a single loop and perform all the operations there, it runs in 2 seconds. As you can see, trivial operations make up the loop contents, so each iteration takes constant time.
My question is where am I getting this performance improvement?
I am guessing that the only obvious difference between the two versions is that the separate loops increment i and check i < 1000000000 two billion times in total, versus only one billion times when the loops are fused. Is anything else going on in there?
Thanks!
If you don't run a warm-up phase, it is possible that the first loop gets optimised and compiled but not the second one, whereas when you merge them the whole merged loop gets compiled. Also, with the server option, most of your code gets optimised away because you don't use the results.
I have run the test below, putting each loop as well as the merged loop in its own method and warming up the JVM to make sure everything gets compiled.
Results (JVM options: -server -XX:+PrintCompilation):
loop 1 = 500ms
loop 2 = 900 ms
merged loop = 1,300 ms
So the merged loop is slightly faster, but not that much.
public static void main(String[] args) throws InterruptedException {
    for (int i = 0; i < 3; i++) {
        loop1();
        loop2();
        loopBoth();
    }
    long start = System.nanoTime();
    loop1();
    long end = System.nanoTime();
    System.out.println((end - start) / 1000000);
    start = System.nanoTime();
    loop2();
    end = System.nanoTime();
    System.out.println((end - start) / 1000000);
    start = System.nanoTime();
    loopBoth();
    end = System.nanoTime();
    System.out.println((end - start) / 1000000);
}

public static void loop1() {
    int a = 188, aMax = 0;
    for (int i = 0; i < 1000000000; i++) {
        int t = a ^ i;
        if (t > aMax) {
            aMax = t;
        }
    }
    System.out.println(aMax);
}

public static void loop2() {
    int b = 144, bMax = 0;
    for (int i = 0; i < 1000000000; i++) {
        int t = b ^ i;
        if (t > bMax) {
            bMax = t;
        }
    }
    System.out.println(bMax);
}

public static void loopBoth() {
    int a = 188, b = 144, aMax = 0, bMax = 0;
    for (int i = 0; i < 1000000000; i++) {
        int t = a ^ i;
        if (t > aMax) {
            aMax = t;
        }
        int u = b ^ i;
        if (u > bMax) {
            bMax = u;
        }
    }
    System.out.println(aMax);
    System.out.println(bMax);
}
In short, the CPU can execute the instructions in the merged loop in parallel, doubling performance.
It's also possible the second loop is not optimised efficiently: the first loop triggers compilation of the whole method, so the second loop gets compiled without any profiling metrics, which can upset its timing. I would place each loop in a separate method to make sure this is not the case.
The CPU can perform a large number of independent operations in parallel (a pipeline depth of 10 on the Pentium III and 20 on the Xeon). One thing it attempts to do in parallel is branching, using branch prediction, but this only pays off if the code takes the same branch almost every time.
I suspect that with loop unrolling your loop looks more like the following (possibly unrolled even further in this case):
for (int i = 0; i < 1000000000; i += 2) {
    // this first block is run almost in parallel
    int t1 = a ^ i;
    int t2 = b ^ i;
    int t3 = a ^ (i + 1);
    int t4 = b ^ (i + 1);
    // this block runs in parallel
    if (t1 > aMax) aMax = t1;
    if (t2 > bMax) bMax = t2;
    if (t3 > aMax) aMax = t3;
    if (t4 > bMax) bMax = t4;
}
It seems to me that in the case of a single loop the JIT may opt to do loop unrolling, and as a result the performance is slightly better.
Did you use -server? If not, you should: the client JIT is neither as predictable nor as good. If you are really interested in what exactly is going on, you can use UnlockDiagnosticVMOptions + LogCompilation to check which optimizations are being applied in both cases (all the way down to the generated assembly).
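For example, something like this (the class name is a placeholder; dumping the generated assembly with PrintAssembly additionally requires the hsdis disassembler library to be installed):

java -server -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation YourBenchmark
java -server -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly YourBenchmark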
Also, from the code you provided I can't tell whether you do a warmup, whether you run your test once or multiple times in the same JVM, or whether you did several runs in different JVMs. Are you taking the best, the average, or the median time? Do you throw out outliers?
Here is a good link on the subject of writing Java micro-benchmarks: http://www.ibm.com/developerworks/java/library/j-jtp02225/index.html
Edit: one more microbenchmarking tip: beware of on-stack replacement (OSR): http://www.azulsystems.com/blog/cliff/2011-11-22-what-the-heck-is-osr-and-why-is-it-bad-or-good