In Java, C# or C++, suppose we have a very common situation where we need to iterate a large number of times and execute a function doX, but on only one of the iterations we should execute a function doY instead.
int index = 123456;
for (int i = 0; i < 1000000; i++)
{
    if (i == index) doY();
    else doX();
}
In situations where I see a real performance issue, I usually break the loop in 2, but this can be very painful, especially if the loop's body is large. Does the compiled code really check for the condition on every iteration, or can it be optimized by the compiler? Furthermore, if index is not a constant at compile time, can there be such optimizations?
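For clarity, "breaking the loop in 2" means something like the following sketch (assuming 0 <= index < 1000000, as in the example above):

// Split the loop so the check disappears entirely; doY() runs exactly once.
for (int i = 0; i < index; i++) {
    doX();
}
doY();
for (int i = index + 1; i < 1000000; i++) {
    doX();
}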
This usually won't cause a huge performance issue, thanks to branch prediction. Refer to this famous question.
Branch prediction is basically the CPU's way of guessing which way an if-statement will go before the result is actually known. If it guesses right, the branch takes almost no time. If it guesses wrong, it has to backtrack, which costs performance. The branch predictor generally uses the previously taken route as the "guess" for the next branch.
Since your if-statement evaluates to false nearly every time, the branch predictor will predict correctly nearly every time.
So, to answer your question "Does the compiled code really check for the condition on every iteration?": the check is still executed, but it costs almost nothing. The optimization comes not from the compiler but from the CPU pipeline itself.
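To see the effect the linked question is about, a rough sketch like the one below (not a rigorous benchmark; use JMH for real measurements) will, on many CPUs, run noticeably faster with the Arrays.sort line left in, because the branch becomes almost perfectly predictable. Results vary, since the JIT can sometimes compile the branch away entirely.

import java.util.Arrays;
import java.util.Random;

public class BranchDemo {
    public static void main(String[] args) {
        // Random values in [0, 256); sorting makes the branch below predictable.
        int[] data = new Random(42).ints(1_000_000, 0, 256).toArray();
        Arrays.sort(data); // comment this out to compare the unpredictable case

        long sum = 0;
        long start = System.nanoTime();
        for (int pass = 0; pass < 100; pass++) {
            for (int v : data) {
                if (v >= 128) {   // the branch under test
                    sum += v;
                }
            }
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(elapsedMs + " ms, sum = " + sum);
    }
}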
import java.util.ArrayList;
import java.util.Collections;

public static ArrayList<Integer> duplicates(int[] arr) {
    ArrayList<Integer> doubles = new ArrayList<Integer>();
    boolean isEmpty = true; // no duplicates found yet (also covers an empty input)
    for (int i = 0; i < arr.length; i++) {
        for (int j = i + 1; j < arr.length; j++) {
            if (arr[i] == arr[j] && !doubles.contains(arr[i])) {
                doubles.add(arr[i]);
                isEmpty = false;
                break;
            }
        }
    }
    if (isEmpty) doubles.add(-1);
    Collections.sort(doubles);
    return doubles;
}

public static void main(String[] args) {
    System.out.println(duplicates(new int[]{1, 2, 3, 4, 4, 4})); // prints [4]
}
I made this function in Java which returns the duplicates in an input int array, or -1 if the input array is empty or contains no duplicates.
It works, but there is probably a way to make it faster.
Are there any good practices to make functions more efficient and faster in general?
There are, in broad strokes, 2 completely unrelated performance improvements you can make:
Reduce algorithmic complexity. This is a highly mathematical concept.
Improve the actual performance characteristics - literally, just make it run faster and/or use less memory (often, 'use less memory' and 'go faster' go hand in hand).
The first is easy enough, but can be misleading: You can write an algorithm that does the same job in an algorithmically less complex way which nevertheless actually runs slower.
The second is also tricky: Your eyeballs and brain cannot do the job. The engineers that write the JVM itself are on record as stating that in general they have no idea how fast any given code actually runs. That's because the JVM is way too complicated: It has so many complicated avenues for optimizing how fast stuff runs (not just complicated in the code that powers such things, also complicated in how they work. For example, hotspot kicks in eventually, and uses the characteristics of previous runs to determine how best to rewrite a given method into finely tuned machine code, and the hardware you run it on also matters rather a lot).
This leads to the following easy conclusions:
Don't do anything unless there is an actual performance issue.
You really want a profiler report that actually indicates which code is 'relevant'. Generally, for any given java app, literally 1% of all of your lines of code is responsible for 99% of the load. There is just no point at all optimizing anything, except that 1%. A profiler report is useful in finding the 1% that requires the attention. Java ships with a profiler and there are commercial offerings as well.
If you want to micro-benchmark (time a specific slice of code against specific inputs), that's really difficult too, with many pitfalls. There's really only one way to do it right: Use the Java Microbenchmark Harness.
Whilst you can decide to focus on algorithmic complexity, you may still want a profiler report or JMH run because algorithmic complexity is all about 'Eventually, i.e. with large enough inputs, the algorithmic complexity overcomes any other performance aspect'. The trick is: Are your inputs large enough to hit that 'eventually' space?
For this specific algorithm, given that I have no idea what reasonable inputs might be, you're going to have to do the work on setting up JMH and or profiler runs. However, as far as algorithmic complexity goes:
That doubles.contains call has O(N) algorithmic complexity: The amount of time that call takes is linear relative to how large your inputs are.
You can get O(1) algorithmic complexity if you use a HashSet instead.
From the point of view of just plain performance, an ArrayList generally carries a significant performance and memory overhead compared to an int[].
This gives 2 alternate obvious strategies to optimize this code:
Replace the ArrayList<Integer> with an int[].
Replace the ArrayList<Integer> with a HashSet<Integer> instead.
You can't really combine these two, not without spending a heck of a long time handrolling a primitive int array backed hashbucket implementation. Fortunately, someone did the work for you: Eclipse Collections has a primitive int hashset implementation.
Theoretically it's hard to imagine how replacing this with IntHashSet can be slower. However, I can't go on record and promise you that it'll be any faster: I can imagine if your input is an int array with a few million ints in there, IntHashSet is probably going to be many orders of magnitude faster. But you really need test data and a profiler report and/or a JMH run or we're all just guessing, which is a bad idea, given that the JVM is such a complex beast.
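As a sketch of what that could look like (IntHashSet and its add method come from Eclipse Collections; verify the package path against the version you actually depend on, and note the result is still boxed into an ArrayList at the end to keep the original signature):

import java.util.ArrayList;
import java.util.Collections;
import org.eclipse.collections.impl.set.mutable.primitive.IntHashSet;

public static ArrayList<Integer> duplicatesIntSet(int[] arr) {
    IntHashSet seen = new IntHashSet();
    IntHashSet dupes = new IntHashSet();
    for (int v : arr) {
        if (!seen.add(v)) {   // add returns false if v was already present
            dupes.add(v);
        }
    }
    ArrayList<Integer> result = new ArrayList<>();
    if (dupes.isEmpty()) {
        result.add(-1);
    } else {
        for (int v : dupes.toArray()) {
            result.add(v);    // boxing happens only once, at the very end
        }
    }
    Collections.sort(result);
    return result;
}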
So, if you're serious about optimizing this:
Write a bunch of test cases.
Write a wrapper around this code so you can run those tests in a JMH setup (a minimal sketch follows this list).
Replace the code with IntHashSet and compare that vs. the above in your JMH harness.
If that really improves things and the performance now fits your needs, great. You're done.
If not, you may have to re-evaluate where and how you use this code, or if there's anything else you can do to optimize things.
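A minimal JMH sketch of such a wrapper (this assumes the JMH dependency and annotation processor are set up in your build, and that the original duplicates method and the duplicatesIntSet sketch above are accessible as static methods; the input below is a placeholder, so use data representative of your real workload):

import java.util.Random;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class DuplicatesBenchmark {

    int[] input;

    @Setup
    public void setup() {
        // Placeholder input: 1,000,000 ints between 1 and 1000 inclusive.
        input = new Random(42).ints(1_000_000, 1, 1001).toArray();
    }

    @Benchmark
    public Object arrayListVersion() {
        return duplicates(input);         // the original nested-loop method
    }

    @Benchmark
    public Object intHashSetVersion() {
        return duplicatesIntSet(input);   // the IntHashSet variant sketched above
    }
}

Returning the result from each @Benchmark method keeps the JIT from eliminating the work as dead code.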
It works, but there is probably a way to make it faster.
I think you will find this approach significantly faster. I omitted the sort from both methods just to check. This does not discuss general optimizations, since rzwitserloot's excellent answer already covers that.
The two main problems with your method are:
you are using a nested loop, which is essentially an O(N*N) problem.
and you use contains on a list which must do a linear search each time to find the value.
A better way is to use a HashSet, which offers close to O(1) lookup time (relatively speaking, and depending on the set's threshold values).
The idea is as follows.
Create two sets, one for the result and one for what's been seen.
Iterate over the array.
Try to add each value to the seen set; if add returns true, the value was not already in the seen set, so it is not a duplicate and nothing more is done.
If add returns false, the value is already in the seen set, so it is a duplicate and is added to the duplicates set.
Note the use of the bang ! in the code to invert the above condition.
Once the loop is finished, return the duplicates in a list as required.
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public static List<Integer> duplicatesSet(int[] arr) {
    Set<Integer> seen = new HashSet<>();
    Set<Integer> duplicates = new HashSet<>();
    for (int v : arr) {
        // add returns false if v was already in the set, i.e. v is a duplicate
        if (!seen.add(v)) {
            duplicates.add(v);
        }
    }
    return duplicates.isEmpty()
            ? new ArrayList<>(List.of(-1))
            : new ArrayList<>(duplicates);
}
The sort is easily added back in. That will take additional computing time but that was not the real problem.
To test this I generated a list of random values and put them in an array. The following generates an array of 1,000,000 ints between 1 and 1000 inclusive.
Random r = new Random();
int[] val = r.ints(1_000_000, 1, 1001).toArray();
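A rough way to compare the two versions with that data (both methods are assumed to be in scope; for trustworthy numbers a JMH run is still the way to go, since a single timed call like this ignores warmup and JIT effects):

public static void main(String[] args) {
    int[] val = new java.util.Random().ints(1_000_000, 1, 1001).toArray();

    long t0 = System.nanoTime();
    java.util.List<Integer> a = duplicates(val);     // nested-loop version (may take a while on input this size)
    long t1 = System.nanoTime();
    java.util.List<Integer> b = duplicatesSet(val);  // HashSet version
    long t2 = System.nanoTime();

    System.out.printf("nested loop: %d ms (%d dupes), hash set: %d ms (%d dupes)%n",
            (t1 - t0) / 1_000_000, a.size(), (t2 - t1) / 1_000_000, b.size());
}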
A while back, I was reading up on some Android performance tips when I came by:
Foo[] mArray = ...

public void zero() {
    int sum = 0;
    for (int i = 0; i < mArray.length; ++i) {
        sum += mArray[i].mSplat;
    }
}

public void one() {
    int sum = 0;
    Foo[] localArray = mArray;
    int len = localArray.length;
    for (int i = 0; i < len; ++i) {
        sum += localArray[i].mSplat;
    }
}
Google says:
zero() is slowest, because the JIT can't yet optimize away the cost of getting the array length once for every iteration through the loop.
one() is faster. It pulls everything out into local variables, avoiding the lookups. Only the array length offers a performance benefit.
Which made total sense. But after thinking way too much about my computer architecture exam I remembered Branch Predictors:
a branch predictor is a digital circuit that tries to guess which way a branch (e.g. an if-then-else structure) will go before this is known for sure. The purpose of the branch predictor is to improve the flow in the instruction pipeline.
Isn't the computer assuming i < mArray.length is true and thus computing the loop condition and the body of the loop in parallel (and only mispredicting the branch on the last iteration), effectively removing any performance losses?
I was also thinking about Speculative Execution:
Speculative execution is an optimization technique where a computer system performs some task that may not be actually needed... The objective is to provide more concurrency...
In this case, the computer would be executing the code both as if the loop had finished and as if it was still going concurrently, once again, effectively nullifying any computational costs associated with the condition (since the computer's already performing computations for the future while it computes the condition)?
Essentially what I'm trying to get at is the fact that, even if the condition in zero() takes a little longer to compute than in one(), the computer is usually going to compute the correct branch of code while it's waiting to retrieve the answer to the conditional statement anyway, so the performance loss in the lookup of mArray.length shouldn't matter (that's what I thought, anyway).
Is there something I'm not realizing here?
Sorry about the length of the question.
Thanks in advance.
The site you linked to notes:
zero() is slowest, because the JIT can't yet optimize away the cost of getting the array length once for every iteration through the loop.
I haven't tested on Android, but I'll assume that this is true for now. What this means is that for every iteration of the loop the CPU has to execute code that loads the value of mArray.length from memory. The reason is that mArray is a field that could be pointing at a different array (with a different length) by the next iteration, so the compiler can't treat its length as a constant.
Whereas in the one() option the programmer explicitly sets the len variable based on knowledge that the array length won't change. Since this is a local variable the compiler can store it in a register rather than loading it from memory in each loop iteration. So this will reduce the number of instructions executed in the loop, and it will make the branch easier to predict.
You are right that branch prediction helps reduce the overhead associated with the loop condition check. But there is still a limit to how much speculation is possible so executing more instructions in each loop iteration can incur additional overhead. Also many mobile processors have less advanced branch predictors and don't support as much speculation.
My guess is that on a modern desktop processor using an advanced Java JIT like HotSpot that you would not see a 3X performance difference. But I don't know for certain, it could be an interesting experiment to try.
Am I right in saying that the time complexity in big O notation would just be O(1)?
public boolean size() {
    return (size == 0);
}
Am I right in saying that the time complexity in big O notation would just be O(1)?
No.
This is such a common misconception among students that I can only keep repeating it:
Big-O notation is meant to give the complexity of something, in a certain measure, as a function of some other quantity:
For example, saying:
"The algorithm for in-place FFT has a space requirement of O(n), with n being the number of FFT bins"
says something about how much the FFT will need in memory, observed for different lengths of the FFT.
So, you don't specify:
1. What is the thing you're actually observing? Is it the time between calling and returning from your method? Is it the comparison alone? Is "time" measured in Java bytecode instructions, or real machine cycles?
2. What do you vary? The number of calls to your method? The variable size?
3. What is it that you actually want to know?
I'd like to stress point 3: Computer science students often think that they know how something will behave if they just know the theoretical time complexity of an algorithm. In reality, these numbers tend to mean nothing. And I mean that. A single fetch of a variable that is not in the CPU cache can take as long as 100-10000 additions in the CPU. Calling a method just to see whether something is 0 will take a few dozen instructions if directly compiled, and might take a lot more if you're using something that is (semi-)interpreted like Java; however, in Java, the next time you call that same method, it might already be there as precompiled machine code...
Then, if your compiler is very smart, it might not only inline the function, eliminating the stack save/restore and call/return instructions, but possibly even merging the result into whatever instructions you were conditioning on that return value, which in essence means that this function, in an extreme case, might not take a single cycle to execute.
So, no matter how you put it, you cannot state the "big-O time complexity" of something that is a language-specific feature without saying what you vary and exactly what your platform is.
From Sonar Metrics complexity page the following method has a complexity of 5.
public void process(Car myCar){                                      <- +1
    if(myCar.isNotMine()){                                            <- +1
        return;                                                       <- +1
    }
    myCar.paint("red");
    myCar.changeWheel();
    while(myCar.hasGazol() && myCar.getDriver().isNotStressed()){     <- +2
        myCar.drive();
    }
    return;
}
This is how the tool calculate complexity:
Keywords incrementing the complexity: if, for, while, case, catch,
throw, return (that is not the last statement of a method), &&, ||, ?
Why do case statements, if blocks and while blocks increase the complexity of the method? What is the intuition behind this way of calculating method complexity?
It's because they have conditions in them which increase the number of tests needed to ensure that the code is correct.
Ifs probably add less complexity than loops (while, for). Also read up on cyclomatic complexity, which is related to this.
Read this blog post; it describes the reality of not being able to test everything and the sheer number of tests you would need to cover everything.
Maybe it is based on McCabe's cyclomatic complexity (it at least looks like it).
This metric is widely used in the Software Engineering field.
Take a look at this: http://en.wikipedia.org/wiki/Cyclomatic_complexity
Sonar measures cyclomatic complexity, which represents the number of linearly independent paths through the source code.
The key to answering your question comes from a research paper of Thomas McCabe, published in December of 1976:
It can be shown that the cyclomatic complexity of any structured program with only one entrance point and one exit point is equal to the number of decision points (i.e., 'if' statements or conditional loops) contained in that program plus one.
This is precisely what Sonar does: it finds the decision points, which come from loops, conditional statements, and multipart boolean expressions, and counts their number.
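Applied to the snippet in the question, that is exactly how the tool arrives at 5: the method itself contributes the "plus one", the if, the while, and the && are the decision points, and Sonar additionally counts the early return, giving 1 + 1 + 1 + 1 + 1 = 5.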
What is the complexity of a program having only one loop? Is it O(log n)?
Can someone give me some ideas about estimating the complexity of code?
Well, that really depends on what is going on in that loop.
This loop is linear time, i.e., O(n):
int sum = 0;
foreach( int i in SomeCollection )
{
    sum += i;
}
However, consider a loop which performs a substring search during each iteration. Now you have to consider the complexity of the string searching algorithm. Your question can't be answered as it stands. You will need to provide a code sample if you want a meaningful answer.
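As an illustration of the substring-search point, here is a sketch (names are illustrative): the loop below looks like a single linear pass, but each call to String.contains scans the text, so the total cost is roughly O(n * m) rather than O(n).

import java.util.List;

// n = words.size(), m = text.length(); the contains call hides a second loop.
static int countContained(List<String> words, String text) {
    int count = 0;
    for (String word : words) {
        if (text.contains(word)) {
            count++;
        }
    }
    return count;
}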
Software Complexity
There is a field of study that attempts to quantify exactly that.
It's called cyclomatic complexity.
In this case, I believe your complexity rating would be p = 2, because the loop is conditional and that means there are two paths through the method.
Time complexity
If you are referring to time complexity, it's still just O(1) unless the loop iteration count is derived via a polynomial or exponential function, perhaps indirectly because the method is itself called in a higher loop, or because the loop counter is effectively multiplied by string operations or something at a lower level.
To get the Big-O complexity of a piece of code, you need to ask yourself "how many iterations are done"?
The problem is, of course, that it isn't always easy to figure it out from the code itself; sometimes it's just better to look at the big picture and calculate the number of operations done.
Examples:
for (int i = 0; i < n; i++) {
    // Do stuff
}
This is O(n) complexity, because the loop body runs n times.
for (int i = n; i > 0; i = i / 2) {
    // Do stuff
}
This is O(log n) complexity, because in each iteration i is cut in half.
void Mess(int n) {
    for (int i = 0; i < n; i++) {
        // Do stuff
        Mess(n - 1);
    }
}
Now this looks like a simple loop, but because it calls itself recursively, it's actually quite a mess... Each call with n runs n iterations, and each iteration calls Mess(n-1).
So here it would be easier to think from the end. If n == 1, there's 1 iteration. If n == 2, it runs the previous scenario twice.
So if we call the function with n, we can expand this recursively: T(n) = n * T(n-1) = n * (n-1) * T(n-2) = ... = n * (n-1) * (n-2) * ... * 1.
Which in the end will of course give us n!
Bottom line, it's not always trivial.
If there's just one loop, it's probably the number of times that loop's body executes.... But of course you may have many hidden loops in library calls. It's easy to make a loop that executes n times O(n^2) or worse if you have strlen, memcpy, etc. taking place inside the loop body.
Or even worse, if you have a library (or language) with regular expressions and their regex implementation is naive (like Perl), it's easy to make a program with just a single loop O(2^n)!! This cannot happen with a correctly written regex implementation.
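A Java flavour of the hidden-loop trap, as a sketch: a single visible loop whose body silently copies everything accumulated so far, making the whole thing O(n^2). A StringBuilder avoids this.

// One visible loop, quadratic total work: each += copies the result built so far.
static String joinSlow(String[] parts) {
    String result = "";
    for (String part : parts) {
        result += part;   // hidden copy of result on every iteration
    }
    return result;
}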
You can estimate the computation time and complexity of your code with tools such as "trend-prof" (https://pdfs.semanticscholar.org/8616/7f320e230641299754e0fbd860c44f5895f0.pdf).
For the same purpose with R code, there is the GuessCompx library, available on GitHub.
If it is Big-O time complexity you are asking about, then for a loop it is n times the complexity of whatever is inside the loop, where n is the number of iterations.
So, if the code inside the loop takes constant time to execute, i.e. if its time complexity is O(1), then the complexity of the loop will be O(1*n) = O(n).
In a similar way, if within the loop you have another loop which makes m steps, then your entire code is O(n*m), and so on.
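As a small illustration of that multiplication rule (a sketch):

// The body is O(1), the inner loop runs m times, the outer loop n times,
// so the whole method is O(n * m).
static long countPairs(int n, int m) {
    long count = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            count++;
        }
    }
    return count;
}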