Time complexity measure of JDK class methods - java

Is there an established way of measuring (or getting an existing measure of) a JDK class method's complexity? Is javap representative of time complexity, and to what degree? In particular, I am interested in the complexity of Arrays.sort() but also some other collection-manipulation methods.
E.g. I am trying to compare two implementations for performance: one uses Arrays.sort() and one doesn't. The javap disassembly for the one that doesn't shows a lot more steps (twice as many), but I am not sure whether the one that does use it excludes the Arrays.sort() steps. IOW, does javap output for a method recursively include the methods invoked within it, or just the method itself?
Also, is there a way, without modifying and recompiling the Java code itself, to find out how many loop iterations were performed when a certain core Java method was invoked with specific parameters? E.g. measure the number of iterations of Arrays.sort('A', 'r', 'T', 'f')?

I would not expect javap to be even a little bit representative of actual speed.
The Javadoc specifies the algorithmic complexity, but if you care about constant factors, there is no realistic way to compare them except with actual benchmarks.
You can't get any information on what was done when Arrays.sort is called on a primitive array, but by passing a custom Comparator that counts the number of calls, you can count the number of comparisons made when sorting an object array. (That said, object arrays are sorted with a different sorting algorithm -- specifically a stable one -- and primitive arrays are sorted with a Quicksort variant.)
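For example, a minimal sketch of that counting-Comparator approach (the class name and the boxed Character[] are just for illustration, since the primitive char[] overload accepts no Comparator):

import java.util.Arrays;
import java.util.Comparator;
import java.util.concurrent.atomic.AtomicLong;

public class ComparisonCounter {
    public static void main(String[] args) {
        Character[] input = {'A', 'r', 'T', 'f'};
        AtomicLong comparisons = new AtomicLong();

        // wrap the natural ordering in a Comparator that counts each call
        Comparator<Character> counting = (a, b) -> {
            comparisons.incrementAndGet();
            return a.compareTo(b);
        };

        Arrays.sort(input, counting);
        System.out.println("comparisons: " + comparisons.get());
    }
}

This counts comparisons rather than loop iterations, but for a comparison sort that is usually the more meaningful number anyway.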

You can use the output from javap to determine where loops occur: you want to find the goto instruction. This post gives a comprehensive explanation of that identification.
From the post:
Before considering any loop start/exit instrumentation, you should
look into the definitions of what entry, exit and successors are.
Although a loop will only have one entry point, it may have multiple
exit points and/or multiple successors, typically caused by break
statements (sometimes with labels), return statements and/or
exceptions (explicitly caught or not). While you haven't given details
regarding the kind of instrumentations you're investigating, it's
certainly worth considering where you want to insert code (if that's
what you want to do). Typically, some instrumentation may have to be
done before each exit statement or instead of each successor statement
(in which case you'll have to move the original statement).
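To see the pattern yourself, here is a trivial method to disassemble (a toy example; compile with javac Loop.java, then run javap -c Loop):

public class Loop {
    static int sum(int n) {
        int s = 0;
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }
}

In the javap -c output, the loop shows up as a conditional branch (if_icmpge) guarding the body and a backward goto at the end of the body jumping back to the comparison.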

Arrays.sort() for primitives uses a tuned quicksort. For Object[] it uses a mergesort (but this depends on the implementation).
From: Arrays
For example, the algorithm used by sort(Object[]) does not have to be a mergesort, but it does have to be stable.
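To see what the stability guarantee means in practice, here is a small demonstration (Entry is a hypothetical record introduced only for this example):

import java.util.Arrays;
import java.util.Comparator;

public class StableSortDemo {
    record Entry(String key, int order) {}

    public static void main(String[] args) {
        Entry[] entries = {
            new Entry("b", 1), new Entry("a", 2),
            new Entry("b", 3), new Entry("a", 4)
        };

        // sort(Object[]) is guaranteed stable: entries with equal keys
        // keep their original relative order (1 before 3, 2 before 4)
        Arrays.sort(entries, Comparator.comparing(Entry::key));
        System.out.println(Arrays.toString(entries));
    }
}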

Related

Why are there differences in the executed bytecode of a java program logged with -XX:TraceBytecodes

I'm trying to understand how the Java interpreter works.
To see exactly which bytecodes are executed, I built myself a JDK fastdebug build and used the -XX:+TraceBytecodes option.
Additionally, I turned off the JIT compiler with -XX:-UseCompiler.
My expectation was that the bytecodes would be the same across multiple runs of the same program. Instead, I noticed that there are always differences: some bytecode parts get executed earlier or later, and the total number of bytecodes differs from run to run.
Why is that?
To my knowledge, the Java interpreter cannot optimize the code and always runs the same instructions in the same order on every run.
Edit:
public class TestSimple2 {
    public static void main(String[] args) throws Exception {
        System.out.println("start prog");
        System.out.println("end prog");
    }
}
Code execution is not always deterministic and in this specific case, it’s deliberate. However, the methods shown in the trace are not invoked by your code, so this must be part of the internal startup/class initialization code.
Apparently, the code in question iterates over a Set created via one of the Set.of(…) methods introduced with Java 9, with more than two elements.
In this case, the implementation randomizes the iteration order. As Stuart Marks, one of the core developers, explains in this answer:
Hashed Collection Iteration Order. The new Set.of and Map.of structures randomize their iteration order. The iteration order of HashSet and HashMap is undefined, but in practice it turns out to be relatively stable. Code can develop inadvertent dependencies on iteration order. Switching to the new collection factories may expose old code to iteration order dependencies, surfacing latent bugs.
In another answer, he also explains:
In any case, another reason for randomized iteration order is to preserve flexibility for future implementation changes.
This turns out to be a bigger deal than most people think. Historically, HashSet and HashMap have never specified a particular iteration order. From time to time, however, the implementation needed to change, to improve performance or to fix bugs. Any change to iteration order generated a lot of flak from users. Over the years, a lot of resistance built up to changing iteration order, and this made maintenance of HashMap more difficult.
You can read the linked answer for more details regarding the motivation, but one implementation detail is important to understand the difference in the trace of executed byte code instructions:
… Initially the order changed on every iteration, but this imposed some overhead. Eventually we settled on once per JVM invocation. The cost is a 32-bit XOR operation per table probe, which I think is pretty cheap.
This has changed slightly between Java 9 and recent versions: the former used int idx = Math.floorMod(pe.hashCode() ^ SALT, elements.length); when probing for a location, e.g. within contains, whereas newer versions use idx = Math.floorMod(SALT, table.length >> 1) << 1; when initializing an iterator with a starting point.
In either case, we end up calling Math.floorMod at some point with a value depending on SALT, which is the value that differs in each JVM invocation. floorMod invokes floorDiv internally, which is implemented as
public static int floorDiv(int x, int y) {
    int r = x / y;
    // if the signs are different and modulo not zero, round down
    if ((x ^ y) < 0 && (r * y != x)) {
        r--;
    }
    return r;
}
So here we have a conditional depending on the incoming value, hence on the SALT, which is the reason why we see different sequences of executed bytecode: sometimes the branch is taken and sometimes not. Note that the last instruction before the difference is ifeq, a conditional branch.
For the difference in the execution of the next method, we have to refer to yet another answer:
The current implementation of SetN is a fairly simple closed hashing scheme, as opposed to the separate chaining approach used by HashMap.
…
Thus we have a classic space-time tradeoff. If we make the table larger, there will be empty slots sprinkled throughout the table. When storing items, there should be fewer collisions, and linear probing will find empty slots more quickly.
…
In bringing up the implementation, we ran a bunch of benchmarks using different expansion factors. […] We chose 2.0 since it got most of the performance improvement (close to O(1) time) while providing good space savings compared to HashSet.
So the internal array is twice as large as the Set’s actual size and contains null entries that have to be skipped when iterating. When we take into account that the iteration order has been randomized, it’s clear that this code may encounter the empty array slots at different times, hence, also cause differences in the reported executed byte code.
Note that the last instruction before the difference is ifnonnull, a conditional branch taken when the tested value is not null. Since the code between the branch instruction and its target bears an invocation of nextIndex(), I suppose, you ran the code under a JRE newer than Java 9¹.
¹ The difference is that Java 9 randomizes the actual array locations, which adds costs to the probing in the contains method, whereas newer versions use only hash code based array locations, but randomize the order right in the iterator, by using a SALT dependent starting index and direction, which adds slight costs to the iterator initialization instead.
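You can observe the per-invocation randomization directly with a trivial demo class (assuming Java 9 or newer):

import java.util.Set;

public class IterationOrderDemo {
    public static void main(String[] args) {
        // with more than two elements, Set.of applies the SALT-based
        // randomization described above
        Set<String> set = Set.of("a", "b", "c", "d", "e");
        System.out.println(set); // order is stable within one run...
        System.out.println(set); // ...but usually differs between runs
    }
}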

Java collection performance when comparing items

A basic performance question from someone coming from C/C++.
I'm using a Collection (an ArrayDeque) simply to hold, add, and remove items by identity. I know the contract is for the collection to use equals() when checking equality, for example during remove(obj), but in my case I want reference semantics (like IdentityHashMap, but I don't need the map). So I am fine with simply knowing that I will never override equals() on any of the objects held inside the collection (which is declared to hold an interface type).
Coming from native programming, I can't avoid asking myself: will the compiled code of remove(obj) traverse items and perform a virtual call on Object.equals() only to end up comparing addresses? Since I'm storing interface references, there is no way (?) to optimise this using final so the compiler doesn't bother making the useless calls (i.e. inlines them) - but now I'm getting ahead of myself, because it may be that such optimisation is not necessary anyway and the JVM has other means (devirtualisation?) to generate optimal code in this case.
Assuming my code needs the level of optimisation that can be obtained by thinking about this aspect in the first place - is my understanding correct? What is a good design for this case?
Making the method final won't avoid the virtual call: javac emits the invokevirtual opcode either way, so finality doesn't change the call site.
The good news is that the JVM might be able to inline the call or avoid the virtual dispatch if it can see that the method is not overridden by any loaded class, so your performance will improve as your program runs.
When you use the remove method, it will call the equals method for comparison. Normally you would override equals and hashCode before relying on such methods; otherwise the default Object.equals implementation, a plain reference comparison, is used. It is generally recommended to provide your own equals and hashCode implementations when storing objects in collections.
Regarding the performance: yes, you are right, all the objects in the collection will be scanned linearly until a match is found. It is a linear search, so this removal operation takes O(n) time.
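If you want the identity semantics to be explicit rather than relying on the default Object.equals, one option (a sketch, not a performance improvement; the scan is still O(n)) is removeIf with a reference comparison:

import java.util.ArrayDeque;

public class IdentityRemoveDemo {
    public static void main(String[] args) {
        ArrayDeque<Object> deque = new ArrayDeque<>();
        Object a = new Object();
        Object b = new Object();
        deque.add(a);
        deque.add(b);

        // remove(obj) goes through equals(); removeIf with == makes the
        // reference comparison explicit
        deque.removeIf(x -> x == a);

        System.out.println(deque.size()); // prints 1
    }
}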

How to decide between lambda iteration and normal loop?

Since the introduction of Java 8 I have gotten really hooked on lambdas and started using them whenever possible, mostly to get accustomed to them. One of the most common usages is when we want to iterate over and act upon a collection of objects, in which case I resort to either forEach or stream(). I rarely write the old for(T t : Ts) loop and I have almost forgotten about the for(int i = 0.....).
However, we were discussing this with my supervisor the other day, and he told me that lambdas aren't always the best choice and can sometimes hinder performance. From a lecture I had seen on this new feature, I got the feeling that lambda iterations are always fully optimized by the compiler and will (always?) be better than bare iterations, but he begs to differ. Is this true? If yes, how do I distinguish the best solution in each scenario?
P.S: I'm not talking about cases where it is recommended to apply parallelStream. Obviously those will be faster.
Performance depends on so many factors that it's hard to predict. Normally, we would say: if your supervisor claims that there is a problem with performance, your supervisor is in charge of explaining what that problem is.
One thing someone might be afraid of is that, behind the scenes, a class is generated for each lambda creation site (with the current implementation), so if the code in question is executed only once, this might be considered a waste of resources. This harmonizes with the fact that lambda expressions have a higher initialization overhead than ordinary imperative code (we are not comparing to inner classes here), so inside class initializers, which only run once, you might consider avoiding them. This is also in line with the fact that you should never use parallel streams in class initializers, so that potential advantage isn't available there anyway.
For ordinary, frequently executed code that is likely to be optimized by the JVM, these problems do not arise. As you correctly supposed, classes generated for lambda expressions get the same treatment (optimizations) as other classes. In these places, calling forEach on collections has the potential to be more efficient than a for loop.
The temporary object instances created for an Iterator or the lambda expression are negligible; however, it is worth noting that a for-each loop will always create an Iterator instance, whereas a forEach call does not always do so. While the default implementation of Iterable.forEach will create an Iterator as well, some of the most often used collections take the opportunity to provide a specialized implementation, most notably ArrayList.
The ArrayList’s forEach is basically a for loop over an array, without any Iterator. It then invokes the accept method of the Consumer, which will be a generated class containing a trivial delegation to the synthetic method holding the code of your lambda expression. To optimize the entire loop, the horizon of the optimizer has to span the ArrayList’s loop over an array (a common idiom recognizable to an optimizer), the synthetic accept method containing a trivial delegation, and the method containing your actual code.
In contrast, when iterating over the same list using a for-each loop, an Iterator implementation is created containing the ArrayList iteration logic, spread over two methods, hasNext() and next(), plus the instance variables of the Iterator. The loop repeatedly invokes hasNext() to check the end condition (index < size) and next(), which rechecks the condition before returning the element, as there is no guarantee that the caller properly invokes hasNext() before next(). Of course, an optimizer is capable of removing this duplication, but that requires more effort than not having it in the first place. So to get the same performance as the forEach method, the optimizer’s horizon has to span your loop code, the nontrivial hasNext() implementation, and the nontrivial next() implementation.
Similar things may apply to other collections having a specialized forEach implementation as well. This also applies to Stream operations, if the source provides a specialized Spliterator implementation, which does not spread the iteration logic over two methods like an Iterator.
So if you want to discuss the technical aspects of the for-each loop vs. forEach(…), you may use this information.
But as said, these aspects describe only potential performance differences, as the work of the optimizer and other aspects of the runtime environment may change the outcome completely. I think, as a rule of thumb: the smaller the loop body/action is, the more appropriate the forEach method. This harmonizes perfectly with the guideline of avoiding overly long lambda expressions anyway.
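To make the two idioms concrete, here is a minimal side-by-side (class and variable names are just for illustration):

import java.util.ArrayList;
import java.util.List;

public class LoopStyles {
    public static void main(String[] args) {
        List<String> names = new ArrayList<>(List.of("alpha", "beta", "gamma"));

        // for-each loop: compiles to Iterator-based iteration,
        // calling hasNext()/next() for every element
        for (String name : names) {
            System.out.println(name);
        }

        // forEach: ArrayList's override is a plain counted loop over
        // its backing array, with no Iterator involved
        names.forEach(System.out::println);
    }
}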
It depends on the specific implementation.
In general, the forEach method and a for-each loop over an Iterator usually have pretty similar performance, as they work at a similar level of abstraction. stream() is usually slower (often by 50-70%), as it adds another layer that mediates access to the underlying collection.
The advantages of stream() generally are the possible parallelism and the easy chaining of operations, with a lot of reusable ones provided by the JDK.

Can Java compiler optimize adding to a set in recursive methods

A simple question asked mostly out of curiosity about what Java compilers are smart enough to do. I know not all compilers are built equally, but I'm wondering whether others feel it's reasonable to expect this optimization on most compilers I'm likely to run against, not whether it works on a specific version or on all versions.
So let's say I have some tree structure and I want to collect all the descendants of a node. There are two easy ways to do this recursively.
The more natural method, for me, would be something like this:
public Set<Node> getDescendants(){
    Set<Node> descendants = new HashSet<Node>();
    descendants.addAll(getChildren());
    for (Node child : getChildren()) {
        descendants.addAll(child.getDescendants());
    }
    return descendants;
}
However, assuming no compiler optimizations and a decent-sized tree, this could get rather expensive. On each recursive call I create and fully populate a set, only to return that set up the stack so the calling method can add its contents to its own version of the descendants set, discarding the version that was just built and populated in the recursive call.
So now I'm creating many sets just to have them discarded as soon as I return their contents. Not only do I pay a minor initialization cost for building the sets, but I also pay the more substantial cost of moving all the contents of one set into the larger set. In large trees, most of my time is spent moving Nodes around in memory from set A to set B. I think this even makes my algorithm O(n^2) instead of O(n) due to the time spent copying Nodes, though it may work out to O(n log(n)) if I sat down to do the math.
I could instead have a simple getDescendants method that calls a helper method that looks like this:
public Set<Node> getDescendants(){
    Set<Node> descendants = new HashSet<Node>();
    getDescendantsHelper(descendants);
    return descendants;
}

public Set<Node> getDescendantsHelper(Set<Node> descendants){
    descendants.addAll(getChildren());
    for (Node child : getChildren()) {
        child.getDescendantsHelper(descendants);
    }
    return descendants;
}
This ensures that I only ever create one set and I don't have to waste time copying from one set to another. However, it requires writing two methods instead of one and generally feels a little more cumbersome.
The question is, do I need to do option two if I'm worried about optimizing this sort of method? or can I reasonably expect the java compiler, or JIT, to recognize that I am only creating temporary sets for convenience of returning to the calling method and avoid the wasteful copying between sets?
edit: cleaned up a bad copy-paste job which led to my sample method adding everything twice. You know something is bad when your 'optimized' code is slower than your regular code.
The question is, do I need to do option two if I'm worried about optimizing this sort of method?
Definitely yes. If performance is a concern (and most of the time it is not!), then you need it.
The compiler optimizes a lot, but on a very different scale. Basically, it works with one method only, and it optimizes the most commonly used path therein. Due to heavy inlining it can sort of optimize across method calls, but nothing like the above.
It can also optimize away needless allocations, but only in very simple cases. Maybe something like
int sum(int... a) {
    int result = 0;
    for (int x : a) result += x;
    return result;
}
Calling sum(1, 2, 3) means allocating int[3] for the varargs arguments and this can be eliminated (if the compiler really does it is a different question). It can even find out that the result is a constant (which I doubt it really does). If the result doesn't get used, it can perform dead code elimination (this happens rather often).
Your example involves allocating a whole HashSet and all its entries, and is several orders of magnitude more complicated. The compiler has no idea how a HashSet works, and it can't find out, e.g., that after m.addAll(m1) the set m contains all members of m1. No way.
This is an algorithmic optimization rather than a low-level one. That's what humans are still needed for.
For things the compiler could do (but currently fails to), see e.g. these questions of mine concerning associativity and bounds checks.
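If you do adopt option two, here is a slightly tidier shape of the same idea (a sketch; collectDescendants is a hypothetical name, and the helper is private and void since its return value was never used):

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Node {
    private final List<Node> children = new ArrayList<>();

    List<Node> getChildren() { return children; }

    // one set is created up front and threaded through the recursion,
    // so nothing is ever copied from one set into another
    Set<Node> getDescendants() {
        Set<Node> descendants = new HashSet<>();
        collectDescendants(descendants);
        return descendants;
    }

    private void collectDescendants(Set<Node> descendants) {
        for (Node child : getChildren()) {
            descendants.add(child);
            child.collectDescendants(descendants);
        }
    }
}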

Is it faster to create a new object reference if it will only be used twice?

I have a question about instruction optimization. If an object is to be used in two statements, is it faster to create a new object reference or should I instead call the object directly in both statements?
For the purposes of my question, the object is part of a Vector of objects (this example is from a streamlined version of Java without ArrayLists). Here is an example:
AutoEvent ptr = ((AutoEvent)e_autoSequence.elementAt(currentEventIndex));
if(ptr.exitConditionMet()) {currentEventIndex++; return;}
ptr.registerSingleEvent();
AutoEvent is the class in question, and e_autoSequence is the Vector of AutoEvent objects. The AutoEvent contains two methods in question: exitConditionMet() and registerSingleEvent().
This code could, therefore, alternately be written as:
if(((AutoEvent)e_autoSequence.elementAt(currentEventIndex)).exitConditionMet())
{currentEventIndex++; return;}
((AutoEvent)e_autoSequence.elementAt(currentEventIndex)).registerSingleEvent();
Is this faster than the above?
I understand that the casting process is slow, so this question is actually twofold: additionally, in the case where I am not casting the object, which version would be more highly optimized?
Bear in mind this is solely for two uses of the object in question.
The first solution is better all round:
Only one call to the vector elementAt method. This is actually the most expensive operation here, so only doing it once is a decent performance win. Also doing it twice potentially opens you up to some race conditions.
Only one cast operation. Casts are very cheap on modern JVMs, but still have a slight cost.
It's more readable IMHO. You are getting an object then doing two things with it. If you get it twice, then the reader has to mentally figure out that you are getting the same object. Better to get it once, and assign it to a variable with a good name.
A single assignment of a local variable (like ptr in the first solution) is extremely cheap and often free - the Java JIT compiler is smart enough to produce highly optimised code here.
P.S. Vector is pretty outdated. Consider converting to an ArrayList<AutoEvent>. By using the generic ArrayList you won't need to cast explicitly, and it is much faster than a Vector (because it isn't synchronised and therefore has less locking overhead).
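A sketch of that conversion (the AutoEvent methods are taken from the question; AutoSequenceRunner and step are hypothetical names):

import java.util.ArrayList;
import java.util.List;

interface AutoEvent {
    boolean exitConditionMet();
    void registerSingleEvent();
}

class AutoSequenceRunner {
    private final List<AutoEvent> e_autoSequence = new ArrayList<>();
    private int currentEventIndex;

    void step() {
        // one lookup, no cast, result held in a cheap local variable
        AutoEvent event = e_autoSequence.get(currentEventIndex);
        if (event.exitConditionMet()) {
            currentEventIndex++;
            return;
        }
        event.registerSingleEvent();
    }
}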
The first solution will be faster.
The reason is that an assignment is faster than a method invocation.
In the second case, the method elementAt() is invoked twice, which makes it slower, and the JVM will probably not be able to optimize this code because it doesn't know what exactly happens inside elementAt().
Also remember that Vector's methods are synchronized, which makes every method invocation even slower due to lock acquisition.
I don't know what you mean by "create a new object reference" here. The code ((AutoEvent)e_autoSequence.elementAt(currentEventIndex)) will probably be translated into bytecode that obtains the sequence element, casts it to AutoEvent, and stores the resulting reference on the stack. The local variable ptr, like other local variables, is stored on the stack too, so assigning the reference to it is just copying 4 bytes from one stack slot to another, nearby slot. This is a very fast operation. Modern JVMs do not do reference counting, so assigning references is probably as cheap as assigning int values.
Let's get some terminology straight first. Your code does not "create a new object reference". It fetches an existing object reference (either once or twice) from a Vector.
To answer your question, it is (probably) a little bit faster to fetch once and put the reference into a temporary variable. But the difference is small, and unlikely to be significant unless you do it lots of times in a loop.
(The elementAt method on a Vector or ArrayList is O(1) and cheap. If the list was a linked list, which has an O(N) implementation for elementAt, then that call could be expensive, and the difference between making 1 or 2 calls could be significant ...)
Generally speaking, you should think about the complexity of your algorithms, but beyond that you shouldn't spend time optimizing ... until you have solid profiling evidence to tell you where to optimize.
I can't say whether ArrayList would be more appropriate. This could be a case where you need the thread-safety offered by Vector.
