Overuse of Method-chaining in Java

I see a lot of this kind of code written by Java developers and Java instructors:
for (int x = 0; x < myArray.length; x++)
    accum += (mean() - myArray[x]) * (mean() - myArray[x]);
I am very critical of this because mean() is being invoked twice for every element in the array, when it only has to be invoked once:
double theMean = mean();
for (int x = 0; x < myArray.length; x++)
    accum += (theMean - myArray[x]) * (theMean - myArray[x]);
Is there something about optimization in Java that makes the first example acceptable? Should I stop riding developers about this?
*** More information. An array of samples is stored as an instance variable. mean() has to traverse the array and calculate the mean every time it is invoked.
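For concreteness, here is a minimal sketch of the kind of class being discussed (all names hypothetical): the samples live in an instance field, and mean() re-traverses them on every call, so invoking it twice per element makes the whole computation quadratic instead of linear.

```java
// Hypothetical sketch of the class under discussion: the samples are an
// instance field, and mean() is O(n) because it re-traverses the array
// on every invocation.
public class SampleStats {
    private final double[] myArray;

    public SampleStats(double[] samples) {
        this.myArray = samples;
    }

    public double mean() {
        double sum = 0;
        for (int x = 0; x < myArray.length; x++) {
            sum += myArray[x];
        }
        return sum / myArray.length;
    }

    // Sum of squared deviations with mean() hoisted out of the loop:
    // one O(n) mean() call instead of 2n of them.
    public double sumSquaredDeviations() {
        double theMean = mean();
        double accum = 0;
        for (int x = 0; x < myArray.length; x++) {
            accum += (theMean - myArray[x]) * (theMean - myArray[x]);
        }
        return accum;
    }
}
```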

You are right. Your way (second code sample) is more efficient. I don't think Java can optimize the first code sample to call mean() just once and re-use its return value, since mean() might have side effects, so the compiler can't decide to call it once if your code calls it twice.

Leave your developers alone, it's fine -- it's readable and it works, without introducing unnecessary names and variables.
Optimization should only ever be done under the guidance of a performance monitoring tool which can show you where you're actually slow. And, typically, performance is enhanced more effectively by considering the large scale architecture of an application, not line by line bytecode optimization, which is expensive and usually unhelpful.

Your version will likely run faster, though an optimizing compiler may be able to detect if the mean() method returns the same value every time (e.g. if the value is hard-coded or stored in a field) and eliminate the method call.
If you are recommending this change for efficiency reasons, you may be falling foul of premature optimization. You don't really know where the bottlenecks are in your system until you measure in the appropriate environment under appropriate loads. Even then, improved hardware is often a more cost-effective solution than developer time.
If you are recommending it because it will eliminate duplication then I think you might be on stronger ground. If the mean() method took arguments too, it would be especially reasonable to pull that out of the loop and call the method once and only once.

Yes, some compilers will optimize this to just what you say.
Yes, you should stop riding developers about this.
I think your preferred way is better, but not mostly because of the optimization. It is more clear that the value is the same in both places if it does not involve a method call, particularly in cases where the method call is more complex than the one you have here.
For that matter, I think it's better to write
double theMean = mean();
for (int x = 0; x < myArray.length; x++) {
    double curValue = myArray[x];
    double toSquare = theMean - curValue;
    accum += toSquare * toSquare;
}
Because it makes it easier to determine that you are squaring whatever is being accumulated, and just what it is that's being squared.

Normally the compiler will not optimize away the method call, since it cannot know whether the return value would be the same (this is especially true when mean() processes an array, as the compiler has no way of checking whether the result can be cached). So yes, the mean() method would be invoked twice.
In this case, if you know for sure that the array stays the same regardless of the values of x and accum in the loop (more generally, regardless of any change in the program state), then the second code is more efficient.
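To see the asymptotic difference the question is pointing at, here is a rough sketch (made-up array size; not a rigorous benchmark, since there is no JIT warmup and only a single run): with mean() being O(n), the un-hoisted loop is O(n^2) overall while the hoisted one is O(n). Both compute exactly the same sum.

```java
import java.util.Arrays;

// Rough illustration, not a rigorous benchmark (no warmup, single run).
// The un-hoisted loop calls the O(n) mean() twice per element, so it is
// O(n^2) overall; the hoisted version is O(n). Both produce the same sum.
public class HoistDemo {
    static double[] myArray = new double[10_000];

    static double mean() {
        double sum = 0;
        for (double v : myArray) sum += v;
        return sum / myArray.length;
    }

    static double unhoisted() {
        double accum = 0;
        for (int x = 0; x < myArray.length; x++)
            accum += (mean() - myArray[x]) * (mean() - myArray[x]);
        return accum;
    }

    static double hoisted() {
        double theMean = mean();
        double accum = 0;
        for (int x = 0; x < myArray.length; x++)
            accum += (theMean - myArray[x]) * (theMean - myArray[x]);
        return accum;
    }

    public static void main(String[] args) {
        Arrays.setAll(myArray, i -> i % 100);

        long t0 = System.nanoTime();
        double a = unhoisted();
        long t1 = System.nanoTime();
        double b = hoisted();
        long t2 = System.nanoTime();

        System.out.println("same result: " + (a == b));
        System.out.println("un-hoisted: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("hoisted:    " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```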

Related

Performance loss of continued call to array.length or list.size()

I have seen people say to cache the values of size for a list or length for an array when iterating, to save the time of checking the length/size over and over again.
So
for (int i = 0; i < someArr.length; i++) // do stuff
for (int i = 0; i < someList.size(); i++) // do stuff
Would be turned into
for (int i = 0, length = someArr.length; i < length; i++) // do stuff
for (int i = 0, size = someList.size(); i < size; i++) // do stuff
But since Array#length isn't a method, just a field, shouldn't it not have any difference? And if using an ArrayList, size() is just a getter so shouldn't that also be the same either way?
It is possible the JIT compiler will do some of those optimizations for itself. Hence, doing the optimizations by hand may be a complete waste of time.
It is also possible (indeed likely) that the performance benefit you are going to get from hand optimizing those loops is too small to be worth the effort. Think of it this way:
Most of the statements in a typical program are only executed rarely
Most loops will execute in a few microseconds or less.
Hand optimizing a program takes on the order of minutes or hours of developer time.
If you spend minutes to get an execution speedup that is measured in microseconds, you are probably wasting your time. Even thinking about it too long is wasting time.
The corollary is that:
You should benchmark your code to decide whether you need to optimize it.
You should profile your code to figure out which parts of your code are worth spending optimization effort on.
You should set (realistic) performance goals, and stop optimization when you reach those goals.
Having said all of that:
theArr.length is very fast, probably just a couple of machine instructions
theList.size() will probably also be very fast, though it depends on what List class you are using.
For an ArrayList the size() call is probably a method call + a field fetch versus a field fetch for length.
For an ArrayList the size() call is likely to be inlined by the JIT compiler ... assuming that the JIT compiler can figure that out.
The JIT compiler should be able to hoist the length fetch out of the loop. It can probably deduce that it doesn't change in the loop.
The JIT compiler might be able to hoist the size() call, but it will be harder for it to deduce that the size doesn't change.
What this means is that if you do hand optimize those two examples, you will most likely get negligible performance benefit.
In general the loss is negligible. Even LinkedList.size() uses a stored count and does not iterate over all the nodes.
For very large loops you might still hoist the call yourself rather than assume the JIT's compilation to machine code will catch it.
If the size is changed inside the loop (delete/insert), the cached size variable must be updated too, which makes the code more fragile.
The best would be to use a for-each
for (Bar bar: bars) { ... }
You might also use the somewhat more costly Stream:
barList.forEach(bar -> ...);
Stream.of(barArray).forEach(bar -> ...);
Streams can be executed in parallel.
barList.parallelStream().forEach(bar -> ...);
And last but not least you may use standard java code for simple loops:
Arrays.setAll(barArray, i -> ...);
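Pulling the suggestions above together, here is a small runnable sketch (the Bar/bars names from the answer are replaced with concrete hypothetical data so it compiles on its own):

```java
import java.util.Arrays;
import java.util.List;

// The iteration styles suggested above, in one runnable sketch.
public class LoopStyles {
    public static void main(String[] args) {
        List<String> barList = List.of("a", "b", "c");
        double[] barArray = new double[3];

        // Enhanced for loop: no explicit index and no size() call at all.
        for (String bar : barList) {
            System.out.println(bar);
        }

        // forEach with a lambda; slightly more overhead than the plain loop.
        barList.forEach(bar -> System.out.println(bar));

        // Parallel stream: only pays off for large lists with real
        // per-element work; note the element order is not guaranteed.
        barList.parallelStream().forEach(bar -> System.out.println(bar));

        // Arrays.setAll: fill an array as a function of its index.
        Arrays.setAll(barArray, i -> i * 2.0);
        System.out.println(Arrays.toString(barArray));
    }
}
```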
We are talking here about micro-optimisations. I would go for elegance.
Most often the problem is the algorithms and data structures used. List is notorious, as everything can be a List. However, Set or Map often provide much more power/expressiveness.
If a complex piece of software is slow, profile the application. Check where the time actually goes: Java collections versus database queries versus file parsing.

Which of these is more efficient in java?

I am wondering which of the following is the most efficient?
int x = 1, y = 2;
System.out.print(x + y);
or...
int x = 1, y = 2, z = 3;
System.out.print(z);
I'm guessing it's the first, but not sure - thanks.
The real answer is: talking about efficiency on such a level does not make any sense at all.
Keep in mind that the overall performance and efficiency of a Java program is determined by many many factors - for example when/how the JIT kicks in in order to turn byte code into machine code.
Worrying about such subtleties will not help you to create a meaningful, maintainable, "good OO" design. Heck; in your case, depending on context, it could even be that the compiler does constant folding and turns your whole thing into print(3) (as it is really straightforward to throw away those variables); so maybe in both cases, the compiler creates the exact same bytecode.
Don't get me wrong: it is fair to ask/learn/understand what compilers, JVMs and JITs do. But don't assume that you can categorize things that easily into "A is more efficient than B".
If you truly mean the case where you have supplied all the literal values like that, then the difference doesn't exist at all, at least not after your code is JIT-compiled. In either case you will have zero calculation done at runtime. The JIT compiler will work out the result and hardcode it into all its use sites. The optimization techniques involved are Constant Propagation and Constant Folding.
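One nuance worth hedging: javac itself folds only compile-time constant expressions (final variables with constant initializers); for plain locals, any folding is done later by the JIT. A sketch, where both lines print 3 either way:

```java
// javac folds only compile-time constant expressions: with `final` locals
// initialized from literals, x + y is a constant expression and compiles
// to the literal 3 in the bytecode. With plain locals the source compiles
// to an actual addition, and it is the JIT that may later propagate and
// fold the constants.
public class FoldingDemo {
    public static void main(String[] args) {
        final int x = 1, y = 2;    // compile-time constants
        System.out.println(x + y); // bytecode just pushes 3

        int a = 1, b = 2;          // plain locals: folded (if at all) by the JIT
        System.out.println(a + b);
    }
}
```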
It would be the second option, as you do not need any memory for the calculation. You're just printing a number instead of adding two numbers together and then printing.
This is a simple example, so the performance difference is not noticeable at this level.
Good practice is to assign the task appropriately to different functions.

Calling a method vs assigning the return type

I would like to know which one is good. I am writing a for loop, and in the condition part I am using str.length(). I wonder whether this is a good idea. I could also assign the value to an integer variable and use it in the loop.
Which one is the suitable/better way?
If you use str.length() more than once or twice in the code, it's logical to extract it to a local var simply for brevity's sake. As for performance, it will most probably be exactly the same because the JIT compiler will inline that call, so the native code will be as if you have used a local variable.
There is no distinct downside to calling a function in the loop condition expression in the sense that "you really should never do it". You want to watch out when calling functions that have side effects, but even that can be acceptable in some circumstances.
There are three major reasons for moving function calls out of the loop (including the loop condition expressions):
Performance. The function may (depending on the JIT compiler) get called for every iteration of the loop, which costs you execution time. Particularly if the function's code has a higher order of complexity than O(1) after the first execution, this will increase the execution time. By how much depends entirely on exactly what the function in question does and how it is implemented.
Side effects. If the function has any side effects, those may (will) be executed repeatedly. This might be exactly what you want, but you need to be aware of it. A side effect is basically something that is observable outside of the function that is being called; for example, disk or network I/O are often considered to be side effects. A function that simply performs calculations on already available data is generally a pure function.
Code clarity. Admittedly str.length() isn't very long, but if you have a complex calculation based around a function call in the loop conditional, code clarity can very easily suffer. For this reason it may be advantageous to move the loop termination condition calculation out of the loop condition expression itself. Beware of awakening the sleeping beast, however; make very sure that the refactored code actually is more readable.
For str.length() it doesn't really matter unless you are really after the last bit of performance you can get, particularly as as has been pointed out by other answerers, String#length() is an O(1) complexity operation. Especially in the general case, if you need the additional performance, consider introducing a variable to hold the result of the function call and comparing against that rather than making the function call repeatedly.
Personally, I'd consider code clarity before worrying about micro-optimizations like exactly where to place a specific function call. But if you have everything else down and still need to ooze a little bit more performance out of the code, moving the function call out of the condition expression and using a local variable (preferably of a primitive type) is something worth considering. Chances are, though, that if you are worried about that, you'll see bigger gains by considering a different algorithm. (Do you really need to iterate over the string the way you are doing? Is there no other way to do what you are after?)
It usually doesn't matter. Use whichever makes your code clearer.
If a value is going to be used more than once, then there are two advantages to assigning it to a local variable:
You can give the variable a good name, which makes your code easier to read and understand
You can sometimes avoid a small amount of overhead by calling the method only once. This helps performance (although the difference is often too small to be noticeable - if in doubt you should benchmark)
Note: This advice only applies to pure functions. You need to be much more careful if the function has side effects, or might return a different value each time (like Math.random()) - in these cases you need to think much more carefully about the effect of multiple function calls.
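A small sketch of that caveat (all names hypothetical): hoisting a non-pure method out of the loop condition doesn't just change speed, it changes behavior.

```java
// Illustration of why hoisting is only safe for pure methods. If the
// method's return value changes between calls, caching it in a local
// changes the program's behavior, not just its performance.
public class SideEffectDemo {
    private static int limit = 3;

    // Not pure: every call decrements the limit it reports.
    static int shrinkingLimit() {
        return limit--;
    }

    // Bound re-evaluated each iteration: the bound moves under the loop.
    static int runReevaluated() {
        limit = 3;
        int iterations = 0;
        for (int i = 0; i < shrinkingLimit(); i++) iterations++;
        return iterations; // 2, not 3: the bound shrinks while we iterate
    }

    // Bound hoisted into a local: fixed at the first returned value.
    static int runHoisted() {
        limit = 3;
        int bound = shrinkingLimit();
        int iterations = 0;
        for (int i = 0; i < bound; i++) iterations++;
        return iterations; // 3
    }

    public static void main(String[] args) {
        System.out.println("re-evaluated: " + runReevaluated() + " iterations");
        System.out.println("hoisted: " + runHoisted() + " iterations");
    }
}
```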
Calling length costs O(1) since the length is stored as a member - It's a constant operation, don't waste your time thinking about complexity and performance of this thing.
There is no difference at all between the two.
But suppose the length of str changes; then with a hard-coded bound you need to manually change the value in the for loop.
For example:
String str = "hi";
so in the for loop you write either
for (int i = 0; i < str.length(); i++)
{
}
or
for (int i = 0; i < 2; i++)
{
}
Now suppose you change the string to String str = "hi1";
so in the for loop you must write
for (int i = 0; i < 3; i++)
{
}
So I would suggest you go for str.length().
If you use str.length() it will be evaluated every time. It is better to assign this value to a variable and use that in the for loop.
for (int i = 0; i < str.length(); i++) { // str.length() evaluated every iteration
}
int k = str.length(); // evaluated only once
for (int i = 0; i < k; i++) {
}
If you are concerned about performance you may use the second approach.
If you are using str.length() in the code more than one time then you could assign it to another variable and use that. Otherwise you can use str.length() itself.
Reason
When we call a method, the current position is saved on the call stack, control jumps to the called method, which performs its operations, and then control returns to the saved position and normal execution continues.
This happens on every call, so when we do it many times in a program the overhead is repeated each time.
Therefore we can create a local variable, assign the value to it, and use it wherever needed in the program.

Java for loop performance

What is better in for loop
This:
for (int i = 0; i < someMethod(); i++)
{
    //some code
}
or:
int a = someMethod();
for (int i = 0; i < a; i++)
{
    //some code
}
Let's just say that someMethod() returns something large.
The first method will execute someMethod() in each loop iteration, thus decreasing speed; the second is faster, but let's say that there are a lot of similar loops in the application, so declaring a variable will consume more memory.
So what is better, or am I just thinking stupidly?
The second is better - assuming someMethod() does not have side effects.
It actually caches the value calculated by someMethod() - so you won't have to recalculate it (assuming it is a relatively expensive op).
If it does (has side effects) - the two code snippets are not equivalent - and you should do whatever is correct.
Regarding the "size of variable a" - it is not an issue anyway; the returned value of someMethod() needs to be stored in some intermediate temporary variable anyway before the comparison (and even if that weren't the case, the size of one integer is negligible).
P.S.
In some cases, compiler / JIT optimizer might optimize the first code into the second, assuming of course no side effects.
If in doubt, test. Use a profiler. Measure.
Assuming the iteration order isn't relevant, and also assuming you really want to nano-optimize your code, you may do this :
for (int i=someMethod(); i-->0;) {
//some code
}
But an additional local variable (your a) isn't such a burden. In practice, this isn't much different from your second version.
If you don't need this variable after loop, there is simple way to hide it inside:
for (int count = someMethod (), i = 0; i < count; i++)
{
// some code
}
It really depends how long it takes to compute the output of someMethod(). Also, the memory usage would be the same, because someMethod() first has to generate the output and store it anyway. The second way saves your CPU from computing the same output every loop iteration, and it should not take more memory. So the second one is better.
I would not consider the memory consumption of the variable a to be a problem, as it is an int and requires only 32 bits. So I would prefer the second alternative, as its execution efficiency is better.
The most important part about loop optimizations is allowing the JVM to unroll the loop. To do so in the 1st variant it has to be able to inline the call to someMethod(). Inlining has some budget and it can get busted at some point. If someMethod() is long enough the JVM may decide it doesn't like to inline.
The second variant is more helpful (to JIT compiler) and likely to work better.
My way of writing the loop is:
for (int i = 0, max = someMethod(); i < max; i++) {...}
max doesn't pollute the code outside the loop, you ensure no side effects from multiple calls of someMethod(), and it's compact (a one-liner).
If you need to optimize this, then this is the clean / obvious way to do it:
int a = someMethod();
for (int i = 0; i < a; i++) {
    //some code
}
The alternative version suggested by #dystroy
for (int i=someMethod(); i-->0;) {
//some code
}
... has three problems.
He is iterating in the opposite direction.
That iteration is non-idiomatic, and hence less readable. Especially if you ignore the Java style guide and don't put whitespace where you are supposed to.
There is no proof that the code will actually be faster than the more idiomatic version ... especially once the JIT compiler has optimized them both. (And even if the less readable version is faster, the difference is likely to be negligible.)
On the other hand, if someMethod() is expensive (as you postulate) then "hoisting" the call so that it is only done once is likely to be worthwhile.
I was a bit confused about the same and did a sanity test with a list of 10,000,000 integers in it. The difference was more than two seconds, with the latter being faster:
int a = someMethod();
for (int i = 0; i < a; i++) {
    //some code
}
My results on Java 8 (MacBook Pro, 2.2 GHz Intel Core i7) were:
using list object:
Start- 1565772380899,
End- 1565772381632
calling list in 'for' expression:
Start- 1565772381633,
End- 1565772384888

refactoring Java arrays and primitives (double[][]) to Collections and Generics (List<List<Double>>)

I have been refactoring throwaway code which I wrote some years ago in a FORTRAN-like style. Most of the code is now much more organized and readable. However the heart of the algorithm (which is performance-critical) uses 1- and 2-dimensional Java arrays and is typified by:
for (int j = 1; j < len[1]+1; j++) {
    int jj = (cont == BY_TYPE) ? seq[1][j-1] : j-1;
    for (int i = 1; i < len[0]+1; i++) {
        matrix[i][j] = matrix[i-1][j] + gap;
        double m = matrix[i][j-1] + gap;
        if (m > matrix[i][j]) {
            matrix[i][j] = m;
            pointers[i][j] = UP;
        }
        //...
    }
}
For clarity, maintainability and interfacing with the rest of the code I would like to refactor it. However on reading Java Generics Syntax for arrays and Java Generics and numbers I have the following concerns:
Performance. The code is planned to use about 10^8 - 10^9 secs/yr and this is just about manageable. My reading suggests that changing double to Double can sometimes add a factor of 3 in performance. I'd like other experience on this. I would also expect that moving from foo[] to List would be a hit as well. I have no first-hand knowledge and again experience would be useful.
Array-bound checking. Is this treated differently in double[] and List and does it matter? I expect some problems to violate bounds as the algorithm is fairly simple and has only been applied to a few data sets.
If I don't refactor then the code has an ugly and possibly fragile intermixture of the two approaches. I am already trying to write things such as:
List<double[]> and
List<Double>[]
and understand that the erasure does not make this pretty and at best gives rise to compiler warnings. It seems difficult to do this without very convoluted constructs.
Obsolescence. One poster suggested that Java arrays should be obsoleted. I assume this isn't going to happen RSN but I would like to move away from outdated approaches.
SUMMARY The consensus so far:
Collections have a significant performance hit over primitive arrays, especially for constructs such as matrices. This is incurred in auto(un)boxing numerics and in accessing list items
For tight numerical (scientific) algorithms the array notation [][] is actually easier to read but the variables should named as helpfully as possible
Generics and arrays do not mix well. It may be useful to wrap the arrays in classes to transport them in/out of the tight algorithm.
There is little objective reason to make the change
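A rough sketch of the first summary point (made-up sizes; not a rigorous benchmark): summing a primitive double[] versus a List<Double>, which boxes every value on insertion and unboxes it again on every get().

```java
import java.util.ArrayList;
import java.util.List;

// Rough illustration of the boxing cost, not a rigorous benchmark
// (no warmup, single run): summing a primitive double[] versus a
// List<Double>, which unboxes an object on every get().
public class BoxingCost {
    static final int N = 2_000_000;

    static double sumArray(double[] a) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += a[i];
        return sum;
    }

    static double sumList(List<Double> list) {
        double sum = 0;
        for (int i = 0; i < list.size(); i++) sum += list.get(i); // unboxing
        return sum;
    }

    public static void main(String[] args) {
        double[] arr = new double[N];
        List<Double> list = new ArrayList<>(N);
        for (int i = 0; i < N; i++) {
            arr[i] = i % 10;
            list.add((double) (i % 10)); // boxing
        }

        long t0 = System.nanoTime();
        double s1 = sumArray(arr);
        long t1 = System.nanoTime();
        double s2 = sumList(list);
        long t2 = System.nanoTime();

        System.out.println("array ms: " + (t1 - t0) / 1_000_000);
        System.out.println("list  ms: " + (t2 - t1) / 1_000_000);
        System.out.println("equal results: " + (s1 == s2));
    }
}
```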
QUESTION #SeanOwen has suggested that it would be useful to take constant values out of the loops. Assuming I haven't goofed this would look like:
int len1 = len[1];
int len0 = len[0];
int[] seq1 = seq[1];
for (int i = 1; i < len0 + 1; i++) {
    double[] matrixim1 = matrix[i - 1];
    double[] matrixi = matrix[i];
    int[] pointersi = pointers[i];
    for (int j = 1; j < len1 + 1; j++) {
        int jj = (cont == BY_TYPE) ? seq1[j - 1] : j - 1;
        matrixi[j] = matrixim1[j] + gap;
        double m = matrixi[j - 1] + gap;
        if (m > matrixi[j]) {
            matrixi[j] = m;
            pointersi[j] = UP;
        }
        //...
    }
}
I thought compilers were meant to be smart at doing this sort of thing. Do we need to still do this?
I read an excellent book by Kent Beck on coding best-practices ( http://www.amazon.com/Implementation-Patterns/dp/B000XPRRVM ). There are also interesting performance figures.
Specifically, there are comparisons between arrays and various collections, and arrays are really much faster (maybe x3 compared to ArrayList).
Also, if you use Double instead of double, you need to stick to it, and use no double, as auto(un)boxing will kill your performance.
Considering your performance need, I would stick to array of primitive type.
Even more, I would calculate only once the upper bound for the condition in loops.
This is typically done the line before the loop.
However, if you don't like that the upper bound variable, used only in the loop, is accessible outside the loop, you can take advantage of the initialization phase of the for loop like this:
for (int i=0, max=list.size(); i<max; i++) {
// do something
}
I don't believe in obsolescence for arrays in java. For performance-critical loop, I can't see any language designer taking away the fastest option (especially if the difference is x3).
I understand your concern for maintainability, and for coherence with the rest of the application. But I believe that a critical loop is entitled to some special practices.
I would try to make the code the clearest possible without changing it:
by carefully questioning each variable name, ideally with a 10-min brainstorming session with my colleagues
by writing code comments (I'm against their use in general, as code that is not clear should be made clear, not commented; but a critical loop justifies it).
by using private methods as needed (as Andreas_D pointed out in his answer). If made private final, chances are very good (as they would be short) that they will get inlined when running, so there would be no performance impact at runtime.
I fully agree with KLE's answer. Because the code is performance-critical, I'd keep the array based datastructures as well. And I believe, that just introducing collections, wrappers for primitive types and generics will not improve maintainability and clarity.
In addition, if this algorithm is the heart of the application and has been in use for several years now, chances are fairly low that it will need maintenance like bug fixing or improvements.
For clarity, maintainability and interfacing with the rest of the code I would like to refactor it.
Instead of changing datastructures I'd concentrate on renaming and maybe moving some part of the code to private methods. From looking at the code, I have no idea what's happening, and the problem, as I see it, are the more or less short and technical variable and field names.
Just an example: one 2-dimensional array is just named 'matrix'. But while it's obviously clear that this is a matrix, naming it 'matrix' is pretty redundant. It would be more helpful to rename it so that it becomes clear what this matrix is really used for, and what kind of data is inside.
Another candidate is your second line. With two refactorings, I'd rename 'jj' to something more meaningful and move the expression to a private method with a 'speaking' name.
The general guideline is to prefer generified collections over arrays in Java, but it's only a guideline. My first thought would be to NOT change this working code. If you really want to make this change, then benchmark both approaches.
As you say, performance is critical, in which case the code that meets the needed performance is better than code that doesn't.
You might also run into auto-boxing issues when boxing/unboxing the doubles - a potentially more subtle problem.
The Java language guys have been very strict about keeping the JVM compatible across different versions so I don't see arrays going anywhere - and I wouldn't call them obsolete, just more primitive than the other options.
Well I think that arrays are the best way to store process data in algorithms. Since Java doesn't support operator overloading (one of the reasons why I think arrays won't be obsolete that soon) switching to collections would make the code quite hard to read:
double[][] matrix = new double[10][10];
double t = matrix[0][0];

List<List<Double>> matrix = new ArrayList<List<Double>>(10);
for (int i = 0; i < 10; i++) {
    matrix.add(new ArrayList<Double>(Collections.nCopies(10, 0.0)));
}
double t = matrix.get(0).get(0); // autoboxing => performance
As far as I know, Java pre-caches some wrapper objects for Number instances (e.g. Integer values from -128 to 127), so that autoboxing can reuse them and access them faster, but I think that won't help much with this much data.
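For the record, the wrapper cache alluded to here is Integer.valueOf's cache of -128..127 (that is the default; it can be enlarged with -XX:AutoBoxCacheMax), which is exactly why it doesn't help with bulk numeric data. A quick sketch:

```java
// The wrapper cache mentioned above: Integer.valueOf (which autoboxing
// uses) caches -128..127 by default, so == compares the same cached object
// inside that range but distinct objects outside it.
public class IntegerCacheDemo {
    public static void main(String[] args) {
        Integer a = 127, b = 127;        // autoboxing -> cached instance
        Integer c = 128, d = 128;        // outside the cache -> distinct objects
        System.out.println(a == b);      // true  (same cached object)
        System.out.println(c == d);      // false (reference comparison!)
        System.out.println(c.equals(d)); // true - always compare with equals
    }
}
```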
I thought compilers were meant to be smart at doing this sort of thing. Do we need to still do this?
You are probably right that the JIT takes care of it, but if this section is so performance critical, trying and benchmarking wouldn't hurt.
When you know the exact dimensions of the list you should stick with arrays. Arrays are not inherently bad, and they're not going anywhere. If you are performing a lot of (non-sequential) read and write operations you should use arrays and not lists, because the access methods of lists introduce a large overhead.
In addition to sticking with arrays, I think you can tighten up this code in some meaningful ways. For instance:
Indeed, don't compute the loop bounds every time, save them off
You repeatedly reference matrix[i]. Just save off a reference to this subarray rather than dereferencing the 2D array every time
That trick gets even more useful if you can loop over i in the outer loop instead of inner loop
It's getting extreme, but saving the value of j-1 in a local might even prove to be worth it rather than recomputing
Finally if you are really really concerned about performance, run the ProGuard optimizer over the resulting byte code to have it perform some compiler optimizations like unrolling or peephole optimizations
