I come from a C background, so I admit that I'm still struggling with letting go of memory management when writing in Java. Here's one issue that's come up a few times that I would love to get some elaboration on. Here are two ways to write the same routine, the only difference being when double[] array is declared:
Code Sample 1:
double[] array;
for (int i=0; i<n; ++i) {
    array = calculateSomethingAndReturnAnArray(i);
    if (someFunctionOnArrays(array)) {
        // DO ONE THING
    } else {
        // DO SOME OTHER THING
    }
}
Code Sample 2:
for (int i=0; i<n; ++i) {
    double[] array = calculateSomethingAndReturnAnArray(i);
    if (someFunctionOnArrays(array)) {
        // DO ONE THING
    } else {
        // DO SOME OTHER THING
    }
}
Here, private double[] calculateSomethingAndReturnAnArray(int i) always returns an array of the same length. I have a strong aversion to Code Sample 2 because it creates a new array for each iteration when it could just overwrite the existing array. However, I think this might be one of those times when I should just sit back and let Java handle the situation for me.
What are the reasons to prefer one of the ways over the other or are they truly identical in Java?
There's nothing special about arrays here: the declaration does not allocate the array, it just creates a new variable. It's equivalent to:
Object foo;
for(...){
foo = func(...);
}
If you declare the variable outside the loop, the variable (which holds a reference to the thing it refers to) is only ever allocated once. If you declare it inside the loop, the variable may be reallocated in each iteration, but my guess is that the compiler or the JIT will fix that in an optimization step.
I'd consider this a micro-optimization. If you're running into problems with this segment of your code, you should make decisions based on measurements rather than on the specs alone. If you're not running into issues with it, you should do the semantically correct thing and declare the variable in the scope that makes sense.
See also this similar question about best practices.
A declaration of a local variable without an initializing expression will do NO work whatsoever. The work happens when the variable is initialized.
Thus, the following are identical with respect to semantics and performance:
double[] array;
for (int i=0; i<n; ++i) {
array = calculateSomethingAndReturnAnArray(i);
// ...
}
and
for (int i=0; i<n; ++i) {
double[] array = calculateSomethingAndReturnAnArray(i);
// ...
}
(You can't even quibble that the first case allows the array to be used after the loop ends. For that to be legal, array has to have a definite value after the loop, and it doesn't unless you add an initializer to the declaration; e.g. double[] array = null;)
To elaborate on @Mark Elliot's point about micro-optimization:
This is really an attempt to optimize rather than a real optimization, because (as I noted) it should have no effect.
Even if the Java compiler actually emitted some non-trivial executable code for double[] array;, the chances are that the time to execute would be insignificant compared with the total execution time of the loop body, and of the application as a whole. Hence, this is most likely to be a pointless optimization.
Even if this is a worthwhile optimization, you have to consider that you have optimized for a specific target platform; i.e. a particular combination of hardware and JVM version. Micro-optimizations like this may not be optimal on other platforms, and could in theory be anti-optimizations.
In summary, you are most likely wasting your time if you focus on things like this when writing Java code. If performance is a concern for your application, focus on the MACRO level performance; e.g. things like algorithmic complexity, good database / query design, patterns of network interactions, and so on.
Both create a new array for each iteration. They have the same semantics.
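If you want to convince yourself, you can write both forms side by side and compare them. This is only an illustrative sketch: makeArray is a hypothetical stand-in for calculateSomethingAndReturnAnArray, and the class and method names are made up. In both methods the allocation happens inside makeArray, once per iteration; the position of the declaration only changes the variable's scope.

```java
// Illustrative sketch; makeArray is a hypothetical stand-in for
// calculateSomethingAndReturnAnArray from the question.
class DeclarationScopeDemo {
    // Allocates and returns a new array on every call.
    static double[] makeArray(int i) {
        return new double[] { i, i * 2.0, i * 3.0 };
    }

    // Variable declared once, outside the loop.
    static double sumDeclaredOutside(int n) {
        double total = 0;
        double[] array;
        for (int i = 0; i < n; ++i) {
            array = makeArray(i);
            total += array[0] + array[1] + array[2];
        }
        return total;
    }

    // Variable declared inside the loop body.
    static double sumDeclaredInside(int n) {
        double total = 0;
        for (int i = 0; i < n; ++i) {
            double[] array = makeArray(i);
            total += array[0] + array[1] + array[2];
        }
        return total;
    }
}
```

Compiling this and running javap -c on the class is a reasonable way to compare the bytecode of the two loops yourself.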
Related
Let's say I have this function in java
public static Character firstNonrepeatedChar(String in) {
    int[] repeated = new int[256];
    for(int i=0; i<256; i++){
        repeated[i] = 0;
    }
    // First time calling in.length()
    for(int j=0; j<in.length(); j++){
        repeated[in.charAt(j)]++;
    }
    // Second time calling in.length()
    // I could have used "int length = in.length() and use this variable in this second loop"
    for(int j=0; j<in.length(); j++){
        if(repeated[in.charAt(j)] == 1)
            return in.charAt(j);
    }
    return null;
}
As you can see, I have used in.length() twice. Another approach could be to save in.length() once in a variable and use that variable. Can someone tell me how big a difference this makes? I know that if I wanted to use that value something like 100 times I should keep it in a variable, but in this case we are deciding between just one more function call and an extra integer variable.
The JIT will inline simple methods like length(). If you want to improve performance, you would have to look at different algorithms.
Something you can do is assume an array is already full of 0's so you don't need to zero it out. Note: you might have characters > 255.
Also I would return a char as you cannot have a null value.
Generally, optimize when you can measure that the optimization 1) is beneficial and 2) is worth it. By worth it, I mean that the increased performance warrants the decrease in readability. Point 1 matters because your change may be detrimental, or may just not do anything.
For example, adding a local variable will make your method's stack larger, which may be worse than the runtime gain in extreme environments. Also, String.length() simply returns a variable's value, so not calling the method does not save you much. Your JIT may (and probably will) optimize the loop condition anyway, meaning that your optimization was not actually beneficial.
It really depends on the function you are calling. Since you are calling String.length(), it is perfectly fine to call it more than once in the same context, even if the return value is not expected to change.
However, it is considered best practice to cache function return values in variables, especially for complex functions. In your case, there isn't much of a difference.
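Putting the suggestions from the answers together, a version of the method could look like the sketch below. It is one possible rewrite, not the asker's code: the class name is made up, the counting array is sized for every possible char value (an assumption that trades about 256 KB of memory for not having to worry about characters above 255), and it relies on Java zero-initializing new arrays.

```java
// Sketch only: counts every possible char value instead of assuming < 256.
class FirstNonRepeated {
    static Character firstNonRepeatedChar(String in) {
        // New int arrays are zero-filled, so no clearing loop is needed.
        int[] counts = new int[Character.MAX_VALUE + 1];
        // Hoisting length() is optional; it is a trivial accessor.
        int length = in.length();
        for (int j = 0; j < length; j++) {
            counts[in.charAt(j)]++;
        }
        for (int j = 0; j < length; j++) {
            if (counts[in.charAt(j)] == 1) {
                return in.charAt(j);
            }
        }
        return null; // every character is repeated, or the string is empty
    }
}
```

For example, firstNonRepeatedChar("swiss") returns 'w', and firstNonRepeatedChar("aabb") returns null.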
What is better in for loop
This:
for(int i = 0; i<someMethod(); i++)
{//some code
}
or:
int a = someMethod();
for(int i = 0; i<a; i++)
{//some code
}
Let's just say that someMethod() returns something large.
The first method will execute someMethod() on each iteration, thus decreasing speed. The second is faster, but let's say there are a lot of similar loops in the application, so declaring a variable will consume more memory.
So which is better, or am I just thinking stupidly?
The second is better - assuming someMethod() does not have side effects.
It actually caches the value calculated by someMethod(), so you won't have to recalculate it (assuming it is a relatively expensive op).
If it does have side effects, the two code snippets are not equivalent, and you should do what is correct.
Regarding the "size for variable a" - it is not an issue anyway, the returned value of someMethod() needs to be stored on some intermediate temp variable anyway before calculation (and even if it wasn't the case, the size of one integer is negligible).
P.S.
In some cases, compiler / JIT optimizer might optimize the first code into the second, assuming of course no side effects.
If in doubt, test. Use a profiler. Measure.
Assuming the iteration order isn't relevant, and also assuming you really want to nano-optimize your code, you may do this :
for (int i=someMethod(); i-->0;) {
//some code
}
But an additional local variable (your a) isn't such a burden. In practice, this isn't much different from your second version.
If you don't need this variable after loop, there is simple way to hide it inside:
for (int count = someMethod (), i = 0; i < count; i++)
{
// some code
}
It really depends on how long it takes to generate the output of someMethod(). The memory usage would be the same, because someMethod() first has to generate the output and store it somewhere anyway. The second way saves your CPU from computing the same output on every iteration, and it should not take more memory. So the second one is better.
I would not consider the memory consumption of the variable a a problem, as it is an int, which is a 32-bit value in Java. So I would prefer the second alternative, as its execution efficiency is better.
The most important part about loop optimizations is allowing the JVM to unroll the loop. To do so in the 1st variant it has to be able to inline the call to someMethod(). Inlining has some budget and it can get busted at some point. If someMethod() is long enough the JVM may decide it doesn't like to inline.
The second variant is more helpful (to JIT compiler) and likely to work better.
my way for putting down the loop is:
for (int i=0, max=someMethod(); i<max; i++){...}
max doesn't pollute the code, you ensure no side effects from multiple calls of someMethod() and it's compact (single liner)
If you need to optimize this, then this is the clean / obvious way to do it:
int a = someMethod();
for (int i = 0; i < a; i++) {
//some code
}
The alternative version suggested by @dystroy
for (int i=someMethod(); i-->0;) {
//some code
}
... has three problems.
He is iterating in the opposite direction.
That iteration is non-idiomatic, and hence less readable. Especially if you ignore the Java style guide and don't put whitespace where you are supposed to.
There is no proof that the code will actually be faster than the more idiomatic version ... especially once the JIT compiler has optimized them both. (And even if the less readable version is faster, the difference is likely to be negligible.)
On the other hand, if someMethod() is expensive (as you postulate) then "hoisting" the call so that it is only done once is likely to be worthwhile.
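Counting the calls makes the difference between the two forms concrete. In this sketch, someMethod is a stand-in that returns 5 and increments a counter; the class name and the counter are made up for illustration.

```java
// Sketch: counts how often the loop bound method is invoked in each form.
class LoopBoundDemo {
    static int calls = 0;

    // Hypothetical stand-in for the asker's someMethod().
    static int someMethod() {
        calls++;
        return 5;
    }

    // Bound re-evaluated as part of the loop condition.
    static int callsWithMethodInCondition() {
        calls = 0;
        for (int i = 0; i < someMethod(); i++) {
            // some code
        }
        return calls;
    }

    // Bound evaluated once, in the initialization part of the for statement.
    static int callsWithHoistedBound() {
        calls = 0;
        for (int i = 0, max = someMethod(); i < max; i++) {
            // some code
        }
        return calls;
    }
}
```

With a bound of 5, the first form calls the method six times (five passing checks plus the failing one) and the hoisted form calls it once. Whether the JIT removes the extra calls in real code depends on whether it can inline the method and prove it has no side effects.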
I was a bit confused about this too, so I ran a sanity test with a list of 10,000,000 integers in it. The difference was more than two seconds, with the latter being faster:
int a = someMethod();
for(int i = 0; i<a; i++)
{//some code
}
My results on Java 8 (MacBook Pro, 2.2 GHz Intel Core i7) were:
using list object:
Start- 1565772380899,
End- 1565772381632
calling list in 'for' expression:
Start- 1565772381633,
End- 1565772384888
In Java I have a block of code:
List e = {element1, element2, ...., elementn};
for(int i = 0; i < e.size(); i++){//Do something in here
};
and another block:
List e = {element1, element2, ...., elementn};
int listSize = e.size();
for(int i = 0; i < listSize; i++){//Do something in here
};
I think that the second block is better, because in the first block, after each i++, e.size() has to be evaluated one more time to check the loop condition. Is that right or wrong?
And comparing the two blocks above, what is the best practice for writing a for loop? And why? Explain clearly and try this loop yourself
Personally I'd use the enhanced for statement instead:
for (Object element : e) {
// Use element
}
Unless you need the index, of course.
If I had to use one of the two forms, I'd use the first as it's tidier (it doesn't introduce another local variable which is only used in that loop), until I had concrete evidence that it was causing a problem. (In most list implementations, e.size() is a simple variable access which can be inlined by the JIT anyway.)
Usually, the most brief and readable code is the best choice, all things being equal. In the case of Java, the enhanced for loop (which works with any class that implements Iterable) is the way to go.
for (Object object : someCollection) { // do something }
In terms solely of the two you posted, I think the first is the better option. It's more readable, and you have to remember that, under the hood, JIT will attempt to optimize a great deal of the code you write anyway.
EDIT: Have you heard the phrase "premature optimisation is the root of all evil"? Your second block is an example of premature optimisation.
If you check the size() implementation in the LinkedList class, you will find that the size is incremented or decremented when an element is added to or removed from the list.
Calling size() just returns the value of this field and does not involve any calculation.
So directly calling the size() method should be better, as you save on storing another integer.
I would always use (if you need an index variable):
List e = {element1, element2, ...., elementn};
for(int i = 0, size = e.size(); i < size; i++){
// Do something in here
};
Since e.size() could be an expensive operation.
Your 2nd option is not good, since it introduces a new variable outside of the for loop. I recommend to keep variable visibility as limited as possible.
Otherwise a
for (MyClass myObj : list) {
// Do something here
}
is even cleaner, but might introduce a small performance hit (the index approach doesn't require instantiating an Iterator).
Yes, the second form is marginally more efficient, as you don't repeatedly perform the size() method invocation. Compilers are good at doing this sort of optimisation themselves.
However, it's unlikely that this would be the performance bottleneck of your application. Avoid premature optimisation. Make your code clean and readable foremost.
HotSpot will hoist e.size() out of the loop in most cases, so it will calculate the size of the List only once.
As for me I prefer the following notation:
for (Object elem: e) {
//Do something
}
I think this should be even better. Maybe initializing the int variable every time can be avoided this way:
List e = {element1, element2, ...., elementn};
int listSize = e.size();
int i=0;
for(i = 0; i < listSize; i++){//Do something in here
};
The second one is the better approach, because in the first block you are calling e.size(), a method invocation, on every iteration of the loop, which is an extra burden on the JVM.
I'm not so sure, but I think the Java optimizer will replace the call with a fixed value, so in the end it will be the same.
To avoid all this numbering, iterators, and checking, use the following simple, most readable code, which also performs as well as possible.
Why this has maximum performance (details are coming up)
for (Object object : aCollection) {
// Do something here
}
If the index is needed then:
To choose between the above two forms:
The second is better, as you said, because it only calculates the size once.
I think now we have a tendency to write short and understandable code, so the first option is better.
The second is better, because in the body of the first loop you might execute a statement such as e.remove, and then the size of e will change. So it is better to save the size in a variable before the loop.
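What actually happens when the loop body removes elements is worth checking before relying on either form. The sketch below is an illustration under assumptions: the class and method names are made up, the list is an ArrayList of Integer, and the body removes even values by index.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: index-based loops that remove elements while iterating.
class RemoveWhileLoopingDemo {
    // Re-reads size() on every iteration.
    static List<Integer> removeEvensLiveSize(List<Integer> input) {
        List<Integer> e = new ArrayList<>(input);
        for (int i = 0; i < e.size(); i++) {
            if (e.get(i) % 2 == 0) {
                e.remove(i); // remove(int index); later elements shift left
            }
        }
        return e;
    }

    // Same loop, but with the size cached before the loop starts.
    static List<Integer> removeEvensCachedSize(List<Integer> input) {
        List<Integer> e = new ArrayList<>(input);
        int listSize = e.size();
        for (int i = 0; i < listSize; i++) {
            if (e.get(i) % 2 == 0) {
                e.remove(i);
            }
        }
        return e;
    }
}
```

For the input [1, 2, 3, 4], the cached-size version throws an IndexOutOfBoundsException once i runs past the shrunken list. The live-size version finishes, but because elements shift left after a removal it can skip one: [2, 2, 3] comes back as [2, 3]. If the body removes elements, an Iterator with its remove() method is the usual safe choice.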
I have been refactoring throwaway code which I wrote some years ago in a FORTRAN-like style. Most of the code is now much more organized and readable. However the heart of the algorithm (which is performance-critical) uses 1- and 2-dimensional Java arrays and is typified by:
for (int j = 1; j < len[1]+1; j++) {
    int jj = (cont == BY_TYPE) ? seq[1][j-1] : j-1;
    for (int i = 1; i < len[0]+1; i++) {
        matrix[i][j] = matrix[i-1][j] + gap;
        double m = matrix[i][j-1] + gap;
        if (m > matrix[i][j]) {
            matrix[i][j] = m;
            pointers[i][j] = UP;
        }
        //...
    }
}
For clarity, maintainability and interfacing with the rest of the code I would like to refactor it. However on reading Java Generics Syntax for arrays and Java Generics and numbers I have the following concerns:
Performance. The code is planned to use about 10^8 - 10^9 secs/yr and this is just about manageable. My reading suggests that changing double to Double can sometimes cost a factor of 3 in performance. I'd like to hear others' experience on this. I would also expect that moving from foo[] to List would be a hit as well. I have no first-hand knowledge, and again experience would be useful.
Array-bound checking. Is this treated differently in double[] and List and does it matter? I expect some problems to violate bounds as the algorithm is fairly simple and has only been applied to a few data sets.
If I don't refactor then the code has an ugly and possibly fragile intermixture of the two approaches. I am already trying to write things such as:
List<double[]> and
List<Double>[]
and understand that the erasure does not make this pretty and at best gives rise to compiler warnings. It seems difficult to do this without very convoluted constructs.
Obsolescence. One poster suggested that Java arrays should be obsoleted. I assume this isn't going to happen RSN but I would like to move away from outdated approaches.
SUMMARY The consensus so far:
Collections have a significant performance hit over primitive arrays, especially for constructs such as matrices. This is incurred in auto(un)boxing numerics and in accessing list items
For tight numerical (scientific) algorithms the array notation [][] is actually easier to read, but the variables should be named as helpfully as possible
Generics and arrays do not mix well. It may be useful to wrap the arrays in classes to transport them in/out of the tight algorithm.
There is little objective reason to make the change
QUESTION @SeanOwen has suggested that it would be useful to take constant values out of the loops. Assuming I haven't goofed this would look like:
int len1 = len[1];
int len0 = len[0];
int[] seq1 = seq[1];
int[] pointersi;
double[] matrixi;
for (int i = 1; i < len0+1; i++) {
    matrixi = matrix[i];
    pointersi = pointers[i];
}
for (int j = 1; j < len1+1; j++) {
    int jj = (cont == BY_TYPE) ? seq1[j-1] : j-1;
    for (int i = 1; i < len0+1; i++) {
        matrixi[j] = matrixi[j] + gap;
        double m = matrixi[j-1] + gap;
        if (m > matrixi[j]) {
            matrixi[j] = m;
            pointersi[j] = UP;
        }
        //...
    }
}
I thought compilers were meant to be smart at doing this sort of thing. Do we need to still do this?
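For what it's worth, a row reference such as matrix[i] can only be saved once per row if i is the outer loop. The sketch below shows that arrangement next to the original loop order. It is a simplification under stated assumptions: the recurrence is reduced to max(up, left) + gap, the pointers array and the elided part of the loop body are left out, and the class name, method names, and GAP value are hypothetical.

```java
// Sketch: the same fill done column-by-column and row-by-row with hoisted rows.
class HoistDemo {
    static final double GAP = -1.0; // hypothetical gap penalty

    // Loop order as in the question: j outer, i inner, full 2D indexing.
    static double[][] fillColumnMajor(int rows, int cols) {
        double[][] matrix = new double[rows + 1][cols + 1];
        for (int j = 1; j <= cols; j++) {
            for (int i = 1; i <= rows; i++) {
                matrix[i][j] = Math.max(matrix[i - 1][j], matrix[i][j - 1]) + GAP;
            }
        }
        return matrix;
    }

    // i outer, so each row reference is looked up once per row.
    static double[][] fillRowMajor(int rows, int cols) {
        double[][] matrix = new double[rows + 1][cols + 1];
        for (int i = 1; i <= rows; i++) {
            double[] row = matrix[i];
            double[] prevRow = matrix[i - 1];
            for (int j = 1; j <= cols; j++) {
                row[j] = Math.max(prevRow[j], row[j - 1]) + GAP;
            }
        }
        return matrix;
    }
}
```

Both orders are valid here because each cell only depends on the cell above and the cell to the left, which are filled earlier in either order. Whether the hoisted version is actually faster than what the JIT produces on its own has to be measured.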
I read an excellent book by Kent Beck on coding best-practices ( http://www.amazon.com/Implementation-Patterns/dp/B000XPRRVM ). There are also interesting performance figures.
Specifically, there are comparisons between arrays and various collections, and arrays are really much faster (maybe x3 compared to ArrayList).
Also, if you use Double instead of double, you need to stick to it, and use no double, as auto(un)boxing will kill your performance.
Considering your performance need, I would stick to array of primitive type.
Even more, I would calculate only once the upper bound for the condition in loops.
This is typically done the line before the loop.
However, if you don't like that the upper bound variable, used only in the loop, is accessible outside the loop, you can take advantage of the initialization phase of the for loop like this:
for (int i=0, max=list.size(); i<max; i++) {
// do something
}
I don't believe in obsolescence for arrays in java. For performance-critical loop, I can't see any language designer taking away the fastest option (especially if the difference is x3).
I understand your concern for maintainability, and for coherence with the rest of the application. But I believe that a critical loop is entitled to some special practices.
I would try to make the code the clearest possible without changing it:
by carefully questioning each variable name, ideally with a 10-min brainstorming session with my colleagues
by writing coding comments (I'm against their use in general, as a code that is not clear should be made clear, not commented ; but a critical loop justifies it).
by using private methods as needed (as Andreas_D pointed out in his answer). If made private final, chances are very good (as they would be short) that they will get inlined when running, so there would be no performance impact at runtime.
I fully agree with KLE's answer. Because the code is performance-critical, I'd keep the array based datastructures as well. And I believe, that just introducing collections, wrappers for primitive types and generics will not improve maintainability and clarity.
In addition, if this algorithm is the heart of the application and has been in use for several years now, chances are fairly low that it will need maintenance like bug fixing or improvements.
"For clarity, maintainability and interfacing with the rest of the code I would like to refactor it."
Instead of changing datastructures I'd concentrate on renaming and maybe moving some part of the code to private methods. From looking at the code, I have no idea what's happening, and the problem, as I see it, are the more or less short and technical variable and field names.
Just an example: one 2-dimensional array is just named 'matrix'. But it's obviously clear, that this is a matrix, so naming it 'matrix' is pretty redundant. It would be more helpful to rename it so that it becomes clear, what this matrix is really used for, what kind of data is inside.
Another candidate is your second line. With two refactorings, I'd rename 'jj' to something more meaningful and move the expression to a private method with a 'speaking' name.
The general guideline is to prefer generified collections over arrays in Java, but it's only a guideline. My first thought would be to NOT change this working code. If you really want to make this change, then benchmark both approaches.
As you say, performance is critical, in which case the code that meets the needed performance is better than code that doesn't.
You might also run into auto-boxing issues when boxing/unboxing the doubles - a potentially more subtle problem.
The Java language guys have been very strict about keeping the JVM compatible across different versions so I don't see arrays going anywhere - and I wouldn't call them obsolete, just more primitive than the other options.
Well I think that arrays are the best way to store process data in algorithms. Since Java doesn't support operator overloading (one of the reasons why I think arrays won't be obsolete that soon) switching to collections would make the code quite hard to read:
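// Primitive arrays:
double[][] matrix = new double[10][10];
double t = matrix[0][0];
// Collections (every row has to be created and filled explicitly):
List<List<Double>> matrixList = new ArrayList<List<Double>>(10);
for (int row = 0; row < 10; row++) {
    matrixList.add(new ArrayList<Double>(Collections.nCopies(10, 0.0)));
}
double u = matrixList.get(0).get(0); // auto-unboxing => performance cost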
As far as I know, Java caches some wrapper objects (e.g. Integer values from -128 to 127) so that you can access them faster, but, as far as I know, OpenJDK does not do this for Double values, so that won't help much with this much data.
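To see where the boxing happens, here is a small sketch that computes the same sum from a double[] and from a List<Double>. The class and method names are made up, and the code only marks the boxing and unboxing points; it does not measure anything, so any speed difference still has to be benchmarked on your own data.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: same computation with primitives and with boxed values.
class BoxingDemo {
    static double sumPrimitive(int n) {
        double[] values = new double[n];
        for (int i = 0; i < n; i++) {
            values[i] = i * 0.5; // plain store, no object created
        }
        double total = 0;
        for (int i = 0; i < n; i++) {
            total += values[i];
        }
        return total;
    }

    static double sumBoxed(int n) {
        List<Double> values = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            values.add(i * 0.5); // auto-boxing: double -> Double
        }
        double total = 0;
        for (int i = 0; i < n; i++) {
            total += values.get(i); // auto-unboxing: Double -> double
        }
        return total;
    }
}
```

Both methods return the same value; the boxed version additionally creates one Double object per element.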
"I thought compilers were meant to be smart at doing this sort of thing. Do we need to still do this?"
You are probably right that the JIT takes care of it, but if this section is so performance critical, trying and benchmarking wouldn't hurt.
When you know the exact dimensions of the list you should stick with arrays. Arrays are not inherently bad, and they're not going anywhere. If you are performing a lot of (non-sequential) read and write operations you should use arrays and not lists, because the access methods of lists introduce a large overhead.
In addition to sticking with arrays, I think you can tighten up this code in some meaningful ways. For instance:
Indeed, don't compute the loop bounds every time, save them off
You repeatedly reference matrix[i]. Just save off a reference to this subarray rather than dereferencing the 2D array every time
That trick gets even more useful if you can loop over i in the outer loop instead of inner loop
It's getting extreme, but saving the value of j-1 in a local might even prove to be worth it rather than recomputing
Finally if you are really really concerned about performance, run the ProGuard optimizer over the resulting byte code to have it perform some compiler optimizations like unrolling or peephole optimizations
String s = "";
for(i=0;i<....){
s = some Assignment;
}
or
for(i=0;i<..){
String s = some Assignment;
}
I don't need to use 's' outside the loop ever again.
The first option is perhaps better since a new String is not initialized each time. The second however would result in the scope of the variable being limited to the loop itself.
EDIT: In response to Milhous's answer. It'd be pointless to assign the String to a constant within a loop wouldn't it? No, here 'some Assignment' means a changing value got from the list being iterated through.
Also, the question isn't because I'm worried about memory management. Just want to know which is better.
Limited Scope is Best
Use your second option:
for ( ... ) {
String s = ...;
}
Scope Doesn't Affect Performance
If you disassemble the code compiled from each (with the JDK's javap tool), you will see that the loop compiles to the exact same JVM instructions in both cases. Note also that Brian R. Bondy's "Option #3" is identical to Option #1. Nothing extra is added or removed from the stack when using the tighter scope, and the same data are used on the stack in both cases.
Avoid Premature Initialization
The only difference between the two cases is that, in the first example, the variable s is unnecessarily initialized. This is a separate issue from the location of the variable declaration. This adds two wasted instructions (to load a string constant and store it in a stack frame slot). A good static analysis tool will warn you that you are never reading the value you assign to s, and a good JIT compiler will probably elide it at runtime.
You could fix this simply by using an empty declaration (i.e., String s;), but this is considered bad practice and has another side-effect discussed below.
Often a bogus value like null is assigned to a variable simply to hush a compiler error that a variable is read without being initialized. This error can be taken as a hint that the variable scope is too large, and that it is being declared before it is needed to receive a valid value. Empty declarations force you to consider every code path; don't ignore this valuable warning by assigning a bogus value.
Conserve Stack Slots
As mentioned, while the JVM instructions are the same in both cases, there is a subtle side-effect that makes it best, at a JVM level, to use the most limited scope possible. This is visible in the "local variable table" for the method. Consider what happens if you have multiple loops, with the variables declared in unnecessarily large scope:
void x(String[] strings, Integer[] integers) {
    String s;
    for (int i = 0; i < strings.length; ++i) {
        s = strings[i];
        ...
    }
    Integer n;
    for (int i = 0; i < integers.length; ++i) {
        n = integers[i];
        ...
    }
}
The variables s and n could be declared inside their respective loops, but since they are not, the compiler uses two "slots" in the stack frame. If they were declared inside the loop, the compiler can reuse the same slot, making the stack frame smaller.
What Really Matters
However, most of these issues are immaterial. A good JIT compiler will see that it is not possible to read the initial value you are wastefully assigning, and optimize the assignment away. Saving a slot here or there isn't going to make or break your application.
The important thing is to make your code readable and easy to maintain, and in that respect, using a limited scope is clearly better. The smaller scope a variable has, the easier it is to comprehend how it is used and what impact any changes to the code will have.
In theory, it's a waste of resources to declare the string inside the loop.
In practice, however, both of the snippets you presented will compile down to the same code (declaration outside the loop).
So, if your compiler does any amount of optimization, there's no difference.
In general I would choose the second one, because the scope of the 's' variable is limited to the loop. Benefits:
This is better for the programmer because you don't have to worry about 's' being used again somewhere later in the function
This is better for the compiler because the scope of the variable is smaller, and so it can potentially do more analysis and optimisation
This is better for future readers because they won't wonder why the 's' variable is declared outside the loop if it's never used later
If you want to speed up for loops, I prefer declaring a max variable next to the counter so that no repeated lookups for the condition are needed:
instead of
for (int i = 0; i < array.length; i++) {
Object next = array[i];
}
I prefer
for (int i = 0, max = array.length; i < max; i++) {
Object next = array[i];
}
Any other things that should be considered have already been mentioned, so just my two cents (see ericksons post)
Greetz, GHad
To add on a bit to @Esteban Araya's answer, they will both require the creation of a new string each time through the loop (as the return value of the some Assignment expression). Those strings need to be garbage collected either way.
I know this is an old question, but I thought I'd add a bit that is slightly related.
I've noticed while browsing the Java source code that some methods, like String.contentEquals (duplicated below), make redundant local variables that are merely copies of class variables. I believe that there was a comment somewhere that implied that accessing local variables is faster than accessing class variables.
In this case "v1" and "v2" are seemingly unnecessary and could be eliminated to simplify the code, but were added to improve performance.
public boolean contentEquals(StringBuffer sb) {
    synchronized(sb) {
        if (count != sb.length())
            return false;
        char v1[] = value;
        char v2[] = sb.getValue();
        int i = offset;
        int j = 0;
        int n = count;
        while (n-- != 0) {
            if (v1[i++] != v2[j++])
                return false;
        }
    }
    return true;
}
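The same pattern in a stripped-down form might look like the sketch below. The class, field, and method names are invented for illustration, and whether the local copy is measurably faster on a modern JIT is an open question that only a benchmark can answer.

```java
// Sketch: reading a field directly versus through a local copy.
class FieldCopyDemo {
    private final char[] value;

    FieldCopyDemo(String s) {
        this.value = s.toCharArray();
    }

    // Reads the field on every iteration.
    int countUsingField(char target) {
        int count = 0;
        for (int i = 0; i < value.length; i++) {
            if (value[i] == target) {
                count++;
            }
        }
        return count;
    }

    // Copies the field into a local first, in the style of contentEquals.
    int countUsingLocal(char target) {
        char[] v = value;
        int count = 0;
        for (int i = 0, n = v.length; i < n; i++) {
            if (v[i] == target) {
                count++;
            }
        }
        return count;
    }
}
```

Both methods return the same count; the second one just makes explicit that the array reference and its length are read once.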
It seems to me that we need more specification of the problem.
The
s = some Assignment;
is not specified as to what kind of assignment this is. If the assignment is
s = "" + i + "";
then a new string needs to be allocated.
but if it is
s = some Constant;
s will merely point to the constant's memory location, and thus the first version would be more memory efficient.
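The distinction between the two kinds of assignment can be checked with reference comparison. This is a small sketch with made-up class and method names; it uses == on purpose to compare object identity, which is not how strings should be compared in normal code.

```java
// Sketch: literal assignment reuses one object, run-time concatenation does not.
class StringAssignmentDemo {
    // Both variables refer to the same interned literal.
    static boolean literalsShareObject() {
        String a = "constant";
        String b = "constant";
        return a == b;
    }

    // Concatenation with a non-constant operand creates a new String each time.
    static boolean concatenationsShareObject(int i) {
        String a = "" + i;
        String b = "" + i;
        return a == b;
    }
}
```

literalsShareObject() returns true and concatenationsShareObject(7) returns false, which matches the point above: only the run-time expression allocates.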
It seems a little silly to worry about too much optimization of a for loop in an interpreted language, IMHO.
When I'm using multiple threads (50+), I found this to be a very effective way of handling ghost thread issues caused by not being able to close a process correctly. If I'm wrong, please let me know why I'm wrong:
Process one;
BufferedInputStream two;
try {
    one = Runtime.getRuntime().exec(command);
    two = new BufferedInputStream(one.getInputStream());
} catch (IOException e) {
    e.printStackTrace();
} finally {
    // null to ensure they are erased
    one = null;
    two = null;
    // nudge the gc
    System.gc();
}