String s = "";
for(i=0;i<....){
s = some Assignment;
}
or
for(i=0;i<..){
String s = some Assignment;
}
I don't need to use 's' outside the loop ever again.
The first option is perhaps better since a new String is not initialized each time. The second however would result in the scope of the variable being limited to the loop itself.
EDIT: In response to Milhous's answer. It'd be pointless to assign the String to a constant within a loop wouldn't it? No, here 'some Assignment' means a changing value got from the list being iterated through.
Also, the question isn't because I'm worried about memory management. Just want to know which is better.
Limited Scope is Best
Use your second option:
for ( ... ) {
String s = ...;
}
Scope Doesn't Affect Performance
If you disassemble the code compiled from each (with the JDK's javap tool), you will see that the loop compiles to the exact same JVM instructions in both cases. Note also that Brian R. Bondy's "Option #3" is identical to Option #1. Nothing extra is added or removed from the stack when using the tighter scope, and the same data are used on the stack in both cases.
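To check this yourself, here is a rough sketch that puts both variants side by side. The class name ScopeDemo and the method names outside and inside are made up for illustration. After compiling it, running javap -c ScopeDemo prints the bytecode of each method so the loop bodies can be compared; the local slot numbers may differ between the two methods even where the instructions match.

```java
import java.util.Arrays;
import java.util.List;

class ScopeDemo {
    // Option 1: s is declared (and needlessly initialized) outside the loop.
    static int outside(List<String> items) {
        int total = 0;
        String s = "";
        for (int i = 0; i < items.size(); i++) {
            s = items.get(i);
            total += s.length();
        }
        return total;
    }

    // Option 2: s is declared inside the loop.
    static int inside(List<String> items) {
        int total = 0;
        for (int i = 0; i < items.size(); i++) {
            String s = items.get(i);
            total += s.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "bb", "ccc");
        // Both variants compute the same total.
        System.out.println(outside(items) + " " + inside(items));
    }
}
```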
Avoid Premature Initialization
The only difference between the two cases is that, in the first example, the variable s is unnecessarily initialized. This is a separate issue from the location of the variable declaration. This adds two wasted instructions (to load a string constant and store it in a stack frame slot). A good static analysis tool will warn you that you are never reading the value you assign to s, and a good JIT compiler will probably elide it at runtime.
You could fix this simply by using an empty declaration (i.e., String s;), but this is considered bad practice and has another side-effect discussed below.
Often a bogus value like null is assigned to a variable simply to hush a compiler error that a variable is read without being initialized. This error can be taken as a hint that the variable scope is too large, and that it is being declared before it is needed to receive a valid value. Empty declarations force you to consider every code path; don't ignore this valuable warning by assigning a bogus value.
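As a small illustration of that compiler check, the sketch below leaves s uninitialized and assigns it on every branch. The class DefiniteAssignmentDemo and the label method are hypothetical names chosen for this example.

```java
class DefiniteAssignmentDemo {
    // Hypothetical helper: maps a score to a label.
    static String label(int score) {
        String s; // empty declaration, no bogus value
        if (score >= 50) {
            s = "pass";
        } else {
            // If this branch is removed, javac rejects "return s" because
            // s might not have been initialized on that path.
            s = "fail";
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(label(70));
        System.out.println(label(10));
    }
}
```

Assigning null at the declaration would silence the error if a branch were forgotten, and the mistake would only show up at runtime.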
Conserve Stack Slots
As mentioned, while the JVM instructions are the same in both cases, there is a subtle side-effect that makes it best, at a JVM level, to use the most limited scope possible. This is visible in the "local variable table" for the method. Consider what happens if you have multiple loops, with the variables declared in unnecessarily large scope:
void x(String[] strings, Integer[] integers) {
String s;
for (int i = 0; i < strings.length; ++i) {
s = strings[i];
...
}
Integer n;
for (int i = 0; i < integers.length; ++i) {
n = integers[i];
...
}
}
The variables s and n could be declared inside their respective loops, but since they are not, the compiler uses two "slots" in the stack frame. If they were declared inside the loops, the compiler could reuse the same slot, making the stack frame smaller.
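For comparison, a sketch of the narrow-scope version follows. The class name SlotDemo, the int return type, and the running total are additions for illustration so that the method does something observable. Compiling with javac -g and then running javap -l SlotDemo should show the local variable table, where you can check whether s and n share a slot with your compiler.

```java
class SlotDemo {
    static int x(String[] strings, Integer[] integers) {
        int total = 0;
        for (int i = 0; i < strings.length; ++i) {
            String s = strings[i];   // s is out of scope once the loop ends
            total += s.length();
        }
        for (int i = 0; i < integers.length; ++i) {
            Integer n = integers[i]; // the compiler is free to reuse the slot s occupied
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(x(new String[] {"ab", "c"}, new Integer[] {1, 2}));
    }
}
```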
What Really Matters
However, most of these issues are immaterial. A good JIT compiler will see that it is not possible to read the initial value you are wastefully assigning, and optimize the assignment away. Saving a slot here or there isn't going to make or break your application.
The important thing is to make your code readable and easy to maintain, and in that respect, using a limited scope is clearly better. The smaller scope a variable has, the easier it is to comprehend how it is used and what impact any changes to the code will have.
In theory, it's a waste of resources to declare the string inside the loop.
In practice, however, both of the snippets you presented will compile down to the same code (declaration outside the loop).
So, if your compiler does any amount of optimization, there's no difference.
In general I would choose the second one, because the scope of the 's' variable is limited to the loop. Benefits:
This is better for the programmer because you don't have to worry about 's' being used again somewhere later in the function
This is better for the compiler because the scope of the variable is smaller, and so it can potentially do more analysis and optimisation
This is better for future readers because they won't wonder why the 's' variable is declared outside the loop if it's never used later
If you want to speed up for loops, I prefer declaring a max variable next to the counter so that no repeated lookups for the condition are needed:
instead of
for (int i = 0; i < array.length; i++) {
Object next = array[i];
}
I prefer
for (int i = 0, max = array.length; i < max; i++) {
Object next = array[i];
}
Any other things that should be considered have already been mentioned, so just my two cents (see erickson's post)
Greetz, GHad
To add on a bit to @Esteban Araya's answer, they will both require the creation of a new string each time through the loop (as the return value of the some Assignment expression). Those strings need to be garbage collected either way.
I know this is an old question, but I thought I'd add a bit that is slightly related.
I've noticed while browsing the Java source code that some methods, like String.contentEquals (duplicated below), make redundant local variables that are merely copies of class variables. I believe that there was a comment somewhere that implied that accessing local variables is faster than accessing class variables.
In this case "v1" and "v2" are seemingly unnecessary and could be eliminated to simplify the code, but were added to improve performance.
public boolean contentEquals(StringBuffer sb) {
synchronized(sb) {
if (count != sb.length())
return false;
char v1[] = value;
char v2[] = sb.getValue();
int i = offset;
int j = 0;
int n = count;
while (n-- != 0) {
if (v1[i++] != v2[j++])
return false;
}
}
return true;
}
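The same pattern in a standalone form might look like the sketch below. CharBox, its value field, and countOf are hypothetical names, and whether the local copy is actually faster on a modern JIT is not something this example measures.

```java
class CharBox {
    private final char[] value; // hypothetical field, standing in for String's internal array

    CharBox(String s) {
        value = s.toCharArray();
    }

    int countOf(char c) {
        char[] v = value; // local copy of the field reference (the array itself is not copied)
        int count = 0;
        for (int i = 0; i < v.length; i++) {
            if (v[i] == c) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(new CharBox("banana").countOf('a'));
    }
}
```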
It seems to me that we need more specification of the problem.
The
s = some Assignment;
is not specified as to what kind of assignment this is. If the assignment is
s = "" + i + "";
then a new string needs to be allocated.
but if it is
s = some Constant;
s will merely point to the constant's memory location, and thus the first version would be more memory efficient.
It seems a little silly to worry too much about optimizing a for loop in an interpreted language, IMHO.
When I'm using multiple threads (50+), I have found this to be a very effective way of handling ghost-thread issues where a process cannot be closed correctly. If I'm wrong, please let me know why:
Process one;
BufferedInputStream two;
try {
one = Runtime.getRuntime().exec(command);
two = new BufferedInputStream(one.getInputStream());
} catch (IOException e) {
e.printStackTrace();
}
finally{
//null to ensure they are erased
one = null;
two = null;
//nudge the gc
System.gc();
}
Related
Let's say I have this function in java
public static Character firstNonrepeatedChar(String in) {
int[] repeated = new int[256];
for(int i=0; i<256; i++){
repeated[i] = 0;
}
// First time calling in.length()
for(int j=0; j<in.length(); j++){
repeated[in.charAt(j)]++;
}
// Second time calling in.length()
// I could have used "int length = in.length() and use this variable in this second loop"
for(int j=0; j<in.length(); j++){
if(repeated[in.charAt(j)] == 1)
return in.charAt(j);
}
return null;
}
As you can see I have used in.length() twice. Another approach could be saving the in.length() once in a variable and use the variable. can someone tell me how big of difference this makes? I know if I wanted to use that value like 100 times I should keep the value in a variable but in this case we are deciding between just one more function call or using an integer variable.
The JIT will inline simple methods like length(). If you want to improve performance, you would have to look at different algorithms.
Something you can do is assume an array is already full of 0's so you don't need to zero it out. Note: you might have characters > 255.
Also I would return a char as you cannot have a null value.
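A sketch of the first two suggestions applied to the question's method follows. The class name FirstNonRepeated and the choice to size the table for every possible char value are assumptions of this example, and it keeps the question's Character return type so that "no such character" can still be reported as null.

```java
class FirstNonRepeated {
    static Character firstNonRepeatedChar(String in) {
        // A freshly allocated int[] is already filled with zeros,
        // so no explicit clearing loop is needed.
        // Sized for every possible char value, not just 0..255.
        int[] counts = new int[Character.MAX_VALUE + 1];
        for (int j = 0; j < in.length(); j++) {
            counts[in.charAt(j)]++;
        }
        for (int j = 0; j < in.length(); j++) {
            if (counts[in.charAt(j)] == 1) {
                return in.charAt(j);
            }
        }
        return null; // every character repeats, or the input is empty
    }

    public static void main(String[] args) {
        System.out.println(firstNonRepeatedChar("swiss"));
        System.out.println(firstNonRepeatedChar("aabb"));
    }
}
```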
Generally, optimize when you can measure that the optimization 1) is beneficial and 2) is worth it. By worth it, I mean that a decrease in readability warrants the increased performance. Point 1 means that your change may be detrimental, or may just not do anything.
For example, adding a local variable will make your method's stack larger, which may be worse than the runtime gain in extreme environments. Also, String.length() simply returns a variable's value, so not calling the method does not save you much. Your JIT may (and probably will) optimize the loop condition anyway, meaning that your optimization was not actually beneficial.
It really depends on the function you are calling. Since you are calling String.length(), it is perfectly fine to call it more than once in the same context, even if the return value is not expected to change.
However, it is considered best practice to cache function return values in variables, especially for complex functions. In your case, there isn't much of a difference.
Take for example a loop like this:
public boolean method(){
for (int i = 0; i < 5; i++) {
if (this.object.getSomething().getSomeArray().get(i).getArray().size() > 0)
return false;
}
return true;
}
Each get method simply retrieves a private attribute. A more readable version of the same code would be:
public boolean method(){
MySomeArray mySomeArray = this.object.getSomething().getSomeArray();
for (int i = 0; i < 5; i++) {
MyArray array = mySomeArray.get(i).getArray();
if (array.size() > 0)
return false;
}
return true;
}
Another version is:
public boolean method(){
MySomeArray mySomeArray = this.object.getSomething().getSomeArray();
MyArray array;
for (int i = 0; i < 5; i++) {
array = mySomeArray.get(i).getArray();
if (array.size() > 0)
return false;
}
return true;
}
I know that in theory compilers can optimize many things, and in this case (in my opinion) the three versions of the loop should be optimized into exactly the same machine code.
Am I correct, or would there be a difference in the number of instructions executed in the three versions?
If MySomeArray, as well as all other classes involved in your dereference chain, are at the bottom of their respective class hierarchies, then HotSpot will have an easy time turning all those virtual function calls into "plain" (non-virtual) calls by a technique known as monomorphic call site optimization.
This can also happen even if the classes involved are not leaf classes. The important thing is that at each call site, only one object type ever gets dispatched on.
With the uncertainty of virtual functions out of the way, the compiler can proceed to inline all the calls, and then to perform any further optimizations, like hoisting in your case. The ultimate values retrieved from the chain of dereferencing can be bound to registers, etc.
Note that much of the above is subject to the entire code path being free of any happens-before relations to the actions of other threads. In practice this mostly means no volatile variable access and no synchronized blocks (within your own code as well as within all the code called from your code).
Write a test case that uses this method and print the generated assembly code when you run it. You can then check yourself how many of the calls are inlined. I'm skeptical about the compiler being able to inline them all, but the JIT compiler can be surprising.
I would prefer the more readable version anyway, because it's more readable.
With enough inlining, the compiler can indeed hoist the method calls out of the loop, very much like you did by hand in your second and third examples. The details of whether it will actually do this depend entirely on the behavior and size of the methods in question, and the sophistication of the JIT involved.
I wrote up your example and tested it with Caliper, and all the methods have equivalent timings. I didn't inspect the assembly, since that's more involved - but I'll bet they are near equivalent.
The trouble is that you are making assumptions that the compiler cannot make.
You know that this.object.getSomething().getSomeArray() does not change each time around the loop but the compiler has no way to know that. Especially since other threads may potentially be modifying those variables at the same time...
I've got one simple question. Normally I write code like this:
String myString = "hello";
for (int i=0; i<10; i++)
{
myString = "hello again";
}
Because I think the following would not be good style, since it would create too many unnecessary objects.
for (int i=0; i<10; i++)
{
String myString = "hello again";
}
Is this even correct? Or is this just the case when I've got an explicit object like an object from a class I created? What if it was a boolean or an int? What is better coding style? Instantiate it once before the loop and use it in the loop or instantiate it every time in the loop again? And why? Because the program is faster or less storage is used or...?
Someone told me that if it was a boolean I should instantiate it directly in the loop. He said it would not make a difference for the heap and it would be more clear that the variable belongs inside the loop. So what is correct?
Thanks for an answer! :-)
====
Thanks for all your answers!
In conclusion: it is preferable to declare an object inside the smallest scope possible. There are no performance improvements from declaring and instantiating objects outside the loop, even if the object is reinstantiated on every iteration.
No, the latter code isn't actually valid. It would be with braces though:
for (int i=0; i<10; i++)
{
String myString = "hello again";
}
(Basically you can't use a variable declaration as a single-statement body for an if statement, a loop etc.)
It would be pointless, but valid - and preferable to the first version, IMO. It takes no more memory, but it's generally a good idea to give your local variables the narrowest scope you can, declaring as late as you can, ideally initializing at the same point. It makes it clearer where each variable can be used.
Of course, if you need to refer to the variable outside the loop (before or afterwards) then you'll need to declare it outside the loop too.
You need to differentiate between variables and objects when you consider efficiency. The above code uses at most one object - the String object referred to by the literal "hello again".
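A quick way to see the one-object claim, as a sketch: the loop below records every reference it sees in an identity-based set, which compares with == rather than equals. LiteralDemo and distinctObjectsSeen are names invented for this example.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class LiteralDemo {
    // Counts how many distinct String objects the loop body actually touches.
    static int distinctObjectsSeen(int iterations) {
        // Identity-based set: two entries are equal only if they are the same object.
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (int i = 0; i < iterations; i++) {
            String myString = "hello again"; // a new variable each time, but the same pooled literal
            seen.add(myString);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctObjectsSeen(10)); // prints 1
    }
}
```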
As Binyamin Sharet mentioned, you generally want to declare a variable within the smallest scope possible. In your specific examples, the second one is generally preferable unless you need access to the variable outside your loop.
However, under certain conditions this can have performance implications--namely, if you are instantiating the same object over and over again. In your particular example, you benefit from Java's automatic pooling of String literals. But suppose you were actually creating a new instance of the same object on every iteration of the loop, and this loop was being executed hundreds or thousands of times:
for (int i=0; i<1000; i++)
{
String myString = new String("hello again"); // 1000 Strings are created--one on every iteration
...
}
If your loop is looping hundreds or thousands of times but it just so happens that you're instantiating the same object over and over again, instantiating it inside the loop is going to result in a lot of unnecessary garbage collection, because you create and throw away a new object on every iteration. In that case, you would be better off declaring and instantiating the variable once outside of the loop:
String myString = new String("hello again"); // only one String is created
for (int i=0; i<1000; i++)
{
...
}
And, to come full circle, you can manually limit the scope by adding extra braces around the relevant section of code:
{ // Limit the scope
String myString = new String("hello again");
for (int i=0; i<1000; i++)
{
...
}
}
It seems like you mean declare, not instantiate. In general, you should declare a variable in the smallest scope required (in this case, in the loop).
If you are going to use the variable outside the for loop, then declare it outside; otherwise it's better to keep the scope to a minimum.
The problem with the second is that you create objects and someone (the GC) has to clean them up; of course, for 10 iterations it is unimportant.
BTW, in your specific example I would have written
String myString = null;
final String HELLO_AGAIN="hello again";
for (int i=0; i<10; i++)
myString = HELLO_AGAIN;
Unless value is changed, you should definitely instantiate outside of the loop.
The problem here is that String is an immutable object: you cannot change the value of a string, you can only create new String objects. Either way, if your goal is to assign a variable a new object instance, then limit your scope and declare it inside the body of your loop.
If your object is mutable, then it would be reasonable to reuse the object in each iteration of the loop and just change the attributes you need. This concept is used when you run the same query multiple times with different parameters: you use a PreparedStatement.
In the extreme case, you would even maintain pools of objects which can be shared within the whole application. You create additional objects as you run out of resources, you shrink if you detect a reasonable amount of non-use. This concept is used to maintain a Connection Pool.
I have a loop like:
String tmp;
for(int x = 0; x < 1000000; x++) {
// use temp
temp = ""; // reset
}
This string is holding at most 10 characters.
What would be the most efficient way of creating a variable for this use case?
Should I use a fixed size array? Or a StringBuffer instead?
I don't want to create 1 million variables when I don't have to, and it matters for this method (performance).
Edit
I simplified my scenario. I actually need this variable to be at class-level scope, as there are some events that take place, i.e. it can't be declared within the loop.
Why not simply declare temp inside the loop like so:
for(int x = 0; x < 1000000; x++) {
String temp;
// use temp
}
You even get a very (very, very) slight performance increase because you don't have to waste time resetting the value of temp to "".
With regard to your update, it still depends on what you do with temp, but a StringBuffer would probably be the easiest to use. And especially if you need to concatenate together a String, it would be quite fast.
What exactly are you looking to do with tmp (or temp)?
Honestly, I'd just try declaring your variables within the loop if they aren't needed afterwards, and profile it. Many of the obscurities that have been used in the past to help with performance issues within loops are no longer needed in recent versions of Java, due to optimizations and other improvements in the compiler and the Hotspot JVM.
What's the problem with using a fixed array? I think an array will do. Here is a similar question I found: Making a very large Java array
Well, StringBuffer or StringBuilder will do too, but StringBuilder is faster than StringBuffer.
And if this is about performance, I think you might want to check which types of loops give better performance.
Try this
public class Robal {
public void looping()
{
for(int x = 0; x < 1000000; x++) {
String temp=x+"";
System.out.println(temp);
temp = ""; // reset
}
}
}
The answer really depends on what you do with temp in the loop.
String instances are immutable by definition. If your processing includes string manipulation, you should not use String since you'll end up creating a lot of unnecessary very short-lived immutable instances. In this case use StringBuilder (or StringBuffer if thread-safety is required) instead.
If you merely create a new String (or obtain it from an external source) in every iteration and use it without any string manipulation operations that create new String objects, then you're OK using String. Note that creating a new String instance every iteration is usually quite fast and unless your profiler specifically points to this being a problem, you should not attempt to optimize this prematurely.
Note, also, that unless each iteration specifically relies on temp's initial value being a reference to an empty string, there is no need to do temp = "".
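If the loop does manipulate the string, one possible shape is to reuse a single StringBuilder and reset it with setLength(0), as sketched below. BuilderReuseDemo, totalLength, and the "#" prefix are invented for this example, and the initial capacity of 10 simply mirrors the "at most 10 characters" remark in the question.

```java
class BuilderReuseDemo {
    // Builds a short string per iteration in one reused builder
    // and sums the resulting lengths.
    static int totalLength(int iterations) {
        StringBuilder temp = new StringBuilder(10); // capacity hint only, the builder can still grow
        int total = 0;
        for (int x = 0; x < iterations; x++) {
            temp.setLength(0);          // reset the contents without allocating a new builder
            temp.append('#').append(x); // stand-in for the real per-iteration work
            total += temp.length();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalLength(1000000));
    }
}
```

As the answer says, this is only worth doing if a profiler points at the allocation; plain String is fine otherwise.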
I come from a C background, so I admit that I'm still struggling with letting go of memory management when writing in Java. Here's one issue that's come up a few times that I would love to get some elaboration on. Here are two ways to write the same routine, the only difference being when double[] array is declared:
Code Sample 1:
double[] array;
for (int i=0; i<n; ++i) {
array = calculateSomethingAndReturnAnArray(i);
if (someFunctionOnArrays(array)) {
// DO ONE THING
} else {
// DO SOME OTHER THING
}
}
Code Sample 2:
for (int i=0; i<n; ++i) {
double[] array = calculateSomethingAndReturnAnArray(i);
if (someFunctionOnArrays(array)) {
// DO ONE THING
} else {
// DO SOME OTHER THING
}
}
Here, private double[] calculateSomethingAndReturnAnArray(int i) always returns an array of the same length. I have a strong aversion to Code Sample 2 because it creates a new array for each iteration when it could just overwrite the existing array. However, I think this might be one of those times when I should just sit back and let Java handle the situation for me.
What are the reasons to prefer one of the ways over the other or are they truly identical in Java?
There's nothing special about arrays here, because you're not allocating the array; you're just creating a new variable. It's equivalent to:
Object foo;
for(...){
foo = func(...);
}
In the case where you create the variable outside the loop, the variable (which will hold the location of the thing it refers to) will only ever be allocated once. In the case where you create the variable inside the loop, the variable may be reallocated in each iteration, but my guess is that the compiler or the JIT will fix that in an optimization step.
I'd consider this a micro-optimization. If you're running into problems with this segment of your code, you should be making decisions based on measurements rather than on the specs alone. If you're not running into issues with this segment of code, you should do the semantically correct thing and declare the variable in the scope that makes sense.
See also this similar question about best practices.
A declaration of a local variable without an initializing expression will do NO work whatsoever. The work happens when the variable is initialized.
Thus, the following are identical with respect to semantics and performance:
double[] array;
for (int i=0; i<n; ++i) {
array = calculateSomethingAndReturnAnArray(i);
// ...
}
and
for (int i=0; i<n; ++i) {
double[] array = calculateSomethingAndReturnAnArray(i);
// ...
}
(You can't even quibble that the first case allows the array to be used after the loop ends. For that to be legal, array has to have a definite value after the loop, and it doesn't unless you add an initializer to the declaration; e.g. double[] array = null;)
To elaborate on @Mark Elliot's point about micro-optimization:
This is really an attempt to optimize rather than a real optimization, because (as I noted) it should have no effect.
Even if the Java compiler actually emitted some non-trivial executable code for double[] array;, the chances are that the time to execute would be insignificant compared with the total execution time of the loop body, and of the application as a whole. Hence, this is most likely to be a pointless optimization.
Even if this is a worthwhile optimization, you have to consider that you have optimized for a specific target platform; i.e. a particular combination of hardware and JVM version. Micro-optimizations like this may not be optimal on other platforms, and could in theory be anti-optimizations.
In summary, you are most likely wasting your time if you focus on things like this when writing Java code. If performance is a concern for your application, focus on the MACRO level performance; e.g. things like algorithmic complexity, good database / query design, patterns of network interactions, and so on.
Both create a new array for each iteration. They have the same semantics.
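This can be checked with a small sketch that keeps every returned reference and compares them with ==. The calculate method is a stand-in for calculateSomethingAndReturnAnArray from the question, and the class and helper names are invented here.

```java
import java.util.ArrayList;
import java.util.List;

class ArrayPerIteration {
    // Stand-in for calculateSomethingAndReturnAnArray: allocates on every call.
    static double[] calculate(int i) {
        return new double[] {i, i * 2.0};
    }

    // Code Sample 1 shape: variable declared outside the loop.
    static boolean distinctWhenDeclaredOutside(int n) {
        List<double[]> seen = new ArrayList<>();
        double[] array;
        for (int i = 0; i < n; ++i) {
            array = calculate(i);
            seen.add(array);
        }
        return allDistinct(seen);
    }

    // Code Sample 2 shape: variable declared inside the loop.
    static boolean distinctWhenDeclaredInside(int n) {
        List<double[]> seen = new ArrayList<>();
        for (int i = 0; i < n; ++i) {
            double[] array = calculate(i);
            seen.add(array);
        }
        return allDistinct(seen);
    }

    // True if no two entries are the same object.
    static boolean allDistinct(List<double[]> arrays) {
        for (int a = 0; a < arrays.size(); a++) {
            for (int b = a + 1; b < arrays.size(); b++) {
                if (arrays.get(a) == arrays.get(b)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(distinctWhenDeclaredOutside(5) + " " + distinctWhenDeclaredInside(5));
    }
}
```

Where the variable is declared only changes where the reference lives; the allocation happens inside the called method either way.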