scope of local variable in enhanced for-loop - java

I have a rather simple question about variable scope.
I am familiar with enhanced for-loops, but I do not get why I should declare a new variable to hold each element. An example might clarify my question:
int[] ar = {1, 2, 3};
int i = 0;
for(i : ar) { // this causes an error if I do not declare a new variable: int i
// for(int i : ar) // this works fine
System.out.println(i);
}
So why should I declare this new variable? After all i is accessible inside the for loop. I did not want to use any previous value of i, just did not want to declare a new variable. (I guessed for other iterable items it might be faster using the same variable).
I guess that's how enhanced for-loops were built, but doesn't this break the whole scope idea?
This behavior raises another question: does the compiler use the same variable for the whole for loop and just update its value, or does it create a new variable for each iteration?
An interesting part is that if I keep both declarations of int i (before and inside the for loop), I even get a compiler error:
Duplicate local variable i
which makes things (at least for me) a bit more strange. So I cannot use the previously declared variable i inside the for loop, but neither can I declare a new one inside it with the same name.

So why should I declare this new variable?
Because that's the way the syntax is defined.
After all i is accessible inside the for loop.
That's semantics. It's irrelevant to syntax.
I did not want to use any previous value of i, just did not want to declare a new variable. (I guessed for other iterable items it might be faster using the same variable).
Don't guess about performance. Test and measure. But in this case there's nothing to measure, because any working code is faster than any non-working code.

Does this mean that I have one local variable that gets different values, or a different variable in each iteration?
From a language point of view you have a different variable in each iteration. That’s why you can write:
for(final ItemType item: iterable) {
…
}
which makes a great difference as you can create inner class instances within the loop referring to the current element. With Java 8 you can use lambdas as well and even omit the final modifier, but the semantics do not change: you don’t get the surprising results like in C#.
I guessed for other iterable items it might be faster using the same variable
That’s nonsense. As long as you don’t have a clue what the produced code looks like, you shouldn’t even guess.
But if you are interested in the details of Java byte code: within a stack frame local variables are addressed by a number rather than by a name. And the local variables of your program are mapped to these storage locations by reusing the storage of local variables that went out of scope. It makes no difference whether the variable exists during the entire loop or is “recreated” on every iteration. It will still occupy just one slot within the stack frame. Hence, trying to “reuse local variables” on a source code level makes no sense at all. It just makes your program less readable.
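To make the "different variable in each iteration" point concrete, here is a minimal sketch (the class and method names are made up for illustration). Each lambda captures the loop variable of its own iteration, so reading the lambdas back later yields each element rather than the last one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

// Hypothetical demo class, not part of the original question.
class LoopCapture {
    // Builds one lambda per element; each lambda captures that iteration's variable.
    static List<Integer> capture(int[] values) {
        List<IntSupplier> suppliers = new ArrayList<>();
        for (int v : values) {           // v is effectively final within each iteration
            suppliers.add(() -> v);
        }
        List<Integer> seen = new ArrayList<>();
        for (IntSupplier s : suppliers) {
            seen.add(s.getAsInt());      // read the captured values after the loop ended
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(capture(new int[] {1, 2, 3})); // prints [1, 2, 3]
    }
}
```

The same capture would be rejected by the compiler in a basic for loop such as for (int i = 0; i < n; i++), because there i is a single variable that is modified.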

Just to have the reference here: The JLS Section 14.14.2, The enhanced for statement defines the enhanced for-loop to have the following structure (relevant for this question):
EnhancedForStatement:
for ( {VariableModifier} UnannType VariableDeclaratorId : Expression ) Statement
where UnannType can be summarized to be "a type" (primitive, reference...). So giving the type of the loop variable is simply obligatory according to the language specification - causing the (admittedly: somewhat confusing) observations described in the question.
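For arrays, the same JLS section describes the enhanced for statement as equivalent to a basic for loop that declares the loop variable inside the body. The sketch below spells that out by hand; the names a and idx are placeholders for the compiler-generated identifiers, and the class name is made up for illustration.

```java
// Hypothetical demo class showing the array case of the enhanced for statement.
class EnhancedForDesugared {
    // Enhanced form: the declaration in the header is mandatory.
    static int sumEnhanced(int[] ar) {
        int sum = 0;
        for (int i : ar) {
            sum += i;
        }
        return sum;
    }

    // Roughly the expansion the specification gives for arrays.
    static int sumExpanded(int[] ar) {
        int sum = 0;
        int[] a = ar;                        // the expression is evaluated once
        for (int idx = 0; idx < a.length; idx++) {
            int i = a[idx];                  // the loop variable is declared inside the body
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] ar = {1, 2, 3};
        System.out.println(sumEnhanced(ar) == sumExpanded(ar)); // prints true
    }
}
```

Seen this way, the variable in the header is always a fresh declaration scoped to the loop body, which is also why a second int i in the same method is reported as a duplicate.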

The int i declared in the program is visible to the for loop, and to any other for loops beneath it in the same scope. But the i inside for(int i : ar) is local to that for loop, so its scope ends once the loop finishes. That's the syntax defined for the foreach loop: "you have to use a variable with scope limited to the loop".
So why should I declare this new variable? After all i is accessible inside the for loop. I did not want to use any previous value of i, just did not want to declare a new variable. (I guessed for other iterable items it might be faster using the same variable).
Why would there be any considerable performance benefit from reusing the same tiny primitive variable over and over, versus creating one only when needed that is destroyed after the loop ends?

I don't think anyone has answered the original question beyond just declaring that that is the syntax. We all know that that is the syntax. The question is, logically speaking, why?
After all, you can use a variable defined just before a loop as the loop variable, as long as the loop is a non-enhanced for loop!
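The contrast can be sketched like this (class and method names are made up for illustration). The basic for loop accepts a variable declared earlier, while the enhanced form needs its own declaration, so the closest equivalent is to copy the element into the outer variable inside the body.

```java
// Hypothetical demo class contrasting the two loop forms.
class PreDeclaredLoopVariable {
    // Basic for: a variable declared before the loop can serve as the loop variable
    // and is still visible afterwards.
    static int indexAfterLoop(int[] ar) {
        int i;
        for (i = 0; i < ar.length; i++) {
            // loop body intentionally empty
        }
        return i;
    }

    // Enhanced for: the header must declare a new variable, so the element
    // is copied into the pre-declared one by hand.
    static int lastElement(int[] ar) {
        int i = 0;
        for (int element : ar) {
            i = element;
        }
        return i;
    }

    public static void main(String[] args) {
        int[] ar = {10, 20, 30};
        System.out.println(indexAfterLoop(ar)); // prints 3
        System.out.println(lastElement(ar));    // prints 30
    }
}
```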

Related

How to increment a value in Java Stream?

I want to increment the value of index by 1 with each iteration. This is easy to achieve in a for-loop. The variable image is an array of ImageView.
Here is my for-loop.
for (Map.Entry<String, Item> entry : map.entrySet()) {
image[index].setImage(entry.getValue().getImage());
index++;
}
In order to practise Stream, I have tried to rewrite it to the Stream:
map.entrySet().stream()
.forEach(e -> image[index++].setImage(e.getValue().getImage()));
Causing me the error:
error: local variables referenced from a lambda expression must be final or effectively final
How can I rewrite the Stream so that it increments the variable index?
You shouldn't. These two look similar, but they are conceptually different. The loop is just a loop, but forEach instructs the library to perform the action on each element, specifying neither the order of the actions (for parallel streams) nor the threads that will execute them. If you use forEachOrdered, there are still no guarantees about threads, but at least you have the guarantee of a happens-before relationship between actions on subsequent elements.
Note especially that the docs say:
For any given element, the action may be performed at whatever time
and in whatever thread the library chooses. If the action accesses
shared state, it is responsible for providing the required
synchronization.
As @Marko noted in the comments below, though, it only applies to parallel streams, even if the wording is a bit confusing. Nevertheless, using a loop means that you don't even have to worry about all this complicated stuff!
So the bottom line is: use loops if that logic is a part of the function it's in, and use forEach if you just want to tell Java to “do this and that” to elements of the stream.
That was about forEach vs loops. Now on the topic of why the variable needs to be final in the first place, and why you can do that to class fields and array elements. It's because, like it says, Java has the limitation that anonymous classes and lambdas can't access a local variable unless it never changes. That means not only that they can't change it themselves, but also that you can't change it outside them. But that only applies to local variables, which is why it works for everything else, like class fields or array elements.
The reason for this limitation, I think, is lifetime issues. A local variable exists only while the block containing it is executing. Everything else exists while there are references to it, thanks to garbage collection. And that everything else includes lambdas and anonymous classes too, so if they could modify local variables which have a different lifetime, that could lead to problems similar to dangling references in C++. So Java took the easy way out: it simply copies the local variable at the time the lambda / anonymous class is created. But that would lead to confusion if you could change that variable (because the copy wouldn't change, and since the copy is invisible it would be very confusing). So Java just forbids any changes to such variables, and that's that.
There are many questions on the final variables and anonymous classes discussed already, like this one.
Some kind of "zip" operation would be helpful here, though standard Stream API lacks it. Some third-party libraries extending Stream API provide it, including my free StreamEx library:
IntStreamEx.ints() // get stream of numbers 0, 1, 2, ...
.boxed() // box them
.zipWith(StreamEx.ofValues(map)) // zip with map values
.forKeyValue((index, item) -> image[index].setImage(item.getImage()));
See zipWith documentation for more details. Note that your map should have meaningful order (like LinkedHashMap), otherwise this would be pretty useless...
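If a third-party library is not an option, one alternative with only the standard API is to stream over the indices instead of the entries. This is a sketch rather than anything from the answers above: it uses plain strings in place of the ImageView and Item types, and the class and method names are made up. Like the zip approach, it only makes sense when the map has a meaningful order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.IntStream;

// Hypothetical demo class; String stands in for the ImageView/Item types.
class IndexedStream {
    // Copies the map values, in iteration order, into target[0], target[1], ...
    static void fill(Map<String, String> map, String[] target) {
        List<String> values = new ArrayList<>(map.values());
        IntStream.range(0, values.size())
                 .forEach(i -> target[i] = values.get(i)); // i is a lambda parameter, no shared counter
    }

    public static void main(String[] args) {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("first", "a.png");
        map.put("second", "b.png");
        String[] target = new String[map.size()];
        fill(map, target);
        System.out.println(Arrays.toString(target)); // prints [a.png, b.png]
    }
}
```

The index comes from the stream itself, so nothing outside the lambda is mutated except the array elements, which the restriction on local variables does not cover.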

How to add an element to an array?

Suppose I need to add an element to my array1. The approach I took was to copy array1 into array2 using a for loop, then delete array1 and re-declare array1 with one more element than the previous array1, then copy array2 into array1, and then initialize the new element in array1. But I am guessing there is no way to delete an array, so how can I add an element to an array after its declaration?
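For reference, the copy-and-grow idea described here does not require deleting anything. A minimal sketch, assuming an int array and a made-up helper named append: a larger copy is created with Arrays.copyOf and the variable is simply reassigned to it.

```java
import java.util.Arrays;

// Hypothetical helper class, not from the original post.
class ArrayAppend {
    // Returns a new array one element longer than source, with element at the end.
    static int[] append(int[] source, int element) {
        int[] grown = Arrays.copyOf(source, source.length + 1); // copies and pads with 0
        grown[source.length] = element;
        return grown;
    }

    public static void main(String[] args) {
        int[] array1 = {1, 2, 3};
        array1 = append(array1, 4);  // the old array is left to the garbage collector
        System.out.println(Arrays.toString(array1)); // prints [1, 2, 3, 4]
    }
}
```

When elements are added often, a java.util.ArrayList is usually the more convenient choice, since it does this resizing internally.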
If you have naming conflicts, the best solution is to use a more descriptive variable name.
In your case, use nameOfThing1 or nameOfThing2 in order to distinguish them. In most cases though, when something's "name" is a number, it's usually called an "id".
Depending on the situation, another strategy is to restructure your code, so that the variables aren't in scope at the same time. You can declare a block using { } in order to make a separate scope for each variable.
Suppose I have an int named NAME, and I need to delete it, to make a String with the same name. Will NAME = null delete the int?
No. It will give you a compilation error.
If it won't, then how can you delete a variable?
You can't. Variables cannot be deleted.
(Aside: Java's static typing would break if it was possible to change the binding between a variable's name and its declared type. If you (actually) deleted the original variable, that wouldn't be so bad, but it would be bad for readability. Anyhow ... you can't.)
One solution is to declare a different variable with a different variable name. Using better variable names will help.
Alternatively, you could use block scoping to manage the scope / lifetime of a (local) variable. For example:
public void doIt() {
int a = 1;
{
String a = "hey"; // compilation error
int b = 2;
}
String b = "bee"; // OK
}
Note you can only do this with local variables.
If you are worried about the "efficiency overheads" of having multiple variables at the same time, don't. If there is a significant efficiency concern, the JIT compiler should be able to deal with it. (If not now, then in a future Java release.)
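A version of the block-scoping idea that compiles might look like the sketch below (class and method names are made up). The two variables named a live in sibling blocks, so they are never in scope at the same time and may have different types.

```java
// Hypothetical demo class for sibling block scopes.
class BlockScopes {
    static String describe() {
        StringBuilder out = new StringBuilder();
        {
            int a = 1;            // this a exists only inside the first block
            out.append(a);
        }
        {
            String a = "hey";     // OK: the int a is already out of scope
            out.append(a);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints 1hey
    }
}
```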

pattern for getting around final limitation of Java closure

I'm trying to write a very simple piece of code and can't figure out an elegant solution to do it:
int count = 0;
jdbcTemplate.query(readQuery, new RowCallbackHandler() {
@Override
public void processRow(ResultSet rs) throws SQLException {
realProcessRow(rs);
count++;
}
});
This obviously doesn't compile. The 2 solutions that I'm aware of both stink:
I don't want to make count a class field because it's really a local variable that I just need for logging purposes.
I don't want to make count an array because it is plain ugly.
This is just silly; there's got to be a reasonable way to do it?
A third possibility is to use a final-mutable-int-object, for example:
final AtomicInteger count = new AtomicInteger(0);
....
count.incrementAndGet();
Apache Commons Lang also has a MutableInt, I believe, but I have not used it.
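Put together, the AtomicInteger variant could look roughly like this. The sketch replaces jdbcTemplate.query and RowCallbackHandler with a made-up forEachRow helper and a plain Consumer so that it runs on its own; only the counting pattern is the point.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical demo class; forEachRow stands in for jdbcTemplate.query.
class CountingCallback {
    // Calls the handler once per row, like a row callback would be called.
    static void forEachRow(List<String> rows, Consumer<String> handler) {
        for (String row : rows) {
            handler.accept(row);
        }
    }

    static int countRows(List<String> rows) {
        final AtomicInteger count = new AtomicInteger(0); // the reference is final, the value is mutable
        forEachRow(rows, new Consumer<String>() {
            @Override
            public void accept(String row) {
                count.incrementAndGet();
            }
        });
        return count.get();
    }

    public static void main(String[] args) {
        System.out.println(countRows(List.of("a", "b", "c"))); // prints 3
    }
}
```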
You seem to already be aware of the solutions (they are different though); and you are probably aware of the reasons (it cannot capture local variables by reference because the variable might not exist by the time the closure is run, so it must capture by value (have multiple copies); it is bad to have the same variable refer to different copies in different scopes that each can be changed independently, so they cannot be changed).
If your closure does not need to share state back to the enclosing scope, then a field in the class is the right thing to do. I don't understand what your objection is. If the closure needs to be able to be called multiple times and it needs to increment each time, then it needs to maintain state in the object. A field (instance variable) properly expresses the storing of state in an object. The field can be initialized with the captured value from the outside scope.
If your closure needs to share state back to the enclosing scope (which is not a very common situation), then using a mutable structure (like an array) is the right thing to do, because it avoids the problem of the lifetime of the local variable.
I typically make count a class field but add a comment that it is only a field because it is used by an inner closure, Runnable etc...

Multiple Objects vs Changing One Object

I saw something today talking about this:
aClass something;
while (condition) {
something = new aClass();
...
}
while (condition) {
aClass something = new aClass();
...
}
It said you should use the second one rather than the first. Is this true, and if so, why?
Your first example leaks a useless variable into the outer scope.
The second method keeps the something variable only in the scope of that specific loop iteration. If you want to use the object outside the loop and/or keep changes between iterations, then you must use the first method.
Also, the second method doesn't define multiple variables; the compiler will usually optimize it so that only one variable is defined.
You should use the second example unless you need to use the object after the while loop is complete. If you don't need the variable in the outer scope it's better to declare it in the narrowest scope where it will be used (inside the loop). This simplifies the code for maintenance programmers who have to make sense of it.

Why does javac complain about not initialized variable?

For this Java code:
String var;
clazz.doSomething(var);
Why does the compiler report this error:
Variable 'var' might not have been initialized
I thought all variables or references were initialized to null. Why do you need to do:
String var = null;
??
Instance and class variables are initialized to null (or 0), but local variables are not.
See §4.12.5 of the JLS for a very detailed explanation which says basically the same thing:
Every variable in a program must have a value before its value is used:
Each class variable, instance variable, or array component is initialized with a default value when it is created:
[snipped out list of all default values]
Each method parameter is initialized to the corresponding argument value provided by the invoker of the method.
Each constructor parameter is initialized to the corresponding argument value provided by a class instance creation expression or explicit constructor invocation.
An exception-handler parameter is initialized to the thrown object representing the exception.
A local variable must be explicitly given a value before it is used, by either initialization or assignment, in a way that can be verified by the compiler using the rules for definite assignment.
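The default values for fields and array components can be observed directly. A small sketch, with made-up class and field names:

```java
// Hypothetical demo class showing default initialization.
class Defaults {
    static String text;   // class variable: defaults to null
    int number;           // instance variable: defaults to 0
    boolean flag;         // instance variable: defaults to false

    static String describe() {
        Defaults d = new Defaults();
        int[] cells = new int[2];   // array components default to 0
        // A local such as "String local;" could not be read here without an assignment.
        return text + " " + d.number + " " + d.flag + " " + cells[0];
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints null 0 false 0
    }
}
```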
It's because Java is being very helpful (as much as possible).
It will use this same logic to catch some very interesting edge-cases that you might have missed. For instance:
int x;
if(cond2)
x=2;
else if(cond3)
x=3;
System.out.println("X was:"+x);
This will fail because there was an else case that wasn't specified. The fact is, an else case here should absolutely be specified, even if it's just an error (The same is true of a default: condition in a switch statement).
What you should take away from this, interestingly enough, is don't ever initialize your local variables until you figure out that you actually have to do so. If you are in the habit of always saying "int x=0;" you will prevent this fantastic "bad logic" detector from functioning. This error has saved me time more than once.
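One way to satisfy the compiler without a dummy initial value is to make the missing branch an explicit error. A sketch, with a made-up class and method wrapping the example above:

```java
// Hypothetical demo class for definite assignment with an explicit else.
class DefiniteAssignment {
    static int pick(boolean cond2, boolean cond3) {
        int x;                      // deliberately not initialized
        if (cond2) {
            x = 2;
        } else if (cond3) {
            x = 3;
        } else {
            // The previously forgotten case is now handled explicitly.
            throw new IllegalStateException("neither condition held");
        }
        return x;                   // compiles: x is assigned on every path that reaches here
    }

    public static void main(String[] args) {
        System.out.println("X was:" + pick(false, true)); // prints X was:3
    }
}
```

If the final else is removed, the compiler rejects the return statement because x might not have been initialized, which is the check the answer recommends keeping.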
Ditto on Bill K. I add:
The Java compiler can protect you from hurting yourself by failing to set a variable before using it within a function. Thus it explicitly does NOT set a default value, as Bill K describes.
But when it comes to class variables, it would be very difficult for the compiler to do this for you. A class variable could be set by any function in the class. It would be very difficult for the compiler to determine all possible orders in which functions might be called. At the very least it would have to analyze all the classes in the system that call any function in this class. It might well have to examine the contents of any data files or database and somehow predict what inputs users will make. At best the task would be extremely complex, at worst impossible. So for class variables, it makes sense to provide a reliable default. That default is, basically, to fill the field with bits of zero, so you get null for references, zero for integers, false for booleans, etc.
As Bill says, you should definitely NOT get in the habit of automatically initializing variables when you declare them. Only initialize variables at declaration time if this really makes sense in the context of your program. Like, if 99% of the time you want x to be 42, but inside some IF condition you might discover that this is a special case and x should be 666, then fine, start out with "int x=42;" and inside the IF override this. But in the more normal case, where you figure out the value based on whatever conditions, don't initialize to an arbitrary number. Just fill it with the calculated value. Then if you make a logic error and fail to set a value under some combination of conditions, the compiler can tell you that you screwed up rather than the user.
PS I've seen a lot of lame programs that say things like:
HashMap myMap=new HashMap();
myMap=getBunchOfData();
Why create an object to initialize the variable when you know you are promptly going to throw this object away a millisecond later? That's just a waste of time.
Edit
To take a trivial example, suppose you wrote this:
int foo;
if (bar<0)
foo=1;
else if (bar>0)
foo=2;
processSomething(foo);
This will throw an error at compile time, because the compiler will notice that when bar==0, you never set foo, but then you try to use it.
But if you initialize foo to a dummy value, like
int foo=0;
if (bar<0)
foo=1;
else if (bar>0)
foo=2;
processSomething(foo);
Then the compiler will see that no matter what the value of bar, foo gets set to something, so it will not produce an error. If what you really want is for foo to be 0 when bar is 0, then this is fine. But if what really happened is that you meant one of the tests to be <= or >= or you meant to include a final else for when bar==0, then you've tricked the compiler into failing to detect your error. And by the way, that's why I think such a construct is poor coding style: Not only can the compiler not be sure what you intended, but neither can a future maintenance programmer.
I like Bill K's point about letting the compiler work for you- I had fallen into initializing every automatic variable because it 'seemed like the Java thing to do'. I'd failed to understand that class variables (ie persistent things that constructors worry about) and automatic variables (some counter, etc) are different, even though EVERYTHING is a class in Java.
So I went back and removed the initialization I'd been using, for example
List <Thing> somethings = new List<Thing>();
somethings.add(somethingElse); // <--- this is completely unnecessary
Nice. I'd been getting a compiler warning for
List<Thing> somethings = new List();
and I'd thought the problem was lack of initialization. WRONG. The problem was I hadn't understood the rules and I needed the <Thing> identified in the "new", not any actual items of type <Thing> created.
(Next I need to learn how to put literal less-than and greater-than signs into HTML!)
I don't know the logic behind it, but local variables are not initialized to null. I guess to make your life easy. They could have done it with class variables if it were possible. It doesn't mean you have to have it initialized in the beginning. This is fine:
MyClass cls;
if (condition) {
cls = something;
} else {
cls = something_else;
}
Sure, if you've really got two lines on top of each other as you show- declare it, fill it, no need for a default constructor. But, for example, if you want to declare something once and use it several or many times, the default constructor or null declaration is relevant. Or is the pointer to an object so lightweight that it's better to allocate it over and over inside a loop, because the allocation of the pointer is so much less than the instantiation of the object? (Presumably there's a valid reason for a new object at each step of the loop).
Bill IV
