Does the JVM optimize aliased variables?

I'm curious. If I make a few local variables that are used simply as aliases to other variables (that is, a new "alias" variable is simply assigned the value of some other variable and the alias' assignment never changes), does the JVM optimize this by using the original variables directly?
Say, for example, that I'm writing an (immutable) quaternion class (a quaternion is a vector with four values). I write a multiply() method that takes another Quaternion, multiplies the two, and returns the resulting Quaternion. To save on typing and increase readability, I make several aliases before performing the actual computation:
public class Quaternion {
    private double[] qValues;

    public Quaternion(double q0, double q1, double q2, double q3) {
        qValues = new double[] {q0, q1, q2, q3};
    }

    // ...snip...

    public Quaternion multiply(Quaternion other) {
        double a1 = qValues[0],
               b1 = qValues[1],
               c1 = qValues[2],
               d1 = qValues[3],
               a2 = other.qValues[0],
               b2 = other.qValues[1],
               c2 = other.qValues[2],
               d2 = other.qValues[3];
        return new Quaternion(
            a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2
        );
    }
}
So, in this case, would the JVM do away with a1, b1, etc. and simply use qValues[n] and other.qValues[n] directly?

There is no such thing as an alias as you've described it in Java. When you assign a value from one memory location to a new variable, the JVM makes a copy of that value. If it were to create an alias, changing the underlying arrays during the calculation from another thread would alter the result. This does not happen in your example because you specifically told the JVM to make copies of the values first.
If you're worried about performance, don't be. Program correctness trumps all performance concerns; any program that produces incorrect results faster is worthless. I'm not saying that accessing the arrays directly inside the calculation will necessarily produce incorrect results, as I haven't seen the rest of the code, but I am saying that this type of micro-optimization is not worth your effort without first finding a performance problem and then running timing tests to verify where that problem lies.
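For instance, such a timing test could use JMH (a minimal sketch; it assumes the org.openjdk.jmh dependency, and the Quaternion class is the one from the question):

import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
public class QuaternionBenchmark {
    public Quaternion p = new Quaternion(1, 2, 3, 4);
    public Quaternion q = new Quaternion(5, 6, 7, 8);

    @Benchmark
    public Quaternion multiply() {
        // returning the result keeps the JIT from discarding the computation
        return p.multiply(q);
    }
}

Benchmark both a version of multiply() that uses local copies and one that reads qValues[n] directly before concluding anything.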

The javac compiler won't do it. Disassembling a simple piece of code like this with javap -c:
int a = 1;
int b = a;
System.out.println("" + (a - b));
Shows:
0: iconst_1
1: istore_1
2: iload_1
3: istore_2
...
But this is what the interpreter will be executing (and even the interpreter can sometimes do some basic optimization). The JIT compiler will handle these kinds of optimizations and many others; in the case of your method, it's even small enough to be inlined, so you don't even get the method call overhead once the JIT kicks in.
(e.g., in my example, the JIT can very easily do constant propagation, do away with the variables and the calculation entirely, and just pass "" + 0 as the argument to the println() method.)
But, at the end, just follow what JIT hackers always say: write your code to be maintainable. Don't worry about what the JIT will or will not do.
(Note: David is correct about variables not being "aliases", but copies of the original values.)

As other answers have pointed out, there is no concept of aliasing variables in Java; each declared variable stores its own value.
Using local variables to store values of an array for later calculations is a good idea, as it makes the code more readable and eliminates extra reads from the array.
That being said, declaring local variables does increase the size of the method's stack frame in the thread. This is not an issue in this specific question, but a greater number of local variables increases the stack space required for execution, which becomes especially relevant if recursion is involved, as the sketch below illustrates.
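A hypothetical way to observe this (the class and method names are illustrative; exact depths depend on the -Xss stack size and the JVM):

public class StackDepthDemo {
    static int depth = 0;

    static void fewLocals() {
        depth++;
        fewLocals();
    }

    static void manyLocals() {
        depth++;
        long a = depth, b = a, c = b, d = c, e = d, f = e, g = f, h = g;
        manyLocals();
        // using the locals after the call keeps them live across the recursion
        if (a + b + c + d + e + f + g + h == -1) System.out.println("never");
    }

    public static void main(String[] args) {
        try {
            manyLocals(); // swap in fewLocals() and compare the depth reached
        } catch (StackOverflowError ex) {
            System.out.println("Recursion depth reached: " + depth);
        }
    }
}

With more local slots per frame, the StackOverflowError should arrive at a shallower recursion depth.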

Your program will work, but the answer to your question is no: the JVM will not treat a1 as an alias of qValues[0]; instead, it copies the value of the latter into the former.
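A tiny standalone sketch of that copy semantics:

double[] qValues = {1.0, 2.0, 3.0, 4.0};
double a1 = qValues[0];   // a1 receives a copy of the value 1.0
qValues[0] = 99.0;        // reassigning the array slot...
System.out.println(a1);   // ...still prints 1.0: a1 is not an alias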
Check this good reference: http://www.yoda.arachsys.com/java/passing.html

Related

Java AtomicInteger vs one element array for streams

int[] arr = new int[]{0};
l.stream().forEach(x -> {if (x > 10 && x < 15) { arr[0] += 1;}});
l is List<Integer>. Here I use one element arr array to store value that is changed inside the stream. An alternative solution is to use an instance of AtomicInteger class. But I don't understand what is the difference between these two approaches in terms of memory usage, time complexity, safety...
Please note: I am not trying to use AtomicInteger (or array) in this particular piece of code. This code is used only as an example. Thanks!
Knowing which is the best way is important, and @rzwitserloot's explanation covers that in great detail. In your specific example, you could avoid the issue by doing it like this:
List<Integer> list = List.of(1,2,11,12,15,11,11,9,10,2,3);
int count = list.stream().filter(x->x > 10 && x < 15).reduce(0, (a,b)->a+1);
// or
int count = list.stream().filter(x->x > 10 && x < 15).mapToInt(x->1).sum();
Both return the value 4.
In the first example, reduce sets an initial value of 0 and then adds 1 to it for every element (b is syntactically required but not used). To sum the actual elements rather than count them, replace the 1 with b in the reduce method.
In the second example, each value is replaced with 1 in the stream and then summed. Since the method sum() doesn't exist for streams of objects, the 1 needs to be mapped to an int to create an IntStream. To sum the actual elements here, use mapToInt(x->x).
As suggested in the comments, you can also do it like this.
long count = list.stream().filter(x->x > 10 && x < 15).count();
count() returns a long, so it would have to be downcast to an int if that is what you want.
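For completeness, that downcast variant looks like this:

int count = (int) list.stream().filter(x -> x > 10 && x < 15).count();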
You should always use AtomicInteger:
The performance impact is negligible. Technically, new int[1] is 'faster', but the two objects are the same size in the heap (or the array is actually larger, though that's unlikely; it depends on the architecture, and usually they end up the same size), and while the array spends no cycles on guaranteeing proper concurrency protections, there are really only two options: [A] the concurrency protections are required (because the lambda runs in another thread), in which case the int array is a non-starter that would result in hard-to-find bugs, quite horrible; or [B] they aren't required, and the HotSpot engine is likely to figure that out and eliminate the cost entirely. Even if it doesn't, the overhead of concurrency protection when there is no contention is low in any case.
It is more readable. Only slightly so, but new int[1] is weirder than new AtomicInteger(), I'd say. AtomicInteger at least suggests: I want a mutable int that I'm going to mess with from other contexts.
It is more convenient. System.out.println-ing an AtomicInteger prints the value; println-ing an array prints garbage (its type-and-identity-hash string).
The convenience methods in AtomicInteger might be relevant. Maybe compareAndSet is useful.
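A quick sketch of those last two points (compareAndSet, incrementAndGet, and the readable toString() are all real methods of java.util.concurrent.atomic.AtomicInteger):

AtomicInteger counter = new AtomicInteger(); // starts at 0
counter.incrementAndGet();                   // atomic ++, value is now 1
counter.compareAndSet(1, 10);                // set to 10 only if the value is currently 1
System.out.println(counter);                 // prints "10"
System.out.println(new int[] {10});          // prints something like "[I@1b6d3586"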
But why?
Lambdas are not transparent in the following 3 things:
Checked exceptions (you cannot throw a checked exception inside a lambda even if the context around your lambda catches it).
Mutable local vars (you cannot touch, let alone change, any variable declared outside of the lambda, unless it is (effectively) final).
Control flow. You can't use break, continue, or return from inside a lambda and have it act like it wasn't there: you can't break or continue a loop located outside of your lambda, and you can't return from the method outside of your lambda (you can only return from the lambda itself); a sketch of this follows the list.
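To illustrate that last point (firstOverTen is a hypothetical method; list is a java.util.List<Integer>):

static int firstOverTen(List<Integer> list) {
    for (int x : list) {
        if (x > 10) return x; // a plain loop can return from the enclosing method
    }
    return -1;
    // Inside list.forEach(x -> ...), a 'return' would only end that one
    // callback invocation; it could never exit firstOverTen itself.
}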
These are all very bad things when the lambda runs 'in context', but they are all very good things when the lambda doesn't run in context.
Here is an example:
new TreeSet<Integer>((a, b) -> a - b);
Here I have created a TreeSet (which is a set that keeps its elements sorted automatically). To make one, you pass in code that determines for any 2 elements which one is 'the higher one', and TreeSet takes care of everything else. That TreeSet can survive your method (just store it in a field or pass it to a method that ends up storing it in a field) and could even escape your thread (have another thread read that field). That means when that code (a - b in this code) is invoked, we could be 5 days from the creation of that TreeSet, in another thread, with the code that 'surrounds' your new TreeSet statement having loooong gone.
In this scenario, all those transparencies make no sense at all:
What does it mean to break back to a loop that has long since completed and the system doesn't even know what it is about anymore?
That catch block uses context that is long gone, such as local vars or the parameters. It can't survive, so if your a - b were to throw something that is checked, the fact that you've wrapped your new TreeSet<> in a try/catch block is meaningless.
What does it mean to access a variable that no longer exists? For that matter, if it still does exist but the lambda runs in a separate thread, do we now start making local vars volatile and declare them on heap instead of stack just in case?
Of course, if your lambda runs within context, as in, you pass the lambda to some method and that method 'uses it or loses it': Runs your lambda a certain amount of times and then forgets all about it, then those lacking transparencies are really annoying.
It's annoying that you can't do this:
public List<String> toLines(List<Path> files) throws IOException {
    return files.stream()
            .filter(x -> x.toString().endsWith(".txt"))
            .flatMap(x -> Files.readAllLines(x).stream()) // error: unhandled IOException
            .toList();
}
The only reason the above code fails to compile is that Files.readAllLines() throws IOException. We declared that we throw it onwards, but that won't work, because the lambda itself isn't allowed to throw it. You have to kludge up this code, make it worse, by somehow smuggling that exception out of the lambda or otherwise working around it (the right answer is NOT to use the stream API at all here; write it with a normal for loop, as sketched below!).
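That plain-loop version might look like this (a sketch using java.nio.file.Files and java.util.ArrayList, matching the signature above; the checked exception simply propagates, no kludge needed):

public List<String> toLines(List<Path> files) throws IOException {
    List<String> allLines = new ArrayList<>();
    for (Path file : files) {
        if (file.toString().endsWith(".txt")) {
            allLines.addAll(Files.readAllLines(file));
        }
    }
    return allLines;
}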
Whilst trying to dance around checked exceptions in lambdas is generally just not worth it, you CAN work around the problem of wanting to share a variable with outer context:
int sum = 0;
listOfInts.forEach(x -> sum += x);
The above doesn't work - sum is from the outer scope and thus must be effectively final, and it isn't. There's no particular reason it can't work, but Java won't let you. The right answer here is to use int sum = listOfInts.stream().mapToInt(Integer::intValue).sum(); instead, but you can't always find a terminal op that does just what you want. Sometimes you need to kludge around it.
That's where new int[1] and AtomicInteger come in. These are references, and the reference itself is final, so you CAN use them in the lambda. But the reference points at an object whose contents you can change at will; hence, you can use this 'trick' to 'share' a variable:
AtomicInteger sum = new AtomicInteger();
listOfInts.forEach(x -> sum.addAndGet(x));
That DOES work.

Java: Efficiency cost of new object to store field vs object access

I produced code
for (String buff : Arrays.asList(
        FORMULA_PATTERN.matcher(s).group().split("\n"))) {
    // ...
}
And I started wondering where performance lies when you don't create an explicit local container for method returns. (Also, I'm not sure that this works; I'm just prototyping some stuff while waiting for full specs.)
Related to Java coding style, local variables vs repeated method calls, but I'm curious about the actuality of it.
For the coding pattern (not the specific case, just examples):
Integer.parseInt(buff.substring(buff.length()-1));
Is there any gain/loss over
int x = buff.length() - 1;
Integer.parseInt(buff.substring(x));
Or, for the more general case of the language, avoiding primitives
Integer x = buff.length() - 1;
Integer.parseInt(buff.substring(x));
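(Roughly what the boxed variant costs, as a sketch: javac compiles the autoboxing to Integer.valueOf, which may allocate for values outside the -128..127 cache, and unboxes again with intValue when the int is needed.)

String buff = "value7";
int raw = buff.length() - 1;                           // primitive: no object created
Integer boxed = buff.length() - 1;                     // compiles to Integer.valueOf(...)
int parsed = Integer.parseInt(buff.substring(boxed));  // auto-unboxes via boxed.intValue()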
This may differ for collections, but my curiosity is about the cost of Java's pass-by-value requiring an object to be instantiated just to hold a returned value.
Not really concerned with readability, just wondering if, when, and how one method would outperform the other.

Overuse of Method-chaining in Java

I see a lot of this kind of code written by Java developers and Java instructors:
for (int x = 0; x < myArray.length; x++)
    accum += (mean() - myArray[x]) * (mean() - myArray[x]);
I am very critical of this because mean() is being invoked twice for every element in the array, when it only has to be invoked once:
double theMean = mean();
for (int x = 0; x < myArray.length; x++)
    accum += (theMean - myArray[x]) * (theMean - myArray[x]);
Is there something about optimization in Java that makes the first example acceptable? Should I stop riding developers about this?
More information: an array of samples is stored as an instance variable, and mean() has to traverse the array and calculate the mean every time it is invoked, presumably something like the sketch below.
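A hypothetical reconstruction of that mean() (each call walks the whole array, so calling it twice per iteration makes the variance loop above quadratic in the array length):

private double mean() {
    double sum = 0;
    for (double v : myArray) {
        sum += v;
    }
    return sum / myArray.length;
}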
You are right: your way (the second code sample) is more efficient. I don't think Java can optimize the first code sample to call mean() just once and reuse its return value, since mean() might have side effects, so the compiler can't decide to call it once when your code calls it twice.
Leave your developers alone, it's fine -- it's readable and it works, without introducing unnecessary names and variables.
Optimization should only ever be done under the guidance of a performance monitoring tool which can show you where you're actually slow. And, typically, performance is enhanced more effectively by considering the large scale architecture of an application, not line by line bytecode optimization, which is expensive and usually unhelpful.
Your version will likely run faster, though an optimizing compiler may be able to detect if the mean() method returns the same value every time (e.g. if the value is hard-coded or stored in a field) and eliminate the method call.
If you are recommending this change for efficiency reasons, you may be falling foul of premature optimization. You don't really know where the bottlenecks are in your system until you measure it in the appropriate environment under appropriate loads. Even then, improved hardware is often a more cost-effective solution than developer time.
If you are recommending it because it will eliminate duplication then I think you might be on stronger ground. If the mean() method took arguments too, it would be especially reasonable to pull that out of the loop and call the method once and only once.
Yes, some compilers will optimize this to just what you say.
Yes, you should stop riding developers about this.
I think your preferred way is better, but not mostly because of the optimization. It is more clear that the value is the same in both places if it does not involve a method call, particularly in cases where the method call is more complex than the one you have here.
For that matter, I think it's better to write
double theMean = mean();
for (int x = 0; x < myArray.length; x++) {
    double curValue = myArray[x];
    double toSquare = theMean - curValue;
    accum += toSquare * toSquare;
}
Because it makes it easier to determine that you are squaring whatever is being accumulated, and just what it is that's being squared.
Normally the compiler will not optimize away the method call, since it cannot know whether the return value would be the same (this is especially true when mean() processes an array, as the compiler has no way of checking whether the result can be cached). So yes, the mean() method would be invoked twice.
In this case, if you know for sure that the array stays the same regardless of the values of x and accum in the loop (more generally, regardless of any change in the program's state), then the second version is the more efficient one.

Java variable declaration efficiency

As I understand it, in the case of an array, Java checks the index against the size of the array.
So instead of using array[i] multiple times in a loop, it is better to declare a variable that stores the value of array[i] and use that variable multiple times.
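For instance, a small sketch of that hoisting pattern:

int[] array = {3, 1, 4, 1, 5};
int sum = 0;
for (int i = 0; i < array.length; i++) {
    int v = array[i];   // a single bounds-checked read
    sum += v * v + v;   // reuse the local instead of re-reading array[i]
}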
My question is, if I have a class like this:
public class MyClass {
    int value;

    public MyClass(int value) {
        this.value = value;
    }
}
If I create an instance of this class somewhere else (MyClass myobject = new MyClass(7)) and I have to use the object's value multiple times, is it okay to use myobject.value often, would it be better to declare a variable that stores that value and use it multiple times, or would it be the same?
In your case, it wouldn't make any difference, since referencing myobject.value is as fast and effective as referencing a new int variable.
Also, the JVM is usually able to optimize these kinds of things, and you shouldn't spend time worrying about it unless you have a highly performance critical piece of code. Just concentrate on writing clear, readable code.
The short answer is yes. (In fact, in the array case, the JVM not only has to check the index against the limit but also has to calculate the actual memory position of the element you are looking for: for i = 7, take the base address of the array and add 7 words.)
The long answer is that unless you are really using that value a lot (and I mean a lot) and you are really constrained for speed, it is not worth the added complexity in the code. Add to that that the extra local variable means your JVM uses more memory, may hit a cache fault, and so on.
In general, you should worry more about the efficiency of your algorithm (the O(n)) and less about these tiny things.
The Java compiler is no bozo. It will do that optimization for you. There is usually zero speed difference between all the options you give.
I say 'usually' because accessing the original object versus your local copy isn't always the same thing. If your array is globally visible and another thread is accessing it, the two forms will yield different results, and the compiler cannot optimize one into the other. It is also possible that something confuses the compiler into thinking there may be a problem even though there isn't; then it won't apply a legal optimization. See the sketch below.
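For example, with a volatile field the two forms really are different (a sketch; the Shared class is illustrative):

class Shared {
    volatile int value; // volatile: every read must go back to the field

    int readTwice() {
        return value + value; // two reads; another thread may write in between
    }

    int readOnce() {
        int v = value;        // one read into a private snapshot
        return v + v;         // both uses see the same value
    }
}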
However, if you aren't doing funny stuff, the compiler will see what you're doing and optimize variable access for you. Really, that's what a compiler does. That's what it's for.
You need to optimize at least one level above that. This one isn't for you.

How bad is declaring arrays inside a for loop in Java?

I come from a C background, so I admit that I'm still struggling with letting go of memory management when writing in Java. Here's one issue that's come up a few times that I would love to get some elaboration on. Here are two ways to write the same routine, the only difference being when double[] array is declared:
Code Sample 1:
double[] array;
for (int i = 0; i < n; ++i) {
    array = calculateSomethingAndReturnAnArray(i);
    if (someFunctionOnArrays(array)) {
        // DO ONE THING
    } else {
        // DO SOME OTHER THING
    }
}
Code Sample 2:
for (int i = 0; i < n; ++i) {
    double[] array = calculateSomethingAndReturnAnArray(i);
    if (someFunctionOnArrays(array)) {
        // DO ONE THING
    } else {
        // DO SOME OTHER THING
    }
}
Here, private double[] calculateSomethingAndReturnAnArray(int i) always returns an array of the same length. I have a strong aversion to Code Sample 2 because it creates a new array for each iteration when it could just overwrite the existing array. However, I think this might be one of those times when I should just sit back and let Java handle the situation for me.
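The C-style reuse alluded to here would require changing the callee to fill a caller-provided buffer, something like this sketch (calculateSomethingIntoArray and ARRAY_LENGTH are hypothetical):

double[] array = new double[ARRAY_LENGTH];
for (int i = 0; i < n; ++i) {
    calculateSomethingIntoArray(i, array); // overwrites 'array' in place
    if (someFunctionOnArrays(array)) {
        // DO ONE THING
    } else {
        // DO SOME OTHER THING
    }
}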
What are the reasons to prefer one of the ways over the other or are they truly identical in Java?
There's nothing special about arrays here, because you're not allocating storage for the array; you're just creating a new variable. It's equivalent to:
Object foo;
for (...) {
    foo = func(...);
}
In the case where you create the variable outside the loop, the variable (which holds the location of the thing it refers to) will only ever be allocated once; in the case where you create the variable inside the loop, it may in principle be reallocated in each iteration, but my guess is the compiler or the JIT will fix that in an optimization step.
I'd consider this a micro-optimization. If you're running into problems with this segment of your code, you should be making decisions based on measurements rather than on the specs alone; if you're not, you should do the semantically correct thing and declare the variable in the scope that makes sense.
See also this similar question about best practices.
A declaration of a local variable without an initializing expression will do NO work whatsoever. The work happens when the variable is initialized.
Thus, the following are identical with respect to semantics and performance:
double[] array;
for (int i = 0; i < n; ++i) {
    array = calculateSomethingAndReturnAnArray(i);
    // ...
}
and
for (int i = 0; i < n; ++i) {
    double[] array = calculateSomethingAndReturnAnArray(i);
    // ...
}
(You can't even quibble that the first case allows the array to be used after the loop ends. For that to be legal, array has to have a definite value after the loop, and it doesn't unless you add an initializer to the declaration; e.g. double[] array = null;)
To elaborate on @Mark Elliot's point about micro-optimization:
This is really an attempt to optimize rather than a real optimization, because (as I noted) it should have no effect.
Even if the Java compiler actually emitted some non-trivial executable code for double[] array;, the chances are that the time to execute would be insignificant compared with the total execution time of the loop body, and of the application as a whole. Hence, this is most likely to be a pointless optimization.
Even if this is a worthwhile optimization, you have to consider that you have optimized for a specific target platform; i.e. a particular combination of hardware and JVM version. Micro-optimizations like this may not be optimal on other platforms, and could in theory be anti-optimizations.
In summary, you are most likely wasting your time if you focus on things like this when writing Java code. If performance is a concern for your application, focus on the MACRO level performance; e.g. things like algorithmic complexity, good database / query design, patterns of network interactions, and so on.
Both create a new array for each iteration (the allocation happens inside calculateSomethingAndReturnAnArray either way). They have the same semantics.
