I'm reading a blog post and trying to understand what's going on.
This is the blog post.
It has this code:
if (validation().hasErrors())
throw new IllegalArgumentException(validation().errorMessage());
In the validation() method we have some object initialization and calculations, so let's say it's an expensive call. Is it going to be executed twice? Or will the compiler optimize it into something like this?
var validation = validation();
if (validation.hasErrors())
throw new IllegalArgumentException(validation.errorMessage());
Thanks!
The validation method will be called twice, and it will do the same work each time. First, the method is relatively big, and so it won't get inlined. Without being inlined, the compiler doesn't know what it does. Therefore, it safely assumes that the method has side effects, and so it cannot optimize away the second call.
Even if the method were inlined and the compiler could examine it, it would see that there are in fact side effects: LocalDate.now() can return a different result on each call (for example, when the two calls straddle midnight). For this reason, the code that you linked to is defective, although it's not likely to experience a problem in practice.
It's safer to capture the validation result in a local variable not for performance reasons, but for stability reasons. Imagine the odd case in which the initial validation call fails, but the second call passes. You'd then throw an exception with no message in it.
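A minimal sketch of the capture-once pattern. The `ValidationResult` type and the counting `validation()` method are hypothetical stand-ins, not the blog post's code. The inline form runs `validation()` twice and the local-variable form runs it once. With a real `validation()` whose result can change between calls (as with `LocalDate.now()`), the inline form could also check one result and report the message of another.

```java
import java.util.List;

final class ValidationDemo {
    // Hypothetical result type standing in for whatever validation() returns.
    static final class ValidationResult {
        private final List<String> errors;

        ValidationResult(List<String> errors) { this.errors = errors; }

        boolean hasErrors() { return !errors.isEmpty(); }

        String errorMessage() { return String.join("; ", errors); }
    }

    static int calls = 0; // how many times validation() has run

    // Hypothetical stand-in for the expensive validation(); it always reports one error.
    static ValidationResult validation() {
        calls++;
        return new ValidationResult(List.of("date is in the past"));
    }

    // Inline form: validation() runs once for the check and again for the message.
    static void checkInline() {
        if (validation().hasErrors())
            throw new IllegalArgumentException(validation().errorMessage());
    }

    // Capture-once form: one call, and the check and the message see the same result.
    static void checkCaptured() {
        var result = validation();
        if (result.hasErrors())
            throw new IllegalArgumentException(result.errorMessage());
    }

    // Runs a check, swallows the expected exception, and returns the number of calls.
    static int callsFor(Runnable check) {
        calls = 0;
        try {
            check.run();
        } catch (IllegalArgumentException expected) {
            // expected: the hypothetical validation always fails
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("inline:   " + callsFor(ValidationDemo::checkInline));   // inline:   2
        System.out.println("captured: " + callsFor(ValidationDemo::checkCaptured)); // captured: 1
    }
}
```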
The Java-to-bytecode compiler (javac) has a limited set of optimization techniques (e.g. constant folding: 9*9 in the condition would turn into 81).
The real optimization happens in the JIT (just-in-time) compiler. This compiler is the result of over a decade and a half of extensive research, and there is no simple answer to what it is capable of in every scenario.
With that being said, as a good practice, I always store the result of repeated identical calls before entering any loop that needs that result. Example:
int[] grades = new int[500];
int countOfGrades = grades.length;
for (int i = 0; i < countOfGrades; i++) {
// Some code here
}
For your code (where the value is needed only twice), you shouldn't worry much about such optimization. But if you're looking for the ultimate, guaranteed optimization at the cost of a fraction of space (which is cheap), then you're better off storing any identical method result in a variable when it's needed more than once:
var validation = validation();
if (validation.hasErrors())
throw new IllegalArgumentException(validation.errorMessage());
However, I must question whether, "these days," it even matters anymore. Simply write the source code in the most obvious manner available, as the original programmer certainly did.
Microseconds really don't matter anymore, but clarity still does. To me, the first version of the code is frankly more understandable than the second, and that's what matters most to me. Please don't try to outsmart the compiler if it results in source code that is in any way harder to understand.
Related
Given the following code snippet, in which an ad hoc instance of a Set is created, will today's Java compiler see that it does not need to create the instance on each loop pass and optimize set as a kind of final constant, so that the instance is the same for the entire loop? The question applies similarly to while and do while loops.
for (/* a loop that iterates quite often */)
{
var set = Set.of("foo", "blah", "baz");
// Do something with the set.
}
I would also be interested whether such an optimization is done already at compile-time (meaning that the byte code is already optimized) or whether there exist runtime optimizations (made by the just-in-time compiler) that essentially achieve the same.
If the compiler does not see this, the only option seems to be to instantiate the set outside the loop in order to be optimal:
var set = Set.of("foo", "blah", "baz");
for (/* a loop that iterates quite often */)
{
// Do something with the set.
}
Side note, for those who find the first snippet bad practice and would write it like the second snippet anyway: this snippet is a simplified variant of my real use case, made to keep the question simple. The real, slightly more complex use case that brought me to this question is the following, where a string needs to be checked for whether it is one of a very few strings:
for (/* a loop that iterates quite often */)
{
// ...
var testString = // some String to check ...
if (Set.of("foo", "blah", "baz").contains(testString))
{
// Do something.
}
// ...
}
Assuming that the if condition is surrounded by additional code (i.e., the loop body is rather large), I think one wants to declare the set inline rather than further away outside the loop.
The answer will inevitably depend on the Java tools that you use.
The javac bytecode compiler is (deliberately) naive, and I would not expect it to do any optimization of this. The JVM's JIT compiler may optimize this, but that will depend on:
your Java version,
the JIT compiler tier used, and
what /* do something with the set */ actually entails.
If this really matters to you, I would advise a couple of approaches.
Rewrite the code yourself to hoist the Set declaration out of the loop. IMO, there is nothing fundamentally wrong with this approach, though it is potentially a premature optimization. (I'd probably do this myself in the examples you gave. It just "feels right" to me.)
Use the JVM options for dumping out the native code produced by the JIT compiler and see what it actually does. But beware that the results are liable to vary: see above.
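A sketch of the hoisting approach, assuming the strings are fixed at compile time. The set becomes a `private static final` constant, created once when the class is initialized. The `contains` check can stay inline in a large loop body, and a descriptive constant name keeps it readable. The class and constant names are hypothetical.

```java
import java.util.List;
import java.util.Set;

final class KeywordFilter {
    // Created once, when KeywordFilter is initialized, instead of once per iteration.
    private static final Set<String> KEYWORDS = Set.of("foo", "blah", "baz");

    // Counts how many of the given strings are one of the keywords.
    static int countKeywords(List<String> inputs) {
        int count = 0;
        for (String testString : inputs) {
            // ... other loop body code ...
            if (KEYWORDS.contains(testString)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countKeywords(List.of("foo", "bar", "baz", "qux"))); // 2
    }
}
```

`Set.of` returns an unmodifiable set, so sharing one instance across iterations is safe. One caveat also applies to the question's inline version: in OpenJDK, `contains(null)` on a `Set.of` set throws `NullPointerException`, so null-check `testString` first if it can be null.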
But even if you know with a good degree of certainty how the compilers will do, it is not clear that you should be worrying about this in general.
Note that looking at the bytecodes with javap -c will tell you whether the Java compiler is doing any optimization. (Or, more likely, confirm that it isn't.)
I'm attempting to understand what's happening in this bit of Java code, as its owners are no longer around, and possibly to fix or simplify it. I'm guessing these blocks had a lot more in them at some point and what's left was not cleaned up properly.
It seems all occurrences of orElse(false) don't set anything to false and can be removed.
Then the second removeDiscontinued method returns a boolean that I don't think is used anywhere. Is it just me, or is this written in a way that makes it hard to read?
I'm hesitant to remove anything from it, since I haven't used much of the syntax like orElse, Lazy and Optional. Some help would be much appreciated.
private void removeDiscontinued(Optional<Map<String, JSONArrayCache>> dptCache, Lazy<Set<String>> availableTps) {
dptCache.map(pubDpt -> removeDiscontinued(pubDpt.keySet(), availableTps)).orElse(false);
}
private boolean removeDiscontinued(Set<String> idList, Lazy<Set<String>> availableTps) {
if (availableTps.get().size() > 0) {
Optional.ofNullable(idList).map(trIds -> trIds.removeIf(id -> !availableTps.get().contains(id)))
.orElse(false);
}
return true;
}
This code is indeed extremely silly. I know why: there's a somewhat common, extremely misguided movement around. This movement makes claims that are generally interpreted as 'write it functionally and then it is just better'.
That interpretation is obvious horse exhaust. It's just not true.
We can hold a debate on who is to blame for this - is it folks hearing the arguments / reading the blogposts and drawing the wrong conclusions, or is it the 'functional fanfolks' fanning the flames, so to speak, making ridiculous claims that do not hold up?
Point is: this code uses functional style where it is utterly inappropriate to do so, and it has turned into a right mess as a result. The code is definitely bad; the author of this code is not a great programmer, but perhaps most of the blame goes to the functional evangelists. At any rate, it's very difficult to read; no wonder you're having a hard time figuring out what this stuff does.
The fundamental issue
The fundamental issue is that this functional style strongly likes being a side-effect free process: You start with some data, then the functional pipeline (a chain of stream map, orElse, etc operations) produces some new result, and then you do something with that. Nothing within the pipeline should be changing anything, it's just all in service of calculating new things.
Both of your methods fail to do so properly - the return value of the 'pipeline' is ignored in both of them, it's all about the side effects.
You don't want this: The primary point of the pipelines is that they can skip steps, and will aggressively do so if they think they can, and the pipeline assumes no side-effects, so it makes wrong calls.
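That orElse doesn't do anything useful either. It looks as if it is there to force the pipeline to run, but Optional.map is evaluated eagerly, so the removeIf call happens with or without it; the orElse merely unwraps a boolean that is then thrown away. Relying on side effects inside map is still fragile: the same habit applied to a Stream pipeline really can silently skip the work, because streams are lazy and their terminal operations are allowed to elide steps.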
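A small sketch of that difference, written for this explanation. `Optional.map` applies its function immediately when a value is present, and no terminal operation is needed. A `Stream` is lazy, and the `java.util.stream` specification allows `count()` not to execute the pipeline when the size is known from the source. On OpenJDK 9 and later the stream's mapping function typically does not run at all here, but that is an implementation choice. That uncertainty is exactly why side effects do not belong in these pipelines.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

final class SideEffectDemo {
    // Returns how often Optional.map's function ran; note there is no orElse anywhere.
    static int optionalMapCalls() {
        AtomicInteger calls = new AtomicInteger();
        Optional.of("x").map(s -> calls.incrementAndGet()); // result deliberately ignored
        return calls.get();
    }

    // Counts the elements of a mapped stream, recording how often map's function ran.
    static long countMapped(List<String> items, AtomicInteger mapCalls) {
        return items.stream()
                .map(s -> mapCalls.incrementAndGet()) // count() is allowed to skip this
                .count();
    }

    public static void main(String[] args) {
        System.out.println("Optional.map calls: " + optionalMapCalls()); // Optional.map calls: 1
        AtomicInteger mapCalls = new AtomicInteger();
        long n = countMapped(List.of("a", "b", "c"), mapCalls);
        // n is always 3; mapCalls.get() is 0 or 3 depending on the JDK.
        System.out.println("count = " + n + ", Stream.map calls = " + mapCalls.get());
    }
}
```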
The first method also takes an Optional as an argument type, which is completely wrong. Optional is okay as a return value for a functional pipeline (such as Stream's own max() and similar methods). It's debatable as a return value anywhere else. In a field declaration or as a method argument it's flat out silly: a style error so bad that you should configure your linter to aggressively flag it as unsuitable for production code.
So get rid of that too.
Let's break down what these methods do
Both of them call map on an Optional. An Optional is either empty (call it 'NONE'), which is like null (as in, there is no value), or present (a 'SOME'), which means there is exactly one value.
In these specific methods, the map operation more or less boils down to:
If the optional is NONE, do nothing, silently. Otherwise, perform the operation in the parens.
Thus, to get rid of the Optional in the argument of your first method, just remove it, and then update the calling code so that it decides what to do when there is no value, instead of leaving that decision to this pair of methods. (They decided: if passed an empty Optional, silently do nothing. "Silently do nothing" is an extremely stupid default behaviour, which is a large part of why Optional is not great.) Clearly the caller has an Optional from somewhere: either it made one (e.g. with Optional.ofNullable, in which case undo that too), or it got one from elsewhere, for example because it does a stream operation that returned an Optional. In that case, replace:
Optional<Map<String, JSONArrayCache>> optional = ...;
removeDiscontinued(optional, availableTps);
with:
optional.ifPresent(v -> removeDiscontinued(v, availableTps));
or perhaps simply:
if (optional.isPresent()) {
removeDiscontinued(optional.get(), availableTps);
} else {
// code to run otherwise
}
If you don't see how it could be absent, great! Optional is significantly worse than NullPointerException in many cases, and so it is here as well: you do NOT want your code to silently do nothing when some value is absent in a place where the programmer of said code wasn't aware of that possibility. An exception is vastly superior: you then know there is a problem, and the exception tells you where. Contrast that with the 'silently do nothing' approach, where it's much harder to tell that something is off, and once you realize something is off, you have no idea where to look. It takes literally hundreds of times longer to find the problem.
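Thus, if absence should be impossible, just go with:
removeDiscontinued(optional.get(), availableTps);
which will throw a NoSuchElementException if the unexpected happens, which is good. (Since Java 10, optional.orElseThrow() does the same thing under a more honest name.)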
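A sketch of both caller-side options, assuming Java 10+ for the no-argument `orElseThrow()`. The map's value type is simplified to `String`, and `process` is a hypothetical stand-in for the rewritten `removeDiscontinued(Map, ...)`. Option 1 handles absence explicitly. Option 2 treats absence as a bug and fails at the exact spot.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Optional;

final class CallerDemo {
    // Hypothetical stand-in for the rewritten removeDiscontinued(Map, ...): mutates the map.
    static void process(Map<String, String> cache) {
        cache.clear();
    }

    // Option 1: absence is a legitimate case, so the caller handles it explicitly.
    static String handleBoth(Optional<Map<String, String>> optional) {
        if (optional.isPresent()) {
            process(optional.get());
            return "processed";
        } else {
            return "no cache"; // code to run otherwise
        }
    }

    // Option 2: absence is a bug, so fail loudly right here.
    // get() and orElseThrow() both throw NoSuchElementException on an empty Optional.
    static void requirePresent(Optional<Map<String, String>> optional) {
        process(optional.orElseThrow());
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>(Map.of("k", "v"));
        System.out.println(handleBoth(Optional.of(cache)));  // processed
        System.out.println(handleBoth(Optional.empty()));    // no cache
        try {
            requirePresent(Optional.empty());
        } catch (NoSuchElementException e) {
            System.out.println("caught NoSuchElementException");
        }
    }
}
```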
The methods themselves
Get rid of those pipelines, functional is not the right approach here, as you're only interested in the side effects:
private void removeDiscontinued(Map<String, JSONArrayCache> dptCache, Lazy<Set<String>> availableTps) {
Set<String> keys = dptCache.keySet(); // live view: removing a key removes its map entry
if (availableTps.get().size() > 0) {
keys.removeIf(id -> !availableTps.get().contains(id)); // drop ids that are no longer available
}
}
That's it - that's all you need, that's what that code does in a very weird, sloppy, borderline broken way.
Specifically:
That boolean return value is just a red herring: the author needed that code to return something so that they could use it in their map operation. The value is completely meaningless. If a style guide that promises "Your code will be better if you write it using this style" ends up producing extremely confusing, pointless values that nobody uses, I think you should get rid of the style guide.
The ofNullable wrap is pointless: that method is private and its only caller cannot possibly pass null there, unless dptCache is some bizarre, broken implementation of the Map interface whose keySet() returns null. If that's happening, definitely fix the problem at the source rather than working around it in your codebase; no sane Java reader would expect keySet() to return null. That ofNullable just makes this stuff hard to read; it doesn't do anything here.
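Note that the if (availableTps.get().size() > 0) check is not just an optimization. The original code removes every id that is not in availableTps, so an empty availableTps would make that test true for every key, and without the check the whole map would be emptied. The original code skips the removal in that case, so keep the check unless emptying the map is what you want.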
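A runnable sketch of the simplified method, under two assumptions. The code base's `Lazy<Set<String>>` type isn't shown, so `java.util.function.Supplier<Set<String>>` stands in for it (both are used only through `get()`). `String` stands in for `JSONArrayCache`. Removing ids through `keySet()` removes the matching entries from the map, which is the side effect the original code relied on. The second call shows the guard keeping every entry when the availability set is empty.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Supplier;

final class DiscontinuedDemo {
    // Supplier stands in for the code base's Lazy type (assumption: only get() is used).
    // String stands in for JSONArrayCache.
    static void removeDiscontinued(Map<String, String> dptCache, Supplier<Set<String>> availableTps) {
        Set<String> available = availableTps.get();
        if (available.size() > 0) {
            // keySet() is a live view: removing an id here removes its entry from dptCache.
            dptCache.keySet().removeIf(id -> !available.contains(id));
        }
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>(Map.of("a", "1", "b", "2", "c", "3"));

        removeDiscontinued(cache, () -> Set.of("a", "c"));
        System.out.println(new TreeSet<>(cache.keySet())); // [a, c]

        // With an empty set, the guard keeps every entry instead of removing them all.
        removeDiscontinued(cache, () -> Set.of());
        System.out.println(new TreeSet<>(cache.keySet())); // [a, c]
    }
}
```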
Are there any performance or memory differences between the two snippets below? I tried to profile them using VisualVM (is that even the right tool for the job?) but didn't notice a difference, probably because the code doesn't really do anything.
Does the compiler optimize both snippets down to the same bytecode? Is one preferable over the other for style reasons?
boolean valid = loadConfig();
if (valid) {
// OK
} else {
// Problem
}
versus
if (loadConfig()) {
// OK
} else {
// Problem
}
The real answer here: it doesn't even matter much what javap tells you about how the corresponding bytecode looks!
If that piece of code is executed like "once", then the difference between those two options would be in the range of nanoseconds (if at all).
If that piece of code is executed like "zillions of times" (often enough to "matter"), then the JIT will kick in. And the JIT will optimize that bytecode into machine code, very much dependent on a lot of information gathered at runtime.
Long story short: you are spending time on a detail so subtle that it doesn't matter in practical reality.
What matters in practical reality is the quality of your source code. In that sense, pick the option that "reads" best in your context.
Given the comment: I think in the end this is (almost) a pure style question. With the first way, it might be easier to trace information (assuming the variable isn't a boolean but something more complex). In that sense, there is no "inherently" better version. Of course, option 2 is one line shorter and uses one variable less; and typically, when one option is as readable as another and one of the two is shorter, I would prefer the shorter version.
If you are going to use the variable only once, the JIT compiler will effectively optimize the explicit declaration away (javac itself still emits a store and a load for the local variable).
Another thing is code quality. There is a very similar rule in SonarQube that describes this case too:
Local Variables should not be declared and then immediately returned or thrown
Declaring a variable only to immediately return or throw it is a bad practice.
Some developers argue that the practice improves code readability, because it enables them to explicitly name what is being returned. However, this variable is an internal implementation detail that is not exposed to the callers of the method. The method name should be sufficient for callers to know exactly what will be returned.
https://jira.sonarsource.com/browse/RSPEC-1488
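To make the quoted rule concrete, here is a hypothetical pair of methods (the names are invented for this illustration and are not taken from the rule's page). The rule flags the first and prefers the second. Both behave identically, so the difference is purely one of style, just as with the `valid` variable in the question.

```java
final class ReturnDemo {
    // Flagged by the quoted rule: the local exists only to be returned immediately.
    static long toSecondsNoncompliant(long hours, long minutes) {
        long duration = hours * 3600 + minutes * 60;
        return duration;
    }

    // Preferred by the rule: return the expression directly; the method name says what it is.
    static long toSeconds(long hours, long minutes) {
        return hours * 3600 + minutes * 60;
    }

    public static void main(String[] args) {
        System.out.println(toSecondsNoncompliant(1, 30)); // 5400
        System.out.println(toSeconds(1, 30));             // 5400
    }
}
```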
I would like to know which approach is better. I am writing a for loop, and in the condition part I am using str.length(). I wonder whether this is a good idea. I could also assign the value to an int variable and use that in the loop.
Which one is the suitable/better way?
If you use str.length() more than once or twice in the code, it's logical to extract it to a local var simply for brevity's sake. As for performance, it will most probably be exactly the same because the JIT compiler will inline that call, so the native code will be as if you have used a local variable.
There is no distinct downside to calling a function in the loop condition expression in the sense that "you really should never do it". You want to watch out when calling functions that have side effects, but even that can be acceptable in some circumstances.
There are three major reasons for moving function calls out of the loop (including the loop condition expressions):
Performance. The function may (depending on the JIT compiler) get called for every iteration of the loop, which costs you execution time. Particularly if the function's code has a higher order of complexity than O(1) after the first execution, this will increase the execution time. By how much depends entirely on exactly what the function in question does and how it is implemented.
Side effects. If the function has any side effects, those may (will) be executed repeatedly. This might be exactly what you want, but you need to be aware of it. A side effect is basically something that is observable outside of the function that is being called; for example, disk or network I/O are often considered to be side effects. A function that simply performs calculations on already available data is generally a pure function.
Code clarity. Admittedly str.length() isn't very long, but if you have a complex calculation based around a function call in the loop conditional, code clarity can very easily suffer. For this reason it may be advantageous to move the loop termination condition calculation out of the loop condition expression itself. Beware of awakening the sleeping beast, however; make very sure that the refactored code actually is more readable.
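A sketch that makes the first two points visible, using a hypothetical `limit()` method that counts its own calls. A method in the `for` condition runs once before every iteration plus once more for the final, failing check. That makes n+1 calls for n iterations. Because the counter is an observable side effect, no compiler may skip those calls. Hoisting the call into a local runs it exactly once.

```java
final class ConditionCallDemo {
    static int calls = 0; // how many times limit() has run

    // Hypothetical loop bound with an observable side effect: it counts its own calls.
    static int limit() {
        calls++;
        return 5;
    }

    // limit() in the condition: evaluated before every iteration plus the final failing check.
    static int callsInCondition() {
        calls = 0;
        for (int i = 0; i < limit(); i++) {
            // loop body
        }
        return calls;
    }

    // limit() hoisted into a local: evaluated exactly once.
    static int callsWhenHoisted() {
        calls = 0;
        int n = limit();
        for (int i = 0; i < n; i++) {
            // loop body
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(callsInCondition()); // 6
        System.out.println(callsWhenHoisted()); // 1
    }
}
```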
For str.length() it doesn't really matter unless you are really after the last bit of performance you can get, particularly since, as has been pointed out by other answerers, String#length() is an O(1) operation. In the general case, if you need the additional performance, consider introducing a variable to hold the result of the function call and comparing against that rather than making the function call repeatedly.
Personally, I'd consider code clarity before worrying about micro-optimizations like exactly where to place a specific function call. But if you have everything else down and still need to squeeze a little bit more performance out of the code, moving the function call out of the condition expression and using a local variable (preferably of a primitive type) is something worth considering. Chances are, though, that if you are worried about that, you'll see bigger gains by considering a different algorithm. (Do you really need to iterate over the string the way you are doing? Is there no other way to do what you are after?)
It usually doesn't matter. Use whichever makes your code clearer.
If a value is going to be used more than once, then there are two advantages to assigning it to a local variable:
You can give the variable a good name, which makes your code easier to read and understand
You can sometimes avoid a small amount of overhead by calling the method only once. This helps performance (although the difference is often too small to be noticeable - if in doubt you should benchmark)
Note: This advice only applies to pure functions. You need to be much more careful if the function has side effects, or might return a different value each time (like Math.random()) - in these cases you need to think much more carefully about the effect of multiple function calls.
Calling length() costs O(1), since the length is stored in a field. It's a constant-time operation; don't waste your time thinking about the complexity and performance of this thing.
There is no difference at all between the two.
But suppose str changes: with a hard-coded bound, you would need to update the for loop manually.
For example:
String str = "hi";
In the for loop you can write it this way:
for (int i = 0; i < str.length(); i++)
{
}
or
for (int i = 0; i < 2; i++)
{
}
Now suppose you change str to String str = "hi1";
Then the hard-coded loop has to be edited as well:
for (int i = 0; i < 3; i++)
{
}
So I would suggest you go for str.length().
If you use str.length() in the loop condition, it is evaluated on every iteration. It is better to assign the value to a variable and use that in the for loop.
for (int i = 0; i < str.length(); i++) { // str.length() evaluated on every iteration
}
int k = str.length(); // evaluated only once
for (int i = 0; i < k; i++) {
}
If you are concerned about performance, you may use the second approach.
If you are using str.length() in the code more than once, assign it to a local variable and use that. Otherwise you can use str.length() itself.
Reason
When we call a method, the current position (the return address) is saved on the call stack, and control jumps to the called method, which performs its operations.
Then control comes back, the saved position is retrieved from the stack, and normal execution continues.
That is what actually happens (unless the JIT compiler inlines the call), so calling the method many times in a program repeats this work each time.
Therefore, we create a local variable, assign the value to it, and use it wherever needed in the program.
Would there be any performance differences between these two chunks?
public void doSomething(Supertype input)
{
Subtype foo = (Subtype)input;
foo.methodA();
foo.methodB();
}
vs.
public void doSomething(Supertype input)
{
((Subtype)input).methodA();
((Subtype)input).methodB();
}
Any other considerations or recommendations between these two?
Well, the compiled code probably includes the cast twice in the second case - so in theory it's doing the same work twice. However, it's very possible that a smart JIT will work out that you're doing the same cast on the same value, so it can cache the result. But it is having to do work at least once - after all, it needs to make a decision as to whether to allow the cast to succeed, or throw an exception.
As ever, you should test and profile your code if you care about the performance - but I'd personally use the first form anyway, just because it looks more readable to me.
Yes. Checks must be done with each cast along with the actual mechanism of casting, so casting multiple times will cost more than casting once. However, that's the type of thing that the compiler would likely optimize away. It can clearly see that input hasn't changed its type since the last cast and should be able to avoid multiple casts - or at least avoid some of the casting checks.
In any case, if you're really that worried about efficiency, I'd wonder whether Java is the language that you should be using.
Personally, I'd say to use the first one. Not only is it more readable, but it makes it easier to change the type later. You'll only have to change it in one place instead of every time that you call a function on that variable.
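A sketch of the cast-once form. `Supertype` and `Subtype` follow the question, and `Other` is a hypothetical sibling class. Here `methodA`/`methodB` return strings instead of `void` so the result can be checked. The single cast into a typed local is where the runtime type check happens, and it is the one place to edit if the type changes. A non-`Subtype` argument fails right at that cast with `ClassCastException`.

```java
class Supertype { }

class Subtype extends Supertype {
    String methodA() { return "A"; }
    String methodB() { return "B"; }
}

// Hypothetical sibling type, used to show the failing cast.
class Other extends Supertype { }

final class CastDemo {
    // One cast into a typed local: one runtime type check, one place to change the type.
    static String doSomething(Supertype input) {
        Subtype foo = (Subtype) input; // throws ClassCastException if input is not a Subtype
        return foo.methodA() + foo.methodB();
    }

    public static void main(String[] args) {
        System.out.println(doSomething(new Subtype())); // AB
        try {
            doSomething(new Other());
        } catch (ClassCastException e) {
            System.out.println("not a Subtype");
        }
    }
}
```

On Java 16 and later, `if (input instanceof Subtype foo) { ... }` combines the type test and the typed local in one step, which avoids the exception when a non-`Subtype` argument is a legitimate case.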
I agree with Jon's comment, do it once, but for what it's worth in the general question of "is casting expensive", from what I remember: Java 1.4 improved this noticeably with Java 5 making casts extremely inexpensive. Unless you are writing a game engine, I don't know if it's something to fret about anymore. I'd worry more about auto-boxing/unboxing and hidden object creation instead.
According to this article, there is a cost associated with casting.
Please note that the article is from 1999 and it is up to the reader to decide if the information is still trustworthy!
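In the first case:
Subtype foo = (Subtype)input;
the cast is checked once at runtime (javac emits a single checkcast instruction), and the typed local is then reused for both calls.
In the second case:
((Subtype)input).methodA();
the cast is written, and checked, for each call. The compiler cannot know at compile time that input is a Subtype, so for every cast the JVM has to check whether it can be converted to a reference of Subtype and, if not, throw a ClassCastException. So there will be some extra cost, although the JIT may eliminate the redundant check.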