I've always thought that using an "if" is way better (in performance terms) than catching an exception. For example, doing this:
User u = Users.getUser("Michael Jordan");
if (u != null)
    System.out.println(u.getAge());
vs.
User u = Users.getUser("Michael Jordan");
try {
    System.out.println(u.getAge());
} catch (Exception e) {
    // Do something with the exception
}
If we benchmark this, it's quite obvious that the first snippet is faster than the second one. That's what I've always thought.
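A crude, self-contained sketch of the kind of benchmark I mean (using a String array with occasional nulls as a stand-in, since the Users/User classes aren't shown here; the numbers are only indicative, since a naive loop like this doesn't control for JIT warm-up):

public class IfVsCatch {
    public static void main(String[] args) {
        // 1% of the entries are null, simulating the occasional missing user
        String[] values = new String[1_000_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = (i % 100 == 0) ? null : "Michael Jordan";
        }

        long start = System.nanoTime();
        int sum1 = 0;
        for (String v : values) {
            if (v != null) {          // check first
                sum1 += v.length();
            }
        }
        long ifNanos = System.nanoTime() - start;

        start = System.nanoTime();
        int sum2 = 0;
        for (String v : values) {
            try {
                sum2 += v.length();   // dereference and let it throw
            } catch (NullPointerException e) {
                // swallow, like the catch block in the example above
            }
        }
        long catchNanos = System.nanoTime() - start;

        System.out.println("if: " + ifNanos + " ns, try/catch: " + catchNanos
                + " ns (" + sum1 + ", " + sum2 + ")");
    }
}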
But, yesterday, a guy told me something like this:
Have you ever considered what happens over thousands of executions of your program? Every time your execution goes through your "if" you pay a small (really small, but still something) performance cost. With exceptions that doesn't happen, because the exceptional case might never arise.
To make it clear: thousands of executions of an "if" vs. one exception catch.
I think it makes sense, but I don't have any proof.
Can you help me?
Thanks!!
No. Not once the JIT kicks in. Tell him to read up on trace caches.
A dynamic trace ("trace path") contains only instructions whose results are actually used, and eliminates instructions following taken branches (since they are not executed); a dynamic trace can be a concatenation of multiple basic blocks. This allows the instruction fetch unit of a processor to fetch several basic blocks, without having to worry about branches in the execution flow.
Basically, if the same branches are taken every time through a loop body, then that body ends up as a single trace in the trace cache. The processor will not incur the cost of fetching the extra branch instructions (unless that pushes the basic block over the limit of what can be stored in the trace cache, unlikely) and does not have to wait for the result of the test before starting to execute following instructions.
Do not ever, EVER, sacrifice your code quality for performance until you've proven beyond a reasonable doubt that it's actually called for. I highly doubt that the performance of an if() statement will ever become the bottleneck of your program. If it does, you should rewrite it in C. In Java land, 99% of the time the bottleneck is I/O - disk and/or network.
Except for maybe machine exceptions, any exception you've caught has been preceded by some kind of if conditional. Even a NullPointerException was thrown following an if (something == null) down in the JVM. Don't worry about the performance of an if statement. Don't worry about the performance of try/catch either, since presumably an error should occur much less frequently than successful execution. I don't think you need to optimize how fast your errors are thrown. :-)
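To illustrate (a minimal sketch, not actual JVM internals): the implicit check the JVM performs before a dereference does essentially the same work as the explicit one you would write yourself:

static int length(String s) {
    // The JVM implicitly tests the reference before dereferencing it;
    // if s is null, a NullPointerException is raised.
    return s.length();
}

static int lengthExplicit(String s) {
    // The same check, surfaced explicitly in source code.
    if (s == null) {
        throw new NullPointerException();
    }
    return s.length();
}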
You have no proof because you have no case. It is certainly an interesting abstract question: if you had a piece of code that was a bottleneck, and you could eliminate an if because one side of the condition was very rare, then it might mean something. But in the abstract, doing that with exceptions makes code harder to read and maintain, so it should not be done without a real-world problem in front of you. In the real world, the exception might be thrown 50% of the time. You don't know until you have a real-world scenario.
Related
When I was learning C, I was taught that if I wanted to loop through something strlen(string) times, I should save that value in an 'auxiliary' variable, say count, and put that in the for condition instead of the strlen call, to save recomputing it that many times.
Now that I've started learning Java, I've noticed this is not quite the norm. I've seen lots of code and programmers doing exactly what I was told not to do in C.
What's the reason for this? Are they trading efficiency for readability? Or does the compiler manage to fix that?
Edit: This is NOT a duplicate of the linked question. I'm not simply asking about string length; mine is a more general question.
In the old days, every function call was expensive, compilers were dumb, usable profilers were yet to come, and computers were slow. That is how C macros and other terrible things were born. Java is not that old.
Efficiency is important, but the impact of most parts of a program on efficiency is very small. Reading code, however, always costs programmer time, which is much more expensive than CPU time. So we'd better optimize for readability most of the time and care about speed only in the most important places.
A local variable can make the code simpler when it avoids repetition of a complicated expression - this happens sometimes. It can make the code faster when it avoids an expensive computation that the compiler can't eliminate - this happens rather rarely. When neither condition is met, it's just a wasted line, so why bother?
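For example (a hypothetical sketch), the C habit from the question translates to Java like this; when the JIT can prove the list's size doesn't change inside the loop, it can usually hoist the call itself:

import java.util.List;

class LoopStyle {
    // Calls size() in the loop condition; idiomatic Java, and usually
    // just as fast once the JIT inlines and hoists the call.
    static int sum(List<Integer> xs) {
        int total = 0;
        for (int i = 0; i < xs.size(); i++) {
            total += xs.get(i);
        }
        return total;
    }

    // Manually hoisted, C-style. Rarely faster in practice, but it does
    // make the "size doesn't change during the loop" assumption explicit.
    static int sumHoisted(List<Integer> xs) {
        int total = 0;
        for (int i = 0, n = xs.size(); i < n; i++) {
            total += xs.get(i);
        }
        return total;
    }
}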
Which is faster:
if (this.foo != 1234)
    this.foo = 1234;
or
this.foo = 1234;
Is the penalty of a write high enough that one should check the value before writing, or is it faster to just write?
Wouldn't having a branch cause possible mispredictions and mess up the CPU pipeline? And what if the field is volatile, with writes having a higher cost than reads?
Yes, it is easy to say that in isolation these operations are 'free', or to benchmark them, but that is not an answer.
There is a nice example illustrating this dilemma in the very recent talk by Sergey Kuksenko about hardware counters (slides 45-49), where the right answer for the same code depends on the data size! The idea is that the "compare and set" approach causes more branch misses and loads, but fewer stores and L1 store misses. The difference is subtle, and I can't even rationalize why one factor outweighs the other on small data sizes but becomes less significant on large ones.
So, measure, don't guess.
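A sketch of how such a measurement might look with JMH (the class name and parameter values here are made up; note that after the first invocation the array already holds the target value, so the check-then-write variant is mostly reads, which is exactly the regime where the trade-off shows up):

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class WriteVsCheckBenchmark {

    @Param({"1024", "1048576"})   // small vs. large data size
    int size;

    int[] data;

    @Setup
    public void setup() {
        data = new int[size];
    }

    @Benchmark
    public void alwaysWrite() {
        for (int i = 0; i < data.length; i++) {
            data[i] = 1234;        // blind store
        }
    }

    @Benchmark
    public void checkThenWrite() {
        for (int i = 0; i < data.length; i++) {
            if (data[i] != 1234) { // read + compare, store only if needed
                data[i] = 1234;
            }
        }
    }
}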
Both those operations are essentially free: they take almost no time!
Now if this code is in a loop, you should definitely favor the second option as it will minimize branch mispredictions.
Otherwise, what matters here is what makes the code the more readable. And again in my opinion, the second option is clearer.
Also, as mentioned in the comments, assignment is an atomic operation, which makes it thread-safe - another advantage of the second option.
They are not free. They cost time and space. And branching in a tight loop can actually be very costly because of branch prediction (kids these days and their modern CPUs). See Mysticial's answer.
But mostly it's pointless. Either just set to what it should be or throw when it's not what you expect.
Any code you make me read had better have a good reason to exist.
What I think you are trying to do is express what you expect its value to be and assert that it should be that. Without context I can't tell you whether you should throw when your expectations are violated or simply assign to enforce what the value should be. But making your expectations clear and enforcing them one way or another is certainly worth a few CPU cycles. I'd rather you were a little slower than quickly giving me garbage in and garbage out.
I believe this is actually a general question rather than a Java-specific one, because of the low level of these operations (CPU level, not JVM level).
First of all, let's see what the choice is. On one hand we have reading from memory + comparison + (optionally) writing to memory; on the other hand, just writing to memory.
Memory access is much more expensive than register operations (operations on data already loaded into the CPU). Therefore, the choice is read + (sometimes) write vs. write.
Which is more expensive, the read or the write? Short answer: the write. Long answer: the write, but the difference is probably small and depends on the system's caching strategy. It is not easy to explain in a few words; you can learn more about caching in the beautiful book "Operating Systems" by William Stallings.
I think in practice you can ignore the distinction between read and write operations and just write without a test, because by that point your object, with all its fields, will already be in the cache.
Another thing to consider is branch prediction - others have already mentioned that this is one more reason to just write the value without a test.
It depends on what you're really interested in.
If this is a plain old vanilla program, not only does the fetch/compare/branch of the first scheme take extra time, but it's extra code and complexity, and even if the first scheme did actually save a minuscule amount of time (instead of costing time) it wouldn't be worth doing.
However, there are scenarios where it would be different. In an intensely multi-threaded environment with multiple processors modifying shared storage can be expensive, since changes to that storage need to be propagated to other processors. In such an environment it could be well worth it to spend a few extra instructions to avoid "dirtying" cache.
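A sketch of that multi-threaded case (hypothetical example): many threads marking a shared flag. Reading before writing keeps the cache line in the shared state on the common path, instead of every call invalidating it in the other cores' caches:

import java.util.concurrent.atomic.AtomicBoolean;

class DirtyFlag {
    private final AtomicBoolean dirty = new AtomicBoolean(false);

    void markDirty() {
        // Check first: once the flag is set, subsequent calls only read,
        // so the cache line is not "dirtied" over and over again.
        if (!dirty.get()) {
            dirty.set(true);
        }
    }
}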
Debugging performance problems with a standard debugger is almost hopeless, since the level of detail is too high. Another way is using a profiler, but profilers seldom give me good information, especially when GUI and background threads are involved, as I never know whether the user was actually waiting for the computer or not. A different way is simply pressing Control + C and seeing where in the code the program stops.
What I would really like is to have Fast Forward, Play, Pause and Rewind functionality combined with some visual representation of the code. This means that I could set the code to run on Fast Forward until I navigate the GUI to the critical spot. Then I would set the code to run in slow mode, while I get some visual representation of which lines of code are being executed (possibly some kind of zoomed-out view of the code). I could, for example, set the execution speed to something like 0.0001x. I believe I would get a very good visualization this way of whether the problem is inside a specific module or in the communication between modules.
Does this exist? My specific need is in Python, but I would be interested in seeing such functionality in any language.
The "Fast Forward to critical spot" function already exists in any debugger, it's called a "breakpoint". There are indeed debuggers that can slow down execution, but that will not help you debug performance problems, because it doesn't slow down the computer. The processor and disk and memory is still exactly as slow as before, all that happens is that the debugger inserts delays between each line of code. That means that every line of code suddenly take more or less the same time, which means that it hides any trace of where the performance problem is.
The only way to find performance problems is to record every call the application makes and how long each one takes. This is what a profiler does. Indeed, using a profiler is tricky, but there probably isn't a better option. In theory you could record every call and its timing and then play that back and forth with a rewind, but that would use an astonishing amount of memory, and it wouldn't actually tell you anything more than a profiler does (indeed, it would tell you less, as it would miss certain types of performance problems).
With the profiler, you should be able to figure out what is taking a long time. Note that this can be because certain function calls do a lot of processing, or because system calls take a long time since something (network/disk) is slow. Or it can be that a very fast call is called loads and loads of times. A profiler will help you figure this out. But it helps if you can turn the profiler on just for the critical section (reduces noise) and if you can run that critical section many times (improves accuracy).
The methods you're describing, and many of the comments, seem to me to be relatively weak probabilistic attempts to understand the performance impact. Profilers do work perfectly well for GUIs and other idle-thread programs, though it takes a little practice to read them. I think your best bet is there, though -- learn to use the profiler better, that's what it's for.
The specific use you describe would simply be to attach the profiler but don't record yet. Navigate the GUI to the point in question. Hit the profiler record button, do the action, and stop the recording. View the results. Fix. Do it again.
I assume there is a phase in the app's execution that takes too long - i.e. it makes you wait.
I assume what you really want is to see what you could change to make it faster.
A technique that works is random-pausing.
You run the app under the debugger, and in the part of its execution that makes you wait, pause it, and examine the call stack. Do this a few times.
Here are some ways your program could be spending more time than necessary.
I/O that you didn't know about and didn't really need.
Allocating and releasing objects very frequently.
Runaway notifications on data structures.
others too numerous to mention...
No matter what it is, when it is happening, an examination of the call stack will show it.
Once you know what it is, you can find a better way to do it, or maybe not do it at all.
If the program is taking 5 seconds when it could take 1 second, then the probability you will see the problem on each pause is 4/5. In fact, any function call you see on more than one stack sample, if you could avoid doing it, will give you a significant speedup.
AND, nearly every possible bottleneck can be found this way.
Don't think about function timings or how many times they are called. Look for lines of code that show up often on the stack, that you don't need.
Example Added: If you take 5 samples of the stack, and there's a line of code appearing on 2 of them, then it is responsible for about 2/5 = 40% of the time, give or take. You don't know the precise percent, and you don't need to know.
(Technically, on average it is (2+1)/(5+2) = 3/7 ≈ 43%. Not bad, and you know exactly where it is.)
I'm profiling a program using the sampling profilers in YourKit and JProfiler, and also "manually" (I launch it and press Ctrl-Break several times to get thread dumps).
All three methods give me extremely strange results: tens of percent of the time spent in a 3-line method that doesn't do any allocation or synchronization and has no loops, etc. Moreover, after I turned this method into a NOP and even removed its invocation completely, the observable program performance didn't change at all (although the program gained a negligible memory leak, since the method's job was to free a cheap resource).
I'm thinking that this might be because of the constraints the JVM puts on the moments at which a thread's stack trace may be taken, and it somehow turns out that in my program those are exactly the moments where this method is invoked, although there is absolutely nothing special about it or the context in which it is invoked.
What can be the explanation for this phenomenon?
What are the aforementioned constraints?
What further measurements can I take to clarify the situation?
To my mind, these results only show that this method gets called a huge number of times. Since its code is quite small, and it may be called as a leaf of the invocation tree, its impact on your profiling results seems negligible. However, I have seen that kind of weird result many times.
Some 3rd party libraries cause the heap dumps to go completely haywire due to unexpected usage patterns, for example if cglib is used, it will mask away the actual cause of the issues and instead show a lot of Proxy objects (if I remember correctly) filling up the VM instead.
So in short, code generation and reflection may cause the stats to go wrong.
When you did the Ctrl-Break several times and got thread dumps, I'm curious what you saw. The call stacks are, to my mind, the most useful information. If your 3-liner is on the stack a large % of time, then you can see why by looking at where it is called from, and where that is called from, etc. Those call sites are just as responsible for the time being spent as the 3-liner is.
If the stack traces seem to make no sense, it may be because they are being delayed until after something completes. If that is so, look up the stack to see what has just completed, because the break could have occurred within that.
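As for further measurements: one cheap cross-check is to take your own periodic samples from inside the process, for instance with Thread.getAllStackTraces() (a rough sketch below). Be aware that it, too, can only observe threads at safepoints, so it is subject to the same sampling bias discussed above:

import java.util.Map;

public class StackSampler {
    public static void main(String[] args) {
        Thread sampler = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                // Dump every thread's stack, one sample per interval
                for (Map.Entry<Thread, StackTraceElement[]> entry
                        : Thread.getAllStackTraces().entrySet()) {
                    System.out.println(entry.getKey().getName());
                    for (StackTraceElement frame : entry.getValue()) {
                        System.out.println("    at " + frame);
                    }
                }
                try {
                    Thread.sleep(100);   // sample every 100 ms
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "sampler");
        sampler.setDaemon(true);
        sampler.start();

        // ... run the workload under investigation here ...
    }
}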
I've got a question related to java performance and method execution.
In my app there are a lot of place where I have to validate some parameter, so I've written a Validator class and put all the validation methods into it. Here is an example:
public class NumberValidator {

    public static short shortValidator(String s) throws ValidationException {
        try {
            short sh = Short.parseShort(s);
            if (sh < 1) {
                throw new ValidationException();
            }
            return sh;
        } catch (Exception e) {
            throw new ValidationException("The parameter is wrong!");
        }
    }

    ...
}
But I'm wondering about that. Is this OK? It's OO and modularized, but, considering performance, is it a good idea?
What if there were an awful lot of invocations at the same time? The snippet above is short and fast, but some of the methods take more time.
What happens when there are a lot of calls to a static method, or to an instance method of the same class, and the method is not synchronized? Do all the callers have to fall in line while the JVM executes them sequentially?
Is it a good idea to have several classes identical to the one above and randomly call their identical methods? I think it is not, because "Don't repeat yourself" and "Duplication is Evil" etc. But what about performance?
Thanks in advance.
On reentrancy of your method: if it's static, it doesn't hold any state, so it's perfectly safe.
On performance: look at your use cases. Since you're validating Strings, I can only assume you are validating user input. In any case, the number of simultaneous users of your system is unlikely to create a performance bottleneck here.
Just two comments:
1) Factoring out validation into methods may in fact improve performance a little. As far as I know, the JIT compiler is designed to detect frequent method invocations, so the validation methods are good candidates for JIT optimization.
2) Try to avoid catch (Exception e). It is not recommended, because it captures all kinds of RuntimeException as well. If you have a bug in one of the non-trivial validations, it may be re-thrown as a ValidationException that hides the bug in the code.
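For instance, a version that catches only what it expects might look like this (a sketch; the message texts are made up):

public static short shortValidator(String s) throws ValidationException {
    final short sh;
    try {
        sh = Short.parseShort(s);
    } catch (NumberFormatException e) {
        // Only the expected failure mode is translated; genuine bugs
        // elsewhere still surface as their own exceptions.
        throw new ValidationException("Not a number: " + s);
    }
    if (sh < 1) {
        throw new ValidationException("Must be positive: " + s);
    }
    return sh;
}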
I'm not sure what your concern is.
Since you mention the method not being synchronized, I suppose you have concurrent invocations from multiple threads.
And since the method is not synchronized, any invocation can be executed concurrently without problems.
You certainly won't get any performance improvement by copy-pasting this method into the calling classes. You might even reduce performance, because your code size will increase and waste space in the processor cache, though for such a short method I think the effect is negligible.
Are you experiencing performance problems?
Make the code easy to maintain first, and if it's not living up to expectations, it'll be easy to read and easy to refactor.
If you make the code optimized for speed on every choice, you'll often wind up with something unreadable, and have to start from scratch to fix simple bugs.
That said, your method is static, so it only needs to be initialized once. That is the fast version. :-)
I'm suspicious of this code, not so much on performance grounds as that I don't think it is successfully abstracting out anything that deserves abstraction.
If it is for checking user input, it replaces reasonable error messages like 'maximum number of widgets allowed is 9999' with 'ValidationException'. And if you add something like arguments (or try/catch clauses) to get the messages right in context, then the required call-site code is almost certainly more complex, and harder to write and maintain, than the straightforward way of doing things.
If it is for internal sanity checking, you may well start to lose meaningful performance, and certainly greatly increase complexity and bugginess, if you are passing arguments around as strings all over the place and continually parsing and validating them.