Assume you have a quite long method with around 200 lines of very time-sensitive code. Is it possible that extracting some parts of the code into separate methods will slow down execution?
Most probably you'll get a speedup. The problem is that optimizing a 200-line beast is hard: HotSpot actually gives up when a method is too long (by default it refuses to JIT-compile methods beyond a bytecode-size limit). I once achieved a speedup factor of 2 simply by splitting a long method.
Short methods are fine, and they'll be inlined as needed, so the method-call overhead gets minimized. Through inlining, HotSpot may re-create your original method (improbable, given its excessive length) or create multiple compiled methods, some of which may contain code that was not present in the original method.
The answer is "yes, it may get slower." The problem is that the chosen inlining may be suboptimal. However, it's very improbable, and I'd expect a speedup instead.
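As a sketch of what such a split can look like (hypothetical names; the arithmetic is a stand-in for real work), a hot loop whose body is factored into a small private method gives HotSpot a compact unit it can inline and compile:

```java
public class SplitDemo {
    // Monolithic style: everything happens in-line in one method.
    static long monolithic(int[] data) {
        long sum = 0;
        for (int v : data) {
            int t = v * 31 + 7;      // imagine ~200 lines of work here
            sum += t ^ (t >>> 16);
        }
        return sum;
    }

    // Split style: the loop body lives in a small private method,
    // which the JIT can compile and inline independently.
    static long split(int[] data) {
        long sum = 0;
        for (int v : data) {
            sum += step(v);
        }
        return sum;
    }

    private static int step(int v) {
        int t = v * 31 + 7;
        return t ^ (t >>> 16);
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        // Both variants compute the same result; only the structure differs.
        System.out.println(monolithic(data) == split(data));
    }
}
```

The computation is identical either way; the point is only that the split version hands the JIT a small, hot method instead of one oversized one.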
The overhead is negligible; the compiler will inline the methods when executing them.
EDIT:
similar question
I don't think so.
Yes, some calls and some stack frames would be added, but that doesn't cost a lot of time, and depending on your compiler it might even optimize the code in such a way that there is basically no difference between the single-method version and the one with many.
The loss of readability and reusability you would suffer by implementing everything in one method is definitely not worth the (possibly nonexistent) performance increase.
It is important that the factored-out methods be declared either private or final. The just-in-time compiler in the JVM will then inline everything, which means a single big method will be executed as a result.
However, always benchmark your code, when modifying it.
Related
When I was learning C, I was taught that if I wanted to loop through something strlen(string) times, I should save that value in an auxiliary variable, say count, and put that in the for condition instead of the strlen call, to avoid recomputing it that many times.
Now that I've started learning Java, I've noticed this is not quite the norm. I've seen lots of code, and programmers doing exactly what I was told not to do in C.
What's the reason for this? Are they trading efficiency for readability? Or does the compiler manage to fix it?
E: This is NOT a duplicate to the linked question. I'm not simply asking about string length, mine is a more general question.
In the old days, every function call was expensive, compilers were dumb, usable profilers were yet to come, and computers were slow. That is how C macros and other terrible things were born. Java is not that old.
Efficiency is important, but most parts of a program have very little impact on it. Reading code, however, still takes programmers' time, which is much more costly than CPU time. So we'd better optimize for readability most of the time and care about speed only in the most important places.
A local variable can make the code simpler when it avoids repeating a complicated expression; this happens sometimes. It can make the code faster when it avoids an expensive computation the compiler can't eliminate; this happens rather rarely. When neither condition is met, it's just a wasted line, so why bother?
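To illustrate with a hypothetical counting loop: in Java, String.length() just reads a field, so it is O(1), unlike C's strlen, which scans the whole string; and the JIT will typically hoist such a call out of the loop anyway, so the two forms below usually perform the same:

```java
public class LengthDemo {
    // Java idiom: call s.length() in the loop condition every iteration.
    static int countDirect(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == 'a') n++;
        }
        return n;
    }

    // C-style idiom: hoist the length into a local variable first.
    static int countHoisted(String s) {
        int n = 0;
        int len = s.length();  // O(1) field read in Java, not an O(n) scan
        for (int i = 0; i < len; i++) {
            if (s.charAt(i) == 'a') n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countDirect("banana"));   // counts the 'a' characters
        System.out.println(countHoisted("banana"));
    }
}
```

So in Java the auxiliary variable buys nothing measurable here, which is why you rarely see it.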
Consider the following simple code in Java:
void func(String str)
{
if(str.length() > 0)
{
//do something
}
}
Does executing str.length() > 0 mean that every time this function is called, 4 bytes of memory will be allocated to store the integer value 0?
The memory needed to run this function (including the 0) is part of the compiled program (.class / .jar / .apk) and has nothing to do with how many times the function is run. Even if the function is inlined, only the code size grows, based on how many different locations the function is called from; there is NO memory allocation at run time while this code executes.
Two comments on the question:
There are far bigger issues with hardcoding.
I doubt length > 0 counts as hardcoding in any but the strictest sense.
If you write clean, clear and simple code, the JIT will optimise it best in 95+% of cases. If you attempt to outsmart it, it is far more likely you will make the code worse, not better.
There have been some notable exceptions to this rule, but they tend to last only a few years. For example, Locks in Java 5.0 were much faster than synchronized; however, in Java 7 synchronized can be much faster.
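For reference, these are the two idioms being compared (a hypothetical counter class; which one wins depends on the JVM version, as noted above):

```java
import java.util.concurrent.locks.ReentrantLock;

public class CounterDemo {
    private long syncCount = 0;
    private long lockCount = 0;
    private final ReentrantLock lock = new ReentrantLock();

    // Intrinsic lock: modern JVMs apply optimizations such as lock
    // coarsening, so this is often as fast as an explicit lock.
    synchronized void incrementSynchronized() {
        syncCount++;
    }

    // Explicit lock: offers extra features (tryLock, fairness, multiple
    // conditions) but is not automatically faster than synchronized.
    void incrementLocked() {
        lock.lock();
        try {
            lockCount++;
        } finally {
            lock.unlock();
        }
    }

    long getSyncCount() { return syncCount; }
    long getLockCount() { return lockCount; }
}
```

Both guard their counter correctly; the performance difference between them is a moving target across JVM releases, which is the point of the paragraph above.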
When considering performance you should look at the behaviour of the whole system, not individual lines of code or even individual libraries. Failing to do this can mean you spend time worrying about something which makes no difference while a much more important thing is being ignored.
I have seen whole teams work on optimising a piece of a system for years when they could have made the whole thing faster just by changing a configuration setting. This is because they limited their view to the code they were writing and didn't consider how they used the systems they connected to. Imagine wasting years of work when something trivial would have given a bigger speed-up; make sure this doesn't happen to you. ;)
No memory allocations are done when this code executes.
From a memory-allocation point of view, there is nothing serious in the above code.
if(str.length() > 0) { }
Here the comparison is a genuine requirement, so it won't be considered a hard-coded value.
If you are very strict about memory utilization, always pick exactly the data type you need.
This code is method-local, so the memory is reclaimed automatically after the method executes.
Yes, but it is destroyed immediately after the function exits. int is a primitive type, and primitive types are considered the fastest in Java, so I think it won't cost much.
I have endeavored to implement Dixon's algorithm concurrently, with poor results: for small numbers (under roughly 40 bits) it runs in about twice the time of other implementations in my class, and beyond about 40 bits it takes far longer.
I've done everything I can, but I fear it has some fatal issue that I can't find.
My code (fairly lengthy) is located here. Ideally the algorithm would work faster than non-concurrent implementations.
Why would you think it would be faster? Spinning up a thread and adding synchronized calls are HUGE time sinks. If you can't avoid the synchronized keyword, I highly recommend a single-threaded solution.
You may be able to avoid them in various ways: for instance, by ensuring that a given variable is only written by one thread even if read by others, or by acting like a functional language, making all your variables final and using recursion for variable storage (iffy; it's hard to imagine this would speed anything up).
If you really need speed, however, here are some very counter-intuitive things I found recently in my own attempt at a fast solution...
Static methods didn't help over actual class instances.
Breaking the code down into smaller classes and methods actually INCREASED speed.
Final methods helped more than I would have thought they would.
Once I even noticed that adding a method call sped things up.
Don't stress over one-time class or data allocations, but avoid allocating objects in loops (this one is obvious, but I think it's the most critical).
What I've been able to intuit is that the compiler is extremely good at optimizing and is tuned to optimize "ideal" Java code. Static methods are nowhere near ideal; they are something of an anti-pattern.
I suggest you write the clearest, best OO code you can that runs correctly, as a reference; then time it and start attempting tweaks to speed it up.
I was using newInstance() in a sort-of performance-critical area of my code.
The method signature is:
<T extends SomethingElse> T create(Class<T> clasz)
I pass Something.class as argument, I get an instance of SomethingElse, created with newInstance().
Today I got back to clear this performance TODO from the list, so I ran a couple of tests of new operator versus newInstance(). I was very surprised with the performance penalty of newInstance().
I wrote a little about it, here: http://biasedbit.com/blog/new-vs-newinstance/
(Sorry about the self promotion... I'd place the text here, but this question would grow out of proportions.)
What I'd love to know is why the -server flag provides such a performance boost when the number of objects being created grows large, but not for "low" values, say, 100 or 1000.
I did learn my lesson with the whole reflections thing, this is just curiosity about the optimisations the JVM performs in runtime, especially with the -server flag. Also, if I'm doing something wrong in the test, I'd appreciate your feedback!
Edit: I've added a warmup phase and the results are now more stable. Thanks for the input!
Answering the second part first: your code seems to be making the classic mistake of Java micro-benchmarks, namely not "warming up" the JVM before taking measurements. Your application needs to run the method that does the test a few times, ignoring the first few iterations, at least until the numbers stabilize. The reason is that a JVM has to do a lot of work to get an application started; e.g. loading classes and (once they've run a few times) JIT-compiling the methods where significant application time is being spent.
I think the reason that -server makes a difference is that (among other things) it changes the rules that determine when to JIT-compile. The assumption is that for a "server" it is better to JIT sooner: this gives slower startup but better throughput. (By contrast, a "client" is tuned to defer JIT compilation so that the user gets a working GUI sooner.)
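A minimal sketch of such a warm-up phase (hypothetical class names; for serious measurements a harness like JMH is a much better idea):

```java
import java.lang.reflect.Constructor;

public class NewVsNewInstance {
    public static class Widget {
        public Widget() {}
    }

    // Accumulate hash codes so the JIT cannot discard the allocations
    // as dead code.
    static long sink;

    static long timeNew(int n) {
        long acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            acc += new Widget().hashCode();
        }
        sink = acc;
        return System.nanoTime() - start;
    }

    static long timeReflective(int n) throws Exception {
        Constructor<Widget> ctor = Widget.class.getDeclaredConstructor();
        long acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            acc += ctor.newInstance().hashCode();
        }
        sink = acc;
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        // Warm-up: run both paths repeatedly and discard the timings,
        // so classes are loaded and the hot methods get JIT-compiled.
        for (int i = 0; i < 10; i++) {
            timeNew(100_000);
            timeReflective(100_000);
        }
        // Measure only after warm-up.
        System.out.println("new:         " + timeNew(1_000_000) + " ns");
        System.out.println("newInstance: " + timeReflective(1_000_000) + " ns");
    }
}
```

The absolute numbers are still noisy (GC pauses, on-stack replacement, and so on), which is why only the post-warm-up, stabilized measurements are worth comparing.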
IMHO the performance penalty comes from the class loading mechanism.
In the case of reflection, all the security mechanisms are exercised, and thus the creation penalty is higher.
In the case of the new operator, the classes are already loaded in the VM (checked and prepared by the default classloader) and the actual instantiation is a cheap process.
The -server flag enables a lot of JIT optimizations for frequently used code. You might also want to try the -Xbatch flag, which trades off startup time (compilation happens in the foreground) for faster code afterwards.
Among other things, the garbage collection profile for the -server option has significantly different survivor space sizing defaults.
On closer reading, I see that your example is a micro-benchmark and the results may be counter-intuitive. For example, on my platform, repeated calls to newInstance() are effectively optimized away during repeated runs, making newInstance() appear 12.5 times faster than new.
I've got a question related to java performance and method execution.
In my app there are a lot of place where I have to validate some parameter, so I've written a Validator class and put all the validation methods into it. Here is an example:
public class NumberValidator {
    public static short shortValidator(String s) throws ValidationException {
        try {
            short sh = Short.parseShort(s);
            if (sh < 1) {
                throw new ValidationException();
            }
            return sh;
        } catch (Exception e) {
            throw new ValidationException("The parameter is wrong!");
        }
    }
    ...
But I'm thinking about that. Is this OK? It's OO and modularized, but - considering performance - is it a good idea?
What if I had an awful lot of invocations at the same time? The snippet above is short and fast, but some of the other methods take more time.
What happens when there are a lot of calls to a static method or an instance method of the same class and the method is not synchronized? Do all the callers have to queue up while the JVM executes them sequentially?
Is it a good idea to have several classes identical to the one above and call their identical methods at random? I think it is not, because of "Don't repeat yourself" and "Duplication is evil", etc. But what about performance?
Thanks in advance.
On reentrancy of your method: if it's static, it doesn't hold any state, so it's perfectly safe.
On performance: look at your use cases. Since you're validating Strings, I can only assume you are validating user input. In any case, the number of simultaneous users of your system is unlikely to make this a performance bottleneck.
Just two comments:
1) Factoring out validation into methods may in fact improve performance a little. As far as I know, the JIT compiler is designed to detect frequent method invocations. The validation methods are therefore good candidates for JIT optimization.
2) Try avoiding 'catch(Exception e)'. This is not recommended since you are capturing all kinds of RuntimeException as well. If you have a bug in one of the non-trivial validations, you may throw a ValidationException that hides a bug in the code.
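A sketch of the same validator with that fix applied, catching only the expected NumberFormatException so that genuine bugs surface (the ValidationException class here is a minimal stand-in for whatever the project actually defines):

```java
// Minimal stand-in for the project's real exception type.
class ValidationException extends Exception {
    ValidationException(String message) {
        super(message);
    }
}

public class NumberValidator {
    public static short shortValidator(String s) throws ValidationException {
        final short sh;
        try {
            sh = Short.parseShort(s);
        } catch (NumberFormatException e) {
            // Only the expected parse failure is translated; any other
            // RuntimeException propagates and reveals the real bug.
            throw new ValidationException("Not a valid number: " + s);
        }
        if (sh < 1) {
            throw new ValidationException("The number must be positive: " + s);
        }
        return sh;
    }
}
```

Note that moving the range check out of the try block also removes the original quirk where the validator's own ValidationException was caught and replaced by the generic one.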
I'm not sure what your concern is.
Since you mention the method not being synchronized, I suppose you have concurrent invocations from multiple threads.
And since the method is not synchronized, any invocation can be executed concurrently without problems.
You certainly won't get any performance improvement by copy-pasting this method into the calling classes. If anything, you would reduce performance, because the code size would increase and waste space in the processor cache; though for such a short method, I think the effect is negligible.
Are you experiencing performance problems?
Make the code easy to maintain first, and if it's not living up to expectations, it'll be easy to read and easy to refactor.
If you make the code optimized for speed on every choice, you'll often wind up with something unreadable, and have to start from scratch to fix simple bugs.
That said, your method is static, so its class only needs to be loaded and initialized once. That is the fast version. :-)
I'm suspicious of this code, not so much on performance grounds as that I don't think it is successfully abstracting out anything that deserves abstraction.
If it is for checking user input, it replaces reasonable error messages like 'maximum number of widgets allowed is 9999' with 'ValidationException'. And if you add something like arguments (or try/catch clauses) to get the messages right in context, then almost certainly the required call-site code is more complex, harder to write and maintain, than the straightforward way of doing things.
If it is for internal sanity checking, you may well start to lose meaningful performance, and certainly greatly increase complexity and bugginess, if you are passing arguments round as strings all over the place and continually parsing and validating them.