Taking a snapshot of optimized JVM runtime - java

I know that the JVM can do some pretty serious optimizations at runtime, especially in -server mode. Of course, it takes a little while for the JVM to settle down and reach peak performance. Is there any way to take a snapshot of those optimizations so they can be applied immediately the next time you run your app?
"Hey JVM! Great job optimizing my code. Could you write that down for me for later?"

Basically not yet with Sun's VM, but they have it in mind.
See the various postings/comments here:
http://blogs.oracle.com/fatcatair/category/Java
(Sorry: I can't find quite the right one about retaining stats over restart for immediate C1 compilation of known-hot-at-startup methods.)
But I don't know where all this stuff is right now.
Note that optimisations appropriate in steady-state may well not be appropriate at start-up, and might actually hurt start-up performance; besides, two runs may not have the same hotspots...

Perhaps this might help: http://wikis.sun.com/display/HotSpotInternals/PrintAssembly.

Related

Can I force the JVM to natively compile a given method?

I have a performance-critical method that is called often when my app starts up. Eventually, it gets JIT-compiled, but only after some noticeable time spent running in the interpreter.
Is there any way I can tell the JVM that I want this method compiled right from the start (without tweaking other internals with stuff like -XX:CompileThreshold)?
The only way I know of is the -Xcomp flag, but it is not generally advisable to use. It forces immediate JIT compilation of ALL classes and methods the first time they are run. The downside is that you will see a performance decrease on initial startup (due to increased JIT activity).

The other major limitation of this flag is that it appears to disable the incremental profiling-based optimization that the JIT would normally do. In standard mixed mode, the JIT compiler can (and will) deoptimize and re-compile parts of the code continually, based on profiling and run-time information collected. This allows it to "correct" faulty optimizations, such as bounds checks that were omitted but turned out to be needed, sub-optimal inlinings, etc. -Xcomp disables profiling-based optimization and, depending on the program, can cause significant performance losses overall for only a small or no real gain in startup, which is why it is not recommended.
Beyond -Xcomp (which is pretty brutal) and -XX:CompileThreshold (which controls how many executions of a given method the JIT will run in interpreted mode, gathering stats, before compiling/optimizing it), there is also -Xbatch. This forces JIT compilation into the "foreground", essentially blocking calls to a method until it has been compiled, rather than compiling it in the background as normal.
You didn't specify which Java version you are using, but if Java 7 is an option for you, it introduces a new JIT model called "tiered compilation" (activated with the -XX:+TieredCompilation switch). Tiered compilation allows an initial, smaller compilation pass on the first use of a method and then an additional, larger compilation/optimization later, based on collected profiling data. Sounds like it should be interesting to you.
It supposedly requires some additional tweaking and parameters/configurations, but I've not got around to checking it out further.
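For illustration, typical invocations combining these switches might look like the following (the jar and main class names are hypothetical):

java -Xbatch -XX:CompileThreshold=1500 -cp app.jar com.example.Main
java -XX:+TieredCompilation -cp app.jar com.example.Main

The first line compiles hot methods in the foreground after fewer interpreted executions than the default; the second enables the Java 7 tiered mode described above.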
I'm not sure if it will completely precompile the code, but you could add the class with the critical method to the JVM's shared data dump; see this question for more details.
Also, have you considered JNI? If your method is very CPU-intensive, it might speed things up considerably.

Does attaching a profiler cause some things to run slower than others?

Is it possible that attaching a profiler to a JVM (let's say VisualVM) could make some methods run slower while not affecting others, thus skewing the results and making it look like a certain piece of code is a hotspot when in fact it is not? I will ask specifically about reflection calls as an example. I'm running some code that shows a lot of time spent in Spring AOP calls (specifically invokeJoinpointUsingReflection), which the author says runs fine in testing (using an in-code microbenchmark), but which profiling showed to take longer than other, non-reflection methods. (Sorry if that's a little unclear.) So it got me wondering whether the profiler could really have this effect and lead a developer down a false trail. Feel free to answer with any examples; the reflection part is just my example.
Profilers regularly give misleading information, but in general they are usually right. Where they tend to skew the result is in very simple methods which might have been optimised further if profiling weren't enabled.
If in doubt, I suggest you use another profiler, such as YourKit (the evaluation version should be fine). It has more lightweight recording, but can have the same issues.
Heisenberg famously observed that collecting information from a system always disturbs it, so you can't get an undisturbed observation. (Thus the software term, "Heisenbug"). Yes, collecting profiling information can cause the actual performance to be changed in ways that will misdirect you.
Whether that is true in a significant way for your particular JVM or profiler, and how much disturbance occurs, is a matter of engineering.
Most profilers are sample-based, and thus the more data you collect, the more accurate the results. As far as I know, there is no bias for or against methods written purely in Java.
Certain profilers require a calibration step, e.g. NetBeans and VisualVM. You might verify the vintage and settings for your chosen profiler.

Performance in Java through code? [closed]

First of all, I should mention that I'm aware performance optimizations can be very project-specific. I'm mostly not facing those special issues right now; I'm facing a bunch of performance questions about the JVM itself.
I wonder now:
- Which code optimizations make sense from a compiler perspective? For example, to support the garbage collector I declared variables as final - very much following PMD's suggestions in Eclipse.
- What best practices are there for VM args, heap settings and other parameters passed to the JVM at initialization? How do I get the right values here? Is there any formula, or is it trial and error?
Java automates a lot and does many optimizations at the byte-code level. However, I think most of that must be planned by a developer in order to work.
So how do you speed up your programs in Java? :)
Which code optimizations make sense from a compiler perspective? For example, to support the garbage collector I declared variables as final - very much following PMD's suggestions in Eclipse.
Assuming you are talking about potential micro-optimizations you can make to your code, the answer is pretty much none. The best way to increase your application performance is to run a profiler to figure out where the performance bottlenecks are, then figure out if there is anything you can do to speed them up.
All of the classic tricks like declaring classes, variables and methods final, reorganizing loops, or changing primitive types are pretty much a waste of effort in most cases. The JIT compiler can typically do a much better job than you can. For example, recent JIT compilers will analyse all loaded classes to figure out which method calls are not subject to overriding, without you declaring the classes or methods as final. They will then use a quicker call sequence, or even inline the method body.
Indeed, the Sun experts say that some programmer attempts at optimization fail because they actually make it harder for the JIT compiler to apply the optimizations it knows about.
On the other hand, higher level algorithmic optimizations are definitely worthwhile ... provided that your profiler tells you that your application is spending a significant amount of time in that area of the code.
Using arrays instead of collections can be a worthwhile optimization in unusual cases, and in rare cases using object pools might be too. But these optimizations 1) will make your code more complicated and bug-prone, and 2) can slow your application down if used inappropriately. These kinds of optimizations should only be tried as a last resort. For example, if your profiling says that such-and-such a HashMap<Integer,Integer> is a CPU bottleneck or a memory hog, then it is a better idea to look for an existing specialized Map or Map-like library class than to try to implement the map yourself using arrays. In other words, optimize at the high level.
If you spend long enough or your application is small enough, careful micro-optimization will probably give you a faster application (on a given JVM version / hardware platform) than just relying on the JIT compiler. If you are implementing a smallish application to do large-scale number crunching in Java, the pay-off of micro-optimization may well be considerable. But this is clearly not a typical case! For typical Java applications, the effort is large enough and the performance difference is small enough that micro-optimization is not worthwhile.
(Incidentally, I don't see how declaring a variable final can make any possible difference to GC performance. The GC has to trace a variable every time it is encountered, whether or not it is final. Besides, it is an open secret that final variables can actually change under certain circumstances, so it would be unsafe for the GC to assume that they don't. Unsafe as in "creates a dangling pointer resulting in a JVM crash".)
I see this a lot. The sequence generally goes:
Thinking performance is about compiler optimizations, big-O, and so on.
Designing software using the recommended ideas, lots of classes, two-way linked lists, trees with pointers up, down, left, and right, hash sets, dictionaries, properties that invoke other properties, event handlers that invoke other event handlers, XML writing, parsing, zipping and unzipping, etc. etc.
Since all those data structures were like O(1) and the compiler's optimizing its guts out, the app should be "efficient", right? Well, then, what's that little voice telling one that the startup is slow, the shutdown is slow, the loading and unloading could be faster, and why is the UI so sluggish?
Hand it off to the "performance expert". With luck, that person finds out that all this stuff is done in the recommended way, and that that is exactly why it's cranking its heart out. It's doing all that stuff because it's the recommended way to do things, not because it's needed.
With luck, one has the chance to re-engineer some of that stuff, to make it simple, and gradually remove the "bottlenecks". I say, "with luck" because often it's just not possible, so development relies on the next generation of faster processors to take away the pain.
This happens in every language, but more so in Java, C#, and C++, where abstraction has been carried to extremes. So by all means be aware of best practices, but also understand what simple software is. Typically, that means saving those best practices for the circumstances that really need them.
Which code optimizations make sense from a compiler perspective?
All the ones that a compiler can't reason about, because a compiler is very dumb and Java doesn't have "design by contract" (so there is nothing to help the dumb compiler reason about your code).
For example, if you're crunching data using int[] or long[] arrays, you may know something about your data that is IMPOSSIBLE for the compiler to figure out, and you may use low-level bit-packing/compacting to improve the locality of reference in that part of your code.
Been there, done that, saw gigantic speedup. So much for the "super smart compiler".
This is just one example. There are a huge number of cases like this.
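To make the bit-packing idea concrete, here is a minimal sketch (illustrative only, not from the original post): two ints packed into one long so that key/value pairs sit contiguously in a single long[].

public final class PackedPairs {
    private final long[] pairs; // each slot holds (key << 32) | value

    public PackedPairs(int capacity) {
        this.pairs = new long[capacity];
    }

    public void set(int i, int key, int value) {
        // Mask the value so sign extension doesn't smear into the key bits.
        pairs[i] = ((long) key << 32) | (value & 0xFFFFFFFFL);
    }

    public int key(int i)   { return (int) (pairs[i] >>> 32); }
    public int value(int i) { return (int) pairs[i]; }
}

Compared with an array of small key/value objects, the pairs are guaranteed to be adjacent in memory, which is exactly the locality-of-reference win described above.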
Remember that a compiler is really stupid: it cannot know that if (Math.abs(42) > 0) will always be true.
This should give some food for thought to people who think those compilers are "smart" (things would be different here if Java had DbC, but it doesn't).
What best practices are there for VM args, heap settings and other parameters passed to the JVM at initialization? How do I get the right values here? Is there any formula, or is it trial and error?
The real answer is: there shouldn't be any. Sadly, the situation is so pathetic that such low-level hackery is needed, due to a serious failure on Java's part. Oh, and one more "tiny" detail: playing with VM fine-tuning only works for server-side apps. It doesn't work for desktop apps.
Anyone who has worked on Java desktop applications installed on hundreds or thousands of machines, across various OSes, knows all too well what the issue is: full GC pauses making your app look like it's broken. The Apple VM on OS X 10.4 comes to mind for being particularly awful, but ALL JVMs are subject to that issue.
What is worse: it is impossible to "fine tune" the GC's parameters across different OSes / VMs / memory configurations when your application is going to be run on hundreds or thousands of different setups.
Anyone disputing that: please tell me how you "fine tune" your app knowing that it is going to run both on an eight-core Mac loaded with 20 GB of RAM (I've got users with such setups) and on an old OS X 10.4 PowerBook with 768 MB of RAM. Please?
But here's the real point: you should not, in the first place, have to be concerned with super-low-level details like GC "fine tuning". The very fact that this even comes up is testimony to one area where Java has a major issue.
Java fans will keep saying "the GC is super fast, object creation is cheap", but this is blatantly wrong. There's a reason why Trove's TIntIntHashMap runs circles around a HashMap<Integer,Integer>.
There's also a reason why at every new JVM release you get countless release notes explaining why -XXGCHyperSteroidMultiTopNotch offers better performance than the last "big JVM param" every cool Java programmer had to know: maybe the JVM wasn't that great at GC'ing after all.
So to answer your question: how do you speed up Java programs? Easy: do what the Trove guys did. Stop needlessly creating gigantic amounts of objects, and stop needlessly auto(un)boxing primitives, because they will kill your app's performance.
A TIntIntHashMap OWNS the default HashMap<Integer,Integer> for a reason: the same reason my apps are now much faster than before.
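As a hedged illustration of the difference (this assumes Trove 3's package layout, gnu.trove.map.hash.TIntIntHashMap; older Trove versions use gnu.trove.TIntIntHashMap):

import gnu.trove.map.hash.TIntIntHashMap;
import java.util.HashMap;
import java.util.Map;

public class BoxingDemo {
    public static void main(String[] args) {
        // Boxed: each put allocates (or looks up) Integer wrappers for key and value.
        Map<Integer, Integer> boxed = new HashMap<Integer, Integer>();
        // Primitive: keys and values stay as ints - no wrapper objects, no extra garbage.
        TIntIntHashMap primitive = new TIntIntHashMap();
        for (int i = 0; i < 1000000; i++) {
            boxed.put(i, i * 2);
            primitive.put(i, i * 2);
        }
        System.out.println(boxed.get(42) + " " + primitive.get(42));
    }
}

The boxed version creates millions of short-lived Integer objects for the GC to chase; the primitive version creates essentially none.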
I stopped believing crap like "object creation costs nothing" and "the GC is super-optimized, don't worry about it".
I'm using Java to crunch data (I know, I'm a bit crazy), and the one thing that made my app faster was to stop believing all the propaganda surrounding "cheap object creation" and the "amazingly fast GC".
The truth is: INSTEAD OF TRYING TO FINE-TUNE YOUR GC SETTINGS, STOP CREATING THAT MUCH GARBAGE IN THE FIRST PLACE. Put another way: if changing the GC settings radically changes the way your app runs, it may be time to wonder whether all the needless junk objects you're creating are really needed.
Oh, you know what, I'm betting we'll see more and more release notes explaining why Java version x.y.z's GC is faster than version x.y.z-1's GC ;)
Generally there are two kinds of performance optimizations you need to do with Java:
Algorithmic optimization. Choose an algorithm that behaves the way you need. For instance, a simple algorithm may perform best for small datasets, but the overhead of preparing a smarter algorithm may only pay off for much larger datasets (a sketch follows below).
Bottleneck identification. Here you need to be familiar with a profiler that can tell you what the problem is (humans always guess wrong): a memory leak? a slow method? etc. A good one to start with is VisualVM, which can attach to a running program and is available in the latest Sun JDK. When you know the problem, you can fix it.
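As a tiny illustration of the first point, a sketch (the threshold is made up; measure before committing to anything like this):

import java.util.HashSet;
import java.util.Set;

public class MembershipCheck {
    // Returns true if any query value occurs in data.
    static boolean anyMatch(int[] data, int[] queries) {
        // Small inputs: a plain nested scan has no set-up cost and usually wins.
        if ((long) data.length * queries.length < 10000) {
            for (int q : queries)
                for (int d : data)
                    if (d == q) return true;
            return false;
        }
        // Large inputs: pay O(n) once to build a set, then O(1) per lookup.
        Set<Integer> set = new HashSet<Integer>();
        for (int d : data) set.add(d);
        for (int q : queries)
            if (set.contains(q)) return true;
        return false;
    }
}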
Today's JVMs are surprisingly robust when it comes to performance. Any micro-optimizations you can apply will, in practically all cases, have only a very minor impact on performance. This is easy to understand if you take a look at how typical language constructs (e.g. for vs. while) translate to bytecode - they are almost indistinguishable.
Making methods/variables final has practically no impact on performance on a decent JITed JVM. The JIT will keep track of which methods are really polymorphic and optimize away the dynamic dispatch where possible. Static methods can still be faster, since they don't have a this-reference, which means one less local variable (and, at the same time, limits their applicability). The most effective micro-optimizations are not so much Java-specific; for example, code with lots of conditional statements can become very slow due to branch mispredictions by the processor. Sometimes conditionals can be replaced by other, sequential code-flow constructs (often at the cost of readability), reducing the number of mispredicted branches (this applies to all languages that somehow compile to native code).
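As a hedged example of that branch-removal idea (illustrative only; the JIT often does this for Math.abs itself):

// Branchy version: the CPU has to predict the sign of x on every call.
static int absBranchy(int x) {
    return (x < 0) ? -x : x;
}

// Branchless version: pure arithmetic, no conditional jump to mispredict.
static int absBranchless(int x) {
    int mask = x >> 31;        // 0 when x >= 0, -1 (all ones) when x < 0
    return (x ^ mask) - mask;  // identity when x >= 0, two's-complement negation when x < 0
}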
Note that profilers tend to inflate the time spent in short, frequently called methods. This is because profilers need to instrument the code to keep track of invocations, which can interfere with the JIT's ability to inline those methods (and the instrumentation overhead becomes significantly larger than the time spent actually executing the method's body). Manual inlining, while apparently a big performance boost under a profiler, has in most cases no effect under "real world" conditions. Don't rely purely on the profiler's results; verify that the optimizations you make have a real impact under real runtime conditions, too.
Notable performance boosts can only be expected from changes that reduce the amount of work done, make the data layout more cache-friendly, or use superior algorithms. Java partially limits your possibilities for cache-friendly data layouts, since you have no control over where the parts (arrays/objects) that form your data structure will be located in memory relative to each other. Still, there are plenty of cases where choosing the right data structure for the job can make a huge difference (e.g. ArrayList vs. LinkedList).
There is little you can do to aid the garbage collector. However, a point worth noting is that, while object allocation in Java is very fast, there is still the cost of object initialization (which is mostly under your control). Poor performance in applications that create lots of (short-lived) objects is more likely attributable to poor cache utilization than to the garbage collector's work.
Different applications types require different optimization strategies - so before asking about specific optimizations, find out where your application really spends its time.
If you are experiencing performance issues with your application, you should seriously consider trying some profiling (eg: hprof) to see whether the problem is algorithmic in nature, and also checking the GC performance logging (eg: -verbose:gc) to see if you could benefit from tuning your JVM GC options.
It is worth noting that the compiler does next to no optimisation, and the JVM doesn't optimise at the bytecode level either. Most of the optimisation is performed by the JIT in the JVM, which optimises how the code is converted to native machine code.
The best way to optimise your code is to use a profiler, which measures how much time and how many resources your application is using when you give it a realistic data set. Without this information you are just guessing, and you can change a lot of code where it really, really doesn't matter and find you have added bugs in the process.
Many come to the conclusion that it's never worth optimising your code, even that it's counterproductive, as it can waste time and introduce bugs, and I would say that is true for 95+% of your code. However, with a profiler you can measure the critical pieces of code and optimise the <5% worth optimising; done carefully, you won't get too many issues from trying to optimise your code.
It's hard to answer this too thoroughly because you haven't even mentioned what sort of project you're talking about. Is it a desktop application? A server-side application?
Desktop applications favor application startup time, so the HotSpot client VM is a good start. Client applications don't necessarily need all of their heap space all the time, so a good balance between starting heap and max heap is useful. (Like, maybe -Xms128m -Xmx512m)
Server applications favor overall throughput, which is something the HotSpot server VM is tuned for. You should always set the min and max heap sizes the same on a server application; otherwise there is an added cost at the system level from the JVM having to malloc() and free() as the heap grows and shrinks during garbage collection. Use something like -Xms1024m -Xmx1024m.
There are several different garbage collectors also, which are tuned to different application types.
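For illustration, a server start-up line combining these ideas might look like this (the jar name is hypothetical, and the GC choice depends on the application):

java -server -Xms1024m -Xmx1024m -XX:+UseConcMarkSweepGC -jar server.jar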
Take a read through the Java SE 6 Performance White Paper if you want more info on the garbage collector and other performance related items from Java 6.

Profile CPU usage in Java on a Mac

I'm looking for a way to measure the cpu usage for different methods in my java code. I understand that this can be achieved using JNI and C, but I wouldn't know where to start...
The purpose of this is to compare different algorithms, and provide qualitative results.
Probably the most common way is to use sampling. The JVM provides facilities to ask it the current stack trace of all threads (or ones you're interested in), along with how much CPU they've consumed. So you periodically do this. On each call, if a thread is inside the method you're interested in, then assume that it's spent half of the reported CPU time since the last poll inside that method.
If this method sounds appropriate, a little while back I wrote some material on the Java 5 profiling facilities that might help you.
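If you want to roll it yourself, here is a bare-bones sampling sketch (illustrative only; com.example.Worker.hotMethod is a hypothetical target, and this counts stack presence rather than apportioning CPU time as described above):

public class Sampler {
    public static void main(String[] args) throws InterruptedException {
        int hits = 0, samples = 0;
        while (samples < 1000) {
            // Snapshot the stacks of all live threads.
            for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
                for (StackTraceElement frame : stack) {
                    if (frame.getClassName().equals("com.example.Worker")
                            && frame.getMethodName().equals("hotMethod")) {
                        hits++;
                        break; // count this thread's sample once
                    }
                }
            }
            samples++;
            Thread.sleep(10); // sampling interval
        }
        System.out.printf("hotMethod on stack in %d of %d samples%n", hits, samples);
    }
}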
Java 5 also provides an Instrumentation framework, by which you can doctor classes as they're being loaded in to include calls on the entry and exit to your given method, so you can measure CPU usage just inside that method. However, this is a little more complex to program because you need to doctor the actual class binaries as they're being loaded.
I don't think you can really identify CPU usage down to the method level with the current range of profilers. For most methods it's pretty obvious (if the method is compute-bound and single-threaded, then it'll use 100% of a CPU, subject to allocation by the OS).
You may want to identify hot-spots though (methods consuming more CPU than you'd anticipate - or possibly less?) and I'd recommend looking at YourKit for an easy-to-configure profiler.
Failing that, take a look at the JVM Profiling Interface (JVMPI), which may give you some further pointers.
Sun VisualVM is integrated in recent JDKs and its profiling capabilities are explained here. Note that it seems to require a fairly up-to-date version of OS X, if I understand this correctly.
NetBeans has basically the same profiling machinery on board; I don't know if that's of any help.
If you want to use one of the already available profiling tools, you can try Shark.
Have a look at OperatingSystemMXBean; perhaps you could take a reading before and after your method (the cast to the com.sun.management subinterface is what exposes getProcessCpuTime):

com.sun.management.OperatingSystemMXBean osBean =
    (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
long startProcessCpuTime = osBean.getProcessCpuTime(); // nanoseconds, before the call
// ... run the method under test ...
long endProcessCpuTime = osBean.getProcessCpuTime();   // nanoseconds, after the call
Java 6 only.
I'm not sure if this is what you want, but I've used jrat for profiling in the past with decent results.
Check out JaMon. It is very easy to use.
Call your methods in separate threads and measure the delta of CPU time before and after execution of the given procedure.
To measure the CPU time of the current thread, call:
ManagementFactory.getThreadMXBean().setThreadCpuTimeEnabled(true); // enable per-thread CPU accounting
long threadTime = ManagementFactory.getThreadMXBean().getCurrentThreadCpuTime(); // nanoseconds
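Put together, a minimal sketch might look like this (runAlgorithm is a hypothetical method under test):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

ThreadMXBean bean = ManagementFactory.getThreadMXBean();
bean.setThreadCpuTimeEnabled(true);            // enable per-thread CPU accounting
long before = bean.getCurrentThreadCpuTime();  // nanoseconds
runAlgorithm();                                // hypothetical method under test
long after = bean.getCurrentThreadCpuTime();
System.out.println("CPU time (ns): " + (after - before));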

Jython Optimizations

Are there any ways to optimize Jython without resorting to profiling or significantly changing the code?
Specifically are there any flags that can be passed to the compiler, or code hints in tight loops.
No flags, no code hints. You can optimize by tweaking your code much as you would for any other Python implementation (hoisting, etc.), but profiling helps by telling you where it's worth your while to expend such effort. So, sure, you can optimize "without resorting to profiling" (and the code changes needed may well not be significant), but you're unlikely to guess right about where your time and energy are best spent, while profiling helps you determine exactly that.
The Jython compiler does not offer many optimization choices. However, since the Java virtual machine (java) and perhaps the compiler (javac) are invoked in the back end or at runtime, you should take a look at them.
Java has different runtime switches to use depending on whether you are launching a server process, a client process, etc. You can also tell it how much memory to allocate.
I know this is an old question, but I'm just putting this for completeness.
You can use the -J-server flag to launch Jython in the JVM's server mode, which can help speed up hot loops. (The JVM will optimize more aggressively, at the cost of slower start-up.)
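For example (the script name is hypothetical):

jython -J-server myscript.py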
