How to Optimize JVM & GC through Load Testing


Edit: Of the several extremely generous and helpful responses this question has already received, it is obvious to me that I didn't make an important part of this question clear when I asked it earlier this morning. The answers I've received so far are more about optimizing applications & removing bottlenecks at the code level. I am aware that this is way more important than trying to get an extra 3- or 5% out of your JVM!
This question assumes we've already done just about everything we could to optimize our application architecture at the code level. Now we want more, and the next place to look is at the JVM level and garbage collection; I've changed the question title accordingly. Thanks again!
We've got a "pipeline" style backend architecture where messages pass from one component to the next, with each component performing different processes at each step of the way.
Components live inside of WAR files deployed on Tomcat servers. Altogether we have about 20 components in the pipeline, living on 5 different Tomcat servers (I didn't choose the architecture or the distribution of WARs for each server). We use Apache Camel to create all the routes between the components, effectively forming the "connective tissue" of the pipeline.
I've been asked to optimize the GC and general performance of each server running a JVM (5 in all). I've spent several days now reading up on GC and performance tuning, and have a pretty good handle on what each of the different JVM options do, how the heap is organized, and how most of the options affect the overall performance of the JVM.
My thinking is that the best way to optimize each JVM is not to optimize it as a standalone. I "feel" (that's about as far as I can justify it!) that trying to optimize each JVM locally without considering how it will interact with the other JVMs on other servers (both upstream and downstream) will not produce a globally-optimized solution.
To me it makes sense to optimize the entire pipeline as a whole. So my first question is: does SO agree, and if not, why?
To do this, I was thinking about creating a LoadTester that would generate input and feed it to the first endpoint in the pipeline. This LoadTester might also have a separate "Monitor Thread" that would check the last endpoint for throughput. I could then do all sorts of processing where we check for average end-to-end travel time for messages, maximum throughput before faulting, etc.
The LoadTester would generate the same pattern of input messages over and over again. The variable in this experiment would be the JVM options passed to each Tomcat server's startup options. I have a list of about 20 different options I'd like to pass the JVMs, and figured I could just keep tweaking their values until I found near-optimal performance.
This may not be the absolute best way to do this, but it's the best way I could design with what time I've been given for this project (about a week).
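To make the two measurements concrete, here is a minimal sketch of the bookkeeping such a LoadTester could do. The class name `PipelineStats` and its methods are hypothetical, and it assumes every message carries a unique ID and that sender and monitor thread run in the same JVM, so that `System.nanoTime()` values are comparable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: records when each message enters and leaves the pipeline. */
final class PipelineStats {
    private final Map<String, Long> sentAtNanos = new HashMap<String, Long>();
    private final List<Long> latenciesNanos = new ArrayList<Long>();

    /** Called by the load generator just before feeding a message to the first endpoint. */
    synchronized void onSent(String messageId, long nowNanos) {
        sentAtNanos.put(messageId, nowNanos);
    }

    /** Called by the monitor thread when the message shows up at the last endpoint. */
    synchronized void onReceived(String messageId, long nowNanos) {
        Long sent = sentAtNanos.remove(messageId);
        if (sent != null) {
            latenciesNanos.add(nowNanos - sent);
        }
    }

    synchronized int completed() {
        return latenciesNanos.size();
    }

    /** Mean end-to-end travel time in milliseconds (0 if nothing completed). */
    synchronized double averageLatencyMillis() {
        if (latenciesNanos.isEmpty()) return 0.0;
        long total = 0;
        for (long latency : latenciesNanos) total += latency;
        return total / (double) latenciesNanos.size() / 1000000.0;
    }

    /** Nearest-rank percentile (p in (0, 100]) in milliseconds. */
    synchronized double percentileLatencyMillis(double p) {
        if (latenciesNanos.isEmpty()) return 0.0;
        List<Long> sorted = new ArrayList<Long>(latenciesNanos);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        int index = Math.min(sorted.size() - 1, Math.max(0, rank - 1));
        return sorted.get(index) / 1000000.0;
    }

    /** Completed messages per second over the measured wall-clock window. */
    synchronized double throughputPerSecond(long elapsedNanos) {
        return completed() * 1000000000.0 / elapsedNanos;
    }
}
```

I added a percentile next to the average because a long GC pause tends to show up in the slowest few messages rather than in the mean, but that is a design choice, not a requirement.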
Second question: what does SO think about this setup? How would SO create an "optimizing solution" any differently?
Last but not least, I'm curious as to what sort of metrics I could use as a basis of measure and comparison. I can really only think of:
Find the JVM option config that produces the fastest average end-to-end travel time for messages
Find the JVM option config that produces the largest volume throughput without crashing any of the servers
Any others? Any reasons why those 2 are bad?
After reviewing the play I could see how this might be construed as a monolithic question, but really what I'm asking is how SO would optimize JVMs running along a pipeline, and to feel free to cut-and-dice my solution however you like it.
Thanks in advance!

Let me go up a level and say I did something similar in a large C app many years ago.
It consisted of a number of processes exchanging messages across interconnected hardware.
I came up with a two-step approach.
Step 1. Within each process, I used this technique to get rid of any wasteful activities.
That took a few days of sampling, revising code, and repeating.
The idea is there is a chain, and the first thing to do is remove inefficiencies from the links.
Step 2. This part is laborious but effective: Generate time-stamped logs of message traffic.
Merge them together into a common timeline.
Look carefully at specific message sequences.
What you're looking for is
Was the message necessary, or was it a retransmission resulting from a timeout or other avoidable reason?
When was the message sent, received, and acted upon? If there is a significant delay between being received and acted upon, what is the reason for that delay? Was it just a matter of being "in line" behind another process that was doing I/O, for example? Could it have been fixed with different process priorities?
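The original work was in C, but since this question is about Java, here is a rough Java sketch of the merge-and-inspect step. The `LogEvent` fields and the event kinds ("SENT", "RECEIVED", "ACTED") are made up for illustration, and it assumes the clocks of the processes are synchronized well enough for the timestamps to be compared.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** One line of a per-process log; the field names are illustrative. */
final class LogEvent {
    final long timestampMillis;
    final String process;
    final String messageId;
    final String kind; // "SENT", "RECEIVED" or "ACTED"

    LogEvent(long timestampMillis, String process, String messageId, String kind) {
        this.timestampMillis = timestampMillis;
        this.process = process;
        this.messageId = messageId;
        this.kind = kind;
    }
}

final class Timeline {
    /** Merges per-process logs into one list ordered by timestamp. */
    static List<LogEvent> merge(List<List<LogEvent>> perProcessLogs) {
        List<LogEvent> all = new ArrayList<LogEvent>();
        for (List<LogEvent> log : perProcessLogs) {
            all.addAll(log);
        }
        Collections.sort(all, new Comparator<LogEvent>() {
            @Override
            public int compare(LogEvent a, LogEvent b) {
                return Long.compare(a.timestampMillis, b.timestampMillis);
            }
        });
        return all;
    }

    /** Milliseconds between receipt and action, or -1 if either event is missing. */
    static long receiveToActDelayMillis(List<LogEvent> timeline, String messageId) {
        long received = -1;
        for (LogEvent e : timeline) {
            if (!e.messageId.equals(messageId)) continue;
            if (e.kind.equals("RECEIVED") && received < 0) {
                received = e.timestampMillis;
            } else if (e.kind.equals("ACTED") && received >= 0) {
                return e.timestampMillis - received;
            }
        }
        return -1;
    }

    /** How often the message was sent; more than once hints at a retransmission. */
    static int sendCount(List<LogEvent> timeline, String messageId) {
        int count = 0;
        for (LogEvent e : timeline) {
            if (e.messageId.equals(messageId) && e.kind.equals("SENT")) count++;
        }
        return count;
    }
}
```

The helpers only surface candidates; deciding why a delay or a resend happened still takes a person reading the merged sequence.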
This activity took me about a day to generate logs, combine them, find a speedup opportunity, and revise code.
At this rate, after about 10 working days, I had found/fixed a number of problems, and improved the speed dramatically.
What is common about these two steps is I'm not measuring or trying to get "statistics".
If something is spending too much time, that very fact exposes it to a diligent programmer taking a close meticulous look at what is happening.

I would start with finding the optimum recommended JVM values specified for your hardware/software mix OR just start with what is already out there.
Next I would make sure that I have monitoring in place to measure business throughput and SLAs.
I would not try to tweak just the GC if there is no reason to.
First you will need to find what the major bottlenecks in your application are: whether it is I/O bound, SQL bound, etc.
Key here is to MEASURE, IDENTIFY TOP bottlenecks, FIX them and conduct another iteration with a repeatable load.
HTH...

The biggest trick I am aware of when running multiple JVMs on the same machine is limiting the number of cores the GC will use. Otherwise, when one JVM does a full GC, it will attempt to grab every core, impacting the performance of all the JVMs even though they are not performing a GC. One suggestion is to limit the number of GC threads to 5/8 of the cores or less. (I can't remember where it is written.)
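As a rough illustration, the 5/8 rule of thumb can be turned into a number like this. The class `GcThreadBudget` is made up for the example; `-XX:ParallelGCThreads` is the HotSpot option I have in mind, and whether 5/8 is the right fraction for your mix of JVMs is something only a load test can tell you.

```java
final class GcThreadBudget {
    /** 5/8 of the cores, rounded down, but never less than one thread. */
    static int suggestedGcThreads(int availableCores) {
        return Math.max(1, availableCores * 5 / 8);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Prints a flag you could add to the Tomcat startup options.
        System.out.println("-XX:ParallelGCThreads=" + suggestedGcThreads(cores));
    }
}
```

With several JVMs on one box you may want to go lower still, since the figure above is an upper bound per JVM.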
I think you should test the system as a whole to ensure you have realistic interaction between the services. However, I would assume you may need to tune each service differently.
Changing command line options is useful if you cannot change the code. However, if you profile and optimise the code you can make far more difference than by tuning the GC parameters (in which case you need to change them again).
For this reason, I would only change the command line parameters as a last resort, after there is little improvement left to be made in the code of the application.

Related

What to consider when writing a java program that is supposed to run 'forever'

I have to write a program that is meant to run 'forever', meaning that it won't terminate regularly. Up until now I always wrote programs that would run and be terminated at the end of the day. The program has to do some synchronizations, pause for n minutes and then sync again.
AFAIK there should be no problem with my current implementation and it should theoretically run just fine, but I'm lacking any real-world experience.
So are there any 'patterns' or best practices for writing very robust and resource efficient java programs that have a very long runtime? What could be possible problems after for example a month/year of runtime?
Some background :
Java : 1.7 but compiled down to 1.5
OS : Windows (exact version is not certain yet)
Thanks in advance

Just a brain dump of all the things I've had to keep in mind when writing this kind of app.
Avoid Memory Leaks
I had an app that runs once at midday, every day, and in that I had a FileWriter. I wasn't closing that properly, and then we started wondering why our virtual machine was going into meltdown after a few weeks. Memory leaks can come in the form of anything really, with one of the most common examples being that you don't de-reference an object appropriately. For example, using a class's field as a method of temporary storage. Often the class persists, and so does the reference. This leaves you with objects sitting in memory and doing nothing.
Use the right kind of Scheduler
I used a java Timer in that app, and later I learnt that it's better to use a ScheduledThreadPoolExecutor when another app was changing the System clock. So if you plan on keeping it completely Java based, I would strongly recommend using that over a Timer for all of the reasons detailed in this question.
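A minimal sketch of the sync-pause-sync loop on top of a ScheduledThreadPoolExecutor could look like the following. The class `PeriodicSync` and its counters are made up for the example. One thing worth knowing is that if the scheduled Runnable lets an exception escape, the executor silently stops scheduling it, hence the catch block.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class PeriodicSync {
    private final ScheduledExecutorService scheduler = new ScheduledThreadPoolExecutor(1);
    private final AtomicInteger completedRuns = new AtomicInteger();
    private final AtomicInteger failedRuns = new AtomicInteger();

    /** Runs syncJob now, then again `pause` after each run finishes. */
    void start(final Runnable syncJob, long pause, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(new Runnable() {
            @Override
            public void run() {
                try {
                    syncJob.run();
                    completedRuns.incrementAndGet();
                } catch (RuntimeException e) {
                    // Swallow and count: an escaping exception would cancel all future runs.
                    failedRuns.incrementAndGet();
                }
            }
        }, 0, pause, unit);
    }

    /** Stops scheduling and waits briefly for a run in progress to finish. */
    void stop() {
        scheduler.shutdown();
        try {
            scheduler.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    int completedRuns() { return completedRuns.get(); }

    int failedRuns() { return failedRuns.get(); }
}
```

In a real app the catch block is where you would log the failure or send the notification; counting is just the smallest thing that shows the loop survives a failed run.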
Be mindful of memory usage and your environment
If your app is loading large amounts of data each and every day, and you have other apps running on the same server, you may want to be careful about the timing. For example, if three of the apps run their scheduled operation at midday, running yours at any other time would probably be a smart move. Be mindful of the environment in which you're executing your code.
Error handling
You probably want to configure your app to let you know if something has gone wrong, without the app breaking down. If it's running at a certain time every few hours, that means people are probably depending on it, so I would have a function in your Java code that sends out an email to you, detailing the nature of the exception.
Make it configurable
Again, if it needs to run at various points in the day, you don't want to have to pull the thing down for a few hours to work out some minor changes to your code. Instead, port it into a java Properties file, or into an XML Config (or really, whatever). The advantage of this is that you can update your program and get it up and running before anyone really noticed the difference.
Be afraid of the static keyword
That bad boy will make objects persist, even when you destroy their parent reference. It is the mother of all memory leaks if you are not careful with it. It's fine for constants, and things that you know don't need to change and need to exist within the project to run well, but if you're using it for random values inside a project, you're going to quickly wonder why your app is crashing every few hours rather than syncing.
Props to #X86 for reminding me of that one.

Memory leaks are likely to be the biggest problem. Ensure that there are no long-term references held after an iteration of your logic. Even a relatively small object being referenced forever, will exhaust the memory eventually (and worse, it's going to be harder to detect during testing if the growth rate is 1GB/month). One approach that may help is using the snapshot functionality of profilers: take a snapshot during the pause, let the sync run a few times, and take another snapshot. Comparing these should show the delta between the synchronizations, which should hopefully be zero.
Cache maintenance is another issue. The overall size of a cache needs to be strictly limited (whereas often you can get away without in short-running programs, because everything seen will be small enough to not cause problems). Equally it's more important to do cache-invalidation properly - broadly speaking, everything that gets cached will become stale at some point while your program is still running, and you need to be able to detect this and take appropriate action. This can be tricky depending on where the golden source of the cached data is.
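For the size limit, one simple option (a sketch, not the only way) is a LinkedHashMap in access order that evicts its least recently used entry. `BoundedCache` is a name I made up. It is not thread-safe as written and does nothing about staleness, which still has to be handled separately.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Size-bounded LRU cache; wrap it or synchronize externally if shared between threads. */
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order, eldest first
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after every put; returning true drops the least recently used entry.
        return size() > maxEntries;
    }
}
```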
The last thing I'll mention is exception-handling. For short-running processes, it's often enough to simply let the process die when an exception is encountered, so the issue can be dealt with, and the app rerun. With a long-running process you'll likely need to be more defensive than this. Consider running parts of your program in threads, which can be restarted* if/when they fail. You may need a supervisor-type module, which checks that everything else is still heartbeating and reboots it if not. If appropriate to your structure, this is anecdotally a lot easier to achieve with actors-style libraries rather than Java's standard executors. And if it's at all possible, you may want to have hooks (perhaps exposed over JMX/MBeans) that let you modify the behaviour somewhat, to allow a short-term hack/workaround to be effected without having to bring the process down. Though this requires quite some amount of foresight to predict exactly what's going to go wrong in several months...
*or rather, the job can be restarted in another thread

java first encounter with heap space error server data logger

I built my first Java program which is built on top of the Interactive Brokers Java API. That may or may not be important. I just extended the main API classes with a couple new classes.
The program is making data queries to a remote server. When the server responds, I log the received data to a local MySQL data base. Once the program finishes logging the data, the program will make the next data request.
I am having a problem after leaving the program running for some time, after making a couple hundred server requests. I will see this error, then the program doesn't continue to execute:
java.lang.OutOfMemoryError: Java heap space
I did some research, and from what I read, I conclude that the program is creating many new variables, and not destroying old worthless ones. Since I am using Netbeans for development, I used the Netbeans profiler to inspect if this was the case. See the picture here:
After running the program for quite some time, more and more of the memory is used up by Byte. So it seems that my theory is still true.
I don't really know where to go from here. There is no reference to a class or specific variable, just a variable type. How can I pinpoint where the problem is coming from?
UPDATE
I corrected a specific problem that was mentioned by BigMike in the comments. Previously, I was creating many Statements in the JDBC MySQL Connector API, and I was calling .execute() to execute the statements, but I wasn't closing the statement with .close().
I made sure to add the statement.close() call after each execution, and the program runs much better now. Looking at the RAM usage for this program, it seems to have solved the problem. I am also not seeing the Java heap space error anymore, which is nice.
Thanks!

It's very hard to say what might be wrong from that alone.
It might have to do with Streams that you are opening that aren't being closed when you no longer need them.
Double check methods that allocate resources (reading from files, database, etc), especially if they read data into streams, and make sure you close those streams in a finally clause.
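The pattern looks roughly like this. `FirstLineReader` is just an example name, and the same shape applies to JDBC Statements and ResultSets. On Java 7 and later, try-with-resources gives the same guarantee with less typing.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

final class FirstLineReader {
    /** Reads the first line of a file and always releases the underlying stream. */
    static String readFirstLine(File file) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader(file));
        try {
            return reader.readLine();
        } finally {
            // Runs on the normal path and when readLine() throws.
            reader.close();
        }
    }
}
```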
Apart from that, you can try and profile what methods are being called more often, etc, to try and narrow down the problem to a specific part of your code.
I found a site with a reasonable explanation of how Garbage Collection works, and what can cause OutOfMemoryErrors:
http://www.kdgregory.com/index.php?page=java.outOfMemory
If you read through that, there's a specific reference to high allocation of Object[] and byte[], that might point you in the right direction.

Generally speaking, this comes about for one of two reasons:
There is a memory leak in the application, such that the application fails to release items for garbage collection, leading to the JVM running out of memory over time.
The application attempted a one-off operation that would require more memory than is available, leading to the JVM running out of memory due to the operation.
Since your output seems to indicate that the bulk of the memory is consumed by literally a million plus small byte arrays, my guess is that #1 is probably the culprit; however, to verify this, restart your application and watch its memory consumption over time. It will bounce up and down, but really you only need to watch the trend of consumption. If the consumption average continues to climb over time, you have a memory leak.
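If you would rather have a number than an eyeballed graph, a sketch along these lines samples the used heap through the standard MemoryMXBean and fits a straight line through the samples. The class `HeapTrend` and the least-squares helper are my own illustration; how often to sample and what slope counts as "climbing" are left to you.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

final class HeapTrend {
    /** Currently used heap in bytes, as reported by the JVM. */
    static long usedHeapBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed();
    }

    /** Least-squares slope of equally spaced samples, in bytes per sample interval. */
    static double slopePerSample(long[] samples) {
        int n = samples.length;
        if (n < 2) return 0.0;
        double meanX = (n - 1) / 2.0;
        double meanY = 0.0;
        for (long s : samples) meanY += s;
        meanY /= n;
        double num = 0.0;
        double den = 0.0;
        for (int i = 0; i < n; i++) {
            num += (i - meanX) * (samples[i] - meanY);
            den += (i - meanX) * (i - meanX);
        }
        return num / den;
    }
}
```

Individual samples bounce with every collection, so the slope only means something over a long window with many samples.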
To solve this issue, you typically need the source code, and need to find the parts of the code where the troubling objects are being created, used, and then "stored" far beyond the last time that they will ever be used. The solution is to correct the code to no longer store them. HashMaps, Lists, and other Collections are often accomplices in memory leak problems.
If you lack the source code, you can attempt to measure the trend of the memory consumption, and schedule shutdowns and restarts of the application to effectively "reset the clock" such that you choose your downtime instead of watching the application choose it for you.
If it is a one-off operation (not likely considering your data) then you won't see an upward trend in memory consumption until the triggering event occurs. In such a case, with access to the source code, you should protect your application from processing data that grows very far outside of normal operating parameters. For example, reading a message from the network typically takes only a few KB, but in exceptional circumstances a client might transmit forever. In such a case, kill the message processing and throw the message away with an error if you exceed a maximum message size limit of 10 MB.
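As a sketch of that kind of guard, the reader below gives up as soon as the incoming message grows past a limit. `BoundedReader` and `readAtMost` are made-up names, and the 10 MB constant simply mirrors the example figure above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class BoundedReader {
    static final int TEN_MB = 10 * 1024 * 1024;

    /** Reads the whole stream, or throws once more than maxBytes have arrived. */
    static byte[] readAtMost(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
            if (total > maxBytes) {
                // Stop before buffering more; the caller discards the message.
                throw new IOException("Message exceeds " + maxBytes + " bytes; discarding");
            }
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

A caller would use `BoundedReader.readAtMost(socketStream, BoundedReader.TEN_MB)` and treat the IOException as "drop this message and log it".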
Without access to the source code in the latter scenario, the only hope is to identify the incoming upset, hunt down the source of the errant transmission, and attempt to manipulate it to prevent the overload of output.
The variations on how to approach these techniques are vast, but now you have a few ideas.

Detecting and pinpointing performance regressions

Are there any known techniques (and resources related to them, like research papers or blog entries) which describe how to dynamically and programmatically detect the part of the code that caused a performance regression, if possible on the JVM or some other virtual machine environment (where techniques such as instrumentation can be applied relatively easily)?
In particular, when having a large codebase and a bigger number of committers to a project (like, for example, an OS, language or some framework), it is sometimes hard to find out the change that caused a performance regression. A paper such as this one goes a long way in describing how to detect performance regressions (e.g. in a certain snippet of code), but not how to dynamically find the piece of the code in the project that got changed by some commit and caused the performance regression.
I was thinking that this might be done by instrumenting pieces of the program to detect the exact method which causes the regression, or at least narrowing the range of possible causes of the performance regression.
Does anyone know about anything written about this, or any project using such performance regression detection techniques?
EDIT:
I was referring to something along these lines, but doing further analysis into the codebase itself.

Perhaps not entirely what you are asking, but on a project I've worked on with extreme performance requirements, we wrote performance tests using our unit testing framework, and glued them into our continuous integration environment.
This meant that every check-in, our CI server would run tests that validated we hadn't slowed down the functionality beyond our acceptable boundaries.
It wasn't perfect - but it did allow us to keep an eye on our key performance statistics over time, and it caught check-ins that affected the performance.
Defining "acceptable boundaries" for performance is more an art than a science - in our CI-driven tests, we took a fairly simple approach, based on the hardware specification; we would fail the build if the performance tests exceeded a response time of more than 1 second with 100 concurrent users. This caught a bunch of low-hanging fruit performance issues, and gave us a decent level of confidence on "production" hardware.
We explicitly didn't run these tests before check-in, as that would slow down the development cycle - forcing a developer to run through fairly long-running tests before checking in encourages them not to check in too often. We also weren't confident we'd get meaningful results without deploying to known hardware.

With tools like YourKit you can take a snapshot of the performance breakdown of a test or application. If you run the application again, you can compare performance breakdowns to find differences.
Performance profiling is more of an art than a science. I don't believe you will find a tool which tells you exactly what the problem is, you have to use your judgement.
For example, say you have a method which is taking much longer than it used to. Is it because the method has changed, or because it is being called a different way, or much more often? You have to use some judgement of your own.

JProfiler allows you to see a list of instrumented methods which you can sort by average execution time, inherent time, number of invocations etc. I think if this information is saved over releases one can get some insight into regression. Of course, the profiling data will not be accurate if the tests are not exactly the same.

Some people are aware of a technique for finding (as opposed to measuring) the cause of excess time being taken.
It's simple, but it's very effective.
Essentially it is this:
If the code is slow it's because it's spending some fraction F (like 20%, 50%, or 90%) of its time doing something X unnecessary, in the sense that if you knew what it was, you'd blow it away, and save that fraction of time.
During the general time it's being slow, at any random nanosecond the probability that it's doing X is F.
So just drop in on it, a few times, and ask it what it's doing.
And ask it why it's doing it.
Typical apps are spending nearly all their time either waiting for some I/O to complete, or some library function to return.
If there is something in your program taking too much time (and there is), it is almost certainly one or a few function calls, that you will find on the call stack, being done for lousy reasons.
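On the JVM you can do the "drop in and ask" part by hand with a thread dump (jstack, for instance), or from code with something like the sketch below. `StackSampler` is a made-up helper; the number of samples and the interval are arbitrary, and the point is to read the stacks, not to tally them.

```java
import java.util.ArrayList;
import java.util.List;

final class StackSampler {
    /** Captures the target thread's full call stack `samples` times, pausing between captures. */
    static List<StackTraceElement[]> sample(Thread target, int samples, long intervalMillis)
            throws InterruptedException {
        List<StackTraceElement[]> stacks = new ArrayList<StackTraceElement[]>();
        for (int i = 0; i < samples; i++) {
            stacks.add(target.getStackTrace());
            Thread.sleep(intervalMillis);
        }
        return stacks;
    }

    /** Prints every sample so a person can read what the thread was doing, and why. */
    static void print(List<StackTraceElement[]> stacks) {
        int number = 1;
        for (StackTraceElement[] stack : stacks) {
            System.out.println("--- sample " + number++ + " ---");
            for (StackTraceElement frame : stack) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```

If the same call shows up on a large share of the samples, that is the line to question.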
Here's more on that subject.

I'm asked to tune a long starting app into a short time period

I've been asked to shorten the start-up time of a slow-starting app, and I also have to commit to my managers on how much time I will cut from the startup - something like 10-20 seconds.
As I'm new in my company, I said I could commit to a timeframe of months (it's a big server, I'm new, and I plan to do lazy loading + performance tuning).
That answer was not accepted. I was required instead to build some kind of cache that holds important data on another server, so that when my server starts up it fetches all its data from that cache. I find it a kind of workaround and I don't really like it.
Do you like it?
What do you think I should do?
PS: when I profiled the app I saw many small issues that make the start-up long (about 2 minutes). It would not be a short process to fix them all and add lazy loading.
Any kind of suggestions would help.
Language is Java.
Thanks

Rule one of performance optimisation: measure it. Get hard figures. At each stage of optimisation measure the performance gain/loss/lack of change. You (and your managers) are not in a position to say that a particular optimisation will or will not work before you try it and measure it. You can always ask to test & measure a solution before implementing it.
Rule two of performance optimisation (or anything really): choose your battles. Please bear in mind that your managers may be very experienced with the system in question, and may know the correct solution already; there may be other things (politics) involved as well, so don't put your position at risk by butting heads at this point.

I agree with MatthieuF. The most important thing to do is to measure it. Then you need to analyze the measurements to see which parts are most costly and also which resource (memory, CPU, network, etc) is the bottleneck.
If you know these answers you can propose solutions. You might be able to create small tests (proof of concepts) of your solution so you can report back early to your managers.
There can be all kinds of solutions; for example, simply buying more hardware might be the best way to go. It's also possible that buying more hardware will have no effect and you need to make modifications. The modifications can be optimizing the software, the database or other software. It might be to choose better algorithms, to introduce caching (at the expense of more memory usage) or to introduce multithreading to take advantage of multiple CPU cores. You can also make modifications to the "surroundings" of your application such as the configuration/version of your operating system, Java virtual machine, application server, database server and others. All of these components have settings which can affect the performance.
Again, it's very important to measure, identify the problem, think of a solution, build the solution (maybe as a proof of concept) and measure whether the solution is working. Don't fall into the trap of first choosing a solution without knowing the problem.

It sounds to me as if you've come in at a relatively junior position, and your managers don't (yet) trust your abilities and judgment.
I don't understand why they would want you to commit to a particular speed-up without knowing if it was achievable.
Maybe they really understand the code and its problems, and know that a certain level of speed-up is achievable. In this case, they should have a good idea how to do it ... so try and get them to tell you. Even if their ideas are not great, you will get credit for at least giving them a try.
Maybe they are just trying to apply pressure (or pass on pressure applied to them) in order to get you to work harder. In this case, I'd probably give them a worth-while but conservative estimate. Then spend some time investigating the problems more thoroughly. And if after a few days research you find that your "off the cuff" estimates are significantly off the mark, go back to the managers with a more accurate estimate.
On the technical side, a two-minute start-up time sounds rather excessive to me. What is the application doing in all that time? Loading data structures from files or a database? Recalculating things? Profiling may help answer some of these questions, but you also need to understand the system's architecture to make sense of the profile stats.
Without knowing what the real issues are here, I'd suggest trying to get the service to become available early while doing some of the less critical initialization in the background, or lazily. (And your managers' idea of caching some important data may turn out to be a good one, if viewed in this light.) Alternatively, I'd see if it was feasible to implement a "hot standby" for the system, or replicate it in such a way that allowed you to reduce startup times.

Java performance Inconsistent

I have an interpreter written in Java. I am trying to test the performance results of various optimisations in the interpreter. To do this I parse the code and then repeatedly run the interpreter over the code, this continues until I get 5 runs which differ by a very small margin (0.1s in the times below), the mean is taken and printed. No I/O or randomness happens in the interpreter. If I run the interpreter again I am getting different run times:
91.8s
95.7s
93.8s
97.6s
94.6s
94.6s
107.4s
I have tried, to no avail, the server and client VM, the serial and parallel GC, large tables, and Windows and Linux. These are on the 1.6.0_14 JVM. The computer has no processes running in the background. So I am asking: what may be causing these large variations, and how can I find out?
The actual issue was caused because the program had to iterate to a fixed-point solution and the values were stored in a HashSet. The hashed values differed between runs, resulting in a different ordering, which in turn led to a change in the number of iterations needed to reach the solution.
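For anyone hitting the same thing, here is a small sketch of one way to take the hash values out of the picture. It assumes the stored objects rely on Object's default identity-based hashCode, which may differ from run to run; the class names are made up and this is not my interpreter's code. A LinkedHashSet iterates in insertion order regardless of the hash values.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative value type: no equals/hashCode override, so hashing is identity-based. */
final class FixedPointNode {
    final String name;

    FixedPointNode(String name) {
        this.name = name;
    }
}

final class IterationOrder {
    /** Builds a set whose iteration order is the insertion order, independent of hash values. */
    static Set<FixedPointNode> insertionOrdered(String... names) {
        Set<FixedPointNode> nodes = new LinkedHashSet<FixedPointNode>();
        for (String n : names) {
            nodes.add(new FixedPointNode(n));
        }
        return nodes;
    }

    /** Returns the node names in the order the set iterates them. */
    static List<String> names(Set<FixedPointNode> nodes) {
        List<String> result = new ArrayList<String>();
        for (FixedPointNode node : nodes) {
            result.add(node.name);
        }
        return result;
    }
}
```

With a plain HashSet the same loop may visit the nodes in a different order on each run, which is enough to change how many iterations a fixed-point computation needs.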

"Wall clock time" is rarely a good measurement for benchmarking. A modern OS is extremely unlikely to "[have] no processes running in the background" -- for all you know, it could be writing dirty block buffers to disk, because it's decided that there's no other contention.
Instead, I recommend using ThreadMXBean to track actual CPU consumption.
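A minimal sketch of that measurement follows. `CpuTimer` is a made-up wrapper around the standard ThreadMXBean calls. It only counts CPU time of the calling thread, so work done on other threads (including GC threads) is not included.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class CpuTimer {
    /** CPU nanoseconds the current thread spent running the task, or -1 if the JVM cannot measure it. */
    static long cpuNanos(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported() || !threads.isThreadCpuTimeEnabled()) {
            return -1;
        }
        long before = threads.getCurrentThreadCpuTime();
        task.run();
        return threads.getCurrentThreadCpuTime() - before;
    }
}
```

Comparing this figure with the wall-clock time of the same run shows how much of the variation comes from the thread simply not being scheduled.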

Your variations don't look that large. It's simply the nature of the beast that there are other things running outside of your direct control, both in the OS and the JVM, and you're not likely to get exact results.
Things that could affect runtime:
if your test runs are creating objects (may be invisible to you, within library calls, etc) then your repeats may trigger a GC
Different GC algorithms and settings will react differently, with different thresholds for incremental GC. You could try to run a System.gc() before every run, although the JVM is not guaranteed to GC when you call that (although it always has when I've played with it). Depending on the size of your test, and how many iterations you're running, this may be an unpleasantly (and nearly uselessly) slow thing to wait for.
Are you doing any sort of randomization within your tests? e.g. if you're testing integers, values < |128| may be handled slightly differently in memory.
Ultimately I don't think it's possible to get an exact figure, probably the best you can do is an average figure around the cluster of results.

The garbage collection may be responsible. Even though your logic is the same, it may be that the GC logic is being scheduled on external clock/events.
But I don't know that much about JVMs GC implementation.

This seems like a significant variation to me; I would try running with -verbosegc.
You should be able to get the variation to much less than a second if your process has no IO, output or network of any significance.
I suggest profiling your application; there are highly likely to be significant savings if you haven't done this already.
