I am working on a Java process which is conceptually rather simple. It is a single thread constantly fetching data from various sources and making decisions based on them. I have recently noticed a suspicious delay between two log lines, where I would not expect much processing to happen between the two (tens of milliseconds of delay versus an expected one to a few milliseconds).
Since this suspicious delay is not always there, my first thought was that I did a poor job at minimizing the need for garbage collection, causing the JVM to pause execution at unwanted times.
While I still believe I did a poor job with that, it doesn't seem to be the cause. I have added the following JVM parameters: -Xlog:gc*=info,safepoint::timemillis,level,tags
and I see no pause between my suspicious log lines. Could there be other JVM pauses that these JVM params would not reveal?
Anyway, would Java pros have any recommendations for efficiently tracking down the source of this latency? Any (preferably free) tools I could use to monitor and understand what's happening?
Environment info: Linux 3.10, Java 11. The process in question is running on an isolated core; other than that I have not done much tuning.
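For reference, the full launch command looks roughly like this (jvm.log and myapp.jar are just placeholders for my actual file names):

java -Xlog:gc*=info,safepoint=info:file=jvm.log:time,uptime,level,tags -jar myapp.jar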
Depending on where the information is coming from, the latency may be in fetching the information itself. If it has to reach out to a server, for example, then connecting to and querying the server may be introducing that latency. Or maybe you are accessing your disk and are limited by the speed of your drive. Besides that, I am not sure. Maybe add in something like this:
// Rough timing around the suspect call; System.nanoTime() would give finer resolution
long time = System.currentTimeMillis();
foo();
System.err.println((System.currentTimeMillis() - time) + "ms");
I have to write a program that is intended to run 'forever', meaning that it won't terminate regularly. Up until now I have always written programs that would run and be terminated at the end of the day. The program has to do some synchronizations, pause for n minutes, and then sync again.
AFAIK there should be no problem with my current implementation and it should theoretically run just fine, but I'm lacking any real-world experience.
So are there any 'patterns' or best practices for writing very robust and resource efficient java programs that have a very long runtime? What could be possible problems after for example a month/year of runtime?
Some background:
Java: 1.7, but compiled down to 1.5
OS: Windows (exact version is not certain yet)
Thanks in advance
Just a brain dump of all the things I've had to keep in mind when writing this kind of app.
Avoid Memory Leaks
I had an app that runs once at midday, every day, and in it I had a FileWriter. I wasn't closing it properly, and then we started wondering why our virtual machine was going into meltdown after a few weeks. Memory leaks can come in the form of anything really, with one of the most common examples being that you don't de-reference an object appropriately, for example by using a class's field as a means of temporary storage. Often the class persists, and so does the reference. This leaves you with objects sitting in memory and doing nothing.
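A minimal sketch of the fix for that FileWriter case (the file name here is made up); closing the writer in a finally block guarantees the handle is released even if the write fails:

import java.io.FileWriter;
import java.io.IOException;

public class ReportWriter {
    void appendLine(String line) throws IOException {
        FileWriter writer = new FileWriter("report.log", true); // append mode; file name is an example
        try {
            writer.write(line + System.getProperty("line.separator"));
        } finally {
            writer.close(); // always runs, even when write() throws
        }
    }
}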
Use the right kind of Scheduler
I used a java.util.Timer in that app, and I later learnt that it's better to use a ScheduledThreadPoolExecutor when another app was changing the system clock. So if you plan on keeping it completely Java based, I would strongly recommend using that over a Timer, for all of the reasons detailed in this question.
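For illustration, a bare-bones version of that swap might look like this (the 15-minute interval and the sync body are placeholders, not your actual code):

import java.util.Date;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SyncScheduler {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        // Unlike java.util.Timer, the executor schedules on elapsed (relative) time,
        // so changes to the system clock do not disturb the schedule.
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                System.out.println("syncing at " + new Date()); // placeholder for the real sync work
            }
        }, 0, 15, TimeUnit.MINUTES);
    }
}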
Be mindful of memory usage and your environment
If your app is loading large amounts of data each and every day, and you have other apps running on the same server, you may want to be careful about the timing. For example, if three of the other apps run their scheduled operations at midday, running yours at any other time would probably be a smart move. Be mindful of the environment in which you're executing your code.
Error handling
You probably want to configure your app to let you know if something has gone wrong, without the app breaking down. If it's running at a certain time every few hours, that means people are probably depending on it, so I would have a function in your Java code that sends out an email to you, detailing the nature of the exception.
Make it configurable
Again, if it needs to run at various points in the day, you don't want to have to pull the thing down for a few hours to work out some minor changes to your code. Instead, move those values into a Java Properties file, or into an XML config (or really, whatever). The advantage of this is that you can update your program and get it up and running again before anyone really notices the difference.
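A minimal sketch of the Properties approach (app.properties and the sync.interval.minutes key are invented names):

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class AppConfig {
    public static void main(String[] args) throws IOException {
        Properties config = new Properties();
        FileInputStream in = new FileInputStream("app.properties"); // hypothetical config file
        try {
            config.load(in);
        } finally {
            in.close();
        }
        // e.g. the file contains: sync.interval.minutes=15 (the "15" below is just a fallback default)
        int intervalMinutes = Integer.parseInt(config.getProperty("sync.interval.minutes", "15"));
        System.out.println("sync interval: " + intervalMinutes + " minutes");
    }
}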
Be afraid of the static keyword
That bad boy will make objects persist even after you destroy their parent reference. It is the mother of all memory leaks if you are not careful with it. It's fine for constants, and for things that you know don't need to change and need to exist for the lifetime of the project, but if you're using it for arbitrary values inside a project, you're going to quickly wonder why your app is crashing every few hours rather than syncing.
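A tiny illustration of the kind of leak I mean (the class and field names are made up): a static collection used as scratch space keeps everything added to it reachable for the life of the JVM.

import java.util.ArrayList;
import java.util.List;

public class SyncJob {
    // Static "temporary" storage: every record added here stays reachable forever
    // unless it is explicitly cleared, which is an easy thing to forget.
    private static final List<String> seenRecords = new ArrayList<String>();

    void process(String record) {
        seenRecords.add(record); // grows without bound across runs
    }
}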
Props to #X86 for reminding me of that one.
Memory leaks are likely to be the biggest problem. Ensure that there are no long-term references held after an iteration of your logic. Even a relatively small object being referenced forever will exhaust the memory eventually (and, worse, it's going to be harder to detect during testing if the growth rate is 1 GB/month). One approach that may help is using the snapshot functionality of profilers: take a snapshot during the pause, let the sync run a few times, and take another snapshot. Comparing these should show the delta between the synchronizations, which should hopefully be zero.
Cache maintenance is another issue. The overall size of a cache needs to be strictly limited (whereas you can often get away without a limit in short-running programs, because everything seen will be small enough not to cause problems). Equally, it's more important to do cache invalidation properly - broadly speaking, everything that gets cached will become stale at some point while your program is still running, and you need to be able to detect this and take appropriate action. This can be tricky depending on where the golden source of the cached data is.
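As one possible sketch of the size limit (the capacity you pass in is up to you), a LinkedHashMap in access order gives a strictly bounded LRU cache; detecting staleness still has to come from wherever the golden source lives:

import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // access order, so the least recently used entry is evicted first
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // hard cap on the cache size
    }
}

Note that this sketch is not thread-safe on its own; wrap it with Collections.synchronizedMap if several threads share it.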
The last thing I'll mention is exception handling. For short-running processes, it's often enough to simply let the process die when an exception is encountered, so the issue can be dealt with and the app rerun. With a long-running process you'll likely need to be more defensive than this. Consider running parts of your program in threads, which can be restarted* if/when they fail. You may need a supervisor-type module, which checks that everything else is still heartbeating and reboots it if not. If appropriate to your structure, this is anecdotally a lot easier to achieve with actor-style libraries than with Java's standard executors. And if it's at all possible, you may want to have hooks (perhaps exposed over JMX/MBeans) that let you modify the behaviour somewhat, to allow a short-term hack/workaround to be effected without having to bring the process down. Though this requires quite some amount of foresight to predict exactly what's going to go wrong in several months...
*or rather, the job can be restarted in another thread
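A rough sketch of that restart idea (all names here are invented; runForever() stands in for the real work loop): restart the job in a fresh thread whenever the current one dies with an uncaught exception.

public class Supervisor {
    public static void main(String[] args) {
        startWorker();
    }

    // Starts the worker and reboots it in a new thread if it ever dies with an exception.
    private static void startWorker() {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                runForever(); // placeholder for the actual long-running job
            }
        }, "sync-worker");
        worker.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
            public void uncaughtException(Thread t, Throwable e) {
                e.printStackTrace(); // or email the details, as suggested above
                startWorker();       // restart the job in another thread
            }
        });
        worker.start();
    }

    private static void runForever() {
        // the real work goes here
    }
}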
I have seen many questions in this forum (and others) with the same title, but none of them seemed to address exactly my problem. This is it:
I have got a JVM that eats all the CPU on the machine that hosts it. I would like to throttle it, however I cannot rely on any throttling tool/technique external to Java as I cannot make assumptions as to where this Vm will be run. Thus, for instance, I cannot use processor affinity because if the VM runs on a Mac the OS won't make process affinity available.
What I would need is an indication as to whether means exist within Java to ensure the thread does not take the full CPU.
I would like to point out straightaway that I cannot use techniques based on alternating process execution and pauses, as suggested in some forums, because the thread needs to generate values continuously.
Ideally I'd like some means of, for instance, setting some VM or thread priority, or capping in some way the percentage of CPU consumed.
Any help would be much appreciated.
What I would need is an indication as to whether means exist within Java to ensure the thread does not take the full CPU.
There is no way that I know of to do this within Java except for tuning your application to use less CPU.
You could put some Thread.sleep(...); calls in your calculation methods. A profiler would help by showing you the hot loops/methods/etc.
Forking fewer threads would also affect the CPU used. Moving to fixed-size thread pools, or lowering the number of threads in your pools, would help.
It may not be CPU that is the problem but other resources. Watch your IO bandwidth for example. Slowing down your network or disk reads/writes might restore your server to proper operation.
From outside of the JVM you could use the Unix nice command to lower the priority of the running JVM so it does not dominate the system. This will still give it CPU when available, but will let other applications get more of the CPU.
I take it you want something more reliable than setting the threads' priorities?
If you want throttled execution of some code that is constantly generating values, you need to look into chunking up the work the thread(s) do, and coding in your own timer. For example, the java.util.Timer allows for scheduling execution at a fixed rate.
Any other technique will still consume as much CPU as is available (1 core per thread, assuming no locks preventing concurrent execution) when the scheduler doesn't have other tasks to prioritize ahead of yours.
The detail is simply that you said "must generate values continuously", and if that, to the extreme, is true, then CPU saturation is actually the goal.
But, if you define "continuously" as X values per second, then there is room to work.
Because then you can run your process at 100% CPU, measure the number of values over time, and if you find that it generates more values than necessary (more than X/sec), you can insert pauses into the process as appropriate until the value rate reaches your desired goal.
The plan being to continually monitor and adjust the pauses to maintain your value rate over time. Then your process will take as much CPU as necessary to meet your values/sec goal.
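A rough sketch of that monitor-and-pause loop (the target rate and produceValue() are placeholders for whatever your actual requirement and computation are):

public class ThrottledProducer {
    public static void main(String[] args) throws InterruptedException {
        final double targetPerSecond = 1000.0; // the "X values per second" requirement
        long produced = 0;
        long start = System.nanoTime();
        while (true) {
            produceValue(); // placeholder for the real value generation
            produced++;
            double elapsedSeconds = (System.nanoTime() - start) / 1e9;
            double actualRate = produced / Math.max(elapsedSeconds, 1e-9);
            if (actualRate > targetPerSecond) {
                Thread.sleep(1); // ahead of target, so give the CPU back briefly
            }
        }
    }

    private static void produceValue() {
        // real computation goes here
    }
}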
Addenda:
If you have a benchmark of values/sec that you are happy with, then interjecting the sleeps will give "all the priority necessary" to the other applications while still maintaining your throughput. If, on the other hand, you don't have any solid requirement - that is, the requirement is "run as fast as possible when nothing else is running, with no actual requirement for ANY results if some other process dominates the CPU" - then that's truly a kernel issue of the host OS, and not something the JVM has any direct, portable mechanism to address.
On Unix systems, you have the nice(1) command to adjust process (not thread) priority, and Windows has its own mechanism. With these commands, you can knock the priority of your Java process down to just above "idle" (the default "process" that always runs when nothing else is running). But it's platform specific, as this is an inherently platform-specific problem. It may well be managed through platform-specific startup scripts that launch your Java program (or even a Java launcher that detects the platform and "does the right thing" before executing your actual code).
Most systems will allow you to lower your own process priorities, but few will let you raise them unless you're an admin/superuser or have whatever the appropriate role is on your host OS.
Check to see if you have any "tight loops" in your code.
while (true) {
    if (object.checkSomething()) {
        ...
    }
}
If you do, then you are burning the CPU cycles on millions of checks that are probably not that time critical. The JVM will oblige (because it doesn't know if the check is "important" or not) and you'll get 100% CPU.
If you find such loops, rewrite them like so
while (true) {
    if (object.checkSomething()) {
        ...
    }
    try {
        Thread.sleep(100);
    } catch (InterruptedException e) {
        // purposefully do nothing
    }
}
and the sleeping will voluntarily release the CPU within the loop, preventing it from running too quickly (and checking the condition too many times).
Really interesting thread. I found out Java does not provide means for doing what I want to do, and the only way to do this is from outside the JVM.
I ended up using nice to alter the scheduling priority in my test (Linux) environment; I will still need to find something similar for Windows-based OSs.
Everyone's intervention has been much appreciated.
Edit: Of the several extremely generous and helpful responses this question has already received, it is obvious to me that I didn't make an important part of this question clear when I asked it earlier this morning. The answers I've received so far are more about optimizing applications & removing bottlenecks at the code level. I am aware that this is way more important than trying to get an extra 3- or 5% out of your JVM!
This question assumes we've already done just about everything we could to optimize our application architecture at the code level. Now we want more, and the next place to look is at the JVM level and garbage collection; I've changed the question title accordingly. Thanks again!
We've got a "pipeline" style backend architecture where messages pass from one component to the next, with each component performing different processes at each step of the way.
Components live inside of WAR files deployed on Tomcat servers. Altogether we have about 20 components in the pipeline, living on 5 different Tomcat servers (I didn't choose the architecture or the distribution of WARs for each server). We use Apache Camel to create all the routes between the components, effectively forming the "connective tissue" of the pipeline.
I've been asked to optimize the GC and general performance of each server running a JVM (5 in all). I've spent several days now reading up on GC and performance tuning, and have a pretty good handle on what each of the different JVM options do, how the heap is organized, and how most of the options affect the overall performance of the JVM.
My thinking is that the best way to optimize each JVM is not to optimize it as a standalone. I "feel" (that's about as far as I can justify it!) that trying to optimize each JVM locally without considering how it will interact with the other JVMs on other servers (both upstream and downstream) will not produce a globally-optimized solution.
To me it makes sense to optimize the entire pipeline as a whole. So my first question is: does SO agree, and if not, why?
To do this, I was thinking about creating a LoadTester that would generate input and feed it to the first endpoint in the pipeline. This LoadTester might also have a separate "Monitor Thread" that would check the last endpoint for throughput. I could then do all sorts of processing where we check for average end-to-end travel time for messages, maximum throughput before faulting, etc.
The LoadTester would generate the same pattern of input messages over and over again. The variable in this experiment would be the JVM options passed to each Tomcat server's startup options. I have a list of about 20 different options I'd like to pass the JVMs, and figured I could just keep tweaking their values until I found near-optimal performance.
This may not be the absolute best way to do this, but it's the best way I could design with what time I've been given for this project (about a week).
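For what it's worth, a bare-bones sketch of the measurement side of that LoadTester (every name here is invented, and it assumes the injection and the final-endpoint check both happen in the LoadTester's own JVM, since System.nanoTime() is not comparable across machines):

import java.util.concurrent.atomic.AtomicLong;

public class PipelineMetrics {
    private final AtomicLong totalLatencyNanos = new AtomicLong();
    private final AtomicLong messageCount = new AtomicLong();

    // Called when the LoadTester injects a message into the first endpoint.
    public long stamp() {
        return System.nanoTime();
    }

    // Called by the monitor thread when the message arrives at the last endpoint.
    public void recordArrival(long sendNanos) {
        totalLatencyNanos.addAndGet(System.nanoTime() - sendNanos);
        messageCount.incrementAndGet();
    }

    public double averageEndToEndMillis() {
        long count = messageCount.get();
        return count == 0 ? 0.0 : totalLatencyNanos.get() / (double) count / 1e6;
    }

    public double throughputPerSecond(long elapsedNanos) {
        return messageCount.get() / (elapsedNanos / 1e9);
    }
}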
Second question: what does SO think about this setup? How would SO create an "optimizing solution" any differently?
Last but not least, I'm curious as to what sort of metrics I could use as a basis of measure and comparison. I can really only think of:
Find the JVM option config that produces the fastest average end-to-end travel time for messages
Find the JVM option config that produces the largest volume throughput without crashing any of the servers
Any others? Any reasons why those 2 are bad?
After reviewing the play I can see how this might be construed as a monolithic question, but really what I'm asking is how SO would optimize JVMs running along a pipeline, so feel free to cut and dice my solution however you like.
Thanks in advance!
Let me go up a level and say I did something similar in a large C app many years ago.
It consisted of a number of processes exchanging messages across interconnected hardware.
I came up with a two-step approach.
Step 1. Within each process, I used this technique to get rid of any wasteful activities.
That took a few days of sampling, revising code, and repeating.
The idea is that there is a chain, and the first thing to do is remove inefficiencies from the links.
Step 2. This part is laborious but effective: Generate time-stamped logs of message traffic.
Merge them together into a common timeline.
Look carefully at specific message sequences.
What you're looking for is
Was the message necessary, or was it a retransmission resulting from a timeout or other avoidable reason?
When was the message sent, received, and acted upon? If there is a significant delay between being received and acted upon, what is the reason for that delay? Was it just a matter of being "in line" behind another process that was doing I/O, for example? Could it have been fixed with different process priorities?
This activity took me about a day to generate logs, combine them, find a speedup opportunity, and revise code.
At this rate, after about 10 working days, I had found/fixed a number of problems, and improved the speed dramatically.
What is common to these two steps is that I'm not measuring or trying to gather "statistics".
If something is spending too much time, that very fact exposes it to a diligent programmer taking a close, meticulous look at what is happening.
I would start by finding the optimum recommended JVM values specified for your hardware/software mix, OR just start with what is already out there.
Next I would make sure that I have monitoring in place to measure business throughput and SLAs.
I would not try to tweak just the GC if there is no reason to.
First you will need to find what the major bottlenecks in your application are: whether it is I/O bound, SQL bound, etc.
The key here is to MEASURE, IDENTIFY the TOP bottlenecks, FIX them, and conduct another iteration with a repeatable load.
HTH...
The biggest trick I am aware of when running multiple JVMs on the same machine is limiting the number of cores the GC will use. Otherwise, what can happen when one JVM does a full GC is that it will attempt to grab every core, impacting the performance of all the JVMs even though they are not performing a GC. One suggestion is to limit the number of GC threads to 5/8 of the cores or less. (I can't remember where that is written.)
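For what it's worth, the HotSpot flags I believe control this are -XX:ParallelGCThreads and -XX:ConcGCThreads; the values and service.jar below are just an example for a box shared by several JVMs, so check what suits your collector:

java -XX:ParallelGCThreads=4 -XX:ConcGCThreads=2 -jar service.jar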
I think you should test the system as a whole to ensure you have realistic interaction between the services. However, I would assume you may need to tune each service differently.
Changing command line options is useful if you cannot change the code. However, if you profile and optimise the code you can make far more difference than by tuning the GC parameters (in which case you may need to change them again).
For this reason, I would only change the command line parameters as a last resort, after there is little improvement left to be made in the code of the application.
I am having trouble determining what is wrong with my software.
The situation is:
- The program is always running in the background and every X minutes performs some actions.
- Right now it is set to check a certain directory every minute and see if there are new files in it.
- If there are new files, they are processed and moved somewhere else.
- If not, it simply logs the event and goes idle again.
I assume that when new files appear, CPU usage can be somewhat high.
The problem comes when, even if I don't put new files in the directory for many days, the CPU usage will rise to ~90% every minute it checks for new entries, then after some seconds return to <1% usage.
The same process under Windows seems somewhat stable, always staying at low CPU usage.
If I monitor the CPU activity over a month, I can see that the average CPU usage for my Java process keeps growing (without putting in new files to 'activate' the rest of the process), and I have to restart the process for it to return to lower CPU usage levels.
I really don't understand this behaviour, so I don't really know what may be affecting it.
If the log file is somewhat 'big', like 10-20 MB, would it require that much CPU to log a new entry every minute?
If there are many libraries loaded in the classpath for this process, will the CPU usage be increased even though many of these libraries won't be used most of the time?
Excuse me if I haven't been very clear in my question; I am somewhat new to this.
Thanks everyone in advance, regards.
--edit--
I note your advice; I will do some monitoring and post some code/results to share with you and see what you can come up with!
I am really lost right now!
If your custom monitoring code is causing a problem, you could always use something standard like Apache Commons IO's FileAlterationMonitor. It's simple to implement and it might be faster than fixing your current code.
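A minimal sketch of wiring that up (the directory path is a placeholder; the 60-second interval matches the once-a-minute check described above):

import java.io.File;
import org.apache.commons.io.monitor.FileAlterationListenerAdaptor;
import org.apache.commons.io.monitor.FileAlterationMonitor;
import org.apache.commons.io.monitor.FileAlterationObserver;

public class DirectoryWatcher {
    public static void main(String[] args) throws Exception {
        FileAlterationObserver observer = new FileAlterationObserver(new File("/path/to/watched/dir"));
        observer.addListener(new FileAlterationListenerAdaptor() {
            @Override
            public void onFileCreate(File file) {
                System.out.println("new file: " + file); // hand off to your existing processing code
            }
        });
        // Polls the directory every 60 seconds on its own background thread.
        FileAlterationMonitor monitor = new FileAlterationMonitor(60000, observer);
        monitor.start();
    }
}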
Are you talking about a simple console application or a Swing/AWT app?
Is the application run every minute via the underlying OS scheduler, or is it a simple server process?
If the process runs as a server, how do you launch the VM? (Server VM or client VM - the -server switch on the command line.)
You may also check your garbage collector; sometimes logging frameworks use up too many objects without releasing their references.
Regards
M.
I have an interpreter written in Java. I am trying to test the performance results of various optimisations in the interpreter. To do this I parse the code and then repeatedly run the interpreter over it; this continues until I get 5 runs which differ by a very small margin (0.1s in the times below), then the mean is taken and printed. No I/O or randomness happens in the interpreter. If I run the interpreter again I am getting different run times:
91.8s
95.7s
93.8s
97.6s
94.6s
94.6s
107.4s
I have tried, to no avail, the server and client VM, the serial and parallel GC, large tables, and Windows and Linux. These are on the 1.6.0_14 JVM. The computer has no processes running in the background. So I am asking: what may be causing these large variations, or how can I find out what is?
The actual issue was caused because the program had to iterate to a fixed-point solution and the values were stored in a HashSet. The hashed values differed between runs, resulting in a different ordering, which in turn led to a change in the number of iterations needed to reach the solution.
"Wall clock time" is rarely a good measurement for benchmarking. A modern OS is extremely unlikely to "[have] no processes running in the background" -- for all you know, it could be writing dirty block buffers to disk, because it's decided that there's no other contention.
Instead, I recommend using ThreadMXBean to track actual CPU consumption.
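A minimal sketch of what I mean (benchmarkRun() stands in for one run of your interpreter):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuTimer {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long before = threads.getCurrentThreadCpuTime(); // nanoseconds of CPU actually consumed by this thread
        benchmarkRun();
        long after = threads.getCurrentThreadCpuTime();
        System.out.println("CPU time: " + (after - before) / 1000000 + " ms");
    }

    private static void benchmarkRun() {
        // placeholder for one run of the interpreter
    }
}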
Your variations don't look that large. It's simply the nature of the beast that there are other things running outside of your direct control, both in the OS and the JVM, and you're not likely to get exact results.
Things that could affect runtime:
if your test runs are creating objects (which may be invisible to you, within library calls, etc.), then your repeats may trigger a GC
Different GC algorithms and specifications will react differently, with different thresholds for incremental GC. You could try to run a System.gc() before every run, although the JVM is not guaranteed to GC when you call that (although it always has when I've played with it). Depending on the size of your test, and how many iterations you're running, this may be an unpleasantly (and nearly uselessly) slow thing to wait for.
Are you doing any sort of randomization within your tests? E.g., if you're testing integers, values with an absolute value below 128 may be handled slightly differently in memory (they fall within the boxed Integer cache).
Ultimately I don't think it's possible to get an exact figure, probably the best you can do is an average figure around the cluster of results.
The garbage collection may be responsible. Even though your logic is the same, it may be that the GC logic is being scheduled on external clock/events.
But I don't know that much about JVMs GC implementation.
This seems like a significant variation to me; I would try running with -verbose:gc.
You should be able to get the variation to much less than a second if your process has no I/O, output, or network activity of any significance.
I suggest profiling your application; there are highly likely to be significant savings to be made if you haven't done this already.