Profiling memory leak in a non-redundant uptime-critical application

Profiling memory leak in a non-redundant uptime-critical application - java

We have a major challenge which have been stumping us for months now.
A couple of months ago, we took over the maintenance of a legacy application, where the last developer to touch the code, left the company several years ago.
This application needs to be more or less always online. It's developed many years ago without staging and test environments, and without a redundant infrastructure setup.
We're dealing with a legacy Java EJB application running on Payara application server (Glassfish derivative) on an Ubuntu server.
Within the last year or two, it has been necessary to restart Payara approximately once a week, and the Ubuntu server once a month.
This is due to a memory leak which slows down the application over a period of around a week. The GUI becomes almost entirely non-responsive, but a restart of Payara fixes this, at least for a while.
However after each Payara restart, there is still some kind of residual memory use. The baseline memory usage increases, thereby reducing the time between Payara restarts. Around every month, we thus do a full Ubuntu reboot, which fixes the issue.
Naturally we want to find the memory leak, but we are unable to run a profiler on the server because it's resource intensive, and would need to run for several days in order to capture the memory leak.
We have also tried several times to dump the heap using "gcore" command, but it always result in a segfault and then we need to reboot the Ubuntu server.
What other options / approaches do we have to figure out which objects in the heap are not being garbage collected?

I would try to clone the server in some way to another system where you can perform tests without clients being affected. Could even be a system with less resources, if you want to trigger a resource based problem.
To be able to observe the memory leak without having to wait for days, I would create a load test, maybe with Apache JMeter, to simulate accesses of a week within a day or even hours or minutes (don't know if the base load is at a level where that is feasible from the server and network infrastructure).
First you could set up the load test to act as a "regular" mix of requests like seen in the wild. After you can trigger the loss of response, you can try to find out, if there are specific requests that are more likely to be the cause for the leak than others. (It also could be that some basic component that is reused in nearly any call contains the leak, and so you cannot find out "the" call with the leak.)
Then you can instrument this test server with a profiler.
To get another approach (you could do it in parallel) you also can use a static code inspection tool like SonarQube to analyze the source code for typical patterns of memory leaks.
And one other idea comes to my mind, but it is coming with many preconditions: if you have recorded typical scenarios for the backend calls, and if you have enough development resources, and if it is a stateless web application where each call could be inspoected more or less individually, then you could try to set up partial integration tests where you simulate the incoming web calls, with database and file access, but if possible without the application server, and record the increase of the heap usage after each of the calls. Statistically you might be able to find out the "bad" call this way. (So this would be something I would try as very last option.)

Apart from heap dump have to tried any realtime app perf monitoring (APM) like appdynamics or the opensource alternative like https://github.com/scouter-project/scouter.
Alternate approach would be to analyse existing application issue Eg: Payara issues like these https://github.com/payara/Payara/issues/4098 or maybe the ubuntu patch you are currently running app on.

You can use jmap, an exe bundled with the JDK, to check the memory. From the documentation:-
jmap prints shared object memory maps or heap memory details of a given process or core file or a remote debug server.
For more information you can see the documentation or see the stackoverflow question How to analyse the heap dump using jmap in java
There is also a tool called jhat which can be used tp analise java heap.
From the documentation:-
The jhat command parses a java heap dump file and launches a webserver. jhat enables you to browse heap dumps using your favorite webbrowser. jhat supports pre-designed queries (such as 'show all instances of a known class "Foo"') as well as OQL (Object Query Language) - a SQL-like query language to query heap dumps. Help on OQL is available from the OQL help page shown by jhat. With the default port, OQL help is available at http://localhost:7000/oqlhelp/
See JHat Dcoumentation, or How to analyze the heap dump using jhat

Related

java pid keep increasing

We have Tomcat application running in a Debian 6.07 Server.
Lately CPU used were increasing gradually.
Using Top command I noticed that Java PID keep increasing everyday.
I need to restart the tomcat to make it back to normal.
After restart the tomcat, Java Cpu Used will be back to around 2 %.
From that moment it will increase everyday, and I will need to restart the tomcat every time it reach around 40 %.
Is there any way to fix this issues ?
Thank you

It looks like you have some memory leak or some thread which consumes memory or processing iteratively without freeing unused resources.
Also, you may use tools like Java Profiler (or any other java auditing and profiling tools) to analyze what resources are being used and by whom (classes, threads... etc.)
checkout the following links for Java profiling tools:
https://blog.idrsolutions.com/2014/06/java-performance-tuning-tools/
http://www.infoq.com/articles/java-profiling-with-open-source
(if you can share more info I'll edit my answer properly)

JAVA Lightest Thread Framework

I have a project where I have to send emails using Amazon SES REST API, now amazon allows concurrent connections at a same time based on account. So in my case amazon allows me to open 50 connections at a same time, which means I can send 50 emails/sec. To achieve this, currently I am using JAVA Executioner threads where I control the thread speed to be 50/sec. Also I have implemented this in Hibernate framework because I need to execute some SQL queries before sending emails.
This java program runs continuously in background(its a jar file). This takes around 512MB RAM, so my question is that can I use some other frameworks or better thread system to make it more lighter? The SQL query I execute is only a select query, update/delete/create queries are not used.
I am not good in JAVA so may be this sounds stupid.

I guess the smallest possible framework to use would be plain JDBC.
This would limit your libraries to those in the jre plus the DB driver and maybe libs for AWS / Email. Depending on what else you need, selecting a compact profile might be worth investigating.
Also check your memory settings:
If you set -Xms512m it's really not surprising your app uses 512m, is it?
Edit due to rephrased question
In your level of parallelism, most of your Memory is consumed by Objects, not by Threads (well, Threads are objects, but small ones). Threads are good the way they are in Java. You can run hundrets of them without them consuming 500 mb of heap or more as you claim.
So the issue with 50 threads consuming 512m of your memory is more likely rooted in your code and your objects, not (only) in your threads.
In order to reduce memory footprint, tra the follwing:
Remove hibernate. As you say you only have a simple select SQL, so you don't need the overhead and additional libraries.
Take a memory dump of your running app and analyse it. (MAT - Eclipse Memory Analyser tool comes to mind)
Check other objects and how you use them. When you say "sending emails" - how large are your emails? Might there be duplicate buffers do to bad choice of coding? Share your code for how you do it, then we can have a look.
Try running without any memory options and see how the program runs on defaults.
Add garbage collector output and check that

Is it possible to instantiate a jvm from a heap dump?

Everyone knows that a heap dump can be obtained from a running JVM. Is the other way possible? Can we start a JVM using a heap dump?
I have been having this question in mind for a long time now. If this is possible it would solve a lot of time and make thinks easy for a support engineer. It helps big time in cases where if we have to recreate some the rare problems our customer face. [Just imagine that the underlying hardware and Java runtime are the same and also all the supporting files are also present in the respective location in file system].
Added note: The intention of doing this is not when OOM occurs but at any given point after JVM starts.

No you can't.
You would need things like the current position in each open file. That affects what data is returned on a simple sequential read. The restorer would need to open each file and get it to the correct position. That may not be possible for non-seekable streams.
Program specific serialization is a much more feasible path, then setup the program from there.
Also, as Heap Dumps are usually from OutOfMemory situtaions, recreating JVM from same would again throw OutOfMemoryException. If you are taking heap dump in between, then serialize your objects and restore them when you bring up jvm.
(contents copied from comments of this question, Authors almas-shaikh and patricia-shanahan)

To create a heap dump from a running JVM you can also use jhat, or jcmd (with the GC.heap_dump command), both exist in the JDK/bin folder. MAT is one way of analyzing the contents of the dump. Java Mission Control has a tool called JOverflow that analyzes heap dumps, but only to look at memory waste patterns.
I have never heard of any way to restart a JVM from sort of image, a heap dump would not be enough at all, since it only contains the Java objects, and not the compiled code and other things.

I think you are looking for tools like Java Mission Control and Chronon DVR (commercial). These help you incident analysis, event collection and profiling, time travel debugging (as phrased by chronon)
As per their documentation:
Java Mission Control
Java Flight Recorder and Java Mission Control together create a
complete tool chain to continuously collect low level and detailed
runtime information enabling after-the-fact incident analysis. Java
Flight Recorder is a profiling and event collection framework built
into the Oracle JDK. It allows Java administrators and developers to
gather detailed low level information about how the Java Virtual
Machine (JVM) and the Java application are behaving. Java Mission
Control is an advanced set of tools that enables efficient and
detailed analysis of the extensive of data collected by Java Flight
Recorder. The tool chain enables developers and administrators to
collect and analyze data from Java applications running locally or
deployed in production environments.Starting with the release of
Oracle JDK 7 Update 40 (7u40)
Some key features of Chronon Recording Server which will be useful in your case:
The Recording Server is specifically designed for long running, server
side applications that run for weeks or months at a time. The
Recording Server will take care of splitting the recording if it get
too large and flushing out old recordings.
Get rid of the need to look at long sparsely detailed log files to
debug your program. Just play back the entire execution and see
exactly what took place in your program.
The Recording Server makes it share recordings on different machines
among team members or across multiple teams.

Java Webapp Performance Issues

I have a Web Application, Made entirely with Java. The Webapp doesn't use any Graphical / Model Framework, instead, the webapp uses The Model-View Controller. It's made only with Servlet specification (Servlet ver. 2.4).
The webapp it's developed since 2001, and it's very complex. Initially, was built for work with Tomcat 4.x/5.x. Actually, runs on Tomcat 6.x. But, we still having memory Leaks.
In Depth, the specifications of The Webapp can resumed as:
Uses Servlet v. 2.4 Specification.
It doesn't use Any Framework
It doesn't use JavaEE (Not EJB)
It's based on JavaSE (With Servlets)
Works Only on IE 6+ (Because of it's age)
Infrastructure Specification
Actually, the webapp works in three environments:
First
IBM Server (I don't remember exactly the model)
Intel Xeon 2.4 Ghz
32GB RAM
1TB HDD
Tomcat (Version 6) is configured to use 8GB of RAM
Second
Dell Server
Intel Xeon 2.0Ghz
4GB RAM
500GB HDD
Tomcat (Version 5.5) is configured to use 1.5GB of RAM
Third
Dell Server
Amd Opteron 1214 2.20Ghz
4GB RAM
320GB HDD
Tomcat (Version 6) is Configured to use 1.5GB of RAM
Database specification
The webapp uses SQL Server 2008 R2 Express Edition as a DBMS, except for the user of the first server-specification, that uses SQL Server 2008 R2 Standard Edition. For the connection pools, the app uses Apache DBCP.
Problem
Well, it has very serious performance issues. The webapp slow down continually, and, many times Denies the Service. The only way to recover the app is restarting The Apache Tomcat Service.
During a performance Audit, i've found several programming issues (Like database connections that never closes, excesive use of Vector collection [instead of ArrayList]).
I want to know how can improve the performance for the app, which applications can help me to monitoring the Tomcat performance and the Webapp Memory usage.
All suggestions are gladly accepted.

You could also try stagemonitor. It is an open source performance monitoring library. It records request response times, JVM metrics, request details including a call stack (profile) of the called methods during the request and more. Because of the low overhead, you can also use it in production.
The tuning procedure would be the following.
Identify slow requests with the Request Dashboard
Analyze the stack trace of the request with the Request Detail Dashboard to find out about slow methods
Dive into your code and try to optimize those slow methods
You can also correlate some metrics like the throughput or number of sessions with the response time or cpu usage
Analyze the heap with the JVM Memory Dashboard
Note: I am the developer of stagemonitor.

I would start with some tools that can help you profiling the application. Since you are developing webapp start with Lambda Probe and Java melody.
The first step is to determine the conditions under which the app starts to behave oddly. Ask yourself few questions:
Do performance issues arise right after applications starts, or overtime?
Do performance issues are correlated to quantity of client requests?
What is the real performance problem - high load on the server or lack of memory (note that they are related, so check which one starts first)
Are there any background processes which are performing some massive operations? Are they scheduled to run at some particular time period?
Try to find some clues before going deep into code. It will help you to narrow down possible causes.
As Joshua Bloch has stated in his book entitled "Effective Java" - performance issues are rarely the effect of some minor mistakes in source code (although, of course, misuse of Java constructs can lead to disaster). Usually the cause is bad system (API) architecture.
The last suggestion based on my experience - try not to think that high memory consumption is something bad. Tomcat will use as much memory as operating system and JVM will let him (not more than max settings) and just when it needs more - Tomcat will perform garbage collection. So a typical (proper!) graph of memory consumption looks like a saw. If you are dealing with memory leak, then the graph will be increasing constantly, but indefinitely. This is the most often misunderstood of memory leaks, so keep it in mind.
To be honest - we cannot help you much further. Those are just pointers, now you will have to make extensive research to figure out the cause :)

The general solution is to use a profiler e.g. YourKit, with a realistic workload which reproduces the problem.
What I do first is a CPU only profile, a memory only profile and finally a CPU & Memory profile on at once (I then look at the CPU profile results)
YourKit can also monitor your high level operations such a Java EE resources and JDBC connections. I haven't tried these as I don't use them. ;)
It can be a good idea to improve the efficiency even if its not the cause of the problem as it will reduce the amount of "noise" in these profiles and make your issues more obvious.
You could try increasing the amount of memory available but a suspect it will just delay the problem.

Ok. So I have seen huge Java applications run lesser configurations. You should try to do the following -
First connect a Profiler to your application and see which part of your application takes the most time. You can use JProfiler or Eclipse MAT ( I personally prefer JProfiler). Also try to take a look at the objects taking the most memory. This will help you narrow down to the parts which you need to rewrite to improve the performance.
Once you have taken a look at the memory leaks update your application to use 64bit JDK(assuming it already does not do so)
Take a look at your JVM arguments and optimize them.

You can try the open source tool Webapp Watcher in order to identify where in the code is the performance issue.
You have first to add a filter in the webapp (as explained here) in order to record metrics, and then import the logs in the WAW Analyzer tool and follow the steps described in the doc to know where is the potential performance issue in the code.

How to debug Java memory errors?

There is a Java Struts application running on Tomcat, that have some memory errors. Sometimes it becomes slowly and hoard all of the memory of Tomcat, until it crashes.
I know how to find and repair "normal code errors", using tests, debugging, etc, but I don't know how to deal with memory errors (How can I reproduce? How can I test? What are the places of code where is more common create a memory error? ).
In one question: Where can I start? Thanks
EDIT:
A snapshot sended by the IT Department (I haven't direct access to the production application)

Use one of the many "profilers". They hook into the JVM and can tell you things like how many new objects are being created per second, and what type they are etc.
Here's just one of many: http://www.ej-technologies.com/products/jprofiler/overview.html
I've used this one and it's OK.

http://kohlerm.blogspot.com/
It is quite good intro how to find memory leaks using eclipse memory analyzer.
If you prefer video tutorials, try youtube, although it is android specific it is very informative.

If your application becomes slowly you could create a heap dump and compare it to another heap dump create when the system is in a healthy condition. Look for differences in larger data structures.

You should run it under profiler (jprofile or yourkit, for example) for some time and see for memory/resource usage. Also try to make thread dumps.

There are couple of options profiler is one of them, another is to dump java heap to a file and analyze it with a special tool (i.e. IBM jvm provides a very cool tool called Memory Analizer that presents very detailed report of allocated memory in the time of jvm crash - http://www.ibm.com/developerworks/java/jdk/tools/memoryanalyzer/).
3rd option is to start your server with jmx server enabled and connect to it via JConsole with this approach you would be able to monitor memory ussage/allocation in the runtime. JConsole is provided with standard sun jdk under bin directory (here u may find how to connect to tomcat via jconsole - Connecting remote tomcat JMX instance using jConsole)

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.