This is a difficult one to explain, and I'm not hopeful for a single, simple answer, but I thought it worth a shot. I'm interested in what might slow down a long Python job that interacts with a Java application.
We have an instance of Tomcat running a fairly complex and robust webapp called Fedora Commons (not to be confused with Fedora the OS), software for storing digital objects. Additionally, we have Python middleware that performs long background jobs with Celery. One particular job is ingesting a 400+ page book, where each page of the book has a large TIFF file, then some smaller PDF, XML, and metadata files. Over the course of 10-15 minutes, derivatives are created from these files and they are added to a single object in Fedora.
Our problem: over the course of ingesting one book, adding files to the digital object in the Java app Fedora Commons slows down very consistently and predictably, but I can't figure out how or why.
I thought a graph of the ingest speeds might help; perhaps it reveals a common memory management pattern that those more experienced with Java might recognize:
The top-left graph times large TIFFs being converted to JP2 and then ingested into Fedora Commons. The bottom-left graph times very small XML files that are ingested as well, with no derivatives being made. As you can see, the slopes of their slowdown curves are almost identical. On the right, the two processes are graphed together.
I've been all over the internet trying to learn about garbage collection (GC) in Java, trying different configurations, but without much effect on the slowdown. If it helps, here are the memory configurations we're passing to Tomcat (the tail end of which, I believe, is mostly diagnostic):
JAVA_OPTS='-server -Xms1g -Xmx1g -XX:+UseG1GC -XX:+DisableExplicitGC -XX:SurvivorRatio=10 -XX:TargetSurvivorRatio=90 -verbose:gc -Xloggc:/var/log/tomcat7/ggc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC'
We're working with 12GB of RAM on this VM.
I realize the number of factors that might result in this behavior is, excuse the pun, off the charts. But we've worked with Fedora Commons and our Python middleware for quite some time and been mostly successful. This slowdown, which you could set your watch to, just feels suspiciously Java / garbage-collection related, though I could be very wrong about that too.
Any help or advice for digging in more is appreciated!
You say you suspect GC as the problem, but you show no GC metrics. Put your program through a profiler and see why the GC is overloaded. It is hard to solve a problem without identifying the cause.
Once you have found where the problem lies, you will likely need to change the code rather than just tweak GC settings.
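If you want hard numbers before reaching for a full profiler, the standard GarbageCollectorMXBean API can report collection counts and accumulated pause time from inside the JVM being measured (for a remote Tomcat you would go through a JMX connection instead). A minimal polling sketch, with a one-second interval chosen purely for illustration:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Minimal sketch: poll the JVM's garbage collectors and print how many
// collections have run and how much total time they have consumed so far.
public class GcStats {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s: count=%d, totalTimeMs=%d%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            Thread.sleep(1000); // polling interval chosen only for illustration
        }
    }
}

If the accumulated collection time grows in step with the ingest slowdown, the GC theory gains weight; if it stays flat, look elsewhere.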
Thanks to all for the suggestions around GC and Tomcat analysis. It turns out the slowdown was entirely due to the way Fedora Commons builds digital objects. I was able to isolate this by creating an extremely simple digital object, iteratively adding near zero-size datastreams and watching the progress. You can see this in the graph below:
The curve of the slowdown was almost identical, which suggested it was not our particular ingest method or file sizes. It also prompted me to dig back into old forum posts about Fedora Commons, which confirm that single objects are not meant to contain a large number of datastreams.
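For reference, the isolation test is essentially a timing loop like the sketch below (shown in Java here; our middleware is Python, so treat this purely as an illustration). The Fedora 3.x-style REST endpoint, the test PID, and the omitted authentication are all assumptions to adapt to your install.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: add many near zero-size datastreams to a single object and record
// how long each add takes. Endpoint and PID are assumptions; authentication
// (e.g. fedoraAdmin credentials) is omitted for brevity.
public class DatastreamTiming {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String base = "http://localhost:8080/fedora/objects/test:timing/datastreams/";
        for (int i = 0; i < 1000; i++) {
            HttpRequest req = HttpRequest.newBuilder()
                    .uri(URI.create(base + "DS" + i + "?controlGroup=M&mimeType=text/plain"))
                    .POST(HttpRequest.BodyPublishers.ofString("x")) // near zero-size payload
                    .build();
            long start = System.nanoTime();
            HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("%d,%d,%d%n", i, resp.statusCode(), elapsedMs); // CSV for graphing
        }
    }
}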
It is perhaps interesting that this knowledge was framed in terms of the intellectual organization of digital objects rather than the performance hit you take in Fedora, but that's probably fodder for another forum.
Thanks again to all for the suggestions - if nothing else, normal usage of Fedora is finer tuned and humming along better than before.
Well, instead of looking into obscure GC settings, you might want to start managing memory explicitly, so the GC doesn't affect your execution that much.
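As a concrete (if simplified) reading of that advice: allocate your large working buffers once and reuse them, or move them off-heap with a direct ByteBuffer, so each page processed doesn't leave megabytes of garbage behind. A minimal sketch under that assumption:

import java.nio.ByteBuffer;

// Sketch: one large off-heap buffer reused for every page, instead of a fresh
// byte[] per page that the GC must later reclaim. The 64 MB size is an assumed
// upper bound on a single TIFF page.
public class BufferReuse {
    private static final ByteBuffer PAGE_BUFFER = ByteBuffer.allocateDirect(64 * 1024 * 1024);

    static void processPage(byte[] source) {
        PAGE_BUFFER.clear();      // reset position/limit; no new allocation
        PAGE_BUFFER.put(source);  // copy the page into the reused buffer
        PAGE_BUFFER.flip();
        // ... hand PAGE_BUFFER to the conversion step here ...
    }
}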
Related
I need to optimise a Java application. It makes some 3rd party calls. I need some good tool to accurately measure the time taken by individual API calls.
To give an idea of the complexity:
the application takes a data source file containing 1 million rows, and it takes around one hour to complete the processing. As part of the processing, it makes some 3rd-party calls (including some network calls). I need to identify which calls are taking more time than others and, based on that, find a way to optimise the application.
Any suggestions would be appreciated.
I can recommend JVisualVM. It's a great monitoring / profiling tool that is bundled with the Oracle/Sun JDK. Just fire it up, connect to your application, and start the CPU profiling. You should get great histograms of where the time is spent.
Getting Started with VisualVM has a great screen-cast showing you how to work with it.
Another more rudimentary alternative is to go with the -Xprof command line option:
-Xprof
Profiles the running program, and sends profiling data to
standard output. This option is provided as a utility that is
useful in program development and is not intended to be
used in production systems.
I've used YourKit a few times and was quite happy with it. However, I've never profiled a long-running operation.
Is the processing the same for each row? In which case the size of the input file doesn't really matter. You could profile a subset to figure out which calls are expensive.
Just wanted to mention the inspectIT tool. It recently became completely open source (https://github.com/inspectIT/inspectIT). It provides a complete and detailed call graph with contextual information, and there are many out-of-the-box sensors for database calls, HTTP monitoring, exceptions, etc.
Seems perfect for your use case.
Try OPNET's Panorama software product.
It sounds like a normal profiler might not be the right tool in this case, since they're geared towards measuring the CPU time taken by the program being profiled rather than external APIs that it calls, and they tend to incur a high overhead of their own and collect a large amount of data that would probably overwhelm your system if left running for a long time.
If you really need to collect performance data over such a long time, and mainly for external calls, then Perf4J is probably a better tool.
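For instance, wrapping each external call in a Perf4J stopwatch gives per-call timings with very little overhead; a minimal sketch, assuming the org.perf4j artifact is on the classpath and with thirdPartyLookup() standing in for your real call:

import org.perf4j.LoggingStopWatch;
import org.perf4j.StopWatch;

// Sketch: time each external call with a Perf4J stopwatch. The tag passed to
// stop() groups the timings so they can be aggregated later by Perf4J's log
// parser or appenders.
public class TimedCalls {
    public static void main(String[] args) {
        for (int row = 0; row < 10; row++) {
            StopWatch watch = new LoggingStopWatch();
            thirdPartyLookup(row);            // the call being measured
            watch.stop("thirdParty.lookup");  // logs the elapsed time under this tag
        }
    }

    // Stand-in for a 3rd-party / network call.
    private static void thirdPartyLookup(int row) {
        try { Thread.sleep(50); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}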
In our office we use the YourKit profiler on a day-to-day basis. It's really lightweight and covers most of the performance-related use cases we have had.
But I have also used Visual VM. It's free and fast. You may first want to give Visual VM a try before going towards YourKit (YourKit is not freeware).
VisualVM (part of the JDK) together with Java 7 can produce detailed profiling.
I use the profiler in NetBeans (it is really brilliant and already built in, no need to install a plugin), or JVisualVM when not using NetBeans.
I'm a Java developer and my goal is to understand which computer is best suited for some statistical evaluation. I have 3 different desktops with different OSes (Windows 7, macOS, Ubuntu).
A JVM-based program seems best suited for this benchmark.
Is there some Maven-based package which I can add as a dependency and run on all these desktops to get an HDD/CPU/memory benchmark?
The question is about Java libraries which provide CPU/IO/memory benchmarks.
Not in any meaningful way, AFAIK. The purpose you have proposed "some statistical evaluation" is too broad for meaningful benchmarking.
In fact, the only meaningful approach would be to:
Select the statistical application that you are going to use.
Select a bunch of representative problems; i.e. problems that are typical of what you are going to be doing ... in both quality and "size".
Code the solutions using your selected application.
Run the solutions, and measure the times taken (see the timing sketch after this list).
Tune the solutions / application and repeat the previous step until you are satisfied that you are getting the best performance out of the application.
Run the application on the candidate machines.
Compare the times, across all of your problems on all machines.
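A minimal sketch of the "measure the times taken" step, with runRepresentativeProblem() standing in for invoking your chosen statistical application on one of the selected problems:

// Sketch: wall-clock one representative problem end to end on each machine.
public class MachineTiming {
    public static void main(String[] args) {
        long start = System.nanoTime();
        double checksum = runRepresentativeProblem();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("checksum=" + checksum + " elapsed=" + elapsedMs + " ms");
    }

    // Placeholder workload; replace with the real analysis.
    private static double runRepresentativeProblem() {
        double sum = 0;
        for (int i = 0; i < 50_000_000; i++) sum += Math.sqrt(i);
        return sum; // returning the result keeps the loop from being optimised away
    }
}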
I would posit that unless you are trying to run really large analyses on an underpowered machine, it is not going to make much difference which OS you use. The critical issues are likely to be using a fast enough machine with enough memory (if the analysis requires lots of memory), picking the right application, coding the solutions correctly, and tuning the application. The choice of OS probably won't matter ... unless you push the memory envelope too hard.
I will disagree. If what you are saying were correct, there would be no such thing as SuperPI, 3DMark, etc. The only problem with that stuff is that it is OS specific, so I can only compare two Windows laptops. Performance can easily be measured with elemental operations such as disk/memory reads and writes, and arithmetic operations. That is actually the universe of possible computer operations.
Well fine.
If you think you can find a meaningful benchmark that compares application-level performance across different OSes ... go find one.
And if you think such a benchmark is going to give you numbers that are applicable to running Java statistical analysis tools, feel free to use it. (Hint: the OS-specific benchmarks like SuperPI, 3DMark, etc. are not great predictors of performance when running applications.)
And if you think that Java application performance is only about how fast disk read/write, memory read/write and basic arithmetic instructions ... feel free to continue believing that.
Unfortunately, reality is very different.
But my guess is that it doesn't make a lot of difference what OS you choose, provided that the hardware is up to it.
I am developing a system which works with lots of files, and from some Google searches I read about improving the speed of information retrieval from the hard disk. But since I work with Java, I can't find any library for working with this. I have very vague knowledge of C++, and found something about hard-disk information retrieval with IOCTL.
Apparently there is no way of getting specific information like how many contiguous free blocks I can get from my hard disk, or the maximum number of contiguous free blocks it has.
I am currently working with Windows 7 and XP.
I am aware of JNI, but I have real problems with C++. Even searching for C++ solutions, I can't find anything; maybe I am using the wrong queries on Google.
Could someone please give me a link, suggestions of study or anything? I am willing to study C++ (although I have almost no free time).
Thank you very much!
PS/Edit: I know it would make practically no difference, but I really need to learn about this. Thanks to everyone giving advice.
Have you identified a performance problem? If not, then don't do anything.
Are you sure that the physical distribution of the files on the disk is the cause of this performance problem? If not, then measure where the time is spent in your application, and try to improve the algorithms, introduce caches if necessary.
If you have done all this, and are sure it's the physical distribution of the files on the disk that's causing the performance problem, have you thought about buying a faster disk, or about using several ones? Hardware is often much cheaper than development time.
I very much doubt the physical distribution of the files on the disk has a significant impact on the performance of your app. I would search elsewhere first.
NTFS already tries to allocate your files contiguously, as stated in this blog post of a Windows 7 engineer. Your files will only be fragmented if there is no big enough contiguous chunk of free space.
If you believe that it is important for your files not to be fragmented, then I think the best option is to schedule a nightly defragmentation of your disk. That's more of a system administration problem.
Finally, fragmentation is irrelevant on SSD disks.
AFAIK there's no built-in way nor a 100% pure Java solution. The problem is that retrieving that kind of information is platform dependent, and since Java is meant to be platform independent, you can only use a common subset.
Captain Kernel explains here that this won't necessarily increase disk performance, and beyond that, is not possible without extensive work.
I'm currently using VisualVM, but the problem I'm having is that I can't save the graphs it generates. I need to report some data about its memory usage and running time, though running time is easy to get with System.nanoTime(). I've also tried the NetBeans profiler, but it isn't what I want, since I'm not looking for specific parts that would be slowing it down or anything, so that would be overkill. The biggest problem with it is that it eats up too much processing time. It also doesn't let me capture/transfer the data easily the way VisualVM does, at least as far as I can tell.
Ideally the best way to go about it would be some method call because then I'd be able to get the information a lot more easily, but anything like VisualVM that actually lets me save the graph is fine. Performance with VisualVM is pretty good too, compared to the NetBeans profiler, though I suppose that's because I wasn't using its profiler.
I'm currently using Ubuntu, but Windows 7 is fine too. I'd rather have a program that specializes in doing this, though, since the information gathered by programs that don't is likely to include the JVM and other things that would be better left out.
Well, apparently, you can save snapshots of the current session and maximize the window in VisualVM, so you could make the charts bigger, take a snapshot and cut them... But that's kind of a hack. Better suggestions welcome.
long free  = Runtime.getRuntime().freeMemory();  // bytes currently unused within the allocated heap
long total = Runtime.getRuntime().totalMemory(); // bytes currently allocated to the heap
Look at the Runtime class. It has freeMemory, maxMemory, and totalMemory. That's probably close enough for your purposes.
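A minimal sketch of using those methods to report approximate heap usage (the gc() call is only a hint to the JVM, so treat the figures as rough):

// Sketch: approximate heap usage via the Runtime class.
public class HeapUsage {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        rt.gc(); // only a hint; "used" will more closely reflect live objects afterwards
        long usedBytes = rt.totalMemory() - rt.freeMemory();
        System.out.printf("used=%d MB, total=%d MB, max=%d MB%n",
                usedBytes >> 20, rt.totalMemory() >> 20, rt.maxMemory() >> 20);
    }
}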
You may prefer a more graceful method of measuring memory rather than hacking screenshots.
JConsole monitors applications via JMX, and it provides a programmatic API. I suspect it is what you need.
See: Using JConsole to Monitor Applications
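If you want the same numbers JConsole's Memory tab shows but from code, the platform MXBeans are the usual route; a minimal in-process sketch (a remote JMX connection works too, but needs more setup):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Sketch: read heap usage from the platform MemoryMXBean, the same source
// JConsole reads over JMX.
public class MemoryProbe {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        System.out.printf("heap used=%d MB, committed=%d MB, max=%d MB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
    }
}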
Try JProfiler. Although it's not free, you can try the evaluation version first.
The HPjmeter console is free. Run your Java process with -Xloggc:<file> and open the <file> with it. Not only can you save your sessions, but you can compare runs. Other options to consider including in your command line are:
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
Is it possible to dump an image of a running JVM and later restore the previous state by loading the image into the JVM? I'm fairly certain the answer is negative, but would love to be wrong.
With all the dynamic languages available for the JVM comes an increase in interactivity, being able to save a coding session would help save time manually restoring the VM to a previous session.
There was a JSR 323 proposed for this a while back but it was rejected. You can find some links in those articles about the research behind this and what it would take. It was mostly rejected as an idea that was too immature.
I have heard of at least one startup (unfortunately don't recall the name) that was working on a virtualization technology over a hypervisor (probably Xen) that was getting pretty close to being able to move JVMs, including even things like file system refs and socket endpoints. Because they were at the hypervisor level, they had access to all of that stuff. By hooking that and the JVM, they had most of the pieces. I think they might have gone under though.
The closest thing you can get today is Terracotta, which allows you to cluster a portion of your JVM heap, storing it in a server array, which can be made persistent. On JVM startup, you connect to the cluster and can continue using whatever portions of your heap are specified as clustered. The actual objects are faulted in on an as-needed basis.
Not possible at present. In general, pausing and restarting a memory image of a process in a different context is incredibly hard to achieve: what are you going to do with open OS resources? Transfers to machines with different instruction sets? database connections?
Also images of the running JVM are probably quite large - maybe much larger than the subset of the state you are actually interested in. So it's not a good idea from a performance perspective.
A much better strategy is to have code that persists and recreates the application state: this is relatively feasible with most JVM dynamic languages. I do similar stuff in Clojure, where you have an interactive environment (REPL), and it is quite possible to create and run a sequence of operations that rebuilds the application state you want in another JVM.
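As a toy illustration of that strategy (plain Java serialization here, though any format works): persist just the state the session cares about, and recreate everything else, connections and threads included, on the next start.

import java.io.*;
import java.util.HashMap;
import java.util.Map;

// Toy sketch of "persist and recreate the state" rather than snapshotting the JVM.
public class SessionState {
    static void save(Map<String, Serializable> state, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(new HashMap<>(state)); // write only the data worth keeping
        }
    }

    @SuppressWarnings("unchecked")
    static Map<String, Serializable> load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (Map<String, Serializable>) in.readObject(); // rebuild the session from disk
        }
    }
}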
This is currently not possible in any of the JVMs I know. It would not be very difficult to implement something like this in the JVM if programs run disconnected from their environments. However, many programs have hooks into their environment (think file handles, database connections) which would make implementing something like this very hairy.
As of early 2023, there's some progress in this space, and it seems a lot of things can at least be tried, even without any claims of production readiness.
One such feature is called CRaC. You can check their docs or even get an OpenJDK build that includes the feature. The project has its own repo under OpenJDK and looks quite promising.
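For a flavour of the programming model, resources that hold connections or files register for checkpoint/restore callbacks; a minimal sketch against the org.crac API (check the exact signatures against the CRaC version you use):

import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;

// Sketch of the CRaC callback model: close external resources before the
// checkpoint is taken and reopen them after the process image is restored.
public class ConnectionHolder implements Resource {
    public ConnectionHolder() {
        Core.getGlobalContext().register(this); // receive checkpoint/restore events
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
        // close sockets, files, pools -- anything that can't survive a snapshot
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) throws Exception {
        // reopen connections now that the JVM is running again
    }
}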
Other vendors/products to check:
Azul ReadyNow!
OpenJ9 InstantOn
What's also really exciting is AWS Lambda SnapStart. It doesn't give you full snapshotting capabilities and is intrinsically vendor specific, but it's what a ton of Java engineers who use AWS Lambda have been waiting for.