How to generate a histogram of Java object lifetimes

I have a Tomcat Java webapp which is thrashing the Java GC when under load. I think this is due to a combination of a large number of short-lived objects along with an unknown number of moderately long-lived objects.
To validate this theory I want to find a tool which will let me determine the object lifetimes for all allocated objects (or a sample, e.g. every 10th object, for better performance). Ideally the final output will be a histogram showing the relative number of objects which live for different amounts of time.
I think this tool will likely be built on top of either the Instrumentation API or the JVMTI. If there are no good tools which already do this I would also appreciate suggestions about which of the JVM's interfaces would be best to use when writing such a tool.

I have now started writing a tool to do what I originally asked about. The current code can be found here:
http://wiki.github.com/mchr3k/org.inmemprofiler/
So far I have managed to get a textual histogram of all object allocations by instance count. This does not include array allocations, which are handled differently.
I am now working on adding instance size information, along with tracking of array allocations, using the JVMTI.
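For anyone curious what the lifetime-tracking side could look like, here is a deliberately simplified sketch (not the actual inmemprofiler code, and not using bytecode instrumentation or the JVMTI): it approximates object lifetimes with phantom references, where track() records a creation timestamp and the elapsed time is bucketed by order of magnitude once the collector enqueues the reference. The class and method names are made up, and the measured lifetime includes the lag until the next GC cycle.

import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Simplified sketch: approximate object lifetimes via phantom references.
public class LifetimeHistogram {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final Map<Reference<?>, Long> birthTimes = new ConcurrentHashMap<>();
    private final AtomicLong[] buckets = new AtomicLong[10]; // bucket i counts lifetimes of ~10^i microseconds

    public LifetimeHistogram() {
        for (int i = 0; i < buckets.length; i++) {
            buckets[i] = new AtomicLong();
        }
    }

    // Call this for each (sampled) allocation you want to track.
    public void track(Object obj) {
        birthTimes.put(new PhantomReference<>(obj, queue), System.nanoTime());
    }

    // Drain reclaimed references and bucket their lifetimes; call periodically.
    public void drain() {
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            Long born = birthTimes.remove(ref);
            if (born == null) {
                continue;
            }
            long micros = (System.nanoTime() - born) / 1_000;
            int bucket = micros <= 0 ? 0 : Math.min((int) Math.log10(micros), buckets.length - 1);
            buckets[bucket].incrementAndGet();
        }
    }

    public void print() {
        drain();
        for (int i = 0; i < buckets.length; i++) {
            System.out.printf("~10^%d us: %d objects%n", i, buckets[i].get());
        }
    }
}

A sampling wrapper could call track() on every Nth allocation of interest and dump print() periodically; an instrumentation-based tool would hook the calls to track() in automatically.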

Related

Creating new objects versus encoding data in primitives

Let's assume I want to store (integer) x/y-values. What is considered more efficient: storing them in a primitive value like long (which fits perfectly, since sizeof(long) = 2*sizeof(int)) using bit operations like shifts, ORs and masks, or creating a Point class?
Keep in mind that I want to create and store many(!) of these points (in a loop). Would there be a performance issue when using classes? The only reason I would prefer storing in primitives over storing in a class is the garbage collector. I guess generating new objects in a loop would trigger the GC way too often - is that correct?
Of course packing those into a long[] is going to take less memory (and it is going to be contiguous). For each object (a Point) you will pay at least 12 bytes more for the two headers.
On the other hand, if you are creating them in a loop and escape analysis can prove they don't escape, the JIT can apply an optimization called "scalar replacement" (though it is very fragile), where your objects will not be allocated at all; instead they will be "desugared" into fields.
The general rule is that you should code in the way that is easiest to maintain and read. If and only if you see performance issues (via a profiler, say, or too many pauses) should you look at GC logs and potentially optimize the code.
As an addendum, the JDK code itself is full of such longs where each bit means something different - so they do pack them. But then, neither I nor (I suspect) you are JDK developers. There such things matter; for us, I have serious doubts.
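As an illustration of the packing approach being discussed (not taken from any particular JDK class), here is a minimal sketch that stores x in the high 32 bits of a long and y in the low 32 bits; a long[] then holds a million points in one contiguous allocation with no per-point headers.

// Minimal sketch: two ints packed into one long via shifts and masks.
public class PackedPoints {

    // Pack x into the high 32 bits and y into the low 32 bits.
    static long pack(int x, int y) {
        return ((long) x << 32) | (y & 0xFFFFFFFFL);
    }

    static int unpackX(long packed) {
        return (int) (packed >> 32);
    }

    static int unpackY(long packed) {
        return (int) packed; // low 32 bits
    }

    public static void main(String[] args) {
        long[] points = new long[1_000_000]; // one contiguous allocation, no per-point object headers
        for (int i = 0; i < points.length; i++) {
            points[i] = pack(i, -i);
        }
        System.out.println(unpackX(points[42]) + "," + unpackY(points[42])); // prints 42,-42
    }
}

For a million points this is a single long[] allocation rather than a million small objects, which is exactly the GC-pressure trade-off being weighed against readability.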

Reducing memory footprint while using large XML DOM's in Java

Our application is required to take client data presented in XML format (several files) and parse this into our common XML format (a single file with schema). For this purpose we are using apache's XMLBeans data binding framework. The steps of this process are briefly described below.
First, we take raw java.io.File objects pointing to the client XML files on-disk and load these into a collection. We then iterate over this collection creating a single apache.xmlbeans.XmlObject per file. After all files have been parsed into XmlObjects, we create 4 collections holding the individual objects from the XML documents that we are interested in (to be clear, these are not hand-crafted objects but what I can only describe as 'proxy' objects created by apache's XMLBeans framework). As a final step, we then iterate over these collections to produce our XML document (in memory) and then save this to disk.
For the majority of use cases, this process works fine and can easily run in the JVM when given the '-Xmx1500m' command-line argument. However, issues arise when we are given 'large datasets' by the client. Large in this instance is 123Mb of client XML spread over 7 files. Such datasets result in our in-code collections being populated with almost 40,000 of the aforementioned 'proxy objects'. In these cases the memory usage just goes through the roof. I do not get any OutOfMemory exceptions; the program just hangs until garbage collection occurs, freeing up a small amount of memory, then the program continues, uses up this new space and the cycle repeats. These parsing sessions currently take 4-5 hours. We are aiming to bring this down to within an hour.
It's important to note that the calculations required to transform client XML into our XML require all of the XML data to be available for cross-referencing. Therefore we cannot implement a sequential parsing model or batch this process into smaller blocks.
What I've tried so far
Instead of holding all 123Mb of client XML in memory, on each request for data, load the files, find the data and release the references to these objects. This does seem to reduce the amount of memory consumed during the process but, as you can imagine, the time the constant I/O takes removes the benefit of the reduced memory footprint.
I suspected an issue was that we are holding an XmlObject[] for 123Mb worth of XML files as well as the collections of objects taken from these documents (using xpath queries). To remedy this, I altered the logic so that instead of querying these collections, the documents were queried directly. The idea here being that at no point does there exist 4 massive Lists with tens of thousands of objects in them, just the large collection of XmlObjects. This did not seem to make a difference at all and in some cases increased the memory footprint even more.
Clutching at straws now, I considered that the XmlObject we use to build our XML in memory before writing to disk was growing too large to maintain alongside all the client data. However, doing some sizeOf queries on this object revealed that at its largest, this object is less than 10Kb. After reading into how XmlBeans manages large DOM objects, it seems to use some form of buffered writer and, as a result, is managing this object quite well.
So now I am out of ideas. I can't use SAX approaches instead of memory-intensive DOM approaches, as we need 100% of the client data in our app at any one time; I cannot hold off requesting this data until we absolutely need it, as the conversion process requires a lot of looping and the disk I/O time is not worth the saved memory space; and I cannot seem to structure our logic in such a way as to reduce the amount of space the internal Java collections occupy. Am I out of luck here? Must I just accept that if I want to parse 123Mb worth of XML data into our XML format I cannot do it with the 1500m memory allocation? While 123Mb is a large dataset in our domain, I cannot imagine others have never had to do something similar with GBs of data at a time.
Other information that may be important
I have used JProbe to try and see if that can tell me anything useful. While I am a profiling noob, I ran through their tutorials for memory leaks and thread locks, understood them and there doesn't appear to be any leaks or bottlenecks in our code. After running the application with a large dataset, we quickly see a 'sawblade' type shape on the memory analysis screen (see attached image) with PS Eden space being taken over with a massive green block of PS Old Gen. This leads me to believe that the issue here is simply sheer amount of space taken up by object collections rather than a leak holding onto unused memory.
I am running on a 64-Bit Windows 7 platform but this will need to run on a 32 Bit environment.
The approach I'd take would be to make two passes over the files, using SAX in both cases.
The first pass would parse the 'cross-reference' data needed in the calculations into custom objects and store them in Maps. If the 'cross-reference' data is large then look at using a distributed cache (Coherence is the natural fit if you've started with Maps).
The second pass would parse the files, retrieve the 'cross-reference' data to perform calculations as needed and then write the output XML using the javax.xml.stream APIs.
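A rough sketch of that two-pass shape, assuming made-up element and attribute names ('record', 'id', 'ref') rather than your actual schema: pass one uses SAX to pull just the cross-reference data into a Map, and pass two streams the output with javax.xml.stream instead of building a full DOM.

import java.io.File;
import java.io.FileOutputStream;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class TwoPassTransform {

    // Pass 1: SAX-parse only the cross-reference data into a Map.
    static Map<String, String> loadCrossRefs(File file) throws Exception {
        final Map<String, String> refs = new HashMap<>();
        SAXParserFactory.newInstance().newSAXParser().parse(file, new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("record".equals(qName)) {
                    refs.put(atts.getValue("id"), atts.getValue("ref"));
                }
            }
        });
        return refs;
    }

    // Pass 2: stream the output XML; nothing beyond the current element is held in memory.
    static void writeOutput(Map<String, String> refs, File outFile) throws Exception {
        try (FileOutputStream fos = new FileOutputStream(outFile)) {
            XMLStreamWriter out = XMLOutputFactory.newInstance().createXMLStreamWriter(fos, "UTF-8");
            out.writeStartDocument("UTF-8", "1.0");
            out.writeStartElement("records");
            for (Map.Entry<String, String> e : refs.entrySet()) {
                out.writeStartElement("record");
                out.writeAttribute("id", e.getKey());
                out.writeAttribute("ref", e.getValue());
                out.writeEndElement();
            }
            out.writeEndElement();
            out.writeEndDocument();
            out.close();
        }
    }
}

If the Map of cross-reference data itself becomes too large for the heap, that is where the distributed cache suggestion above comes in.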

How to test how many bytes an object reference use in Java?

I would like to test how many bytes an object reference use in the Java VM that I'm using. Do you guys know how to test this?
Thanks!
Taking the question literally: on 32-bit JVMs all references take 4 bytes; on 64-bit JVMs a reference takes 8 bytes unless -XX:+UseCompressedOops has been used, in which case it takes 4 bytes.
I assume you are asking how to tell how much space an Object occupies. You can use Instrumentation (not a simple matter), but this will only give you the shallow size. Java tends to break into many objects something which in C++ might be a single structure, so it is not as useful.
However, if you have a memory issue, I suggest you use a memory profiler. This will give you the shallow and deep space objects use and give you a picture across the whole system. This is often more useful, as you can start with the biggest consumers and optimise those; even if you have been developing Java for ten years or more, you will only be guessing where the best place to optimise is unless you have hard data.
Another way to get the object size, if you don't want to use a profiler, is to allocate a large array and see how much memory is consumed. You have to do this many times to get a good idea of what the average size is. I would set the young space very high to avoid GCs confusing your results, e.g. -XX:NewSize=1g
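A bare-bones sketch of that array-based measurement (class and field names are arbitrary): allocate a large batch of instances, compare the used heap before and after, and divide by the count. Expect only an approximation, and repeat the run a few times.

public class ObjectSizeEstimate {

    static class Sample { int a; int b; } // the type whose per-instance size we estimate

    public static void main(String[] args) {
        final int count = 1_000_000;
        Object[] keep = new Object[count];  // strong references so nothing is collected mid-measurement
        long before = usedMemory();
        for (int i = 0; i < count; i++) {
            keep[i] = new Sample();
        }
        long after = usedMemory();
        System.out.printf("~%d bytes per instance%n", (after - before) / count);
        System.out.println(keep.length);    // keep the array reachable past the measurement
    }

    static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }
}

Running with -verbose:gc can confirm that no collection happened in the middle of the measurement.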
It can differ from JVM to JVM but "Sizeof for Java" says
You might recollect "Java Tip 130: Do You Know Your Data Size?" that described a technique based on creating a large number of identical class instances and carefully measuring the resulting increase in the JVM used heap size. When applicable, this idea works very well, and I will in fact use it to bootstrap the alternate approach in this article.
If you need to be fairly accurate, check out the Instrumentation framework.
This is the one I use. Got to love those 16-byte references!
alphaworks.ibm.heapanalyzer

How to handle huge data in java

Right now I need to load huge data from a database into a Vector, but when I loaded 38,000 rows of data, the program threw an OutOfMemoryError exception.
What can I do to handle this?
I think there may be some memory leak in my program; what are good methods to detect it? Thanks
Provide more memory to your JVM (usually using -Xmx/-Xms) or don't load all the data into memory.
For many operations on huge amounts of data there are algorithms which don't need access to all of it at once. One class of such algorithms are divide and conquer algorithms.
If you must have all the data in memory, try caching commonly appearing objects. For example, if you are looking at employee records and they all have a job title, use a HashMap when loading the data and reuse the job titles already found. This can dramatically lower the amount of memory you're using.
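A tiny sketch of that caching idea (the class name is made up): canonicalize each job title through a Map so that thousands of rows end up sharing one String instance per distinct title.

import java.util.HashMap;
import java.util.Map;

public class TitleInterner {
    private final Map<String, String> cache = new HashMap<>();

    // Returns a previously seen equal String if there is one, otherwise remembers this one.
    public String canonicalize(String title) {
        String existing = cache.putIfAbsent(title, title);
        return existing != null ? existing : title;
    }
}

Calling canonicalize(...) on each value read from the result set before storing it means the collection ends up holding one shared String per distinct title rather than one per row.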
Also, before you do anything, use a profiler to see where memory is being wasted, and to check whether things that could be garbage collected still have references floating around. Again, String is a common example: if, for example, you're using the first 10 chars of a 2000-char string and you have used substring instead of allocating a new String, what you actually have is a reference to a char[2000] array, with two indices pointing at 0 and 10. Again, a huge memory waster.
You can try increasing the heap size:
java -Xms<initial heap size> -Xmx<maximum heap size>
Default is
java -Xms32m -Xmx128m
Do you really need to have such a large object stored in memory?
Depending on what you have to do with that data, you might want to split it into smaller chunks.
Load the data section by section. This will not let you work on all data at the same time, but you won't have to change the memory provided to the JVM.
You could run your code using a profiler to understand how and why the memory is being eaten up. Debug your way through the loop and watch what is being instantiated. There are any number of profilers: JProfiler, Java Memory Profiler, see the list of profilers here, and so forth.
Maybe optimize your data classes? I've seen a case where someone used Strings in place of native datatypes such as int or double for every class member, which gave an OutOfMemoryError when storing a relatively small number of data objects in memory. Check that you aren't duplicating your objects. And, of course, increase the heap size:
java -Xmx512M (or whatever you deem necessary)
Let your program use more memory or, much better, rethink the strategy. Do you really need so much data in memory?
I know you are trying to read the data into a Vector - otherwise, if you were trying to display them, I would have suggested you use NatTable. It is designed for reading huge amounts of data into a table.
I believe it might come in handy for another reader here.
Use a memory mapped file. Memory mapped files can basically grow as big as you want, without hitting the heap. It does require that you encode your data in a decoding-friendly way. (For example, it would make sense to reserve a fixed size for every row in your data, in order to quickly skip a number of rows.)
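A small sketch of the memory-mapped idea, assuming a made-up fixed row layout of two ints (8 bytes), so row i lives at offset i * 8 and can be read or written without pulling the whole data set onto the heap.

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedRows {
    private static final int ROW_SIZE = 8; // two ints per row in this example layout

    public static void main(String[] args) throws Exception {
        int rows = 1_000_000;
        try (RandomAccessFile file = new RandomAccessFile("rows.bin", "rw");
             FileChannel channel = file.getChannel()) {
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, (long) rows * ROW_SIZE);

            // Write row 42 directly at its offset.
            buf.putInt(42 * ROW_SIZE, 123);
            buf.putInt(42 * ROW_SIZE + 4, 456);

            // Random access to any row is just an offset calculation.
            System.out.println(buf.getInt(42 * ROW_SIZE) + "," + buf.getInt(42 * ROW_SIZE + 4));
        }
    }
}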
Preon allows you to deal with that easily. It's a framework that aims to do for binary encoded data what Hibernate has done for relational databases and JAXB/XStream/XmlBeans for XML.

determining java memory usage

Hmmm. Is there a primer anywhere on memory usage in Java? I would have thought Sun or IBM would have had a good article on the subject but I can't find anything that looks really solid. I'm interested in knowing two things:
at runtime, figuring out how much memory the classes in my package are using at a given time
at design time, estimating general memory overhead requirements for various things like:
how much memory overhead is required for an empty object (in addition to the space required by its fields)
how much memory overhead is required when creating closures
how much memory overhead is required for collections like ArrayList
I may have hundreds of thousands of objects created and I want to be a "good neighbor" to not be overly wasteful of RAM. I mean I don't really care whether I'm using 10% more memory than the "optimal case" (whatever that is), but if I'm implementing something that uses 5x as much memory as I could if I made a simple change, I'd want to use less memory (or be able to create more objects for a fixed amount of memory available).
I found a few articles (Java Specialists' Newsletter and something from Javaworld) and the built-in method java.lang.instrument.Instrumentation.getObjectSize(), which claims to measure an "approximation" (??) of memory use, but these all seem kind of vague...
(and yes I realize that a JVM running on two different OS's may be likely to use different amounts of memory for different objects)
I used JProfiler a number of years ago and it did a good job, and you could break down memory usage to a fairly granular level.
As of Java 5, on Hotspot and other VMs that support it, you can use the Instrumentation interface to ask the VM for the memory usage of a given object. It's fiddly but you can do it.
In case you want to try this method, I've added a page to my web site on querying the memory size of a Java object using the Instrumentation framework.
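For reference, a minimal sketch of what that Instrumentation-based query tends to look like (the agent class and jar names here are just examples): a premain agent captures the Instrumentation instance so that getObjectSize() can be called later.

import java.lang.instrument.Instrumentation;

// Package in a jar whose manifest contains "Premain-Class: SizeOfAgent"
// and run the application with -javaagent:sizeof.jar (example jar name).
public class SizeOfAgent {
    private static volatile Instrumentation instrumentation;

    public static void premain(String args, Instrumentation inst) {
        instrumentation = inst;
    }

    // Shallow size of a single object, in bytes.
    public static long sizeOf(Object obj) {
        if (instrumentation == null) {
            throw new IllegalStateException("Agent not loaded: run with -javaagent:sizeof.jar");
        }
        return instrumentation.getObjectSize(obj);
    }
}

Note that getObjectSize() reports only the shallow size; a deep size means walking the object's references yourself and summing as you go.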
As a rough guide in Hotspot on 32-bit machines:
objects use 8 bytes for "housekeeping"
fields use what you'd expect them to use given their bit length (though booleans tend to be allocated an entire byte)
object references use 4 bytes
overall object size has a granularity of 8 bytes (i.e. if you have an object with 1 boolean field it will use 16 bytes; if you have an object with 8 booleans it will also use 16 bytes)
There's nothing special about collections in terms of how the VM treats them. Their memory usage is the total of their internal fields plus -- if you're counting this -- the usage of each object they contain. You need to factor in things like the default array size of an ArrayList, and the fact that that size increases by 1.5 whenever the list gets full. But whether you ask the VM or use the above metrics, looking at the source code of the collections and "working it through" will essentially get you to the answer.
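As a back-of-the-envelope example of "working it through" with the 32-bit figures above (the field breakdown is a simplification, not an exact Hotspot layout): estimating the shallow footprint of an ArrayList plus its backing array of references.

public class ArrayListEstimate {

    public static void main(String[] args) {
        int n = 1000;
        long header = 8;                     // object "housekeeping"
        long listFields = 4 + 4 + 4;         // elementData reference, size, modCount
        long listShallow = roundTo8(header + listFields);
        long capacity = Math.max(10, n);     // ignores the 1.5x growth steps for simplicity
        long backingArray = roundTo8(header + 4 + 4L * capacity); // +4 for the array length, 4 bytes per reference
        System.out.println("ArrayList shallow: " + listShallow
                + " bytes, backing array: " + backingArray
                + " bytes (contained objects not counted)");
    }

    static long roundTo8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }
}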
If by "closure" you mean something like a Runnable or Callable, well again it's just a boring old object like any other. (N.B. They aren't really closures!!)
You can use JMP, but it's only caught up to Java 1.5.
I've used the profiler that comes with newer versions of Netbeans a couple of times and it works very well, supplying you with a ton of information about memory usage and runtime of your programs. Definitely a good place to start.
If you are using a pre-1.5 VM, you can get the approximate size of objects by using serialization. Be warned, though: this can require double the amount of memory for that object.
See if PerfAnal will give you what you are looking for.
This might not be the exact answer you are looking for, but the posts at the following link will give you very good pointers. Other Question about Memory
I believe the profiler included in NetBeans can monitor memory usage also; you can try that.
