More flexible memory management - java

In the past I have written most of my memory- and performance-critical applications in C++ or C#, but with the latest improvements to the Java language I figured I might give it a try. However, I am already stuck pretty early on with memory management in Java. More specifically, the following two points really surprise me:
Why do I have to tell the JVM how much memory it can use? Couldn't it just use whatever it wants? I mean... take whatever you need?
Why is it so greedy with memory? Can't it be a bit more generous in giving back memory to the OS? Check the following example:
The example I mentioned above:
Started with 2048mb of heap space
At time T0 the application uses 300mb of RAM
I open and fully read to a byte array a 400mb file -> 700mb of RAM
I wrap that array into a ByteBuffer -> 700mb of RAM
I decrypt it into a second array (can't be done in place) -> 1100mb of RAM
I close the file and clear the ByteBuffer and set the encrypted array to null -> 1100mb of RAM
I parse the decrypted array (which produces a little bit less data) -> 1350mb of RAM
I set the decrypted array to null -> 1350mb of RAM
I wait for a while -> 1350mb of RAM
If I repeat the above with 1028mb of heap space -> OutOfMemoryException
So my question is: Why is Java behaving that way? And more importantly, can I tell the JVM to be a bit more... sane? C# is also a managed language, but it manages to properly free unused memory.

Why do I have to tell the JVM how much memory it can use?
You don't. It has a reasonable default maximum of 1/4 of main memory up to 32 GB.
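If you want to see what that default works out to on your machine, one way (assuming a Unix-ish shell) is to ask the JVM itself:

    java -XX:+PrintFlagsFinal -version | grep -i maxheapsize

The MaxHeapSize value printed is the ceiling used when no -Xmx is given.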
Couldn't it just use whatever it wants?
It could; you could set the maximum to be all your free memory.
I mean... take whatever you need?
It does this, but up to some maximum you set so it doesn't impact other applications.
Why is it so greedy with memory?
It depends on how you use it.
Can't it be a bit more generous in giving back memory to the OS?
That depends on which GC you use.
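For example (a sketch only - which flags are honoured depends on the collector and the JVM version), the free-ratio flags tell HotSpot how aggressively to shrink the committed heap after a collection:

    -XX:+UseG1GC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=30

Lowering MaxHeapFreeRatio makes the JVM more willing to uncommit heap and hand it back to the OS.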
Started with 2048mb of heap space
I assume you mean 2 GB or 2048 MB (mb is a milli-bit and a pet hate of mine, sorry).
This is a really small amount of memory these days; I assume this is just an example. Cf. my 9-year-old has an old desktop of mine with 24 GB of memory.
I open and fully read to a byte array a 400mb file -> 700mb of RAM
I would memory-map the file. This uses almost no heap. BTW, you can memory-map files in C and C# too; this is not a trick specific to Java.
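A minimal sketch of that (the file name is just a placeholder):

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class MapExample {
        public static void main(String[] args) throws IOException {
            // Map the whole file read-only; the mapped pages live outside the Java heap
            try (FileChannel ch = FileChannel.open(Paths.get("data.enc"), StandardOpenOption.READ)) {
                MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                System.out.println("mapped " + mapped.remaining() + " bytes without touching the heap");
            }
        }
    }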
I wrap that array into a ByteBuffer -> 700mb of RAM
A memory-mapped file is already a ByteBuffer. At this point your heap is no bigger.
I decrypt it into a second array (can't be done in place) -> 1100mb of RAM
I would do this into another "direct" buffer. Again, no more heap has been used.
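Something along these lines, assuming (purely for illustration) an AES/CTR cipher - the algorithm, key and IV here are placeholders, not something from the question:

    import javax.crypto.Cipher;
    import javax.crypto.spec.IvParameterSpec;
    import javax.crypto.spec.SecretKeySpec;
    import java.nio.ByteBuffer;

    public class DecryptOffHeap {
        // 'encrypted' would be the MappedByteBuffer from the previous step
        static ByteBuffer decrypt(ByteBuffer encrypted, byte[] key, byte[] iv) throws Exception {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            // Decrypt straight into a second off-heap buffer; neither buffer lives on the Java heap
            ByteBuffer plain = ByteBuffer.allocateDirect(encrypted.remaining());
            cipher.doFinal(encrypted, plain);
            plain.flip();
            return plain;
        }
    }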
I parse the decrypted array (which produces a little bit less data) -> 1350mb of RAM
So this uses an extra 250 MB.
I set the decrypted array to null -> 1350mb of RAM
This changes the reference to be null but nothing else. That is just one of your buffers above in any case.
I wait for a while -> 1350mb of RAM
If you do nothing, you wouldn't expect anything to happen.
If I repeat the above with 1028mb of heap space -> OutOfMemoryException
This is because you are retaining memory and you have less the second time around.
In short, I would
use memory-mapped files.
use native (direct) buffers.
note that you can clear these deterministically if you really need to, but usually you don't.
don't touch the maximum heap unless you need to.
And more importantly, can I tell the JVM to be a bit more... sane?
I suspect "sane" is in the eye of the beholder.

Related

Does Direct Memory affect compressed Pointers in Java?

I am aware that once Java heap size grows past 32GB, we lose the benefits of compressed pointers and may have less effective memory (compared to 32GB) until the total heap reaches ~48GB.
Does Direct Memory usage affect the determination to use compressed pointers or not? For example, will I still be able to use them with settings like -Xmx28G -XX:MaxDirectMemorySize=12G?
I am aware that once Java heap size grows past 32GB, we lose the benefits of compressed pointers and may have less effective memory (compared to 32GB) until the total heap reaches ~48GB.
You can increase the object alignment to 16 bytes (in Java), allowing you to use CompressedOops for heaps up to 64 GB.
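For example (standard HotSpot flags; whether the extra padding is worth it depends on your object sizes):

    -XX:+UseCompressedOops -XX:ObjectAlignmentInBytes=16 -Xmx48g

With 16-byte alignment the 32-bit compressed oops can address 16 * 4 GB = 64 GB of heap.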
Does Direct Memory usage affect the determination to use compressed pointers or not?
The direct memory is just native memory, like the thread stacks, GUI components, shared libraries, etc. It is not part of the heap, nor is the metaspace.
For example, will I still be able to use them with settings like -Xmx28G -XX:MaxDirectMemorySize=12G?
You can have -XX:MaxDirectMemorySize=1024G if you like; this is not part of the heap.

Java Heap Size Reduction

BACKGROUND
I recently wrote a Java application that consumes a specified amount of memory in MB. I am doing this purposefully to see how another Java application reacts to specific RAM loads (I am sure there are tools for this purpose, but this was the fastest). The memory consumer app is very simple. I enter the number of MB I want to consume and create a vector of that many bytes. I also have a reset button that removes the elements of the vector and prompts for a new number of bytes.
QUESTION
I noticed that the heap size of the java process never reduces once the vector is cleared. I tried clear(), but the heap remains the same size. It seems like the heap grows with the elements, but even though the elements are removed the size remains. Is there a way in java code to reduce heap size? Is there a detail about the java heap that I am missing? I feel like this is an important question because if I wanted to keep a low memory footprint in any java application, I would need a way to keep the heap size from growing or at least not large for long lengths of time.
Try garbage collection by making a call to System.gc().
This might help you - When does System.gc() do anything
Calling GC extensively is not recommended.
You should provide a max heap size with the -Xmx option and watch memory allocation by your app. Also, use weak references for objects that have a short lifecycle, so the GC can remove them automatically.
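A minimal sketch of the weak-reference idea (not specific to the memory-consumer app above):

    import java.lang.ref.WeakReference;

    public class WeakDemo {
        public static void main(String[] args) {
            // Only a weak reference keeps the array alive, so the GC may reclaim it at any time
            WeakReference<byte[]> ref = new WeakReference<>(new byte[32 * 1024 * 1024]);
            System.gc();  // only a hint; the JVM is free to ignore it
            System.out.println(ref.get() == null ? "reclaimed" : "still resident");
        }
    }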

Most efficient initial capacity size for StringBuilder?

I'm writing lots of stuff to the log in bursts, and optimizing the data path. I build the log text with StringBuilder. What would be the most efficient initial capacity, memory-management-wise, so it would work well regardless of the JVM? The goal is to avoid reallocation almost always, which should be covered by an initial capacity of around 80-100. But I also want to waste as few bytes as possible, since the StringBuilder instance may hang around in a buffer and the wasted bytes add up.
I realize this depends on the JVM, but there should be some value which would waste the fewest bytes no matter the JVM, a sort of "least common denominator". I am currently using 128-16, where the 128 is a nice round number, and subtraction is for allocation overhead. Also, this might be considered a case of "premature optimization", but since the answer I am after is a "rule-of-thumb" number, knowing it would be useful in the future too.
I'm not expecting "my best guess" answers (my own answer above is already that), I hope someone has researched this already and can share a knowledge-based answer.
Don't try to be smart in this case.
I am currently using 128-16, where the 128 is a nice round number, and subtraction is for allocation overhead.
In Java, this is based on totally arbitrary assumptions about the inner workings of a JVM. Java is not C. Byte-alignment and the like are absolutely not an issue the programmer can or should try to exploit.
If you know the (probable) maximum length of your strings you may use that for the initial size. Apart from that, any optimization attempts are simply in vain.
If you really know that vast amounts of your StringBuilders will be around for very long periods (which does not quite fit the concept of logging), and you really feel the need to try to persuade the JVM to save some bytes of heap space you may try and use trimToSize() after the string is built completely. But, again, as long as your strings don't waste megabytes each you really should go and focus on other problems in your application.
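If you do go that way, the call itself is trivial (a minimal sketch):

    StringBuilder sb = new StringBuilder(128);
    sb.append("some long-lived log line");
    String line = sb.toString();
    sb.trimToSize();   // shrinks the internal char[] down to the current length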
Well, I ended up testing this briefly myself, and then testing some more after comments, to get this edited answer.
Using JDK 1.7.0_07 and a test app reporting VM name "Java HotSpot(TM) 64-Bit Server VM", the granularity of StringBuilder memory usage is 4 chars, increasing in steps of 4 chars.
Answer: any multiple of 4 is an equally good capacity for StringBuilder from a memory-allocation point of view, at least on this 64-bit JVM.
Tested by creating 1000000 StringBuilder objects with different initial capacities, in different test program executions (to have the same initial heap state), and printing out ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed() before and after.
Printing out heap sizes also confirmed that the amount actually allocated from the heap for each StringBuilder's buffer is an even multiple of 8 bytes, as expected since a Java char is 2 bytes long. In other words, allocating 1000000 instances with initial capacity 1..4 takes about 8 megabytes less memory (8 bytes per instance) than allocating the same number of instances with initial capacity 5..8.
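For reference, a sketch of that kind of measurement (results will vary with JVM and heap state, so treat the output as indicative only):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;

    public class SbFootprint {
        public static void main(String[] args) {
            int capacity = Integer.parseInt(args[0]);            // initial capacity under test
            StringBuilder[] keep = new StringBuilder[1000000];   // keep instances reachable
            MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
            long before = mem.getHeapMemoryUsage().getUsed();
            for (int i = 0; i < keep.length; i++) {
                keep[i] = new StringBuilder(capacity);
            }
            long after = mem.getHeapMemoryUsage().getUsed();
            System.out.println("approx bytes per instance: " + (after - before) / (double) keep.length);
        }
    }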

Setting a smaller JVM heap size within a JNI client application

I'm attempting to debug a problem with pl/java, a procedural language for PostgreSQL. I'm running this stack on a Linux server.
Essentially, each Postgres backend (connection process) must start its own JVM, and does so using the JNI. This is generally a major limitation of pl/java, but it has one particularly nasty manifestation.
If native memory runs out (I realise that this may not actually be due to malloc() returning NULL, but the effect is about the same), this failure is handled rather poorly. It results in an OutOfMemoryError due to "native memory exhaustion". This results in a segfault of the Postgres backend, originating from within libjvm.so, and a javacore file that says something like:
0SECTION TITLE subcomponent dump routine
NULL ===============================
1TISIGINFO Dump Event "systhrow" (00040000) Detail "java/lang/OutOfMemoryError" "Failed to create a thread: retVal -1073741830, errno 11" received
1TIDATETIME Date: 2012/09/13 at 16:36:01
1TIFILENAME Javacore filename: /var/lib/PostgreSQL/9.1/data/javacore.20120913.104611.24742.0002.txt
***SNIP***
Now, there are reasonably well-defined ways of ameliorating these types of problems with Java, described here:
http://www.ibm.com/developerworks/java/library/j-nativememory-linux/
I think that it would be particularly effective if I could set the maximum heap size to a value that is far lower than the default. Ordinarily, it is possible to do something along these lines:
The heap's size is controlled from the Java command line using the -Xmx and -Xms options (mx is the maximum size of the heap, ms is the initial size). Although the logical heap (the area of memory that is actively used) can grow and shrink according to the number of objects on the heap and the amount of time spent in GC, the amount of native memory used remains constant and is dictated by the -Xmx value: the maximum heap size. Most GC algorithms rely on the heap being allocated as a contiguous slab of memory, so it's impossible to allocate more native memory when the heap needs to expand. All heap memory must be reserved up front.
However, it is not apparent how I can follow these steps such that pl/java's JNI initialisation initialises a JVM with a smaller heap; I can't very well pass these command line arguments to Postgres. So, my question is, how can I set the maximum heap size or otherwise control these problems in this context specifically? This appears to be a general problem with pl/java, so I expect to be able to share whatever solution I eventually arrive at with the Postgres community.
Please note that I am not experienced with JVM internals, and am not generally familiar with Java.
Thanks
According to slide 19 in this presentation, postgresql.conf can have the parameter pljava.vmoptions, through which you can pass arguments to the JVM.
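So something along these lines in postgresql.conf should do it (the parameter name comes from that presentation; the values are only an illustration):

    # postgresql.conf
    pljava.vmoptions = '-Xmx64m -Xms16m'

New backends should pick the options up the next time they start a JVM.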

Why is the maximum size of the Java heap fixed?

It is not possible to increase the maximum size of Java's heap after the VM has started. What are the technical reasons for this? Do the garbage collection algorithms depend on having a fixed amount of memory to work with? Or is it for security reasons, to prevent a Java application from DOS'ing other applications on the system by consuming all available memory?
In Sun's JVM, last I knew, the entire heap must be allocated in a contiguous address space. I imagine that for large heap values, it's pretty hard to add to your address space after startup while ensuring it stays contiguous. You probably need to get it at startup, or not at all. Thus, it is fixed.
Even if it isn't all used immediately, the address space for the entire heap is reserved at startup. If it cannot reserve a large enough contiguous block of address space for the value of -Xmx that you pass it, it will fail to start. This is why it's tough to allocate >1.4GB heaps on 32-bit Windows - it's hard to find a contiguous block of address space of that size or larger, since some DLLs like to load in certain places, fragmenting the address space. This isn't really an issue when you go 64-bit, since there is so much more address space.
This is almost certainly for performance reasons. I could not find a terrific link detailing this further, but here is a pretty good quote from Peter Kessler (full link - be sure to read the comments) that I found when searching. I believe he works on the JVM at Sun.
The reason we need a contiguous memory region for the heap is that we have a bunch of side data structures that are indexed by (scaled) offsets from the start of the heap. For example, we track object reference updates with a "card mark array" that has one byte for each 512 bytes of heap. When we store a reference in the heap we have to mark the corresponding byte in the card mark array. We right shift the destination address of the store and use that to index the card mark array. Fun addressing arithmetic games you can't do in Java that you get to (have to :-) play in C++.
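To make the arithmetic concrete, here is a toy Java model of the indexing the quote describes (the real card marking lives in C++ inside the VM; addresses are modelled as plain longs here):

    public class CardMarkToy {
        static final int CARD_SHIFT = 9;   // one card byte covers 2^9 = 512 bytes of heap
        final long heapBase;               // "address" of the start of the heap
        final byte[] cardTable;            // one byte per 512-byte card

        CardMarkToy(long heapBase, long heapSize) {
            this.heapBase = heapBase;
            this.cardTable = new byte[(int) (heapSize >>> CARD_SHIFT) + 1];
        }

        // Conceptually invoked on every reference store into the heap
        void markCard(long destAddress) {
            cardTable[(int) ((destAddress - heapBase) >>> CARD_SHIFT)] = 1;   // mark the card dirty
        }
    }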
This was in 2004 - I'm not sure what's changed since then, but I am pretty sure it still holds. If you use a tool like Process Explorer, you can see that the virtual size (add the virtual size and private size memory columns) of the Java application includes the total heap size (plus other required space, no doubt) from the point of startup, even though the memory 'used' by the process will be nowhere near that until the heap starts to fill up...
Historically there has been a reason for this limitation, which was to stop applets in the browser from eating up all of the user's memory. The Microsoft VM, which never had such a limitation, actually allowed this, which could lead to a sort of denial-of-service attack against the user's computer. It was only a year ago that Sun introduced, in the 1.6.0 Update 10 VM, a way to let applets specify how much memory they want (limited to a certain fixed share of the physical memory) instead of always limiting them to 64MB, even on computers that have 8GB or more available.
Now that the JVM has evolved, it should have been possible to get rid of this limitation when the VM is not running inside a browser, but Sun obviously never considered it a high-priority issue, even though numerous bug reports have been filed asking to finally allow the heap to grow.
I think the short, snarky, answer is because Sun hasn't found it worth the time and cost to develop.
The most compelling use case for such a feature is on the desktop, IMO, and Java has always been a disaster on the desktop when it comes to the mechanics of launching the JVM. I suspect that those who think the most about those issues tend to focus on the server side and view any other details best left to native wrappers. It is an unfortunate decision, but it should just be one of the decision points when deciding on the right platform for an application.
My gut feel is that it has to do with memory management with respect to the other applications running on the operating system.
If you set the maximum heap size to, for example, the amount of RAM on the box you effectively let the VM decide how much memory it requires (up to this limit). The problem with this is that the VM could effectively cripple the machine it is running on because it will take over all the memory on the box before it decides that it needs to garbage collect.
When you specify a max heap size, what you're saying to the VM is: you are allowed to use this amount of memory before you need to start garbage collecting. You cannot have more, because if you take more, the other applications running on the box will slow down and you will start swapping to disk.
Also be aware that there are two values with respect to memory, the "current heap size" and the "max heap size". The current heap size is how much memory the heap is currently using and, if it requires more, it can resize the heap, but it cannot resize the heap above the value of the maximum heap size.
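You can watch both values from inside the application (a minimal sketch using the standard Runtime API):

    public class HeapSizes {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            long current = rt.totalMemory();           // current heap size (committed)
            long used    = current - rt.freeMemory();  // portion of that actually in use
            long max     = rt.maxMemory();             // ceiling set by -Xmx, or the default
            System.out.printf("used=%d MB, current heap=%d MB, max heap=%d MB%n",
                    used >> 20, current >> 20, max >> 20);
        }
    }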
From IBM's performance tuning tips (so may not be directly applicable to Sun's VMs)
The Java heap parameters influence the behavior of garbage collection. Increasing the heap size supports more object creation. Because a large heap takes longer to fill, the application runs longer before a garbage collection occurs. However, a larger heap also takes longer to compact and causes garbage collection to take longer.
The JVM has thresholds it uses to manage the JVM's storage. When the thresholds are reached, the garbage collector gets invoked to free up unused storage. Therefore, garbage collection can cause significant degradation of Java performance. Before changing the initial and maximum heap sizes, you should consider the following information:
In the majority of cases you should set the maximum JVM heap size to a value higher than the initial JVM heap size. This allows the JVM to operate efficiently during normal, steady state periods within the confines of the initial heap, but also to operate effectively during periods of high transaction volume by expanding the heap up to the maximum JVM heap size. In some rare cases where absolute optimal performance is required you might want to specify the same value for both the initial and maximum heap size. This will eliminate some overhead that occurs when the JVM needs to expand or contract the size of the JVM heap. Make sure the region is large enough to hold the specified JVM heap.
Beware of making the Initial Heap Size too large. While a large heap size initially improves performance by delaying garbage collection, a large heap size ultimately affects response time when garbage collection eventually kicks in because the collection process takes more time.
So, I guess the reason that you can't change the value at runtime is that it may not help: either you have enough space in your heap or you don't. Once you run out, a GC cycle will be triggered. If that doesn't free up the space, you're stuffed anyway. You'd need to catch the OutOfMemoryError, increase the heap size, and then retry your calculation, hoping that this time you have enough memory.
In general the VM won't use the maximum heap size unless you need it, so if you think you might need to expand the memory at runtime, you could just specify a large maximum heap size.
I admit that's all a bit unsatisfying, and seems a bit lazy, since I can imagine a reasonable garbage collection strategy which would increase the heap size when GC fails to free enough space. Whether my imagination translates to a high performance GC implementation is another matter though ;)
