In every example I see of the -Xmx flag in the Java documentation (and in examples on the web), it is always a power of 2. For example '-Xmx512m', '-Xmx1024m', etc.
Is this a requirement, or is it a good idea for some reason?
I understand that memory in general comes in powers of 2; this question is specifically about the java memory flags.
It keeps things simple, but it is more for your benefit than anything else.
There is no particular reason to pick a power of 2, or a multiple of 50 MB (which is also common), e.g. -Xmx400m or -Xmx750m.
Note: the JVM doesn't follow this strictly. It uses the value to calculate the sizes of different regions, which, if you add them up, tend to be slightly less than the number you provide. In my experience the heap is 1% - 2% less, but if you consider all the other memory regions the JVM uses, this doesn't make much difference.
Note: memory sizes for hardware are typically a power of two (on some PCs it was 3x a power of two). This might have got people into the habit of thinking of memory sizes as powers of two.
BTW: AFAIK, in most JVMs the actual size of each region is a multiple of the page size i.e. 4 KB.
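If you want to see the shortfall yourself, here is a minimal sketch (the exact difference varies by JVM version and flags); run it with e.g. -Xmx512m:

    public class MaxHeap {
        public static void main(String[] args) {
            // Runtime.maxMemory() reports the heap ceiling the JVM will actually use;
            // with -Xmx512m this typically prints slightly less than 512 MB.
            long max = Runtime.getRuntime().maxMemory();
            System.out.println("Reported max heap: " + (max / (1024 * 1024)) + " MB");
        }
    }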
Related
At my company we are trying an approach with JVM-based microservices. They are designed to be scaled horizontally, so we run multiple instances of each using rather small containers (up to 2G heap, usually 1-1.5G). The JVM we use is 1.8.0_40-b25.
Each such instance typically handles up to 100 RPS with a maximum memory-allocation rate of around 250 MB/s.
The question is: what kind of GC would be a safe, sensible default to start with? So far we have been using CMS with Xms = Xmx (to avoid pauses during heap resizing) and Xms = Xmx = 1.5G. The results are decent - we hardly ever see a major GC.
I know that G1 could give me smaller pauses (at the cost of some total throughput), but AFAIK it requires a bit more "breathing" space and at least a 3-4G heap to perform properly.
Any hints (besides going for Azul's Zing :D)?
Hint # 1: Do experiments!
Assuming that your microservice is deployed on at least two nodes, run one on CMS and another on G1, and compare response times.
It's not very likely, but you might even find that G1 performs so well that you need only half the original cluster size.
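For instance, the two variants might be launched like this (an illustrative sketch; service.jar and the sizes are placeholders, and both collectors must be requested explicitly on 1.8):

    java -Xms1536m -Xmx1536m -XX:+UseConcMarkSweepGC -jar service.jar
    java -Xms1536m -Xmx1536m -XX:+UseG1GC -jar service.jar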
Side notes:
re: "250Mb/s" -> if all of this is stack memory (alternatively, if it's young gen) then G1 would provide little benefit since collection form these areas is free.
re: "100 RPS" -> in many cases on our production we found that reducing concurrent requests in system (either via proxy config, or at application container level) improves throughput. Given small heap it's very likely that you have small cpu number as well (2 to 4).
Additionally there are official Oracle Hints on tuning for a small memory footprint. It might not reflect latest config available on 1.8_40, but it's good read anyway.
Measure how much memory is retained after a full GC. Add to this the amount of memory allocated per second, multiplied by 2 - 10 depending on how often you would like a minor GC to occur, e.g. every 2 seconds or every 10 seconds.
E.g. say you have up to 500 MB retained after a full GC, and GCing every couple of seconds is fine: you can have 500 MB + 2 * 250 MB, or a heap of around 1 GB.
The number of RPS is not important.
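As a minimal sketch of that arithmetic, using the numbers from the example above:

    public class HeapSizing {
        public static void main(String[] args) {
            long retainedAfterFullGcMb = 500;  // measured after a full GC
            long allocationRateMbPerSec = 250; // measured allocation rate
            long minorGcIntervalSec = 2;       // pick anywhere from 2 to 10
            // heap = retained + (allocation rate * desired minor-GC interval)
            long heapMb = retainedAfterFullGcMb + minorGcIntervalSec * allocationRateMbPerSec;
            System.out.println("Suggested heap: ~" + heapMb + " MB"); // ~1000 MB
        }
    }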
I am reading about the String deduplication feature in Java 8 update 20 (more info), but I am not sure whether it basically makes String.intern() obsolete.
I know that this JVM feature needs the G1 garbage collector, which might not be an option for many, but assuming one is using G1GC, is there any difference/advantage/disadvantage of the automatic deduplication done by the JVM vs manually having to intern your strings (one obvious one is the advantage of not having to pollute your code with calls to intern())?
This is especially interesting considering that Oracle might make G1GC the default GC in Java 9.
With this feature, if you have 1000 distinct String objects, all with the same content "abc", the JVM can make them share the same char[] internally. However, you still have 1000 distinct String objects.
With intern(), you will have just one String object. So if memory saving is your concern, intern() would be better. It'll save space, as well as GC time.
However, the performance of intern() isn't that great, last time I heard. You might be better off with your own string cache, even one based on a ConcurrentHashMap... but you need to benchmark it to make sure.
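A minimal sketch of such a cache (the class and method names are made up for illustration; note that, unlike intern(), entries here are strongly referenced and never evicted, so it only suits a bounded set of values):

    import java.util.concurrent.ConcurrentHashMap;

    final class StringCache {
        private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();

        // Returns a canonical instance for s, caching it on first sight.
        String canonicalize(String s) {
            String existing = cache.putIfAbsent(s, s);
            return existing != null ? existing : s;
        }
    }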
As a comment references, do see: http://java-performance.info/string-intern-in-java-6-7-8/. It is a very insightful reference and I learned a lot from it; however, I'm not sure its conclusions are necessarily "one size fits all". Each aspect depends on the needs of your own application - taking measurements with realistic input data is highly recommended!
The main factor probably depends on what you are in control over:
Do you have full control over the choice of GC? In a GUI application, for example, there is still a strong case to be made for using the serial GC: a far lower total memory footprint for the process (think 400 MB vs ~1 GB for a moderately complex app) and a much greater willingness to release memory, e.g. after a transient spike in usage. So you might pick that, or give your users the option. (If the heap remains small, the pauses should not be a big deal.)
Do you have full control over the code? The G1GC option is great for 3rd party libraries (and applications!) which you can't edit.
The second consideration (as per #ZhongYu's answer) is that String.intern can de-duplicate the String objects themselves, whereas G1GC can necessarily only de-duplicate their private char[] field.
A third consideration may be CPU usage, say if the impact on laptop battery life might concern your users. G1GC will run an extra thread dedicated to de-duplicating the heap. For example, I played with this while running Eclipse and found it caused an initial period of increased CPU activity after start-up (think 1-2 minutes), but it then settled on a smaller "in-use" heap with no obvious (just eye-balling the task manager) CPU overhead or slow-down thereafter. So I imagine a certain percentage of a CPU core will be taken up by de-duplication during (after?) periods of high memory churn. (Of course there may be a comparable overhead if you call String.intern everywhere, which also runs serially, but then...)
You probably don't need string de-duplication everywhere. There are probably only certain areas of code that:
really impact long-term heap usage, and
create a high proportion of duplicate strings
By using String.intern selectively, other parts of the code (which may create temporary or semi-temporary strings) don't pay the price.
And finally, a quick plug for the Guava utility: Interner, which:
Provides equivalent behavior to String.intern() for other immutable types
You can also use that for Strings. Memory probably is (and should be) your top performance concern, so this probably doesn't apply often; however, when you need to squeeze every drop of speed out of some hot-spot area, my experience is that Java-based weak-reference HashMap solutions run slightly but consistently faster than the JVM's C++ implementation of String.intern(), even after tuning the JVM options. (And as a bonus: you don't need to tune the JVM options to scale to different input.)
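For reference, usage looks roughly like this (a sketch; Interners.newWeakInterner() is Guava's weak-reference variant, so unused canonical strings can still be garbage collected):

    import com.google.common.collect.Interner;
    import com.google.common.collect.Interners;

    public class InternerDemo {
        // One shared, thread-safe interner for the whole application.
        private static final Interner<String> INTERNER = Interners.newWeakInterner();

        static String canonical(String s) {
            return INTERNER.intern(s);
        }
    }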
I want to introduce another decision factor regarding the targeted audience:
For a system integrator whose system is composed of many different libraries/frameworks, with little capacity to influence those libraries' internal development, string deduplication could be a quick win if memory is a problem. It will affect all the Strings in the JVM, and G1 will use only spare time to do it. You can even tweak when deduplication happens by using another parameter (-XX:StringDeduplicationAgeThreshold).
For a developer profiling his own code, String.intern could be more interesting. A thoughtful review of the domain model is necessary to decide whether, and when, to call intern. As a rule of thumb, you may use intern when you know the String will contain a limited set of values, like a kind of enumerated set (e.g. country name, month, day of week...).
I'm writing lots of stuff to a log in bursts, and optimizing the data path. I build the log text with StringBuilder. What would be the most efficient initial capacity, memory-management-wise, so that it works well regardless of the JVM? The goal is to avoid reallocation almost always, which should be covered by an initial capacity of around 80-100. But I also want to waste as few bytes as possible, since a StringBuilder instance may hang around in a buffer and its wasted bytes add up.
I realize this depends on the JVM, but there should be some value that wastes the fewest bytes no matter the JVM, a sort of "least common denominator". I am currently using 128-16, where 128 is a nice round number and the subtraction accounts for allocation overhead. This might be considered a case of "premature optimization", but since the answer I am after is a rule-of-thumb number, knowing it would be useful in the future too.
I'm not expecting "my best guess" answers (my own answer above is already that), I hope someone has researched this already and can share a knowledge-based answer.
Don't try to be smart in this case.
I am currently using 128-16, where the 128 is a nice round number, and subtraction is for allocation overhead.
In Java, this is based on totally arbitrary assumptions about the inner workings of a JVM. Java is not C. Byte-alignment and the like are absolutely not an issue the programmer can or should try to exploit.
If you know the (probable) maximum length of your strings you may use that for the initial size. Apart from that, any optimization attempts are simply in vain.
If you really know that vast numbers of your StringBuilders will be around for very long periods (which does not quite fit the concept of logging), and you really feel the need to persuade the JVM to save some bytes of heap space, you may try calling trimToSize() after the string is built completely. But, again, as long as your strings don't waste megabytes each, you really should focus on other problems in your application.
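If you do go that route, the pattern is simply (a sketch):

    StringBuilder sb = new StringBuilder(128);
    sb.append("some log message built in pieces");
    sb.trimToSize(); // shrinks the internal char[] to the current length
    // ... keep sb around, or hand off sb.toString() ...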
Well, I ended up testing this briefly myself, and then testing some more after comments, to get this edited answer.
Using JDK 1.7.0_07 and a test app reporting the VM name "Java HotSpot(TM) 64-Bit Server VM", the granularity of StringBuilder memory usage is 4 chars, increasing in steps of 4 chars.
Answer: any multiple of 4 is an equally good capacity for a StringBuilder from the memory-allocation point of view, at least on this 64-bit JVM.
Tested by creating 1,000,000 StringBuilder objects with different initial capacities in different test-program executions (to have the same initial heap state), and printing out ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed() before and after.
Printing out heap sizes also confirmed that the amount actually allocated from the heap for each StringBuilder's buffer is an even multiple of 8 bytes, as expected since a Java char is 2 bytes long. In other words, allocating 1,000,000 instances with initial capacity 1..4 takes about 8 megabytes less memory (8 bytes per instance) than allocating the same number of instances with initial capacity 5..8.
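For reference, the measurement looked roughly like this (a reconstruction, not the exact test program; the capacity is varied between runs via the command line):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;

    public class SbFootprint {
        public static void main(String[] args) {
            int capacity = Integer.parseInt(args[0]); // vary per run: 1, 4, 5, 8, ...
            StringBuilder[] keep = new StringBuilder[1_000_000]; // keep instances reachable
            MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
            long before = bean.getHeapMemoryUsage().getUsed();
            for (int i = 0; i < keep.length; i++) {
                keep[i] = new StringBuilder(capacity);
            }
            long after = bean.getHeapMemoryUsage().getUsed();
            System.out.println("~bytes per instance: " + (after - before) / (double) keep.length);
        }
    }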
I know that Java uses padding; objects have to be a multiple of 8 bytes. However, I don't see the purpose of it. What is it used for? What exactly is its main purpose?
Its purpose is alignment, which allows for faster memory access at the cost of some space. If data is unaligned, then the processor needs to do some shifts to access it after loading the memory.
Additionally, garbage collection is simplified (and sped up) the larger the size of the smallest allocation unit.
It's unlikely that Java has a requirement of 8 bytes (except on 64-bit systems), but since 32-bit architectures were the norm when Java was created it's possible that 4-byte alignment is required in the Java standard.
The accepted answer is speculation (but partially correct). Here is the real answer.
First off, to #U2EF1's credit, one of the benefits of 8-byte boundaries is that 8 bytes are the optimal access unit on most processors. However, there was more to the decision than that.
If you have 32-bit references, you can address up to 2^32 bytes, or 4 GB, of memory (practically you get less, more like 3.5 GB). If you have 64-bit references, you can address 2^64 bytes, which is on the order of exabytes. However, with 64-bit references everything tends to slow down and take more space. This is due to the overhead of 32-bit processors dealing with 64 bits, and, on all processors, more GC cycles, because the fatter references leave less space and generate more garbage collection.
So the creators took a middle ground and decided on 35-bit references, which allow up to 2^35 bytes, or 32 GB, of memory while taking up less space, so as to have the same performance benefits as 32-bit references. This is done by taking a 32-bit reference and shifting it left 3 bits when reading, and shifting it right 3 bits when storing. That means all objects must be aligned on 2^3-byte boundaries (8 bytes). These are called compressed ordinary object pointers, or compressed oops.
Why not 36-bit references for accessing 64 GB of memory? Well, it was a tradeoff. You'd require a lot of wasted space for 16-byte alignments, and as far as I know the vast majority of processors receive no speed benefit from 16-byte alignments as opposed to 8-byte alignments.
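The encode/decode step amounts to this (an illustrative sketch, not actual JVM code; real implementations may also add a heap base address):

    // Reading: widen the stored 32-bit reference and shift left 3 bits,
    // giving a 35-bit, 8-byte-aligned address.
    static long decode(int compressedRef) {
        return (compressedRef & 0xFFFFFFFFL) << 3;
    }

    // Storing: the low 3 bits are always zero thanks to 8-byte alignment,
    // so they can be dropped without losing information.
    static int encode(long address) {
        return (int) (address >>> 3);
    }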
Note that the JVM doesn't bother using compressed oops unless the maximum heap size is set above 4 GB, which it is not by default. You can actually enable them with the -XX:+UseCompressedOops flag.
This was back in the day of 32-bit VMs to provide the extra available memory on 64-bit systems. As far as I know, there is no limitation with 64-bit VMs.
Source: Java Performance: The Definitive Guide, Chapter 8
Data type sizes in Java are multiples of 8 bits (not bytes) because word sizes in most modern processors are multiples of 8 bits: 16 bits, 32 bits, 64 bits. In this way a field in an object can be made to fit ("aligned") in a word or words and waste as little space as possible, taking advantage of the underlying processor's instructions for operating on word-sized data.
I am new to Java. I have just gone through heap dump analysis using Eclipse's MAT, so I wanted to understand the following points:
a) What are shallow and retained heap size, and how are they calculated? It would be great if you could provide an example.
b) Regarding performance, most people advised me to keep the minimum and maximum heap size the same. Is that OK, or does it vary from application to application?
Shallow heap is the memory consumed by one object. An object needs 32 or 64 bits (depending on the OS architecture) per reference, 4 bytes per Integer, 8 bytes per Long, etc. Depending on the heap dump format, the size may be adjusted.
Retained heap of X is the sum of shallow sizes of all objects in the retained set of X, i.e. memory kept alive by X.
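As a toy illustration (the field sizes are schematic):

    class Child { long payload; }          // shallow: object header + 8 bytes
    class Parent { Child child; int id; }  // shallow: header + one reference + 4 bytes

    // If a Parent holds the only reference keeping its Child alive, then
    // retained(parent) = shallow(parent) + retained(child), while
    // shallow(parent) counts only the reference slot, not the Child itself.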
Refer to this link.
JVM sizing and tuning is not an exact science, so taking a blanket decision to set min and max size the same is not necessarily the best config to choose.
I find this to be an excellent explanation of the sorts of decisions that you need to make. In addressing your question, it says:
Setting -Xms and -Xmx to the same value increases predictability by removing the most important sizing decision from the virtual machine. However, the virtual machine is then unable to compensate if you make a poor choice.
If you set the min and max the same, but have an inappropriately sized Eden space, you may still end up with poor garbage collection performance.
The only way to make decisions about sizing is to run your application under a variety of workloads and see how the GC performs.
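Concretely, the two strategies compare like this on the command line (sizes are illustrative):

    java -Xms1g -Xmx1g -jar app.jar     # fixed heap: predictable, no resize pauses
    java -Xms256m -Xmx1g -jar app.jar   # adaptive heap: the VM can compensate, at some cost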