I'm writing lots of log text in bursts and optimizing the data path. I build the log text with StringBuilder. What would be the most efficient initial capacity, memory-management wise, so that it would work well regardless of the JVM? The goal is to avoid reallocation almost always, which should be covered by an initial capacity of around 80-100. But I also want to waste as few bytes as possible, since the StringBuilder instance may hang around in a buffer and the wasted bytes add up.
I realize this depends on the JVM, but there should be some value which wastes the fewest bytes no matter the JVM, a sort of "least common denominator". I am currently using 128-16, where 128 is a nice round number and the subtraction is for allocation overhead. Also, this might be considered a case of "premature optimization", but since the answer I am after is a rule-of-thumb number, knowing it would be useful in the future too.
I'm not expecting "my best guess" answers (my own answer above is already that); I hope someone has researched this already and can share a knowledge-based answer.
Don't try to be smart in this case.
I am currently using 128-16, where the 128 is a nice round number, and subtraction is for allocation overhead.
In Java, this is based on totally arbitrary assumptions about the inner workings of a JVM. Java is not C. Byte-alignment and the like are absolutely not an issue the programmer can or should try to exploit.
If you know the (probable) maximum length of your strings you may use that for the initial size. Apart from that, any optimization attempts are simply in vain.
If you really know that vast amounts of your StringBuilders will be around for very long periods (which does not quite fit the concept of logging), and you really feel the need to persuade the JVM to save some bytes of heap space, you may try trimToSize() after the string is built completely. But, again, as long as your strings don't waste megabytes each, you really should go and focus on other problems in your application.
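A minimal sketch of that approach (the log content and the retaining list are made up for illustration):
import java.util.ArrayList;
import java.util.List;

public class TrimExample {
    public static void main(String[] args) {
        List<StringBuilder> buffer = new ArrayList<StringBuilder>();
        for (int i = 0; i < 10; i++) {
            // generous initial capacity, so appends rarely trigger reallocation
            StringBuilder sb = new StringBuilder(128);
            sb.append("event=").append(i).append(" status=OK");
            // if the builder is kept around, shrink its internal char[] to the current length
            sb.trimToSize();
            buffer.add(sb);
        }
        System.out.println(buffer.get(0));
    }
}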
Well, I ended up testing this briefly myself, and then testing some more after comments, to get this edited answer.
Using JDK 1.7.0_07 and a test app reporting VM name "Java HotSpot(TM) 64-Bit Server VM", the granularity of StringBuilder memory usage is 4 chars, increasing in steps of 4 chars.
Answer: any multiple of 4 is equally good capacity for StringBuilder from memory allocation point of view, at least on this 64-bit JVM.
Tested by creating 1000000 StringBuilder objects with different initial capacities, in different test program executions (to have the same initial heap state), and printing out ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed() before and after.
Printing out heap sizes also confirmed that the amount actually allocated from the heap for each StringBuilder's buffer is an even multiple of 8 bytes, as expected since a Java char is 2 bytes long. In other words, allocating 1000000 instances with initial capacity 1..4 takes about 8 megabytes less memory (8 bytes per instance) than allocating the same number of instances with initial capacity 5..8.
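For reference, a sketch of the kind of measurement described above (a real run would pin the heap size and use a separate JVM execution per capacity, as noted; a stray GC during the loop would distort the numbers):
import java.lang.management.ManagementFactory;

public class SbMemoryTest {
    public static void main(String[] args) {
        int capacity = args.length > 0 ? Integer.parseInt(args[0]) : 16;
        StringBuilder[] holders = new StringBuilder[1000000];
        long before = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
        for (int i = 0; i < holders.length; i++) {
            holders[i] = new StringBuilder(capacity);
        }
        long after = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
        System.out.println("capacity " + capacity + " -> ~" + (after - before) + " bytes of heap used");
    }
}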
Related
We inherited a system which runs in production and recently started to fail every 10 hours. Basically, our internal software marks the system as failed if it is unresponsive for a minute. We found that the problem is that our Full GC cycles last for 1.5 minutes; we use a 30 GB heap. Now the problem is that we cannot optimize a lot in a short period of time and we cannot partition our service quickly, but we need to get rid of the 1.5-minute pauses as soon as possible, as our system fails in production because of these pauses. For us, an acceptable delay is 20 milliseconds but not more. What would be the quickest way to tweak the system? Reduce the heap to trigger GCs more frequently? Use System.gc() hints? Any other solutions? We use Java 8 default settings and we have more and more users - i.e. more and more objects created.
You have a lot of retained data. There are a few options worth considering.
Increase the heap to 32 GB; this has little impact if you have free memory. Looking again at your totals, it appears you are using 32 GB rather than 30 GB, so this might not help.
If you don't have plenty of free memory, it is possible a small portion of your heap is being swapped; this can increase full GC times dramatically.
There might be some simple ways to make the data structures more compact, e.g. use compact strings, or use primitives instead of wrappers, e.g. a long for a timestamp instead of Date or LocalDateTime (a long is about 1/8th the size); see the sketch after this list.
If neither of these helps, try moving some of the data off heap. E.g. Chronicle Map is a ConcurrentMap which uses off-heap memory and can reduce your GC times dramatically, i.e. there is no GC overhead for data stored off heap. How easy this is to add depends highly on how your data is structured.
I suggest analysing how your data is structured to see if there are any easy ways to make it more efficient.
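As a small illustration of the primitives-instead-of-wrappers point (class and field names are made up), compare a per-record object holding temporal/boxed fields with one holding primitives:
import java.time.LocalDateTime;

public class CompactEvents {
    // Each instance carries references to separately allocated LocalDateTime and Long
    // objects, each with its own header, all of which the GC has to trace.
    static class FatEvent {
        LocalDateTime timestamp;
        Long userId;
    }

    // The same information as two primitive fields: no extra objects to trace,
    // and a fraction of the retained size per record.
    static class LeanEvent {
        long epochMillis;
        long userId;
    }

    public static void main(String[] args) {
        LeanEvent e = new LeanEvent();
        e.epochMillis = System.currentTimeMillis();
        e.userId = 42L;
        System.out.println(e.epochMillis + " " + e.userId);
    }
}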
There is no one-size-fits-all magic bullet solution to your problem: you'll need to have a good handle on your application's allocation and liveness patterns, and you'll need to know how that interacts with the specific garbage collection algorithm you are running (function of version of Java and command line flags passed to java).
Broadly speaking, a Full GC (that succeeds in reclaiming lots of space) means that lots of objects are surviving the minor collections (but aren't being leaked). Start by looking at the size of your Eden and Survivor spaces: if the Eden is too small, minor collections will run very frequently, and perhaps you aren't giving an object a chance to die before its tenuring threshold is reached. If the Survivors are too small, objects are going to be promoted into the Old gen prematurely.
GC tuning is a bit of an art: you run your app, study the results, tweak some parameters, and run it again. As such, you will need a benchmark version of your application, one which behaves as close as possible to the production one but which hopefully doesn't need 10 hours to cause a full GC.
As you stated that you are running Java 8 with the default settings, I believe that means that your Old collections are running with a Serial collector. You might see some very quick improvements by switching to a Parallel collector for the Old generation (-XX:+UseParallelOldGC). While this might reduce the 1.5-minute pause to some number of seconds (depending on the number of cores on your box and the number of threads you specify for GC), this will not reduce your max pause to 20ms.
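Before tuning, it may be worth confirming which collectors the JVM is actually running with; a quick check using the standard management API:
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class ShowCollectors {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // prints names such as "PS Scavenge" / "PS MarkSweep", plus how often each has run
            System.out.println(gc.getName() + " (collections so far: " + gc.getCollectionCount() + ")");
        }
    }
}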
When this happened to me, it was due to a memory leak caused by a static variable eating up memory. I would go through all recent code changes and look for any possible memory leaks.
I have a Java program that operates on a (large) graph. Thus, it uses a significant amount of heap space (~50GB, which is about 25% of the physical memory on the host machine). At one point, the program (repeatedly) picks one node from the graph and does some computation with it. For some nodes, this computation takes much longer than anticipated (30-60 minutes, instead of an expected few seconds). In order to profile these operations to find out what takes so much time, I have created a test program that creates only a very small part of the large graph and then runs the same operation on one of the nodes that took very long to compute in the original program. Thus, the test program obviously only uses very little heap space, compared to the original program.
It turns out that an operation that took 48 minutes in the original program can be done in 9 seconds in the test program. This really confuses me. The first thought might be that the larger program spends a lot of time on garbage collection. So I turned on the verbose mode of the VM's garbage collector. According to that, no full garbage collections are performed during the 48 minutes, and only about 20 collections in the young generation, which each take less than 1 second.
So my question is: what else could there be that explains such a huge difference in timing? I don't know much about how Java internally organizes the heap. Is there something that takes significantly longer for a large heap with a large number of live objects? Could it be that object allocation takes much longer in such a setting, because it takes longer to find an adequate place in the heap? Or does the VM do any internal reorganization of the heap that might take a lot of time (besides garbage collection, obviously)?
I am using Oracle JDK 1.7, if that's of any importance.
While bigger memory might mean bigger problems, I'd say there's nothing (except the GC, which you've excluded) that could extend 9 seconds to 48 minutes (a factor of 320).
A big heap may lead to worse spatial locality, but I don't think it matters. I disagree with Tim's answer w.r.t. "having to leave the cache for everything".
There's also the TLB, which is a cache for virtual address translation and could cause some problems with very large memory. But again, not a factor of 320.
I don't think there's anything in the JVM which could cause such problems.
The only reason I can imagine is that you have some swap space which gets used - despite the fact that you have enough physical memory. Even slight swapping can be the cause for a huge slowdown. Make sure it's off (and possibly check swappiness).
Even when things are in memory, you have multiple levels of data caching on modern CPUs. Every time you have to leave the cache to fetch data, things get slower. Having 50GB of RAM could well mean that it is having to leave the cache for everything.
The symptoms and differences you describe are just massive though and I don't see something as simple as cache coherency making that much difference.
The best advice I can give you is to try running a profiler against it both when it's running slow and when it's running fast and compare the difference.
You need solid numbers and timings. "In this environment doing X took Y time". From that you can start narrowing things down.
I have been working on a Java program that generates fractal orbits for quite some time now. Much like photographs, the larger the image, the better it will be when scaled down. The program uses a 2D object (Point) array, which is written to when a point's value is calculated. That is to say, the Point is stored at its corresponding coordinates, i.e.:
Point p = new Point(25,30);
histogram[25][30] = p;
Of course, this is edited for simplicity. I could just write the point values to a CSV, and apply them to the raster later, but using similar methods has yielded undesirable results. I tried for quite some time because I enjoyed being able to make larger images with the space freed by not having this array. It just won't work. For clarity I'd like to add that the Point object also stores color data.
The next problem is the WriteableRaster, which will have the same dimensions as the array. Combined the two take up a great deal of memory. I have come to accept this, after trying to change the way it is done several times, each with lower quality results.
After trying to optimize for memory and time, I've come to the conclusion that I'm really limited by RAM. This is what I would like to change. I am aware of the -Xmx switch (set to 10GB). Is there any way to use Windows' virtual memory to store the raster and/or the array? I am well aware of the significant performance hit this will cause, but in lieu of lowering quality, there really doesn't seem to be much choice.
The OS is already turning hard drive space into RAM for you and every other process, of course -- no magic needed. This will be more of a performance disaster than you think; it will be so slow as to effectively not work.
Are you looking for memory-mapped files?
http://docs.oracle.com/javase/6/docs/api/java/nio/MappedByteBuffer.html
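If you do go down the memory-mapped route, a minimal sketch (the file name and size are made up; each map call is limited to 2 GB, so a large raster would need several mapped regions):
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedRasterSketch {
    public static void main(String[] args) throws Exception {
        long size = 64L * 1024 * 1024; // 64 MB backing file, for illustration
        RandomAccessFile file = new RandomAccessFile("raster.bin", "rw");
        try {
            FileChannel channel = file.getChannel();
            // The buffer is backed by the file on disk, not the Java heap.
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
            buf.putInt(0, 0xFF00FF); // write a pixel value at offset 0
            System.out.println(Integer.toHexString(buf.getInt(0)));
        } finally {
            file.close();
        }
    }
}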
If this is really to be done in memory, I would bet that you could dramatically lower your memory usage with some optimization. For example, your Point object is mostly overhead and not data. Count up the bytes needed for the reference, then for the Object overhead, compared to two ints.
You could reduce the overhead to nothing with two big parallel int arrays for your x and y coordinates. Of course you'd have to encapsulate this for access in your code. But it could halve your memory usage for this data structure. Millions fewer objects also speeds up GC runs.
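A sketch of that parallel-array idea (the names are made up; the color array reflects the mention that Point also stores color data):
public class PointStore {
    private final int[] xs;
    private final int[] ys;
    private final int[] colors; // packed ARGB values

    public PointStore(int capacity) {
        xs = new int[capacity];
        ys = new int[capacity];
        colors = new int[capacity];
    }

    public void set(int i, int x, int y, int argb) {
        xs[i] = x;
        ys[i] = y;
        colors[i] = argb;
    }

    public int x(int i) { return xs[i]; }
    public int y(int i) { return ys[i]; }
    public int color(int i) { return colors[i]; }
}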
Instead of putting a WritableRaster in memory, consider writing out the image file in some simple image format directly, yourself. BMP can be very simple. Then perhaps using an external tool to efficiently convert it.
Try -XX:+UseCompressedOops to reduce object overhead too. Also try -XX:NewRatio=20 or higher to make the JVM reserve almost all its heap for long-lived objects. This can actually let you use more heap.
It is not recommended to configure your JVM memory parameters (Xmx) in order to make the operating system allocate from its swap memory. Apparently the garbage collection mechanism needs to have random access to heap memory, and if it doesn't, the program will thrash for a long time and possibly lock up. Please check the answer already given to my question (last paragraph):
does large value for -Xmx postpone Garbage Collection
I'm developing a visualization app for Android (including older devices running Android 2.2).
The input model of my app contains an area, which typically consists of tens of thousands of vertices. Typical models have 50000-100000 vertices (each with x, y, z float coordinates), i.e. they use up 600-1200 kilobytes of total memory. The app requires that all vertices be available in memory at any time. This is all I can share about this app (I'm not allowed to share high-level use cases), so I'm wondering if my conclusions below are correct and whether there is a better solution.
For example, assume there are count=50000 vertices. I see two solutions:
1.) My earlier solution was using my own VertexObj (better readability due to encapsulation, better locality when accessing individual coordinates):
public static class VertexObj {
public float x, y, z;
}
VertexObj[] mVertices = new VertexObj[count]; // 50,000 objects
2.) My other idea is using a large float[] instead:
float[] mVertices = new float[count * 3]; // 150,000 float values
The problem with the first solution is the big memory overhead -- we are on a mobile device where the app's heap might be limited to 16-24MB (and my app needs memory for other things too). According to the official Android pages, object allocation should be avoided when it is not truly necessary. In this case, the memory overhead can be huge even for 50,000 vertices:
First of all, the "useful" memory is 50000*3*4 = 600K (this is used up by float values). Then we have +200K overhead due to the VertexObj elements, and probably another +400K due to Java object headers (they're probably at least 8 bytes per object on Android, too). This is 600K "wasted" memory for 50,000 vertices, which is 100% overhead (!). In case of 100,000 vertices, the overhead is 1.2MB.
The second solution is much better, as it requires only the useful 600K for float values.
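A sketch of how option 2 could be wrapped so the flat layout stays readable (the class and method names are made up):
public class VertexBuffer {
    private final float[] coords; // laid out as x0, y0, z0, x1, y1, z1, ...

    public VertexBuffer(int vertexCount) {
        coords = new float[vertexCount * 3];
    }

    public void set(int i, float x, float y, float z) {
        int base = i * 3;
        coords[base] = x;
        coords[base + 1] = y;
        coords[base + 2] = z;
    }

    public float x(int i) { return coords[i * 3]; }
    public float y(int i) { return coords[i * 3 + 1]; }
    public float z(int i) { return coords[i * 3 + 2]; }
}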
Apparently, the conclusion is that I should go with float[], but I would like to know the risks in this case. Note that my doubts might be related with lower-level (not strictly Android-specific) aspects of memory management as well.
As far as I know, when I write new float[300000], the app requests the VM to reserve a contiguous block of 300000*4 = 1200K bytes. (It happened to me in Android that I requested a 1MB byte[], and I got an OutOfMemoryException, even though the Dalvik heap had much more than 1MB free. I suppose this was because it could not reserve a contiguous block of 1MB.)
Since the GC of Android's VM is not a compacting GC, I'm afraid that if the memory is "fragmented", such a huge float[] allocation may result in an OOM. If I'm right here, then this risk should be handled. E.g. what about allocating more float[] objects, each storing a portion such as 200KB? Such linked-list memory management mechanisms are used by operating systems and VMs, so it sounds unusual to me that I would need to use them here (on the application level). What am I missing?
If nothing, then I guess that the best solution is using a linked list of float[] objects (to avoid OOM but keep overhead small)?
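For completeness, a sketch of the chunked idea (the chunk size of 50000 floats, roughly 200 KB, is an arbitrary choice), in case a single large allocation turns out to be a problem:
public class ChunkedFloatArray {
    private static final int CHUNK = 50000; // floats per chunk, roughly 200 KB each
    private final float[][] chunks;

    public ChunkedFloatArray(int length) {
        int n = (length + CHUNK - 1) / CHUNK;
        chunks = new float[n][];
        for (int i = 0; i < n; i++) {
            int size = Math.min(CHUNK, length - i * CHUNK);
            chunks[i] = new float[size];
        }
    }

    public float get(int index) { return chunks[index / CHUNK][index % CHUNK]; }
    public void set(int index, float value) { chunks[index / CHUNK][index % CHUNK] = value; }
}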
The out-of-memory error you are facing while allocating the float array is quite strange.
If the biggest contiguous memory block available in the heap is smaller than the memory required by the float array, the heap increases its size in order to accommodate the required memory.
Of course, this would fail if the heap has already reached the maximum available to your application. That would mean your application has exhausted the heap and then released a significant number of objects, resulting in memory fragmentation and no more heap to allocate. However, if this is the case, and assuming that the fragmented memory is enough to hold the float array (otherwise your application wouldn't run anyway), it's just a matter of allocation order.
If you allocate the memory required for the float array during application startup, you have plenty of contiguous memory for it. Then you just let your application do the remaining work, as the contiguous memory is already allocated.
You can easily check the memory blocks being allocated (and the free ones) using DDMS in Eclipse, selecting your app and pressing the Update Heap button.
Just for the sake of avoiding misleading you, I tested this before posting, allocating several contiguous memory blocks of float[300000].
Regards.
I actually ran into a problem where I wanted to embed data for a test case. You'll have quite a fun time embedding huge arrays: Eclipse kept complaining when the method exceeded something like 65,535 bytes of data because I declared an array like that. However, this is actually a quite common approach.
The rest goes into optimization. The big question is this: would be worth the trouble doing all of that optimizing? If you aren't hard-up on RAM, you should be fine using 1.2 megs. There's also a chance that Java will whine if you have an array that large, but you can do things like use a fancier data structure like a LinkedList or chop up the array into smaller ones. For statically set data, I feel an array might be a good choice if you are reading it like crazy.
I know you can make .xml resource files for integers, so storing each value as an integer with a tactic like multiplying it, reading it in, and then dividing it by a value would be another option. You can also put things like text files into your assets folder. Just do it once in the application and you can read/write however you like.
As for double vs float, I feel that in your case, a math or science case, doubles would be safer if you can pull it off. If you do any math, you'll have less chance of error with double, especially with an operation like multiplication. Floats are usually faster. I'm not sure if Java does SIMD packing, but if it does, more floats can be packed into a SIMD register than doubles.
I know that Java uses padding; objects have to be a multiple of 8 bytes. However, I don't see the purpose of it. What is it used for? What exactly is its main purpose?
Its purpose is alignment, which allows for faster memory access at the cost of some space. If data is unaligned, then the processor needs to do some shifts to access it after loading the memory.
Additionally, garbage collection is simplified (and sped up) the larger the size of the smallest allocation unit.
It's unlikely that Java has a requirement of 8 bytes (except on 64-bit systems), but since 32-bit architectures were the norm when Java was created it's possible that 4-byte alignment is required in the Java standard.
The accepted answer is speculation (but partially correct). Here is the real answer.
First off, to U2EF1's credit, one of the benefits of 8-byte boundaries is that 8 bytes are the optimal access unit on most processors. However, there was more to the decision than that.
If you have 32-bit references, you can address up to 2^32 or 4 GB of memory (practically you get less, though, more like 3.5 GB). If you have 64-bit references, you can address 2^64, which is terabytes of memory. However, with 64-bit references, everything tends to slow down and take more space. This is due to the overhead of 32-bit processors dealing with 64 bits, and, on all processors, more GC cycles due to less space and more garbage collection.
So, the creators took a middle ground and decided on 35-bit references, which allow up to 2^35 or 32 GB of memory and take up less space so to have the same performance benefits of 32-bit references. This is done by taking a 32-bit reference and shifting it left 3 bits when reading, and shifting it right 3 bits when storing references. That means all objects must be aligned on 2^3 boundaries (8 bytes). These are called compressed ordinary object pointers or compressed oops.
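A tiny illustration of that encode/decode arithmetic (simplified: a zero heap base is assumed, and the address is made up):
public class CompressedOopDemo {
    public static void main(String[] args) {
        // An 8-byte aligned address somewhere below 32 GB (2^35).
        long address = 24L * 1024 * 1024 * 1024 + 64;

        // Store: shift right 3 bits, so the value fits into 32 bits.
        int compressed = (int) (address >>> 3);

        // Load: shift left 3 bits to recover the original address.
        long decoded = ((long) compressed & 0xFFFFFFFFL) << 3;

        System.out.println(address == decoded); // true, thanks to 8-byte alignment
    }
}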
Why not 36-bit references for accessing 64 GB of memory? Well, it was a tradeoff. You'd require a lot of wasted space for 16-byte alignments, and as far as I know the vast majority of processors receive no speed benefit from 16-byte alignments as opposed to 8-byte alignments.
Note that the JVM doesn't bother using compressed oops unless the maximum memory is set to be above 4 GB, which it is not by default. You can actually enable them with the -XX:+UseCompressedOops flag.
This was back in the day of 32-bit VMs to provide the extra available memory on 64-bit systems. As far as I know, there is no limitation with 64-bit VMs.
Source: Java Performance: The Definitive Guide, Chapter 8
Data type sizes in Java are multiples of 8 bits (not bytes) because word sizes in most modern processors are multiples of 8-bits: 16-bits, 32-bits, 64-bits. In this way a field in an object can be made to fit ("aligned") in a word or words and waste as little space as possible, taking advantage of the underlying processor's instructions for operating on word-sized data.
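If you want to see the padding concretely, the OpenJDK JOL tool can print an object's layout; a sketch, assuming the jol-core library is on the classpath:
import org.openjdk.jol.info.ClassLayout;

public class LayoutDemo {
    static class OneByte {
        byte b; // header plus 1 byte of data, padded up to the next 8-byte boundary
    }

    public static void main(String[] args) {
        // Prints field offsets, alignment gaps and the padded instance size.
        System.out.println(ClassLayout.parseClass(OneByte.class).toPrintable());
    }
}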