How to measure memory usage of a data structure? [duplicate] - java

This question already has answers here:
Calculate size of Object in Java [duplicate]
(3 answers)
Closed 1 year ago.
I am comparing a Trie with a HashMap storing over 1 million English words. After the data is loaded, only lookups are performed. I am writing code to test both speed and memory. The speed seems easy to measure: simply record the system time before and after the testing code.
What's the way to measure the memory usage of an object? In this case, it's either a Trie or a HashMap. I watched the system performance monitor and tested in Eclipse. The OS performance monitor shows over 1 GB of memory used after my testing program is launched. I doubt that storing the data really needs that much memory.
Also, on my Windows machine, the memory usage keeps rising throughout the testing time. This shouldn't happen, since the initial loading time of the data is short, and after that, during the lookup phase, there shouldn't be any additional memory consumption, since no new objects are created. On Linux, the memory usage seems more stable, though it also increased somewhat.
Would you please share some thoughts on this? Thanks a lot.

The short answer is: you can't.
The long answer is: you can calculate the size of objects in memory by repeating a differential memory analysis, calling the GC multiple times before and after the tests. But even then, only a very large number of rounds can approximate the real size. You need a warm-up phase first, and even if it all seems to work smoothly, you can get tripped up by the JIT and other optimizations you were not aware of.
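A minimal sketch of such a differential measurement, assuming a placeholder buildStructure() method that creates the trie or HashMap under test (names and the GC-coaxing loop are illustrative, not a guaranteed-accurate measurement):

public class MemoryDiff {
    public static void main(String[] args) throws InterruptedException {
        Runtime rt = Runtime.getRuntime();

        // Coax the GC into a reasonably stable state before the baseline measurement.
        for (int i = 0; i < 5; i++) { System.gc(); Thread.sleep(100); }
        long before = rt.totalMemory() - rt.freeMemory();

        Object structure = buildStructure();   // keep a strong reference

        for (int i = 0; i < 5; i++) { System.gc(); Thread.sleep(100); }
        long after = rt.totalMemory() - rt.freeMemory();

        System.out.println("Approximate size: " + (after - before) + " bytes");
        System.out.println(structure.hashCode()); // use the object so it stays reachable
    }

    // Placeholder for whatever builds the trie or HashMap under test.
    static Object buildStructure() {
        return new java.util.HashMap<String, Integer>();
    }
}

Running this several times and averaging the differences is what the "large number of rounds" above refers to.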
In general it's a good rule of thumb to count the number of objects you use.
If your trie implementation uses one object per node to represent the data, it is quite possible that your memory consumption is high compared to a map.
If you have a vast amount of data, a map might become slow because of collisions.
A common approach is to optimize later in case optimization is needed.

Did you try the "jps" tool provided by Oracle in the Java SDK? You can find it in the JavaSDK/bin folder. It's a great tool for performance checking and even memory usage.

Related

Java Memory Usage Consumption

I am performing analysis of different sort algorithms. Currently I am analysing the Insertion Sort and Quick Sort. And as part of the analysis, I need to measure the memory consumption.
I am using Visual VM for profiling. However, when I execute the Insertion Sort for a random data set of, let's say, 70,000 elements, I get a different range of heap memory usage each time. For example, in the first run the heap memory consumption was 75 KB, and in the next run it dropped to 35 KB. If I execute it a few more times, this value fluctuates randomly.
Is this normal, or am I missing something here? I have plotted a graph of data size versus memory consumption, and with this fluctuation I won't be able to draw a chart.
java version "1.8.0_65"
This is Java's garbage collector at work; it kicks in at its own pace and does its job. Perhaps it would be best for you to measure the amount of memory used after explicitly calling System.gc(), so that you're not taking note of the garbage.
EDIT:
System.gc() should be called after you perform your tests, to explicitly request that the garbage collector kick in. While it is true that System.gc() is treated only as a request, and it is not 100% guaranteed that the JVM will honor it, it is most probably safe enough for your analysis, especially if you perform several runs.
With regards to measuring memory usage, it is quite tricky, especially for low values. Please see this answer which contains some nice details:
You may find JMH useful for running benchmarks while isolating side effects from the JVM.
Read through the code samples to understand how to use it.
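If you do go the JMH route, a minimal benchmark for this scenario might look like the sketch below; the class name, the fixed seed and the sort body are illustrative, not part of the original question:

import java.util.Random;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class SortBenchmark {

    int[] data;

    // Fresh input for every invocation so the sort never sees already-sorted data.
    @Setup(Level.Invocation)
    public void setUp() {
        data = new Random(42).ints(70_000).toArray();
    }

    @Benchmark
    public int[] insertionSort() {
        int[] a = data;
        for (int i = 1; i < a.length; i++) {      // plain insertion sort
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
        return a;                                  // return the result to defeat dead-code elimination
    }
}

Built with the usual jmh-core and annotation-processor dependencies, JMH takes care of the warm-up and iteration bookkeeping that makes such numbers meaningful.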

how to measure the cpu usage of a method within a java program [duplicate]

This question already has answers here:
Measuring Java execution time, memory usage and CPU load for a code segment
(4 answers)
Closed 7 years ago.
I would like to measure the CPU usage within a Java application, for example the
System.out.println("...")
of a HelloWorld application. Any idea? I do not want to measure the CPU usage of the entire application, only of methods within the application.
Is there a way to correlate the total execution time of the application with the CPU usage of the entire application?
In my opinion, there is no point in profiling single methods, because a) it is very easy to spot resource-hogging methods (e.g. if you have to iterate through 10,000 input values to find only the 100 valid values you want to output, it is obvious that you are being inefficient and wasting CPU time on 9,900 useless iterations), and b) the key to efficient design is to foresee the resource hog. Using the same example as in point a), it is up to you as the designer to find a more efficient way of solving the problem, using a leaner and faster data structure or simply finding a way to trim the junk data before it reaches the method.
EDIT: And no, there is no way to correlate execution time and CPU usage, thanks to the Java JIT and the fact that the garbage collector runs autonomously.
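That said, if what you need is the CPU time consumed by the current thread around a specific code segment (rather than a correlation for the whole application), the standard ThreadMXBean API gives a rough number; a sketch:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuTimeExample {
    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            System.err.println("CPU time measurement not supported on this JVM");
            return;
        }

        long cpuBefore = bean.getCurrentThreadCpuTime();   // nanoseconds of CPU time
        long wallBefore = System.nanoTime();

        System.out.println("Hello, world");                // the code segment under test

        long cpuUsed = bean.getCurrentThreadCpuTime() - cpuBefore;
        long wallUsed = System.nanoTime() - wallBefore;

        System.out.printf("CPU time: %d ns, wall time: %d ns%n", cpuUsed, wallUsed);
    }
}

Note that getCurrentThreadCpuTime() only covers the calling thread, so work done by other threads or by the garbage collector is not included.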

Java slower with big heap

I have a Java program that operates on a (large) graph. Thus, it uses a significant amount of heap space (~50 GB, which is about 25% of the physical memory on the host machine). At one point, the program (repeatedly) picks one node from the graph and does some computation with it. For some nodes, this computation takes much longer than anticipated (30-60 minutes, instead of an expected few seconds). In order to profile these operations and find out what takes so much time, I have created a test program that creates only a very small part of the large graph and then runs the same operation on one of the nodes that took very long to compute in the original program. Thus, the test program obviously only uses very little heap space compared to the original program.
It turns out that an operation that took 48 minutes in the original program can be done in 9 seconds in the test program. This really confuses me. The first thought might be that the larger program spends a lot of time on garbage collection. So I turned on the verbose mode of the VM's garbage collector. According to that, no full garbage collections are performed during the 48 minutes, and only about 20 collections in the young generation, which each take less than 1 second.
So my question is: what else could there be that explains such a huge difference in timing? I don't know much about how Java internally organizes the heap. Is there something that takes significantly longer for a large heap with a large number of live objects? Could it be that object allocation takes much longer in such a setting, because it takes longer to find an adequate place in the heap? Or does the VM do any internal reorganization of the heap that might take a lot of time (besides garbage collection, obviously)?
I am using Oracle JDK 1.7, if that's of any importance.
While bigger memory might mean bigger problems, I'd say there's nothing (except the GC, which you've excluded) that could stretch 9 seconds into 48 minutes (a factor of 320).
A big heap can make spatial locality worse, but I don't think it matters here. I disagree with Tim's answer w.r.t. "having to leave the cache for everything".
There's also the TLB, which is a cache for virtual address translation and which can cause some problems with very large memory. But again, not a factor of 320.
I don't think there's anything in the JVM which could cause such problems.
The only reason I can imagine is that you have some swap space which gets used, despite the fact that you have enough physical memory. Even slight swapping can cause a huge slowdown. Make sure it's off (and possibly check the swappiness setting).
Even when things are in memory, you have multiple levels of data caching on modern CPUs. Every time you have to leave the cache to fetch data, access gets slower. Having 50 GB of RAM could well mean that it has to leave the cache for everything.
The symptoms and differences you describe are just massive though and I don't see something as simple as cache coherency making that much difference.
The best advice I can give you is to run a profiler against it both when it's running slow and when it's running fast, and compare the difference.
You need solid numbers and timings. "In this environment doing X took Y time". From that you can start narrowing things down.

Java Heap Hard Drive

I have been working on a Java program that generates fractal orbits for quite some time now. Much like photographs, the larger the image, the better it will be when scaled down. The program uses a 2D object (Point) array, which is written to when a point's value is calculated. That is to say, the Point is stored at its corresponding coordinates, i.e.:
Point p = new Point(25,30);
histogram[25][30] = p;
Of course, this is edited for simplicity. I could just write the point values to a CSV, and apply them to the raster later, but using similar methods has yielded undesirable results. I tried for quite some time because I enjoyed being able to make larger images with the space freed by not having this array. It just won't work. For clarity I'd like to add that the Point object also stores color data.
The next problem is the WriteableRaster, which will have the same dimensions as the array. Combined the two take up a great deal of memory. I have come to accept this, after trying to change the way it is done several times, each with lower quality results.
After trying to optimize for memory and time, I've come to the conclusion that I'm really limited by RAM. This is what I would like to change. I am aware of the -Xmx switch (set to 10GB). Is there any way to use Windows' virtual memory to store the raster and/or the array? I am well aware of the significant performance hit this will cause, but in lieu of lowering quality, there really doesn't seem to be much choice.
The OS is already turning hard drive space into extra RAM (swap) for you and every other process, of course; no magic needed. But this will be more of a performance disaster than you think; it will be so slow as to effectively not work.
Are you looking for memory-mapped files?
http://docs.oracle.com/javase/6/docs/api/java/nio/MappedByteBuffer.html
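A sketch of how a memory-mapped file could back such a raster, storing one packed ARGB int per cell (the file name and raster dimensions are made up for illustration):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedHistogram {
    public static void main(String[] args) throws Exception {
        int width = 8_000, height = 8_000;                  // hypothetical raster size
        long size = (long) width * height * Integer.BYTES;  // must stay below 2 GB per mapping

        try (RandomAccessFile file = new RandomAccessFile("histogram.bin", "rw");
             FileChannel channel = file.getChannel()) {

            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);

            // Write one packed ARGB value at cell (x, y), row-major layout.
            int x = 25, y = 30, argb = 0xFF4080C0;
            int index = (y * width + x) * Integer.BYTES;
            buf.putInt(index, argb);

            // Read it back.
            System.out.printf("Cell (%d,%d) = 0x%08X%n", x, y, buf.getInt(index));
        }
    }
}

A single MappedByteBuffer is limited to 2 GB, so a very large raster would have to be split across several mappings; the OS then pages the data in and out for you.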
If this is really to be done in memory, I would bet that you could dramatically lower your memory usage with some optimization. For example, your Point object is mostly overhead and not data. Count up the bytes needed for the reference, then for the Object overhead, compared to two ints.
You could reduce the overhead to nothing with two big parallel int arrays for your x and y coordinates. Of course you'd have to encapsulate this for access in your code. But it could halve your memory usage for this data structure. Millions fewer objects also speeds up GC runs.
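A sketch of that parallel-array idea (class and field names invented for illustration); each point becomes three primitive slots instead of a full object:

public class PointStore {
    private int size = 0;
    private int[] xs;
    private int[] ys;
    private int[] colors;   // packed ARGB, replacing the color data held in each Point

    public PointStore(int initialCapacity) {
        xs = new int[initialCapacity];
        ys = new int[initialCapacity];
        colors = new int[initialCapacity];
    }

    public void add(int x, int y, int argb) {
        if (size == xs.length) grow();
        xs[size] = x;
        ys[size] = y;
        colors[size] = argb;
        size++;
    }

    private void grow() {
        xs = java.util.Arrays.copyOf(xs, xs.length * 2);
        ys = java.util.Arrays.copyOf(ys, ys.length * 2);
        colors = java.util.Arrays.copyOf(colors, colors.length * 2);
    }

    public int size() { return size; }
    public int x(int i) { return xs[i]; }
    public int y(int i) { return ys[i]; }
    public int color(int i) { return colors[i]; }
}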
Instead of putting a WritableRaster in memory, consider writing out the image file in some simple image format directly, yourself. BMP can be very simple. Then perhaps using an external tool to efficiently convert it.
Try -XX:+UseCompressedOops to reduce object overhead too. Also try -XX:NewRatio=20 or higher to make the JVM reserve almost all its heap for long-lived objects. This can actually let you use more heap.
It is not recommended to configure your JVM memory parameters (-Xmx) so that the operating system has to allocate from its swap space. Apparently the garbage collection mechanism needs random access to heap memory, and if it doesn't have it, the program will thrash for a long time and possibly lock up. Please check the answer already given to my question (last paragraph):
does large value for -Xmx postpone Garbage Collection

How to estimate whether a given task would have enough memory to run in Java

I am developing an application that allows users to set the maximum data set size they want me to run their algorithm against.
It has become apparent that array sizes of around 20,000,000 elements cause an 'out of memory' error. Because I am invoking this via reflection, there is not really a great deal I can do about it.
I was just wondering: is there any way I can check or calculate what the maximum array size could be, based on the user's heap space settings, and therefore validate the user's entry before running the application?
If not, are there any better solutions?
Use Case:
The user provides a data size they want to run their algorithm against, we generate a scale of numbers to test it against up to the limit they provided.
We record the time it takes to run and measure the values (in order to work out the O-notation).
We need to somehow limit the user's input so as not to exceed the heap and get this error. Ideally we want to measure n^2 algorithms on array sizes as large as we can (which could take days of runtime), so we really don't want it running for 2 days and then failing, as that would have been a waste of time.
You can use the result of Runtime.freeMemory() to estimate the amount of available memory. However, it might be that actually a lot of memory is occupied by unreachable objects, which will be reclaimed by GC soon. So you might actually be able to use more memory than this. You can try invoking the GC before, but this is not guaranteed to do anything.
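A rough sketch of such an estimate; the bytes-per-element constant below is an assumption you would have to calibrate against your actual data structures:

public class MemoryCheck {
    // Assumed cost per element (array slot plus one small object); calibrate for real data.
    private static final long BYTES_PER_ELEMENT = 32;

    public static boolean probablyFits(long requestedElements) {
        Runtime rt = Runtime.getRuntime();
        System.gc();  // only a request; may or may not free anything

        long used = rt.totalMemory() - rt.freeMemory();
        long available = rt.maxMemory() - used;

        // Leave generous headroom for temporary objects created during the run.
        return requestedElements * BYTES_PER_ELEMENT < available / 2;
    }

    public static void main(String[] args) {
        System.out.println(probablyFits(20_000_000L));
    }
}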
The second difficulty is estimating the amount of memory needed for a number given by the user. While it is easy to calculate the size of an ArrayList with that many entries, this might not be all. For example, which objects are stored in this list? I would expect that there is at least one object per entry, so you need to add this memory too. Calculating the size of an arbitrary Java object is much more difficult (and in practice only possible if you know the data structures and algorithms behind the objects). And then there might be a lot of temporary objects created during the run of the algorithm (for example boxed primitives, iterators, StringBuilders etc.).
Third, even if the available memory is theoretically sufficient for running a given task, it might be practically insufficient. Java programs can get very slow if the heap is repeatedly filled with objects, then some are freed, some new ones are created and so on, due to a large amount of Garbage Collection.
So in practice, what you want to achieve is very difficult and probably next to impossible. I suggest just try running the algorithm and catch the OutOfMemoryError.
Usually, catching errors is something you should not do, but this seems like an occasion where it's OK (I do this in some similar cases). You should make sure that as soon as the OutOfMemoryError is thrown, some memory becomes reclaimable by the GC. This is usually not a problem, as the algorithm aborts, the call stack is unwound and some (hopefully a lot of) objects are no longer reachable. In your case, you should probably ensure that the large list is among the objects which immediately become unreachable in the case of an OOM. Then you have a good chance of being able to continue your application after the error.
However, note that this is not a guarantee. For example, if you have multiple threads working and consuming memory in parallel, the other threads might as well receive an OutOfMemoryError and not be able to cope with this. Also the algorithm needs to support the fact that it might get interrupted at any arbitrary point. So it should make sure that the necessary cleanup actions are executed nevertheless (and of course you are in trouble if those need a lot of memory!).
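A minimal sketch of that pattern, with runAlgorithm standing in for the reflective invocation described in the question (scales and workload are illustrative):

public class OomGuard {
    public static void main(String[] args) {
        int[] scales = {1_000_000, 5_000_000, 20_000_000};
        for (int n : scales) {
            try {
                long start = System.nanoTime();
                runAlgorithm(n);                       // placeholder for the reflective call
                long elapsed = System.nanoTime() - start;
                System.out.printf("n=%d took %d ms%n", n, elapsed / 1_000_000);
            } catch (OutOfMemoryError e) {
                // The data local to runAlgorithm is unreachable now, so the heap
                // should recover; stop scaling up instead of retrying.
                System.err.println("Ran out of memory at n=" + n + ", stopping here.");
                break;
            }
        }
    }

    static void runAlgorithm(int n) {
        java.util.ArrayList<Integer> data = new java.util.ArrayList<>(n);
        for (int i = 0; i < n; i++) data.add(i);       // stand-in workload
        java.util.Collections.sort(data);
    }
}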
