I am working on an in-memory storage solution that stores data samples. This is for a multithreaded trending application that has items constantly being written into the storage array and items being removed from it periodically. It will store the latest 24 hours of samples. I need to be able to grab the data in full, or partially. Because of these requirements I chose to use a CopyOnWriteArrayList.
The storage solution is stored in a CopyOnWriteArrayList<Point>.
I have two classes that I wrote to house the data: Point and Samples.
Point consists of:
private int ioId;
private int machineId;
private jPointType pointType; //(int)
private String subfield;
private long startTimestamp;
private long endTimestamp;
private int pruneLock;
private jTrendReqType predefType; //(int)
CopyOnWriteArrayList<Sample> dataList;
Sample consists of:
Long timestamp;
double data;
I'm currently testing with 2/sec data and 30 points (7200 Samples per each point). I run the test for 1 hour and see an increase of about 10MB of usage via Task Manager. This ends up being about 45 bytes per sample. This seems to be quite high for the type of data that I am storing. Is there that much Java overhead or am I doing something inefficiently?
Well, let's look at your Sample class:
Long timestamp;
double data;
From this other answer, a Long takes about 16 bytes (8 bytes for the long value plus 8 bytes of object header).
The double has a memory footprint of 8 bytes in Java.
The reference to each Sample object adds another 4-8 bytes (4 with compressed oops, 8 without), and the Sample object carries its own header as well.
So just by itself, each Sample entry costs roughly 32 bytes.
However, you calculated an average storage size of 45 bytes per Sample.
So possible other causes:
Point objects contain a String, which costs roughly 40 bytes of overhead plus about 2 bytes per character
Overhead from CopyOnWriteArrayList's implementation
Unfreed memory - memory that is not used, but not yet released by the JVM.
The most likely cause, however, is probably unfreed memory. Due to how Java operates, memory is only freed when garbage collection (GC) runs (and there's no guaranteed way of forcing it to run). Because you're using CopyOnWriteArrayList, you're constantly creating new backing arrays behind the scenes as you add objects, and the JVM just hasn't released the old ones yet because GC hasn't run.
Here's a link to some Oracle documentation on Java's Garbage Collection mechanism.
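To act on the Long-vs-long point above, here is a sketch of two leaner layouts. The Sample class mirrors the question; the parallel-array SampleSeries variant and its names are a hypothetical alternative, not code from the question:

```java
import java.util.Arrays;

// 1) Use the primitive long instead of the boxed Long: this saves the
//    separate Long object (~16 bytes) plus the reference to it.
final class Sample {
    final long timestamp; // primitive: 8 bytes inline, no extra object
    final double data;
    Sample(long timestamp, double data) { this.timestamp = timestamp; this.data = data; }
}

// 2) Go further: drop the per-sample object entirely and store the series
//    in two parallel primitive arrays (~16 bytes per sample plus array headers).
final class SampleSeries {
    private long[] timestamps = new long[1024];
    private double[] values = new double[1024];
    private int size;

    void add(long ts, double value) {
        if (size == timestamps.length) {            // grow both arrays together
            timestamps = Arrays.copyOf(timestamps, size * 2);
            values = Arrays.copyOf(values, size * 2);
        }
        timestamps[size] = ts;
        values[size] = value;
        size++;
    }

    int size() { return size; }
    long timestampAt(int i) { return timestamps[i]; }
    double valueAt(int i) { return values[i]; }
}
```

The parallel-array form gives up per-sample object identity, but for an append-mostly 24-hour ring of samples it cuts per-sample cost from ~32+ bytes to 16.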
Related
the code reads lines of CSV file like:
Stream<String> strings = Files.lines(Paths.get(filePath))
then it maps each line in the mapper:
String[] tokens = line.split(",");
return new UserModel(tokens[0], tokens[1], tokens[2], tokens[3]);
and finally collects it:
Set<UserModel> current = currentStream.collect(toSet());
File size is ~500MB
I've connected to the server using jconsole and see that heap size grew from 200MB to 1.8GB while processing.
I can't understand where this x3 memory usage came from - I expected something like 500MB spike or so?
My first impression was that it's because there is no throttling and the garbage collector simply doesn't have enough time for cleanup.
But I've tried using the Guava rate limiter to give the garbage collector time to do its job, and the result is the same.
Tom Hawtin made good points; I just want to expand on them and provide a bit more detail.
Java Strings take at least 40 bytes of memory (that's for an empty string) due to the Java object header (see below) and an internal byte array.
That means the minimal size for a non-empty string (1 or more characters) is 48 bytes.
Nowadays, the JVM uses Compact Strings, which means that ASCII-only strings occupy only 1 byte per character; before, it was a minimum of 2 bytes per char.
That means if your file contains characters beyond ASCII set, then memory usage can grow significantly.
Streams also have more overhead compared to plain iteration with arrays/lists (see here Java 8 stream objects significant memory usage)
I guess your UserModel object adds at least 32 bytes overhead on top of each line, because:
the minimum size of java object is 16 bytes where first 12 bytes are the JVM "overhead": object's class reference (4 bytes when Compressed Oops are used) + the Mark word (used for identity hash code, Biased locking, garbage collectors)
and the next 4 bytes are used by the reference to the first "token"
and the next 12 bytes are used by 3 references to the second, third and fourth "token"
and the last 4 bytes are required due to Java Object Alignment at 8-byte boundaries (on 64-bit architectures)
That being said, it's not clear whether you even use all the data that you read from the file - you parse 4 tokens from a line but maybe there are more?
Moreover, you didn't mention how exactly the heap size "grew": whether it was the committed size or the used size of the heap. The used portion is what is actually occupied by live objects; the committed portion is what has been allocated by the JVM at some point but could be garbage-collected later; used < committed in most cases.
You'd have to take a heap snapshot to find out how much memory actually the result set of UserModel occupies and that would actually be interesting to compare to the size of the file.
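A heap snapshot is the reliable way; a cheaper rough check (a sketch, not a precise benchmark — the UserModel record here is a stand-in for the class in the question) is to compare used heap before and after building the objects, nudging the GC before each reading:

```java
import java.util.ArrayList;
import java.util.List;

public class HeapDelta {
    record UserModel(String a, String b, String c, String d) {}

    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        rt.gc();                              // only a hint, but usually good enough here
        return rt.totalMemory() - rt.freeMemory();
    }

    static long bytesPerModel(int count) {
        long before = usedHeap();
        List<UserModel> models = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            models.add(new UserModel("u" + i, "first" + i, "last" + i, "mail" + i));
        }
        long delta = usedHeap() - before;     // models still referenced, so they survive the gc() hint
        if (models.size() != count) throw new IllegalStateException();
        return delta / count;                 // rough bytes per object, including its strings
    }

    public static void main(String[] args) {
        System.out.println("~" + bytesPerModel(100_000) + " bytes per UserModel");
    }
}
```

The number it prints includes the four token strings each model holds, so it is directly comparable to bytes-per-line of the source file.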
It may be that the String implementation is using UTF-16 whereas the file may be using UTF-8. That would double the size, assuming all US-ASCII characters. However, I believe JVMs tend to use a compact form for Strings nowadays.
Another factor is that Java objects tend to be allocated on a nice round address. That means there's extra padding.
Then there's memory for the actual String object, in addition to the actual data in the backing char[] or byte[].
Then there's your UserModel object. Each object has a header and references are usually 8-bytes (may be 4).
Lastly not all the heap will be allocated. GC runs more efficiently when a fair proportion of the memory isn't, at any particular moment, being used. Even C malloc will end up with much of the memory unused once a process is up and running.
Your code reads the full file into memory. Then you split each line into an array, and then you create an object of your custom class for each line. So you essentially have 3 different pieces of "memory usage" for each line in your file!
While enough memory is available, the JVM might simply not waste time running the garbage collector while turning your 500 megabytes into three different representations. You are therefore likely to "triplicate" the number of bytes in your file, at least until the GC kicks in and throws away the no-longer-required file lines and split arrays.
I am trying to create 2D array in Java as follows:
int[][] adjecancy = new int[96295][96295];
but it is failing with the following error:
JVMDUMP039I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError" at 2017/04/07 11:58:55 - please wait.
JVMDUMP032I JVM requested System dump using 'C:\eclipse\workspaces\TryJavaProj\core.20170407.115855.7840.0001.dmp' in response to an event
JVMDUMP010I System dump written to C:\eclipse\workspaces\TryJavaProj\core.20170407.115855.7840.0001.dmp
JVMDUMP032I JVM requested Heap dump using 'C:\eclipse\workspaces\TryJavaProj\heapdump.20170407.115855.7840.0002.phd' in response to an event
JVMDUMP010I Heap dump written to C:\eclipse\workspaces\TryJavaProj\heapdump.20170407.115855.7840.0002.phd
A way to solve this is by increasing the JVM memory but I am trying to submit the code for an online coding challenge. There it is also failing and I will not be able to change the settings there.
Is there any standard limit or guidance for creating large arrays which one should not exceed?
int[][] adjecancy = new int[96295][96295];
When you do that you are trying to allocate 96295*96295*4 bytes, which is nearly 37,091 MB, or nearly 37 GB. You are simply not going to get that much memory from a PC for Java alone.
I don't think you need that much data in hand at initialization of your program. You should probably look at ArrayList, which gives you dynamic sizing; allocating lazily and freeing entries at runtime is the key here.
There is no limit or restriction on creating an array. As long as you have memory, you can use it. But keep in mind that you should not hold a block of memory so large that it makes the JVM's life hectic.
The array must obviously fit into memory. If it does not, the typical solutions are:
Do you really need int (max value 2,147,483,647)? Maybe byte (max value 127) or short is good enough? A byte is 4 times smaller than an int.
Do you have really many identical values in array (like zeros)? Try to use sparse arrays.
for instance:
Map<Integer, Map<Integer, Integer>> map = new HashMap<>();
map.put(27, new HashMap<Integer, Integer>()); // row 27 exists
map.get(27).put(54, 1); // row 27, column 54 has value 1.
They need more memory per value stored, but have basically no limits on the array space (you can use Long rather than Integer as index to make them really huge).
Maybe you just do not know how long the array should be? Try ArrayList, it self-resizes. Use ArrayList of ArrayLists for 2D array.
If nothing else helps, use RandomAccessFile to store your overgrown data in the filesystem. 100 GB or so is not a problem these days on a good workstation; you just need to compute the required offset in the file. The filesystem is obviously much slower than RAM, but with a good SSD it may be bearable.
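A minimal sketch of that last option (the class and method names are made up for illustration): a file-backed 2-D int matrix where each cell lives at offset (row * cols + col) * 4 in the file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// File-backed 2-D int matrix: cells are read/written in place on disk
// instead of being held in the heap.
public class FileIntMatrix implements AutoCloseable {
    private final RandomAccessFile file;
    private final long cols;

    public FileIntMatrix(String path, long rows, long cols) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
        this.cols = cols;
        file.setLength(rows * cols * 4L); // typically a sparse file: blocks allocated on first write
    }

    private long offset(long row, long col) { return (row * cols + col) * 4L; }

    public void set(long row, long col, int value) throws IOException {
        file.seek(offset(row, col));
        file.writeInt(value);
    }

    public int get(long row, long col) throws IOException {
        file.seek(offset(row, col));
        return file.readInt();
    }

    @Override public void close() throws IOException { file.close(); }
}
```

With 96295 x 96295 the backing file is nominally ~37 GB, but most filesystems allocate blocks lazily, so only the regions you actually write consume disk space.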
It is generally recommended that the maximum heap size be no more than 1/4 of the machine's RAM.
One int in Java takes 4 bytes, and your array allocation needs approximately 37.09 GB of memory.
In that case, even if I assume you are allocating the full heap to just this array, your machine would need around 148 GB of RAM. That is huge.
Have a look at the reference below.
Ref: http://docs.oracle.com/javase/8/docs/technotes/guides/vm/gc-ergonomics.html
Hope this helps.
It depends on the maximum memory available to your JVM and the content type of the array. For int we have 4 bytes per element. Now if 1 MB of memory is available on your machine, it can hold a maximum of 1024 * 256 integers (1 MB = 1024 * 1024 bytes). Keeping that in mind, you can create your 2D array accordingly.
The size of array you can create depends upon the JVM heap size.
96295*96295*4 (bytes per int) = 37,090,908,100 bytes = ~34.54 GiB. Most JVMs in competitive code judges don't have that much memory, hence the error.
To get a good idea of what array size you can use for given heap size -
Run this code snippet with different -Xmx settings:
// needs: import java.util.Scanner;
Scanner scanner = new Scanner(System.in);
while (true) {
    System.out.println("Enter 2-D array size: ");
    int size = scanner.nextInt();
    int[][] numbers = new int[size][size];
    numbers = null; // release the array so the next iteration can allocate
}
e.g. with -Xmx512M -> a 2-D array of roughly 10k x 10k elements.
Generally, most online judges have a ~1.5-2 GB heap while evaluating submissions.
If I take an XML file that is around 2kB on disk and load the contents as a String into memory in Java and then measure the object size it's around 33kB.
Why the huge increase in size?
If I do the same thing in C++ the resulting string object in memory is much closer to the 2kB.
To measure the memory in Java I'm using Instrumentation.
For C++, I take the length of the serialized object (e.g string).
I think there are multiple factors involved.
First of all, as Bruce Martin said, objects in Java have an overhead of 16 bytes per object; C++ does not.
Second, Strings in Java might be 2 Bytes per character instead of 1.
Third, it could be that Java reserves more Memory for its Strings than the C++ std::string does.
Please note that these are just ideas where the big difference might come from.
Assuming that your XML file contains mainly ASCII characters and uses an encoding that represents them as single bytes, then you can expect the in-memory size to be at least double, since Java uses UTF-16 internally (I've heard of some JVMs that try to optimize this, though). Added to that will be the overhead of 2 objects (the String instance and an internal char array) with some fields, IIRC about 40 bytes overall.
So your "object size" of 33kb is definitely not correct, unless you're using a weird JVM. There must be some problem with the method you use to measure it.
In Java, a String object carries some extra data that increases its size.
There is the object header, the backing array and its header, and some other fields: historically an array reference, offset, length, and a cached hash.
Visit http://www.javamex.com/tutorials/memory/string_memory_usage.shtml for details.
String: a String's memory growth tracks its internal char array's growth. However, the String class adds another 24 bytes of overhead.
For a non-empty String of 10 characters or fewer, the added overhead relative to the useful payload (2 bytes for each char plus 4 bytes for the length) ranges from 100 to 400 percent.
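As a back-of-the-envelope sketch of those figures (the constants follow the linked javamex article's pre-Java-9, char[]-backed layout and vary by JVM and settings):

```java
// Rough String footprint for a pre-Java-9, char[]-backed String:
// ~45 bytes of fixed overhead plus 2 bytes per char, with the total
// rounded up to an 8-byte alignment boundary.
public class StringFootprint {
    static long estimatedBytes(int chars) {
        long raw = 2L * chars + 45;        // payload + object/array overhead
        return (raw + 7) / 8 * 8;          // align up to an 8-byte boundary
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 10, 50, 255}) {
            System.out.printf("%3d chars -> ~%d bytes%n", n, estimatedBytes(n));
        }
    }
}
```

For a ~50-character string this lands around 150 bytes, i.e. roughly 3x the raw character data, which matches the "100 to 400 percent" range quoted above.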
More:
What is the memory consumption of an object in Java?
Yes, you should run GC and give it time to finish: just call System.gc(); and print Runtime.totalMemory() in a loop. You had also better create a million string copies in an array (measure the empty array size first, then filled with strings) to be sure you are measuring the size of the strings and not other service objects that may be present in your program. A String alone cannot take 33 kB, but a hierarchy of XML objects can.
That said, I cannot resist the irony that nobody cares about memory (and cache hits) in the world of Java. We know that the JIT keeps improving and can outperform native C++ code in some cases, so there is no need to bother with memory optimization. Premature optimization is the root of all evil.
As stated in other answers, Java's String adds overhead. If you need to store a large number of strings in memory, I suggest storing them as byte[] instead. That way the size in memory should be the same as the size on disk.
String -> byte[] :
String a = "hello";
byte[] aBytes = a.getBytes(StandardCharsets.UTF_8); // explicit charset avoids platform-default surprises
byte[] -> String :
String b = new String(aBytes, StandardCharsets.UTF_8);
I'm writing a program in Java which has to make use of a large hash table; the bigger the hash table can be, the better (it's a chess program :P). Basically, as part of my hash table I have a long[] array, a short[] array, and two byte[] arrays. All of them should be the same size. When I set my table size to ten million, however, it crashes and says "java heap out of memory". This makes no sense to me. Here's how I see it:
1 Long + 1 Short + 2 Bytes = 12 bytes
x 10,000,000 = 120,000,000 bytes
/ 1024 = 117187.5 kB
/ 1024 = 114.4 MB
Now, 114 MB of RAM doesn't seem like too much to me. In total my machine (a Mac) has 4 GB of RAM, and I have an app called FreeMemory which shows that around 2 GB is free while running this program. Also, I set the Java preferences like -Xmx1024m, so Java should be able to use up to a gig of memory. So why won't it let me allocate just 114 MB?
You predicted that it should use 114 MB, and if I run this (on a Windows box with 4 GB)
public static void main(String... args) {
long used1 = memoryUsed();
int Hash_TABLE_SIZE = 10000000;
long[] pos = new long[Hash_TABLE_SIZE];
short[] vals = new short[Hash_TABLE_SIZE];
byte[] depths = new byte[Hash_TABLE_SIZE];
byte[] flags = new byte[Hash_TABLE_SIZE];
long used2 = memoryUsed() - used1;
System.out.printf("%,d MB used%n", used2 / 1024 / 1024);
}
private static long memoryUsed() {
return Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
}
prints
114 MB used
I suspect you are doing something else which is the cause of your problem.
I am using Oracle HotSpot Java 7 update 10
You have not taken into account that each object is accessed through a reference, which also uses memory, along with other "hidden" costs... you must also take alignment into account: a byte is not always just a byte ;-)
Java Objects Memory Structure
How much memory is used by Java
To see how much memory is really in use, you can use a profiler:
visualvm
If you are using a standard HashMap (or similar from the JDK), each "long" really takes more than 8 bytes because of boxing/unboxing. You can use a primitive-specialized map as a base instead (it uses less memory):
NativeIntHashMap
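For illustration, here is a minimal open-addressing map from primitive long keys to short values (a hypothetical sketch of the primitive-collections idea, not the NativeIntHashMap API): two flat arrays and no boxing. It reserves key 0 as "empty" and does no resizing, so the caller must size it well above the expected entry count.

```java
// Open-addressing hash map: long key -> short value, stored in two
// parallel primitive arrays. Capacity must be a power of two; key 0
// is reserved as the "empty slot" marker; linear probing, no resize.
public class LongShortMap {
    private final long[] keys;
    private final short[] vals;
    private final int mask;

    public LongShortMap(int capacityPow2) {
        keys = new long[capacityPow2];
        vals = new short[capacityPow2];
        mask = capacityPow2 - 1;
    }

    private int slot(long key) {
        int h = (int) (key ^ (key >>> 32));     // fold the long into an int hash
        int i = h & mask;
        while (keys[i] != 0 && keys[i] != key) i = (i + 1) & mask; // linear probe
        return i;
    }

    public void put(long key, short value) {    // precondition: key != 0
        int i = slot(key);
        keys[i] = key;
        vals[i] = value;
    }

    public short get(long key, short missing) {
        int i = slot(key);
        return keys[i] == key ? vals[i] : missing;
    }
}
```

Per entry this costs 10 bytes of array payload (8 + 2) plus empty-slot slack, versus a HashMap<Long, Short> entry, which needs a boxed Long, a boxed Short, and an Entry object.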
From what I have read about BlueJ (and serious technical information is almost impossible to find), the BlueJ VM quite likely does not support primitive types at all; your arrays are actually arrays of boxed primitives. BlueJ uses a subset of all Java features, with an emphasis on object orientation.
If that is the case, plus taking into consideration that performance and efficiency are quite low on BlueJ VM's list of priorities, you may actually be using quite a bit more memory than you think: a whole order of magnitude is quite imaginable.
I believe one way would be to clear the heap memory after each execution; one relevant link is here:
Java heap space out of memory
I've got a question about storing a huge number of Strings in application memory. I need to load from a file and store about 5 million lines, each at most 255 chars (URLs), but mostly around 50. From time to time I'll need to search for one of them. Is it possible to make this app runnable on ~1 GB of RAM?
Will
ArrayList <String> list = new ArrayList<String>();
work?
As far as I know, String in Java is coded in UTF-16 internally, which gives me huge memory use. Is it possible to make such an array with Strings coded in ANSI?
This is console application run with parameters:
java -Xmx1024M -Xms1024M -jar "PServer.jar" nogui
The latest JVMs support -XX:+UseCompressedStrings by default which stores strings which only use ASCII as a byte[] internally.
Having several GB of text in a List isn't a problem, but it can take a while to load from disk (many seconds)
If the average URL is 50 chars which are ASCII, with 32 bytes of overhead per String, 5 M entries could use about 400 MB which isn't much for a modern PC or server.
A Java String is a full-blown object. This means that apart from the characters of the string themselves, there is other information to store in it (a pointer to the object's class, the object header used for locking and GC, and some other infrastructure data). So an empty String already takes some 45 bytes in memory (as you can see here).
Now you just have to add the maximum length of your strings and do some simple calculations to get the maximum memory used by that list.
Anyway, I would suggest loading the strings as byte[] if you have memory issues. That way you can control the encoding and you can still do searches.
Is there some reason you need to restrict it to 1 GB? If you want to search through them, you definitely don't want to swap to disk, but if the machine has more memory it makes sense to go higher than 1 GB.
If you have to search, use a SortedSet, not an ArrayList
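A small sketch of that suggestion: TreeSet gives O(log n) membership tests, and its sorted order also enables cheap prefix (range) queries over URLs. The sample URLs are made up for illustration.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

public class UrlIndex {
    public static void main(String[] args) {
        NavigableSet<String> urls = new TreeSet<>();
        urls.add("http://example.com/a");
        urls.add("http://example.com/b");
        urls.add("http://other.org/x");

        // O(log n) membership test instead of ArrayList's O(n) scan
        System.out.println(urls.contains("http://other.org/x")); // prints: true

        // Sorted order also gives cheap prefix (range) queries: '0' is the
        // character right after '/' in ASCII, so this half-open range covers
        // every string starting with "http://example.com/".
        System.out.println(urls.subSet("http://example.com/", "http://example.com0"));
        // prints: [http://example.com/a, http://example.com/b]
    }
}
```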