How do you make your Java application memory efficient?

How do you optimize the heap usage of an application that has a lot (millions) of long-lived objects? (a big cache, loading lots of records from a DB)
Use the right data type
Avoid java.lang.String to represent other data types
Avoid duplicated objects
Use enums if the values are known in advance
Use object pools
String.intern() (good idea?)
Load/keep only the objects you need
I am looking for general programming or Java-specific answers. No funky compiler switches.
Edit:
Optimize the memory representation of a POJO that can appear millions of times in the heap.
Use cases
Load a huge csv file in memory (converted into POJOs)
Use Hibernate to retrieve millions of records from a database
Summary of answers:
Use flyweight pattern
Copy on write
Instead of loading 10M objects with 3 properties each, is it more efficient to have 3 arrays (or some other data structure) of size 10M? (It could be a pain to manipulate the data, but if you are really short on memory...)

I suggest you use a memory profiler, see where the memory is being consumed, and optimise that. Without quantitative information you could end up changing things which either have no effect or actually make things worse.
You could look at changing the representation of your data, especially if your objects are small.
For example, you could represent a table of data as a series of columns with an object array for each column, rather than one object per row. This can save a significant amount of overhead per object if you don't need to represent an individual row. E.g. a table with 12 columns and 10,000,000 rows could use 12 objects (one per column) rather than 10 million (one per row).
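As a rough sketch of that column-oriented layout (the class and field names here are made up for illustration, not taken from the question):

    // Row-oriented: one object per row -> millions of object headers.
    class PersonRow {
        int age;
        long salary;
        String name;
    }

    // Column-oriented: one array per column -> only a handful of objects.
    class PersonTable {
        final int[] age;
        final long[] salary;
        final String[] name;

        PersonTable(int rows) {
            age = new int[rows];
            salary = new long[rows];
            name = new String[rows];
        }

        // "Row" i is simply index i of each column array.
        int ageOf(int i) { return age[i]; }
    }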

You don't say what sort of objects you're looking to store, so it's a little difficult to offer detailed advice. However, some (not exclusive) approaches, in no particular order, are:
Use a flyweight pattern wherever possible.
Cache to disk. There are numerous cache solutions for Java.
There is some debate as to whether String.intern() is a good idea. See here for a question regarding String.intern() and the amount of debate around its suitability.
Make use of soft or weak references to store data that you can recreate/reload on demand. See here for how to use soft references with caching techniques (a rough sketch follows this answer).
Knowing more about the internals and lifetime of the objects you're storing would result in a more detailed answer.
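As a minimal, non-thread-safe sketch of the soft-reference idea (the loader function is a hypothetical placeholder): values are held softly, so the GC may reclaim them under memory pressure, and they are simply reloaded on the next lookup.

    import java.lang.ref.SoftReference;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.Function;

    class SoftCache<K, V> {
        private final Map<K, SoftReference<V>> map = new HashMap<>();
        private final Function<K, V> loader; // hypothetical reload function

        SoftCache(Function<K, V> loader) {
            this.loader = loader;
        }

        V get(K key) {
            SoftReference<V> ref = map.get(key);
            V value = (ref == null) ? null : ref.get();
            if (value == null) {              // never cached, or cleared by the GC
                value = loader.apply(key);
                map.put(key, new SoftReference<>(value));
            }
            return value;
        }
    }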

Ensure good normalization of your object model; don't duplicate values.
Ahem, and, if it's only millions of objects, I think I'd just go for a decent 64-bit VM and lots of RAM ;)

Normal "profilers" won't help you much, because you need an overview of all your "live" objects. You need heap dump analyzer. I recommend the Eclipse Memory analyzer.
Check for duplicated objects, starting with Strings.
Check whether you can apply patterns like flightweight, copyonwrite, lazy initialization (google will be your friend).
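For the lazy-initialization point, a minimal sketch (the class, field, and loader names are made up): the expensive member is only built on first access, so it never occupies memory if it is never needed.

    import java.util.ArrayList;
    import java.util.List;

    class Report {
        // Potentially large; built only when someone actually asks for it.
        private List<String> details;

        List<String> getDetails() {
            if (details == null) {
                details = loadDetails();   // hypothetical expensive loader
            }
            return details;
        }

        private List<String> loadDetails() {
            return new ArrayList<>();      // placeholder
        }
    }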

Take a look at this presentation linked from here. It lays out the memory use of common Java objects and primitives and helps you understand where all the extra memory goes.
Building Memory-efficient Java Applications: Practices and Challenges

You could just store fewer objects in memory. :) Use a cache that spills to disk, or use Terracotta to cluster your (virtual) heap, allowing unused parts to be flushed out of memory and transparently faulted back in.

I want to add something to the point Peter already made (I can't comment on his answer): it's always better to use a memory profiler than to go by intuition. 80% of the time the problem is in routine code that we ignore. Collection classes are also more prone to memory leaks.

If you have millions of Integers, Floats, etc., then see if your algorithms allow for representing the data in arrays of primitives. That means fewer references and a lower CPU cost for each garbage collection.
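For illustration, a boxed list keeps one Integer object (plus a reference) per value, while a primitive array stores the values inline with a single object header; a small sketch:

    import java.util.ArrayList;
    import java.util.List;

    public class BoxingDemo {
        public static void main(String[] args) {
            // Boxed: every element is a separate Integer object plus a reference to it.
            List<Integer> boxed = new ArrayList<>();
            for (int i = 0; i < 1_000_000; i++) {
                boxed.add(i);
            }

            // Primitive: 4 bytes per element, one object header for the whole array.
            int[] primitive = new int[1_000_000];
            for (int i = 0; i < primitive.length; i++) {
                primitive[i] = i;
            }
            System.out.println(boxed.size() + " " + primitive.length);
        }
    }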

A fancy one: keep most data compressed in RAM. Only expand the current working set. If your data has good locality, that can work nicely.
Use better data structures. The standard collections in Java are rather memory-intensive.
[what is a better data structure?]
If you take a look at the source for the collections, you'll see that if you restrict yourself in how you access the collection, you can save space per element.
The way the collections handle growing is no good for large collections: too much copying. For large collections, you need a block-based algorithm, like a B-tree.
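As a rough sketch of the block-based idea (a made-up class, not a production data structure and not a B-tree): grow one fixed-size block at a time, so appending never copies the existing data the way ArrayList's reallocation does.

    import java.util.ArrayList;
    import java.util.List;

    class ChunkedLongList {
        private static final int BLOCK_SIZE = 4096;
        private final List<long[]> blocks = new ArrayList<>();
        private int size;

        void add(long value) {
            if (size % BLOCK_SIZE == 0) {
                blocks.add(new long[BLOCK_SIZE]);   // allocate only the next block
            }
            blocks.get(size / BLOCK_SIZE)[size % BLOCK_SIZE] = value;
            size++;
        }

        long get(int index) {
            return blocks.get(index / BLOCK_SIZE)[index % BLOCK_SIZE];
        }

        int size() {
            return size;
        }
    }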

Spend some time getting acquainted with and tuning the VM command line options, especially those concerning garbage collection. While this won't change the memory used by your objects, it can have a big impact on performance with memory-intensive apps on machines with a lot of RAM.

Assign null to variables that are no longer used, so the objects they reference become eligible for garbage collection.
De-reference or clear collections once you are done with them; otherwise the GC cannot sweep their contents.
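A trivial sketch of what this means in practice (hypothetical class; note this only matters when the enclosing object itself stays reachable long after the data is needed):

    import java.util.ArrayList;
    import java.util.List;

    class BatchJob {
        // Hypothetical list of loaded rows.
        private List<String> batch = new ArrayList<>();

        void process() {
            for (String row : batch) {
                handle(row);
            }
            batch.clear();  // drop references to the processed rows
            batch = null;   // useful only if this BatchJob stays reachable for a long time
        }

        private void handle(String row) { /* ... */ }
    }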

1) Use the right data types wherever possible.

    class Person {
        int age;
        int status;
    }

Here we can use smaller types to save memory when storing or sending many Person objects:

    class Person {
        short age;
        byte status;
    }
2) Instead of returning new ArrayList<>() from a method, you can return Collections.emptyList(), which reuses a single shared, immutable empty instance instead of allocating a new list with a default capacity of 10.
For example:

    public List getResults() {
        ...
        if (failedOperation)
            return new ArrayList<>();
    }

    // Use this instead
    public List getResults() {
        ...
        if (failedOperation)
            return Collections.emptyList();
    }
3) Prefer creating objects inside methods (as local variables) rather than as static or long-lived fields wherever possible. The objects themselves still live on the heap, but references held only in local variables are dropped when the method returns, so the objects become eligible for garbage collection sooner.
4) Use binary formats like protobuf, Thrift, Avro or MessagePack instead of JSON or XML to reduce the size of inter-service communication.

Related

Creating new objects versus encoding data in primitives

Let's assume I want to store (integer) x/y-values. What is considered more efficient: storing them in a primitive value like long (which fits perfectly, since sizeof(long) = 2*sizeof(int)) using bit operations like shift, or, and a mask, or creating a Point class?
Keep in mind that I want to create and store many(!) of these points (in a loop). Would there be a performance issue when using classes? The only reason I would prefer storing in primitives over storing in a class is the garbage collector. I guess generating new objects in a loop would trigger the GC way too much, is that correct?
Of course packing those as long[] is going to take less memory (and the data will be contiguous, too). For each object (a Point) you will pay at least 12 bytes more for the two headers.
On the other hand, if you are creating them in a loop and escape analysis can prove they don't escape, the JIT can apply an optimization called "scalar replacement" (though it is very fragile), where your objects will not be allocated at all. Instead those objects will be "desugared" to fields.
The general rule is that you should code in the way that is easiest to maintain and read. If and only if you see performance issues (via a profiler, say, or too many pauses) should you look at GC logs and potentially optimize the code.
As an addendum, the JDK code itself is full of such longs where each bit means something different, so they do pack them. But then, neither I nor (I suspect) you are JDK developers. There, such things matter; for us, I have serious doubts.
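For reference, a minimal sketch of the bit-packing variant discussed in the question (class and method names are made up): two 32-bit ints packed into one long and unpacked again.

    public final class PackedPoint {

        static long pack(int x, int y) {
            // High 32 bits hold x; low 32 bits hold y (masked to avoid sign extension).
            return ((long) x << 32) | (y & 0xFFFFFFFFL);
        }

        static int x(long packed) {
            return (int) (packed >> 32);
        }

        static int y(long packed) {
            return (int) packed;   // low 32 bits
        }

        public static void main(String[] args) {
            long[] points = new long[3];
            points[0] = pack(10, -20);
            System.out.println(x(points[0]) + "," + y(points[0])); // prints 10,-20
        }
    }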

Economizing On Space In Arrays Using Static Variables

I am working with some relatively large arrays of instances of a single data structure. Each instance consists of about a half a dozen fields. The arrays take up a lot of space and I'm finding that my development environment dies even when running with a vm using 7 gigabytes of heap space. Although I can move to a larger machine, I am also exploring ways I could economize on space without taking an enormous hit in performance. On inspection of the data I've noticed a great deal of redundancy in the data. For about 80 percent of the data, four of the six fields have identical values.
This gave me the idea that I can segregate the instances that have redundant information and put them in a specialized form of the data structure (an extension of the original data structure) with static fields for the four fields that contain the identical information. My assumption is that the static fields will only be instantiated in memory once, so even though this information is shared by, say, 100K objects, these fields take up the same memory as they would if only one data structure was instantiated. I therefore should be able to realize a significant memory savings.
Is this a correct assumption?
Thank you,
Elliott
I don't know your specific data structure or a possible algorithm to build a flyweight, but I would suggest one:
http://en.wikipedia.org/wiki/Flyweight_pattern
The pattern is quite close to the solution you are thinking about, and gives you a good separation of "how to get the data."
How about maintaining the redundant fields in a map and just keeping references to those values in the arrays? That could save space by reducing the size of the individual data structure.
Try using a HashMap for your storage; that is a fast way to find an equal, already-stored object.
You need to think about how to define the hashCode (and equals) methods of your objects.
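As a rough illustration of the flyweight/map suggestions above (all class and field names here are made up): static fields would only help if every instance shared exactly the same four values, whereas a flyweight pool stores each distinct combination once and lets any number of records reference it.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Objects;

    // The shared, immutable part: equal combinations are stored only once.
    final class SharedFields {
        final String region;
        final String category;
        final int year;
        final int version;

        SharedFields(String region, String category, int year, int version) {
            this.region = region;
            this.category = category;
            this.year = year;
            this.version = version;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof SharedFields)) return false;
            SharedFields s = (SharedFields) o;
            return year == s.year && version == s.version
                    && region.equals(s.region) && category.equals(s.category);
        }

        @Override
        public int hashCode() {
            return Objects.hash(region, category, year, version);
        }
    }

    // Flyweight factory: hands out a canonical instance per distinct combination.
    final class SharedFieldsPool {
        private final Map<SharedFields, SharedFields> pool = new HashMap<>();

        SharedFields intern(SharedFields candidate) {
            SharedFields existing = pool.get(candidate);
            if (existing == null) {
                pool.put(candidate, candidate);
                existing = candidate;
            }
            return existing;
        }
    }

    // Each record keeps only its unique fields plus a reference to the shared part.
    final class DataRow {
        final SharedFields shared;
        final double value;
        final long timestamp;

        DataRow(SharedFields shared, double value, long timestamp) {
            this.shared = shared;
            this.value = value;
            this.timestamp = timestamp;
        }
    }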

Keep serialized and compressed Objects in-memory

I'm currently working on a part of an application where "a lot" of data must be selected for further work, and I have the impression that the I/O is the limiting factor, not the work that follows.
My idea is now to keep all these objects in memory, but serialized and compressed. The question is whether accessing the objects like this would be faster than direct database access, and whether it is a good idea or not (and whether it is feasible in terms of memory consumption, i.e. whether the serialized form uses less memory than the normal object).
EDIT February 2011:
The creation of the objects is the slow part, not the database access itself. Having everything in memory is not possible, and using the ehcache option to "overflow to disk" is actually slower than just getting the data from the database. Standard Java serialization is also unusable; it is a lot slower as well. So basically there is nothing I can do about it...
You're basically looking for an in-memory cache or an in-memory data grid. There are plenty of APIs/products for this sort of thing: Ehcache, the Hibernate cache, GridGain, etc.
The compressed, serialized form will use less memory if it is a large object. However, for smaller objects, e.g. ones that mostly use primitives, the original object will be much smaller.
I would first check whether you really need to do this. E.g. can you just use more memory, or restructure your objects so they use less memory?
"I have the impression that the I/O is limiting and not the following work. " -> I would be very sure of this before starting implementing such a thing.
The simpler approach I can suggest you is to use ehcache with the option to store on disk when the size of the cache get too big.
Another completely different approach could be using some doc based nosql db like couchdb to store objects selected "for further work"
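For what it's worth, here is a minimal sketch of the "serialized and compressed in memory" idea using only standard-library classes (whether it actually saves memory or time depends entirely on the objects, as the answers above point out):

    import java.io.*;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;

    final class CompressedHolder<T extends Serializable> {
        private final byte[] compressed;

        CompressedHolder(T value) throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
                out.writeObject(value);            // serialize and gzip in one pass
            }
            compressed = bytes.toByteArray();      // only this byte[] is kept in memory
        }

        @SuppressWarnings("unchecked")
        T get() throws IOException, ClassNotFoundException {
            try (ObjectInputStream in = new ObjectInputStream(
                    new GZIPInputStream(new ByteArrayInputStream(compressed)))) {
                return (T) in.readObject();        // rebuild the object on demand
            }
        }
    }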

How to handle huge data in java

Right now I need to load huge amounts of data from a database into a Vector, but when I have loaded 38,000 rows of data, the program throws an OutOfMemoryError.
What can I do to handle this?
I think there may be a memory leak in my program; what are good methods to detect it? Thanks.
Provide more memory to your JVM (usually using -Xmx/-Xms) or don't load all the data into memory.
For many operations on huge amounts of data there are algorithms which don't need access to all of it at once. One class of such algorithms is divide-and-conquer algorithms.
If you must have all the data in memory, try caching commonly appearing objects. For example, if you are looking at employee records and they all have a job title, use a HashMap when loading the data and reuse the job titles already found. This can dramatically lower the amount of memory you're using.
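A small sketch of the job-title reuse idea from the paragraph above (the class and method names are hypothetical): one shared String per distinct title instead of a separate copy per employee row.

    import java.util.HashMap;
    import java.util.Map;

    class TitleDeduplicator {
        private final Map<String, String> cache = new HashMap<>();

        // Returns a canonical instance, so millions of rows share a few String objects.
        String canonical(String titleFromDb) {
            String cached = cache.get(titleFromDb);
            if (cached == null) {
                cache.put(titleFromDb, titleFromDb);
                cached = titleFromDb;
            }
            return cached;
        }
    }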
Also, before you do anything, use a profiler to see where memory is being wasted, and to check if things that can be garbage collected have no references floating around. Again, String is a common example, since if for example you're using the first 10 chars of a 2000 char string, and you have used substring instead of allocating a new String, what you actually have is a reference to a char[2000] array, with two indices pointing at 0 and 10. Again, a huge memory waster.
You can try increasing the heap size:
java -Xms<initial heap size> -Xmx<maximum heap size>
Default is
java -Xms32m -Xmx128m
Do you really need to have such a large object stored in memory?
Depending on what you have to do with that data, you might want to split it into smaller chunks.
Load the data section by section. This will not let you work on all data at the same time, but you won't have to change the memory provided to the JVM.
You could run your code using a profiler to understand how and why the memory is being eaten up. Debug your way through the loop and watch what is being instantiated. There are any number of them; JProfiler, Java Memory Profiler, see the list of profilers here, and so forth.
Maybe optimize your data classes? I've seen a case where someone was using Strings in place of native data types such as int or double for every class member, which gave an OutOfMemoryError when storing a relatively small number of data objects in memory. Also check that you aren't duplicating your objects. And, of course, increase the heap size:
java -Xmx512M (or whatever you deem necessary)
Let your program use more memory or, much better, rethink the strategy. Do you really need so much data in memory?
I know you are trying to read the data into a Vector - otherwise, if you were trying to display it, I would have suggested you use NatTable. It is designed for reading huge amounts of data into a table.
I believe it might come in handy for another reader here.
Use a memory-mapped file. Memory-mapped files can basically grow as big as you want, without hitting the heap. They do require that you encode your data in a decoding-friendly way. (For example, it would make sense to reserve a fixed size for every row in your data, in order to quickly skip a number of rows.)
Preon allows you to deal with that easily. It's a framework that aims to do for binary-encoded data what Hibernate has done for relational databases, and JAXB/XStream/XmlBeans for XML.
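A rough sketch of the fixed-size-row, memory-mapped approach using only the standard nio API (the 12-byte row layout and class name are assumptions for illustration; this is not how Preon itself works):

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    // Each row is a fixed 12 bytes: an int id followed by a double value.
    class MappedRows implements AutoCloseable {
        private static final int ROW_SIZE = Integer.BYTES + Double.BYTES;
        private final RandomAccessFile file;
        private final MappedByteBuffer buffer;

        MappedRows(String path, int rowCount) throws IOException {
            file = new RandomAccessFile(path, "rw");
            buffer = file.getChannel()
                         .map(FileChannel.MapMode.READ_WRITE, 0, (long) rowCount * ROW_SIZE);
        }

        void put(int row, int id, double value) {
            buffer.putInt(row * ROW_SIZE, id);
            buffer.putDouble(row * ROW_SIZE + Integer.BYTES, value);
        }

        int idAt(int row) {
            return buffer.getInt(row * ROW_SIZE);
        }

        double valueAt(int row) {
            return buffer.getDouble(row * ROW_SIZE + Integer.BYTES);
        }

        @Override
        public void close() throws IOException {
            file.close();
        }
    }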

trying to store java objects in contiguous memory

I am trying to implement a cache-like collection of objects. The purpose is to have fast access to these objects through locality in memory, since I'll likely be reading multiple objects at a time. I currently just store the objects in a Java collection like a Vector or a deque, but I do not believe this makes use of contiguous memory.
I know this can be done in C, but can it be done in Java? These objects may be of varying lengths (since they may contain strings). Is there a way to allocate contiguous memory through Java? Is there a Java collections class that would?
Please let me know.
Thanks,
jbu
You can't force it. If you allocate all the objects in quick succession they're likely to be contiguous - but if you're storing them in a collection, there's no guarantee that the collection will be local to the actual values. (The collection will have references to the objects, rather than containing the objects themselves.)
In addition, GC compaction will move values around in memory.
Have you actually profiled your app and found this is a bottleneck? In most cases I'd expect other optimisations could help you in a more reliable way.
No, you can't guarantee this locality of reference.
By allocating a byte array, or using a mapped byte buffer from the nio packages, you could get a chunk of contiguous memory, from which you can decode the data you want (effectively deserializing the objects of interest from this chunk of memory). However, if you repeatedly access the same objects, the deserialization overhead would likely defeat the purpose.
Have you written this code yet in Java? And if so, have you profiled it? I would argue that you probably don't need to worry about the objects being in contiguous memory - the JVM is better at memory management than you are in a garbage collected environment.
If you're really concerned about performance, maybe Java isn't the right tool for the job, but my gut instinct is to tell you that you're worrying about optimization too early, and that a Java version of your code, working with non-contiguous memory, will probably suit your needs.
I suggest using a HashMap (not threaded) or Hashtable (threaded) for your cache. Both store objects in an array in the Sun JVM. Since all objects in Java are accessed by reference, this will effectively be an array of pointers, as it would be in C. My bet is that you are performing premature optimization.
If you absolutely must have this, you have two options:
1) Use JNI and write it in C.
2) Get a big byte buffer and use ObjectOutputStream to write objects to it. This will probably be VERY SLOW compared to using a hash table.
