When designing Java classes, what are the recommendations for achieving CPU cache friendliness?
What I have learned so far is that one should use POD-style primitive fields as much as possible (e.g. int instead of Integer). That way the data is allocated consecutively when the containing object is allocated. E.g.
class Local
{
private int data0;
private int data1;
// ...
}
is more cache friendly than
class NoSoLocal
{
private Integer data0;
private Integer data1;
//...
}
The latter will require two separate allocations for the Integer objects, which can end up at arbitrary locations in memory, especially after a GC run. On the other hand, the first approach might lead to duplication of data in cases where the data can be reused.
Is there a way to have them located close to each other in memory, so that the parent object and its contained elements are in the CPU cache at once rather than distributed arbitrarily over the whole memory, and so that the GC keeps them together?
You cannot force the JVM to place related objects close to each other (though the JVM tries to do it automatically). But there are certain tricks to make Java programs more cache-friendly.
Let me show you some examples from real-life projects.
BEWARE! This is not a recommended way to code in Java!
Do not adopt the following techniques unless you are absolutely sure why you are doing it.
Inheritance over composition. You've definitely heard the contrary principle, "Favor composition over inheritance". But with composition you have an extra reference to follow, which is bad for cache locality and also requires more memory. A classic example of inheritance over composition is the JDK 8 Adder and Accumulator classes (LongAdder, LongAccumulator, etc.), which extend the utility Striped64 class.
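As a rough illustration (hypothetical classes, not the JDK sources): reading a field through composition follows an extra reference, while inheritance keeps the field in the same heap object.
// Composition: reading base.x dereferences a second object,
// which may live on a different cache line.
class Base {
    long x;
}
class WithComposition {
    final Base base = new Base();
    long read() { return base.x; }
}
// Inheritance: x lives in the same heap object as any subclass fields.
class WithInheritance extends Base {
    long read() { return x; }
}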
Transform arrays of structures into a structure of arrays. This again helps to save memory and to speed up bulk operations on a single field, e.g. key lookups:
class Entry {
long key;
Object value;
}
Entry[] entries;
will be replaced with
long[] keys;
Object[] values;
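A minimal sketch of why this helps for key lookups (the lookup method itself is hypothetical): scanning a densely packed long[] touches far fewer cache lines than following an Entry reference per element.
// Linear scan over the packed keys; values[] is only touched on a hit,
// so the hot loop streams through one contiguous array.
static Object lookup(long[] keys, Object[] values, long key) {
    for (int i = 0; i < keys.length; i++) {
        if (keys[i] == key) {
            return values[i];
        }
    }
    return null;
}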
Flatten data structures by inlining. My favorite example is inlining a 160-bit SHA-1 hash represented by a byte[]. The code before:
class Blob {
long offset;
int length;
byte[] sha1_hash;
}
The code after:
class Blob {
long offset;
int length;
int hash0, hash1, hash2, hash3, hash4;
}
Replace String with char[]. You know, String in Java contains a char[] under the hood. Why pay the performance penalty of an extra reference?
Avoid linked lists. Linked lists are very cache-unfriendly. Hardware works best with linear structures. LinkedList can often be replaced with ArrayList. A classic HashMap may be replaced with an open-addressing hash table.
Use primitive collections. Trove is a high-performance library containing specialized lists, maps, sets, etc. for primitive types.
Build your own data layouts on top of arrays or ByteBuffers. A byte array is a perfect linear structure. To achieve the best cache locality you can pack object data manually into a single byte array.
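A minimal sketch of the idea, assuming a hypothetical fixed-size record of one long plus one int packed at a 12-byte stride; a real layout would of course be tailored to your data.
import java.nio.ByteBuffer;

// Hypothetical fixed-size records (long offset + int length = 12 bytes)
// packed back to back in one ByteBuffer: a single allocation, linear layout.
class PackedRecords {
    private static final int STRIDE = 12;
    private final ByteBuffer buf;

    PackedRecords(int capacity) {
        buf = ByteBuffer.allocate(capacity * STRIDE);
    }

    void put(int i, long offset, int length) {
        buf.putLong(i * STRIDE, offset);
        buf.putInt(i * STRIDE + 8, length);
    }

    long offsetAt(int i) { return buf.getLong(i * STRIDE); }
    int lengthAt(int i)  { return buf.getInt(i * STRIDE + 8); }
}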
the first approach might lead to duplication of data in cases where the data can be reused.
But not in the case you mention. An int is 4 bytes and a reference is typically 4 bytes, so you don't gain anything by using Integer. For a more complex type, it can make a big difference, however.
Is there a way to have them located close to each other in memory, so that the parent object and its contained elements are in the CPU cache at once rather than distributed arbitrarily over the whole memory, and so that the GC keeps them together?
The GC will do this anyway, provided the objects are only used in one place. If the objects are used in multiple places, they will be close to one reference.
Note: this is not guaranteed to be the case; however, when allocating objects they will typically be contiguous in memory, as this is the simplest allocation strategy. When copying retained objects, the HotSpot GC will copy them in reverse order of discovery, i.e. they are still together but in reverse order.
Note 2: Using 4 bytes for an int is still going to be more efficient than using 28 bytes for an Integer (4 bytes for the reference, 16 bytes for the object header, 4 bytes for the value and 4 bytes for padding).
Note 3: Above all, you should favour clarity over performance, unless you have measured a need and have a more performant solution. In this case, an int cannot be null but an Integer can be. If you want a value which should never be null, use int, not for performance but for clarity.
Related
Out of interest: Recently, I encountered a situation in one of my Java projects where I could either store some data in a two-dimensional array or make a dedicated class for it whose instances I would put into a one-dimensional array. So I wonder whether there is any canonical design advice on this topic in terms of performance (runtime, memory consumption)?
Without regard to design patterns (an extremely simplified situation), let's say I could store data like
class MyContainer {
public double a;
public double b;
...
}
and then
MyContainer[] myArray = new MyContainer[10000];
for(int i = myArray.length; (--i) >= 0;) {
myArray[i] = new MyContainer();
}
...
versus
double[][] myData = new double[10000][2];
...
I somehow think that the array-based approach should be more compact (memory) and faster (access). Then again, maybe it is not: arrays are objects too, and array access needs to check indexes while object member access does not (?). The allocation of the object array would probably (?) take longer, as I need to create the instances iteratively, and my code would be bigger due to the additional class.
Thus, I wonder whether the designs of the common JVMs provide advantages for one approach over the other, in terms of access speed and memory consumption?
Many thanks.
Then again, maybe it is not, arrays are objects too
That's right. So I think this approach will not buy you anything.
If you want to go down that route, you could flatten this out into a one-dimensional array (each of your "objects" then takes two slots). That would give you immediate access to all fields in all objects, without having to follow pointers, and the whole thing is just one big memory allocation: since your component type is primitive, there is just one object as far as memory allocation is concerned (the container array itself).
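A rough sketch of that flattening (the two-slots-per-object indexing convention is mine, not a standard API):
// One allocation holds 10,000 logical "objects" with fields a and b:
// object i occupies slots [2*i] and [2*i + 1] of a single double[].
class FlatContainers {
    private final double[] data = new double[10000 * 2];

    double a(int i) { return data[2 * i]; }
    double b(int i) { return data[2 * i + 1]; }
    void setA(int i, double v) { data[2 * i] = v; }
    void setB(int i, double v) { data[2 * i + 1] = v; }
}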
This is one of the motivations for people wanting to have structs and value types in Java, and similar considerations drive the development of specialized high-performance data structure libraries (that get rid of unnecessary object wrappers).
I would not worry about it until you really have a huge data structure, though. Only then will the overhead of the object-oriented way matter.
I somehow think that the array-based approach should be more compact (memory) and faster (access)
It won't. You can easily confirm this by using the Java management interfaces:
import java.lang.management.ManagementFactory;

com.sun.management.ThreadMXBean b =
        (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
long selfId = Thread.currentThread().getId();
long memoryBefore = b.getThreadAllocatedBytes(selfId);
// <-- Put measured code here
long memoryAfter = b.getThreadAllocatedBytes(selfId);
System.out.println(memoryAfter - memoryBefore);
Under "measured code" put new double[0] and then new Object(), and you will see that both allocations require exactly the same amount of memory.
It might be that the JVM/JIT treats arrays in a special way which could make them faster to access in one way or another.
The JIT does some vectorization of array operations in for-loops, but that is more about the speed of arithmetic operations than the speed of access. Besides that, I can't think of anything.
The canonical advice that I've seen in this situation is that premature optimisation is the root of all evil. Following that means that you should stick with the code that is easiest to write / maintain / get past your code quality regime, and then look at optimisation if you have a measurable performance issue.
In your examples the memory consumption is similar because in the object case you have 10,000 references plus two doubles per reference, and in the 2D array case you have 10,000 references (the first dimension) to little arrays containing two doubles each. So both are one base reference plus 10,000 references plus 20,000 doubles.
A more efficient representation would be two arrays, where you'd have two base references plus 20,000 doubles.
double[] a = new double[10000];
double[] b = new double[10000];
I have an ArrayList that is not very memory intensive; it stores only two fields:
public class ExampleObject {
    private String string;
    private Integer integer;

    public ExampleObject(String stringInbound, Integer integerInbound) {
        string = stringInbound;
        integer = integerInbound;
    }
}
I will fill an ArrayList with these objects:
ArrayList<ExampleObject> list = new ArrayList<ExampleObject>();
For raw, hard-core performance, is it much better to use a HashSet for this? If my ArrayList grows to a very large number of items, with an index in the hundreds, will I notice a huge difference between the ArrayList of objects and the HashSet?
Although they are both Collections, I suggest that you read up on the differences between a Set and a List.
They are not used for the same purpose, so choose the one that meets your implementation requirements before thinking about performance.
It all depends on what you're doing. How is data added? How is it accessed? How often is it removed?
For example, in some situations it might even be better to have parallel arrays of String[] and int[] -- you avoid collection class overhead and the boxing of int to Integer. Depending on exactly what you're doing, that might be really great or incredibly dumb.
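A hedged sketch of the parallel-array idea (the field names are invented for illustration):
// Entry i is described by names[i] and counts[i]: no wrapper objects,
// no boxing of int to Integer, just two contiguous blocks of memory.
class ParallelArrays {
    private final String[] names;
    private final int[] counts;

    ParallelArrays(int capacity) {
        names = new String[capacity];
        counts = new int[capacity];
    }

    void set(int i, String name, int count) {
        names[i] = name;
        counts[i] = count;
    }

    String nameAt(int i) { return names[i]; }
    int countAt(int i)   { return counts[i]; }
}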
Memory consumption can have a strong effect on performance as data sets get larger. A couple of IBM researchers did a neat presentation on Building Memory-efficient Java Applications a few years back that everyone who is concerned about performance should read.
I have to store millions of X/Y double pairs for reference in my Java program. I'd like to keep memory consumption as low as possible, as well as the number of object references. So after some thinking I decided that holding the two coordinates in a tiny double array might be a good idea; its setup looks like so:
double[] node = new double[2];
node[0] = x;
node[1] = y;
I figured using the array would avoid the link between a containing class and my X and Y variables, compared to using a class like the following:
class Node {
public double x, y;
}
However, after reading about the way public fields in classes are stored, it dawned on me that fields may not actually be stored behind pointer-like indirections; perhaps the JVM simply stores these values in contiguous memory and knows how to find them without an address, thus making the class representation of my point smaller than the array.
So the question is, which has a smaller memory footprint? And why?
I'm particularly interested in whether or not class fields use a pointer, and thus have a 32-bit overhead, or not.
The latter has the smaller footprint.
Primitive types are stored inline in the containing class. So your Node requires one object header and two 64-bit slots. The array you specify uses one array header (>= an object header) plus two 64-bit slots.
If you're going to allocate 100 variables this way, then it doesn't matter so much, as it is just the header sizes which are different.
Caveat: all of this is somewhat speculative as you did not specify the JVM - some of these details may vary by JVM.
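If you want to measure rather than speculate, the OpenJDK JOL (Java Object Layout) tool can print the actual layout for your JVM. A small sketch, assuming the jol-core library is on the classpath:
import org.openjdk.jol.info.ClassLayout;

public class LayoutCheck {
    static class Node {
        double x, y;
    }

    public static void main(String[] args) {
        // Prints header size, field offsets and total instance size
        // for both representations on the running JVM.
        System.out.println(ClassLayout.parseClass(Node.class).toPrintable());
        System.out.println(ClassLayout.parseInstance(new double[2]).toPrintable());
    }
}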
I don't think your biggest problem is going to be storing the data, I think it's going to be retrieving, indexing, and manipulating it.
However, an array, fundamentally, is the way to go. If you want to save on pointers, use a one-dimensional array. (Someone has already said that.)
First, it must be stated that the actual space usage depends on the JVM you are using. It is strictly implementation specific. The following is for a typical mainstream JVM.
So the question is, which has a smaller memory footprint? And why?
The second version is smaller. An array has the overhead of the 32-bit field in the object header that holds the array's length. In the case of a non-array object, the size is implicit in the class and does not need to be represented separately.
But note that this is a fixed overhead per array object. The larger the array is, the less important the overhead is in practical terms. And the flip side of using a class rather than an array is that indexing won't work and your code may be more complicated (and slower) as a result.
A Java 2D array is actually an array of 1D arrays (etcetera), so you can apply the same analysis to arrays with higher dimensionality. The larger the size an array has in any dimension, the less impact the overhead has. The overhead in a 2x10 array will be less than in a 10x2 array. (Think it through ... 1 array of length 2 + 2 of length 10, versus 1 array of length 10 + 10 of length 2. The overhead is proportional to the number of arrays.)
I'm particularly interested in whether or not class fields use a pointer, and thus have a 32-bit overhead, or not.
(You are actually talking about instance fields, not class fields. These fields are not static ...)
Fields whose type is a primitive type are stored directly in the heap node of the object without any references. There is no pointer overhead in this case.
However, if the field types were wrapper types (e.g. Double rather than double) then there could be the overhead of a reference AND the overheads of the object header for the Double object.
I have an array of ByteBuffers (which actually represent integers). I want to separate the unique and non-unique ByteBuffers (i.e. integers) in the array. Thus I am using a HashSet of this type:
HashSet<ByteBuffer> columnsSet = new HashSet<ByteBuffer>();
I just wanted to know if a HashSet is a good way to do this. And do I pay a higher cost doing it with ByteBuffers than I would with Integers?
(Actually I am reading serialized data from a DB which needs to be written back after this operation, so I want to avoid serializing and deserializing between ByteBuffer and Integer and back!)
Your thoughts on this are appreciated.
Creating a ByteBuffer is far more expensive than reading/writing from a reused ByteBuffer.
The most efficient way to store integers is to use the int type. If you want a Set of these you can use TIntHashSet, which uses int primitives. You can do multiple read/deserialize/store operations and the reverse with O(1) pre-allocated objects.
First of all, it will work. The overhead of equals() on two ByteBuffers will definitely be higher, but perhaps not enough to offset the benefits of not having to deserialize (though, I'm not entirely sure if that would be such a big problem).
I'm pretty sure that the performance will asymptotically be the same, but a more memory-efficient solution is to sort your array, then step through it linearly and test successive elements for equality.
As an example, suppose your buffers contain the following:
1 2 5 1
Sort it:
1 1 2 5
Once you start iterating, you get ar[0].equals(ar[1]) and you know these are duplicates. Just keep going like that till n-1.
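A small sketch of that approach (ByteBuffer implements Comparable, so Arrays.sort can order the buffers by content; the duplicate handling is left as a placeholder):
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sort, then compare neighbours: equal buffers end up adjacent.
static void markDuplicates(ByteBuffer[] ar) {
    Arrays.sort(ar); // ByteBuffer compares by remaining content
    for (int i = 0; i < ar.length - 1; i++) {
        if (ar[i].equals(ar[i + 1])) {
            // ar[i] and ar[i + 1] are duplicates; handle as needed
        }
    }
}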
Collections normally operate on the equals() and hashCode() methods, so performance implications would come through the implementation of the objects stored in the collection.
Looking at ByteBuffer and Integer, one can see that the implementation of those methods in Integer is simpler (just one int comparison for equals() and return value; for hashCode()). Thus you could say a Set<ByteBuffer> has a higher cost than a Set<Integer>.
However, I can't tell you right now if this cost is higher than the serialization and deserialization cost.
In fact, I'd just go for the more readable code unless you really have a performance problem. In that case I'd just try both methods and take the faster one.
I'm new to the Java world, coming from a C++ background. I'd like to port some C++ code to Java.
The code uses Sparse vectors:
struct Feature{
int index;
double value;
};
typedef std::vector<Feature> featvec_t;
As I understand it, if one makes an object, there will be some memory overhead. So a naive implementation of Feature will carry significant overhead when there are 10-100 million Features in a featvec_t set.
How to represent this structure memory efficiently in Java?
If memory is really your bottleneck, try storing your data in two separate arrays:
int[] index and double[] value.
But in most cases with such big structures, performance (time) will be the main issue. Depending on the operations performed most often on your data (insert, delete, get, etc.), you need to choose an appropriate data structure to store objects of class Feature.
Start your exploration with the java.util.Collection interface, its subinterfaces (List, Set, etc.) and their implementations provided in the java.util package.
To avoid memory overhead for each entry, you could write a java.util.List<Feature> implementation that wraps arrays of int and double, and builds Feature objects on demand.
To have it resize automatically, you could use TIntArrayList and TDoubleArrayList from GNU Trove.
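A minimal sketch of such a wrapper (fixed capacity for brevity, and it assumes a plain Java Feature class with public fields; a real version would grow the arrays or delegate to the Trove lists mentioned above):
import java.util.AbstractList;

class Feature {
    int index;
    double value;
}

// Stores N features in one int[] and one double[]; Feature objects are
// materialized only when get() is called, so bulk storage stays flat.
class FeatureList extends AbstractList<Feature> {
    private final int[] indexes;
    private final double[] values;
    private int size;

    FeatureList(int capacity) {
        indexes = new int[capacity];
        values = new double[capacity];
    }

    void append(int index, double value) {
        indexes[size] = index;
        values[size] = value;
        size++;
    }

    @Override
    public Feature get(int i) {
        Feature f = new Feature();
        f.index = indexes[i];
        f.value = values[i];
        return f;
    }

    @Override
    public int size() {
        return size;
    }
}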
Is the question about space for the struct itself or the sparse vector? Since others have answered the former, I'll shoot for the latter...
There aren't any sparse lists/matrices in the standard Java collections to my knowledge.
You could build an equivalent using a TreeMap keyed on the index.
An object in Java (I guess) has:
sizeof(index)
sizeof(value)
sizeof(Class*) <-- pointer to concrete class
So the difference is four bytes for the pointer. If your struct is 4+8=12 bytes, that's a 33% overhead... but I can't think of another (better) way to do it.