Caching strategy for small immutable objects in Java?


I am developing an app that creates a large number of small, immutable Java objects. An example might be:
public class Point {
final int x;
final int y;
final int z;
.....
}
Where it is likely that many instances of Point will need to refer to the same (x,y,z) location.
To what extent does it make sense to try to cache and re-use such objects during the lifetime of the application? Any special tricks to handle this kind of situation?

Only when it becomes a problem. Otherwise you're just creating a useless layer of abstraction.
Either way, you could easily implement this with a PointFactory that you call to get a Point, which always returns the same object instance for any given x, y and z. But then you have to manage when points should be removed from the cache, because cached points won't be garbage collected.
I say forget about it unless it's an actual issue. Your application shouldn't depend on such a caching mechanism, which lets you add it later if necessary. So for now, maybe just use a factory that returns a new Point instance every time.
public class PointFactory{
public static Point get(int x, int y, int z){
return new Point(x, y, z);
}
}

The problem you are likely to have is making the object pool lightweight enough to be cheaper than just creating the objects. You also want the pool to be large enough that you get a fairly high hit rate.
In my experience, you are likely to have problems micro-benchmarking this. When you are creating a single object type repeatedly in a micro-benchmark, you get much better results than when creating a variety of objects in a real/complex application.
The problem with many object pool approaches is that they a) require a key object, which costs as much as or more than creating a simple object, b) involve some synchronization/locking, which again can cost as much as creating an object, and c) require an extra object when adding to the cache (e.g. a Map.Entry), meaning your hit rate has to be much better for the cache to be worthwhile.
The most lightweight, but dumb, caching strategy I know is to use an array indexed by a hash code.
e.g.
private static final int N_POINTS = 10191; // or some large prime.
private static final Point[] POINTS = new Point[N_POINTS];
public static Point of(int x, int y, int z) {
int h = hash(x,y,z); // a simple hash function of x,y,z
int index = (h & 0x7fffffff) % N_POINTS;
Point p = POINTS[index];
if (p != null && p.x == x && p.y == y && p.z == z)
return p;
return POINTS[index] = new Point(x,y,z);
}
Note: the array is not thread safe, but since Point is immutable (all its fields are final), this doesn't matter: at worst a thread misses a cached entry and creates a duplicate. The cache works on a best-effort basis and is naturally limited in size, with a very simple eviction strategy (a new point overwrites whatever occupied its slot).
For testing purposes, you can add hit/miss counters to determine the cache's effectiveness for your data set.
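As a rough, self-contained sketch of the array cache above with the hit/miss counters added: the hash function, the counter names and the PointCache wrapper class are assumptions made for illustration, not part of the original snippet.

```java
final class PointCache {
    /** Small immutable value; the final fields make racy publication safe. */
    static final class Point {
        final int x, y, z;
        Point(int x, int y, int z) { this.x = x; this.y = y; this.z = z; }
    }

    private static final int N_POINTS = 10191; // same table size as the snippet above
    private static final Point[] POINTS = new Point[N_POINTS];

    // Counters for testing only; they are deliberately not thread safe.
    static long hits, misses;

    // Hypothetical hash: any cheap mix of the three coordinates will do.
    private static int hash(int x, int y, int z) {
        return (x * 31 + y) * 31 + z;
    }

    static Point of(int x, int y, int z) {
        int index = (hash(x, y, z) & 0x7fffffff) % N_POINTS;
        Point p = POINTS[index];
        if (p != null && p.x == x && p.y == y && p.z == z) {
            hits++;
            return p;
        }
        misses++;
        return POINTS[index] = new Point(x, y, z); // evicts whatever was in the slot
    }

    public static void main(String[] args) {
        Point a = of(1, 2, 3);
        Point b = of(1, 2, 3);
        System.out.println(a == b); // the second call is served from the array
        System.out.println("hits=" + hits + " misses=" + misses);
    }
}
```

Running this against a realistic stream of coordinates, rather than a tight loop over one value, gives a more honest hit rate.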

It sounds almost like a textbook example of the Flyweight pattern.

How many instances will share the same coordinates, how many will exist at the same time, and how many will be discarded?
Reusing the objects only has benefits if a significant percentage of live objects at one time are duplicates (at least 20%, I'd say) and overall memory usage is problematic. And if objects are discarded frequently, you have to construct the cache in a way that prevents it from becoming a memory leak (probably using soft/weak references).
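A minimal sketch of a pool that does not leak, assuming Point has value-based equals() and hashCode(). The WeakPointPool class and its intern method are hypothetical names. Note that the caller still allocates a candidate Point for the lookup, so this saves retained memory, not allocations.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

final class WeakPointPool {
    static final class Point {
        final int x, y, z;
        public Point(int x, int y, int z) { this.x = x; this.y = y; this.z = z; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y && z == p.z;
        }
        @Override public int hashCode() { return (x * 31 + y) * 31 + z; }
    }

    // Keys are held weakly. The value must be weak too, otherwise the map
    // would strongly reference its own key and nothing would ever be collected.
    private static final Map<Point, WeakReference<Point>> POOL = new WeakHashMap<>();

    /** Returns the pooled instance equal to candidate, pooling candidate if none exists. */
    static synchronized Point intern(Point candidate) {
        WeakReference<Point> ref = POOL.get(candidate);
        Point existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            return existing;
        }
        POOL.put(candidate, new WeakReference<>(candidate));
        return candidate;
    }
}
```

Once no live object refers to a pooled Point any more, the garbage collector is free to reclaim it and the map entry disappears with it.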

Remember that caching these objects will influence concurrency and garbage collection in (most likely) a bad way. I wouldn't do it unless the other objects that refer to the points are long lived too.

As for most cases: it depends.
If your object is rather complex (takes a lot of time to instantiate) but can be expressed as a string, it makes sense to create and load instances through a static factory method.
This also makes sense if some representations of the object are used more often than others (in your case maybe Point(0,0,0))
e.g
private static final HashMap<String, Point> hash = new HashMap<String, Point>();
public static Point createPoint(int x, int y, int z) {
    String key = createKey(x,y,z);
    Point created = hash.get(key);
    if (created == null) {
        created = new Point(x,y,z);
        hash.put(key,created);
    }
    return created;
}
private static String createKey(int x, int y, int z) {
    // StringBuilder: a local buffer needs no synchronization
    StringBuilder buffer = new StringBuilder();
    buffer.append("x:");
    buffer.append(x);
    buffer.append("y:");
    buffer.append(y);
    buffer.append("z:");
    buffer.append(z);
    return buffer.toString();
}

Related

Lazy initialization of hashcode in Java

Why do we say that immutable objects use lazy hash code initialization? For mutable objects too, we could calculate the hash code only when it is required, which would also be lazy initialization, right?

For mutable classes, it usually doesn't make much sense to store the hashCode, as you'd have to update it every time the object is modified (or at least nullify it so you can recalculate it next time hashCode() is called).
For immutable classes, it makes a lot of sense to store the hash code - once it's calculated, it will never change (since the object is immutable), and there's no need to keep re-calculating every time hashCode() is called. As a further optimization, we can avoid calculating this value until the first time it's needed (i.e., hashCode() is called) - i.e., use lazy initialization.
There's nothing that prohibits you from doing the same on a mutable object, it's just generally not a very good idea.

The advantage of lazy initialization is that the hash code computation is deferred until it is required. Many objects never need it at all, so you save some computation, particularly when the hash is expensive to compute. Look at the example below:
class FinalObject {
private final int a, b;
public FinalObject(int value1, int value2) {
a = value1;
b = value2;
}
// not calculated at the beginning - lazy once required
private int hashCode;
@Override
public int hashCode() {
int h = hashCode; // read
if (h == 0) {
h = a + b; // calculation
hashCode = h; // write
}
return h; // return local variable instead of second read
}
}

Are all immutable objects re-usable?

The book Effective Java states that "An object can always be reused if it is immutable".
String s = "shane";
String p = "shane";
This version uses a single String instance, rather than creating a new one
each time it is executed. Furthermore, it is guaranteed that the object will be
reused by any other code running in the same virtual machine that happens to contain
the same string literal.
What about the final class below, which is also immutable? Can the Point object be reused?
public final class Point {
private final int x, y;
public Point(int x, int y) {
this.x = x;
this.y = y;
}
public int getX() { return x; }
public int getY() { return y; }
}
Can anyone give me an example of how an instance of the immutable class above can be reused? I am confused about how the reuse would occur.
I can relate to the String and Integer classes, but not to user-defined classes.

It "can" be reused, in that you could use the same object in multiple places and it would be fine. But that won't happen automatically. The Java runtime itself manages the reuse of boxed Integer objects in the range -128 to 127 (see "Integers caching in Java").
"intern"ed strings (including literals) are similarly managed by the JVM. The closest to automatic reuse you could have here would be to make the constructor private and create a factory method:
Point.create(int x, int y)
and have the implementation maintain a cache of the objects that you'd like to reuse (much as Integer effectively caches -128 to 127). But you'll have to do the work yourself.
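To make the built-in reuse concrete, here is a small demonstration. The ReuseDemo class name is made up. Only the 127 case is shown for Integer because values outside -128 to 127 may or may not be cached.

```java
final class ReuseDemo {
    public static void main(String[] args) {
        Integer a = Integer.valueOf(127);
        Integer b = Integer.valueOf(127);
        System.out.println(a == b);          // true: -128..127 is always cached

        String s = "shane";
        String p = "shane";
        System.out.println(s == p);          // true: equal literals share one instance

        String q = new String("shane");
        System.out.println(q == s);          // false: new always allocates
        System.out.println(q.intern() == s); // true: intern() returns the pooled instance
    }
}
```

Nothing comparable exists for a user-defined class, which is why the factory with its own cache is needed.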
Edit:
You'd basically have:
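// Pair stands for any pair type with value-based equals()/hashCode(),
// e.g. org.apache.commons.lang3.tuple.Pair
private static final Map<Pair<Integer, Integer>, Point> cache = new HashMap<>();
public static Point create(int x, int y) {
    Pair<Integer, Integer> key = Pair.of(x, y);
    Point p = cache.get(key); // one lookup instead of containsKey() + get()
    if (p == null) {
        p = new Point(x, y);
        cache.put(key, p);
    }
    return p;
}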
Edit:
Alternatively, add hashCode() and equals() to the Point class, and just use a HashSet. Would be simpler.
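One way the equals()/hashCode() variant might look is sketched below. It uses a HashMap<Point, Point> rather than a HashSet, on the assumption that you want the stored instance back (a set can only tell you that an equal element exists). The cache is unbounded, which is an illustrative simplification.

```java
import java.util.HashMap;
import java.util.Map;

final class Point {
    private static final Map<Point, Point> CACHE = new HashMap<>();

    private final int x, y;

    private Point(int x, int y) { // private: callers must go through create()
        this.x = x;
        this.y = y;
    }

    /** Returns the canonical instance for (x, y), creating it on first use. */
    public static synchronized Point create(int x, int y) {
        Point candidate = new Point(x, y);
        Point existing = CACHE.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    public int getX() { return x; }
    public int getY() { return y; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override public int hashCode() { return 31 * x + y; }
}
```

With this in place, Point.create(1, 2) == Point.create(1, 2) holds, at the cost of one short-lived candidate object per lookup.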

Reusable simply means that the value of the reference variable can be changed.
For example, an int variable can be reused by assigning it a new value.
An object type is a little different: the reference variable is reassigned, for example with "new", as in myframe = new JFrame().
A variable declared "final" is a constant: it cannot be reassigned.
For the class above, the reference variable has to be declared "final" if the reference itself is to be fixed, while the object's contents are already immutable because of its final fields. The difficulty is being clear about which part (the variable or the class definition) is the immutable one.

Immutability means that once an object is created, its state at creation time stays the same throughout its life. And yes, the class you showed is immutable, as are its objects: you initialize the state in the constructor, the fields are final, and there are no setters.
About the reuse: yes, you can reuse the same object over and over wherever an object of type Point is required, but for that you have to hold on to the object once it's created. As @James suggested, you can use a factory for object creation, and that factory can decide whether to create a new object or return an existing one when you ask for a Point.

immutable objects and lazy initialization.

http://www.javapractices.com/topic/TopicAction.do?Id=29
The article above is the one I am looking at. It says that immutable objects greatly simplify your program, since they:
allow hashCode to use lazy initialization, and to cache its return value
Can anyone explain what the author means by that line?
Is my class immutable if it is marked final but its instance variables are not final? And vice versa: what if the instance variables are final but the class is not?

As explained by others, because the state of the object won't change the hashcode can be calculated only once.
The easy solution is to precalculate it in the constructor and place the result in a final variable (which guarantees thread safety).
If you want to have a lazy calculation (hashcode only calculated if needed) it is a little more tricky if you want to keep the thread safety characteristics of your immutable objects.
The simplest way is to declare a private volatile int hash; and run the calculation if it is 0. You will get laziness except for objects whose hashcode really is 0 (1 in 4 billion if your hash method is well distributed).
Alternatively you could couple it with a volatile boolean but need to be careful about the order in which you update the two variables.
Finally, for extra performance, you can use the approach taken by the String class: it reads the field once into a local variable and works on that copy, which lets you drop the volatile keyword while still guaranteeing correctness. This last method is error-prone if you don't fully understand why it is done the way it is done...
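The first two options described above could be sketched roughly as follows. The class names are invented, and Objects.hash merely stands in for whatever (possibly expensive) calculation your class needs.

```java
import java.util.Objects;

/** Option 1: compute eagerly in the constructor and store in a final field. */
final class EagerHash {
    private final int a, b;
    private final int hash;

    EagerHash(int a, int b) {
        this.a = a;
        this.b = b;
        this.hash = Objects.hash(a, b);
    }

    @Override public int hashCode() { return hash; }

    @Override public boolean equals(Object o) {
        return o instanceof EagerHash && ((EagerHash) o).a == a && ((EagerHash) o).b == b;
    }
}

/** Option 2: compute lazily into a volatile field, with 0 meaning "not computed yet". */
final class LazyVolatileHash {
    private final int a, b;
    private volatile int hash;

    LazyVolatileHash(int a, int b) {
        this.a = a;
        this.b = b;
    }

    @Override public int hashCode() {
        int h = hash;
        if (h == 0) {               // recomputed on every call if the real hash is 0
            h = Objects.hash(a, b);
            hash = h;
        }
        return h;
    }

    @Override public boolean equals(Object o) {
        return o instanceof LazyVolatileHash
                && ((LazyVolatileHash) o).a == a && ((LazyVolatileHash) o).b == b;
    }
}
```

Two threads may both compute the lazy value, but since the object is immutable they compute the same number, so the duplicate work is harmless.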

If your object is immutable, it can't change its state, and therefore its hash code can't change. That allows you to calculate the value when you first need it and to cache it, since it will always stay the same. It is in fact a very bad idea to implement your own hashCode function based on mutable state, since e.g. HashMap assumes that the hash of a key can't change, and lookups will break if it does.
The benefit of lazy initialization is that the hash code calculation is delayed until it is required. Many objects don't need it at all, so you save some calculations. Expensive hash calculations, such as on long Strings, benefit especially.
class FinalObject {
private final int a, b;
public FinalObject(int value1, int value2) {
a = value1;
b = value2;
}
// not calculated at the beginning - lazy once required
private int hashCode;
@Override
public int hashCode() {
int h = hashCode; // read
if (h == 0) {
h = a + b; // calculation
hashCode = h; // write
}
return h; // return local variable instead of second read
}
}
Edit: as pointed out by @assylias, unsynchronized / non-volatile code is only guaranteed to work if there is a single read of hashCode, because any later read of that field could return 0 even though the first read already saw a non-zero value. The version above fixes the problem.
Edit2: replaced with more obvious version, slightly less code but roughly equivalent in bytecode
public int hashCode() {
int h = hashCode; // only read
return h != 0 ? h : (hashCode = a + b);
// ^- just a (racy) write to hashCode, no read
}

What that line means is, since the object is immutable, then the hashCode has to only be computed once. Further, it doesn't have to be computed when the object is constructed - it only has to be computed when the function is first called. If the object's hashCode is never used then it is never computed. So the hashCode function can look something like this:
@Override public int hashCode(){
synchronized (this) {
if (!this.computedHashCode) {
this.hashCode = expensiveComputation();
this.computedHashCode = true;
}
}
return this.hashCode;
}

And to add to other answers.
An immutable object cannot be changed. For primitive types such as int, the final keyword is enough to fix the value. But for object types it only fixes the reference, not the object's state; immutability has to be built into your class's implementation:
The following code would result in a compilation error, because you are trying to reassign a final reference to an object.
final MyClass m = new MyClass();
m = new MyClass();
However this code would work.
final MyClass m = new MyClass();
m.changeX();
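Put together as something you can compile, assuming a trivial MyClass whose changeX() increments a counter (the body of MyClass and the FinalDemo wrapper are invented for illustration):

```java
final class FinalDemo {
    static final class MyClass {
        private int x;            // not final, so the state can change
        void changeX() { x++; }
        int getX() { return x; }
    }

    public static void main(String[] args) {
        final MyClass m = new MyClass();
        // m = new MyClass();     // would not compile: m is final
        m.changeX();              // fine: only the reference is fixed, not the object
        System.out.println(m.getX()); // prints 1
    }
}
```

Making MyClass itself immutable would mean declaring x final and removing changeX().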

Using Java's contains(Object) method for Collections (eg HashSet) without actually having the object

I recognise that sounds a bit mad but to explain what I mean:
I have a Collection (eg HashSet) containing several quite slow initialisation objects and I want to see if the Collection already contains a particular object. Let's use Vector3d as an example (I know that is not expensive to initialise).
So the Collection contains:
Vector3d(1,1,1)
Vector3d(2,1,1)
Vector3d(3,1,1)
And I want to ask the Collection the question "does the Collection contain a Vector3d with x=2, y=1 and z=1?" (i.e. I already know the data the .contains() method would hash against). So I could create a new Vector3d(2,1,1) and then use .contains() on that, but as I said the object's initialisation is slow; or I could run through the entire Collection checking manually (which is what I'm doing now), but that's (as I understand it) slower than .contains() since it doesn't use the hash. Is there a better way to do this?
The objects in question are mutable but the data that the equals method is based upon is not. (In my case they are blocks at x,y,z co-ordinates, the contents of the blocks may change but the x,y,z co-ordinates will not)

ArrayList is the correct data structure if you only need to iterate through all of your elements or access your elements by position. It is the wrong data structure for anything else.
What you are trying to do is answer the containment question quickly, which is what Sets and Maps are for. It would make much more sense to create a separate, cheaper Vector3dKey class with the simple hash function you want and insert your expensive objects into a Map< Vector3dKey, Vector3d > at the same time as, or instead of, an ArrayList< Vector3d >. Java obviously won't keep two copies of your expensive vectors, just copies of the references. Of course, this whole scheme breaks down if your Vectors are mutable.

Using the .contains() method on an ArrayList will result in the equals method being invoked against each and every instance in the ArrayList.
While that will work for you, it may not prove beneficial for extremely large ArrayLists. If performance is a problem, you may wish to hold a HashSet containing references to the Vector3d objects. Invoking contains on a HashSet (or any Set) is drastically faster.

If you REALLY have to use a list (and not a hash), you might as well iterate over the list, retrieve each object and check its attributes manually. That will be pretty much as quick as "Contains".
If you were going to use a hash instead of a list then you should use a different object for comparison. For instance, if you use a HashMap with your above example your keys could be the following strings:
"1,1,1","2,1,1","3,1,1"
This would make a lookup instant and easy. If the list could contain other types of objects, maybe "Vector3d(1,1,1)" would be a better string. It's easy to re-create without being expensive or adding code complexity.
If you were using a list because you needed to retain order, look at LinkedHashMap.
Also I suggest you create a function to derive the string from the object (when inserting) or from the parameters (when searching) rather than distributing the functionality around your code, this is the kind of thing you are likely to need to change or expand on later.

Code based on Judge Mental's answer
package mygame;
import java.util.HashMap;
import java.util.Map;
public class Main{
public Main(){
Map<CheapKey,ExpensiveClass> map=new HashMap< CheapKey, ExpensiveClass>();
for(int i=0;i<100;i++){
ExpensiveClass newExpensiveClass;
newExpensiveClass=new ExpensiveClass(i,0,0);
map.put(newExpensiveClass.getKey(), newExpensiveClass);
}
CheapKey testKey1=new CheapKey(1,0,0);
CheapKey testKey2=new CheapKey(1,0,1);
System.out.println(map.containsKey(testKey1)); //there is an object under key1
System.out.println(map.containsKey(testKey2)); //there isn't an object under key2
ExpensiveClass retrievedExpensiveClass=map.get(testKey1);
}
public static void main(String[] args) {
Main main=new Main();
}
protected class ExpensiveClass{
int x;
int y;
int z;
public ExpensiveClass(int x, int y, int z){
this.x=x;
this.y=y;
this.z=z;
for(int i=0;i<10000;i++){
//slow initialisation
}
}
public CheapKey getKey(){
return new CheapKey(x,y,z);
}
}
protected class CheapKey{
int x;
int y;
int z;
public CheapKey(int x, int y, int z){
this.x=x;
this.y=y;
this.z=z;
}
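@Override
public boolean equals(Object obj) {
if (obj == null) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
final CheapKey other = (CheapKey) obj;
// compare the coordinates; returning true unconditionally would make all keys equal
return this.x == other.x && this.y == other.y && this.z == other.z;
}
@Override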
public int hashCode() {
int hash = 7;
hash = 79 * hash + this.x;
hash = 79 * hash + this.y;
hash = 79 * hash + this.z;
return hash;
}
}
}

The contains method invokes the .equals method of an object, so as long as the implementation of .equals for that class compares the values contained in the objects rather than their references, using contains will work.
http://docs.oracle.com/javase/7/docs/api/java/util/Collection.html#contains(java.lang.Object)
Edit: I misread your question a bit. I think it comes down to how big the list is vs how long the initialisation takes. If the list is short, iterate through it and manually check. However, if the list is likely to be long, creating the objects and using .contains could well be more efficient.

ArrayList.contains doesn't use hashing; it's exactly the same speed as the manual check. It makes no difference either way.
Using a fake object class is doable, but almost certainly a code smell.

What's the fastest Java collection for single threaded Contains(Point(x,y)) functionality?

In my application I need to check a collection of 2D coordinates (x,y) to see if a given coordinate is in the collection, it needs to be as fast as possible and it will only be accessed from one thread.
( It's for collision checking )
Can someone give me a push in the right direction?

The absolute fastest I can think of would be to maintain a 2D matrix of those points:
//just once
int[][] occurrences = new int[X_MAX][Y_MAX];
for (Point p : points ) {
occurrences[p.x][p.y]++;
}
//sometime later
if ( occurrences[x][y] != 0 ) {
//contains Point(x, y)
}
If you don't care how many there are, just a boolean matrix would work. Clearly this would only be fast if the matrix was created just once, and maybe updated as Points are added to the collection.
In short, the basic Collections aren't perfect for this (though a HashSet would come close).
Edit
This could be easily adapted to be a Set<Point> if you don't find a library that does this for you already. Something like this:
public class PointSet implements Set<Point> {
private final boolean[][] data;
public PointSet(int xSize, int ySize) {
data = new boolean[xSize][ySize];
}
@Override
public boolean add(Point e) {
boolean hadIt = data[e.x][e.y];
data[e.x][e.y] = true;
return !hadIt; // Set.add returns true only if the element was not already present
}
@Override
public boolean contains(Object o) {
Point p = (Point) o;
return data[p.x][p.y];
}
//...other methods of Set<Point>...
}

I would go using some Trove collections data structures.
If your points are stored as a pair of ints or a pair of floats, you can pack them into a long: 32 bits for the x-coordinate and 32 bits for the y-coordinate. Then you can use a TLongHashSet, which is a HashSet optimized for primitive data (it will be faster and consume less memory than the standard Java collections).
If you have int coordinates it would be something like
static private long computeKey(int h1, int h2)
{
    // mask h2 so that a negative value is not sign-extended over the upper 32 bits
    return ((long) h1) << 32 | (h2 & 0xFFFFFFFFL);
}
to compute the key and then use it
TLongHashSet set = new TLongHashSet();
set.add(computeKey(x, y));      // add(long)
set.contains(computeKey(x, y)); // contains(long)
// addAll(long[]) and containsAll(long[]) work on whole arrays of keys
if you have float values you can do the same thing, but you have to pack float bits inside the long.
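For float coordinates, the packing might look like the sketch below. The FloatKey class and the unpack helpers are hypothetical. Float.floatToIntBits gives each float its 32-bit pattern, with all NaN values collapsed to one pattern, while 0.0f and -0.0f yield different keys.

```java
final class FloatKey {
    /** Packs the bit patterns of x (upper 32 bits) and y (lower 32 bits) into one long. */
    static long computeKey(float x, float y) {
        long hi = Float.floatToIntBits(x);
        long lo = Float.floatToIntBits(y) & 0xFFFFFFFFL; // avoid sign extension
        return (hi << 32) | lo;
    }

    static float unpackX(long key) {
        return Float.intBitsToFloat((int) (key >>> 32));
    }

    static float unpackY(long key) {
        return Float.intBitsToFloat((int) key);
    }

    public static void main(String[] args) {
        long key = computeKey(1.5f, -2.25f);
        System.out.println(unpackX(key) + ", " + unpackY(key)); // 1.5, -2.25
    }
}
```

Because the key compares bit patterns, two floats that are merely close to each other produce different keys; that is usually what you want for exact grid positions, but not for approximate collision tests.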

HashSet. It's O(1) on average. If you want true O(1), you can make a wrapper for your object which has a reference to the collection it belongs to. That way you can just compare that reference with the collection you have.

How often do you have to update the collection compared with searching it? You should choose an appropriate data structure based on that.
If your points are Comparable (or you supply a Comparator; java.awt.geom.Point2D, for instance, does not implement Comparable), a TreeSet is an option. It is backed by a red-black tree, so contains, add and remove take O(log n) time.
If you think you're going to be doing a fair amount of updates to the structure, take a look at the skip list. It gives O(log n) expected time per operation (NOTE: this is an expected bound over the operations you do; there is no guarantee about the runtime of a single operation).

You can try some sort of sorted set, like a TreeSet, since a lookup in it works like a binary search.
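A sorted set needs an ordering for the points. A minimal sketch, assuming a simple int-based Point and an arbitrary "x first, then y" comparator (both made up for this example):

```java
import java.util.Comparator;
import java.util.TreeSet;

final class SortedPoints {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // TreeSet decides membership with the comparator, not with equals().
    static final Comparator<Point> BY_X_THEN_Y =
            Comparator.<Point>comparingInt(p -> p.x).thenComparingInt(p -> p.y);

    public static void main(String[] args) {
        TreeSet<Point> set = new TreeSet<>(BY_X_THEN_Y);
        set.add(new Point(1, 2));
        set.add(new Point(3, 4));
        System.out.println(set.contains(new Point(3, 4))); // true
        System.out.println(set.contains(new Point(4, 3))); // false
    }
}
```

Lookups are O(log n) here, so for pure containment checks a HashSet with a good hashCode() is usually the faster choice; the sorted set pays off when you also need ordered traversal or range queries.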
