Java - Check if reference to object in Map exists


A few weeks back I wrote a Java class with the following behavior:
Each object contains a single final integer field
The class contains a static Map (Key: Integer, Content: MyClass)
Whenever an object of the class is instantiated, a look-up is done: if an object with the wanted integer field already exists in the static map, it is returned; otherwise one is created and put in the map.
As code:
public class MyClass
{
private static Map<Integer, MyClass> map;
private final int field;
static
{
map = new HashMap<>();
}
private MyClass(int field)
{
this.field = field;
}
public static MyClass get(int field)
{
synchronized (map)
{
return map.computeIfAbsent(field, MyClass::new);
}
}
}
This way I can be sure that only one object exists for each integer value of field. However, I'm concerned that this will prevent the GC from collecting objects I no longer need, since the map always holds a reference to them...
If I wrote a function with a loop like this:
public void myFunction() {
for (int i = 0; i < Integer.MAX_VALUE; i++) {
MyClass c = MyClass.get(i);
// DO STUFF
}
}
I would end up with Integer.MAX_VALUE objects in memory after calling the method. Is there a way I can check whether references to objects in the map exist and otherwise remove them?

This looks like a typical case of the multiton pattern: You want to have at most one instance of MyClass for a given key. However, you also seem to want to limit the amount of instances created. This is very easy to do by lazily instantiating your MyClass instances as you need them. Additionally, you want to clean up unused instances:
Is there a way I can check whether references to objects in the map exist and otherwise remove them?
This is exactly what the JVM's garbage collector is for. There is no reason to implement your own form of "garbage collection" when the Java core library already provides tools for marking certain references as "not strong", i.e. references that do not by themselves prevent the object they refer to from being garbage-collected, unlike a strong (in Java, a "normal") reference.
Implementation using Reference objects
Instead of a Map<Integer, MyClass>, you should use a Map<Integer, WeakReference<MyClass>> or a Map<Integer, SoftReference<MyClass>>. Both WeakReference and SoftReference allow the referenced MyClass instances to be garbage-collected once there are no strong (read: "normal") references to them. The difference between the two is that a weak reference is cleared as soon as the garbage collector finds its referent only weakly reachable, while a soft reference is only cleared when the JVM "has to", i.e. at the garbage collector's discretion, typically in response to memory pressure and in any case before an OutOfMemoryError is thrown (see related SO question).
Plus, there is no need to synchronize on your entire Map: you can simply use a ConcurrentHashMap (which implements ConcurrentMap), which handles concurrent access much better than locking the whole map for every operation. Therefore, your MyClass.get(int) could look like this:
private static final ConcurrentMap<Integer, Reference<MyClass>> INSTANCES = new ConcurrentHashMap<>();
public static MyClass get(final int field) {
// Holds a strong reference to the result so the GC cannot clear it
// between compute(...) returning and this method returning
final MyClass[] result = new MyClass[1];
// ConcurrentHashMap.compute(...) is atomic <https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html#compute-K-java.util.function.BiFunction->
INSTANCES.compute(field, (key, oldValue) -> {
final MyClass existing = (oldValue == null) ? null : oldValue.get();
if (existing != null) {
// The existing instance has not yet been deleted; Re-use it
result[0] = existing;
return oldValue;
}
// No instance has yet been created, or the old one has already been
// deleted; Create a new instance and a new reference to it
result[0] = new MyClass(key);
return new SoftReference<>(result[0]);
});
return result[0];
}
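Letting the GC clear instances still leaves the cleared Reference objects behind as map entries. The following is a minimal sketch, assuming Java 8+ and weak (rather than soft) references, of purging those stale entries with a java.lang.ref.ReferenceQueue. The class name Canonical and the KeyedRef helper are hypothetical stand-ins for MyClass, not part of the answer above.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for MyClass: one canonical instance per int key.
final class Canonical {
    private static final ConcurrentMap<Integer, KeyedRef> INSTANCES = new ConcurrentHashMap<>();
    // The GC enqueues references here after clearing them.
    private static final ReferenceQueue<Canonical> QUEUE = new ReferenceQueue<>();

    // A WeakReference that remembers its map key so the entry can be removed later.
    private static final class KeyedRef extends WeakReference<Canonical> {
        final int key;

        KeyedRef(int key, Canonical referent) {
            super(referent, QUEUE);
            this.key = key;
        }
    }

    private final int field;

    private Canonical(int field) {
        this.field = field;
    }

    int field() {
        return field;
    }

    static Canonical get(int field) {
        purgeStaleEntries();
        // One-element holder keeps a strong reference to the result.
        final Canonical[] result = new Canonical[1];
        INSTANCES.compute(field, (key, oldRef) -> {
            Canonical existing = (oldRef == null) ? null : oldRef.get();
            if (existing != null) {
                result[0] = existing; // still alive: reuse it
                return oldRef;
            }
            result[0] = new Canonical(key); // absent or collected: create it
            return new KeyedRef(key, result[0]);
        });
        return result[0];
    }

    // Removes map entries whose referents have been garbage-collected.
    private static void purgeStaleEntries() {
        KeyedRef ref;
        while ((ref = (KeyedRef) QUEUE.poll()) != null) {
            // remove(key, value) only removes the entry if it still maps to this
            // exact stale reference, so a fresh replacement is left untouched.
            INSTANCES.remove(ref.key, ref);
        }
    }
}
```

Purging happens lazily on each get(...) call, so entries for collected instances linger until the next lookup; a dedicated cleanup thread blocking on QUEUE.remove() would be the other common option.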
Finally, in a comment above, you mentioned that you would "prefer to cache maybe up to say 1000 objects and after that only cache, what is currently required/referenced". Although I personally see little (good) reason for it, it is possible to perform eager instantiation on the "first"† 1000 objects by adding them to the INSTANCES map on creation:
private static final ConcurrentMap<Integer, Reference<MyClass>> INSTANCES = createInstanceMap();
private static ConcurrentMap<Integer, Reference<MyClass>> createInstanceMap() {
// The set of keys to eagerly initialize instances for
final Stream<Integer> keys = IntStream.range(0, 1000).boxed();
final Collector<Integer, ?, ConcurrentMap<Integer, Reference<MyClass>>> mapFactory = Collectors
.toConcurrentMap(Function.identity(), key -> new SoftReference<>(new MyClass(key)));
return keys.collect(mapFactory);
}
†How you define which objects are the "first" ones is up to you; Here, I'm just using the natural order of the integer keys because it's suitable for a simple example.

Your function for examining your cache is cringeworthy. First, as you said, it creates all the cache objects. Second, it iterates Integer.MAX_VALUE times.
Better would be:
public void myFunction() {
for(MyClass c : map.values()) {
// DO STUFF
}
}
To the issue at hand: Is it possible to find out whether an Object has references to it?
Yes. It is possible. But you won't like it.
http://docs.oracle.com/javase/1.5.0/docs/guide/jvmti/jvmti.html
jvmtiError
IterateOverReachableObjects(jvmtiEnv* env,
jvmtiHeapRootCallback heap_root_callback,
jvmtiStackReferenceCallback stack_ref_callback,
jvmtiObjectReferenceCallback object_ref_callback,
void* user_data)
Loop over all reachable objects in the heap. If a MyClass object is reachable, then, well, it is reachable.
Of course, by storing the object in your cache, you are making it reachable, so you'd have to change your cache to WeakReferences, and see if you can exclude those from the iteration.
And you're no longer using pure Java, and JVMTI may not be supported by all VMs.
As I said, you won't like it.

Related

Association Object-Enum, which is more efficient?

Let's say I have the following enum:
public enum Example{
A,
B,
C,
D,
E;
}
I need an association between an Object and the enum above.
In my specific case, an object must have exactly one Example association, except that an object may eventually be associated with both Example.B and Example.C.
Right now my solution is a wrapper object holding five booleans, one per enum constant. When a boolean is true, there is an association with the corresponding Example constant.
In the end I have an association between the Object and the Wrapper.
The thing is that I believe it's unnecessary to carry around these five booleans, because in most cases only one boolean will be true and in very few cases two booleans will be true, but never more than two.
Then I thought that maybe an association between an Object and an ArrayList<Example> would be more appropriate. Or maybe even better an array Example[] of size 2.
What do you think?
Please, if you have any different suggestions let me know.

You could add a field of type Set<Example> to the class you want to associate with one or more Examples, and add to that set the Example values your type is associated with.
Code Example
public class MyObject {
public static MyObject createAssociatedWithBAndC() {
return new MyObject(Example.B, Example.C);
}
public static MyObject create(Example example) {
return new MyObject(example);
}
private final Set<Example> examples = new HashSet<>();
private MyObject(Example... examples) {
for (Example example : examples) {
this.examples.add(example);
}
}
}
And in case you really, really need to avoid using too much memory, try it this way (reflecting your requirements), which uses null to mean "associated with both B and C":
public class MyObject {
private final Example example;
private MyObject(Example example) {
this.example = example;
}
public Set<Example> getExamples() {
return example == null
? EnumSet.of(Example.B, Example.C)
: EnumSet.of(example);
}
}
Now each object holds either one Example or none, where none means "associated with B and C". And if instantiating the EnumSet on every call is also too expensive, try:
public class MyObject {
private final Set<Example> examples;
private MyObject(Example example) {
this.examples = example == null
? EnumSet.of(Example.B, Example.C)
: EnumSet.of(example);
}
public Set<Example> getExamples() {
return examples;
}
}

What is the point of having boolean values if you then transform them into Example enum values?
This idea of yours is better:
Then I thought that maybe an association between an Object and an ArrayList would be more appropriate. Or maybe even better an array Example[] of size 2.
However, arrays have limitations.
An array whose number of used slots varies, as in your case, is harder to manipulate because you have to keep track of how many elements it actually contains (generally with an int counter).
Besides, if the maximum size ever grows from 2 to 3, you have to change its declaration.
I would prefer a List or a Set implementation, and since enum constants are unique values, a Set fits naturally. You could use an EnumSet.
You could instantiate one like this:
EnumSet<Example> examples = EnumSet.of(Example.A, Example.B);
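A minimal sketch of the EnumSet approach, assuming the rule from the question (exactly one constant, or exactly B and C together). The class name Tagged and the validation in its constructor are illustrative additions, not part of either answer.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

enum Example { A, B, C, D, E }

// Hypothetical holder associating an object with one or two Example values.
final class Tagged {
    private static final Set<Example> B_AND_C = EnumSet.of(Example.B, Example.C);

    private final Set<Example> examples;

    Tagged(Set<Example> examples) {
        // Allowed: exactly one constant, or exactly {B, C}.
        boolean valid = examples.size() == 1 || examples.equals(B_AND_C);
        if (!valid) {
            throw new IllegalArgumentException("Unsupported association: " + examples);
        }
        // Defensive, read-only copy; EnumSet stores its contents as a compact bit vector.
        this.examples = Collections.unmodifiableSet(EnumSet.copyOf(examples));
    }

    Set<Example> getExamples() {
        return examples;
    }

    boolean isAssociatedWith(Example example) {
        return examples.contains(example);
    }
}
```

Because EnumSet is backed by a bit vector, it stays about as small as the five-boolean wrapper while still reading like an ordinary Set.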

Concurrent cache using WeakReference's throws an NPE

I need a concurrent cache of objects where each instance wraps a unique id (and maybe some extra information, omitted for simplicity in the code fragment below) and no more objects can be created than the number of corresponding ids. I also need the objects to be GC'ed as soon as no other object references them (i.e. keep the memory footprint as low as possible), so I want to use WeakReferences, not SoftReferences.
In the example factory method below, T is not a generic type; instead, it can be thought of as some arbitrary class with an id field of type String, where all ids are unique. Each value (of type WeakReference<T>) is mapped to the corresponding id:
static final ConcurrentMap<String, WeakReference<T>> INSTANCES = new ConcurrentHashMap<>();
@NotNull
public static T from(@NotNull final String id) {
final AtomicReference<T> instanceRef = new AtomicReference<>();
final T newInstance = new T(id);
INSTANCES.putIfAbsent(id, new WeakReference<>(newInstance));
/*
* At this point, the mapping is guaranteed to exist.
*/
INSTANCES.computeIfPresent(id, (k, ref) -> {
final T oldInstance = ref.get();
if (oldInstance == null) {
/*
* The object referenced by ref has been GC'ed.
*/
instanceRef.set(newInstance);
return new WeakReference<>(newInstance);
}
instanceRef.set(oldInstance);
return ref;
});
return instanceRef.get();
}
The subject of the WeakReferences themselves needing to be removed once they are cleared (i.e. once the referent has been GC'ed) is out of scope for this question; in the production code, this is implemented using reference queues.
AtomicReference is used solely to pass a value out of the lambda (which is executed in the same thread as the factory method itself).
Now, the question.
After a couple of weeks of the code running successfully, I've received an NPE originating from the extra null checks IntelliJ IDEA adds for @NotNull annotations:
java.lang.IllegalStateException: @NotNull method com/example/T.from must not return null
In practice, this means that the instanceRef value wasn't set in either of the branches, or the remapping function passed to computeIfPresent(...) wasn't called at all.
The only possibility for a race condition I see is the map entry being removed (by a separate thread processing the reference queue of GC'ed instances) somewhere between the putIfAbsent(...) and computeIfPresent(...) calls.
Is there any extra room for a race condition I am missing?

You must remember that it's not only other threads that can interfere, but also the GC. Consider this fragment:
instanceRef.set(oldInstance);
return ref;
});
// Here!!!!!
return instanceRef.get();
What do you think would be the effect if a GC kicked in at the Here point?
I suspect your fault lies in the @NotNull, because this method can return null.
Added - Logic
If the final instanceRef.get() is returning null (as is implied) then the following statements can be made.
The key was present and the oldInstance had been GC'd. A certainly non-null newInstance is recorded.
// This line MUST be executed.
instanceRef.set(newInstance);
The key was present and the oldInstance had not been GC'd. A certainly non-null oldInstance is recorded.
// This line MUST be executed.
instanceRef.set(oldInstance);
The key was NOT present.
Therefore, the problem can only occur if the key is present when putIfAbsent is called but gone by the time computeIfPresent is executed, i.e. if the entry is deleted between the putIfAbsent and the computeIfPresent calls. However, finding a route that returns null when no deletion is occurring is difficult.
Possible Solution
You could, perhaps, ensure that the item being referenced is always recorded in the reference.
@NotNull
public static Thing fromMe(@NotNull final String id) {
// Keep track of the thing I've created (if any)
// Use AtomicReference as a mutable final.
// NB: Also delays GC as a hard reference is held.
final AtomicReference<Thing> thing = new AtomicReference<>();
// Make the map entry if not exists.
INSTANCES.computeIfAbsent(id,
// New one only made if not present.
r -> new WeakReference<>(newThing(thing, id)));
// Grab it - whatever it's contents.
// NB: Parallel deletions will cause a NPE here.
trackThing(thing, INSTANCES.get(id).get());
// Has it been GC'd
if (thing.get() == null) {
// Make it again!
INSTANCES.put(id, new WeakReference<>(newThing(thing, id)));
}
return thing.get();
}
// Makes a new Thing - keeping track of the new one in the reference.
static Thing newThing(AtomicReference<Thing> thing, String id) {
// Make the new Thing.
return trackThing(thing, new Thing(id));
}
// Tracks the Thing in the Atomic.
static Thing trackThing(AtomicReference<Thing> thing, Thing it) {
// Keep track of it.
thing.set(it);
return it;
}
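An alternative sketch, not taken from either post: do the lookup-or-create in a single ConcurrentHashMap.compute(...) call, so no other thread can remove the mapping between two separate map operations, and keep a strong reference to the result in a local holder so the GC cannot clear it before it is returned. Thing is a hypothetical stand-in for the question's T, and reference-queue cleanup is omitted, as in the question.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the question's T: wraps a unique id.
final class Thing {
    private final String id;

    Thing(String id) {
        this.id = id;
    }

    String id() {
        return id;
    }
}

final class ThingCache {
    static final ConcurrentMap<String, WeakReference<Thing>> INSTANCES = new ConcurrentHashMap<>();

    static Thing from(final String id) {
        // Strong reference captured inside the lambda; it survives after compute returns.
        final Thing[] holder = new Thing[1];
        // compute(...) runs atomically per key whether or not the key is present,
        // so there is no window between a "put" and a "check" for a remover to slip into.
        INSTANCES.compute(id, (k, ref) -> {
            Thing existing = (ref == null) ? null : ref.get();
            if (existing != null) {
                holder[0] = existing; // still alive: reuse it
                return ref;
            }
            holder[0] = new Thing(k); // absent or already collected: create a new one
            return new WeakReference<>(holder[0]);
        });
        return holder[0]; // both branches assign it, so this is never null
    }
}
```

One design caveat: the thread draining the reference queue should remove entries with INSTANCES.remove(id, staleRef) rather than INSTANCES.remove(id), so it cannot delete a fresh mapping that replaced the stale one.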

Change the state of an object that is stored in a set

Is there any good advice how to deal with mutable objects that are stored in Sets?
Some objects may define their equality (and hashCode) based on their internal state. When such objects are stored in Sets, in- or outside the controlled code, a mutation of the state may lead to inconsistency in the Set.
Is there any "best practice" to avoid or deal with that, aside from "don't do it"?
Example code:
static class A {
String s;
public boolean equals(Object o) {
return s.equals(((A)o).s);
}
public int hashCode() {
return s.hashCode();
}
public String toString() {
return s;
}
}
public static void main(String[] args) {
A a0 = new A();
a0.s = "Hello";
A a1 = new A();
a1.s = "World";
HashSet<A> set = new HashSet<A>();
set.add(a0);
set.add(a1);
System.out.println(set);
a0.s = "World";
System.out.println(set);
}

At one point our development team ran into this problem a lot when working with collections of entities that only got their key when they were stored to the database. Their hashCode/equals depended on that key...
The solution we came up with was something along these lines:
public static <P> void rearrange(Set<P> set) {
HashSet<P> temp = new HashSet<P>();
temp.addAll(set);
set.clear();
set.addAll(temp);
}
Another idea was a Set implementation decorating a HashSet, but we quickly decided this would cause more problems in the long run. For the most part, the above method is executed by "framework" code transparently to the developer, but I am still not particularly happy with this solution.

The Set interface doesn't provide any method for getting a value, and there is a reason for that.
Java Set API
So it doesn't expect you to get a value, modify it and then add it again. In your code, that's what you are doing: you modified a value stored in the Set.
So ideally you should avoid it, but if you do want to modify an object inside a Set, follow these steps:
Remove the object from the Set
Modify it (or create a new, modified object)
Add the modified object
This way the hashCode/equals contract will be maintained.
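A minimal sketch of the remove-modify-add steps, reusing the A class from the question but with a null-safe equals/hashCode (Java 7+ for java.util.Objects). The helper name updateInSet is hypothetical. The key point is that the element is removed while its hash code still matches the bucket it was stored in.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Mutable class whose equality depends on its state, as in the question.
final class A {
    String s;

    A(String s) {
        this.s = s;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof A && Objects.equals(s, ((A) o).s);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(s);
    }

    @Override
    public String toString() {
        return s;
    }
}

final class SetMutation {
    // Removes the element, applies the new state, then re-adds it.
    // Returns false if the element was not in the set to begin with.
    static boolean updateInSet(Set<A> set, A element, String newValue) {
        if (!set.remove(element)) { // must happen BEFORE the state changes
            return false;
        }
        element.s = newValue;
        set.add(element); // no-op if an equal element is already present
        return true;
    }
}
```

If the new state makes the element equal to another member (for example, changing "Hello" to "World" in the question's example), the re-add is a no-op and the set simply ends up with one element instead of holding two equal ones.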

When should I create a class for a Map key?

I'm using Java 6.
Suppose I have a class whose instances I would like to save in a map. Later on, I would like to retrieve instances using only the "key fields". I'll ignore field modifiers, getters, and setters for conciseness.
class A {
String field1;
String field2;
String field3;
String field4;
//more fields
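public int hashCode(){
//uses only field1 and field2
return 31 * (field1 == null ? 0 : field1.hashCode()) + (field2 == null ? 0 : field2.hashCode());
}
public boolean equals(Object o){
//uses only field1 and field2
if (!(o instanceof A)) return false;
A other = (A) o;
return (field1 == null ? other.field1 == null : field1.equals(other.field1))
&& (field2 == null ? other.field2 == null : field2.equals(other.field2));
}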
}
Since Java's standard API doesn't have a MultiKeyMap and I don't want to use 3rd party libraries, I have a choice of
1) creating a new class KeyA to represent the key of a map
2) use A itself as the key and populate only the "key fields" when I need to retrieve objects from a map
3) nest the maps, e.g. HashMap<String, HashMap<String, A>>
4) other workarounds
What do people normally use and when?

Given your recent edit, you should be fine to use instances of class A as keys in this situation. Lookups will be done based on the semantics of equals() and hashCode(), so this will cause instances to be retrieved by only the "key fields". Hence the following code would work as you intend:
final Map<A, String> map = new HashMap<A, String>();
final A first = new A("fe", "fi", "fo", "fum");
map.put(first, "success");
// later on
final A second = new A ("fe", "fi", "foo", "bar");
System.out.println(map.get(second)); // prints "success";
Having said that, your description of option 2 makes me a little concerned that this might not be the most sensible option. If you create a Map<A, String>, that's a mapping from instances of class A to strings. Yet your second point implies that you want to think of it as a mapping from pairs of key fields to strings. If you're usually going to look up values based on a couple of "raw" strings, then I'd advise against this. It feels wrong (to me) to create a "fake" instance of A just to do a lookup, so in this case you probably should create a key class that embodies the pair of strings as described in option 1. (You could even embed instances of these within your A objects to hold the key fields.)
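A minimal sketch of option 1, written to stay Java 6 compatible since the question targets Java 6 (no diamond operator, no java.util.Objects): an immutable KeyA holding the two key fields, with equals and hashCode defined over both. KeyA and KeyDemo are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Immutable key made of the two "key fields" of A.
final class KeyA {
    private final String field1;
    private final String field2;

    KeyA(String field1, String field2) {
        this.field1 = field1;
        this.field2 = field2;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof KeyA)) return false;
        KeyA other = (KeyA) o;
        return eq(field1, other.field1) && eq(field2, other.field2);
    }

    @Override
    public int hashCode() {
        int h = (field1 == null) ? 0 : field1.hashCode();
        return 31 * h + ((field2 == null) ? 0 : field2.hashCode());
    }

    // Null-safe equality (java.util.Objects.equals only exists from Java 7).
    private static boolean eq(Object a, Object b) {
        return (a == null) ? b == null : a.equals(b);
    }
}

final class KeyDemo {
    static Map<KeyA, String> buildMap() {
        Map<KeyA, String> map = new HashMap<KeyA, String>();
        map.put(new KeyA("fe", "fi"), "success");
        return map;
    }
}
```

Because KeyA is immutable, its hash code cannot change while it sits in the map, which sidesteps the mutation problem described further down.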
There's a similar argument for or against option 3, too. If the strings really are conceptually hierarchical, then it might well make sense. For example, if field1 was Country, and field2 was Town, one could definitely argue that the nested maps make sense - you have a mapping from country, to the map of Town->A relations within that country. But if your keys don't naturally compose in this fashion (say, if they were (x, y) coordinates), this would again not be a very natural way to represent the data, and a single-level map from XYPoint to value would be more sensible. (Likewise, if you never use the two-level map except to always go straight through both layers, one could argue the one-level map still makes more sense.)
And finally, as for option 4: if you're always mapping to A itself, and storing the key as its own value (e.g. if you want to canonicalise your A instances, a bit like String.intern()), then as was pointed out you needn't use a Map at all, and a Set will do the job here. The Map is useful when you want to establish relationships between different objects, whereas a Set automatically gives you the uniqueness of objects without any extra conceptual overhead.
If you do use the class itself as a key, be warned though that objects should only generally be used as keys if their hashCode (and the behaviour of equals) won't change over time. Typically this means the keys are immutable, though here you could afford to have mutable "non-key" fields. If you were to break this rule, you'd see odd behaviour such as the following:
// Populate a map, with an A as the key
final Map<A, String> map = new HashMap<A, String>();
final A a = new A("one", "two", "three", "four");
map.put(a, "here");
// Mutate a
a.setField1("un");
// Now look up what we associated with it
System.out.println(map.get(a)); // prints "null" - huh?
System.out.println(map.containsKey(a)); // prints "false"

I'd create an Index class, something like this (warning: untested code), to abstract out the indexing functionality. Why Java doesn't have something like this already is puzzling to me.
interface Indexer<T, K>
{
/** extract the index key from an object */
public K getIndexKey(T object);
}
class Index<T,K>
{
final private HashMap<K,List<T>> indexMap = new HashMap<K,List<T>>();
final private Indexer<T,K> indexer;
public Index(Indexer<T,K> indexer)
{
this.indexer = indexer;
}
public void add(T object) {
K key = this.indexer.getIndexKey(object);
List<T> values = this.indexMap.get(key);
if (values == null)
{
values = new ArrayList<T>();
this.indexMap.put(key, values);
}
values.add(object);
}
public void remove(T object) {
K key = this.indexer.getIndexKey(object);
List<T> values = this.indexMap.get(key);
if (values != null)
{
values.remove(object);
}
}
public List<T> lookup(K key) {
List<T> values = this.indexMap.get(key);
return values == null
? Collections.<T>emptyList()
: Collections.unmodifiableList(values);
}
}
example relevant to your class A:
Index<A,String> index1 = new Index<A,String>(new Indexer<A,String>() {
@Override public String getIndexKey(A object)
{
return object.field1;
}
});
Index<A,String> index2 = new Index<A,String>(new Indexer<A,String>() {
@Override public String getIndexKey(A object)
{
return object.field2;
}
});
/* repeat for all desired fields */
You would manually have to add and remove entries from the indices, but all the grungework below those operations is handled by the Index class.

Your class has "key fields". I would suggest creating a parent class, ParentA, with those key fields (which certainly map to a concept in your domain) and having your child class A extend it.
Override hashCode() and equals() in the ParentA class.
Use a Map<ParentA, A> to store your A instances, using each instance as both key and value.
To retrieve a specific A instance, create a new ParentA instance, pA, with your key fields set, and do
A a = map.get(pA);
That's it.
Another way is to create an AIdentifier class holding the key fields and give A an id property of that type. You then add your instance with map.put(a.id, a). That's the inheritance vs. composition discussion :)
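A sketch of the ParentA approach, assuming ParentA's equals uses instanceof rather than getClass(); with a getClass() check, a plain ParentA used for lookup would never equal the stored A. The constructors and the extra field3 are illustrative, and the code stays Java 6 compatible.

```java
import java.util.HashMap;
import java.util.Map;

// Parent holding only the key fields; equality is defined here.
class ParentA {
    final String field1;
    final String field2;

    ParentA(String field1, String field2) {
        this.field1 = field1;
        this.field2 = field2;
    }

    @Override
    public boolean equals(Object o) {
        // instanceof (not getClass()) so a ParentA can equal an A with the same keys.
        if (!(o instanceof ParentA)) return false;
        ParentA other = (ParentA) o;
        return eq(field1, other.field1) && eq(field2, other.field2);
    }

    @Override
    public int hashCode() {
        return 31 * hash(field1) + hash(field2);
    }

    private static boolean eq(Object a, Object b) {
        return a == null ? b == null : a.equals(b);
    }

    private static int hash(Object o) {
        return o == null ? 0 : o.hashCode();
    }
}

// Child with additional, non-key state; inherits equals/hashCode unchanged.
class A extends ParentA {
    String field3;

    A(String field1, String field2, String field3) {
        super(field1, field2);
        this.field3 = field3;
    }
}
```

Subclasses must not override equals to include their own fields, or the symmetry between ParentA and A lookups breaks.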

Caching objects built with multiple parameters

I have a factory that creates objects of class MyClass, returning already created ones when they exist. Since the creation method (getOrCreateMyClass) takes multiple parameters, what is the best way to use a Map to store and retrieve the objects?
My current solution is the following, but it doesn't feel quite right to me.
I use a slightly modified version of MyClass's hashCode method to compute an int from the parameters of MyClass, and I use it as the key of the Map.
import java.util.HashMap;
import java.util.Map;
public class MyClassFactory {
static Map<Integer, MyClass> cache = new HashMap<Integer, MyClass>();
private static class MyClass {
private String s;
private int i;
public MyClass(String s, int i) {
this.s = s;
this.i = i;
}
public static int getHashCode(String s, int i) {
final int prime = 31;
int result = 1;
result = prime * result + i;
result = prime * result + ((s == null) ? 0 : s.hashCode());
return result;
}
@Override
public int hashCode() {
return getHashCode(this.s, this.i);
}
}
public static MyClass getOrCreateMyClass(String s, int i) {
int hashCode = MyClass.getHashCode(s, i);
MyClass a = cache.get(hashCode);
if (a == null) {
a = new MyClass(s, i);
cache.put(hashCode , a);
}
return a;
}
}

Your getOrCreateMyClass doesn't seem to add to the cache if it creates.
I think this will also not perform correctly when hashcodes collide. Identical hashcodes do not imply equal objects. This could be the source of the bug you mentioned in a comment.
You might consider creating a generic Pair class with proper equals and hashCode methods and using Pair<String, Integer> as the map key for your cache.
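A minimal sketch of the Pair key suggested above, assuming Java 8+ (java.util.Objects and Map.computeIfAbsent). Pair is a hypothetical class, not a JDK type. Because HashMap compares keys with equals after matching hash codes, two parameter combinations that happen to share a hash code no longer overwrite each other.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical immutable pair with value-based equals and hashCode.
final class Pair<F, S> {
    private final F first;
    private final S second;

    Pair(F first, S second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Pair)) return false;
        Pair<?, ?> other = (Pair<?, ?>) o;
        return Objects.equals(first, other.first) && Objects.equals(second, other.second);
    }

    @Override
    public int hashCode() {
        return Objects.hash(first, second);
    }
}

final class MyClassFactory {
    // Stand-in for the question's MyClass.
    static final class MyClass {
        final String s;
        final int i;

        MyClass(String s, int i) {
            this.s = s;
            this.i = i;
        }
    }

    private static final Map<Pair<String, Integer>, MyClass> cache = new HashMap<>();

    // Not thread-safe, like the original factory.
    static MyClass getOrCreateMyClass(String s, int i) {
        return cache.computeIfAbsent(new Pair<>(s, i), k -> new MyClass(s, i));
    }
}
```

For example, "Aa" and "BB" have the same String hash code, so the question's hash-based key would map ("Aa", 1) and ("BB", 1) to the same entry; with Pair they stay separate.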
Edit:
The issue of extra memory consumption by storing both a Pair<String, Integer> key and a MyClass value might be best dealt with by making the Pair<String, Integer> into a field of MyClass and thereby having only one reference to this object.
With all of this though, you might have to worry about threading issues that don't seem to be addressed yet, and which could be another source of bugs.
And whether it is actually a good idea at all depends on whether the creation of MyClass is much more expensive than the creation of the map key.
Another Edit:
ColinD's answer is also reasonable (and I've upvoted it), as long as the construction of MyClass is not expensive.
Another approach that might be worth consideration is to use a nested map Map<String, Map<Integer, MyClass>>, which would require a two-stage lookup and complicate the cache updating a bit.

You really shouldn't be using the hashcode as the key in your map. A hash code is not guaranteed to differ between two non-equal instances of a class. Indeed, your hashCode method could definitely produce the same hash code for two non-equal instances. You do need to implement equals on MyClass to check that two instances of MyClass are equal based on the equality of the String and int they contain. I'd also recommend making the s and i fields final to provide a stronger guarantee of the immutability of each MyClass instance if you're going to be using it this way.
Beyond that, I think what you actually want here is an interner.... that is, something to guarantee that you'll only ever store at most 1 instance of a given MyClass in memory at a time. The correct solution to this is a Map<MyClass, MyClass>... more specifically a ConcurrentMap<MyClass, MyClass> if there's any chance of getOrCreateMyClass being called from multiple threads. Now, you do need to create a new instance of MyClass in order to check the cache when using this approach, but that's inevitable really... and it's not a big deal because MyClass is easy to create.
Guava has something that does all the work for you here: its Interner interface and corresponding Interners factory/utility class. Here's how you might use it to implement getOrCreateMyClass:
private static final Interner<MyClass> interner = Interners.newStrongInterner();
public static MyClass getOrCreateMyClass(String s, int i) {
return interner.intern(new MyClass(s, i));
}
Note that using a strong interner will, like your example code, keep each MyClass it holds in memory as long as the interner is in memory, regardless of whether anything else in the program has a reference to a given instance. If you use newWeakInterner instead, when there isn't anything elsewhere in your program using a given MyClass instance, that instance will be eligible for garbage collection, helping you not waste memory with instances you don't need around.
If you choose to do this yourself, you'll want to use a ConcurrentMap cache and use putIfAbsent. You can take a look at the implementation of Guava's strong interner for reference I imagine... the weak reference approach is much more complicated though.
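A minimal sketch of the do-it-yourself strong interner described above, using ConcurrentMap.putIfAbsent. It assumes MyClass implements value-based equals and hashCode over s and i, as the answer recommends (Java 7+ for java.util.Objects). Like Guava's strong interner, it never releases instances; MyClassInterner is a hypothetical name.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class MyClass {
    private final String s;
    private final int i;

    MyClass(String s, int i) {
        this.s = s;
        this.i = i;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MyClass)) return false;
        MyClass other = (MyClass) o;
        return i == other.i && Objects.equals(s, other.s);
    }

    @Override
    public int hashCode() {
        return Objects.hash(s, i);
    }
}

final class MyClassInterner {
    // Maps each distinct value to its canonical instance.
    private static final ConcurrentMap<MyClass, MyClass> CACHE =
            new ConcurrentHashMap<MyClass, MyClass>();

    static MyClass getOrCreateMyClass(String s, int i) {
        MyClass candidate = new MyClass(s, i);
        // putIfAbsent returns the previous value, or null if candidate was inserted.
        MyClass existing = CACHE.putIfAbsent(candidate, candidate);
        return (existing != null) ? existing : candidate;
    }
}
```

When two threads race on the same value, both create a candidate, but only one is inserted and both callers receive that same instance; the losing candidate is simply discarded.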
