How do I get a unique ID per object in Java? [duplicate]

This question already has answers here:
How to get the unique ID of an object which overrides hashCode()?
(11 answers)
Closed 9 years ago.
I made a vector-backed set in order to avoid thrashing the GC with iterator allocations and the like
(you get an allocation and a free for both the set reference and the set iterator on each traversal of a HashSet's values or keys).
Anyway, supposedly the Object.hashCode() method is a unique ID per object (would that fail on a 64-bit JVM?).
But in any case it is overridable, and therefore not guaranteed to be unique per object instance.
If I want to create an "ObjectSet", how do I get a guaranteed unique ID for each instance of an object?
I just found this, which answers it:
How to get the unique ID of an object which overrides hashCode()?

The simplest solution is to add a field to the object. This is the fastest and most efficient solution, and it avoids any issues with objects failing to be cleaned up.
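// Needs: import java.util.concurrent.atomic.AtomicLong;
abstract class Ided {
    // Shared counter; getAndIncrement() is atomic, so IDs stay unique across threads.
    static final AtomicLong NEXT_ID = new AtomicLong(0);
    final long id = NEXT_ID.getAndIncrement();
    public long getId() {
        return id;
    }
}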
If you can't modify the class, you can use an IdentityHashMap like @glowcoder's deleted solution.
private static final Map<Object, Long> registry = new IdentityHashMap<Object, Long>();
private static long nextId = 0;
public static long idFor(Object o) {
Long l = registry.get(o);
if (l == null)
registry.put(o, l = nextId++);
return l;
}
public static void remove(Object o) {
registry.remove(o);
}

No, that's not how hashCode() works. The returned value does not have to be unique. The exact contract is spelled out in the documentation.
Also,
supposedly the Object.hashCode() method is a unique id per object
is not true. To quote the documentation:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects.

java.lang.System.identityHashCode(obj); will do this for you, if you really need it and understand the repercussions. It gets the identity hashcode, even if the method to provide the hashcode has been overridden.
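As a rough illustration of what that call does, the sketch below uses a hypothetical class, ConstantHash, that overrides hashCode() with a constant. System.identityHashCode still returns the value Object.hashCode() would have produced for each instance. Keep in mind that the result is an int and is not guaranteed to be unique, so two live objects can in principle share the same value.

```java
// Hypothetical class for illustration: hashCode() is overridden with a constant.
final class ConstantHash {
    @Override
    public int hashCode() {
        return 42; // every instance reports the same hash code
    }
}

final class IdentityHashDemo {
    public static void main(String[] args) {
        ConstantHash a = new ConstantHash();
        ConstantHash b = new ConstantHash();

        // The overridden method is what collections such as HashMap see.
        System.out.println(a.hashCode() == b.hashCode());

        // identityHashCode bypasses the override and returns what
        // Object.hashCode() would have returned for this instance.
        int idA = System.identityHashCode(a);
        int idB = System.identityHashCode(b);
        System.out.println(idA + " " + idB);

        // The value is stable for the lifetime of the object, but it is
        // only an int, so distinct objects are not guaranteed distinct values.
        System.out.println(idA == System.identityHashCode(a));
    }
}
```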

Trying to outperform the java GC sounds like premature optimization to me.
The GC is already tuned to handle small short-lived objects. If you have performance issues with GC, you ought to help the GC, not re-implement it (IMNSHO)

Related

How to implement Comparable so it is consistent with identity-equality

I have a class for which equality (as per equals()) must be defined by the object identity, i.e. this == other.
I want to implement Comparable to order such objects (say by some getName() property). To be consistent with equals(), compareTo() must not return 0, even if two objects have the same name.
Is there a way to compare object identities in the sense of compareTo? I could compare System.identityHashCode(o), but that would still return 0 in case of hash collisions.
I think the real answer here is: don't implement Comparable then. Implementing this interface implies that your objects have a natural order. Things that are "equal" should be in the same place when you follow up that thought.
If at all, you should use a custom comparator ... but even that doesn't make much sense. If the thing that defines a < b ... is not allowed to give you a == b (when a and b are "equal" according to your < relation), then the whole approach of comparing is broken for your use case.
In other words: just because you can put code into a class that "somehow" results in what you want ... doesn't make it a good idea to do so.
By definition, if you assign each object a universally unique identifier (UUID), also called a globally unique identifier (GUID), as its identity property, that UUID is comparable and consistent with equals. Java already has a UUID class, and once generated, you can just use the string representation for persistence. The dedicated property will also ensure that the identity is stable across versions, threads, and machines. You could also just use an incrementing ID if you have a way of ensuring that everything gets a unique ID, but using a standard UUID implementation will protect you from problems caused by set merges and by parallel systems generating data at the same time.
If you use anything else for the comparison, that means the object is comparable in a way separate from its identity/value. So you will need to define what comparable means for this object, and document that. For example, people are comparable by name, DOB, height, or a combination by order of precedence; most naturally by name as a convention (for easier lookup by humans), which is separate from whether two people are the same person. You will also have to accept that compareTo and equals are disjoint, because they are based on different things.
You could add a second property (say int id or long id) which would be unique for each instance of your class (you can have a static counter variable and use it to initialize the id in your constructor).
Then your compareTo method can first compare the names, and if the names are equal, compare the ids.
Since each instance has a different id, compareTo will never return 0.
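A minimal sketch of that idea follows. The class name Named and the use of an AtomicLong counter are assumptions made for the example, not something the question specifies. Names are compared first, and the per-instance id only breaks ties, so compareTo returns 0 only for the same instance.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical class: ordered by name, with a per-instance id as tie-breaker.
final class Named implements Comparable<Named> {
    private static final AtomicLong NEXT_ID = new AtomicLong();

    private final long id = NEXT_ID.getAndIncrement();
    private final String name;

    Named(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    @Override
    public int compareTo(Named other) {
        int byName = name.compareTo(other.name);
        if (byName != 0) {
            return byName;
        }
        // Equal names: fall back to the unique id, which is 0 only for this == other.
        return Long.compare(id, other.id);
    }
    // equals() and hashCode() are inherited from Object, i.e. identity-based.
}
```

With this ordering a TreeSet keeps two distinct objects that share a name, because the comparator never reports them as equal.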
While I stick by my original answer that you should use a UUID property for a stable and consistent compare/equality setup, I figured I'd go ahead and answer the question of "how far could you go if you were REALLY paranoid and wanted a guaranteed unique identity for comparable".
In short, if you don't trust UUID uniqueness or identity uniqueness, just use as many UUIDs as it takes to prove god is actively conspiring against you. (Note that while the code is not technically guaranteed never to throw an exception, needing two UUIDs should be overkill in any sane universe.)
import java.time.Instant;
import java.util.ArrayList;
import java.util.UUID;
public class Test implements Comparable<Test>{
private final UUID antiCollisionProp = UUID.randomUUID();
private final ArrayList<UUID> antiuniverseProp = new ArrayList<UUID>();
private UUID getParanoiaLevelId(int i) {
while(antiuniverseProp.size() <= i) { // get(i) below needs at least i + 1 elements
antiuniverseProp.add(UUID.randomUUID());
}
return antiuniverseProp.get(i);
}
@Override
public int compareTo(Test o) {
if(this == o)
return 0;
int temp = Integer.compare(System.identityHashCode(this), System.identityHashCode(o)); // avoids overflow of a plain subtraction
if(temp != 0)
return temp;
//If the universe hates you
temp = this.antiCollisionProp.compareTo(o.antiCollisionProp);
if(temp != 0)
return temp;
//If the universe is actively out to get you
temp = Integer.compare(System.identityHashCode(this.antiCollisionProp), System.identityHashCode(o.antiCollisionProp));
if(temp != 0)
return temp;
for(int i = 0; i < Integer.MAX_VALUE; i++) {
UUID id1 = this.getParanoiaLevelId(i);
UUID id2 = o.getParanoiaLevelId(i);
temp = id1.compareTo(id2);
if(temp != 0)
return temp;
temp = Integer.compare(System.identityHashCode(id1), System.identityHashCode(id2));
if(temp != 0)
return temp;
}
// If you reach this point, I have no idea what you did to deserve this
throw new IllegalStateException("RAGNAROK HAS COME! THE MIDGARD SERPENT AWAKENS!");
}
}
Assuming that for two objects with the same name, compareTo() should not return 0 when equals() returns false, the following can help:
Override hashCode() and make sure it doesn't rely solely on name.
Implement compareTo() as follows:
public int compareTo(MyObject object) {
    int byName = this.getName().compareTo(object.getName());
    // Names tie but the objects differ: break the tie with the hash code.
    return (byName != 0 || this.equals(object)) ? byName : Integer.compare(this.hashCode(), object.hashCode());
}
You have unique objects, but as Eran said, you may need an extra counter or rehash code for any collisions.
private static Set<Pair<C, C>> collisions = ...;
@Override
public boolean equals(Object other) {
    return this == other;
}
@Override
public int compareTo(C other) {
    ...
    if (this == other) {
        return 0;
    }
    if (super.equals(other)) {
        // Some stable order would be fine:
        // return either -1 or 1
        if (collisions.contains(new Pair<>(other, this))) {
            return 1;
        } else if (!collisions.contains(new Pair<>(this, other))) {
            collisions.add(new Pair<>(this, other));
        }
        return -1; // mirror of the branch above, so the order stays antisymmetric
    }
    ...
}
So go with Eran's answer, or state the requirement explicitly in the question.
One might consider the overhead of compareTo returning 0 for non-identical objects negligible.
One might look into perfect hash functions, if at some point no more instances are created. This implies you have a collection of all instances.
There are times (although rare) when it is necessary to implement an identity-based compareTo override. In my case, I was implementing java.util.concurrent.Delayed.
Since the JDK also implements this class, I thought I would share the JDK's solution, which uses an atomically incrementing sequence number. Here is a snippet from ScheduledThreadPoolExecutor (slightly modified for clarity):
/**
* Sequence number to break scheduling ties, and in turn to
* guarantee FIFO order among tied entries.
*/
private static final AtomicLong sequencer = new AtomicLong();
private class ScheduledFutureTask<V>
extends FutureTask<V> implements RunnableScheduledFuture<V> {
/** Sequence number to break ties FIFO */
private final long sequenceNumber = sequencer.getAndIncrement();
}
If the other fields used in compareTo are exhausted, this sequenceNumber value is used to break ties. The range of a 64-bit integer (long) is sufficiently large to count on this.
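To make the tie-breaking step concrete, here is a small sketch of a Delayed implementation written in the same spirit. The class DelayedItem and its triggerNanos field are hypothetical and are not the JDK's code. Only the idea of falling back to a sequence number comes from the snippet above.

```java
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical Delayed element; only the tie-breaking idea mirrors the JDK.
final class DelayedItem implements Delayed {
    private static final AtomicLong SEQUENCER = new AtomicLong();

    private final long triggerNanos; // System.nanoTime()-based deadline
    private final long sequenceNumber = SEQUENCER.getAndIncrement();

    DelayedItem(long triggerNanos) {
        this.triggerNanos = triggerNanos;
    }

    @Override
    public long getDelay(TimeUnit unit) {
        return unit.convert(triggerNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        if (other == this) {
            return 0; // identical object
        }
        if (other instanceof DelayedItem) {
            DelayedItem that = (DelayedItem) other;
            long diff = triggerNanos - that.triggerNanos;
            if (diff != 0) {
                return diff < 0 ? -1 : 1;
            }
            // Same deadline: earlier-created item first (FIFO).
            return Long.compare(sequenceNumber, that.sequenceNumber);
        }
        // Foreign Delayed implementation: compare remaining delays.
        return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
    }
}
```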

Java - Check if reference to object in Map exists

A few weeks back I wrote a Java class with the following behavior:
Each object contains a single final integer field
The class contains a static Map (Key: Integer, Content: MyClass)
Whenever an object of the class is instantiated a look-up is done, if an object with the wanted integer field already exists in the static map: return it, otherwise create one and put it in the map.
As code:
public class MyClass
{
private static Map<Integer, MyClass> map;
private final int field;
static
{
map = new HashMap<>();
}
private MyClass(int field)
{
this.field = field;
}
public static MyClass get(int field)
{
synchronized (map)
{
return map.computeIfAbsent(field, MyClass::new);
}
}
}
This way I can be sure that only one object exists for each integer (as field). I'm currently concerned that this will prevent the GC from collecting objects which I no longer need, since the objects are always stored in the map (a reference exists)...
If I wrote a loop-like function like this:
public void myFunction() {
for (int i = 0; i < Integer.MAX_VALUE; i++) {
MyClass c = MyClass.get(i);
// DO STUFF
}
}
I would end up with Integer.MAX_VALUE objects in memory after calling the method. Is there a way I can check whether references to objects in the map exist, and otherwise remove them?
This looks like a typical case of the multiton pattern: You want to have at most one instance of MyClass for a given key. However, you also seem to want to limit the amount of instances created. This is very easy to do by lazily instantiating your MyClass instances as you need them. Additionally, you want to clean up unused instances:
Is there a way I can check whether references to objects in the map exist, and otherwise remove them?
This is exactly what the JVM's garbage collector is for; there is no reason to try to implement your own form of "garbage collection" when the Java core library already provides tools for marking certain references as "not strong", i.e. references that should keep referring to a given object only while there is a strong reference (in Java, a "normal" reference) to it somewhere.
Implementation using Reference objects
Instead of a Map<Integer, MyClass>, you should use a Map<Integer, WeakReference<MyClass>> or a Map<Integer, SoftReference<MyClass>>: Both WeakReference and SoftReference allow the MyClass instances they refer to to be garbage-collected if there are no strong (read: "normal") references to the object. The difference between the two is that the former releases the reference on the next garbage collection action after all strong references are gone, while the latter one only releases the reference when it "has to", i.e. at some point which is convenient for the JVM (see related SO question).
Plus, there is no need to synchronize your entire Map: You can simply use a ConcurrentHashMap (which implements ConcurrentMap), which handles multi-threading in a way much better than by locking all access to the entire map. Therefore, your MyClass.get(int) could look like this:
private static final ConcurrentMap<Integer, Reference<MyClass>> INSTANCES = new ConcurrentHashMap<>();
public static MyClass get(final int field) {
// ConcurrentHashMap.compute(...) is atomic <https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html#compute-K-java.util.function.BiFunction->
final Reference<MyClass> ref = INSTANCES.compute(field, (key, oldValue) -> {
final Reference<MyClass> newValue;
if (oldValue == null) {
// No instance has yet been created; Create one
newValue = new SoftReference<>(new MyClass(key));
} else if (oldValue.get() == null) {
// The old instance has already been deleted; Replace it with a
// new reference to a new instance
newValue = new SoftReference<>(new MyClass(key));
} else {
// The existing instance has not yet been deleted; Re-use it
newValue = oldValue;
}
return newValue;
});
return ref.get();
}
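One caveat with returning ref.get() after compute(...) has finished: at that point nothing holds a strong reference to the instance, so the reference could in principle be cleared before get() runs, and the method would return null. The variant below is a sketch that hands a strong reference out of the lambda through a one-element array. The getField() accessor is added only for the example.

```java
import java.lang.ref.Reference;
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class MyClass {
    private static final ConcurrentMap<Integer, Reference<MyClass>> INSTANCES =
            new ConcurrentHashMap<>();

    private final int field;

    private MyClass(int field) {
        this.field = field;
    }

    int getField() { // hypothetical accessor, for illustration only
        return field;
    }

    static MyClass get(int field) {
        // One-element holder so the lambda can pass back a strong reference.
        MyClass[] strong = new MyClass[1];
        INSTANCES.compute(field, (key, oldRef) -> {
            MyClass existing = (oldRef == null) ? null : oldRef.get();
            if (existing != null) {
                strong[0] = existing; // still alive: re-use it
                return oldRef;
            }
            // Never created, or already collected: create a fresh instance.
            strong[0] = new MyClass(key);
            return new SoftReference<>(strong[0]);
        });
        return strong[0]; // strongly held here, so it cannot be null
    }
}
```

Entries whose referent has been collected stay in the map as empty references until the same key is requested again. A ReferenceQueue could be used to purge them, which this sketch leaves out.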
Finally, in a comment above, you mentioned that you would "prefer to cache maybe up to say 1000 objects and after that only cache, what is currently required/referenced". Although I personally see little (good) reason for it, it is possible to perform eager instantiation on the "first"† 1000 objects by adding them to the INSTANCES map on creation:
private static final ConcurrentMap<Integer, Reference<MyClass>> INSTANCES = createInstanceMap();
private static ConcurrentMap<Integer, Reference<MyClass>> createInstanceMap() {
// The set of keys to eagerly initialize instances for
final Stream<Integer> keys = IntStream.range(0, 1000).boxed();
final Collector<Integer, ?, ConcurrentMap<Integer, Reference<MyClass>>> mapFactory = Collectors
.toConcurrentMap(Function.identity(), key -> new SoftReference<>(new MyClass(key)));
return keys.collect(mapFactory);
}
†How you define which objects are the "first" ones is up to you; Here, I'm just using the natural order of the integer keys because it's suitable for a simple example.
Your function for examining your cache is cringe worthy. First, as you said, it creates all the cache objects. Second, it iterates Integer.MAX_VALUE times.
Better would be:
public void myFunction() {
for(MyClass c : map.values()) {
// DO STUFF
}
}
To the issue at hand: Is it possible to find out whether an Object has references to it?
Yes. It is possible. But you won't like it.
http://docs.oracle.com/javase/1.5.0/docs/guide/jvmti/jvmti.html
jvmtiError
IterateOverReachableObjects(jvmtiEnv* env,
jvmtiHeapRootCallback heap_root_callback,
jvmtiStackReferenceCallback stack_ref_callback,
jvmtiObjectReferenceCallback object_ref_callback,
void* user_data)
Loop over all reachable objects in the heap. If a MyClass object is reachable, then, well, it is reachable.
Of course, by storing the object in your cache, you are making it reachable, so you'd have to change your cache to WeakReferences, and see if you can exclude those from the iteration.
You're also no longer using pure Java, and JVMTI may not be supported by all VMs.
As I said, you won't like it.

immutable objects and lazy initialization.

http://www.javapractices.com/topic/TopicAction.do?Id=29
Above is the article I am looking at. It says that immutable objects greatly simplify your program, since they:
allow hashCode to use lazy initialization, and to cache its return value
Can anyone explain what the author is trying to say in the above line?
Is my class immutable if it is marked final but its instance variables are not final? And, vice versa, if its instance variables are final but the class is not?
As explained by others, because the state of the object won't change the hashcode can be calculated only once.
The easy solution is to precalculate it in the constructor and place the result in a final variable (which guarantees thread safety).
If you want to have a lazy calculation (hashcode only calculated if needed) it is a little more tricky if you want to keep the thread safety characteristics of your immutable objects.
The simplest way is to declare a private volatile int hash; and run the calculation if it is 0. You will get laziness except for objects whose hashcode really is 0 (1 in 4 billion if your hash method is well distributed).
Alternatively you could couple it with a volatile boolean but need to be careful about the order in which you update the two variables.
Finally, for extra performance, you can use the approach taken by the String class, which uses an extra local variable for the calculation; this lets you drop the volatile keyword while still guaranteeing correctness. This last method is error-prone if you don't fully understand why it is done the way it is done...
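As a sketch of the volatile variant described above, the hypothetical class Point3 below caches its hash in a volatile int and treats 0 as "not computed yet". If the real hash happens to be 0, it is simply recomputed on every call, which is correct but not lazy.

```java
import java.util.Objects;

// Hypothetical immutable value class used only for illustration.
final class Point3 {
    private final int x;
    private final int y;
    private final int z;

    // 0 means "not computed yet"; volatile makes the cached value visible to all threads.
    private volatile int hash;

    Point3(int x, int y, int z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Point3)) {
            return false;
        }
        Point3 p = (Point3) o;
        return x == p.x && y == p.y && z == p.z;
    }

    @Override
    public int hashCode() {
        int h = hash; // single read of the field
        if (h == 0) {
            h = Objects.hash(x, y, z);
            // Two threads may both get here, but they compute the same value.
            hash = h;
        }
        return h;
    }
}
```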
If your object is immutable, it can't change its state and therefore its hashCode can't change. That allows you to calculate the value when you first need it and to cache it, since it will always stay the same. It is in fact a very bad idea to implement your own hashCode function based on mutable state, since e.g. HashMap assumes that the hash can't change, and it will break if it does change.
The benefit of lazy initialization is that the hashCode calculation is delayed until it is required. Many objects don't need it at all, so you save some calculations. Expensive hash calculations, such as on long Strings, benefit especially.
class FinalObject {
private final int a, b;
public FinalObject(int value1, int value2) {
a = value1;
b = value2;
}
// not calculated at the beginning - lazy once required
private int hashCode;
@Override
public int hashCode() {
int h = hashCode; // read
if (h == 0) {
h = a + b; // calculation
hashCode = h; // write
}
return h; // return local variable instead of second read
}
}
Edit: as pointed out by @assylias, using unsynchronized / non-volatile code is only guaranteed to work if there is only one read of hashCode, because every subsequent read of that field could return 0 even though the first read already saw a different value. The version above fixes the problem.
Edit2: replaced with more obvious version, slightly less code but roughly equivalent in bytecode
public int hashCode() {
int h = hashCode; // only read
return h != 0 ? h : (hashCode = a + b);
// ^- just a (racy) write to hashCode, no read
}
What that line means is, since the object is immutable, then the hashCode has to only be computed once. Further, it doesn't have to be computed when the object is constructed - it only has to be computed when the function is first called. If the object's hashCode is never used then it is never computed. So the hashCode function can look something like this:
@Override public int hashCode(){
synchronized (this) {
if (!this.computedHashCode) {
this.hashCode = expensiveComputation();
this.computedHashCode = true;
}
}
return this.hashCode;
}
And to add to other answers.
An immutable object cannot be changed. The final keyword guarantees that for variables of primitive types such as int. For a reference to a custom object, however, final only prevents reassigning the reference; the object itself stays mutable unless its own implementation prevents changes:
The following code would result in a compilation error, because you are trying to reassign a final reference to an object.
final MyClass m = new MyClass();
m = new MyClass();
However this code would work.
final MyClass m = new MyClass();
m.changeX();
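For contrast, here is a sketch of what a class has to do internally so that a final reference to it really does point at unchanging state. The class Names is hypothetical. The points it illustrates are final fields, no mutator methods, and a defensive copy of any mutable input.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example class: final, with final fields and no mutators.
final class Names {
    private final List<String> names;

    Names(List<String> names) {
        // Defensive copy: later changes to the caller's list do not leak in.
        // The unmodifiable wrapper rejects changes made through getNames().
        this.names = Collections.unmodifiableList(new ArrayList<>(names));
    }

    List<String> getNames() {
        return names;
    }
}
```

Marking only the class final, or only the fields final, is not enough on its own: a final field holding a mutable list that is shared with callers can still change underneath the object.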

Using UUIDs for cheap equals() and hashCode()

I have an immutable class, TokenList, which consists of a list of Token objects, which are also immutable:
@Immutable
public final class TokenList {
private final List<Token> tokens;
public TokenList(List<Token> tokens) {
this.tokens = Collections.unmodifiableList(new ArrayList<Token>(tokens));
}
public List<Token> getTokens() {
return tokens;
}
}
I do several operations on these TokenLists that take multiple TokenLists as inputs and return a single TokenList as the output. There can be arbitrarily many TokenLists going in, and each can have arbitrarily many Tokens.
These operations are expensive, and there is a good chance that the same operation (ie the same inputs) will be performed multiple times, so I would like to cache the outputs. However, performance is critical, and I am worried about the expense of performing hashCode() and equals() on these objects that may contain arbitrarily many elements (as they are immutable then hashCode could be cached, but equals will still be expensive).
This led me to wondering whether I could use a UUID to provide equals() and hashCode() simply and cheaply by making the following updates to TokenList:
@Immutable
public final class TokenList {
private final List<Token> tokens;
private final UUID uuid;
public TokenList(List<Token> tokens) {
this.tokens = Collections.unmodifiableList(new ArrayList<Token>(tokens));
this.uuid = UUID.randomUUID();
}
public List<Token> getTokens() {
return tokens;
}
public UUID getUuid() {
return uuid;
}
}
And something like this to act as a cache key:
@Immutable
public final class TopicListCacheKey {
private final UUID[] uuids;
public TopicListCacheKey(TokenList... topicLists) {
uuids = new UUID[topicLists.length];
for (int i = 0; i < uuids.length; i++) {
uuids[i] = topicLists[i].getUuid();
}
}
@Override
public int hashCode() {
return Arrays.hashCode(uuids);
}
@Override
public boolean equals(Object other) {
if (other == this) return true;
if (other instanceof TopicListCacheKey)
return Arrays.equals(uuids, ((TopicListCacheKey) other).uuids);
return false;
}
}
I figure that there are 2^128 different UUIDs and I will probably have at most around 1,000,000 TokenList objects active in the application at any time. Given this, and the fact that the UUIDs are used combinatorially in cache keys, it seems that the chances of this producing the wrong result are vanishingly small. Nevertheless, I feel uneasy about going ahead with it as it just feels 'dirty'. Are there any reasons I should not use this system? Will the performance costs of the SecureRandom used by UUID.randomUUID() outweigh the gains (especially since I expect multiple threads to be doing this at the same time)? Are collisions going to be more likely than I think? Basically, is there anything wrong with doing it this way?
Thanks.
What you are trying to do is tricky and needs detailed analysis, so you should answer the questions below before deciding on an approach.
These operations are expensive, and there is a good chance that the same operation (ie the same inputs) will be performed multiple times
1) When you say "same inputs" in the line above, what exactly do you mean? Do you mean the exact same object, i.e. one object referred to through several references (same memory location), or do you mean separate objects in memory that hold logically the same data?
If it is the same object, i.e. the same memory location, then a == comparison will do fine. For this, you have to keep the object reference as the key in the cache.
But in the second case, i.e. separate objects in memory that are logically the same, I don't think a UUID will help you, because you would have to make sure that two such objects get the same UUID. That won't be easy, as you have to go through the whole TokenList data anyway to establish it.
2) Is it safe to use the hashcode in the cache? I suggest not using the hashcode as the key, because two different objects may have the same hashcode, so your logic may go horribly wrong.
So get clear answers to these questions first, and only then think about the approach.
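For the first case (the very same objects are passed again), a cache key can compare its parts by reference instead of by content. The class IdentityKey below is a hypothetical sketch of that idea. It is not taken from the question, and it holds strong references to its parts, so cached inputs stay reachable for as long as the key does.

```java
// Hypothetical cache key that compares its parts by reference (==) only.
final class IdentityKey {
    private final Object[] parts;
    private final int hash;

    IdentityKey(Object... parts) {
        this.parts = parts.clone();
        int h = 1;
        for (Object p : this.parts) {
            // identityHashCode ignores any overridden hashCode() and is O(1).
            h = 31 * h + System.identityHashCode(p);
        }
        this.hash = h;
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object other) {
        if (other == this) {
            return true;
        }
        if (!(other instanceof IdentityKey)) {
            return false;
        }
        Object[] theirs = ((IdentityKey) other).parts;
        if (theirs.length != parts.length) {
            return false;
        }
        for (int i = 0; i < parts.length; i++) {
            if (parts[i] != theirs[i]) {
                return false; // reference comparison, never content comparison
            }
        }
        return true;
    }
}
```

The cost of equals and hashCode then depends only on the number of inputs, not on how many tokens each input contains.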
SecureRandom won't give you any boost; it is just "more" random than a normal Random. The chance of a collision is on the order of the number of UUIDs squared divided by the total number of possible UUIDs, so a very small number. Still, I wouldn't rely on the number always being unique. You could try this, but it would be best to check and make sure the number isn't already included somewhere else in the hashcode list. Otherwise, you might get yourself into some very weird problems...
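The estimate can be put into numbers with the usual birthday-bound approximation, p ≈ n² / (2N), which holds while p is small. The sketch below assumes version-4 UUIDs as produced by UUID.randomUUID(), which carry 122 random bits (the remaining 6 bits are fixed), and the figure of 1,000,000 objects from the question. The class and method names are made up for the example.

```java
final class UuidCollisionEstimate {
    // Birthday-bound approximation: p is roughly n^2 / (2 * N) while p is small,
    // where N = 2^randomBits is the number of equally likely values.
    static double collisionProbability(double n, int randomBits) {
        double space = Math.pow(2.0, randomBits);
        return (n * n) / (2.0 * space);
    }

    public static void main(String[] args) {
        // 1,000,000 random UUIDs with 122 random bits each.
        System.out.println(collisionProbability(1_000_000, 122));
    }
}
```

For these inputs the result is far below 1e-20, which supports the "very small number" assessment for a single set of a million UUIDs.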

Caching objects built with multiple parameters

I have a factory that creates objects of class MyClass, returning already generated ones when they exist. As I have the creation method (getOrCreateMyClass) taking multiple parameters, which is the best way to use a Map to store and retrieve the objects?
My current solution is the following, but it doesn't sound too clear to me.
I use the hashCode method (slightly modified) of class MyClass to build an int based on the parameters of class MyClass, and I use it as the key of the Map.
import java.util.HashMap;
import java.util.Map;
public class MyClassFactory {
static Map<Integer, MyClass> cache = new HashMap<Integer, MyClass>();
private static class MyClass {
private String s;
private int i;
public MyClass(String s, int i) {
this.s = s;
this.i = i;
}
public static int getHashCode(String s, int i) {
final int prime = 31;
int result = 1;
result = prime * result + i;
result = prime * result + ((s == null) ? 0 : s.hashCode());
return result;
}
@Override
public int hashCode() {
return getHashCode(this.s, this.i);
}
}
public static MyClass getOrCreateMyClass(String s, int i) {
int hashCode = MyClass.getHashCode(s, i);
MyClass a = cache.get(hashCode);
if (a == null) {
a = new MyClass(s, i);
cache.put(hashCode , a);
}
return a;
}
}
Your getOrCreateMyClass doesn't seem to add to the cache if it creates.
I think this will also not perform correctly when hashcodes collide. Identical hashcodes do not imply equal objects. This could be the source of the bug you mentioned in a comment.
You might consider creating a generic Pair class with actual equals and hashCode methods and using Pair<String, Integer> class as the map key for your cache.
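A sketch of that suggestion follows, using a small purpose-built key class instead of a generic Pair. The names CacheKey, Widget, and WidgetFactory are hypothetical stand-ins. Because the map compares keys with equals, two parameter sets whose hash codes collide, such as the strings "Aa" and "BB", still map to different cached objects.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical composite key with real equals() and hashCode().
final class CacheKey {
    private final String s;
    private final int i;

    CacheKey(String s, int i) {
        this.s = s;
        this.i = i;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof CacheKey)) {
            return false;
        }
        CacheKey k = (CacheKey) o;
        return i == k.i && Objects.equals(s, k.s);
    }

    @Override
    public int hashCode() {
        return Objects.hash(s, i);
    }
}

// Hypothetical stand-in for the cached class.
final class Widget {
    final String s;
    final int i;

    Widget(String s, int i) {
        this.s = s;
        this.i = i;
    }
}

final class WidgetFactory {
    private static final Map<CacheKey, Widget> CACHE = new HashMap<>();

    // synchronized because HashMap itself is not thread-safe.
    static synchronized Widget getOrCreate(String s, int i) {
        return CACHE.computeIfAbsent(new CacheKey(s, i), k -> new Widget(s, i));
    }
}
```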
Edit:
The issue of extra memory consumption by storing both a Pair<String, Integer> key and a MyClass value might be best dealt with by making the Pair<String, Integer> into a field of MyClass and thereby having only one reference to this object.
With all of this though, you might have to worry about threading issues that don't seem to be addressed yet, and which could be another source of bugs.
And whether it is actually a good idea at all depends on whether the creation of MyClass is much more expensive than the creation of the map key.
Another Edit:
ColinD's answer is also reasonable (and I've upvoted it), as long as the construction of MyClass is not expensive.
Another approach that might be worth consideration is to use a nested map Map<String, Map<Integer, MyClass>>, which would require a two-stage lookup and complicate the cache updating a bit.
You really shouldn't be using the hashcode as the key in your map. A class's hashcode is not intended to necessarily guarantee that it will not be the same for any two non-equal instances of that class. Indeed, your hashcode method could definitely produce the same hashcode for two non-equal instances. You do need to implement equals on MyClass to check that two instances of MyClass are equal based on the equality of the String and int they contain. I'd also recommend making the s and i fields final to provide a stronger guarantee of the immutability of each MyClass instance if you're going to be using it this way.
Beyond that, I think what you actually want here is an interner.... that is, something to guarantee that you'll only ever store at most 1 instance of a given MyClass in memory at a time. The correct solution to this is a Map<MyClass, MyClass>... more specifically a ConcurrentMap<MyClass, MyClass> if there's any chance of getOrCreateMyClass being called from multiple threads. Now, you do need to create a new instance of MyClass in order to check the cache when using this approach, but that's inevitable really... and it's not a big deal because MyClass is easy to create.
Guava has something that does all the work for you here: its Interner interface and corresponding Interners factory/utility class. Here's how you might use it to implement getOrCreateMyClass:
private static final Interner<MyClass> interner = Interners.newStrongInterner();
public static MyClass getOrCreateMyClass(String s, int i) {
return interner.intern(new MyClass(s, i));
}
Note that using a strong interner will, like your example code, keep each MyClass it holds in memory as long as the interner is in memory, regardless of whether anything else in the program has a reference to a given instance. If you use newWeakInterner instead, when there isn't anything elsewhere in your program using a given MyClass instance, that instance will be eligible for garbage collection, helping you not waste memory with instances you don't need around.
If you choose to do this yourself, you'll want to use a ConcurrentMap cache and use putIfAbsent. You can take a look at the implementation of Guava's strong interner for reference I imagine... the weak reference approach is much more complicated though.
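A rough do-it-yourself version of the strong interner might look like the sketch below. It is not Guava's implementation. It assumes MyClass gets value-based equals and hashCode as recommended above, and like a strong interner it never releases instances.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class MyClass {
    // Each instance is stored as both key and value.
    private static final ConcurrentMap<MyClass, MyClass> CACHE = new ConcurrentHashMap<>();

    private final String s;
    private final int i;

    private MyClass(String s, int i) {
        this.s = s;
        this.i = i;
    }

    static MyClass getOrCreateMyClass(String s, int i) {
        MyClass candidate = new MyClass(s, i);
        // putIfAbsent is atomic: it returns the previously stored instance,
        // or null if the candidate was inserted.
        MyClass existing = CACHE.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof MyClass)) {
            return false;
        }
        MyClass other = (MyClass) o;
        return i == other.i && Objects.equals(s, other.s);
    }

    @Override
    public int hashCode() {
        return Objects.hash(s, i);
    }
}
```

The throwaway candidate is the price of this approach, which is why it only pays off when constructing MyClass is cheap.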
