Enforcing a unique id in a class - java

Just for the sake of a thought exercise, how could the uniqueness of an attribute be enforced for each instance of a given class?
Uniqueness here can be defined as being on a single JVM and within a single user session.
This is at Java-level and not to do with databases, the main purpose being to verify if a collision has occurred.
The first obvious step is to have a static attribute at class level.
Having an ArrayList or other container seems impractical as the number of instances rises.
Incrementing a numeric counter at class level appears to be the simplest approach, but the id must always follow the last-used id.
Enforcing a hash or non-numeric id could be problematic.
Concurrency might be of concern. If it is possible for two instances to get the same id at the same time, this should be prevented.
How should this problem be tackled? What solutions/approaches might already exist?

If you care about performance, here is a thread-safe, fast (lock-free) and collision-free version of unique id generation:
import java.util.concurrent.atomic.AtomicInteger;

public class Test {
    private static final AtomicInteger lastId = new AtomicInteger();

    private final int id;

    public Test() {
        id = lastId.incrementAndGet();
    }
    // ...
}

Simply use the UUID class in Java (http://docs.oracle.com/javase/6/docs/api/java/util/UUID.html). Create a field of type UUID in the classes under inspection and initialize it in the constructor:
import java.util.UUID;

public class Test {
    public final UUID id;

    public Test() {
        id = UUID.randomUUID();
    }
}
When it comes time to detect collisions, simply compare the string representations of the objects' UUIDs like this:
Test testObject1 = new Test();
Test testObject2 = new Test();
boolean collision = testObject1.id.toString().equals(testObject2.id.toString());
Or, more simply, use the compareTo() method of the UUID class:
boolean collision = testObject2.id.compareTo(testObject1.id) == 0;
compareTo() returns 0 when the ids are the same, and -1 or +1 when they are not equal.
Merit: universally unique (can be time-based or random) and hence should take care of threading issues (someone should confirm this; this is based on the best of my knowledge). More information here and here.
To make it thread-safe, refer to this question on SO: Is java.util.UUID thread safe?
Demerit: it will require a change in the structure of the classes under inspection, i.e. the id field will have to be added in the source of the classes themselves, which might or might not be convenient.

UUID is a good solution, but UUID.randomUUID() on the back end uses the method:
synchronized public void SecureRandom.nextBytes(byte[] bytes)
So it is slow: threads lock a single monitor object on each id-generation operation.
The AtomicInteger is better, because it loops in a CAS operation. But again, a synchronization operation must be done for each id generated.
In the solution below, only prime-number generation is synchronized. Synchronization happens on an AtomicInteger, so it is fast and thread-safe, and with a set of primes in hand, many ids are generated per iteration.
Fixed number of threads
Edit: solution for a fixed number of threads
If you know a priori how many threads will use the id generation, then you can generate ids with values
id = (i mod X) + n*X
where X is the number of threads, i is the thread number, and n is a local variable that is incremented for each id generated. The code for this solution is really simple, but it must be integrated with the whole program infrastructure; a minimal sketch follows.
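Here is a minimal sketch of that scheme (my own illustration; the class and names are assumptions, not from the answer). Each thread owns a generator constructed with its thread number, and ids cannot collide across threads because every thread counts in strides of X:

public class StridedIdGenerator {
    private final int threadIndex;  // i: this thread's number, 0 <= i < threadCount
    private final int threadCount;  // X: total number of threads, agreed on up front
    private long n = 0;             // local counter, needs no synchronization

    public StridedIdGenerator(int threadIndex, int threadCount) {
        this.threadIndex = threadIndex;
        this.threadCount = threadCount;
    }

    // id = i + n*X: thread 0 yields 0, X, 2X, ...; thread 1 yields 1, X+1, 2X+1, ...
    public long nextId() {
        return threadIndex + (n++) * (long) threadCount;
    }
}

Once each thread has its own instance, no synchronization is needed at all.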
Ids generated from primes
The idea is to generate the ids as products of prime powers:
id = p_1^f1 * p_2^f2 * p_3^f3 * ... * p_n^fn
We use different prime numbers in each thread to generate different sets of ids in each thread.
Assuming that we use the primes (2, 3, 5), the sequence will be:
2, 2^2, 2^3, 2^4, 2^5, ..., 2^62
Then, when we see that an overflow would occur, we roll the factor over to the next prime:
3, 2*3, 2^2*3, 2^3*3, 2^4*3, 2^5*3, ..., 2^61*3
and next
3^2, 2*3^2, 2^2*3^2, ...
Generation class
Edit: prime order generation must be done on an AtomicInteger to be correct.
Each instance of the class IdFactorialGenerator will generate a different set of ids.
To have thread-safe generation of ids, just use a ThreadLocal to get a per-thread instance (see the sketch after the Sieve code below). Synchronization happens only during prime-number generation.
package eu.pmsoft.sam.idgenerator;

import java.util.concurrent.atomic.AtomicInteger;

public class IdFactorialGenerator {
    private static final AtomicInteger nextPrimeNumber = new AtomicInteger(0);

    private int usedSlots;
    private int[] primes = new int[64];
    private int[] factors = new int[64];
    private long id;

    public IdFactorialGenerator() {
        usedSlots = 1;
        primes[0] = Sieve$.MODULE$.primeNumber(nextPrimeNumber.getAndAdd(1));
        factors[0] = 1;
        id = 1;
    }

    public long nextId() {
        for (int factorToUpdate = 0; factorToUpdate < 64; factorToUpdate++) {
            if (factorToUpdate == usedSlots) {
                // open a new slot with a fresh prime
                factors[factorToUpdate] = 1;
                primes[factorToUpdate] = Sieve$.MODULE$.primeNumber(nextPrimeNumber.getAndAdd(1));
                usedSlots++;
            }
            int primeToExtend = primes[factorToUpdate];
            if (primeToExtend < Long.MAX_VALUE / id) {
                // id * primeToExtend < Long.MAX_VALUE, so no overflow
                factors[factorToUpdate] = factors[factorToUpdate] * primeToExtend;
                id = id * primeToExtend;
                return id;
            } else {
                // overflow would occur: reset this factor and rebuild id from the remaining factors
                factors[factorToUpdate] = 1;
                id = 1;
                for (int i = 0; i < usedSlots; i++) {
                    id = id * factors[i];
                }
            }
        }
        throw new IllegalStateException("I can not generate more ids");
    }
}
To get the prime numbers I use an implementation in Scala, provided for Problem 7 here: http://pavelfatin.com/scala-for-project-euler/
object Sieve {
  def primeNumber(position: Int): Int = ps(position)

  private lazy val ps: Stream[Int] = 2 #:: Stream.from(3).filter(i =>
    ps.takeWhile(j => j * j <= i).forall(i % _ > 0))
}

Related

DynamoDB Sequence with Spring Data

We are using SpringData to implement a (distributed) sequence generator with DynamoDB, with the help of DynamoDB's conditional-updates feature, specifically using the optimistic locking provided by the Amazon DynamoDB SDK's @DynamoDBVersionAttribute.
The annotated POJO for the counter item:
@Data
@DynamoDBTable(tableName = "counter")
public class Counter {
    @DynamoDBHashKey
    private String key = "counter";

    @DynamoDBVersionAttribute
    private Long value;
}
The SpringData Repository (we are using Boost Chicken's SpringData community lib for DynamoDB)
@Repository
interface CounterRepository extends CrudRepository<Counter, String> {
}
and the implementation itself:
@Slf4j
@Component
@RequiredArgsConstructor
public class SequenceGenerator {
    private static final int MAX_VALUE = 1_000_000;

    private final CounterRepository repository;

    public int next() {
        try {
            var counter = repository.findById("counter").orElse(new Counter());
            var updated = repository.save(counter);
            var value = (int) (updated.getValue() % MAX_VALUE);
            log.debug("Generated new sequence number {}", value);
            return value;
        } catch (ConditionalCheckFailedException ex) {
            log.debug("Detected an optimistic lock while trying to generate a new sequence number. Will try to generate a new one.");
            return next();
        }
    }
}
Our solution seems to work just fine, but I'm a little bit worried about performance.
Testing shows that 5 concurrent threads (i.e., 5 concurrent MicroService instances in the end) looping to generate sequence numbers take around 4 seconds to generate 25 numbers, because they constantly run into optimistic-locking conflicts.
Is there a better way to achieve our goal? I've looked into AtomicCounters, but the doc specifically states
An atomic counter would not be appropriate where overcounting or undercounting can't be tolerated (for example, in a banking application). In this case, it is safer to use a conditional update instead of an atomic counter.
which rules them out for our case, where numbers must be unique.

creating unique request id for each request using timemillis method in Servlet [duplicate]

This question already has answers here:
How do I create a unique ID in Java? [duplicate]
(11 answers)
Closed 7 years ago.
I am working on a servlet where I need to assign a unique id to each request and store each request's params in an audit table.
I am worried about dirty reads on the database tables if I try to increment a value by looking up the previous id in the table.
I want to know if using the time in milliseconds at which the request arrives at the servlet would solve this.
I am afraid that another request could arrive from some other geographic location at the same time for the same servlet, so that java.lang.System.currentTimeMillis() would be the same for that other request.
The reason which made me post this doubt is my belief that the multithreading behavior of servlets is to take one request at a time and then share CPU cycles among requests based on some scheduling algorithm.
System.currentTimeMillis() is not guaranteed to be unique when called by multiple threads. When faced with this situation in the past I've used an AtomicLong to create unique ids; this class's getAndIncrement should be lock-free (and hence reasonably efficient) on most JVMs:
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicLong;

public class IdUtil {
    private static final AtomicLong counter = new AtomicLong(System.currentTimeMillis());

    // return a single id
    public static long getId() {
        return counter.getAndIncrement();
    }

    // return a block of ids
    public static Queue<Long> getIds(int span) {
        long max = counter.addAndGet(span);
        Queue<Long> queue = new ArrayDeque<>(span);
        for (long i = max - span; i < max; i++) {
            queue.add(i);
        }
        return queue;
    }
}
Even synchronized, it is wrong: you may get the same ID for two requests very close in time. You should rather use a random long or a sequential number.
private static final Random r = new Random(); // <- shared resource
// ...
long id = r.nextLong();
or
private static final AtomicLong counter = new AtomicLong(System.currentTimeMillis()); // <- shared resource
// ...
long id = counter.getAndIncrement();
counter is initialized with the current milliseconds so that it does not produce the same id sequence after a program restart.

resolve intersecting values in arraylist

public class IDS {
    public String id;
    public long startTime;
    public long endTime;
}

List<IDS> existingIDS = new ArrayList<IDS>();
List<IDS> newIDSToWrite = new ArrayList<IDS>();
I want to merge the newIDSToWrite values with existingIDS values, with newIDSToWrite values taking precedence if a conflict occurs.
existingIDS has values like this (id1,4,7) (id2,10,14) (id3,16,21)
newIDSToWrite has values like this (id4,1,5) (id5,8,9) (id6,12,15) (id7,18,20)
If the newIDSToWrite above is merged with existingIDS result should be like (id4,1,5) (id1,5,7) (id5,8,9) (id2,10,12) (id6,12,15) (id3,16,18) (id7,18,20) (id3,20,21)
What's the best way of doing this?
You can use the method List.retainAll():
existingIDS.retainAll(newIDSToWrite);
Link to the doc.
UPDATE:
Good comment by dasblinkenlight: in the IDS class you should override the hashCode() and equals() methods in order to achieve correct behavior (two IDS objects created with the same values should be equal even if they don't point to the same object on the heap).
You can also use Apache Commons ListUtils:
ListUtils.union(existingIDS, newIDSToWrite);
You can find the documentation here.
For the second part of the question you can use the same logic as in the question you asked earlier, with a little modification:
delete intersecting values in an arraylist
(1, 3) (2, 4) (5, 6)
curmax = -inf
(1, 3): curmax = 3
(2, 4): 2 < 3, so mark the first and second segments as "bad"; curmax = 4; update the start value 2 to 3+1
(5, 6): 5 > 4, do nothing; curmax = 6
(5, 6) is the only good segment.
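For the first part, the merge with precedence can be sketched directly. This is my illustration, not from the answers; it assumes the IDS fields are public, as in the question, plus a hypothetical helper to build an IDS from its parts. Each existing interval is clipped against every new interval, and the surviving fragments keep the old id:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class IntervalMerger {

    // Hypothetical helper: build an IDS from its parts.
    private static IDS ids(String id, long start, long end) {
        IDS x = new IDS();
        x.id = id;
        x.startTime = start;
        x.endTime = end;
        return x;
    }

    public static List<IDS> merge(List<IDS> existing, List<IDS> toWrite) {
        List<IDS> result = new ArrayList<>(toWrite);
        for (IDS old : existing) {
            // Start with the whole old interval and clip it against each new one.
            List<long[]> pieces = new ArrayList<>();
            pieces.add(new long[]{old.startTime, old.endTime});
            for (IDS nw : toWrite) {
                List<long[]> next = new ArrayList<>();
                for (long[] p : pieces) {
                    if (nw.endTime <= p[0] || nw.startTime >= p[1]) {
                        next.add(p); // no overlap: keep the piece unchanged
                    } else {
                        // keep only the parts of p that the new interval does not cover
                        if (p[0] < nw.startTime) next.add(new long[]{p[0], nw.startTime});
                        if (nw.endTime < p[1]) next.add(new long[]{nw.endTime, p[1]});
                    }
                }
                pieces = next;
            }
            for (long[] p : pieces) {
                result.add(ids(old.id, p[0], p[1])); // surviving fragments keep the old id
            }
        }
        result.sort(Comparator.comparingLong((IDS x) -> x.startTime));
        return result;
    }
}

Applied to the sample data above, this reproduces the expected output, including the split of (id3,16,21) into (id3,16,18) and (id3,20,21).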
I recommend something like this (note that it may not compile directly since I wrote it quickly). This code basically makes your class more object-oriented and requires the object to be initialized with a valid id. As others mentioned above, you must implement hashCode() and equals() to properly compare objects and check whether an object is contained within a collection (I usually have Eclipse generate these functions for me upon selecting fields):
public class IDS {
    private String id;
    private long startTime;
    private long endTime;

    public IDS(String id) {
        if (id == null) throw new IllegalArgumentException("RAII failure: IDS requires non-null ID");
        this.id = id;
    }

    // id - getter
    // startTime, endTime - getters/setters

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof IDS)) return false;
        return ((IDS) other).getId().equals(getId());
    }

    @Override
    public int hashCode() {
        return getId().hashCode();
    }
}
List<IDS> existingIDS = new ArrayList<IDS>();
List<IDS> newIDSToWrite = new ArrayList<IDS>();

Set<IDS> mergedIds = new HashSet<IDS>(newIDSToWrite);
for (IDS id : existingIDS) {
    if (!mergedIds.contains(id)) mergedIds.add(id);
}

ConcurrentSkipListSet and replace remove(key)

I am using ConcurrentSkipListSet, which I fill with 20 keys.
I want to replace these keys continuously. However, ConcurrentSkipListSet doesn't seem to have an atomic replace function.
This is what I am using now:
ConcurrentSkipListSet<Long> set = new ConcurrentSkipListSet<Long>();
AtomicLong uniquefier = new AtomicLong(1);

public void fillSet() {
    // fills set with 20 unique keys;
}

public void updateSet() {
    Long now = Calendar.getInstance().getTimeInMillis();
    Long oldestKey = set.first();
    if (set.remove(oldestKey)) {
        set.add(makeUnique(now));
    }
}

private static final long MULTIPLIER = 1024;

public Long makeUnique(long in) {
    return (in * MULTIPLIER + uniquefier.getAndSet((uniquefier.incrementAndGet()) % (MULTIPLIER / 2)));
}
The goal of this whole operation is to keep the set at its current size, and only update by replacing. updateSet is called some 100 times per ms.
Now, my question is this: does remove return true if the element itself was present before (and isn't after), or does the method return true only if the call was actually responsible for the removal?
I.e.: if multiple threads call remove on the very same key at the very same time, will they /all/ return true, or will only one return true?
set.remove will only return true for the thread that actually caused the object to be removed.
The idea behind the set's concurrency is that multiple threads can be updating multiple objects. However, each individual object can only be updated by one thread at a time.
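A small demonstration (my own, not from the answer) of that atomicity: when several threads race to remove the same key, exactly one of them sees true.

import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class RemoveRace {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentSkipListSet<Long> set = new ConcurrentSkipListSet<>();
        set.add(42L);

        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger winners = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // line all threads up before racing
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (set.remove(42L)) {
                    winners.incrementAndGet(); // only the thread that actually removed it gets true
                }
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.SECONDS);

        System.out.println("threads that saw true: " + winners.get()); // always 1
    }
}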

Optimized implementations of java.util.Map and java.util.Set?

I am writing an application where memory, and to a lesser extent speed, are vital. I have found from profiling that I spend a great deal of time in Map and Set operations. While I look at ways to call these methods less, I am wondering whether anyone out there has written, or come across, implementations that significantly improve on access time or memory overhead? or at least, that can improve these things given some assumptions?
From looking at the JDK source I can't believe that it can't be made faster or leaner.
I am aware of Commons Collections, but I don't believe it has any implementation whose goal is to be faster or leaner. Same for Google Collections.
Update: Should have noted that I do not need thread safety.
Normally these methods are pretty quick.
There are a couple of things you should check: are your hash codes properly implemented? Are they sufficiently uniform? Otherwise you'll get rubbish performance.
http://trove4j.sourceforge.net/ <-- this is a bit quicker and saves some memory. I saved a few ms on 50,000 updates.
Are you sure that you're using maps/sets correctly? i.e. not trying to iterate over all the values or something similar. Also, e.g., don't do a contains and then a remove; just check the return value of the remove (see the snippet after this list).
Also check whether you're using Double vs double. I noticed a few ms of performance improvement on tens of thousands of checks.
Have you also set up the initial capacity correctly/appropriately?
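To illustrate the contains-then-remove point above (my example, not the answerer's):

import java.util.HashSet;
import java.util.Set;

public class RemoveCheck {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        set.add("key");

        // Two lookups: contains() walks the bucket, then remove() walks it again.
        if (set.contains("key")) {
            set.remove("key");
        }

        set.add("key");

        // One lookup: remove() already reports whether the element was present.
        if (set.remove("key")) {
            // element was present and has now been removed
        }
    }
}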
Have you looked at Trove4J ? From the website:
Trove aims to provide fast, lightweight implementations of the java.util.Collections API.
Benchmarks provided here.
Here are the ones I know, in addition to Google and Commons Collections:
http://trove4j.sourceforge.net/
http://javolution.org/
http://fastutil.dsi.unimi.it/
Of course you can always implement your own data structures which are optimized for your use cases. To be able to help better, we would need to know your access patterns and what kind of data you store in the collections.
Try improving the performance of your equals and hashCode methods; this could help speed up the standard containers' use of your objects.
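For instance, a sketch (my own, assuming an immutable key class) of precomputing the hash once, so repeated map lookups don't recompute it:

import java.util.Objects;

public final class CachedKey {
    private final String name;
    private final int index;
    private final int hash; // computed once at construction

    public CachedKey(String name, int index) {
        this.name = Objects.requireNonNull(name);
        this.index = index;
        this.hash = 31 * name.hashCode() + index;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CachedKey)) return false;
        CachedKey k = (CachedKey) o;
        return index == k.index && name.equals(k.name);
    }

    @Override
    public int hashCode() {
        return hash; // no recomputation on every lookup
    }
}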
You can extend AbstractMap and/or AbstractSet as a starting point. I did this not too long ago to implement a binary-trie-based map (the key was an integer, and each "level" of the tree was a bit position; the left child was 0 and the right child was 1). This worked out well for us because the keys were EUI-64 identifiers, and for us most of the time the top 5 bytes were going to be the same.
To implement an AbstractMap, you need to at the very least implement the entrySet() method, to return a set of Map.Entry, each of which is a key/value pair.
To implement a set, you extend AbstractSet and supply implementations of size() and iterator().
That's the bare minimum, however. You will also want to implement get and put, since the default map is unmodifiable, and the default implementation of get iterates through the entrySet looking for a match.
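As a concrete (if toy) sketch of those two points, assuming nothing beyond the JDK (this is my illustration, not the answerer's trie):

import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class TinyMap<K, V> extends AbstractMap<K, V> {
    private final List<Entry<K, V>> entries = new ArrayList<>();

    // The one method AbstractMap requires: expose the entries as a Set.
    @Override
    public Set<Entry<K, V>> entrySet() {
        return new AbstractSet<Entry<K, V>>() {
            @Override public Iterator<Entry<K, V>> iterator() { return entries.iterator(); }
            @Override public int size() { return entries.size(); }
        };
    }

    // Overridden because AbstractMap's default map is unmodifiable.
    @Override
    public V put(K key, V value) {
        for (Entry<K, V> e : entries) {
            if (Objects.equals(e.getKey(), key)) return e.setValue(value);
        }
        entries.add(new SimpleEntry<>(key, value));
        return null;
    }
    // get() is inherited: it scans entrySet(), which is exactly why a real
    // implementation would override it with something faster.
}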
You can possibly save a little on memory by:
(a) using a stronger, wider hash code, and thus avoiding having to store the keys;
(b) by allocating yourself from an array, avoiding creating a separate object per hash table entry.
In case it's useful, here's a no-frills Java implementation of the Numerical Recipes hash table that I've sometimes found useful. You can key directly on a CharSequence (including Strings), or else you must come up with a strong-ish 64-bit hash function for your objects yourself.
Remember, this implementation doesn't store the keys, so if two items have the same hash code (which you'd expect after hashing on the order of 2^32, or a couple of billion, items if you have a good hash function), then one item will overwrite the other:
import java.io.Serializable;
import java.util.Arrays;

public class CompactMap<E> implements Serializable {
    static final long serialVersionUID = 1L;

    private static final int MAX_HASH_TABLE_SIZE = 1 << 24;
    private static final int MAX_HASH_TABLE_SIZE_WITH_FILL_FACTOR = 1 << 20;

    private static final long[] byteTable;
    private static final long HSTART = 0xBB40E64DA205B064L;
    private static final long HMULT = 7664345821815920749L;

    static {
        byteTable = new long[256];
        long h = 0x544B2FBACAAF1684L;
        for (int i = 0; i < 256; i++) {
            for (int j = 0; j < 31; j++) {
                h = (h >>> 7) ^ h;
                h = (h << 11) ^ h;
                h = (h >>> 10) ^ h;
            }
            byteTable[i] = h;
        }
    }

    private int maxValues;
    private int[] table;
    private int[] nextPtrs;
    private long[] hashValues;
    private E[] elements;
    private int nextHashValuePos;
    private int hashMask;
    private int size;

    @SuppressWarnings("unchecked")
    public CompactMap(int maxElements) {
        int sz = 128;
        int desiredTableSize = maxElements;
        if (desiredTableSize < MAX_HASH_TABLE_SIZE_WITH_FILL_FACTOR) {
            desiredTableSize = desiredTableSize * 4 / 3;
        }
        desiredTableSize = Math.min(desiredTableSize, MAX_HASH_TABLE_SIZE);
        while (sz < desiredTableSize) {
            sz <<= 1;
        }
        this.maxValues = maxElements;
        this.table = new int[sz];
        this.nextPtrs = new int[maxValues];
        this.hashValues = new long[maxValues];
        this.elements = (E[]) new Object[sz];
        Arrays.fill(table, -1);
        this.hashMask = sz - 1;
    }

    public int size() {
        return size;
    }

    public E put(CharSequence key, E val) {
        return put(hash(key), val);
    }

    public E put(long hash, E val) {
        int hc = (int) hash & hashMask;
        int[] table = this.table;
        int k = table[hc];
        if (k != -1) {
            int lastk;
            do {
                if (hashValues[k] == hash) {
                    E old = elements[k];
                    elements[k] = val;
                    return old;
                }
                lastk = k;
                k = nextPtrs[k];
            } while (k != -1);
            k = nextHashValuePos++;
            nextPtrs[lastk] = k;
        } else {
            k = nextHashValuePos++;
            table[hc] = k;
        }
        if (k >= maxValues) {
            throw new IllegalStateException("Hash table full (size " + size + ", k " + k + ")");
        }
        hashValues[k] = hash;
        nextPtrs[k] = -1;
        elements[k] = val;
        size++;
        return null;
    }

    public E get(long hash) {
        int hc = (int) hash & hashMask;
        int[] table = this.table;
        int k = table[hc];
        if (k != -1) {
            do {
                if (hashValues[k] == hash) {
                    return elements[k];
                }
                k = nextPtrs[k];
            } while (k != -1);
        }
        return null;
    }

    public E get(CharSequence hash) {
        return get(hash(hash));
    }

    public static long hash(CharSequence cs) {
        if (cs == null) return 1L;
        long h = HSTART;
        final long hmult = HMULT;
        final long[] ht = byteTable;
        for (int i = cs.length() - 1; i >= 0; i--) {
            char ch = cs.charAt(i);
            h = (h * hmult) ^ ht[ch & 0xff];
            h = (h * hmult) ^ ht[(ch >>> 8) & 0xff];
        }
        return h;
    }
}
Check out GNU Trove:
http://trove4j.sourceforge.net/index.html
There is at least one implementation in commons-collections that is specifically built for speed: Flat3Map. It's pretty specific, in that it'll be really quick as long as there are no more than 3 elements.
I suspect that you may get more mileage by following @thaggie's advice and looking at the equals/hashCode method times.
You said you profiled some classes but have you done any timings to check their speed? I'm not sure how you'd check their memory usage. It seems like it would be nice to have some specific figures at hand when you're comparing different implementations.
There are some notes here and links to several alternative data-structure libraries: http://www.leepoint.net/notes-java/data/collections/ds-alternatives.html
I'll also throw in a strong vote for fastutil. (mentioned in another response, and on that page) It has more different data structures than you can shake a stick at, and versions optimized for primitive types as keys or values. (A drawback is that the jar file is huge, but you can presumably trim it to just what you need)
I went through something like this a couple of years ago -- very large Maps and Sets as well as very many of them. The default Java implementations consumed way too much space. In the end I rolled my own, but only after I examined the actual usage patterns that my code required. For example, I had a known large set of objects that were created early on and some Maps were sparse while others were dense. Other structures grew monotonically (no deletes) while in other places it was faster to use a "collection" and do the occasional but harmless extra work of processing duplicate items than it was to spend the time and space on avoiding duplicates. Many of the implementations I used were array-backed and exploited the fact that my hashcodes were sequentially allocated and thus for dense maps a lookup was just an array access.
Take away messages:
look at your algorithm,
consider multiple implementations, and
remember that most of the libraries out there are catering for general-purpose use (e.g. insert and delete, a range of sizes, neither sparse nor dense, etc.), so they're going to have overheads that you can probably avoid.
Oh, and write unit tests...
At times when I have seen Map and Set operations using a high percentage of CPU, it has indicated that I have overused Map and Set, and restructuring my data has almost eliminated collections from the top 10% of CPU consumers.
See if you can avoid copies of collections, iterating over collections and any other operation which results in accessing most of the elements of the collection and creating objects.
It's probably not so much the Map or Set which causing the problem, but the objects behind them. Depending upon your problem, you might want a more database-type scheme where "objects" are stored as a bunch of bytes rather than Java Objects. You could embed a database (such as Apache Derby) or do your own specialist thing. It's very dependent upon what you are actually doing. HashMap isn't deliberately big and slow...
Commons Collections has FastArrayList, FastHashMap and FastTreeMap but I don't know what they're worth...
Commons Collections has an id map which compares through ==, which should be faster.
Joda Primitives has primitive collections, as does Trove. I experimented with Trove and found that its memory usage is better.
I was mapping collections of many small objects with a few Integers. Altering these to ints saved nearly half the memory (although it required some messier application code to compensate).
It seems reasonable to me that sorted trees should consume less memory than hashmaps because they don't require the load factor (although if anyone can confirm or has a reason why this is actually dumb, please post in the comments).
Which version of the JVM are you using?
If you are not on 6 (although I suspect you are) then a switch to 6 may help.
If this is a server application running on Windows, try using -server to use the correct HotSpot implementation.
I use the following package (Koloboke) for an int-int hashmap, because it supports primitive types and it stores two ints in a long variable, which is cool for me.
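A hedged sketch of such a map with Koloboke (the class and factory names are from my memory of the Koloboke collect API and may differ between versions):

import com.koloboke.collect.map.hash.HashIntIntMap;
import com.koloboke.collect.map.hash.HashIntIntMaps;

public class KolobokeExample {
    public static void main(String[] args) {
        HashIntIntMap map = HashIntIntMaps.newMutableMap();
        map.put(1, 42);                  // primitive int key and value, no boxing
        int v = map.getOrDefault(7, -1); // -1 returned for the missing key
        System.out.println(map.get(1) + " " + v);
    }
}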
