How can I get a unique long from Gemfire?

We are working on a layered Java application that talks directly to Gemfire.
We need to be able to generate unique "long" sequence numbers, guaranteed unique across all nodes of the application. (Not all nodes are clustered)
Normally I would create a sequence in Oracle, but in this case, even though our Gemfire configuration has a connection to the relational database for write behind persistence, our application has no other knowledge of the database.
What would be the best way to generate those guaranteed unique long values, without going to the database?

The first question to ask yourself is: do you really need a long sequence number (a monotonically increasing long integer), or do you just need a globally unique identifier (like a UUID)?
The most performant solution is going to be a globally unique id, and I would just suggest using a GUID.
If you need a globally unique monotonically increasing long value (long sequence) then you will have to use some distributed locking and increment a value in the region. The method for this and performance depends on the type of region you are using.
Look at Region.replace(K, V, V). It can perform globally atomic updates to values under specific region definitions. You may need to consider a region that just has your sequences if your current region type is not sufficiently defined.
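A minimal sketch of that approach, assuming a region dedicated to sequence counters (class, key, and region names are illustrative; the import is for Apache Geode, while older GemFire releases use the com.gemstone.gemfire.* packages instead):

import org.apache.geode.cache.Region;

public class GemfireSequence {

    // Atomically increments the counter stored under `key`, retrying if another node
    // updates the entry between our read and our replace.
    public static long next(Region<String, Long> sequences, String key) {
        while (true) {
            Long current = sequences.get(key);
            if (current == null) {
                // first caller initializes the entry; if another node wins the race, retry
                if (sequences.putIfAbsent(key, 1L) == null) {
                    return 1L;
                }
                continue;
            }
            long candidate = current + 1;
            // replace(K, V, V) succeeds only if the value is still `current`, which makes
            // the increment atomic across the distributed system
            if (sequences.replace(key, current, candidate)) {
                return candidate;
            }
        }
    }
}

Under contention every retry costs a round trip, so a small, replicated region holding only the sequence entries keeps this cheap.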


Sequence values are not in order in Oracle tables

I have a Java EE 7 project. I use Hibernate as the ORM and my database is Oracle.
I use @SequenceGenerator with allocationSize = 1 for my entity's id, together with @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "seq"). My database sequence in Oracle has cache=1000.
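Roughly, the mapping looks like this (the entity and sequence names here are placeholders):

import javax.persistence.*;

@Entity
public class Document {

    @Id
    @SequenceGenerator(name = "seq", sequenceName = "DOCUMENT_SEQ", allocationSize = 1)
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "seq")
    private Long id;

    // ... other fields
}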
But when I persist two records, the first record's id can be higher than the second record's (even when they are persisted a day apart), and the ids are neither sequential nor contiguous.
What is causing this, and how can I resolve it?
As you are using 11g (a very old version, so your company should consider upgrading as soon as possible), the choice for RAC comes down to a trade-off between performance and ordering/gaps.
You have two options: noorder vs order.
create sequence xxx start with 1 increment by 1 noorder|order cache xxx
How do the instances co-ordinate their use of sequence values and avoid the risk of two instances using the same value?
There are two mechanisms: the default noorder mechanism, where each instance behaves as if it doesn't know about the other instances, and the order option, where the instances continuously negotiate through global enqueues to determine which instance should be responsible for the sequence at any moment.
Noorder
The upshot of this noorder mechanism is that each instance will be working its way through a different range of numbers, and there will be no overlaps between instances. If you had sessions that logged on to the database once per second to issue a call to nextval (and they ended up connecting through a different instance each time), then the values returned would appear to be fairly randomly scattered over a range dictated by “number of instances x cache size.” Uniqueness would be guaranteed, but ordering would not.
Order
If you declare a sequence with the order option, Oracle adopts a strategy of using a single “cache” for the values and introduces a mechanism for making sure that only one instance at a time can access and modify that cache. Oracle does this by taking advantage of its Global Enqueue services. Whenever a session issues a call to nextval, the instance acquires an exclusive SV lock (global enqueue) on the sequence cache, effectively saying, “who’s got the most up to date information about this sequence – I want control”. The one instance holding the SV lock in exclusive mode is then the only instance that can increment the cached value and, if necessary, update the seq$ table by incrementing the highwater. This means that the sequence numbers will, once again, be generated in order. But this option has a penalty in performance and should be considered carefully.
Summary
If your transactions are fast, you can use order and test how it behaves. If your transactions are not fast, I would avoid order altogether. The best option is to upgrade to 19c (12c is already nearing obsolescence) and use IDENTITY COLUMNS.
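On the application side, an identity column maps to something like the following JPA sketch (the entity is hypothetical); the database then assigns the id itself on insert, so no sequence cache is involved:

import javax.persistence.*;

@Entity
public class Document {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY) // the id is assigned by the identity column on insert
    private Long id;

    // ... other fields
}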
If you have unordered (separate) caches on each node (the default):
node 1: cache values (1 - 1000)
node 2: cache values (1001 - 2000)
then the caches cannot overlap values, and the value used will depend on which node performs the insert. That is why your sequence values currently appear to be out of order.
Using the NOCACHE and/or ORDER options will result in sequential numbers, but you can expect at least some performance impact on your application, as the database must perform more overhead to determine the current sequence value before making it available to your SQL statement. Reducing the cache size or eliminating the cache entirely can have a severe negative impact on performance if you are executing a lot of inserts (as suggested by your current cache value of 1000).
Assuming for now that you continue to use a cache (whether ordered or not), be aware that every time you restart your database, or a node (depending on your exact configuration), the unused cached values will be flushed and lost and a new cache will be created.
In the end, it is important to realize that sequence values are not intended (for most applications) to be perfectly sequential without gaps, or even (as in your case) ordered; they are only intended to be unique. Be sure to understand your requirement, and don't be put off if sequences don't behave quite like you expected: must the values be sequential in the order they were inserted, and will gaps in the sequence affect your application? If the answer is no, the application won't care, then stick with what you've got for the sake of performance.

Is there a difference between finding by primary key vs finding by unique column?

If I have an entity with a primary key id and a unique column name, is there any difference between doing a SQL request with findById(long id) or findByName(String name)? Can a search for the primary key be done in O(1) while the other one works in O(n)? Which data structures are used for storing them?
The difference is speed: running a SQL query against an integer will generally be faster than running one against a string.
From the perspective of the order of complexity of the operation, the two are equivalent.
As others have pointed out, an integer lookup is generally faster than a string lookup. Here are three reasons:
The index would typically be smaller because integers are 4 bytes and strings are typically bigger.
Indexes on fixed length keys have some additional efficiencies in the tree structure (no need to "find the end of the string").
In many databases, strings incur additional overhead to handle collations.
That said, another factor is that the primary key is often clustered in many databases. This eliminates the final lookup of the row in data pages -- which might be a noticeable efficiency as well. Note that not all databases support clustered indexes, so this is not true in all cases.
If both columns were INTEGER, then the answer would be "no". A PRIMARY KEY is effectively a UNIQUE constraint on a column and little more. Additionally, as both usually cause internal indexes to be created, they behave basically the same way.
In your specific case, however, the NAME column is a string. Even though it has a UNIQUE constraint, by virtue of its data type you will incur some performance loss.
As your question is probably dictated by "ease of use" to some extent (for debugging purposes it's certainly easier to remember the "name" than the "id"), the questions you need to ask yourself are:
Will the NAME column always be unique, or could it be changed to something not unique? Should it actually be unique in the first place (maybe they set it up wrong)?
How many rows do you expect in your table? This is important because while a small table won't really show any performance issues, a high-cardinality table may start to show some.
How many transactions per second do you expect? If it's an internal application or a small amateur project, you can live with the NAME column being queried, whereas if you need extreme scalability you should stay away from it.

Service/database/technology to emulate long sequence object(not UUID)

I'm looking for a third-party service in order to create/emulate something similar to a Postgres sequence database object.
I need this thread-safe functionality in order to be able to ask it for the next unique Long value. I'm going to use this value as a surrogate key for my Spring Boot/Neo4j application entities.
The main criterion is speed. It should be pretty fast and durable (not only in memory but also persisted to disk in order to survive crashes and restarts).
Also, I don't want to go with UUIDs because I have to expose these IDs in my web application URL parameters, and with UUIDs my URLs look awful. I want to go with plain Long values for IDs.
Could you please suggest some database/service/technology that can be installed on my server and called for unique IDs?
UPDATED
Is it possible to implement a fault-tolerant (persisted) AtomicLong sequence with Apache ZooKeeper or Hazelcast? If so, is there any open-source implementation of this solution that can be downloaded and used?
Something like Snowflake (https://github.com/twitter/snowflake/releases/tag/snowflake-2010) or snowcast (https://github.com/noctarius/snowcast) might be of interest to you.
If you just want numbers, why not concatenate the two longs from a UUID (the most significant + least significant bits)?
In answer to the question in your comment:
Yes, the concatenated (not summed!) values from those methods, zero-padded to the full long length, give the same guarantees as the UUID itself (because the result is just another view of the UUID).
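For illustration only, a small sketch of that idea; note that the result is a roughly 40-digit string, far wider than a single long, so it is really just a decimal rendering of the UUID:

import java.math.BigInteger;
import java.util.UUID;

public class UuidAsDigits {
    public static void main(String[] args) {
        UUID uuid = UUID.randomUUID();
        // zero-pad each half to 20 digits, the maximum decimal width of an unsigned 64-bit value
        String concatenated = String.format("%020d%020d",
                new BigInteger(Long.toUnsignedString(uuid.getMostSignificantBits())),
                new BigInteger(Long.toUnsignedString(uuid.getLeastSignificantBits())));
        System.out.println(concatenated);
    }
}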

GUID from two different JVM

I am writing software where my code runs on two different machines. Will my GUIDs still be unique across the whole cluster if the generation logic runs on multiple JVMs? What are the chances of collision in my specific use case?
If your "GUID" is a UUID.randomUUID() then the probabilities are quite low. Otherwise, it depends on how you are generating your GUID, but the general principle
behind them is that you have enough random bits so that a collision will be unlikely.
If all instances of your distributed system use a common database, then you could create a sequence on that database and use values from that sequence to avoid duplicate IDs.
GUID means GLOBALLY unique ID, and in this case global means universal.
Every properly generated GUID is, for all practical purposes, unique in the universe. That's the whole purpose of GUIDs.

Distributed sequence number generation?

I've generally implemented sequence number generation using database sequences in the past.
e.g. Using Postgres SERIAL type http://www.neilconway.org/docs/sequences/
I'm curious though as how to generate sequence numbers for large distributed systems where there is no database. Does anybody have any experience or suggestions of a best practice for achieving sequence number generation in a thread safe manner for multiple clients?
OK, this is a very old question, which I'm first seeing now.
You'll need to differentiate between sequence numbers and unique IDs that are (optionally) loosely sortable by a specific criterion (typically generation time). True sequence numbers imply knowledge of what all other workers have done, and as such require shared state. There is no easy way of doing this in a distributed, high-scale manner. You could look into things like network broadcasts, windowed ranges for each worker, and distributed hash tables for unique worker IDs, but it's a lot of work.
Unique IDs are another matter, there are several good ways of generating unique IDs in a decentralized manner:
a) You could use Twitter's Snowflake ID network service. Snowflake is a:
Networked service, i.e. you make a network call to get a unique ID;
which produces 64 bit unique IDs that are ordered by generation time;
and the service is highly scalable and (potentially) highly available; each instance can generate many thousand IDs per second, and you can run multiple instances on your LAN/WAN;
written in Scala, runs on the JVM.
b) You could generate the unique IDs on the clients themselves, using an approach derived from how UUIDs and Snowflake's IDs are made. There are multiple options, but something along the lines of the following (a sketch appears after this list):
The most significant 40 or so bits: A timestamp; the generation time of the ID. (We're using the most significant bits for the timestamp to make IDs sort-able by generation time.)
The next 14 or so bits: A per-generator counter, which each generator increments by one for each new ID generated. This ensures that IDs generated at the same moment (same timestamps) do not overlap.
The last 10 or so bits: A unique value for each generator. Using this, we don't need to do any synchronization between generators (which is extremely hard), as all generators produce non-overlapping IDs because of this value.
c) You could generate the IDs on the clients, using just a timestamp and random value. This avoids the need to know all generators, and assign each generator a unique value. On the flip side, such IDs are not guaranteed to be globally unique, they're only very highly likely to be unique. (To collide, one or more generators would have to create the same random value at the exact same time.) Something along the lines of:
The most significant 32 bits: Timestamp, the generation time of the ID.
The least significant 32 bits: 32-bits of randomness, generated anew for each ID.
d) The easy way out, use UUIDs / GUIDs.
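To make option (b) concrete, here is a minimal sketch; the 40/14/10 bit split and the custom epoch are assumptions rather than a standard, and a production version would handle counter overflow and clock drift:

public final class ClientIdGenerator {

    private static final long CUSTOM_EPOCH_MS = 1_600_000_000_000L; // assumed application epoch

    private final long generatorId;   // 10 bits, unique per generator
    private long lastTimestamp = -1L;
    private long counter = 0L;        // 14 bits, reset every millisecond

    public ClientIdGenerator(long generatorId) {
        this.generatorId = generatorId & 0x3FF;
    }

    public synchronized long nextId() {
        long now = System.currentTimeMillis() - CUSTOM_EPOCH_MS; // fits in 40 bits for roughly 34 years
        if (now == lastTimestamp) {
            counter = (counter + 1) & 0x3FFF; // wraps silently here; a real implementation would wait
        } else {
            counter = 0;
            lastTimestamp = now;
        }
        return (now << 24) | (counter << 10) | generatorId;
    }
}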
You could have each node have a unique ID (which you may have anyway) and then prepend that to the sequence number.
For example, node 1 generates the sequence 001-00001, 001-00002, 001-00003, etc., and node 5 generates 005-00001, 005-00002, ...
Unique :-)
Alternately if you want some sort of a centralized system, you could consider having your sequence server give out in blocks. This reduces the overhead significantly. For example, instead of requesting a new ID from the central server for each ID that must be assigned, you request IDs in blocks of 10,000 from the central server and then only have to do another network request when you run out.
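A rough client-side sketch of that block allocation; fetchNextBlock below is a placeholder for whatever call your central sequence server actually exposes:

public class BlockAllocatingIdGenerator {

    private static final int BLOCK_SIZE = 10_000;

    private long next;      // next id to hand out locally
    private long blockEnd;  // exclusive upper bound of the current block

    public synchronized long nextId() {
        if (next >= blockEnd) {
            next = fetchNextBlock(BLOCK_SIZE); // one network round trip per 10,000 ids
            blockEnd = next + BLOCK_SIZE;
        }
        return next++;
    }

    private long fetchNextBlock(int size) {
        // placeholder: ask the central server to reserve `size` ids and return the first of the range
        throw new UnsupportedOperationException("call your central sequence service here");
    }
}

Note that any ids left in a block when a node shuts down are simply lost, which is fine if you only need uniqueness rather than a gap-free sequence.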
Now there are more options.
Though this question is "old", I got here, so I think it might be useful to leave the options I know of (so far):
You could try Hazelcast. In its 1.9 release it includes a distributed implementation of java.util.concurrent.AtomicLong.
You can also use ZooKeeper. It provides methods for creating sequence nodes (appended to znode names, though I prefer using the version numbers of the nodes). Be careful with this one though: if you don't want missing numbers in your sequence, it may not be what you want.
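To illustrate the Hazelcast option, a small sketch using its distributed atomic long; the API shown is the 3.x-era HazelcastInstance.getAtomicLong, and method names differ across Hazelcast versions, so treat it as illustrative:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IAtomicLong;

public class HazelcastSequence {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IAtomicLong seq = hz.getAtomicLong("order-sequence"); // arbitrary counter name
        long next = seq.incrementAndGet();                    // cluster-wide atomic increment
        System.out.println("next id: " + next);
        hz.shutdown();
    }
}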
Cheers
It can be done with Redisson. It implements a distributed and scalable version of AtomicLong. Here is an example:
Config config = new Config();
config.addAddress("some.server.com:8291");                        // address of the backing Redis server
Redisson redisson = Redisson.create(config);
RAtomicLong atomicLong = redisson.getAtomicLong("anyAtomicLong"); // named, cluster-wide counter
atomicLong.incrementAndGet();                                     // atomic increment visible to all clients
If it really has to be globally sequential, and not simply unique, then I would consider creating a single, simple service for dispensing these numbers.
Distributed systems rely on lots of little services interacting, and for this simple kind of task, do you really need or would you really benefit from some other complex, distributed solution?
There are a few strategies, but none that I know of can be truly distributed and give a real sequence.
Have a central number generator. It doesn't have to be a big database; memcached has a fast atomic counter, and in the vast majority of cases it's fast enough for your entire cluster.
Separate an integer range for each node (like Steven Schlanskter's answer).
Use random numbers or UUIDs.
Use some piece of data, together with the node's ID, and hash it all (or HMAC it).
Personally, I'd lean towards UUIDs, or memcached if I want to have a mostly-contiguous space.
Why not use a (thread safe) UUID generator?
I should probably expand on this.
UUIDs are guaranteed to be globally unique (if you avoid the ones based on random numbers, where the uniqueness is just highly probable).
Your "distributed" requirement is met, regardless of how many UUID generators you use, by the global uniqueness of each UUID.
Your "thread safe" requirement can be met by choosing "thread safe" UUID generators.
Your "sequence number" requirement is assumed to be met by the guaranteed global uniqueness of each UUID.
Note that many database sequence number implementations (e.g. Oracle) do not guarantee either monotonically increasing, or (even) increasing sequence numbers (on a per "connection" basis). This is because a consecutive batch of sequence numbers gets allocated in "cached" blocks on a per connection basis. This guarantees global uniqueness and maintains adequate speed. But the sequence numbers actually allocated (over time) can be jumbled when they are being allocated by multiple connections!
Distributed ID generation can be achieved with Redis and Lua. An implementation is available on GitHub. It produces distributed, k-sortable unique ids.
I know this is an old question, but we were also facing the same need and were unable to find a solution that fulfilled it.
Our requirement was to get a unique sequence (0, 1, 2, 3 ... n) of ids, so Snowflake did not help.
We created our own system to generate the ids using Redis. Redis is single threaded, hence its list/queue mechanism always gives us one pop at a time.
What we do is create a buffer of ids. Initially, the queue has ids 0 to 20 ready to be dispatched when requested. Multiple clients can request an id, and Redis pops one id at a time; after every pop from the left, we insert BUFFER + currentId to the right, which keeps the buffer list going. Implementation here
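A rough sketch of that scheme using the Jedis client; the key name is illustrative and the list is assumed to have been seeded with the first BUFFER ids beforehand:

import redis.clients.jedis.Jedis;

public class RedisIdBuffer {

    private static final long BUFFER = 20;
    private final Jedis jedis = new Jedis("localhost", 6379);

    public long nextId() {
        // Redis is single threaded, so LPOP hands each buffered id to exactly one caller
        long id = Long.parseLong(jedis.lpop("id-buffer"));
        // push id + BUFFER onto the right so the list always stays BUFFER ids ahead
        jedis.rpush("id-buffer", String.valueOf(id + BUFFER));
        return id;
    }
}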
I have written a simple service which can generate semi-unique, non-sequential 64-bit long numbers. It can be deployed on multiple machines for redundancy and scalability. It uses ZeroMQ for messaging. For more information on how it works, look at its GitHub page: zUID
Using a database you can reach 1,000+ increments per second with a single core. It is pretty easy. You can use your own database as the backend to generate that number (as it should be its own aggregate, in DDD terms).
I had what seems to be a similar problem. I had several partitions and I wanted to get an offset counter for each one. I implemented something like this:
CREATE DATABASE example;
USE example;
CREATE TABLE offsets (partition INTEGER, offset BIGINT, PRIMARY KEY (partition));
INSERT INTO offsets VALUES (1, 0);
Then executed the following statement:
SELECT @offset := offset FROM offsets WHERE partition=1 FOR UPDATE;
UPDATE offsets SET offset=@offset+1 WHERE partition=1;
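For reference, a minimal JDBC sketch of those two statements run in one transaction, so the FOR UPDATE row lock serializes concurrent callers (obtaining the Connection is assumed to happen elsewhere):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class OffsetAllocator {

    // Returns the current offset for the partition and advances it by one.
    public static long nextOffset(Connection conn, int partition) throws SQLException {
        conn.setAutoCommit(false);
        long current;
        try (PreparedStatement select = conn.prepareStatement(
                "SELECT offset FROM offsets WHERE partition = ? FOR UPDATE")) {
            select.setInt(1, partition);
            try (ResultSet rs = select.executeQuery()) {
                rs.next();
                current = rs.getLong(1);
            }
        }
        try (PreparedStatement update = conn.prepareStatement(
                "UPDATE offsets SET offset = offset + 1 WHERE partition = ?")) {
            update.setInt(1, partition);
            update.executeUpdate();
        }
        conn.commit();
        return current;
    }
}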
If your application allows it, you can allocate a block at once (that was my case).
SELECT @offset := offset FROM offsets WHERE partition=1 FOR UPDATE;
UPDATE offsets SET offset=@offset+100 WHERE partition=1;
If you need further throughput and cannot allocate offsets in advance, you can implement your own service using Flink for real-time processing. I was able to get around 100K increments per partition.
Hope it helps!
The problem is similar to one in the iSCSI world, where each LUN/volume has to be uniquely identifiable by the initiators running on the client side.
The iSCSI standard says that the first few bits have to represent the storage provider/manufacturer information, and the rest must be monotonically increasing.
Similarly, one can use the initial bits in a distributed system of nodes to represent the node ID, and the rest can be monotonically increasing.
A decent solution is to use time-based long generation. It can be done with the backing of a distributed database.
My two cents for gcloud: use a storage file.
Implemented as a Cloud Function, it can easily be converted into a library.
https://github.com/zaky/sequential-counter
