There is an old database that needs to be ported to a new one, but the porting process will take significant time, during which the old DB will remain operational.
In the old DB the IDs are sequential numbers; in the new one we need globally unique identifiers, which we are (or were) going to generate using Hibernate's built-in UUID generation. I have no idea how Hibernate's UUID generation works and don't know whether it is the same as Java's native UUID generation. Hibernate actually provides two UUID generation strategies (uuid and uuid2).
The problem is that if we use Hibernate's generation, the Hibernate-generated IDs might collide with the IDs coming from the old DB while it's still running. We are going to write a number of Oracle procedures to help us "smartly" transfer the IDs from the old database to the new one, but which ID generation technique should we use in those procedures so that we don't get collisions with the IDs generated by Hibernate?
We can use Oracle's built-in GUID generator, or write a procedure in Java and use Java's native UUID generator. But nowhere is it written that those two won't produce collisions with Hibernate-generated IDs.
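One possible direction, offered as a sketch rather than an answer from the original thread: derive each migrated ID deterministically from the legacy number with Java's name-based generator, `UUID.nameUUIDFromBytes`, which produces version 3 UUIDs. Assuming the Hibernate side produces random (version 4) UUIDs, which should be verified for the strategy actually configured, the version bits alone keep the two sets apart. The `legacy-db:` prefix and the class name are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class LegacyIdToUuid {
    // Hypothetical namespace prefix; any fixed string works as long as it never changes.
    private static final String NAMESPACE = "legacy-db:";

    // Maps a sequential legacy ID to a name-based (version 3, MD5) UUID.
    // The same legacy ID always yields the same UUID, so the mapping is repeatable.
    static UUID fromLegacyId(long legacyId) {
        byte[] name = (NAMESPACE + legacyId).getBytes(StandardCharsets.UTF_8);
        return UUID.nameUUIDFromBytes(name);
    }

    public static void main(String[] args) {
        UUID migrated = fromLegacyId(42L);
        UUID fresh = UUID.randomUUID(); // Java's native random generator (version 4)
        System.out.println(migrated + " version=" + migrated.version());
        System.out.println(fresh + " version=" + fresh.version());
    }
}
```

Because the mapping is deterministic, re-running the migration for the same row gives the same ID. An Oracle-side procedure would need to reproduce the same MD5-based construction to agree with it.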
Related
I have a client application that uses integers to store Datastore IDs. This worked fine while the Datastore IDs were sequentially generated, but ID allocation now needs to be changed to auto-generated in order to scale. The two options for auto-generation are default and legacy. The default option uses the long datatype and hence will break the client. The legacy option says it generates small integers, but is there a guarantee that it won't cross over into the long range?
Cloud Datastore keys use IDs (long ints, 64-bit) or names (strings). The auto-incrementing legacy ID allocation starts off with small numbers, but given enough entities it will eventually exceed what can be stored in a 32-bit int.
I am moving some data from MySQL to the Datastore, and for this data migration I want to keep the old IDs from MySQL.
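If the client has to keep using 32-bit integers for now, it can at least fail loudly when an ID no longer fits instead of silently truncating it. A minimal sketch; the class and method names are made up for illustration:

```java
class DatastoreIdCheck {
    // True if the 64-bit Datastore ID can be stored in a 32-bit int without loss.
    static boolean fitsInInt(long datastoreId) {
        return datastoreId >= Integer.MIN_VALUE && datastoreId <= Integer.MAX_VALUE;
    }

    // Narrows the ID to int, throwing ArithmeticException on overflow
    // rather than wrapping around as a plain (int) cast would.
    static int toClientId(long datastoreId) {
        return Math.toIntExact(datastoreId);
    }

    public static void main(String[] args) {
        System.out.println(fitsInInt(5_000L));
        System.out.println(fitsInInt(5_000_000_000L));
    }
}
```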
I found this note here
Instead of using key name strings or generating numeric IDs automatically, advanced applications may sometimes wish to assign their own numeric IDs manually to the entities they create. Be aware, however, that there is nothing to prevent Datastore from assigning one of your manual numeric IDs to another entity. The only way to avoid such conflicts is to have your application obtain a block of IDs with the allocateIds() method. Cloud Datastore's automatic ID generator will keep track of IDs that have been allocated with these methods and will avoid reusing them for another entity, so you can safely use such IDs without conflict.
So allocateIds seems perfect for what I am trying to do. I want to use the method to allocate all the auto incremented ids from Mysql so that I can then use the datastore Id generator without worrying about collision.
However I can't find this method anywhere. I am using the cloud datastore java library as a standalone library, without using the app engine.
The Cloud Datastore API does not expose a method for reserving a user-specified ID. The AllocateIds method picks IDs for you.
One possible approach would be to assign the MySQL-generated IDs to the name (string) field in your keys. Cloud Datastore never auto-assigns the name field. The downside is that your application code would be responsible for generating future values.
I have a question regarding UUID generation.
Typically, when I'm generating a UUID I will use a random or time based generation method.
HOWEVER, I'm migrating legacy data from MySQL over to a C* datastore and I need to change the legacy (auto-incrementing) integer IDs to UUIDs. Instead of creating another denormalized table with the legacy integer IDs as the primary key and all the data duplicated, I was wondering what folks thought about padding 0's onto the front of the integer ID to form a UUID. Example below.
*Something important to note is that the highest legacy ID will never exceed 1 million, so overflow isn't really an issue.
The idea would look like this:
Legacy ID: 123456 ---> UUID: 00000000-0000-0000-0000-000000123456
This would be done using some string concats and the UUID.fromString("00000000-0000-0000-0000-000000123456") method.
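For reference, a minimal sketch of that padding idea in Java (the class and method names are made up). Note that the decimal digits end up being read as hex digits inside the UUID, so the low 64 bits hold 0x123456 rather than 123456, and the result reports version 0 rather than a standard UUID version.

```java
import java.util.Locale;
import java.util.UUID;

class PaddedUuid {
    // Left-pads the decimal legacy ID with zeros into the last 12 characters of a UUID string.
    static UUID fromLegacyId(long legacyId) {
        if (legacyId < 0 || legacyId > 999_999_999_999L) {
            throw new IllegalArgumentException("legacy ID does not fit in 12 digits: " + legacyId);
        }
        return UUID.fromString(
                String.format(Locale.ROOT, "00000000-0000-0000-0000-%012d", legacyId));
    }

    // Recovers the legacy ID; only valid for UUIDs built by fromLegacyId.
    static long toLegacyId(UUID uuid) {
        String s = uuid.toString();
        return Long.parseLong(s.substring(s.length() - 12));
    }

    public static void main(String[] args) {
        UUID uuid = fromLegacyId(123456L);
        System.out.println(uuid);             // 00000000-0000-0000-0000-000000123456
        System.out.println(toLegacyId(uuid)); // 123456
    }
}
```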
Does this seem like a bad pattern to anyone? I'm not a huge fan of the idea, gives me a bad taste in my mouth, but I don't have a technical reason for why haha.
As far as collisions go, the probability of a collision occurring is still ridiculously low, so I'm not worried about increasing collisions. I suppose it just seems like bad practice to me, that it's "too easy".
We faced the same kind of issue before when migrating from Oracle with ids generated by sequence to Cassandra with generated UUIDs.
We had to design a type that supports both old data coming from Oracle with type long and new data with UUIDs.
The obvious solution is to use type blob to store the ID. A blob can encode either a long or a UUID.
This solution only works for a partition key, because you query those using =. It won't work for a clustering column queried with operators like > or <, because those need an ordering on the values.
There was a small objection at the time: using a blob to store the ID makes it opaque to the user. For example, in cqlsh, when you're doing a SELECT and need to provide the ID, how would you make a blob?
Fortunately, the native functions of CQL bigIntAsBlob(), blobAsBigInt(), uuidAsBlob() and blobAsUUID() come in very handy.
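On the application side, the same encoding can be sketched with plain `java.nio.ByteBuffer`. This assumes the blob for a long is its 8 bytes in big-endian order and the blob for a UUID is its 16 bytes (most significant half first), which is what I would expect bigIntAsBlob() and uuidAsBlob() to produce; check that against your Cassandra version. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

class BlobId {
    // Encodes a legacy long ID as an 8-byte big-endian blob.
    static ByteBuffer fromLong(long id) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.putLong(id);
        buf.flip();
        return buf;
    }

    // Encodes a UUID as a 16-byte blob: most significant bits, then least significant bits.
    static ByteBuffer fromUuid(UUID uuid) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        buf.flip();
        return buf;
    }

    // Decodes a blob back to a Long or a UUID, using its length to tell them apart.
    static Object decode(ByteBuffer blob) {
        ByteBuffer buf = blob.duplicate(); // leave the caller's position untouched
        switch (buf.remaining()) {
            case 8:
                return buf.getLong();
            case 16:
                return new UUID(buf.getLong(), buf.getLong());
            default:
                throw new IllegalArgumentException("unexpected blob length: " + buf.remaining());
        }
    }
}
```

Telling the two kinds of ID apart by length works only because they are 8 and 16 bytes respectively; any other ID type stored in the same column would need its own distinguishable length or a tag byte.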
I've decided to go in a different direction from doanduyhai's answer.
In order to maintain data consistency, we decided to fully de-normalize the data and create another table in C* that is keyed on our legacy IDs. When objects are migrated from our legacy system into C*, they are assigned a new randomly generated UUID, which will be their primary ID from then on. The legacy IDs will be kept around until we decide they are no longer needed. At that point, we can cleanly drop the legacy ID table and be done with them.
This solution allowed for a cleaner break from our legacy ID system in the future, and allowed us to prevent the use of strange custom made UUIDs. I also wasn't a huge fan of having the ID field as a blob type that could have multiple types of data stored in it since, in the future, we plan on only wanting UUIDs to be there.
I am trying to generate alphanumeric primary keys (e.g. TESLA1001) automatically in Hibernate. I am currently using an Oracle database, so I have a JDBC call to my_sequence.NEXTVAL (1002) to increment the number and append it to the prefix (TESLA).
We are considering MySQL as an option, but it does not support sequences. So I am forced to rewrite the custom ID generation technique using a JDBC call to a stored procedure.
Is there any way to have a generic implementation that generates custom primary keys without JDBC and database-dependent queries? That way, if I need to test my application with MSSQL in the future, I only need to change my Hibernate configuration and things work fine!
Because you need a way to coordinate the sequence number, you'll always have to use a centralized sequence generator. An alpha-numerical primary key will perform worse on indexing than a UUID generator.
If I were you, I'd switch to UUID identifiers, which are both unique and portable across all major RDBMS.
Let's say I have a simple bug tracking system hosted on 10 servers with a centralized database. I have to ensure that the bug ID is unique at all times, and of course the ID should be meaningful.
A couple of alternatives that I can think of are:
Use an ID generation strategy provided by an ORM framework such as JPA (a sequence generator strategy or, more precisely, a table generation strategy could be adopted). However, I see these strategies as extra load on the database, since we are asking it to generate an ID before we insert the row into the table.
I can use a UUID, which provides a unique ID, but those are not meaningful. They are just 128-bit numbers.
Can we come up with a strategy that lets us create the ID at the application level and pass it to the database just to insert it? The point worth noting is that it's a distributed environment where there can be concurrent sessions, yet the IDs generated by the applications running on the various servers must be unique.
Apologies for writing an essay before I put in my question but just looking for best way to handle the requirement and clear off any myths that I have in the approaches I mentioned above :)