How do I reuse Oracle sequence gaps in a primary key column? - java

I used an Oracle sequence as the primary key of a table, and mapped that primary key to an int in my Java application. My customer has now reached the maximum int value in the table; the sequence can keep increasing, but a Java int can no longer store the values. I don't want to change the Java code from int to long, because that would be very costly. However, I found that the customer's database has many big gaps in the ID column. Is there any way I can reuse these missing ID numbers?
If this can be done at the DB level, for example by re-organizing the sequence so that it hands out these missing numbers, then no Java code change would be needed and I could use the gaps. That would be great.
I will write a function to find the gap ranges. Once I have these numbers, I would like to assign them to a pool of sequence values, so that from then on the sequence no longer auto-increments but instead returns the numbers I assigned to it. In the Java code I could keep calling findNextNumber on the sequence, and the sequence would return the values I assigned. That seems impossible, right? Is there any alternative?
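The gap-finding function described above could be sketched in plain Java. This assumes the used ids have already been fetched in ascending order (for example with SELECT id FROM your_table ORDER BY id, where your_table is a placeholder); a minimal version might look like:

```java
import java.util.ArrayList;
import java.util.List;

public class GapFinder {
    // An inclusive range [from, to] of unused ids.
    public record Range(long from, long to) {}

    // Given the ids currently in use, in ascending order, return the
    // ranges of ids that the sequence skipped over.
    public static List<Range> findGaps(long[] sortedIds) {
        List<Range> gaps = new ArrayList<>();
        for (int i = 1; i < sortedIds.length; i++) {
            long prev = sortedIds[i - 1];
            long next = sortedIds[i];
            if (next - prev > 1) {
                gaps.add(new Range(prev + 1, next - 1));
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        // ids 4..9 and 11..19 are free
        System.out.println(findGaps(new long[] {1, 2, 3, 10, 20}));
    }
}
```

The hard part, as the answers below-the-fold note, is not finding the gaps but making a sequence hand them out; that would require rolling your own allocator table.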

Do you mean, will the sequence ever return a value that falls in a "gap" range? I don't think so, unless you drop and re-create it for some reason. I suppose you could write a function of some sort to find the PK gaps in your table, save those gap ranges to another table, and "roll" your own sequence function using the gap table. Very ugly. Trying to "recover" these gaps just sounds like a desperate attempt to avoid the unavoidable: your Java PK data type should have been aligned with the DB data type. I had the same problem a long time ago with a VB app that had a class key defined as a 16-bit integer; when the sequence exceeded 32K, we had to change the variables to a Long. I say bite the bullet and make the conversion. A little pain now will save you a lot of ongoing pain later. Just my opinion.

I would definitely make the change to be able to use larger numbers, but in the meantime you might get by, until you can make that change, with a sequence that generates negative numbers. There would be a performance impact on the maintenance of the PK index, though, and the index would grow disproportionately quickly.
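For what it's worth, the negative-number idea can be made concrete. In Oracle a descending sequence would be created with a negative INCREMENT BY; the sketch below just simulates one in plain Java to show the headroom, roughly another 2.1 billion values below zero in a 32-bit int:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class NegativeIdDemo {
    public static void main(String[] args) {
        // Simulates something like CREATE SEQUENCE neg_seq START WITH -1
        // INCREMENT BY -1 (Oracle syntax; the sequence name is illustrative).
        AtomicInteger negSeq = new AtomicInteger(0);

        int first = negSeq.addAndGet(-1);   // -1
        int second = negSeq.addAndGet(-1);  // -2

        // How many negative ids fit in a Java int before underflow:
        long headroom = -(long) Integer.MIN_VALUE;  // 2147483648
        System.out.println(first + " " + second + " headroom=" + headroom);
    }
}
```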

Related

How to re-generate deleted sequence numbers in hibernate?

As we know, the Hibernate annotation below generates a new number each time, starting from 1. Consider a situation where I have a set of records with ids 1-5. Now the record with id 3 is deleted from the table, so number 3 is missing from the range 1-5. I have a requirement for the sequence to re-generate and reassign that number 3 when I add a new record to the table. How can I do this?
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private int id;
I don't think this is a great idea. A sequence is just a number incremented by 1 each time. That makes it fast, but it is already a bottleneck for writes in a distributed database, since all the nodes need to synchronize on that number.
If you try to get the first available integer instead, you basically need to do a full table scan, order the records by id, and find the first missing one. That is extremely costly and inefficient for something that should be as cheap as possible.
You should view the id as a technical ID without functional meaning, and thus not care whether there are holes in the sequence.
Edit:
I would also add that the implications go deeper, even in business terms.
If I get an ID for an article I sell as a merchant, and I model its deletion by removing the record, or better, by putting a "deleted" status on it, potentially with a date and a reason for the deletion, my bookkeeping is much easier. Actually, I would prefer the latter design: keep the record and give it a status that is dynamic, potentially with a history. The item could be unavailable for a year and be used again if I start selling it again.
If, on the contrary, I silently reuse the ID, then my system may display an old bill with the data of the new article. Instead of the ski boots I no longer sell, it may show a PS5 or 1 kg of rice. This is error-prone.
This may not apply to all business cases, of course, but it is better to consider this kind of usage before going with a design that deletes data.
I agree with Nicolas, but just to clarify:
You are using an "identity" and not a "sequence". There are some differences between them in how they are declared and used (each database may have its own proprietary implementation).
A sequence is an independent object in your database with some properties (like start, end, increment, ...), while an identity is a property of the column, and depends on how the database handles it.
In the case of a sequence (and, depending on the database, some identities) you can create "cyclic" sequences that repeat their numbers after the cycle ends. But a sequence or identity never scans for "gaps" in the ids (as Nicolas said, that would be really bad for performance).
Depending on how your code works, you could create a cycle in a sequence to prevent an ever-increasing value, but only if you are sure there will be no conflicts when inserting new records.
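To make the cyclic behaviour concrete, here is a minimal simulation of a sequence declared with CYCLE (plain Java, not database code). Note that nothing in it checks whether a re-issued value is still in use, which is exactly the conflict risk mentioned above:

```java
public class CyclicSequence {
    private final long min, max;
    private long next;

    CyclicSequence(long min, long max) {
        this.min = min;
        this.max = max;
        this.next = min;
    }

    // Returns the next value, wrapping back to min after max, like an
    // Oracle sequence declared with CYCLE. Nothing here checks whether
    // the returned id already exists in the table.
    long nextVal() {
        long v = next;
        next = (v == max) ? min : v + 1;
        return v;
    }

    public static void main(String[] args) {
        CyclicSequence seq = new CyclicSequence(1, 3);
        for (int i = 0; i < 5; i++) {
            System.out.print(seq.nextVal() + " ");  // 1 2 3 1 2
        }
    }
}
```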

Is there a difference between finding by primary key vs finding by unique column?

If I have an entity with a primary key id and a unique column name, is there any difference between the SQL requests findById(long id) and findByName(String name)? Can a search for the primary key be done in O(1) while the other one works in O(n)? Which data structures are used for storing them?
The difference is speed: running a SQL query against an integer will always be faster than running it against a string.
In terms of the order of complexity of the operation, however, the two are equivalent.
As others have pointed out, an integer lookup is generally faster than a string lookup. Here are three reasons:
The index is typically smaller, because integers are 4 bytes and strings are usually bigger.
Indexes on fixed-length keys have some additional efficiencies in the tree structure (no need to "find the end of the string").
In many databases, strings incur additional overhead to handle collations.
That said, another factor is that the primary key is often clustered in many databases. This eliminates the final lookup of the row in data pages -- which might be a noticeable efficiency as well. Note that not all databases support clustered indexes, so this is not true in all cases.
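The size point above can be illustrated in a rough, in-memory way. This measures encoded byte counts, not actual index pages, and the name value is made up:

```java
import java.nio.charset.StandardCharsets;

public class KeySizeDemo {
    public static void main(String[] args) {
        int intKey = 1234567;                       // any int key: always 4 bytes
        String nameKey = "alice.example-customer";  // hypothetical unique name key

        int intKeyBytes = Integer.BYTES;
        int nameKeyBytes = nameKey.getBytes(StandardCharsets.UTF_8).length;

        // Every index entry for the string column pays this difference,
        // before collation handling is even considered.
        System.out.println("int key: " + intKeyBytes + " bytes, string key: "
                + nameKeyBytes + " bytes");
    }
}
```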
If both columns were INTEGER, the answer would be "no". A PRIMARY KEY is effectively a UNIQUE constraint on a column and little more. Additionally, since both usually cause internal indexes to be created, they behave basically the same way.
In your specific case, however, the NAME column is a string. Even though it has a UNIQUE constraint, by virtue of its data type you will incur some performance loss.
As your question is probably dictated by "ease of use" to some extent (for debugging purposes it is certainly easier to remember the name than the id), the questions you need to ask yourself are:
Will the NAME column always be unique, or could it be changed to something not unique? Should it actually be unique in the first place (maybe it was set up incorrectly)?
How many rows do you expect in your table? This matters because, while a small table won't really show any performance issues, a table with high cardinality may start to show some.
How many transactions per second do you expect? If it's an internal application or a small amateur project, you can live with querying the NAME column, whereas if you need extreme scalability you should stay away from it.

Integer to UUID conversion using padded 0's

I have a question regarding UUID generation.
Typically, when I'm generating a UUID I will use a random or time based generation method.
HOWEVER, I'm migrating legacy data from MySQL over to a C* datastore, and I need to change the legacy (auto-incrementing) integer IDs to UUIDs. Instead of creating another denormalized table with the legacy integer IDs as the primary key and all the data duplicated, I was wondering what folks thought about padding 0's onto the front of the integer ID to form a UUID. Example below.
*Something important to note is that the legacy IDs' highest values will never top 1 million, so overflow isn't really an issue.
The idea would look like this:
Legacy ID: 123456 ---> UUID: 00000000-0000-0000-0000-000000123456
This would be done using some string concatenation and the UUID.fromString("00000000-0000-0000-0000-000000123456") method.
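That conversion can be written with String.format instead of manual concatenation; a small sketch:

```java
import java.util.UUID;

public class LegacyIdToUuid {
    // Pads the decimal digits of the legacy id into the last group of a
    // nil UUID, e.g. 123456 -> 00000000-0000-0000-0000-000000123456.
    public static UUID fromLegacyId(long legacyId) {
        return UUID.fromString(
                String.format("00000000-0000-0000-0000-%012d", legacyId));
    }

    public static void main(String[] args) {
        System.out.println(fromLegacyId(123456));
        // prints 00000000-0000-0000-0000-000000123456
    }
}
```

One subtlety: UUID.fromString parses those digits as hexadecimal, so the numeric value stored inside the UUID is not 123456. The string form round-trips cleanly through fromString/toString, though, so as long as the mapping is treated textually it stays unambiguous.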
Does this seem like a bad pattern to anyone? I'm not a huge fan of the idea; it gives me a bad taste in my mouth, but I don't have a technical reason why, haha.
As far as collisions go, the probability of a collision occurring is still ridiculously low, so I'm not worried about increasing collisions. I suppose it just seems like bad practice to me, that it's "too easy".
We faced the same kind of issue before when migrating from Oracle with ids generated by sequence to Cassandra with generated UUIDs.
We had to design a type that supports both old data coming from Oracle, with type long, and new data with uuid.
The obvious solution is to use type blob to store the id. A blob can encode a long or an uuid.
This solution only works for partition keys, because you query them using =. It won't work for clustering columns used with operators like > or <, because those need an ordering on the values.
There was a small objection at the time: using a blob to store the id makes it opaque to the user. For example, in cqlsh, when you do a SELECT and need to provide the id, how would you construct the blob?
Fortunately, the native functions of CQL bigIntAsBlob(), blobAsBigInt(), uuidAsBlob() and blobAsUUID() come in very handy.
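Outside of cqlsh, the same idea can be reproduced in Java with a ByteBuffer: a long encodes to an 8-byte blob and a UUID to a 16-byte blob, which is why a single blob column can hold either. A sketch (assuming the big-endian layout that the CQL blob conversions use):

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class IdBlobDemo {
    // Equivalent in spirit to CQL's bigIntAsBlob(): 8 big-endian bytes.
    public static byte[] longToBlob(long id) {
        return ByteBuffer.allocate(Long.BYTES).putLong(id).array();
    }

    // Equivalent in spirit to CQL's uuidAsBlob(): 16 big-endian bytes.
    public static byte[] uuidToBlob(UUID id) {
        return ByteBuffer.allocate(2 * Long.BYTES)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    public static void main(String[] args) {
        System.out.println(longToBlob(42L).length);                // 8
        System.out.println(uuidToBlob(UUID.randomUUID()).length);  // 16
    }
}
```

The two encodings have different lengths, so the application can even tell legacy ids from new ones by inspecting the blob size.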
I've decided to go in a different direction from doanduyhai's answer.
In order to maintain data consistency, we decided to fully de-normalize the data and create another table in C* that is keyed on our legacy IDs. When migrating the objects from our legacy into C*, they are assigned a new randomly generated UUID, which will be their new primary ID for the future. The legacy IDs will be kept around until such a time that we decide they are no longer needed. Upon that time, we can cleanly drop the legacy ID table and be done with them.
This solution allowed for a cleaner break from our legacy ID system in the future, and allowed us to avoid strange custom-made UUIDs. I also wasn't a huge fan of having the ID field be a blob type that could hold multiple kinds of data, since, in the future, we plan on only wanting UUIDs to be there.

How do I set a hibernate sequence manually in mysql?

I'm doing some data migration after some data model refactoring: I'm taking a couple of tables with composite primary keys, combining them into a larger table, and giving it its own unique primary key. At this point, I've written some SQL to copy the old table data into the new table and assign a primary key using AUTO_INCREMENT. After the migration is done, I remove the AUTO_INCREMENT from the PK field. So far that's all gravy, but the problem is that I need the Hibernate sequence to know what the next available PK will be. We generally use the TABLE strategy for all of our entities, and I'd like to stay consistent and avoid using AUTO_INCREMENT and the IDENTITY strategy for future objects. I've gotten away with temporarily setting the respective row in the generated "hibernate_sequences" table to the max id of the newly created table, but this is just a band-aid fix. Also, the next IDs created end up much larger than the max id. I'm certain this is because I don't understand the HiLo id-assigning mechanism, which is why I'm posting here. Is there a way to set this up so that the IDs will be sequential? Or, where is the code that generates the HiLo value, so that I can calculate what it should be to ensure sequential ids?
If I understood you correctly, the problem is that Hibernate doesn't generate sequential IDs for you. But that is how the hi/lo generator works, and I do not understand exactly why you don't like it.
Basically, the hi/lo generator maintains HIGH and LOW values separately. When LOW reaches its limit, it is reset and HIGH is incremented. The resulting key is formed by combining the HIGH and LOW values. For example, assume the key is a double word and HIGH and LOW are words: HIGH can be the left two bytes and LOW the right two bytes.
Jumps in the IDs depend on two factors: the max value for LOW, and the event that triggers changing the value of HIGH.
By default in Hibernate, the max value for LOW is Short.MAX_VALUE, and LOW is reset on each generator initialization. The HIGH value is read from the table and incremented on each initialization; it is also incremented when LOW reaches its upper limit. All this means that on each restart of your application you will get gaps in the IDs.
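A stripped-down sketch of that scheme (illustrative only, not Hibernate's actual implementation; the combining rule assumed here is id = hi * maxLo + lo):

```java
public class HiLoSketch {
    private final long maxLo;
    private long hi;   // in Hibernate, read from the hibernate_sequences table
    private long lo;

    HiLoSketch(long initialHi, long maxLo) {
        this.hi = initialHi;
        this.maxLo = maxLo;
        this.lo = 0;
    }

    // Combine hi and lo into a key; when lo exhausts its range, bump hi
    // (Hibernate does this by updating the table row) and reset lo.
    // Each bump of hi skips ahead by a whole batch, which is why
    // restarts leave gaps of up to maxLo ids.
    long nextId() {
        if (lo >= maxLo) {
            hi++;
            lo = 0;
        }
        return hi * maxLo + lo++;
    }

    public static void main(String[] args) {
        HiLoSketch gen = new HiLoSketch(1, 50);  // batch size 50
        System.out.println(gen.nextId());  // 50
        System.out.println(gen.nextId());  // 51
    }
}
```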
Looking at the code, it seems that if you use a value < 1 for max_lo, the key will be generated just by incrementing the hi value, which is read from the DB. You would probably like that behaviour :)
Have a look at the source code for org.hibernate.id.MultipleHiLoPerTableGenerator#generate
Using org.hibernate.id.MultipleHiLoPerTableGenerator#generate, I figured out that my batches were of size 50, so for my purposes using max id / 50 + 1 produced a usable number to put into the sequence table, making the IDs as close to sequential as possible.

Primary Key Type: int vs long

I know some software shops have been burned by using the int type for the primary key of a persistent class. That being said, not all tables grow past 2 billion rows. As a matter of fact, most don't.
So, do you use the long type only for those classes that are mapped to potentially large tables, or for every persistent class, just to be consistent? What's the industry consensus?
I'll leave this question open for a while so that you can share with us your success/horror stories.
Long can be advantageous even if the table does not grow super large but has high turnover, i.e. rows are deleted and inserted frequently. Your auto-generated/sequential unique identifier may climb to a high number while the table remains small.
I generally use Long because the performance benefits are not noticeable in most of my projects; however, a bug due to overflow would be very noticeable!
That's not to say that Int is not the better option for other people's scenarios, for example data crunching or complex query systems. Just be clear about the risks/benefits and how they impact your specific project.
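The overflow bug mentioned above would be silent in Java, since int arithmetic wraps around instead of throwing; a two-line illustration:

```java
public class OverflowDemo {
    public static void main(String[] args) {
        int maxId = Integer.MAX_VALUE;     // 2147483647
        int nextId = maxId + 1;            // silently wraps to -2147483648
        System.out.println(nextId);

        // With long there is ample headroom:
        long nextLong = (long) maxId + 1;  // 2147483648
        System.out.println(nextLong);
    }
}
```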
I don't know about "burned". It's not difficult to change from int to long when you need to. The conversion is straightforward in SQL, and then it's just a search and replace in your client code (or make the change in your persistence layer, then compile and see what breaks). You're moving from one integer type to another, so you don't have to worry about subtle conversion issues or truncation.
Going from float to double would be a lot harder.
I use Integer for my surrogate keys unless I have a need for them to be something else. It is not necessary to always use a Long if you don't have a need for it.
(I typically use JPA/Hibernate in my projects running against either Oracle 10g or MySQL 5.x databases.)
Int, because it will always be faster for selects and sorts.