Changing foreign key results in new row - Java

I'm currently working on my first Java application backed by a MySQL DB, using EclipseLink 2.0 and NetBeans. At the moment I am facing a behaviour I cannot explain; maybe someone has stumbled over this problem in the past and can help me. Here goes:
Table 1 has the PK of table 2 as an FK. On the application side, there is a UI where users can generate content for table 1. The value for the FK (Table2ID) is chosen with a dropdown menu, which gets its values by reading the collection of table 2 rows. Now, when I try to change the FK to another (already existing) value, instead of doing just that, a new row with a fresh ID is generated in table 2, with all other column values cloned from the row I tried to point the FK at. So, for example, when I try to set table 1 rows 3, 4 and 5 to table1.fkcolumn = 6 (i.e. Table2ID = 6), the program instead clones the row with ID = 6 three times and points each of the table 1 rows at one of the clones.
Any help would be greatly appreciated.

The problem is that you are changing the primary key of an entity. In EclipseLink, when you change the PK of an entity, you have a new entity; EclipseLink therefore inserts new rows and leaves the old rows alone.
To get around this you have three choices:
1) Change the database. Primary keys really shouldn't be changed.
2) Have the application execute an update query that changes the primary key values, then requery the entities.
3) Delete the old rows and re-create them with the new primary key.
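A plain-Java sketch of what the application code should be doing when the user picks a value in the dropdown (entity and column names here are hypothetical, and the JPA plumbing is stripped away): look up the already-existing table 2 row and repoint the reference on the table 1 row, instead of editing the ID field on the object currently referenced.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch (hypothetical names): repoint table1's reference to an
// existing table2 row instead of changing the ID on the referenced object.
public class FkRepointSketch {
    static class Table2Row {
        final int id;          // primary key: never changed after creation
        String payload;
        Table2Row(int id, String payload) { this.id = id; this.payload = payload; }
    }
    static class Table1Row {
        Table2Row ref;         // the foreign-key relationship
        Table1Row(Table2Row ref) { this.ref = ref; }
    }

    // Correct approach: look up the existing target row and assign the reference.
    static void repoint(Table1Row row, Map<Integer, Table2Row> table2, int newFk) {
        Table2Row existing = table2.get(newFk);
        if (existing == null) throw new IllegalArgumentException("no row " + newFk);
        row.ref = existing;    // no new Table2Row is created
    }

    public static void main(String[] args) {
        Map<Integer, Table2Row> table2 = new HashMap<>();
        table2.put(6, new Table2Row(6, "six"));
        Table1Row r3 = new Table1Row(table2.get(6));
        repoint(r3, table2, 6);
        System.out.println(table2.size()); // still one row in table2, no clone
    }
}
```

In JPA terms this corresponds to fetching the managed target entity (e.g. via `em.find`) and assigning it to the relationship field, never mutating the ID of the entity the dropdown currently holds.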

Related

Update primary keys without creating duplicate rows?

I'm working on a Java project which needs to be able to alter all the primary keys in a table - and in most cases - some other values in the row as well.
The problem I have is that, if I update a row by selecting by its old primary key (SET pk=new_pk WHERE pk=old_pk) I get duplicate rows (since the old PK value may be equal to another row's new PK value and both rows are then updated).
I figured that in Oracle and some other DBs I might be able to do this with ROWNUM or something similar, but the system should work with most DB systems and right now we can't get this to work for MySQL.
I should add that I don't have access to change the schema of the DB - so, I can't add a column.
What I have tried:
1) Updating ResultSets directly with RS.updateRow() - this seems to work, but is very slow.
2) Hashing the PKs in the table, storing the hash in code, and selecting on the hashed PK. This acts sort of as a signature, since a hashed PK indicates that the row has been read but not yet updated, and I can skip those rows that way. The issue with this seems to have been hash collisions, as I was getting duplicate PKs.
PS:
I realise this sounds like either a terrible idea or terrible design, but we really have no choice. The system I'm working on aims to anonymize private data, and that may entail changing PKs in some tables. Don't worry, we do account for FK references.
In this case you can use a simple update with delta = the maximum PK of the table being updated.
First select the delta:
select max(pk) as delta from table
and then use it in the update:
update table SET pk=pk+delta+1
Before this operation you need to disable constraints. And don't forget that you should also update the foreign keys.
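This shift is collision-free because every new key pk + delta + 1 is strictly greater than the old maximum, so it can never equal a not-yet-updated key. A small Java simulation of the arithmetic (standing in for the SQL, with made-up data):

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Simulates: SELECT MAX(pk) AS delta FROM table; UPDATE table SET pk = pk + delta + 1.
// Every shifted key exceeds the old maximum, so no collision with an old key
// is possible even when the old and new key ranges would otherwise overlap.
public class PkShiftSimulation {
    static Map<Integer, String> shiftKeys(Map<Integer, String> table) {
        if (table.isEmpty()) return table;
        int delta = Collections.max(table.keySet());
        Map<Integer, String> shifted = new TreeMap<>();
        for (Map.Entry<Integer, String> e : table.entrySet()) {
            shifted.put(e.getKey() + delta + 1, e.getValue());
        }
        return shifted;
    }

    public static void main(String[] args) {
        Map<Integer, String> table = new TreeMap<>();
        table.put(1, "a"); table.put(2, "b"); table.put(5, "c");
        System.out.println(shiftKeys(table)); // {7=a, 8=b, 11=c}
    }
}
```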

Hive: Best way to do incremental updates on a main table

So I have a main table in Hive, it will store all my data.
I want to be able to load an incremental data update about every month, with a large amount of data - a couple billion rows. There will be new data as well as updated entries.
What is the best way to approach this? I know Hive was recently upgraded and supports update/insert/delete.
What I've been thinking is to somehow find the entries that will be updated, remove them from the main table, and then just insert the new incremental update. However, after trying this, the inserts are very fast, but the deletes are very slow.
The other way is to use an update statement to match the key values between the main table and the incremental update and update their fields. I haven't tried this yet. It also sounds painfully slow, since Hive would have to update each entry one by one.
Anyone got any ideas as to how to do this most efficiently and effectively?
I'm pretty new to Hive and databases in general.
If merge in ACID mode is not applicable, then it's possible to update using a FULL OUTER JOIN or using UNION ALL + row_number().
To find all entries that will be updated, you can join the increment data with the old data:
insert overwrite target_data [partition() if applicable]
SELECT
--select new if exists, old if not exists
case when i.PK is not null then i.PK else t.PK end as PK,
case when i.PK is not null then i.COL1 else t.COL1 end as COL1,
...
case when i.PK is not null then i.COL_n else t.COL_n end as COL_n
FROM
target_data t --restrict partitions if applicable
FULL JOIN increment_data i on (t.PK=i.PK);
It's possible to optimize this by restricting the partitions in target_data that will be overwritten and joined, using WHERE partition_col in (select distinct partition_col from increment_data), or, if possible, by passing the partition list as a parameter and using it in the where clause; that will work even faster.
Also, if you want to update all columns with new data, you can apply this solution with UNION ALL + row_number(); it works faster than a full join: https://stackoverflow.com/a/44755825/2700344
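The CASE WHEN i.PK IS NOT NULL pattern above boils down to "take the increment's row when one exists, otherwise keep the old row". A minimal Java simulation of that merge rule over two in-memory maps (not HiveQL, just the logic):

```java
import java.util.HashMap;
import java.util.Map;

// Simulates the FULL OUTER JOIN merge: for every key present in either the
// target or the increment, keep the increment's row if it exists, else the old row.
public class FullJoinMerge {
    static Map<String, String> merge(Map<String, String> target, Map<String, String> increment) {
        Map<String, String> result = new HashMap<>(target); // old rows survive by default
        result.putAll(increment);                           // increment wins on a key clash
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> target = new HashMap<>();
        target.put("k1", "old1"); target.put("k2", "old2");
        Map<String, String> inc = new HashMap<>();
        inc.put("k2", "new2"); inc.put("k3", "new3");
        // k1 kept from target, k2 replaced by increment, k3 newly inserted
        System.out.println(merge(target, inc));
    }
}
```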
Here is my solution/workaround if you are using an old Hive version. It works better when you have a large amount of data in the target table which you can't drop and re-create with the full data every time.
1) Create one more table, say a delete_keys table. This will hold all the keys from the main table which are superseded, along with their surrogate keys.
2) While loading incremental data into the main table, do a left join with the main table. For all the matching records, we ideally should update the main table. But instead, we take the keys (along with the surrogate key) from the main table for all matching records and insert them into the delete_keys table. Now we can insert all delta records into the main table, irrespective of whether they are to be updated or inserted.
3) Create a view on the main table using the delete_keys table, so that rows whose keys match the delete_keys table are not fetched. This view is the final target table; it will not show records from the main table which have been superseded by the latest records.
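The view in the last step is effectively an anti-join: any main-table row whose surrogate key was recorded in delete_keys is hidden, so only the freshly inserted delta version remains visible. A minimal Java model of that filtering (names are illustrative):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the delete_keys workaround. Each row carries a surrogate id and a
// business key. When a delta row matches an existing key, the old row's surrogate
// id goes into deleteKeys and the delta row is appended. The "view" hides rows
// whose surrogate id was recorded, so only the latest version is visible.
public class DeleteKeysView {
    static class Row {
        final long surrogateId; final String key; final String value;
        Row(long s, String k, String v) { surrogateId = s; key = k; value = v; }
    }

    // Anti-join: keep only rows whose surrogate id is not in deleteKeys.
    static List<Row> view(List<Row> mainTable, Set<Long> deleteKeys) {
        List<Row> visible = new ArrayList<>();
        for (Row r : mainTable)
            if (!deleteKeys.contains(r.surrogateId)) visible.add(r);
        return visible;
    }

    public static void main(String[] args) {
        List<Row> main = new ArrayList<>();
        Set<Long> deleteKeys = new HashSet<>();
        main.add(new Row(1, "k1", "old"));
        // A delta arrives with an update for k1: record the old surrogate id,
        // then append the new version unconditionally.
        deleteKeys.add(1L);
        main.add(new Row(2, "k1", "new"));
        System.out.println(view(main, deleteKeys).get(0).value); // new
    }
}
```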

Increment ID on insertion of a new row on database

I need something which I don't know is possible to achieve. Basically I'll be adding new rows of information from Java to a database with the following table structure, e.g.:
Number | Color  | Size
0      | Red    | Big
1      | Green  | Small
2      | Yellow | Medium
I'm using Java and I'll only input the Color and Size. I would like to know if it's possible to create a trigger that will store the variable Number (id) on the database side and increment it each time I insert a new row into the DB. I was thinking of doing something like INSERT INTO table VALUES (null, 'Red', 'Big'), and then the database would fill in the proper Number.
It should also cope with the fact that some rows can be deleted, without that affecting anything. For example: if I have IDs 0, 1, 2 and I delete 1, the next row should still get 3.
Another thing, I'm using Sybase SQL Anywhere 12 to do this.
You should use an autoincrement column in your database. See this:
http://www.sqlines.com/sybase-asa/autoincrement_identity
As @Gordon Linoff said, identity columns do that. For example:
create table T1 (ID int identity(1,1), Name nvarchar(100))
In this case you would do:
insert into T1 (Name) values ('John')
So you would insert the name 'John' and the DB itself would give it ID 1.
The next time you insert, ID 2 would be assigned, and so on.
identity(1,1) means: start from 1 and increment by 1 on each new insert.
The thing about this is that once a number is taken, there is no going back. So if you have IDs 1, 2, 3 and delete ID 3, on the next insert the ID will go up to 4; it will not fill in the "missing" number.
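The "no going back" behaviour is just a monotonically increasing counter that deletion never rewinds. A minimal Java model of it (not Sybase-specific, just the idea):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal model of an identity column: a counter that only moves forward.
// Deleting a row does not release its number for reuse.
public class IdentityCounter {
    private long next = 0;                       // start at 0, step 1
    private final Map<Long, String> rows = new HashMap<>();

    long insert(String value) {
        long id = next++;                        // take the next number, never reused
        rows.put(id, value);
        return id;
    }

    void delete(long id) { rows.remove(id); }    // the counter is NOT decremented

    public static void main(String[] args) {
        IdentityCounter t = new IdentityCounter();
        t.insert("Red/Big");                     // id 0
        long one = t.insert("Green/Small");      // id 1
        t.insert("Yellow/Medium");               // id 2
        t.delete(one);
        System.out.println(t.insert("Blue/Large")); // 3, not 1
    }
}
```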
There are several solutions that satisfy your requirements, but they differ in several aspects and you should decide which suits you best.
Some solutions exist in the DB context (for example @Gregory's answer), but other solutions are independent of the DB type and its specific features. That means you implement your solution independently of your DB type, so you could change your DB (Oracle, SQL Server, MySQL, ...) with no need to change your Java code.
In JPA there are three sequence strategies for solving this problem with @GeneratedValue:
1) Table sequencing: you use a separate table for this purpose in your DB. This table holds the next ids for the other tables that have auto-increment columns using this strategy.
2) Sequence objects: you use a sequence object in your DB and JPA handles it. Sequence objects are only supported in some databases, such as Oracle, DB2, and Postgres.
3) Identity sequencing: uses special IDENTITY columns in the database to let the database automatically assign an id to the object when its row is inserted. Identity columns are supported in many databases, such as MySQL, DB2, SQL Server, Sybase, and PostgreSQL. Oracle does not support IDENTITY columns, but it is possible to simulate them using sequence objects and triggers.
If you want to be independent of your DB type, I recommend the table strategy in JPA.
See Java Persistence/Identity and Sequencing for details.
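The table strategy can be pictured as a tiny sequence table with one row per entity, from which blocks of ids are handed out with an ordinary UPDATE, which is why it works on any database. A simplified Java model (the allocation size and names are illustrative, not JPA's actual implementation):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of JPA table sequencing: one row per entity in a "sequence
// table" holds the next available id; allocators grab a block of ids at a time
// to cut down on round trips. The allocation size here is illustrative.
public class TableSequenceModel {
    private final Map<String, Long> sequenceTable = new HashMap<>(); // entity -> next id
    private final int allocationSize;

    TableSequenceModel(int allocationSize) { this.allocationSize = allocationSize; }

    // Reserve a block of ids for the given entity and return the first one,
    // like: UPDATE sequence SET next = next + allocationSize WHERE name = ?
    long nextBlockStart(String entity) {
        long next = sequenceTable.getOrDefault(entity, 1L);
        sequenceTable.put(entity, next + allocationSize);
        return next;
    }

    public static void main(String[] args) {
        TableSequenceModel seq = new TableSequenceModel(50);
        System.out.println(seq.nextBlockStart("Order"));    // 1
        System.out.println(seq.nextBlockStart("Order"));    // 51
        System.out.println(seq.nextBlockStart("Customer")); // 1: independent counter
    }
}
```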
You asked:
I would like to know if it's possible to create a trigger that will store the variable Number (id) on the database side, and increment it each time I do an insert of a new row into the db.
Yes, you could use a trigger, but as I mentioned there are simpler solutions.
It should also cope with the fact that some rows can be deleted, without that affecting anything
With the JPA solutions the deleted ids are not reused, but if you implement your own solution you could reuse them.
I hope this answer helps you.

To avoid duplication entry in the master table using cascadeType

I have 2 master tables 'A' and 'B', and one main table 'C' which contains foreign keys to A and B. I have done all the annotation mapping in the POJO classes and used CascadeType, with a many-to-one relationship from C to the A and B table POJO classes. When I insert into the C table, rows are inserted into the A and B master tables too. That's fine. But if the value already exists in the master table, I just need to insert the foreign key into the C table; duplicate entries must not happen in the master tables.
1) If I set unique key constraints, Hibernate fails on the master table insert itself.
2) I don't want to check whether the value exists in the master table, fetch the primary key if it does, and set it on the main table. I am trying to achieve this through annotations alone, without those condition checks.
This is just an example; I have 4 foreign key relationships in my table, so I am trying to avoid those checks.
Can anyone help me with this?
I hope the question is clear. If any other information is needed, kindly let me know.
Thanks in advance.

avoid multi table relationship with one row table

I have a question about Hibernate constraints, about an issue I have never had before.
Imagine that I have a table (Snapshot) where I can add snapshot rows, each of which has to be related to exactly one row of another table. But this relationship is not with only one table; multiple tables can join to this snapshot table. I want to prevent that, once a row of Snapshot is already linked to a row of another table, say:
A.row1 -> Snapshot.row1
another table can pick up the same row for its relationship:
B.row1 -> Snapshot.row1
Otherwise, imagine the issue when I try to do a cascade delete on A.
Any idea how to make this work with Hibernate unique constraints?
In Snapshot, make the field that links to the other table (I suppose it's called row1) unique:
@Column(unique = true)
Edit:
You cannot control how many other tables are using your primary key. What you can do is introduce a new table where you manage your linking. There you could have two columns, one called link_from and the other called link_to, and make link_to unique.
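The unique link_to column gives exactly a "first claim wins" guarantee: once a snapshot row is taken, any second link attempt fails. A minimal Java model of that linking table (column names taken from the answer above; the rest is illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Model of the linking table: link_to (the snapshot row) is unique, so a
// snapshot row can be claimed by at most one owning row across all tables.
public class SnapshotLinkTable {
    private final Map<Long, String> links = new HashMap<>(); // link_to -> link_from

    // Returns true if the claim succeeded, false if the snapshot row is taken,
    // mirroring a unique-constraint violation on link_to.
    boolean claim(String linkFrom, long linkTo) {
        return links.putIfAbsent(linkTo, linkFrom) == null;
    }

    public static void main(String[] args) {
        SnapshotLinkTable t = new SnapshotLinkTable();
        System.out.println(t.claim("A.row1", 1L)); // true: first claim wins
        System.out.println(t.claim("B.row1", 1L)); // false: Snapshot.row1 is taken
    }
}
```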
