I'm observing a strange situation with an INSERT INTO command. I'll try to explain it from my point of view.
There is a TEMP_LINKS table in my database, and the application inserts data into it.
Say the query lives in insert1.sql:
insert into TEMP_LINK (ID, SIDE)
select ID, SIDE
from //inner query//
group by ID, SIDE;
commit;
and there is a java1 class which executes it:
...
executeSqlScript(getResource("path-to-query1"));
...
After that, another java2 class makes another insert into the same TEMP_LINK table:
...
executeSqlScript(getResource("path-to-query2"));
...
where query2 looks like
insert into TEMP_LINK (ID, SIDE)
select ID, 'B'
from (
  select ID
  from ...tables
  where ..conditions
  minus (
    select ID
    from ..tables
    union
    select ID
    from TEMP_LINKS
  )
);
commit;
Both java1 and java2 are executed in different threads, and java1 finishes earlier than java2.
But from time to time the second insert (from query2) doesn't insert any data at all. I see "Update count 0" in the log, and TEMP_LINKS contains only the data from query1.
If I run the application again, the issue disappears and both queries insert their data properly.
Earlier I tried to put both queries into one SQL file, but the issue appeared there too.
So maybe someone has ideas about what I should do, because I'm out of them. One interesting fact: the SQL MINUS operation is used only once, in query2.
A big difference between Oracle and SQL Server: Oracle NEVER blocks a read. This is true even when records are locked. The following is a simplified explanation. Oracle uses the System Change Number (SCN) at the time a transaction starts to determine the state of the database for that transaction. All sorts of things can happen (inserts, updates, and deletes), but the transaction sees the database as it was at the start of that transaction. Changes only matter at the point where the commit/rollback is executed.
In your situation, if the second query starts before the first has committed, the second won't see any changes the first has made, even after the first commits. You need to synchronize those transactions. The easiest way is to combine them into a single sequential execution. Oracle has many more complex synchronization methods, but I would not go that route in this situation.
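For instance, chaining the two script executions on one thread of control is enough. A minimal sketch, reusing the question's executeSqlScript/getResource helpers and assuming each script commits before returning:

// java.util.concurrent.CompletableFuture: run query2 strictly after
// query1 has finished (and committed), instead of racing two threads.
CompletableFuture
    .runAsync(() -> executeSqlScript(getResource("path-to-query1")))
    .thenRun(() -> executeSqlScript(getResource("path-to-query2")))
    .join();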
Related
I have a Java (Spring) REST API endpoint where I get 3 data inputs, and I need to insert into the Oracle database based on some unique ID using JdbcTemplate. But just to be sure nothing breaks, I want to check first whether I need to insert or just update.
1st Approach
Make a database call with a simple query like
SELECT COUNT(*) FROM TABLENAME WHERE ID='ABC' AND ROWNUM=1
And based on the value of the count, make a separate database call for INSERT or UPDATE. (The count would never exceed 1.)
2nd Approach
Make a single MERGE query using jdbcTemplate.update() that would look like:
MERGE INTO TABLENAME
USING DUAL ON (ID = 'ABC')
WHEN MATCHED THEN UPDATE
SET COL1 = 'A', COL2 = 'B'
WHERE ID = 'ABC'
WHEN NOT MATCHED THEN
INSERT (ID, COL1, COL2) VALUES ('ABC', 'A', 'B')
Based on what I read on different sites, MERGE is a bit more costly in terms of CPU and reads, according to an experiment on this site. But that experiment was done purely for DB-script use with two tables, while my context of use is via an API call, using DUAL.
I also read on this question that MERGE could result in ORA-00001 (unique constraint violated) and has some concurrency issues.
I want to do this on a table on which other operations may happen at the same time for a different row, with a very, very small chance of it being the same row. So I want to know which approach to follow for such a use case. I know this might be a common question, but I could not find the answer I was looking for anywhere. I want to know the performance/reliability of both approaches.
Looking at code running in a concurrent-sessions environment, after each atomic statement we need to ask "what if another session has just broken our assumption?" and make adjustments accordingly.
Option 1. Count and decide INSERT or UPDATE
declare
  v_count int;
begin
  SELECT count(1) INTO v_count FROM my_table WHERE ...;
  IF v_count = 0 THEN
    -- what if another session inserted the same row just before this point?
    -- this statement will fail
    INSERT INTO my_table ...;
  ELSE
    UPDATE my_table ...;
  END IF;
end;
Option 2. UPDATE, if nothing is updated - INSERT
begin
  UPDATE my_table SET ... WHERE ...;
  IF SQL%ROWCOUNT = 0 THEN
    -- what if another session inserted the same row just before this point?
    -- this statement will fail
    INSERT INTO my_table ...;
  END IF;
end;
Option 3. INSERT, if failed - UPDATE
begin
  INSERT INTO my_table ...;
exception when DUP_VAL_ON_INDEX then
  -- what if another session updated the same row just before this point?
  -- this statement will override previous changes
  -- what if another session deleted this row?
  -- this statement will do nothing silently - is it satisfactory?
  -- what if another session locked this row for update?
  -- this statement will fail
  UPDATE my_table SET ... WHERE ...;
end;
Option 4. use MERGE
MERGE INTO my_table
USING ... ON (...)
WHEN MATCHED THEN UPDATE ...
WHEN NOT MATCHED THEN INSERT ...
-- We have no place to put our "what if" question,
-- but unfortunately MERGE is not atomic with respect to concurrent sessions,
-- it is just syntactic sugar for option #1
Option 5. use interface for DML on my_table
-- Create a single point of modification for my_table and prevent direct DML.
-- For instance, if the client has no direct access to my_table,
-- use locks to guarantee that only one session at a time
-- can INSERT/UPDATE/DELETE a particular table row.
-- This could be achieved with a stored procedure or a view with an "INSTEAD OF" trigger.
-- The client has access to the interface only (view and procedures),
-- but the table is hidden.
my_table_v -- VIEW AS SELECT * FROM my_table
my_table_ins_or_upd_proc -- PROCEDURE (...) BEGIN ...DML on my_table ... END;
PROCEDURE my_table_ins_or_upd_proc(pi_row my_table%ROWTYPE) is
  l_lock_handle CONSTANT VARCHAR2(100) := 'my_table_' || pi_row.id;
  -- an independent lock handle for each id allows
  -- operating on different ids in parallel
begin
  begin
    request_lock(l_lock_handle);
    -->> this code is exactly as in option #2
    UPDATE my_table SET ... WHERE ...;
    IF SQL%ROWCOUNT = 0 THEN
      -- what if another session inserted the same row just before this point?
      -- NOPE, it cannot happen: another session is waiting for the lock
      -- at the request_lock(...) line
      INSERT INTO my_table ...;
    END IF;
    --<<
  exception when others then
    release_lock(l_lock_handle);
    raise;
  end;
  release_lock(l_lock_handle);
end;
I am not going too deep into low-level details here; see this article to find out how to use locks in Oracle DBMS.
Thus, we see that options 1, 2, 3, and 4 have potential problems that cannot be avoided in the general case. But they can be applied if safety is guaranteed by domain rules or particular design conventions.
Option 5 is bulletproof and fast, as it relies on DBMS contracts.
Nevertheless, this is the reward of a clean design, and it cannot be implemented if my_table is exposed as-is and clients rely on straightforward DML against it.
I believe that performance is less important than data integrity, but let's mention it for completeness.
After proper consideration, it is easy to see that the options, ordered by "theoretical" average performance, are:
2 -> 5 -> (1,4) -> 3
Of course, performance measurement comes after obtaining at least two properly working solutions, and it should be done exclusively for a particular application under a given workload profile. That is another story. At this moment there is no need to bother about theoretical nanoseconds in synthetic benchmarks.
I guess by now we can see that there will be no magic. Somewhere in the application it is required to ensure that every id inserted into my_table is unique.
If id values do not matter (95% of cases), just go with a SEQUENCE.
Otherwise, create a single point of manipulation for my_table (either in Java or in the DBMS schema with PL/SQL) and control uniqueness there. If the application can guarantee that at most one session at a time manipulates data in my_table, then it is possible to just apply option #2.
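Applied to the question's JdbcTemplate setup, option #2 under that single-writer guarantee might look like this (a sketch; table and column names are illustrative):

// UPDATE first; fall back to INSERT when no row was touched.
// Safe only while the application guarantees a single writer session.
int updated = jdbcTemplate.update(
    "UPDATE my_table SET col1 = ?, col2 = ? WHERE id = ?", col1, col2, id);
if (updated == 0) {
    // no concurrent writer by assumption, so this INSERT cannot collide
    jdbcTemplate.update(
        "INSERT INTO my_table (id, col1, col2) VALUES (?, ?, ?)", id, col1, col2);
}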
I have an application which needs to be aware of the latest count of certain records in a table in the database. The solution should work without changing the database code or adding triggers or functions to it, so I need a database-vendor-independent solution.
My program is written in Java, but the database could be SQLite, MySQL, PostgreSQL, or MSSQL. For now I'm doing it like this:
In a separate thread that is set as a daemon, my application sends a simple command through JDBC to the database, to get the latest count of the records matching a condition:
while(true){
SELECT COUNT(*) FROM Mytable WHERE exited='1'
}
and this sort of coding causes the database to lock, slows down the whole system, and generates huge DB logs, which finally brings the whole thing down!
How can I do this the right way, so that I always have the latest count of certain records, or only count when the number has changed?
A SELECT statement should not -- by itself -- have the behavior that you are describing. For instance, nothing is logged with a SELECT. Now, it is possible that concurrent insert/update/delete statements are going on, and that these cause problems because the SELECT locks the table.
Two general things you can do:
Be sure that the comparison is of the same type (see the sketch after this list). So, if exited is a number, do not use single quotes (mixing types can confuse some databases).
Create an index on (exited). In basically all databases, this is a single command: create index idx_mytable_exited on mytable(exited).
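For the first point, a parameterized query keeps the comparison type-consistent. A sketch with plain JDBC, assuming exited is a numeric column:

// Bind the value with the column's own type so the database does not
// have to cast every row before comparing.
try (PreparedStatement ps = conn.prepareStatement(
        "SELECT COUNT(*) FROM Mytable WHERE exited = ?")) {
    ps.setInt(1, 1); // a number, not the string '1'
    try (ResultSet rs = ps.executeQuery()) {
        rs.next();
        int count = rs.getInt(1);
    }
}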
If locking and concurrent transactions are an issue, then you will need to do more database specific things, to avoid that problem.
As others have said, make sure that exited is indexed.
Also, you can set the transaction isolation on your query to do a "dirty read"; this indicates to the database server that you do not need to wait for other processes' transactions to commit, and instead you wish to read the current value of exited on rows that are being updated by those other processes.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED is the standard syntax for using "dirty read".
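Over JDBC the same request is made on the connection; note that support varies by database (PostgreSQL, for example, treats READ UNCOMMITTED as READ COMMITTED):

// Ask the driver for "dirty read" semantics before running the count.
conn.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);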
The problem I have right now deals with concurrent SQL UPDATE and DELETE statements. If the program is only run by one person at a time, there are no problems; however, if two people decide to run the program at once, it might fail.
What my program does:
A program about food, where every entry has a description and the date that description was made. As people enter descriptions of foods, the descriptions get entered into a database from which they can be quickly retrieved. If a description is, let's say, 7 days old, we delete it because it's outdated. However, if a user enters a food already in the database with a different description, we update it and change the date. The deletion happens after the update/insertion (foods that don't need updating are inserted, and then the program checks for outdated entries in the database and deletes them).
The problem:
Two people run the program, and right as one person is trying to update a food, the other clears it out with the deletion, because its run has just finished. The update does not happen, and the program continues with the rest of the updates (<- I read that this is because my driver doesn't stop; some drivers stop updating if there is an error).
What I want to do:
I want my program to stop at the bad update, or grab that food record and restart the process/thread. The restart would include sorting out which foods need to be updated or inserted. That way, the bad record would be moved into the insert method instead of the update, the update would continue where it left off, and all would be well.
I know this is not the only way, so different methods of solving this problem are welcome. I have read that you can use an upsert statement, but that also has race conditions. (A question about the upsert statement: if I make the upsert method synchronized, will it be free of race conditions?)
Thanks
There are different practical solutions to your problem, depending on your JDBC connection management.
If the application is a client-server one and it uses a dedicated persistent connection for each client (i.e. it opens a JDBC connection at program startup and closes it when the program shuts down), you can use a SELECT FOR UPDATE statement.
You must issue the SELECT FOR UPDATE when displaying records to the user, and when the user performs the action you do what is needed and commit.
This approach serializes the database operations, and if you show and lock multiple records it may not be feasible.
A second approach is usable when you have a web application with a connection pool, or when you don't have a dedicated connection you can use for both the read and the update/delete operation. In this case you have this scenario:
user 1 selects their data with JDBC connection 1
user 2 selects their data (the same as user 1) with JDBC connection 2
user 2 submits data, causing some deletions, with JDBC connection 3
user 1 submits data and loses an update, because the data was deleted, with JDBC connection 2
Since you cannot rely on the same JDBC connection to lock the data you read, you can issue a SELECT FOR UPDATE before updating the data and check whether the data is still there. If it is, you can update it (and it will not be deleted by other sessions, since every DELETE command on the same data waits for your SELECT FOR UPDATE to terminate); if the data is gone because it was deleted while the user was viewing it, you must reinsert it. Your DELETE statement must have a filter on the date column that represents the last update.
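A sketch of that check over plain JDBC (table, column, and helper names are illustrative, not from the question):

conn.setAutoCommit(false);
try (PreparedStatement sel = conn.prepareStatement(
        "SELECT id FROM food_table WHERE id = ? FOR UPDATE")) {
    sel.setLong(1, foodId);
    try (ResultSet rs = sel.executeQuery()) {
        if (rs.next()) {
            // the row exists and is now locked: a concurrent DELETE must wait
            updateFood(conn, food);  // hypothetical helper
        } else {
            // the row was deleted while the user was viewing it: reinsert
            insertFood(conn, food);  // hypothetical helper
        }
    }
}
conn.commit();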
You can use other approaches and avoid the SELECT FOR UPDATE by using, for example, an
update food_table set last_update=? where id=? and last_update=<the last update you have in the java program>
and then checking that the UPDATE statement actually updated a row (in JDBC, executeUpdate returns the number of rows modified, but you did not specify whether you are using "plain" JDBC or some sort of framework); if it did not update a row, you must issue the INSERT statement.
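In plain JDBC that check looks roughly like this (names follow the example above and are illustrative):

try (PreparedStatement upd = conn.prepareStatement(
        "UPDATE food_table SET description = ?, last_update = ? "
        + "WHERE id = ? AND last_update = ?")) {
    upd.setString(1, newDescription);
    upd.setTimestamp(2, now);
    upd.setLong(3, foodId);
    upd.setTimestamp(4, lastUpdateSeenInJava);
    if (upd.executeUpdate() == 0) {
        // nothing matched: the row was deleted (or changed) in the meantime,
        // so fall back to the INSERT statement
        insertFood(conn, food);  // hypothetical helper
    }
}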
Set the transaction isolation level to SERIALIZABLE in your Java code. Then your statements should look like:
update food_table set update_time = ? where ....
delete from food_table where update_time < ?
You may get a serialization exception in either case. In the case of the UPDATE, you will need to reinsert the entry. In the second case, just ignore it and run again.
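A sketch of the JDBC side, with an illustrative retry policy:

conn.setAutoCommit(false);
conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
try {
    updateStmt.executeUpdate();  // update food_table set update_time = ? where ...
    deleteStmt.executeUpdate();  // delete from food_table where update_time < ?
    conn.commit();
} catch (SQLException e) {
    // serialization failure: roll back, then reinsert the entry if the
    // UPDATE lost it, or simply run the DELETE again
    conn.rollback();
}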
Sorry if the question title is misleading or not accurate enough, but I didn't see how to ask it in one sentence.
Let's say we have a table where the PK is a string (numbers from '100,000' to '999,999'; the comma is for readability only).
Let's also say the PK values are not used sequentially.
Now I want to insert a new row into the table using java.sql and show the PK of the inserted row to the user. Since the PK is not generated by default (e.g. inserting values without the PK doesn't work, and something like generated keys is not available in the given environment), I've seen two different approaches:
in two different statements: first find a possible next key, then try to insert (and expect that another transaction may have used the same key in the time between the two statements) - is it valid to retry until success, or could some SQL trick with transaction settings/locks help here? How can I realize that in java.sql?
For me that's a disappointing solution, because of the non-deterministic behaviour (perhaps you can convince me of the contrary), so I searched for another one:
insert with a nested select statement that looks up the next possible PK. Looking at other answers on generating the PK myself, I came close to a working solution with this statement (casts from string to int left out):
INSERT INTO mytable (pk, othercolumns)
VALUES(
  (SELECT MIN(empty_numbers.empty_number)
   FROM (SELECT t1.pk + 1 AS empty_number
         FROM mytable t1
         LEFT OUTER JOIN mytable t2
           ON t1.pk + 1 = t2.pk
         WHERE t2.pk IS NULL
           AND t1.pk > 100000) AS empty_numbers),
  othervalues);
That works like a charm and is (afaik) a more predictable and stable solution than my first approach, but: how can I retrieve the generated PK from that statement? I've read that there is no way to return the inserted row (or any of its columns) directly, and most of the Google results I found point to returning generated keys - even though my key is generated, it's not generated by the DBMS directly, but by my statement.
Note that the DBMS used in development is MSSQL 2008 and the production system is currently DB2 on AS/400 (I don't know which version), so I have to stick close to the SQL standard. I can't change the DB structure in any way (e.g. to use generated keys; I'm not sure about stored procedures).
DB2 for i allows generated keys, stored procedures, user-defined functions - pretty much all of the things SQL Server can do. The exact implementation is different, but that's what manuals are for :-) Ask your admin what version of IBM i they're running, then hit up the Infocenter for specifics.
The constraining factor is that you can't alter the database design; you are stuck with what are apparently multiple processes trying to INSERT while backfilling 'holes' in the existing keyspace. That's a very tough nut to crack. Because you can't change the DB design, there's nothing to be done except to allow for and handle PK collisions. There's no SQL trick that will help - the SQL way is to have the DB generate the PK, not the application.
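Handling the collision can be as simple as a retry loop around the nested-select INSERT from the question. A sketch, where INSERT_WITH_NESTED_SELECT stands for that statement (drivers that do not map duplicate keys to SQLIntegrityConstraintViolationException would need to inspect the SQLState instead):

// Retry until no other transaction has claimed the same candidate PK.
boolean inserted = false;
while (!inserted) {
    try (PreparedStatement ps = conn.prepareStatement(INSERT_WITH_NESTED_SELECT)) {
        ps.executeUpdate();
        inserted = true;
    } catch (SQLIntegrityConstraintViolationException e) {
        // another session won the race for that key; loop and try again
    }
}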
There are several alternatives to suggest, in the event that some change is allowed. All have issues needing a workaround, but that is unavoidable at this point due to the application design.
Create a UDF that all INSERT clients use to retrieve the next available PK. Use a table of 'available numbers' and delete them as they are issued.
Pre-INSERT all the available numbers. Force clients to do an UPDATE. Make them FETCH...FOR UPDATE where (rest of data = not populated). This will lock the row, avoiding collisions as well as making the PK immediately available.
Leave the DB and the other application programs using this table as-is, but have your INSERT process draw from a block of keys that's been set aside for your use. Keep the next available number in an SQL SEQUENCE or an IBM i data area. This only works if there's a very large hole in the keyspace that's not yet used.
I want the DBMS to help me gain speed when doing a lot of inserts.
Today I do an INSERT query in Java and catch the exception if the data is already in the database.
The exception I get is :
SQLite Exception : [19] DB[1] exec() columns recorddate, recordtime are not unique.
If I get an exception I do a SELECT query with the primary keys (recorddate, recordtime) and compare the result with the data I am trying to insert in Java. If it is the same, I continue with the next insert; otherwise I evaluate the data and decide what to save, and maybe do an UPDATE.
This process takes time and I would like to speed it up.
I have thought of INSERT IF NOT EXISTS, but that just ignores the insert if there is any data with the same primary keys, am I right? And I want to make sure it is exactly the same data before I ignore the insert.
I would appreciate any suggestions for how to make this faster.
I'm using Java to handle a large amount of data to insert into a SQLite database (SQLite v3.7.10). As the connection between Java and SQLite I am using sqlite4java (http://code.google.com/p/sqlite4java/).
I do not think letting the DBMS handle more of that logic would be faster, at least not with plain SQL; as far as I can think of, there is no "create or update" there.
When handling lots of entries, latency is often an important issue, especially with DBs accessed via the network, so at least in that case you want to use mass operations wherever possible. Even if it were provided, a "create or update" instead of a select followed by an update or insert would only halve the latency.
I realize that is not what you asked for, but I would try to optimize in a different way: process chunks of data, select all of them into a map, then partition the input into creates, updates, and ignores. That way the ignores are almost free, and further lookups are guaranteed to happen in memory. It is unlikely that the DBMS can be significantly faster.
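A rough sketch of that partitioning step (Record, RecordKey, and loadChunkFromDb are illustrative, not from the question):

// One SELECT loads the chunk's existing rows; the rest happens in memory.
Map<RecordKey, Record> existing = loadChunkFromDb(conn, chunk);
List<Record> inserts = new ArrayList<>();
List<Record> updates = new ArrayList<>();
for (Record incoming : chunk) {
    Record stored = existing.get(incoming.key());
    if (stored == null) {
        inserts.add(incoming);         // new primary key
    } else if (!stored.equals(incoming)) {
        updates.add(incoming);         // same key, different data
    }                                  // identical row: ignore, costs nothing
}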
If you are unsure whether that is the right approach for you, profiling the overhead times should help.
Wrap all of your inserts and updates into a transaction. In SQL this will be written as follows.
BEGIN;
INSERT OR REPLACE INTO Table(Col1,Col2) VALUES(Val1,Val2);
COMMIT;
There are two things to note here. First, the database paging and commits will not be written to disk until COMMIT is called, which speeds up your queries significantly. Second, the INSERT OR REPLACE syntax does precisely what you want for UNIQUE or PRIMARY KEY fields.
Most database wrappers have a special syntax for managing transactions. You can certainly execute a query, BEGIN, followed by your inserts and updates, and finish by executing COMMIT. Read the database wrapper documentation.
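Shown with plain JDBC for illustration (the question uses sqlite4java, but the BEGIN/COMMIT idea is the same; Row is an illustrative type):

conn.setAutoCommit(false);  // JDBC's way of opening the transaction
try (PreparedStatement ps = conn.prepareStatement(
        "INSERT OR REPLACE INTO MyTable(Col1, Col2) VALUES (?, ?)")) {
    for (Row row : rows) {
        ps.setString(1, row.col1());
        ps.setString(2, row.col2());
        ps.addBatch();
    }
    ps.executeBatch();
}
conn.commit();  // nothing is flushed to disk until here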
One more thing you can do is switch to Write-Ahead Logging. Run the following command, only once, on the database.
PRAGMA journal_mode = wal;
Without further information, I would:
BEGIN;
UPDATE table SET othervalues=... WHERE recorddate=... AND recordtime=...;
INSERT OR IGNORE INTO table(recorddate, recordtime, ...) VALUES(...);
COMMIT;
The UPDATE will update all existing rows, ignoring nonexistent ones because of the WHERE clause.
The INSERT will then add the new rows, ignoring existing ones because of IGNORE.
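Wired up over JDBC per record and inside one transaction, the pair might look like this (a sketch; Record and its accessors are illustrative):

conn.setAutoCommit(false);
try (PreparedStatement upd = conn.prepareStatement(
        "UPDATE mytable SET othervalues = ? WHERE recorddate = ? AND recordtime = ?");
     PreparedStatement ins = conn.prepareStatement(
        "INSERT OR IGNORE INTO mytable(recorddate, recordtime, othervalues) "
        + "VALUES (?, ?, ?)")) {
    for (Record r : records) {
        upd.setString(1, r.values());
        upd.setString(2, r.date());
        upd.setString(3, r.time());
        upd.executeUpdate();  // touches only rows that already exist
        ins.setString(1, r.date());
        ins.setString(2, r.time());
        ins.setString(3, r.values());
        ins.executeUpdate();  // adds only rows that do not exist yet
    }
}
conn.commit();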