Let's assume that I have an Oracle database with a table called RUN_LOG I am using to record when jobs have been executed.
The table has a primary key JOB_NAME which uniquely identifies the job that has been executed, and a column called LAST_RUN_TIMESTAMP which reflects when the job was last executed.
When a job starts, I would like to update the existing row for that job (if it exists), or otherwise insert a new row into the table.
Given Oracle does not support a REPLACE INTO-style query, it is necessary to try an UPDATE, and if zero rows are affected follow this up with an INSERT.
This is typically achieved with JDBC using something like the following:
PreparedStatement updateStatement = connection.prepareStatement("UPDATE ...");
PreparedStatement insertStatement = connection.prepareStatement("INSERT ...");

updateStatement.setString(1, "JobName");
updateStatement.setTimestamp(2, timestamp);

// If there are no rows to update, it must be a new job...
if (updateStatement.executeUpdate() == 0) {
    // ...so follow up with an INSERT
    insertStatement.setString(1, "JobName");
    insertStatement.setTimestamp(2, timestamp);
    insertStatement.executeUpdate();
}
This is a fairly well-trodden path, and I am very comfortable with this approach.
However, let's assume my use-case requires me to insert a very large number of these records. Performing individual SQL queries against the database would be far too "chatty"; instead, I would like to start batching these INSERT/UPDATE queries.
Given that execution of the UPDATE queries will be deferred until the batch is executed, I cannot observe how many rows were affected until then.
What is the best mechanism for achieving this REPLACE INTO-like result?
I'd rather avoid using a stored procedure, as I'd prefer to keep my persistence logic in this one place (class), rather than distributing it between the Java code and the database.
What about the SQL MERGE statement? You can batch-insert a large number of records into a temporary (staging) table, then merge the temp table into RUN_LOG. For example:
merge into RUN_LOG tgt
using (
  select job_name, last_run_timestamp
  from my_new_temp_table
) src
on (src.job_name = tgt.job_name)
when matched then update
  set tgt.last_run_timestamp = src.last_run_timestamp
when not matched then insert (job_name, last_run_timestamp)
  values (src.job_name, src.last_run_timestamp);
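Wiring this up from JDBC might look like the sketch below: batch-insert all the runs into a staging table in one round trip, then upsert everything with a single MERGE. The staging table name RUN_LOG_STAGE and the class/method names are hypothetical; adjust to your schema.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.Map;

public class RunLogUpserter {

    // Hypothetical staging table matching RUN_LOG's columns.
    static final String STAGE_INSERT_SQL =
            "INSERT INTO RUN_LOG_STAGE (JOB_NAME, LAST_RUN_TIMESTAMP) VALUES (?, ?)";

    static final String MERGE_SQL =
            "MERGE INTO RUN_LOG tgt "
          + "USING (SELECT JOB_NAME, LAST_RUN_TIMESTAMP FROM RUN_LOG_STAGE) src "
          + "ON (src.JOB_NAME = tgt.JOB_NAME) "
          + "WHEN MATCHED THEN UPDATE SET tgt.LAST_RUN_TIMESTAMP = src.LAST_RUN_TIMESTAMP "
          + "WHEN NOT MATCHED THEN INSERT (JOB_NAME, LAST_RUN_TIMESTAMP) "
          + "VALUES (src.JOB_NAME, src.LAST_RUN_TIMESTAMP)";

    // Batch-insert all runs into the staging table, then merge into RUN_LOG
    // with one statement; a single transaction keeps stage + merge atomic.
    static void upsertAll(Connection conn, Map<String, Timestamp> runs) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement stage = conn.prepareStatement(STAGE_INSERT_SQL);
             PreparedStatement merge = conn.prepareStatement(MERGE_SQL)) {
            for (Map.Entry<String, Timestamp> run : runs.entrySet()) {
                stage.setString(1, run.getKey());
                stage.setTimestamp(2, run.getValue());
                stage.addBatch();
            }
            stage.executeBatch();   // one round trip for all staged rows
            merge.executeUpdate();  // single MERGE upserts everything
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}
```

This also satisfies the "keep persistence logic in one class" constraint, since the MERGE is just another SQL string in the Java code.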
I have to insert two attributes (device_id, timestamp) into a table, but before this I have to delete the previous day's records and perform a SELECT COUNT to get the total count of records from the table.
Based on that count value, data will be inserted into the table.
I have a total of 3 queries, which work fine in single-user testing, but if I run a concurrency test with 10 or more users, my code breaks.
I am using hsqldb and vertx jdbc client.
Is there a way to merge all three queries?
The queries are :
DELETE FROM table_name WHERE timestamp <= DATE_SUB(NOW(), INTERVAL 1 DAY)
SELECT COUNT(*) FROM table_name WHERE device_id = ?
INSERT into table_name(device_id,timestamp) values (?,?)
You need to set auto-commit to false and commit after the last statement.
If the database transaction control is the default LOCKS mode, you will not get any inconsistency, because the table is locked by the DELETE statement until the commit.
If you have changed the transaction control to MVCC, then it depends on the way you use the COUNT in the INSERT statement.
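A minimal sketch of the suggestion above, using plain JDBC rather than the Vert.x client: all three statements run inside one transaction, so under LOCKS mode no other user can interleave between the COUNT and the INSERT. The maxRows threshold and method names are assumptions; the SQL strings are the question's own.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;

public class DeviceLogger {

    static final String DELETE_SQL =
            "DELETE FROM table_name WHERE timestamp <= DATE_SUB(NOW(), INTERVAL 1 DAY)";
    static final String COUNT_SQL =
            "SELECT COUNT(*) FROM table_name WHERE device_id = ?";
    static final String INSERT_SQL =
            "INSERT INTO table_name (device_id, timestamp) VALUES (?, ?)";

    // Runs all three statements in one transaction; the table lock taken by
    // the DELETE (in LOCKS mode) is held until the commit.
    static void purgeAndInsert(Connection conn, String deviceId,
                               Timestamp ts, int maxRows) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement del = conn.prepareStatement(DELETE_SQL);
             PreparedStatement cnt = conn.prepareStatement(COUNT_SQL);
             PreparedStatement ins = conn.prepareStatement(INSERT_SQL)) {
            del.executeUpdate();
            cnt.setString(1, deviceId);
            try (ResultSet rs = cnt.executeQuery()) {
                rs.next();
                if (rs.getInt(1) < maxRows) {  // hypothetical insert condition
                    ins.setString(1, deviceId);
                    ins.setTimestamp(2, ts);
                    ins.executeUpdate();
                }
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}
```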
I'm trying to execute an SQL query (SELECT operation) using the following Java code:
ResultSet resultSet = statement.executeQuery("SELECT * FROM tasks");
while (resultSet.next()) {
    while (true) {
        // loop infinitely until a worker executes the task
    }
}
But that is inefficient when a new task gets added, as the SELECT won't detect the new change.
So, what PostgreSQL syntax fetches all existing entries while also detecting new insertions into a specific table?
What you want sounds like change data capture (CDC), where you only want what was changed since you last queried the table.
To do that, you need a way to:
1. mark the rows that were changed
2. mark the rows that already existed
There are a few ways to do that:
1. Keep a copy of the table so you can compare it against the table being updated/inserted into.
2. Use audit columns within the table, such as date_inserted, last_modified, etc., and pull the rows with dates after the last time you looked at the table.
3. Implement the table being updated as a slowly changing dimension.
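Option 2 is usually the simplest to retrofit. As a sketch, assuming the tasks table has been given a last_modified audit column (and a hypothetical payload column), each poll fetches only rows changed since the previous high-water mark:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;

public class TaskPoller {

    // Assumes tasks has an audit column named last_modified.
    static final String POLL_SQL =
            "SELECT id, payload, last_modified FROM tasks "
          + "WHERE last_modified > ? ORDER BY last_modified";

    // Fetch only rows changed since the previous poll; returns the new
    // high-water mark to pass into the next call.
    static Timestamp pollSince(Connection conn, Timestamp lastSeen) throws SQLException {
        Timestamp highWater = lastSeen;
        try (PreparedStatement ps = conn.prepareStatement(POLL_SQL)) {
            ps.setTimestamp(1, lastSeen);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // hand rs.getString("payload") to a worker here
                    Timestamp modified = rs.getTimestamp("last_modified");
                    if (modified.after(highWater)) {
                        highWater = modified;
                    }
                }
            }
        }
        return highWater;
    }
}
```

Polling on a short interval replaces the busy-wait loop in the question, and no row is examined twice.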
I have a table with ID and name, and I want to go through every row of this table.
The ID is a primary key and auto_increment.
I don't think I can use a single query to get all rows, because the table is huge.
I am doing something with every result, and I want the possibility to stop this task and continue with it later.
I thought I could do something like this:
for (int i = 0; i < 90238529; i++) {
System.out.println("Current ID :" + i);
query = "SELECT name FROM table_name WHERE id = " + i;
...
}
But that does not work because the auto_increment skipped some numbers.
As mentioned, I need an option to stop this task in a way that would allow me to start again where I left off. Like with the example code above: I know the ID of the current entry, and if I want to start again, I just set int i = X.
Use a single parameterized query to fetch the records in order:
query = "SELECT name FROM table_name WHERE id > ? ORDER BY id";
Then iterate over the ResultSet and read as many records as you wish (you don't have to read all the rows returned by the ResultSet).
The next time you run the query, pass the last ID you got in the previous execution.
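As a sketch of that keyset-pagination approach (adding a LIMIT, which is MySQL syntax, to cap each chunk; class and column names assumed from the question):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class TableWalker {

    static final String PAGE_SQL =
            "SELECT id, name FROM table_name WHERE id > ? ORDER BY id LIMIT ?";

    // Process up to chunkSize rows after lastId; returns the last ID seen,
    // which the caller persists and passes back in to resume later.
    static long processChunk(Connection conn, long lastId, int chunkSize) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(PAGE_SQL)) {
            ps.setLong(1, lastId);
            ps.setInt(2, chunkSize);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    lastId = rs.getLong("id");
                    String name = rs.getString("name");
                    // do something with name here
                }
            }
        }
        return lastId; // unchanged when no rows remain: the walk is done
    }
}
```

Because the WHERE clause works on the actual IDs rather than a counter, gaps left by auto_increment are skipped naturally.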
You mention this is a big table. It's important to note then that the MySQL Connector/J API Implementation Notes say
ResultSet
By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate, and due to the design of the MySQL network protocol is easier to implement. If you are working with ResultSets that have a large number of rows or large values, and cannot allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time.
To enable this functionality, create a Statement instance in the following manner:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
So, I think you need to do that, and I would use a try-with-resources statement. Next, I suggest you let the database help you iterate the rows:
String query = "SELECT id, name FROM table_name ORDER BY id";
try (PreparedStatement ps = conn.prepareStatement(query,
        ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
    ps.setFetchSize(Integer.MIN_VALUE); // stream rows instead of buffering them all
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            int id = rs.getInt("id");
            String name = rs.getString("name");
            System.out.printf("id=%d, name=%s%n", id, name);
        }
    }
} catch (SQLException e) {
    e.printStackTrace();
}
I can't use a single query to get all rows because the table is huge and I am doing something with every result. Also I want the possibility to stop this task and continue with it later.
Neither of these reasons eliminates using a single query. They only impact performance (keeping one connection alive for a long time vs. constantly opening and closing connections, which can be mitigated using a connection pool).
As mentioned I need an option to stop this task so that I can start again where I left off. Like with the example code above, I know the ID of the current entry and if I want to start again I just set int i = X
If you think about it, this wouldn't work either, as you said yourself
But that does not work because the auto_increment skipped some numbers.
More importantly, rows could have been inserted or deleted since the last time you queried the DB.
First of all, this sounds like a classic XY problem (you are describing a problem with your solution, rather than the actual problem). Secondly, you seem to be using an RDBMS for something (a queue) that it was never really designed for.
If you really want to do this rather than use a better-suited database, there are a number of approaches you can use. Your first problem is that the point you want to resume from is not stored in the database, so your scheme will not work where there are multiple DB connections. The first way to fix this is to introduce a "processed" field in your table (which you can clear with an UPDATE statement if you want to resume from an arbitrary point). Depending on which problem you're actually trying to solve, this can be a simple true/false field, a unique identifier for the currently processing thread, or a relational table.
Then you can go back to using SQL to get the data you want.
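With the simple true/false variant, that might look like the sketch below (the "processed" column, LIMIT syntax, and class name are all assumptions; this single-connection version ignores the locking you would need for multiple concurrent workers):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class QueueProcessor {

    // Assumes a boolean "processed" column has been added to the table.
    static final String NEXT_SQL =
            "SELECT id, name FROM table_name WHERE processed = FALSE ORDER BY id LIMIT 1";
    static final String MARK_SQL =
            "UPDATE table_name SET processed = TRUE WHERE id = ?";
    static final String RESET_SQL =
            "UPDATE table_name SET processed = FALSE"; // resume from scratch

    // Handle one unprocessed row and mark it done; returns false when none remain.
    // The resume state lives in the database, so stopping and restarting is safe.
    static boolean processNext(Connection conn) throws SQLException {
        try (PreparedStatement next = conn.prepareStatement(NEXT_SQL);
             PreparedStatement mark = conn.prepareStatement(MARK_SQL);
             ResultSet rs = next.executeQuery()) {
            if (!rs.next()) {
                return false;
            }
            // do something with rs.getString("name") here
            mark.setLong(1, rs.getLong("id"));
            mark.executeUpdate();
            return true;
        }
    }
}
```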
I'm trying to use an SQLQuery object with multiple SQL commands. I need to split the query in order to get better performance.
CREATE TABLE x (
id integer,
key integer)
select *
from x, users,.......
where .......
DROP TABLE x
If your issue is creating and dropping tables, create a TEMPORARY table defined to drop on commit. Then, when you commit your DB transaction, the table will be gone.
The issue is that usually you only get the last statement's results returned. If you need something else, look at wrapping with a user defined function and presenting a single tabular result set back.
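A sketch of the temporary-table approach from Java, assuming PostgreSQL-style ON COMMIT DROP syntax (the question doesn't name the database; Oracle, for instance, uses CREATE GLOBAL TEMPORARY TABLE instead):

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class TempTableQuery {

    // PostgreSQL-style syntax; adjust for your database.
    static final String CREATE_SQL =
            "CREATE TEMPORARY TABLE x (id integer, key integer) ON COMMIT DROP";

    static void runWithTempTable(Connection conn) throws SQLException {
        conn.setAutoCommit(false);
        try (Statement ddl = conn.createStatement()) {
            ddl.execute(CREATE_SQL);
            // ... populate x, run the SELECT joining x with users, read results ...
            conn.commit(); // the temporary table disappears here
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}
```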
I'm selecting a subset of data from an MS SQL database, using a PreparedStatement.
While iterating through the resultset, I also want to update the rows. At the moment I use something like this:
prepStatement = con.prepareStatement(
        selectQuery,
        ResultSet.TYPE_FORWARD_ONLY,
        ResultSet.CONCUR_UPDATABLE);
rs = prepStatement.executeQuery();
while (rs.next()) {
    rs.updateInt("number", 20);
    rs.updateRow();
}
The database is updated with the correct values, but I get the following exception:
Optimistic concurrency check failed. The row was modified outside of this cursor.
I've Googled it, but haven't been able to find any help on the issue.
How do I prevent this exception? Or since the program does do what I want it to do, can I just ignore it?
The record has been modified between the moment it was retrieved from the database (through your cursor) and the moment you attempted to save it back. If the number column can be safely updated independently of the rest of the record, or independently of some other process having already set the number column to some other value, you could be tempted to do:
con.createStatement().executeUpdate("UPDATE table_name SET number = 20 WHERE id = " + rs.getInt("id"));
However, the race condition persists, and your change may in turn be overwritten by another process.
The best strategy is to ignore the exception (the record was not updated), possibly pushing the failed record onto an in-memory queue, and then do a second pass over the failed records, re-evaluating the conditions in the query and updating as appropriate (add number <> 20 as one of the conditions in the query, if it is not already there). Repeat until no more records fail; eventually all records will be updated.
Assuming you know exactly which rows you will update, I would:
SET your AUTOCOMMIT to OFF
SET the ISOLATION level to SERIALIZABLE
SELECT col1, col2 FROM table WHERE somecondition FOR UPDATE
UPDATE the rows
COMMIT
This is achieved via pessimistic locking (and assuming row locking is supported in your DB, it should work).
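The steps above can be sketched in JDBC as follows (table, column, and class names are assumptions carried over from the question):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PessimisticUpdater {

    // FOR UPDATE locks the selected rows until the transaction ends.
    static final String LOCK_SQL =
            "SELECT id, number FROM table_name WHERE number <> 20 FOR UPDATE";
    static final String UPDATE_SQL =
            "UPDATE table_name SET number = 20 WHERE id = ?";

    static void updateAll(Connection conn) throws SQLException {
        conn.setAutoCommit(false);                                   // AUTOCOMMIT off
        conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
        try (PreparedStatement lock = conn.prepareStatement(LOCK_SQL);
             PreparedStatement update = conn.prepareStatement(UPDATE_SQL);
             ResultSet rs = lock.executeQuery()) {
            while (rs.next()) {
                update.setLong(1, rs.getLong("id"));
                update.executeUpdate(); // safe: the row is locked by our SELECT
            }
            conn.commit(); // releases the row locks
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}
```

No other transaction can modify the locked rows between the SELECT and the UPDATE, so the optimistic concurrency check can no longer fail.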