I'm selecting a subset of data from an MS SQL database, using a PreparedStatement.
While iterating through the resultset, I also want to update the rows. At the moment I use something like this:
prepStatement = con.prepareStatement(
        selectQuery,
        ResultSet.TYPE_FORWARD_ONLY,
        ResultSet.CONCUR_UPDATABLE);
rs = prepStatement.executeQuery();
while (rs.next()) {
    rs.updateInt("number", 20);
    rs.updateRow();
}
The database is updated with the correct values, but I get the following exception:
Optimistic concurrency check failed. The row was modified outside of this cursor.
I've Googled it, but haven't been able to find any help on the issue.
How do I prevent this exception? Or since the program does do what I want it to do, can I just ignore it?
The record has been modified between the moment it was retrieved from the database (through your cursor) and the moment you attempted to save it back. If the number column can be safely updated independently of the rest of the record, or independently of some other process having already set it to another value, you could be tempted to do:
con.createStatement().executeUpdate("UPDATE table SET number = 20 WHERE id = " + rs.getInt("id"));
However, the race condition persists, and your change may be in turn overwritten by another process.
The best strategy is to catch and ignore the exception (the record was not updated), possibly pushing the failed record onto an in-memory queue, then do a second pass over the failed records, re-evaluating the conditions in the query and updating as appropriate (add number <> 20 as one of the conditions in the query if it is not already there). Repeat until no more records fail. Eventually all records will be updated.
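A rough sketch of that strategy, assuming an integer primary key column named id and a table named my_table (both names are illustrative, not from the question; uses java.util.ArrayDeque, and the enclosing method is assumed to declare throws SQLException):
// collect the ids of rows whose updateRow() lost the optimistic race
Queue<Integer> failed = new ArrayDeque<>();
try (PreparedStatement ps = con.prepareStatement(
        "SELECT id, number FROM my_table WHERE number <> 20",
        ResultSet.TYPE_FORWARD_ONLY,
        ResultSet.CONCUR_UPDATABLE);
     ResultSet rs = ps.executeQuery()) {
    while (rs.next()) {
        try {
            rs.updateInt("number", 20);
            rs.updateRow();
        } catch (SQLException e) {
            failed.add(rs.getInt("id")); // retry this row in a later pass
        }
    }
}
// second pass: re-select only the ids left in "failed" (the number <> 20
// condition re-checks whether the update is still needed) and repeat until
// the queue drains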
Assuming you know exactly which rows you will update, I would do
SET your AUTOCOMMIT to OFF
SET ISOLATION Level to SERIALIZABLE
SELECT row1, row2 FROM table WHERE somecondition FOR UPDATE
UPDATE the rows
COMMIT
This is achieved via pessimistic locking (and assuming row-level locking is supported by your DB, it should work).
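In JDBC terms, those steps might look roughly like the following sketch; the table name and WHERE clause are placeholders, and the enclosing method is assumed to declare throws SQLException:
con.setAutoCommit(false);                                         // AUTOCOMMIT OFF
con.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE); // SERIALIZABLE
try (PreparedStatement ps = con.prepareStatement(
        "SELECT id, number FROM my_table WHERE some_condition FOR UPDATE",
        ResultSet.TYPE_FORWARD_ONLY,
        ResultSet.CONCUR_UPDATABLE);
     ResultSet rs = ps.executeQuery()) {
    while (rs.next()) {
        rs.updateInt("number", 20); // rows are locked; no concurrent writer can interfere
        rs.updateRow();
    }
    con.commit();
} catch (SQLException e) {
    con.rollback(); // release the locks and give up on error
    throw e;
}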
Through a ton of experiments I found that
(jdbcTemplate is a JdbcTemplate and is just used to make executing queries easy; it has no real bearing on the question)
jdbcTemplate.query(
        "select 0 from example where id = 23 for update with rs use and keep update locks",
        rs -> {
            rs.next();
            return null;
        });
obtains a lock on the selected row, while this one doesn't:
jdbcTemplate.query(
        "select 0 from example where id = 23 for update with rs use and keep update locks",
        rs -> {
            return null;
        });
I assume for the same reason the following does not obtain a lock:
jdbcTemplate.execute(
        "select id from example where id = 23 for update with rs use and keep update locks");
I have a two part question:
What the heck is going on?
How can I execute the select for update in the database so it does result in the lock, but does not return data per selected row? Maybe some kind of script?
There is a GitHub repository testing these (and many more variants).
A citation from the documentation on the Read Stability (RS) isolation level:
The read stability isolation level locks only those rows that an
application retrieves during a unit of work.
-- Rows are locked / accessed during "fetch"
select * from example where id = 23 with rs use and keep update locks;
-- Rows must be accessed either during "execute" / "open" or "fetch" to get the corresponding result
select count(1) from example where id = 23 with rs use and keep update locks;
In other words, the main requirement of RS is: if some row participated in the result of the first execution of a statement, it must participate in the results of every subsequent execution of the same statement with the same values.
When you just select rows, they are retrieved (accessed) during fetch only; they are not accessed until you fetch them. There is no need to lock these rows beforehand to achieve the goal described above. Why decrease system concurrency when you can lazily fetch and lock only the rows that are needed?
But when you run some aggregation over the rows, they must be accessed either during execute / open or during the first fetch (which is also the only fetch, since it's an aggregation) in order to lock the corresponding rows. I believe such behavior is not documented; it is only the currently observed behavior.
I'd suggest doing one fetch on the aggregation statement anyway, to be on the safe side. Nobody can guarantee that this behavior won't change in the future (unless you ask IBM Support about it and get a corresponding clarification).
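For illustration, that single fetch via plain JDBC (a jdbcTemplate call like the ones above behaves the same way; example is the table from the question):
try (PreparedStatement ps = con.prepareStatement(
        "select count(1) from example where id = ? with rs use and keep update locks")) {
    ps.setInt(1, 23);
    try (ResultSet rs = ps.executeQuery()) {
        rs.next(); // the single fetch that actually accesses, and therefore locks, the rows
    }
}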
I'm trying to write a job that executes an SQL query in Java using JDBC drivers (the DB vendor can be Oracle, DB2, or Postgres).
The query does not really matter. Let's say it filters on certain values in a few columns of one DB table and the result is a few thousand rows.
For each row in the ResultSet I need to do some logic and sometimes that can fail.
I have a cursor position, so I “remember” the last successfully processed row position.
Now I want to implement a “Resume” functionality in case of failure in order not to process again the entire ResultSet.
I went through the JDBC spec for Java 8 and found nothing about the order of the rows (is it the same for the same query on the same data, or not?).
I also failed to find anything in the DB vendors' specs.
Can anyone hint at where to look for an answer about row order predictability?
You can guarantee the order of rows by including an ORDER BY clause that includes all of the columns required to uniquely identify a row. In fact, that's the only way to guarantee the order from repeated invocations of a SELECT statement, even if nothing has changed in the database. Without an unambiguous ORDER BY clause the database engine is free to return the rows in whatever order is most convenient for it at that particular moment.
Consider a simple example:
You are the only user of the database. The database engine has a row cache in memory that can hold the last 1000 rows retrieved. The database server has just been restarted, so the cache is empty. You SELECT * FROM tablename and the database engine retrieves 2000 rows, the last 1000 of which remain in the cache. Then you do SELECT * FROM tablename again. The database engine checks the cache and finds the 1000 rows from the previous query, so it immediately returns them, because in doing so it won't have to hit the disk again. Then it proceeds to fetch the other 1000 rows. The net result is that the 1000 rows that were returned last for the initial SELECT are actually returned first for the subsequent SELECT.
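As a minimal sketch of what an unambiguous ORDER BY buys you (the table and column names here are assumptions; id is taken to be the primary key):
String unordered = "SELECT id, name FROM some_table";             // order unspecified, may differ per run
String ordered   = "SELECT id, name FROM some_table ORDER BY id"; // id is unique, so order is stable

try (PreparedStatement ps = conn.prepareStatement(ordered);
     ResultSet rs = ps.executeQuery()) {
    while (rs.next()) {
        // rows arrive in the same order on every invocation, so "the last
        // successfully processed row" is a meaningful resume point
    }
}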
I'm trying to execute an SQL query (SELECT operation) using the following Java code:
ResultSet resultSet = statement.executeQuery("SELECT * FROM tasks");
while (resultSet.next()) {
    while (true) {
        // loop infinitely until a worker executes the task
    }
}
But that is inefficient when a new task gets added, as the SELECT won't detect the new change.
So, what is the Postgres SQL syntax that fetches all existing entries while also detecting new insertions into a specific table?
What you want sounds like change data capture (CDC), where you only want what was changed since you last queried the table.
To do that, you need a way to:
1. mark the rows that were changed
2. mark the rows that were already existent
The ways to do that are to:
1. keep a copy of the table so you can compare it against the table being updated/inserted,
2. use audit columns within the table, such as date_inserted, last_modified, etc., and pull the rows with dates after the last time you looked at the table (sketched below), or
3. implement the table being updated as a slowly changing dimension.
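A minimal sketch of option 2, assuming the tasks table from the question has a last_modified timestamp audit column (the column name and poll interval are assumptions, and the enclosing method is assumed to declare throws SQLException, InterruptedException):
Timestamp lastSeen = new Timestamp(0L); // epoch, so the first run sees every row
String sql = "SELECT id, last_modified FROM tasks "
           + "WHERE last_modified > ? ORDER BY last_modified";
while (true) {
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setTimestamp(1, lastSeen);
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                lastSeen = rs.getTimestamp("last_modified");
                handleTask(rs.getInt("id")); // hypothetical task handler
            }
        }
    }
    Thread.sleep(5_000); // poll every 5 seconds instead of spinning
}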
I'm currently using the following query to insert into a table only if the record does not already exist, presumably this leads to a table scan. It inserts 28000 records in 10 minutes:
INSERT INTO tblExample(column)
(SELECT ? FROM tblExample WHERE column=? HAVING COUNT(*)=0)
If I change the query to the following, I can insert 98000 records in 10 minutes:
INSERT INTO tblExample(column) VALUES (?)
But it will not be checking whether the record already exists.
Could anyone suggest another way of querying such that my insert speed is faster?
One simple (but not recommended) solution could be to just run the insert statement, catch the duplicate key exception, and log it. This assumes the table has a unique key constraint.
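A sketch of that approach; it assumes a unique constraint on tblExample.column, and note that some drivers throw a plain SQLException (check for SQLState 23xxx) rather than the subclass used here:
try (PreparedStatement ps = con.prepareStatement(
        "INSERT INTO tblExample(column) VALUES (?)")) {
    ps.setString(1, value);
    ps.executeUpdate();
} catch (SQLIntegrityConstraintViolationException e) {
    System.err.println("duplicate, skipped: " + value); // log and move on
}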
Make sure that you have an index on the column[s] you're checking. In general, have a look at the query execution plan that the database is using - this should tell you where the time is going, and so what to do about it.
For Derby DB, the documentation describes how to get a query execution plan and how to read it.
Derby also has a MERGE command, which can act as insert-if-not-there. I've not used it myself, so you'd need to test whether it's faster in your circumstances.
I have a table with ID and name. I want to go through every row of this table.
The ID is a primary key and auto_increment.
I can't use a single query to get all rows because the table is huge.
I am doing something with every result. I want the possibility to stop this task and continue with it later.
I thought I could do something like this:
for (int i = 0; i < 90238529; i++) {
    System.out.println("Current ID: " + i);
    query = "SELECT name FROM table_name WHERE id = " + i;
    ...
}
But that does not work because the auto_increment skipped some numbers.
As mentioned, I need an option to stop this task in a way that would allow me to start again where I left off. Like with the example code above, I know the ID of the current entry, and if I want to start again, I just set int i = X.
Use a single query to fetch all the records:
query = "SELECT name FROM table_name WHERE id > ? ORDER BY id";
Then iterate over the ResultSet and read as many records as you wish (you don't have to read all the rows returned by the ResultSet).
Next time you run the query, pass the last ID you got in the previous execution.
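A sketch of that pattern; the table and columns come from the question, while BATCH_SIZE and the checkpoint handling are assumptions:
long lastId = 0L; // or the id saved at the end of the previous run
try (PreparedStatement ps = conn.prepareStatement(
        "SELECT id, name FROM table_name WHERE id > ? ORDER BY id")) {
    ps.setLong(1, lastId);
    try (ResultSet rs = ps.executeQuery()) {
        int n = 0;
        while (n++ < BATCH_SIZE && rs.next()) {
            lastId = rs.getLong("id");
            process(rs.getString("name")); // hypothetical per-row work
        }
    }
}
// persist lastId so the next run resumes right after the last processed row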
You mention this is a big table. It's important to note then that the MySQL Connector/J API Implementation Notes say
ResultSet
By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate, and due to the design of the MySQL network protocol is easier to implement. If you are working with ResultSets that have a large number of rows or large values, and cannot allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time.
To enable this functionality, create a Statement instance in the following manner:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
        java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
So I think you need to do that, and I would use a try-with-resources statement. Next, I suggest you let the database help you iterate the rows:
String query = "SELECT id, name FROM table_name ORDER BY id";
try (PreparedStatement ps = conn.prepareStatement(query,
        ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
    ps.setFetchSize(Integer.MIN_VALUE); // stream rows one at a time (MySQL)
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            int id = rs.getInt("id");
            String name = rs.getString("name");
            System.out.printf("id=%d, name=%s%n", id, name);
        }
    }
} catch (SQLException e) {
    e.printStackTrace();
}
I can't use a single query to get all rows because the table is huge and I am doing something with every result. Also I want the possibility to stop this task and continue with it later.
Neither of these reasons rules out using a single query. They only impact performance (keeping one connection alive for a long time vs. constantly opening and closing connections, which can be mitigated with a connection pool).
As mentioned, I need an option to stop this task so that I could start again where I left off. Like with the example code above, I know the ID of the current entry and if I want to start it again I just set int i = X
If you think about it, this wouldn't work either, as you said yourself:
But that does not work because the auto_increment skipped some numbers.
More importantly, rows could have been inserted or deleted since the last time you queried the DB.
First of all, this sounds like a classic XY Problem (you are describing a problem with your solution, rather than the actual problem). Secondly, you seem to be using an RDBMS for something it was never really designed for: a queue.
If you really want to do this rather than use a better-suited database, there are a number of approaches you can take. Your first problem is that you want to resume from a certain point/state, but that state is not stored in the database, so your approach will not work in a scenario with multiple DB connections. One way to fix this is to introduce a "processed" field in your table (which you can clear with an UPDATE statement if you want to resume from an arbitrary point). Depending on which problem you're actually trying to solve, this can be a simple true/false field, a unique identifier of the currently processing thread, or a relational table.
Then you can go back to using SQL to get the data you want.
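A minimal sketch of the "processed" flag approach described above; the table and column names are assumptions:
try (PreparedStatement select = conn.prepareStatement(
         "SELECT id, name FROM table_name WHERE processed = FALSE ORDER BY id");
     PreparedStatement mark = conn.prepareStatement(
         "UPDATE table_name SET processed = TRUE WHERE id = ?")) {
    try (ResultSet rs = select.executeQuery()) {
        while (rs.next()) {
            long id = rs.getLong("id");
            doSomething(rs.getString("name")); // hypothetical per-row logic
            mark.setLong(1, id);
            mark.executeUpdate(); // the resume point now lives in the database itself
        }
    }
}
// to resume from an arbitrary point later:
//   UPDATE table_name SET processed = FALSE WHERE <your condition>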