Partial Data deletes using Java SQL Hibernate - java

Our Java program has to delete a large number of records from DB2 tables, but it is running out of transaction log space.
We are working on increasing the log space, but due to internal processes and other things it will take a week or more to complete.
In the meantime we are looking for a temporary way to delete a few records at a time instead of deleting them all at once. For example, when we have 1000 records to delete, we want the program to delete 50 records at a time, commit, and proceed with the next 50 until all 1000 records are deleted. If the delete fails after X records, that is still fine.
We are using Hibernate.
Any suggestion on how this can be achieved would be appreciated. I looked at checking the SQLSTATE and SQLCODE in Java even in the successful-execution scenario, but I couldn't find a way to do it.
Something like: loop do ... while (SQLCODE does not indicate completion).
We cannot delete from the back end because the deletions are supposed to happen on user requests from the Java web application, and in addition we have some table constraints and delete cascades.

I don't know Hibernate, but you can change your delete to:
delete from (
select * from schema.table where id=:id fetch first 50 rows only
)
Assuming executeUpdate() returns the number of rows deleted, you can put this in a loop:
SQLQuery stmt = session.createSQLQuery(
    "delete from (select * from schema.table where id = :id fetch first 50 rows only)");
stmt.setParameter("id", id);        // the id from the user's delete request
int n = 1;
while (n > 0) {
    Transaction transaction = session.beginTransaction();
    n = stmt.executeUpdate();       // rows deleted in this chunk; 0 means we are done
    transaction.commit();
}

Related

Concurrency issue in database operations in vertx

I have to insert two attributes (device_id, timestamp) into a table, but before this I have to delete the previous day's records and perform a SELECT COUNT to get the total count of records in the table.
Based on the count value, the data will be inserted into the table.
I have a total of 3 queries, which work fine in single-user testing, but if I run a concurrency test with 10 or more users, my code breaks.
I am using HSQLDB and the Vert.x JDBC client.
Is there a way to merge all three queries?
The queries are:
DELETE FROM table_name WHERE timestamp <= DATE_SUB(NOW(), INTERVAL 1 DAY)
SELECT COUNT(*) FROM table_name WHERE device_id = ?
INSERT into table_name(device_id,timestamp) values (?,?)
You need to set auto-commit to false and commit after the last statement.
If the database transaction control is in the default LOCKS mode, you will not get any inconsistency, because the table is locked by the DELETE statement until the commit.
If you have changed the transaction control to MVCC, then it depends on the way you use the COUNT for the INSERT statement.
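For what it's worth, a minimal plain-JDBC sketch of that idea is below (the Vert.x JDBC client exposes equivalent auto-commit and commit operations on its connection object). The count threshold is a placeholder for whatever your count-based rule actually is:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

public class DeviceInsert {
    // Sketch: run the three statements as a single unit of work with auto-commit off.
    // In HSQLDB's default LOCKS mode the DELETE locks the table until commit, so the
    // COUNT and the INSERT see a consistent state even under concurrent users.
    static void deleteCountInsert(Connection conn, String deviceId, Timestamp ts) throws Exception {
        conn.setAutoCommit(false);
        try (PreparedStatement del = conn.prepareStatement(
                 "DELETE FROM table_name WHERE timestamp <= DATE_SUB(NOW(), INTERVAL 1 DAY)");
             PreparedStatement cnt = conn.prepareStatement(
                 "SELECT COUNT(*) FROM table_name WHERE device_id = ?");
             PreparedStatement ins = conn.prepareStatement(
                 "INSERT INTO table_name(device_id, timestamp) VALUES (?, ?)")) {
            del.executeUpdate();
            cnt.setString(1, deviceId);
            long count;
            try (ResultSet rs = cnt.executeQuery()) {
                rs.next();
                count = rs.getLong(1);
            }
            if (count < 100) {              // placeholder for your count-based rule
                ins.setString(1, deviceId);
                ins.setTimestamp(2, ts);
                ins.executeUpdate();
            }
            conn.commit();                  // releases the lock taken by the DELETE
        } catch (Exception e) {
            conn.rollback();
            throw e;
        }
    }
}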

Is the order of the rows in the ResultSet constant for the same SQL query via JDBC on the same state of DB data?

I am trying to write a job that executes a SQL query in Java using JDBC drivers (the DB vendor can be Oracle, DB2 or Postgres).
The query itself does not really matter. Let's say it filters on certain values in a few columns of one DB table and the result is a few thousand rows.
For each row in the ResultSet I need to do some logic, and sometimes that can fail.
I have a cursor position, so I “remember” the last successfully processed row position.
Now I want to implement a “resume” functionality in case of failure, in order not to process the entire ResultSet again.
I went through the JDBC spec for Java 8 and found nothing about the order of the rows (whether it is the same for the same query on the same data or not).
I also failed to find anything in the DB vendors' documentation.
Can anyone hint where to look for an answer about row order predictability?
You can guarantee the order of rows by including an ORDER BY clause that includes all of the columns required to uniquely identify a row. In fact, that's the only way to guarantee the order from repeated invocations of a SELECT statement, even if nothing has changed in the database. Without an unambiguous ORDER BY clause the database engine is free to return the rows in whatever order is most convenient for it at that particular moment.
Consider a simple example:
You are the only user of the database. The database engine has a row cache in memory that can hold the last 1000 rows retrieved. The database server has just been restarted, so the cache is empty. You SELECT * FROM tablename and the database engine retrieves 2000 rows, the last 1000 of which remain in the cache. Then you do SELECT * FROM tablename again. The database engine checks the cache and finds the 1000 rows from the previous query, so it immediately returns them because in doing so it won't have to hit the disk again. Then it proceeds to fetch the other 1000 rows from disk. The net result is that the 1000 rows that were returned last for the initial SELECT are actually returned first for the subsequent SELECT.
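To make that concrete, here is a small sketch with made-up names (a table orders with a unique id column): ordering on the unique key both fixes the row order and gives you a natural way to resume from the last successfully processed row instead of re-reading the whole ResultSet.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ResumableScan {
    // Hypothetical table "orders" with a unique "id" column. Ordering on the unique key
    // makes the row order deterministic; persisting the last processed id lets the job
    // resume with a WHERE clause rather than scrolling through the ResultSet again.
    static void process(Connection conn, long lastProcessedId) throws SQLException {
        String sql = "SELECT id, status FROM orders WHERE id > ? ORDER BY id";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, lastProcessedId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    long id = rs.getLong("id");
                    // ... per-row logic; on success, persist id as the new checkpoint
                }
            }
        }
    }
}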

Inserting 1 million tuples in DB taking a long time

I have a scenario where I need to insert 1 million entries using SQL, which is taking a long time.
I have a database structure with 9 tables; in every operation I need to perform the tasks below:
1. Open a connection
2. Insert tuples into the 9 tables, maintaining the primary/foreign key relationships
3. Commit
4. Close the connection
Repeat the operation 1 million times.
I am doing 800 iterations/hour,
which I feel is too slow.
Do you have any better ways to improve on this?
Try inserting in batches (i.e. with a PreparedStatement and addBatch()). Maybe you are inserting them individually.
e.g.
for (int i = 0; i < rows.length; i++) {
    preparedStatement.setObject(1, rows[i][0]);   // set parameter 1
    preparedStatement.setObject(2, rows[i][1]);   // set parameter 2
    // ... remaining parameters
    preparedStatement.addBatch();
    if ((i + 1) % 10000 == 0) {                   // send every 10k rows
        preparedStatement.executeBatch();
    }
}
preparedStatement.executeBatch();                 // send the remaining rows
In your case, where you have foreign keys, batch-insert the data into the tables without FKs (the parent tables) first.
Why are batch inserts/updates faster? How do batch updates work?
Try to execute the queries in bulk as suggested in the answers above. Make sure to take care of the rollback strategy in case an error is thrown for a particular record.
Different strategies when an error is thrown by a batch are (see the sketch after this list):
1. Full rollback.
2. Commit at logical points and roll back the rest.
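A rough sketch of option 2 with plain JDBC is below; the chunk size, table and columns are placeholders, not the asker's actual schema. Each chunk that succeeds is committed, and only a failing chunk is rolled back.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class ChunkedInsert {
    // Commit after each chunk that succeeds; roll back only the chunk that fails.
    static void insertInChunks(Connection conn, List<Object[]> rows) throws SQLException {
        final int CHUNK = 1000;
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO some_table(col1, col2) VALUES (?, ?)")) {
            for (int i = 0; i < rows.size(); i++) {
                ps.setObject(1, rows.get(i)[0]);
                ps.setObject(2, rows.get(i)[1]);
                ps.addBatch();
                if ((i + 1) % CHUNK == 0 || i == rows.size() - 1) {
                    try {
                        ps.executeBatch();
                        conn.commit();          // logical commit point
                    } catch (SQLException e) {
                        conn.rollback();        // only this chunk is lost
                        ps.clearBatch();        // drop any leftover batched rows
                        // log the failing chunk and decide whether to continue or stop
                    }
                }
            }
        }
    }
}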

How to handle bad or duplicate records in Statement.executeBatch()

I have a Java program which processes a file of 1 million records and inserts them into a table using bulk insertion, i.e. Statement.addBatch() and then Statement.executeBatch() after every 1000 records. The program runs quite fast.
However, if there is a duplicate record, i.e. the table raises an exception, the whole batch is gone and the rest of the records are un-trackable.
Even if I get the update counts, it is of no help, because I cannot insert the duplicates into another table, etc.
Is there a way that, in one particular batch insert of 1000, if there is a bad record, each record in that batch can be processed one by one, so that the bad/duplicate records can be placed in another table and the non-duplicates in the regular table?
Is there any other class I can use? I know that in C++ Oracle provides OCI, which can handle single records in a batch (called host array operations), but in Java bulk insertion is usually done with Statement.addBatch() in a loop followed by Statement.executeBatch().
Thanks.
I would break it up into smaller chunks of 1,000, like this
final int BATCH_SIZE = 1000;
for (int i = 0; i < DATA_SIZE; i++) {
    statement.setString(1, "a@a.com");
    statement.setLong(2, 1);
    statement.addBatch();
    if (i % BATCH_SIZE == BATCH_SIZE - 1) {
        statement.executeBatch();
    }
}
if (DATA_SIZE % BATCH_SIZE != 0) {
    statement.executeBatch();
}
It is quite common for a batch of records to contain a few bad ones. If you try to insert all the records in one go and one record fails, the whole set of insertions will be rejected. That is expected and is the core purpose of transaction handling.
Usually for batch inserts you can take two approaches:
1. Commit after each record insertion --> very expensive in terms of performance.
2. Divide the total records into smaller chunks and insert chunk by chunk, so that only the chunk containing the bad record fails and the other chunks make it into the database.
Alternatively, if you don't want to handle these things yourself, go for a framework such as Spring Batch, which could be one of your options in that case.
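If you do want to isolate the bad rows yourself with plain JDBC, one way (a sketch with made-up table names, not a drop-in solution) is to catch the BatchUpdateException for a chunk and replay that chunk one row at a time, routing the failures to an error table:
import java.sql.BatchUpdateException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class DuplicateAwareLoader {
    // Insert one chunk as a batch; if the batch fails (e.g. on a duplicate key),
    // replay the same chunk row by row so the bad rows can go to an error table.
    // Note: some databases abort the whole transaction on any error, in which case
    // you would commit per row or use savepoints instead.
    static void loadChunk(Connection conn, List<String> keys) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement ins = conn.prepareStatement(
                 "INSERT INTO main_table(key_col) VALUES (?)");
             PreparedStatement err = conn.prepareStatement(
                 "INSERT INTO error_table(key_col) VALUES (?)")) {
            try {
                for (String key : keys) {
                    ins.setString(1, key);
                    ins.addBatch();
                }
                ins.executeBatch();
            } catch (BatchUpdateException e) {
                conn.rollback();                 // discard the partial batch
                ins.clearBatch();
                for (String key : keys) {        // replay one row at a time
                    try {
                        ins.setString(1, key);
                        ins.executeUpdate();
                    } catch (SQLException bad) { // duplicate or otherwise bad row
                        err.setString(1, key);
                        err.executeUpdate();
                    }
                }
            }
            conn.commit();
        }
    }
}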

How to optimize the processing speed for inserting data using Java?

I have a requirement to read an Excel file with its headers and data, create a table in a database (MySQL) on the basis of the headers, and insert the values extracted from the file. For that I am using JDBC (with a prepared statement) to create the table and insert the data.
It works nicely, but when the number of records increases (suppose the file contains 200,000 or more records) it gets slow. Please guide me on how I can optimize the speed of inserting data into a DB table.
Thanks, Sameek
To optimize it, you should first use the same PreparedStatement object for all of the inserts.
To further optimize the code you can send batches of updates.
e.g. batches of 5:
PreparedStatement pstmt = conn.prepareStatement(sql);   // prepare the INSERT once and reuse it
for (int i = 0; i < rows.length; i++) {
    pstmt.setString(1, rows[i].getName());
    pstmt.setLong(2, rows[i].getId());
    pstmt.addBatch();
    if ((i + 1) % 5 == 0) {       // send a batch of 5
        pstmt.executeBatch();
    }
}
pstmt.executeBatch();             // send any remaining rows
Wrap your inserts in a transaction. Pseudo code:
1) Begin transaction
2) Create prepared statement
3) Loop over all inserts, setting the prepared statement parameters and executing for each insert
4) Commit transaction
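A compact sketch of those four steps (the table, columns and row type are placeholders):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class TransactionalInsert {
    static void insertAll(Connection conn, List<Object[]> rows) throws SQLException {
        conn.setAutoCommit(false);                               // 1) begin transaction
        try (PreparedStatement ps = conn.prepareStatement(       // 2) create prepared statement
                "INSERT INTO excel_data(header1, header2) VALUES (?, ?)")) {
            for (Object[] row : rows) {                          // 3) loop over the inserts
                ps.setObject(1, row[0]);
                ps.setObject(2, row[1]);
                ps.executeUpdate();
            }
            conn.commit();                                       // 4) commit once at the end
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}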
I'll take the example of Hibernate. Hibernate has a concept called the Session, which stores the SQL commands that have not yet been sent to the DB.
With Hibernate you can do the inserts and flush the session every 100 inserts, which means the SQL queries are sent every 100 inserts. This helps performance because it communicates with the database once per 100 inserts instead of once per insert.
You can do the same thing with plain JDBC by executing the statements every 100 (or however many you want) inserts, or by using a PreparedStatement batch.
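A sketch of that flush-every-N pattern (RowEntity is a hypothetical mapped entity and sessionFactory an already-configured org.hibernate.SessionFactory); clearing the session along with the flush keeps the first-level cache from growing:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
int i = 0;
for (RowEntity row : rows) {
    session.persist(row);
    if (++i % 100 == 0) {    // push the queued inserts to the database every 100 rows
        session.flush();
        session.clear();     // detach the flushed entities so the session stays small
    }
}
tx.commit();                 // flushes whatever is left and commits
session.close();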
