I have two databases and use a DB link to connect to DB2 from DB1. I am using JDBC, the c3p0 jar, and Oracle 11g. This is a batch job that runs every day.
The first run is successful and inserts around 400k records via the MERGE command; during the second run (it's a daily job) we are facing issues. I suspect the issue is the MERGE query: it has a condition, update if the row exists, otherwise insert. The re-run will most likely process the same 400k records (mostly identical for now), so is this lookup plus update/insert what causes the problem?
This is how my logic looks:

method() {
    // 1. Select query involving the DB link; references tables in both DB1 and DB2.
    // 2. Iterate the result and, using batch updates, save (via MERGE) the rows
    //    from step 1 into a table that is not used in the step 1 query.
    //    We are dealing with over 100k records here.
    // 3.
    stmt.executeBatch();           // this is where the SQLException occurs
    closeResultSet(rs);
    clearBatch(stmt);
    closePreparedStatement(stmt);
    closePreparedStatement(pstmt); // the select statement
}
There is no closing of the DB link anywhere in my code. The logic above runs on roughly 8 threads in parallel, each with different inputs and its own connection to the database.
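To make step 2 concrete, here is a minimal sketch of the batched MERGE, assuming hypothetical table and column names (target_t, id, val); the real statement and the step-1 select will of course differ:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical names: target_t, id, val stand in for the real table/columns.
static void mergeBatch(Connection conn, ResultSet rs) throws SQLException {
    String merge =
          "MERGE INTO target_t t "
        + "USING (SELECT ? AS id, ? AS val FROM dual) s "
        + "ON (t.id = s.id) "
        + "WHEN MATCHED THEN UPDATE SET t.val = s.val "
        + "WHEN NOT MATCHED THEN INSERT (id, val) VALUES (s.id, s.val)";
    try (PreparedStatement stmt = conn.prepareStatement(merge)) {
        int count = 0;
        while (rs.next()) {                 // rs: result of the step-1 select
            stmt.setLong(1, rs.getLong("id"));
            stmt.setString(2, rs.getString("val"));
            stmt.addBatch();
            if (++count % 1000 == 0) {
                stmt.executeBatch();        // flush periodically to bound memory
            }
        }
        stmt.executeBatch();                // flush the remainder
    }
}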
Here are my thoughts and doubts:
I think each thread will create its own DB link.
Do I need to increase the shared pool size, as suggested by the link here?
If the DB link is the cause, why am I getting the exception in executeBatch(), where no DB link is used?
I do not think closing the DB link will help in my case: since the threads run in parallel, they will mostly reach this code at the same time. Do I need to change the configuration to allow more DB links?
I want to delete entries from multiple tables in a PostgreSQL DB.
The tables have foreign key constraints, so I must delete them in a particular order only (otherwise the deletes will fail).
I am thinking of adding the statements to a batch and running executeBatch().
I understand that executeBatch() submits all the statements together to the driver, but how are they executed? Is the order of deletion maintained as the order in which the statements were added to the batch? I can't find it mentioned in the API doc.
The JDBC 4.3 specification explicitly specifies the behaviour of a batch execution in section 14.1.2 Successful Execution:
Batch commands are executed serially (at least logically) in the order
in which they were added to the batch.
and
The entries in the array are ordered according to the order in which
the commands were processed (which, again, is the same as the order in
which the commands were originally added to the batch).
The "at least logically" gives databases some leeway to reorder things as an optimization, as long as the resulting behaviour is the same as if the batch was executed in the specified order. Execution in-order is also necessary to ensure the returned update counts match, and for exception behaviour.
They are executed in order.
The purpose of "batching" is to collect the SQL statements and transmit them as a block, a sequence of statements, in order to reduce the network overhead of communicating with the database server.
A full "send SQL, wait for response" takes time, so by sending multiple requests together, a lot of waiting time can be eliminated.
I am fetching records from MariaDB in batches of size 500 (using LIMIT); the data set is large, around 1 million records.
For each fetch iteration I open and close the connection.
In my peer review I was advised to fetch the result set once and process it in batches by iterating over the result set itself, i.e. without closing the connection.
Is the second method the right way of doing it?
Edit: after fetching records in batches of 500, I update a field on each record and put it on a messaging queue.
Yes, the second method is the right way to do it. Here are some reasons:
There is overhead to running the query multiple times.
The underlying data might change in the tables you are using, and the separate batches might be inconsistent.
You are depending on the ordering of the results, and if the sort key contains duplicates, consecutive LIMIT pages may overlap or skip rows.
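A minimal sketch of the single-connection approach, with a hypothetical table name big_t; note that whether setFetchSize() actually streams depends on the driver (MariaDB Connector/J honours a positive value, while MySQL Connector/J needs Integer.MIN_VALUE):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

static void processAll(String url, String user, String pass) throws SQLException {
    try (Connection conn = DriverManager.getConnection(url, user, pass);
         PreparedStatement stmt = conn.prepareStatement(
                 "SELECT id, status FROM big_t",             // hypothetical table
                 ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
        stmt.setFetchSize(500);           // hint: stream ~500 rows at a time
        try (ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                long id = rs.getLong("id");
                // update the field and publish to the messaging queue here
            }
        }
    }
}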
Your program starts
Connect to database
do some SQL (selects/inserts/whatever)
do some more SQL (selects/inserts/whatever)
do some more SQL (selects/inserts/whatever)
...
Disconnect from database
Your program ends
That is, keep the connection open as long as it is needed during the program. (Even if you don't explicitly disconnect, the connection is terminated when your program ends. This is important to note when building a web site: each 'page' is essentially a separate 'program'; the DB connection cannot be held between pages.)
You have another implied question... "Should I grab a batch of rows at once, then process them in the client?" The answer is "It depends".
If the processing can be done in SQL, it is probably much more efficient to do it there. Example: summing up some numbers.
If you fetch some rows from one table and then, for each of those rows, fetch row(s) from another table, it will be much more efficient to use a single SQL JOIN, as in the sketch below.
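For example (hypothetical orders/customers schema):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// One JOIN instead of one extra query per row (the N+1 pattern).
static void printOrders(Connection conn) throws SQLException {
    String sql = "SELECT o.id, o.total, c.name "
               + "FROM orders o JOIN customers c ON c.id = o.customer_id";
    try (PreparedStatement stmt = conn.prepareStatement(sql);
         ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) {
            // one row per order, already joined with its customer
            System.out.printf("%d %s %s%n",
                rs.getLong("id"), rs.getBigDecimal("total"), rs.getString("name"));
        }
    }
}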
"Batching" may not be relevant. The client interface is probably
Fetch rows from a table (possibly all rows, even if millions)
Look at each row in turn.
Please provide the specifics of what you will be doing with the million rows so we can discuss more specifically.
Polling loop:
If you check for new things to do only once a minute, do reconnect each time.
Given that, it does not make sense to hang onto a resultset between polls.
Opening and closing a database connection is quite time consuming. Keeping the connection open will save a lot of time.
I have a J2EE server, currently running only one thread (the problem arises even within a single request), that saves its internal model of data to MySQL/InnoDB tables.
Basic idea is to read data from flat files, do a lot of calculation and then write the result to MySQL. Read another set of flat files for the next day and repeat with step 1. As only a minor part of the rows change, I use a recordset of already written rows, compare to the current result in memory and then update/insert it correspondingly (no delete, just setting a deletedFlag).
Problem: despite the purely sequential process I get lock timeout errors (#1204), and the InnoDB status output shows record locks (though I do not know how to work out the details). To complicate things, everything works on my Windows machine, while the production system (where I can't install innotop) shows some record locks.
To the critical code:

1. Read data and calculate (works)
2. Get a connection from the Tomcat pool and set autocommit=false
3. Use a Statement to issue "LOCK TABLES order WRITE"
4. Open an updatable ResultSet on table order
5. For each row in the ResultSet: if there is a difference, update it from the in-memory object
6. For objects not yet in the database: insert the data
7. Commit and close the connection
Steps 5/6 use a commit counter, so the rows are committed every 500 changes (to avoid having 50,000 rows uncommitted). In the first run (i.e. without any locks) this takes at most 30 sec per table.
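A hedged sketch of steps 2-7 including the commit counter; the qty column and the newQty() lookup are hypothetical stand-ins for the real columns and in-memory model. Note that in MySQL, LOCK TABLES implicitly commits any active transaction, and COMMIT alone does not release the table locks, hence the explicit UNLOCK TABLES at the end:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

static void syncOrders(Connection conn) throws SQLException {
    conn.setAutoCommit(false);                       // step 2
    try (Statement lock = conn.createStatement()) {
        lock.execute("LOCK TABLES `order` WRITE");   // step 3 (implicitly commits)
        try (Statement sel = conn.createStatement(
                 ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE);
             ResultSet rs = sel.executeQuery("SELECT id, qty FROM `order`")) {
            int changes = 0;
            while (rs.next()) {                      // step 5
                int wanted = newQty(rs.getLong("id"));  // hypothetical model lookup
                if (rs.getInt("qty") != wanted) {
                    rs.updateInt("qty", wanted);
                    rs.updateRow();
                    if (++changes % 500 == 0) {
                        conn.commit();               // the commit counter
                    }
                }
            }
        }
        // step 6 (inserts for objects not yet in the database) would go here
        conn.commit();                               // step 7
        lock.execute("UNLOCK TABLES");               // table locks survive COMMIT
    }
}

static int newQty(long id) { return 0; }             // placeholder for the in-memory value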
As stated above, right now I avoid any other interaction with the database, but in future other processes (user requests) might read data or even write some fields. I would not mind those processes reading either old or new data, or waiting a couple of minutes for their changes to be saved to the DB (that is, for a lock).
I would be happy about any recommendation to do better than this.
Summary: complex code calculates in-memory objects which are to be synchronized with the database. This sync currently seems to lock itself, despite the fact that it sequentially locks, changes, and unlocks the tables without any exceptions thrown. But for some reason row locks seem to remain.
Kind regards
Additional information:
MySQL: SHOW PROCESSLIST lists no active connections (all asleep, or alternatively waiting for table locks on table order), while SHOW ENGINE INNODB STATUS reports a number of row locks (unfortunately I can't tell which transaction is meant, as the output is quite cryptic).
Solved: I had wrongly declared a ResultSet as updatable. The ResultSet was closed only in a finalize() method via the garbage collector, which was not fast enough: before that happened I reopened the ResultSet and therefore tried to acquire a lock on an already locked table.
Still, it was odd that innotop showed another query of mine hanging on a completely different table. But as it works for me, I do not care about the oddities :-)
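For anyone hitting the same thing, the cure is deterministic cleanup with try-with-resources rather than relying on the garbage collector. A minimal sketch (the table name is from the question, the column is hypothetical):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

static void readOrders(Connection conn) throws SQLException {
    try (Statement stmt = conn.createStatement();    // CONCUR_READ_ONLY by default
         ResultSet rs = stmt.executeQuery("SELECT id FROM `order`")) {
        while (rs.next()) {
            long id = rs.getLong("id");
            // ... use the row ...
        }
    } // rs and stmt are closed here, releasing any locks immediately
}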
One jdbc "select" statement takes 5 secs to complete.
So doing 5 statements takes 25 secs.
Now I try to do the job in parallel. The db is mysql with innodb.
I start 5 threads and give each thread its own DB connection. But it still takes 25 secs for all of them to complete.
Note: I give Java enough heap, and I have 8 cores but only one hard disk (maybe the single disk is the bottleneck here?).
Is this the expected behaviour with MySQL out of the box?
Here is example code:

public void doWork(int n) {
    try (Connection conn = pool.getConnection();
         PreparedStatement stmt = conn.prepareStatement(
                 "select id from big_table where id between ? and ?")) {
        stmt.setLong(1, n * 1000000L);
        stmt.setLong(2, n * 1000000L + 1000000L);
        try (ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                long itemId = rs.getLong("id");
            }
        }
    } catch (SQLException e) {
        e.printStackTrace();
    }
}

public void doWorkBatch() {
    for (int i = 0; i < 5; i++)
        doWork(i);
}

public void doWorkParallel() {
    for (int i = 0; i < 5; i++) {
        final int n = i;                  // lambdas need an (effectively) final copy
        new Thread(() -> doWork(n)).start();
    }
    System.console().readLine();          // keep the JVM alive while the threads run
}
(I don't recall where, but I read that a standard MySQL installation can easily handle 1,000 connections in parallel.)
Looking at your problem, multi-threading will definitely improve your performance; I once converted a 4-5 hour batch job into a 7-10 minute job by doing exactly what you are considering. But you need to know the following things beforehand while designing:

1) You need to think about inter-task dependencies, i.e. between tasks executing on different threads.
2) Using a connection pool is a good sign, since creating database connections is a slow process in Java and takes a long time.
3) Each thread needs its own JDBC connection. Connections can't be shared between threads, because each connection is also a transaction.
4) Cut tasks into several work units, where each unit does one job.
5) Particularly for your case, i.e. using MySQL: the database engine you use also affects performance, as the InnoDB engine uses row-level locking and will therefore handle much higher traffic. The usual alternative, MyISAM, does not support row-level locking; it uses table locking. I am talking about the case where another thread comes in and wants to update the same row before the first thread commits.
6) Another way to improve the performance of a Java database application is to run queries with setAutoCommit(false). By default a new JDBC connection has auto-commit mode ON, which means every individual SQL statement is executed in its own transaction; with auto-commit off you can group SQL statements into one logical transaction, which can be committed or rolled back by calling commit() or rollback(). See the sketch after this list.
You can also check out Spring Batch, which is designed for batch processing.
Hope this helps.
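A minimal sketch of point 6, assuming a hypothetical item_t table:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Group many statements into one transaction instead of one transaction each.
static void insertAll(Connection conn, long[] ids) throws SQLException {
    conn.setAutoCommit(false);                    // one transaction for the whole group
    try (PreparedStatement stmt =
             conn.prepareStatement("INSERT INTO item_t (id) VALUES (?)")) {
        for (long id : ids) {
            stmt.setLong(1, id);
            stmt.addBatch();
        }
        stmt.executeBatch();
        conn.commit();                            // make the whole group durable at once
    } catch (SQLException e) {
        conn.rollback();                          // or undo the whole group
        throw e;
    }
}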
It depends where the bottleneck in your system is...
If your queries spend a few seconds each establishing the connection to the database, and only a fraction of that actually running the query, you'd see a nice improvement.
However if the time is spent in mysql, running the actual query, you wouldn't see as much of a difference.
The first thing I'd do, rather than trying concurrent execution, is to optimize the query: add indices to your tables, and so forth.
Concurrent execution may be faster. You should also consider batch execution.
Concurrent execution will help if there is any room for parallelization. In your case, there seems to be no room for parallelization, because you have a very simple query which performs a sequential read of a huge amount of data, so your bottleneck is probably the disk transfer and then the data transfer from the server to the client.
When we say that RDBMS servers can handle thousands of requests per second we are usually talking about the kind of requests that we usually see in web applications, where each SQL query is slightly more complicated than yours, but results in much smaller disk reads (so they are likely to be found in a cache) and much smaller data transfers (stuff that fit within a web page.)
Is it possible to abort an INSERT ... SELECT statement from Java? Using either JDBC or Hibernate, it doesn't matter. The DB is Oracle.
I reckon it's not possible because there is a single DB call and the process is running in Oracle, not the JVM.
Oracle OCI (the C driver) provides an OCIBreak() function. It is even thread-safe: you can call it from any background thread while the main thread is using the same connection.
Statement.cancel() may well do the same thing.
OCIBreak() requires a round trip to the DB server (i.e. the network must be functional), and the main thread then receives an error:
java.sql.SQLException: ORA-01013: user requested cancel of current operation
You should be able to mark this exception as non-critical at the JBoss level (using an ExceptionSorter).
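For reference, a hedged sketch of the Statement.cancel() approach, cancelled from a watchdog thread; whether the server-side work is actually interrupted depends on the driver, and the sql argument stands for whatever long-running statement (e.g. the INSERT ... SELECT) you run:

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

static void runWithTimeout(Connection conn, String sql, long millis) throws SQLException {
    try (Statement stmt = conn.createStatement()) {
        Thread watchdog = new Thread(() -> {
            try {
                Thread.sleep(millis);
                stmt.cancel();          // main thread then gets ORA-01013 on Oracle
            } catch (InterruptedException | SQLException ignored) {
                // interrupted means the statement finished in time
            }
        });
        watchdog.start();
        try {
            stmt.execute(sql);          // blocks until done or cancelled
        } finally {
            watchdog.interrupt();       // stop the watchdog if we finished first
        }
    }
}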
PS: I'm really curious whether this can be called from Hibernate, as JPA leaves many long-running queries on our DB servers.