Is it possible to abort an INSERT ... SELECT statement from Java? Using either JDBC or Hibernate, it doesn't matter. The DB is Oracle.
I reckon it's not possible because there is a single DB call and the process is running in Oracle, not the JVM.
Oracle OCI (the C driver) provides an OCIBreak() function. It's even thread-safe and you can call it from any background thread while the main thread is using the same connection.
Maybe Statement.cancel() does the same thing.
OCIBreak() requires a round trip to the DB server (i.e. the network must be functional), and the main thread then receives an error:
java.sql.SQLException: ORA-01013: user requested cancel of current operation
You should be able to mark this exception as non-critical at the JBoss level (using an ExceptionSorter).
PS: I'm really curious whether this could be called from Hibernate, since JPA leaves many long-running queries on our DB servers.
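For plain JDBC, a minimal sketch of the cancel-from-another-thread idea looks roughly like this (connection details and table names are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class CancelInsertSelect {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details.
        Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password");
        Statement stmt = con.createStatement();

        // Background thread: wait a bit, then ask Oracle to break the running call.
        Thread canceller = new Thread(() -> {
            try {
                Thread.sleep(5_000);
                stmt.cancel();          // round trip to the server, like OCIBreak()
            } catch (InterruptedException | SQLException e) {
                e.printStackTrace();
            }
        });
        canceller.start();

        try {
            // Long-running statement; table names are made up.
            stmt.executeUpdate("INSERT INTO target_tbl SELECT * FROM huge_src_tbl");
        } catch (SQLException e) {
            // Expect ORA-01013: user requested cancel of current operation
            System.out.println("Statement aborted: " + e.getMessage());
        } finally {
            stmt.close();
            con.close();
        }
    }
}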
I have two databases and use a DB link to connect to DB2 from DB1.
I am using JDBC, the c3p0 jar, and Oracle 11g. This is a batch job that runs every day.
The first run, which was successful, inserted around 400k records via the MERGE command; during the second run (it's a daily job) we are facing issues. I guess the issue is because of the MERGE query? The MERGE has a condition: if the row exists, update it, otherwise insert it. Most likely it will re-process the same 400k records, which are mostly identical (for now), so is this look-up and update/insert causing the problem?
This is how my logic looks:
method() {
    1. select query where the DB link is involved (contains DB1 and DB2 tables)
    2. iterate the result and, using batch updates,
       save (using MERGE) the result from step 1 into a table which is not used in the step 1 query.
       We are dealing with over 100k records here.
    3. stmt.executeBatch(); // This is where the SQLException occurs
    closeResultSet(rs);
    clearBatch(stmt);
    closePreparedStatement(stmt);
    closePreparedStatement(pstmt); // select statement
}
There is no closing of the DB link in my code. The above logic runs on 8 threads (give or take) in parallel with different inputs, each having its own connection to the DB.
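For reference, the batched MERGE in step 2 boils down to something like this (table and column names are placeholders; the real step-1 select involves the DB link):

// Uses java.sql.Connection, PreparedStatement, ResultSet, SQLException.
void mergeBatch(Connection con, ResultSet rs) throws SQLException {
    // Placeholder MERGE; the real statement references the real target table.
    String merge =
        "MERGE INTO target_tbl t "
      + "USING (SELECT ? AS id, ? AS val FROM dual) s "
      + "ON (t.id = s.id) "
      + "WHEN MATCHED THEN UPDATE SET t.val = s.val "
      + "WHEN NOT MATCHED THEN INSERT (id, val) VALUES (s.id, s.val)";
    try (PreparedStatement stmt = con.prepareStatement(merge)) {
        while (rs.next()) {                      // rs is the result of the step-1 select
            stmt.setLong(1, rs.getLong("id"));
            stmt.setString(2, rs.getString("val"));
            stmt.addBatch();
        }
        stmt.executeBatch();                     // this is where the exception shows up
    }
}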
Here are my thoughts and doubts:
- I think each thread will create a DB link.
- Do I need to increase the shared pool size as suggested by this link here?
- If it is because of the DB link, why am I facing the exception in executeBatch(), where there is no use of the DB link?
- I do not think closing the DB link will work in my case: since the threads run in parallel they will mostly reach that code at the same time. Do I need to change the configuration to allow more DB links?
When developing a Java application that shares a single Connection between multiple threads, the problem of concurrency arises.
If thread A updates record 1 in table T, and simultaneously thread B issues a SELECT on record 1 in table T, how do I ensure thread B reads the updated values of thread A?
java.sql.Connection offers transaction control via setAutoCommit(false), commit() and rollback(), but does this also guarantee data correctness?
I think I'm missing something.
Two points:
You shouldn't share a jdbc.Connection between threads, at least not in any serious production code (see here). For demo purposes, I think, sharing a Connection is OK;
If a thread reads from the DB after the relevant DB transaction is committed, it will see the data written by the other thread.
For your second question
will thread B timeout until the first transaction has commit() or rollback()
-- B will block until A's transaction is finished (by either commit or rollback) if:
B tries to update/delete the same table row which is being updated by A, and ...
A updates that row under a DB-level lock, using SELECT ... FOR UPDATE.
You can get this behavior using two consoles (for example, with PostgreSQL's psql), each console standing for a thread:
in A's console type the following:
BEGIN;
SELECT some_col FROM some_tbl WHERE some_col = some_val FOR UPDATE;
now in B console type:
BEGIN;
UPDATE some_tbl SET some_col = new_val WHERE some_col = some_val;
You should see that the UPDATE blocks until, in A, you do either COMMIT or ROLLBACK.
The above explanation uses separate DB connections, just like a Java JDBC connection pool does. When you share a single connection between Java threads, I think any interaction with the DB will block while the connection is being used by some other thread.
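The same experiment in plain JDBC, each thread with its own connection (a rough sketch; connection details, table and column names are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RowLockDemo {
    static Connection open() throws Exception {
        // Placeholder connection details.
        return DriverManager.getConnection("jdbc:postgresql://localhost/test", "user", "password");
    }

    public static void main(String[] args) throws Exception {
        // Thread A: lock the row and keep the transaction open for a while.
        Thread a = new Thread(() -> {
            try (Connection con = open(); Statement st = con.createStatement()) {
                con.setAutoCommit(false);
                st.executeQuery("SELECT some_col FROM some_tbl WHERE some_col = 'some_val' FOR UPDATE");
                Thread.sleep(10_000);   // hold the row lock
                con.commit();           // releases the lock; B's UPDATE can proceed now
            } catch (Exception e) {
                e.printStackTrace();
            }
        });

        // Thread B: tries to update the same row and blocks until A commits.
        Thread b = new Thread(() -> {
            try (Connection con = open(); Statement st = con.createStatement()) {
                long start = System.currentTimeMillis();
                st.executeUpdate("UPDATE some_tbl SET some_col = 'new_val' WHERE some_col = 'some_val'");
                System.out.println("UPDATE done after " + (System.currentTimeMillis() - start) + " ms");
            } catch (Exception e) {
                e.printStackTrace();
            }
        });

        a.start();
        Thread.sleep(1_000);   // give A time to grab the lock first
        b.start();
        a.join();
        b.join();
    }
}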
JDBC is a standard that is broadly adopted, but with uneven levels of adherence, so it is probably not a good idea to make sweeping statements about what is safe.
I would not expect there is anything to keep statement executions and commits and rollbacks made from multiple threads from getting interleaved. Best case, only one thread can use the connection at a time and the others block, making multithreading useless.
If you don't want to provide a connection to each thread, you could have the threads submit work items to a queue that is consumed by a single worker thread handling all the JDBC work. But introducing a connection pool probably has less impact on existing code.
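A minimal sketch of that queue idea, assuming a single worker thread owns the only Connection and other threads just hand it work (class and method names are made up):

import java.sql.Connection;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// All JDBC work is funnelled through one thread, so the Connection is never shared.
class JdbcWorker {
    private final Connection connection;                     // used only by the worker thread
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    JdbcWorker(Connection connection) {
        this.connection = connection;
    }

    // Callers submit a function that uses the connection; it runs on the worker thread.
    <T> Future<T> submit(Function<Connection, T> work) {
        Callable<T> task = () -> work.apply(connection);
        return worker.submit(task);
    }

    void shutdown() {
        worker.shutdown();
    }
}

A caller would do something like worker.submit(con -> runMyQuery(con)) and wait on the returned Future.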
In general if you have concurrent updates and reads then they happen in the order that they happen. Locking and isolation levels provide consistency guarantees for concurrent transactions but if one hasn't started its transaction yet those aren't applicable. You could have a status flag, version number, or time stamp on each row to indicate when an update occurred.
If you have a lot of updates it can be better to collect them in a flat file and execute a bulk copy; it can be much faster than using JDBC. Then, with the updates out of the way, execute the selects over JDBC.
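For example, with the PostgreSQL JDBC driver's CopyManager (just a sketch; connection details, table name and file are placeholders, and other databases have their own bulk-load tools):

import java.io.FileReader;
import java.io.Reader;
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class BulkLoad {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
             Reader csv = new FileReader("updates.csv")) {
            CopyManager copy = con.unwrap(PGConnection.class).getCopyAPI();
            long rows = copy.copyIn("COPY some_tbl FROM STDIN WITH (FORMAT csv)", csv);
            System.out.println("Loaded " + rows + " rows");
        }
    }
}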
Is there a DB2 system table or batch runtime log on the mainframe? In DB2 for i Series there is a table function, QSYS2.GET_JOB_INFO(), that returns job information at runtime, including the status (Active/Complete) and, most importantly, V_SQL_STATEMENT_TEXT - the text of the last SQL statement run.
Scenario:
I want to retrieve the last executed SQL statement at runtime in a COBOL batch job. The main purpose is to determine whether a COMMIT or ROLLBACK has been issued while the job is running. The aim is to create a small program, let's call it the "controller", that monitors DB2 for when a COMMIT (or commit interval) or even a ROLLBACK is issued. To be more specific, this "controller" will act as a mini OS and will have the capacity to trigger the main programs.
For instance, if the main program issues a ROLLBACK, the "controller" program can apply specific business logic and control the updates. Updates can be done over both T1 and T2 types of DB2 connections. That means updates are done either on the batch client side or on the Java side running in EXCI (EXCI using RRS recovery).
A quick look in the IBM Documentation for DB2 seems to indicate "no."
However, while not an exact match for your situation, here's what we used to do...
Create a table, call it APP_RESTART_DATA with columns to uniquely identify an execution of your process. We used PROC_NAME and STEP_NAME as we were confined to batch jobs. Also have a KEY column and any other metadata you might find helpful in a restart situation. Some people stored the record number instead of the actual key value.
In your controller program, begin by doing a SELECT with your unique identifier(s) to determine if you're in restart mode. If you get an SQLCODE of 0 then you are in restart mode and will have retrieved the last KEY for which a COMMIT was successfully executed. Under these circumstances you must locate that key in your input data and then begin normal processing with the data immediately subsequent. If you got an SQLCODE of 100 then you are not in restart mode; under these circumstances you can just begin normal processing at the start of your input data.
As you process the input data and reach a COMMIT point, also UPDATE your APP_RESTART_DATA table with the new KEY. Then COMMIT. Our COMMIT points were also dictated by a parameter indicating how many logical units of work to process between COMMITs. We could decrease this parameter if it became necessary to run batch processes during prime shift that were normally run off-shift.
When you complete processing of your input data, DELETE the row for your process in the APP_RESTART_DATA table.
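Expressed in JDBC terms (just to illustrate the idea; the original was COBOL, and the column names here are made up), the restart check and the commit-point bookkeeping boil down to:

// Uses java.sql.Connection, PreparedStatement, ResultSet, SQLException.
String readRestartKey(Connection con, String procName, String stepName) throws SQLException {
    String sql = "SELECT last_key FROM APP_RESTART_DATA WHERE proc_name = ? AND step_name = ?";
    try (PreparedStatement ps = con.prepareStatement(sql)) {
        ps.setString(1, procName);
        ps.setString(2, stepName);
        try (ResultSet rs = ps.executeQuery()) {
            // Row found => restart mode: skip input up to and including this key.
            return rs.next() ? rs.getString(1) : null;
        }
    }
}

void checkpoint(Connection con, String procName, String stepName, String currentKey) throws SQLException {
    String sql = "UPDATE APP_RESTART_DATA SET last_key = ? WHERE proc_name = ? AND step_name = ?";
    try (PreparedStatement ps = con.prepareStatement(sql)) {
        ps.setString(1, currentKey);
        ps.setString(2, procName);
        ps.setString(3, stepName);
        ps.executeUpdate();
    }
    con.commit();   // the checkpoint row and the business updates commit as one unit of work
}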
Catching ROLLBACK might be tricky. You could flag your row in APP_RESTART_DATA as having performed a ROLLBACK when done in the code, but if done implicitly in an abend situation you may find yourself registering a condition handler via the Language Environment CEEHDLR callable service so you get control and can indicate a ROLLBACK occurred.
I'm trying to better understand what will happen if multiple threads try to execute different sql queries, using the same JDBC connection, concurrently.
Will the outcome be functionally correct?
What are the performance implications?
Will thread A have to wait for thread B to be completely done with its query?
Or will thread A be able to send its query immediately after thread B has sent its query, after which the database will execute both queries in parallel?
I see that the Apache DBCP uses synchronization protocols to ensure that connections obtained from the pool are removed from the pool, and made unavailable, until they are closed. This seems more inconvenient than it needs to be. I'm thinking of building my own "pool" simply by creating a static list of open connections, and distributing them in a round-robin manner.
I don't mind the occasional performance degradation, and the convenience of not having to close the connection after every use seems very appealing. Is there any downside to me doing this?
I ran the following set of tests using an AWS RDS Postgres database and Java 11:
Create a table with 11M rows, each row containing a single TEXT column, populated with a random 100-char string
Pick a random 5 character string, and search for partial-matches of this string, in the above table
Time how long the above query takes to return results. In my case, it takes ~23 seconds. Because there are very few results returned, we can conclude that the majority of this 23 seconds is spent waiting for the DB to run the full-table-scan, and not in sending the request/response packets
Run multiple queries in parallel (with different keywords), using different connections. In my case, I see that they all complete in ~23 seconds. I.e., the queries are being efficiently parallelized
Run multiple queries on parallel threads, using the same connection. I now see that the first result comes back in ~23 seconds, the second in ~46 seconds, the third in ~1 minute, and so on. All the results are functionally correct, in that they match the specific keyword queried by that thread
To add on to what Joni mentioned earlier, his conclusion matches the behavior I'm seeing on Postgres as well. It appears that all "correctness" is preserved, but all parallelism benefits are lost, if multiple queries are sent on the same connection at the same time.
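Roughly, the test harness looked like this (shared-connection case shown; for the parallel case each task opened its own connection; connection details, table and column names are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SharedConnectionTest {
    public static void main(String[] args) throws Exception {
        // One shared connection for all threads (the slow case).
        Connection shared = DriverManager.getConnection(
                "jdbc:postgresql://dbhost/test", "user", "password");

        ExecutorService pool = Executors.newFixedThreadPool(3);
        for (String keyword : List.of("abcde", "fghij", "klmno")) {
            pool.submit(() -> {
                long start = System.currentTimeMillis();
                try (PreparedStatement ps = shared.prepareStatement(
                        "SELECT count(*) FROM big_tbl WHERE txt LIKE ?")) {
                    ps.setString(1, "%" + keyword + "%");
                    try (ResultSet rs = ps.executeQuery()) {
                        rs.next();
                        System.out.println(keyword + ": " + rs.getLong(1) + " matches in "
                                + (System.currentTimeMillis() - start) + " ms");
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
    }
}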
Since the JDBC spec doesn't give guarantees of concurrent execution, this question can only be answered by testing the drivers you're interested in, or reading their source code.
In the case of MySQL Connector/J, all methods to execute statements lock the connection with a synchronized block. That is, if one thread is running a query, other threads using the connection will be blocked until it finishes.
Doing things the wrong way will have undefined results... if someone runs some tests, maybe they'll answer all your questions exactly, but then a new JVM comes out, or someone tries it on another JDBC driver or database version, or hits a different set of race conditions, or tries another platform or JVM implementation, and a different undefined result happens.
If two threads modify the same state at the same time, anything could happen depending on the timing. Maybe the 2nd one overwrites the first's query, and then both run the same query. Maybe the library will detect your error and throw an exception. I don't know and wouldn't bother testing... (or maybe someone already knows, or it should be obvious what would happen), so this isn't "the answer", just some advice. Just use a connection pool, or use a synchronized block to ensure problems don't happen.
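If you really must share a connection, the synchronized-block version is just a sketch like this (method and class names are made up):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class SafeSharedConnection {
    // Every thread that touches the shared connection must synchronize on the same object.
    static void runQuery(Connection sharedConnection, String sql) throws SQLException {
        synchronized (sharedConnection) {
            try (Statement st = sharedConnection.createStatement();
                 ResultSet rs = st.executeQuery(sql)) {
                while (rs.next()) {
                    // process the row while still holding the lock
                }
            }
        }
    }
}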
We had to disable the statement cache on WebSphere, because it was throwing an ArrayIndexOutOfBoundsException at the PreparedStatement level.
The issue was that some guy thought it was smart to share a connection with multiple threads.
He said it was to save connections, but there is no point multithreading queries because the DB won't run them in parallel.
There was also an issue with Java runnables that were blocking each other because they used the same connection.
So that's just something not to do; there is nothing to gain.
There is an option in WebSphere to detect this multithreaded access.
I implemented my own since we use Jetty in development.
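One way to roll your own is a dynamic proxy around the Connection that remembers which thread is currently inside a call; a rough sketch (not what WebSphere or Jetty actually do):

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.util.concurrent.atomic.AtomicReference;

// Wraps a Connection in a proxy that complains when two threads use it at the same time.
public class ThreadCheckingConnection {
    public static Connection wrap(Connection target) {
        AtomicReference<Thread> owner = new AtomicReference<>();
        InvocationHandler handler = (proxy, method, args) -> {
            Thread current = Thread.currentThread();
            Thread previous = owner.getAndSet(current);
            if (previous != null && previous != current) {
                System.err.println("Connection used by " + previous.getName()
                        + " and " + current.getName() + " at the same time");
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause();                 // rethrow the real SQLException etc.
            } finally {
                owner.set(null);
            }
        };
        return (Connection) Proxy.newProxyInstance(
                Connection.class.getClassLoader(), new Class<?>[] {Connection.class}, handler);
    }
}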
I have a GUI which allows users to run long running queries. Sometimes, the users regret running the queries and would like to cancel them. The queries are running using iBATIS against an Oracle database, and I know that the java.sql.Statement interface defines a cancel method that might or might not be implemented by the driver. So my question is, is it possible to use iBATIS to invoke this method to cancel the query (given the right driver), or is there any other way of aborting an ongoing long running query.
Well,
I guess that once the query got to the DB server, cancelling it is really a "DB vendor specific" issue.
If your requirement is to cancel the query when it comes to your application
(i.e. if it reaches the Oracle DB server and runs there, you are fine, as long as you do not consume the result), consider using the Future interface, which has a cancel method.
You can submit a Callable that runs your query, and it will return a Future object.
If you need to abort, just use the cancel method of the Future object.
You can also check with isCancelled() to see whether the submission was cancelled, and handle that appropriately in your code.
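A rough sketch of that approach (class and method names are made up; note that cancelling the Future only abandons the result on the Java side, it does not by itself stop the statement on the DB server unless you also call Statement.cancel()):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CancellableQuery {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Submits the long-running query; the caller keeps the Future as a handle.
    public Future<Integer> countRows(Connection con, String sql) {
        return executor.submit(() -> {
            try (Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery(sql)) {
                int rows = 0;
                while (rs.next()) {
                    rows++;
                }
                return rows;
            }
        });
    }
}

// Caller side:
// Future<Integer> f = cancellableQuery.countRows(con, "SELECT ...");
// ...
// f.cancel(true);              // user pressed "cancel"
// if (f.isCancelled()) { /* clean up and inform the user */ }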