With different database technologies (recently MS-SQL and H2), and even with different applications, I get exceptions from statements that could not acquire a table lock.
This happens in particular when accessing the same table from many parallel threads. Assuming these conflicts cannot be avoided in principle with complex enough statements and enough parallel threads, what is a good way to handle these failed statements?
The MS-SQL exception explicitly suggests retrying the statement.
H2 allows setting a LOCK_TIMEOUT.
Since every statement may fail this way, a retry would be needed for every statement in the software, which seems like a lot of boilerplate code to add.
In contrast, the timeout is a simple configuration setting, but it would need to be excessively high to guarantee that a few more threads do not trigger the exception again. Which brings me back to the retry logic.
The legacy code contains several dozen statements, each implemented in its own method that opens and closes its own connection/statement/result set triplet. What would be a good way to refactor it, apart from wrapping each of them in a retry loop? Lower performance is acceptable, but failed statements are not.
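For illustration, the kind of generic retry wrapper to avoid duplicating might look roughly like this (a sketch; SqlAction and Retry are placeholder names, and a real version should retry only on lock-timeout vendor error codes):

import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

// Placeholder functional interface so each legacy method can be passed as a lambda.
@FunctionalInterface
interface SqlAction<T> {
    T run(Connection con) throws SQLException;
}

class Retry {
    // Runs the action on a fresh connection, retrying up to maxAttempts times.
    static <T> T withRetry(DataSource ds, int maxAttempts, SqlAction<T> action)
            throws SQLException {
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try (Connection con = ds.getConnection()) {
                return action.run(con);
            } catch (SQLException e) {
                last = e; // real code should inspect the vendor error code first
            }
        }
        throw last;
    }
}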
Why do I need Transaction in Hibernate for read-only operations?
Does the following transaction put a lock in the DB?
Example code to fetch from DB:
Transaction tx = HibernateUtil.getCurrentSession().beginTransaction(); // why begin transaction?
// read-only operation here
tx.commit(); // why tx.commit? I don't want to write anything
Can I use session.close() instead of tx.commit()?
Transactions for reading might indeed look strange, and often people don't mark methods as transactional in this case. But JDBC will create a transaction anyway; it will simply run with autocommit=true if a different option wasn't set explicitly. There are, however, practical reasons to mark transactions read-only:
Impact on databases
A read-only flag may let the DBMS optimize such transactions or those running in parallel.
Having a transaction that spans multiple SELECT statements guarantees proper isolation for levels starting from Repeatable Read or Snapshot (e.g. see PostgreSQL's Repeatable Read). Otherwise two SELECT statements could see an inconsistent picture if another transaction commits in parallel. This isn't relevant when using Read Committed.
Impact on ORM
An ORM may cause unpredictable results if you don't begin/finish transactions explicitly. E.g. Hibernate will open a transaction before the first statement, but it won't finish it, so the connection is returned to the connection pool with an unfinished transaction. What happens then? JDBC stays silent on this, so it is implementation specific: the MySQL and PostgreSQL drivers roll back such a transaction, Oracle commits it. Note that this can also be configured at the connection pool level; e.g. C3P0 gives you such an option, with rollback as the default.
Spring sets FlushMode=MANUAL in the case of read-only transactions, which leads to other optimizations such as skipping dirty checks. This can be a huge performance gain depending on how many objects you have loaded.
Impact on architecture & clean code
There is no guarantee that your method doesn't write to the database. If you mark the method as @Transactional(readOnly = true), you dictate whether it's actually possible to write to the DB in the scope of this transaction. If your architecture is cumbersome and some team members may choose to put a modification query where it's not expected, this flag will point you to the problematic place.
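For illustration, a minimal Spring sketch (ReportService and OrderRepository are invented names):

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ReportService {

    private final OrderRepository orders; // invented repository

    public ReportService(OrderRepository orders) {
        this.orders = orders;
    }

    // All queries run on one connection in one read-only transaction;
    // an accidental write inside this method would be flagged.
    @Transactional(readOnly = true)
    public long orderCount() {
        return orders.count();
    }
}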
All database statements are executed within the context of a physical transaction, even when we don’t explicitly declare transaction boundaries (e.g., BEGIN, COMMIT, ROLLBACK).
If you don't declare transaction boundaries explicitly, then each statement will have to be executed in a separate transaction (autocommit mode). This may even lead to opening and closing one connection per statement unless your environment can deal with connection-per-thread binding.
Declaring a service method as @Transactional will give you one connection for the whole transaction duration, and all statements will use that single connection. This is way better than not using explicit transactions in the first place.
On large applications, you may have many concurrent requests, and reducing database connection acquisition request rate will definitely improve your overall application performance.
JPA doesn't enforce transactions on read operations. Only writes end up throwing a TransactionRequiredException if you forget to start a transactional context. Nevertheless, it's always better to declare transaction boundaries even for read-only transactions (in Spring, @Transactional allows you to mark transactions as read-only, which has a great performance benefit).
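To make the contrast with autocommit concrete, a plain-JDBC sketch (table and column names are invented):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

// With autocommit (the default), each executeQuery() below would run in its
// own implicit transaction; with explicit boundaries both reads share one
// transaction and one connection.
long[] orderStats(DataSource ds) throws Exception {
    try (Connection con = ds.getConnection()) {
        con.setAutoCommit(false); // start an explicit transaction
        try (PreparedStatement count = con.prepareStatement("SELECT COUNT(*) FROM orders");
             PreparedStatement sum = con.prepareStatement("SELECT SUM(total) FROM orders")) {
            long[] stats = new long[2];
            try (ResultSet rs = count.executeQuery()) { rs.next(); stats[0] = rs.getLong(1); }
            try (ResultSet rs = sum.executeQuery()) { rs.next(); stats[1] = rs.getLong(1); }
            con.commit(); // both reads saw the same transaction
            return stats;
        } catch (Exception e) {
            con.rollback();
            throw e;
        }
    }
}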
Transactions indeed put locks on the database — good database engines handle concurrent locks in a sensible way — and are useful with read-only use to ensure that no other transaction adds data that makes your view inconsistent. You always want a transaction (though sometimes it is reasonable to tune the isolation level, it's best not to do that to start out with); if you never write to the DB during your transaction, both committing and rolling back the transaction work out to be the same (and very cheap).
Now, if you're lucky and your queries against the DB are such that the ORM always maps them to single SQL queries, you can get away without explicit transactions by relying on the DB's built-in autocommit behavior. But ORMs are relatively complex systems, so it isn't at all safe to rely on such behavior unless you do a lot more work checking what the implementation actually does. Writing explicit transaction boundaries is far easier to get right (especially if you can do it with AOP or a similar ORM-driven technique; from Java 7 onwards, try-with-resources could be used too, I suppose).
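For example, a try-with-resources sketch (assuming Hibernate 5.2+, where Session is AutoCloseable; Customer is an invented entity):

import java.util.List;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

// Session is AutoCloseable, so the connection is released even if the query throws.
List<Customer> loadCustomers(SessionFactory sessionFactory) {
    try (Session session = sessionFactory.openSession()) {
        Transaction tx = session.beginTransaction();
        List<Customer> result =
                session.createQuery("from Customer", Customer.class).list();
        tx.commit(); // read-only: committing costs about the same as rolling back
        return result;
    }
}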
It doesn't matter whether you only read or not: the database must still keep track of your result set, because other database clients may want to write data that would change it.
I have seen faulty programs kill huge database systems because they just read data but never commit, forcing the transaction log to grow, since the DB can't release the transaction data before a COMMIT or ROLLBACK, even if the client has done nothing for hours.
I'm trying to better understand what will happen if multiple threads try to execute different SQL queries, using the same JDBC connection, concurrently.
Will the outcome be functionally correct?
What are the performance implications?
Will thread A have to wait for thread B to be completely done with its query?
Or will thread A be able to send its query immediately after thread B has sent its query, after which the database will execute both queries in parallel?
I see that the Apache DBCP uses synchronization protocols to ensure that connections obtained from the pool are removed from the pool, and made unavailable, until they are closed. This seems more inconvenient than it needs to be. I'm thinking of building my own "pool" simply by creating a static list of open connections, and distributing them in a round-robin manner.
I don't mind the occasional performance degradation, and the convenience of not having to close the connection after every use seems very appealing. Is there any downside to me doing this?
I ran the following set of tests using an AWS RDS Postgres database and Java 11:
Create a table with 11M rows, each row containing a single TEXT column, populated with a random 100-char string
Pick a random 5-character string, and search for partial matches of this string in the above table
Time how long the above query takes to return results. In my case, it takes ~23 seconds. Because very few results are returned, we can conclude that the majority of these 23 seconds is spent waiting for the DB to run the full table scan, not on sending the request/response packets
Run multiple queries in parallel (with different keywords), using different connections. In my case, I see that they all complete in ~23 seconds, i.e. the queries are being efficiently parallelized
Run multiple queries on parallel threads, using the same connection. I now see that the first result comes back in ~23 seconds, the second in ~46 seconds, the third in ~1 minute, and so on. All the results are functionally correct, in that they match the specific keyword queried by that thread (a sketch of this last test follows)
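A sketch of the shared-connection variant of this test (table and column names are placeholders; error handling trimmed):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// All threads share ONE connection, so the driver serializes the queries
// and the completion times stack up (~23s, ~46s, ~1min, ...).
void sharedConnectionTest(String url) throws Exception {
    Connection shared = DriverManager.getConnection(url);
    for (String keyword : new String[] {"abcde", "fghij", "klmno"}) {
        new Thread(() -> {
            long start = System.nanoTime();
            try (PreparedStatement ps = shared.prepareStatement(
                    "SELECT COUNT(*) FROM big_table WHERE data LIKE ?")) {
                ps.setString(1, "%" + keyword + "%");
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();
                    System.out.printf("%s: %d matches in %d ms%n", keyword,
                            rs.getLong(1), (System.nanoTime() - start) / 1_000_000);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }).start();
    }
}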
To add on to what Joni mentioned earlier, his conclusion matches the behavior I'm seeing on Postgres as well. It appears that all "correctness" is preserved, but all parallelism benefits are lost, if multiple queries are sent on the same connection at the same time.
Since the JDBC spec doesn't give guarantees of concurrent execution, this question can only be answered by testing the drivers you're interested in, or reading their source code.
In the case of MySQL Connector/J, all methods to execute statements lock the connection with a synchronized block. That is, if one thread is running a query, other threads using the connection will be blocked until it finishes.
Doing things the wrong way will have undefined results... If someone runs some tests, maybe they'll answer all your questions exactly, but then a new JVM comes out, or someone tries it on another JDBC driver or database version, or hits a different set of race conditions, or tries another platform or JVM implementation, and a different undefined result happens.
If two threads modify the same state at the same time, anything could happen depending on the timing. Maybe the second one overwrites the first's query, and then both run the same query. Maybe the library detects your error and throws an exception. I don't know and wouldn't bother testing (or maybe someone already knows, or it should be obvious what would happen), so this isn't "the answer", just some advice: use a connection pool, or use a synchronized block to ensure problems don't happen.
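A minimal sketch of the synchronized-block option (assuming a single deliberately shared connection; a pool is usually the better choice):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

class SharedConnectionDao {
    private final Connection connection; // deliberately shared by many threads

    SharedConnectionDao(Connection connection) {
        this.connection = connection;
    }

    long countMatches(String keyword) throws Exception {
        // Serialize all access to the shared connection explicitly rather
        // than relying on whatever locking the driver happens to do.
        synchronized (connection) {
            try (PreparedStatement ps = connection.prepareStatement(
                    "SELECT COUNT(*) FROM big_table WHERE data LIKE ?")) {
                ps.setString(1, "%" + keyword + "%");
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();
                    return rs.getLong(1);
                }
            }
        }
    }
}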
We had to disable the statement cache on WebSphere because it was throwing an ArrayIndexOutOfBoundsException at the PreparedStatement level.
The issue was that some guy thought it was smart to share a connection among multiple threads.
He said it was to save connections, but there is no point in multithreading queries because the DB won't run them in parallel.
There was also an issue with Java runnables that were blocking each other because they used the same connection.
So that's just something not to do; there is nothing to gain.
There is an option in WebSphere to detect this multithreaded access.
I implemented my own since we use Jetty in development.
What is the most suitable way to handle optimistic locking in JPA? I have the solutions below but don't know which is best to use.
1. Handling the optimistic locking exception in a catch block and retrying again.
2. Using an atomic flag variable and, if another thread is processing, waiting until it finishes. This way data modification or lock contention may be avoided.
3. Maintaining a queue of all incoming database change requests and processing them one by one.
Can anyone suggest a better solution to this problem?
You don't say why you are using optimistic locking.
You usually use it to avoid blocking resources (like database rows) for a long time, i.e. data is read from a database and displayed to the user. Eventually the user makes changes to the database, and the data is written back.
You don't want to block the data for other users during that time. In a scenario like this you don't want to use option 2, for the same reason.
Option 1 is not easy, because an optimistic locking exception tells you that something has changed the data behind your back, and you would overwrite these changes with your data. Re-trying to write the data won't help here.
Option 3 might be possible in some situations, but adds a lot of complexity and possible errors. This would be my last resort by far.
In my experience, optimistic locking exceptions are quite rare. In most cases the easiest way out is to discard everything and re-do it from the start, even if it means telling the user: sorry, there was an unexpected problem, please do it again.
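In JPA that "discard and re-do" approach might look roughly like this (a sketch using javax.persistence; Product is an invented entity with a @Version field):

import javax.persistence.EntityManager;
import javax.persistence.OptimisticLockException;

// Product is an invented entity carrying a @Version column.
void renameProduct(EntityManager em, long id, String newName) {
    try {
        em.getTransaction().begin();
        Product p = em.find(Product.class, id);
        p.setName(newName);
        em.flush(); // the version check fails here if the row changed
        em.getTransaction().commit();
    } catch (OptimisticLockException e) {
        if (em.getTransaction().isActive()) {
            em.getTransaction().rollback();
        }
        // Something changed the data behind our back: discard the stale
        // state and make the caller (or user) start over with fresh data.
        throw new IllegalStateException("Concurrent update, please try again", e);
    }
}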
On the other hand, if you get these problems regularly between two competing threads, you should try to avoid them. In those cases option 2 might be the way to go, but it depends on the scenario.
If the conflict occurs between a user interaction and a background thread (and not between two users) you could try to change the timing of the background thread, or signal the background thread to delay its work.
To sum it up: It mostly depends on your setup, and when and how the exception occurs.
We all know that we should reuse a JDBC PreparedStatement rather than create a new instance within a loop.
But how to deal with PreparedStatement reuse between different method invocations?
Does the reuse "rule" still apply?
Should I really consider using a field for the PreparedStatement or should I close and re-create the prepared statement in every invocation (keep it local)?
(Of course an instance of such a class would be bound to a Connection which might be a disadvantage in some architectures)
I am aware that the ideal answer might be "it depends".
But I am looking for a best practice for less experienced developers that they will do the right choice in most of the cases.
Of course an instance of such a class would be bound to a Connection which might be a disadvantage
Might be? It would be a huge disadvantage. You'd either need to synchronize access to it, which would kill your multi-user performance stone dead, or create multiple instances and keep them in a pool. Major pain in the ass.
Statement pooling is the job of the JDBC driver, and most, if not all, of the current crop of drivers do this for you. When you call prepareStatement or prepareCall, the driver will handle the re-use of existing resources and pre-compiled statements.
Statement objects are tied to a connection, and connections should be used and returned to the pool as quickly as possible.
In short, the standard practice of obtaining a PreparedStatement at the start of the method, using it repeatedly within a loop, then closing it at the end of the method, is best practice.
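For example (a sketch; the table and column names are invented, and try-with-resources handles the closing):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;
import javax.sql.DataSource;

// Obtain the statement once, reuse it for every row in the loop, and let
// try-with-resources close the statement and connection at the end.
void insertNames(DataSource ds, List<String> names) throws Exception {
    try (Connection con = ds.getConnection();
         PreparedStatement ps = con.prepareStatement(
                 "INSERT INTO person (name) VALUES (?)")) {
        for (String name : names) {
            ps.setString(1, name);
            ps.executeUpdate(); // same PreparedStatement, new parameters
        }
    }
}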
Many database workloads are CPU-bound, not IO-bound. This means that the database ends up spending more time doing work such as parsing SQL queries and figuring out how to handle them (doing the 'execution plan'), than it spends accessing the disk. This is more true of 'transactional' workloads than 'reporting' workloads, but in both cases the time spent preparing the plan may be more than you expect.
Thus it is a good idea to cache PreparedStatements between method invocations whenever the statement is going to be executed frequently and the hassle of making (correct) arrangements for the cache is worth your developer time. As always with performance, measurement is key; but if you can do it cheaply enough, cache your PreparedStatement out of habit.
Some JDBC drivers and/or connection pools offer transparent 'prepared statement caching', so that you don't have to do it yourself. So long as you understand the behaviour of your particular chosen transparent caching strategy, it's fine to let it keep track of things ... what you really want to avoid is the hit on the database.
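As one example, MySQL Connector/J enables its client-side prepared statement cache through connection properties (the cache sizes here are illustrative):

// Connector/J's client-side prepared statement cache, enabled per connection.
String url = "jdbc:mysql://localhost:3306/mydb"
        + "?cachePrepStmts=true"         // turn the cache on (off by default)
        + "&prepStmtCacheSize=250"       // statements cached per connection
        + "&prepStmtCacheSqlLimit=2048"; // max SQL length eligible for caching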
Yes, it can be reused, but I believe this only counts if the same Connection object is being used. If you are using a database connection pool (from within a web application, for example), then the Connection objects will potentially be different each time.
I always recreate the PreparedStatement before each use within a Web Application for this reason.
If you aren't using a Connection Pool then you are golden!
I don't see the difference: If I execute the same statement repeatedly against the same connection, why not reuse the PreparedStatement in any way? If multiple methods execute the same statement, then maybe that statement needs to be encapsulated in its own method (or even its own class). That way you wouldn't need to pass around a PreparedStatement.
What are the best ways to avoid race conditions in JSP while at the same time not slowing down processing? I have tried
isThreadSafe=false
synchronized(session)
However, is there any other alternative solution available?
A one-size-fits-all solution (e.g. isThreadSafe=false) causes requests to be executed one at a time. And that inevitably slows down request processing.
To avoid that scenario, you need to understand why you are getting race conditions and (re-)design your architecture to avoid the problem. For example:
if the race condition is in updates to some shared in-memory data structure, you need to synchronize access and updates to the data structure at the appropriate level of granularity (see the sketch after this list)
if the race condition is in updates to your database, you need to restructure your SQL to use transactions at the appropriate level of granularity.
These are just (possible) schemes for solving your race condition problems. In reality, you have to understand the root causes yourself.
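For the first case, a sketch of per-key synchronization with a concurrent map (PageHitCounter is an invented class):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Shared state touched by many request threads. ConcurrentHashMap.merge
// makes each update atomic at per-key granularity, so requests are not
// serialized the way isThreadSafe=false would serialize them.
class PageHitCounter {
    private final Map<String, Long> hits = new ConcurrentHashMap<>();

    void record(String page) {
        hits.merge(page, 1L, Long::sum); // atomic read-modify-write
    }

    long hitsFor(String page) {
        return hits.getOrDefault(page, 0L);
    }
}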
It depends on why you have race conditions.
The simplest thought is to have no global variables that you write to.
Do all of your logic in methods that only use local variables.
Don't have any Java code in your JSP page; call out to a bean to do the operations.
What are you doing that is causing a race condition?
isThreadSafe=false
This will probably lead to poor performance as it makes page access sequential. It also will only affect one page, so if you access the data via another page, this will do nothing for you.
synchronized(session)
This is not guaranteed to work (though it will on some servers as a side-effect of the implementation).
Any solution will probably require more information about the data you are trying to guard and the configuration of your server.