Let's say we have a class that writes a log message to a database. This class is called from different parts of the code and executes the same INSERT statement over and over. That seems to call for a PreparedStatement.
However, I am wondering what the right way to use it is. Do I still get the benefit of it (such as the DBMS reusing the same execution plan each time the statement is executed) if I create a new PreparedStatement each time the method is called? Or should I keep the PreparedStatement as a class member and never close it, so that it is reused and I get the benefit?
Now, if the only way to benefit from the PreparedStatement in this scenario is to keep it open as a class member, may the same connection have several PreparedStatements (with different queries) open at the same time? What happens when two of these PreparedStatements are executed at the same time? Does the JDBC driver queue their execution?
Thanks in advance,
Dani.
As far as I know and have experienced, statements don't run in parallel on one connection. And, as you observed correctly, PreparedStatements are bound to the Connection they were created on.
As you probably don't want to synchronize your logging call (one insert at a time plus locking overhead), you'd have to keep the connections reserved for this logging statement.
But having a dedicated pool for only one statement seems very wasteful, so you don't want to do that either.
So what options are left?
Prepare the statement for every insert. Since you already do I/O to send the data to the DB, the overhead of preparing is relatively small.
Prepare the statement inside your pool when a new connection is created, and build a Map<Connection, PreparedStatement> to look them up later. This makes creating new connections a bit slower but allows the statement to be recycled.
Use some asynchronous mechanism (e.g. JMS) to queue your logs, and do the inserts as a batch inside a message-driven bean or similar.
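The last option can be sketched without JMS by swapping in an in-process java.util.concurrent.BlockingQueue: callers only enqueue, and one background task drains the queue and writes each batch with a single PreparedStatement on a single pooled connection. The class name, the `app_log` table and its `message` column are hypothetical; a real version would also have to decide what happens to queued messages on shutdown or when an insert fails.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import javax.sql.DataSource;

/** Hypothetical logger: callers enqueue, one background task inserts in batches. */
class AsyncDbLogger {
    // Hypothetical table and column; adjust to your schema.
    static final String INSERT_SQL = "INSERT INTO app_log (message) VALUES (?)";

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final DataSource dataSource;

    AsyncDbLogger(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    /** Safe to call from any thread; never touches the database. */
    void log(String message) {
        queue.add(message);
    }

    /**
     * Drains up to maxBatch queued messages and writes them as one JDBC batch
     * using a single PreparedStatement. Intended to be called by one background
     * thread. Returns the number of messages written.
     */
    int flush(int maxBatch) throws SQLException {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, maxBatch);
        if (batch.isEmpty()) {
            return 0;
        }
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(INSERT_SQL)) {
            for (String message : batch) {
                ps.setString(1, message);
                ps.addBatch();      // collect this parameter set on the client
            }
            ps.executeBatch();      // execute all collected sets; round trips depend on the driver
        }
        return batch.size();
    }
}
```

The flush could be driven, for example, by `ScheduledExecutorService.scheduleWithFixedDelay`, so that only one thread ever writes and the logging callers never wait for the database.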
There are probably more options, but that's all I could think of right now.
Good luck with that.
I know that, the first time a prepared statement is used, JDBC keeps the compiled statement somewhere so that the next time it can be accessed more efficiently.
Now, suppose I have this situation:
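public class MyDao {
    private Connection connection; // obtained elsewhere, e.g. from a pool

    public void doQuery() throws SQLException {
        try (PreparedStatement stmt = connection.prepareStatement(MY_STMT)) {
            // bind parameters and execute stmt here
        }
    }
}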
Will both of the following snippets keep the compiled prepared statement in memory?
Snippet 1:
MyDao dao = new MyDao();
dao.doQuery(); //first one, expensive
dao.doQuery(); //second one, less expensive as it has been already compiled
Snippet 2:
MyDao dao = new MyDao();
dao.doQuery(); //first one, expensive
MyDao dao2 = new MyDao();
dao2.doQuery(); //will it be expensive or less expensive?
I am afraid that, by creating a new DAO object, the JVM will see that prepared statement as a new one and compile it again.
And if that's not the case, is there any situation in which the JVM will "forget" the compiled statement and compile it again?
Thanks
The most basic scenario for prepared statement reuse is that your code keeps the PreparedStatement open and reuses it. Your example code does not meet this criterion because you close the prepared statement. On the other hand, trying to keep a prepared statement open across multiple method invocations is usually not a good plan because of potential concurrency problems (e.g. if multiple threads use the same DAO, you could end up executing a mix of parameter values set by different threads).
Some JDBC drivers have an (optional) cache (pool) of prepared statements internally for reuse, but that reuse will only happen if an attempt is made to prepare the same statement text again on the same physical connection. Check the documentation of your driver.
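As one concrete example of such a driver-side cache, MySQL Connector/J enables it through connection properties such as `cachePrepStmts`, `prepStmtCacheSize` and `prepStmtCacheSqlLimit`, with `useServerPrepStmts` selecting server-side prepared statements. The sizes below are illustrative, not recommendations, and other drivers use different knobs (PostgreSQL's pgjdbc, for instance, has `prepareThreshold` and `preparedStatementCacheQueries`), so check your driver's documentation before copying anything.

```java
import java.util.Properties;

class MySqlStatementCacheConfig {
    /**
     * Connection properties that turn on Connector/J's per-connection cache of
     * prepared statements. The cache only helps when the same SQL text is
     * prepared again on the same physical connection.
     */
    static Properties properties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("useServerPrepStmts", "true");    // use server-side prepared statements
        props.setProperty("cachePrepStmts", "true");        // enable the per-connection cache
        props.setProperty("prepStmtCacheSize", "250");      // statements cached per connection (illustrative)
        props.setProperty("prepStmtCacheSqlLimit", "2048"); // longest SQL text that gets cached (illustrative)
        return props;
    }
}
```

With the driver on the classpath, these would be passed as `DriverManager.getConnection(url, MySqlStatementCacheConfig.properties(user, password))`, where the URL is whatever your `jdbc:mysql://` address is; connection pools usually accept the same properties as data source properties.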
On a separate level, it is possible that the database system will cache the execution plan for a prepared statement, and it can (will) reuse that if the same statement text is prepared again (even for different connections).
You're correct: it will be compiled again. PreparedStatements will only be reused if you actually use the same statement object multiple times (i.e. you call executeQuery on it multiple times).
However, I wouldn't worry too much about the cost of compiling the statement. If your query takes more than a few milliseconds, the cost of compiling will be insignificant. The overhead of compiling statements only becomes apparent when doing thousands of operations per second.
Do a benchmark. It is the best way to get some certainty about the performance difference. The statement is not necessarily recompiled on the server side every time: depending on your RDBMS, it may cache previously compiled statements. To maximize the probability of a cache hit, always submit exactly the same parameterized SQL text, and do it over the same connection.
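A crude starting point for such a benchmark is to time a prepare/close loop on one connection and compare that figure with the execution time of the query itself. The helper below is hypothetical and only times the client-side calls; some drivers do little or no server work until the statement is executed, so a serious measurement would also execute the statement, add warm-up runs, or use a harness such as JMH.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class PrepareCostBenchmark {
    /** Average nanoseconds for prepareStatement(sql) + close() over n iterations on one connection. */
    static long averagePrepareCloseNanos(Connection con, String sql, int n) throws SQLException {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                // Nothing is executed: this isolates the prepare/close cost only.
            }
        }
        return (System.nanoTime() - start) / n;
    }
}
```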
I'm trying to better understand what will happen if multiple threads try to execute different sql queries, using the same JDBC connection, concurrently.
Will the outcome be functionally correct?
What are the performance implications?
Will thread A have to wait for thread B to be completely done with its query?
Or will thread A be able to send its query immediately after thread B has sent its query, after which the database will execute both queries in parallel?
I see that the Apache DBCP uses synchronization protocols to ensure that connections obtained from the pool are removed from the pool, and made unavailable, until they are closed. This seems more inconvenient than it needs to be. I'm thinking of building my own "pool" simply by creating a static list of open connections, and distributing them in a round-robin manner.
I don't mind the occasional performance degradation, and the convenience of not having to close the connection after every use seems very appealing. Is there any downside to me doing this?
I ran the following set of tests using an AWS RDS Postgres database and Java 11:
Create a table with 11M rows, each row containing a single TEXT column, populated with a random 100-char string
Pick a random 5 character string, and search for partial-matches of this string, in the above table
Time how long the above query takes to return results. In my case, it takes ~23 seconds. Because very few results are returned, we can conclude that most of these 23 seconds are spent waiting for the DB to run the full table scan, not in sending the request/response packets.
Run multiple queries in parallel (with different keywords), using different connections. In my case, they all complete in ~23 seconds, i.e. the queries are being efficiently parallelized.
Run multiple queries on parallel threads, using the same connection. Now the first result comes back in ~23 seconds, the second in ~46 seconds, the third in ~1 minute, and so on. All the results are functionally correct, in that they match the specific keyword queried by that thread.
To add on to what Joni mentioned earlier, his conclusion matches the behavior I'm seeing on Postgres as well. It appears that all "correctness" is preserved, but all parallelism benefits are lost, if multiple queries are sent on the same connection at the same time.
Since the JDBC spec doesn't give guarantees of concurrent execution, this question can only be answered by testing the drivers you're interested in, or reading their source code.
In the case of MySQL Connector/J, all methods to execute statements lock the connection with a synchronized block. That is, if one thread is running a query, other threads using the connection will be blocked until it finishes.
Doing things the wrong way will have undefined results. If someone runs some tests, maybe they'll answer all your questions exactly, but then a new JVM comes out, or someone tries it on another JDBC driver or database version, or hits a different set of race conditions, or tries another platform or JVM implementation, and a different undefined result happens.
If two threads modify the same state at the same time, anything could happen depending on the timing. Maybe the second one overwrites the first one's query, and then both run the same query. Maybe the library will detect your error and throw an exception. I don't know and wouldn't bother testing (or maybe someone already knows, or it should be obvious what would happen), so this isn't "the answer", just some advice. Just use a connection pool, or use a synchronized block to ensure problems don't happen.
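If one Connection really has to be shared, the synchronized-block option can be sketched as a small wrapper that lets only one thread at a time run work against it. `SerializedConnection` and `SqlWork` are hypothetical names. The wrapper makes the serialization seen in the Postgres test explicit: queries on the shared connection run one after another, and transaction state (auto-commit, open transactions) is still shared by every caller, which is why a pool is usually the better choice.

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Hypothetical wrapper: serializes all work on one shared Connection. */
final class SerializedConnection {
    /** Work that runs while holding exclusive access to the connection. */
    @FunctionalInterface
    interface SqlWork<T> {
        T apply(Connection con) throws SQLException;
    }

    private final Connection con;
    private final Object lock = new Object();

    SerializedConnection(Connection con) {
        this.con = con;
    }

    /**
     * Runs work with exclusive access. Statements created inside must be
     * executed, and their result sets fully read and closed, before returning.
     */
    <T> T withConnection(SqlWork<T> work) throws SQLException {
        synchronized (lock) {
            return work.apply(con);
        }
    }
}
```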
We had to disable the statement cache on WebSphere, because it was throwing ArrayIndexOutOfBoundsException at the PreparedStatement level.
The issue was that some guy thought it was smart to share a connection between multiple threads.
He said it was to save connections, but there is no point in multithreading queries on one connection, because the DB won't run them in parallel.
There was also an issue with Java Runnables that were blocking each other because they used the same connection.
So that's simply something not to do; there is nothing to gain.
There is an option in WebSphere to detect this multithreaded access.
I implemented my own, since we use Jetty in development.
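A home-grown check like that can be sketched with java.lang.reflect.Proxy: the wrapper remembers which thread obtained the connection and fails fast when any other thread calls it. This is a hypothetical debugging aid, not WebSphere's feature. It assumes a fresh wrapper is created every time a connection is borrowed from the pool, so a connection legitimately handed from one request to the next is not flagged, and statements or result sets obtained from the connection are not guarded in this sketch.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;

/** Hypothetical debug wrapper: only the borrowing thread may use the connection. */
final class SingleThreadConnectionGuard {
    private SingleThreadConnectionGuard() {}

    static Connection wrap(Connection target) {
        Thread owner = Thread.currentThread();  // the thread that borrowed the connection
        InvocationHandler handler = (proxy, method, args) -> {
            if (Thread.currentThread() != owner) {
                throw new IllegalStateException("Connection borrowed by " + owner.getName()
                        + " used from " + Thread.currentThread().getName());
            }
            try {
                return method.invoke(target, args);  // delegate to the real connection
            } catch (InvocationTargetException e) {
                throw e.getCause();                  // rethrow the driver's own exception
            }
        };
        return (Connection) Proxy.newProxyInstance(
                SingleThreadConnectionGuard.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                handler);
    }
}
```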
I've been researching all around the web for the most efficient way to design a connection pool and tried to analyze the available libraries (HikariCP, BoneCP, etc.) in detail.
Our application is a heavy-load consumer webapp, and most of the time the users are working on similar business objects (so the underlying SQL queries executed are often the same, though there are many of them).
It is designed to work with different DBMS (Oracle and MS SQL Server especially).
So a simplified use case would be:
User goes on a particular JSP page (e.g. Enterprise).
A corresponding Bean is created.
Each time it performs an action (e.g. getEmployees(), computeTurnover()), the Bean asks the pool for a connection and returns it when done.
If we want to take advantage of the prepared statement caching of the underlying JDBC driver (prepared statements are attached to a connection, according to the jTDS documentation), from what I understand an optimal way of doing it would be:
Analyze what kind of SQL query a particular Bean wants to execute before providing it an available connection from the pool.
Find a connection where the same prepared statement has already been executed if possible.
Serve the connection accordingly (and use the benefits of the cache/precompiled statement).
Return the connection to the pool and start over.
Am I missing an important point here (like JDBC drivers capable of reusing cached statements regardless of the connection), or is my analysis correct?
The different sources I found state it is not possible, but why?
For your scheme to work, you'd need to be able to get the connection that already has that statement prepared.
This runs into two problems:
In JDBC, you obtain the connection first and only then prepare the statement, so the pool cannot know which statement you are going to prepare.
Cached prepared statements (if a driver or connection pool even supports them) aren't exposed in a standardized way (if at all), nor would you be able to introspect them.
The performance overhead of finding the right connection (and the subsequent contention on the few connections that already have it prepared) would probably undo any benefit of reusing the prepared statement.
Also note that some database systems have a server-side cache for prepared statements (meaning that the plan etc. is already available), which limits the overhead of a new prepare from the client.
If you really think the performance benefit is big enough, you should consider using a data source dedicated to this functionality (so it is almost guaranteed that the connection will have the statement in its cache).
A solution could be for a connection pool implementation to delay retrieving the connection from the pool until Connection.prepareStatement() is called. At that point the connection pool would look up available connections by the SQL statement text and then play forward all the calls made before Connection.prepareStatement(). This way it would be possible to get a connection with a ready PreparedStatement without the issues others pointed out.
In other words, when you request a connection from the pool, it would return a wrapper that logs everything until the first operation requiring DB access (such as prepareStatement()) is requested.
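The deferred-acquisition idea from this answer can be sketched with a dynamic proxy: void setter calls made before the first prepareStatement are recorded, and at that point a physical connection is chosen by SQL text and the recorded calls are replayed on it. Everything here is hypothetical, including the `picker` function that stands in for the pool's lookup by statement text. The sketch ignores returning the connection to the pool, and any other early call simply picks a connection without a SQL hint.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of a pooled connection that binds lazily; one thread at a time. */
final class LazyConnection implements InvocationHandler {
    private final Function<String, Connection> picker;         // SQL text (or null) -> physical connection
    private final List<Object[]> deferred = new ArrayList<>(); // each entry: {Method, Object[] args}
    private Connection target;

    private LazyConnection(Function<String, Connection> picker) {
        this.picker = picker;
    }

    static Connection create(Function<String, Connection> picker) {
        return (Connection) Proxy.newProxyInstance(
                LazyConnection.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                new LazyConnection(picker));
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (target == null) {
            String name = method.getName();
            if (name.equals("close")) {
                return null;                                   // never bound: nothing to release
            }
            if (name.startsWith("set") && method.getReturnType() == void.class) {
                deferred.add(new Object[] {method, args});     // e.g. setAutoCommit(false)
                return null;
            }
            // First call that needs a physical connection; the SQL text is the lookup key.
            String sqlHint = name.equals("prepareStatement") ? (String) args[0] : null;
            target = picker.apply(sqlHint);
            for (Object[] call : deferred) {                   // play the recorded calls forward
                invokeOn(target, (Method) call[0], (Object[]) call[1]);
            }
            deferred.clear();
        }
        return invokeOn(target, method, args);
    }

    private static Object invokeOn(Connection con, Method method, Object[] args) throws Throwable {
        try {
            return method.invoke(con, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();                                // surface the driver's exception
        }
    }
}
```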
You'd need to ask a vendor of your connection pool functionality to add this feature.
I've logged this request with C3P0:
https://github.com/swaldman/c3p0/issues/55
Hope this helps.
We all know that we should rather reuse a JDBC PreparedStatement than creating a new instance within a loop.
But how to deal with PreparedStatement reuse between different method invocations?
Does the reuse "rule" still apply?
Should I really consider using a field for the PreparedStatement or should I close and re-create the prepared statement in every invocation (keep it local)?
(Of course an instance of such a class would be bound to a Connection which might be a disadvantage in some architectures)
I am aware that the ideal answer might be "it depends".
But I am looking for a best practice for less experienced developers, so that they make the right choice in most cases.
Of course an instance of such a class would be bound to a Connection which might be a disadvantage
Might be? It would be a huge disadvantage. You'd either need to synchronize access to it, which would kill your multi-user performance stone-dead, or create multiple instances and keep them in a pool. Major pain in the ass.
Statement pooling is the job of the JDBC driver, and most, if not all, of the current crop of drivers do this for you. When you call prepareStatement or prepareCall, the driver will handle reuse of existing resources and precompiled statements.
Statement objects are tied to a connection, and connections should be used and returned to the pool as quickly as possible.
In short, the standard practice of obtaining a PreparedStatement at the start of the method, using it repeatedly within a loop, then closing it at the end of the method, is best practice.
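The standard practice described here (prepare once at the start of the method, reuse the statement inside the loop, close it at the end) might look like the following sketch. The `employee` table, its columns and the method name are hypothetical; the connection would typically be borrowed from the pool by the caller and returned afterwards.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

class SalaryUpdater {
    // Hypothetical table and columns.
    static final String UPDATE_SQL = "UPDATE employee SET salary = ? WHERE id = ?";

    /** Applies every (id -> salary) pair with one PreparedStatement; returns the rows updated. */
    static int updateSalaries(Connection con, Map<Long, Long> salaryById) throws SQLException {
        int updated = 0;
        try (PreparedStatement ps = con.prepareStatement(UPDATE_SQL)) {  // prepared once
            for (Map.Entry<Long, Long> entry : salaryById.entrySet()) {
                ps.setLong(1, entry.getValue());
                ps.setLong(2, entry.getKey());
                updated += ps.executeUpdate();                             // reused on every iteration
            }
        }                                                                  // closed when the method is done
        return updated;
    }
}
```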
Many database workloads are CPU-bound, not IO-bound. This means that the database ends up spending more time doing work such as parsing SQL queries and figuring out how to handle them (doing the 'execution plan'), than it spends accessing the disk. This is more true of 'transactional' workloads than 'reporting' workloads, but in both cases the time spent preparing the plan may be more than you expect.
Thus, if the statement is going to be executed frequently and the hassle of making (correct) arrangements to cache PreparedStatements 'between method invocations' is worth your developer time, it is a good idea to do so. As always with performance, measurement is key, but if you can do it cheaply enough, cache your PreparedStatement out of habit.
Some JDBC drivers and/or connection pools offer transparent 'prepared statement caching', so that you don't have to do it yourself. So long as you understand the behaviour of your particular chosen transparent caching strategy, it's fine to let it keep track of things ... what you really want to avoid is the hit on the database.
Yes it can be reused, but I believe this only counts if the same Connection object is being used and if you are using a Database Connection Pool (from within a Web Application, for example) then the Connection objects will be potentially different each time.
I always recreate the PreparedStatement before each use within a Web Application for this reason.
If you aren't using a Connection Pool then you are golden!
I don't see the difference: If I execute the same statement repeatedly against the same connection, why not reuse the PreparedStatement in any way? If multiple methods execute the same statement, then maybe that statement needs to be encapsulated in its own method (or even its own class). That way you wouldn't need to pass around a PreparedStatement.
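Encapsulating one statement in its own class, as suggested here, could look like the following hypothetical sketch (the class name, the `job` table and its columns are made up). It keeps the PreparedStatement for the lifetime of the object, so each instance is bound to one connection and must be used by one thread at a time, which are exactly the trade-offs raised in the other answers.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical: owns one prepared statement on one connection; not thread-safe. */
final class MarkProcessedStatement implements AutoCloseable {
    // Hypothetical table and columns.
    private static final String SQL = "UPDATE job SET status = 'DONE' WHERE id = ?";
    private final PreparedStatement ps;

    MarkProcessedStatement(Connection con) throws SQLException {
        this.ps = con.prepareStatement(SQL);  // prepared once per instance
    }

    /** Executes the already-prepared statement with a new parameter value. */
    boolean markProcessed(long id) throws SQLException {
        ps.setLong(1, id);
        return ps.executeUpdate() == 1;
    }

    @Override
    public void close() throws SQLException {
        ps.close();
    }
}
```

A caller would scope it like any other resource, for example `try (MarkProcessedStatement mark = new MarkProcessedStatement(con)) { for (long id : ids) mark.markProcessed(id); }`, so the statement never outlives the borrowed connection.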
What is the fastest option to issue stored procedures in a threaded environment in Java? According to http://dev.mysql.com/doc/refman/5.1/en/connector-j-usagenotes-basic.html#connector-j-examples-preparecall Connection.prepareCall() is an expensive method. So what's the alternative to calling it in every thread, when synchronized access to a single CallableStatement is not an option?
Most JDBC drivers use only a single socket per connection, and I think MySQL does too. That is why sharing one connection between multiple threads is a bad idea for performance.
If you use multiple connections across different threads, then you need a CallableStatement for every connection, i.e. a CallableStatement pool for every connection. The simplest way to pool them in this case is to wrap the connection class and delegate all calls to the original class. Such a wrapper can be generated very quickly with Eclipse. In the wrapped prepareCall() method you can add a simple pool. You also need a wrapper class for the CallableStatement, whose close method returns the CallableStatement to the pool.
But first you should check whether the call is really expensive, because many drivers already have such a pool inside. Create a loop of prepareCall() and close() and measure the time.
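Instead of an IDE-generated delegating class, the wrapper described here can be sketched with java.lang.reflect.Proxy, which forwards every method it does not intercept. In this hypothetical sketch, `prepareCall(String)` hands out an idle statement with the same SQL if one exists, and `close()` on the wrapped statement clears its parameters and returns it to the per-connection pool instead of closing it. Like the Connection itself, the wrapper assumes one thread at a time.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-connection pool of CallableStatements; one thread at a time. */
final class CallablePoolingConnection {
    private CallablePoolingConnection() {}

    static Connection wrap(Connection con) {
        Map<String, Deque<CallableStatement>> idle = new HashMap<>();  // SQL text -> idle statements
        InvocationHandler handler = (proxy, method, args) -> {
            String name = method.getName();
            if (name.equals("prepareCall") && args != null && args.length == 1) {
                String sql = (String) args[0];
                Deque<CallableStatement> free = idle.computeIfAbsent(sql, k -> new ArrayDeque<>());
                CallableStatement real = free.isEmpty() ? con.prepareCall(sql) : free.pop();
                return pooled(real, free);
            }
            if (name.equals("close") && args == null) {
                for (Deque<CallableStatement> free : idle.values()) {
                    for (CallableStatement cs : free) {
                        cs.close();                   // really close cached statements
                    }
                }
                idle.clear();
            }
            return delegate(con, method, args);
        };
        return (Connection) newProxy(Connection.class, handler);
    }

    /** Wraps a real statement so that close() returns it to 'free' instead of closing it. */
    private static CallableStatement pooled(CallableStatement real, Deque<CallableStatement> free) {
        boolean[] closed = {false};
        InvocationHandler handler = (proxy, method, args) -> {
            String name = method.getName();
            if (name.equals("close") && args == null) {
                if (!closed[0]) {                     // ignore a second close
                    closed[0] = true;
                    real.clearParameters();
                    free.push(real);
                }
                return null;
            }
            if (name.equals("isClosed") && args == null) {
                return closed[0];
            }
            if (closed[0] && method.getDeclaringClass() != Object.class) {
                throw new SQLException("Statement already returned to the pool");
            }
            return delegate(real, method, args);
        };
        return (CallableStatement) newProxy(CallableStatement.class, handler);
    }

    private static Object newProxy(Class<?> type, InvocationHandler handler) {
        return Proxy.newProxyInstance(CallablePoolingConnection.class.getClassLoader(),
                new Class<?>[] {type}, handler);
    }

    private static Object delegate(Object target, Method method, Object[] args) throws Throwable {
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();                       // surface the driver's exception
        }
    }
}
```

The extra check on `isClosed` and on calls after `close()` matters because, once a statement is back in the pool, a stale handle would otherwise silently operate on a statement another caller may already be using.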
Connection is not thread safe, so you can't share it across threads.
When you call prepareCall, the JDBC driver may be telling the RDBMS to do a lot of work that is stored on the server side. You may be guilty of premature optimization here.
After giving this a little thought it seems that if you are having issues with this infrastructure code then your problems are elsewhere. Most applications do not take an inordinate amount of time doing this stuff.
Make sure you are using a DataSource; most do connection caching, and some even cache statements.
Also, for this to be a performance bottleneck, you would have to be doing many queries one after the other, or your pool of connections would have to be too small. Maybe you should do some benchmarking on your code to see how much time the stored proc is taking vs. how much time the JDBC code is taking.
Of course, I would follow the MySQL recommendation of using CallableStatement; I am sure they have benchmarked this. Most apps do not share anything between threads, and it is rarely an issue.