statement.getGeneratedKeys() and MySQL

statement.getGeneratedKeys() and MySQL - java

I've learned it the hard way that last_insert_id in mysql is not pool-safe. i.e. if you are pooling connections, you'll get messed up insert_ids.
How does java's statement.getGeneratedKeys() get the key on inserts? Is it pool-safe?

I am quoting the relevant text from the MySQL Connector/J internals here:
You should be aware, that at times, it
can be tricky to use the 'SELECT
LAST_INSERT_ID()' query, as that
function's value is scoped to a
connection. So, if some other query
happens on the same connection, the
value will be overwritten. On the
other hand, the 'getGeneratedKeys()'
method is scoped by the Statement
instance, so it can be used even if
other queries happen on the same
connection, but not on the same
Statement instance.

Related

Ways to get mysql(innodb) AUTO_INCREMENT Column, JDBC/getGeneratedKeys()/last_insert_id (in OkPacket) and LAST_INSERT_ID()

According to mysql doc:
https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-usagenotes-last-insert-id.html
At times, it can be tricky to use the SELECT LAST_INSERT_ID() query,
as that function's value is scoped to a connection. So, if some other
query happens on the same connection, the value is overwritten. On the
other hand, the getGeneratedKeys() method is scoped by the Statement
instance, so it can be used even if other queries happen on the same
connection, but not on the same Statement instance.
First, I consider LAST_INSERT_ID().
The SQL function LAST_INSERT_ID() is connection safe, but not session/transaction/statement safe. It can't be used in production, because in real environment multiple session/transaction/statement in one connection is very common.
Then getGeneratedKeys() using JDBC. When I'm using getGeneratedKeys() in Java. I want to see what it does in database. I try to track the SQL statement with the following statements after a simple insert into a demo table with auto increase primary key using JDBC:
SET GLOBAL log_output = 'TABLE';
SET GLOBAL general_log = 'ON';
SELECT * FROM mysql.general_log;
I'm sure the new row is correctly inserted and getGeneratedKeys() brings the auto-incremented id back. However, I find nothing but just an insert statement which JDBC executed before, and some static data like "SELECT database(),version()...".
Now, conclusion is, getGeneratedKeys() doesn't execute any SQL statement to get auto-incremented id. Then I find another possibility, I debug into call stacks, see JDBC get auto-incremented id from an object called OkPacket. It has a property called last_insert_id. Here I find it finally.
My questions are:
Is there really no way to get a STATEMENT SAFE (at least transaction safe) auto-incremented id using pure SQL statement (without JDBC)?
How does OkPacket work under hood? How does it get a statement safe auto increased id? Maybe it calls some low level C function in MySQL driver or MySQL server/client protocol?

MySQL has an API that clients use to communicate commands and get results.
Actually, it has two forms of this API. One is called the "SQL protocol" in which statements are sent as strings like SELECT * FROM mytable, etc. The other form is called the "binary protocol" where commands are sent using some byte that the server recognizes, even though they are not human-readable strings.
Some commands can be executed by either the SQL protocol or the binary protocol.
For example, START TRANSACTION, COMMIT, PREPARE... there are textual SQL statements for these commands, but there are also non-textual ways for the API to invoke these commands.
You can certainly query SELECT LAST_INSERT_ID(); and get the most recent generated id, but only the most recent. Another INSERT statement will overwrite this value, as you read.
The OkPacket is filled in by the binary protocol. That is, the MySQL Server returns an OkPacket with several pieces of metadata about any statement execution.
See https://github.com/mysql/mysql-connector-j/blob/release/8.0/src/main/protocol-impl/java/com/mysql/cj/protocol/a/result/OkPacket.java#L55
The Ok Packet includes the following:
Affected row count (if any)
Last insert id (if any)
Status flags
Warning count (if any)
String with error message (if any)
The MySQL Server code that documents the OK Packet is unusually thorough with examples:
https://github.com/mysql/mysql-server/blob/8.0/sql/protocol_classic.cc#L665-L838
There's no way to fetch the OK Packet for earlier SQL statements. The client must save the result immediately after a statement execution. In object-oriented code such as the JDBC driver, it makes sense to store that in the NativeResultset object: https://github.com/mysql/mysql-connector-j/blob/release/8.0/src/main/protocol-impl/java/com/mysql/cj/protocol/a/result/NativeResultset.java#L77-L82

Java guidance/tips to safe dao

I have a web application that needs a database back-end.
My back-end is really small (max 4 tables) and the SQL operations are not that much.
So I decided that some robust ORM solution is like hitting a moschito with a hummer and I am going just to do a little DAO pattern so that the code is more clean (instead of hitting the db directly with sql commands).
So far it works but I am not sure that I haven't stepped into a pittfall without knowing.
I use Tomcat's connection pool and I expect concurrent access to the database.
My question is related to concurrency and the use of the java sql objects.
Example:
I do the following:
do a query
get a result set and use that to build an object (dto)
building this object I do a new sql query (using the same connection
and having the previous resultset open)
Is this correct/safe?
Also can I reuse the same connection in a re-entrant manner?
I assume it is no problem to use it via multiple threads right?
Generally any tips/guide to get in the right track is welcome

Regarding the connections, as long as you use the connection pool you are guaranteeing that each thread takes its own connection, so from that point of wiew, there is no problem in your approach in a multithreaded environment (you can check Is java.sql.Connection thread safe?).
With respect to the ResultSet and the second query you are performing, you must take into account that a ResultSet maintains a cursor pointing to its current row of data. So the key point in your question is if you are using the same "SELECT statement", because of in that case, you could get the same cursor attributes and some problems may arise.
Check ResultSet's javadoc, especially this sentence:
A ResultSet object is automatically closed when the Statement object that generated it is closed, re-executed, or used to retrieve the next result from a sequence of multiple results.
and How can I avoid ResultSet is closed exception in Java?.

SQL multi-command atomicity question

I am trying to create a program that updates 2 different tables using sql commands. The only thing I am worried about is that if the program updates one of the tables and then loses connection or whatever and does NOT update the other table there could be an issue. Is there a way I could either
A. Update them at the exact same time
or
B. Revert the first update if the second one fails.

Yes use a SQL transaction. Here is the tutorial:JDBC Transactions

Depending on the database, I'd suggest using a stored procedure or function based on the operations involved. They're supported by:
MySQL
Oracle
SQL Server
PostgreSQL
These encapsulate a database transaction (atomic in nature -- it either happens, or it doesn't at all), without the extra weight of sending the queries over the line to the database... Because they already exist on the database, the queries are parameterized (safe from SQL injection attacks) which means less data is sent -- only the parameter values.

Most SQL servers support transactions, that is, queueing up a set of actions and then having them happen atomically. To do this, you wrap your queries as such:
START TRANSACTION;
*do stuff*
COMMIT;
You can consult your server's documentation for more information about what additional features it supports. For example, here is a more detailed discussion of transactions in MySQL.

Java JDBC Lazy-Loaded ResultSet

Is there a way to get a ResultSet you obtain from running a JDBC query to be lazily-loaded? I want each row to be loaded as I request it and not beforehand.

Short answer:
Use Statement.setFetchSize(1) before calling executeQuery().
Long answer:
This depends very much on which JDBC driver you are using. You might want to take a look at this page, which describes the behavior of MySQL, Oracle, SQL Server, and DB2.
Major take-aways:
Each database (i.e. each JDBC driver) has its own default behavior.
Some drivers will respect setFetchSize() without any caveats, whereas others require some "help".
MySQL is an especially strange case. See this article. It sounds like if you call setFetchSize(Integer.MIN_VALUE), then it will download the rows one at a time, but it's not perfectly clear.
Another example: here's the documentation for the PostgreSQL behavior. If auto-commit is turned on, then the ResultSet will fetch all the rows at once, but if it's off, then you can use setFetchSize() as expected.
One last thing to keep in mind: these JDBC driver settings only affect what happens on the client side. The server may still load the entire result set into memory, but you can control how the client downloads the results.

Could you not achieve this by setting the fetch size for your Statement to 1?
If you only fetch 1 row at a time each row shouldn't be loaded until you called next() on the ResultSet.
e.g.
Statement statement = connection.createStatement();
statement.setFetchSize(1);
ResultSet resultSet = statement.executeQuery("SELECT .....");
while (resultSet.next())
{
// process results. each call to next() should fetch the next row
}

There is an answer provided here.
Quote:
The Presto JDBC driver never buffers the entire result set in memory. The server API will return at most ~1MB of data to the driver per request. The driver will not request more data from the server until that data is consumed (by calling the next() method on ResultSet an appropriate number of times).
Because of how the server API works, the driver fetch size is ignored (per the JDBC specification, it is only a hint).
Prove that the setFetchSize is ignored

I think what you would want to do is defer the actually loading of the ResultSet itself. You would need to implement that manually.

You will find this a LOT easier using hibernate. You will basically have to roll-your-own if you are using jdbc directly.
The fetching strategies in hibernate are highly configurable, and will most likely offer performance options you weren't even aware of.

PreparedStatements and performance

So I keep hearing that PreparedStatements are good for performance.
We have a Java application in which we use the regular 'Statement' more than we use the 'PreparedStatement'. While trying to move towards using more PreparedStatements, I am trying to get a more thorough understanding of how PreparedStatements work - on the client side and the server side.
So if we have some typical CRUD operations and update an object repeatedly in the application, does it help to use a PS? I understand that we will have to close the PS every time otherwise it will result in a cursor leak.
So how does it help with performance? Does the driver cache the precompiled statement and give me a copy the next time I do connection.prepareStatement? Or does the DB server help?
I understand the argument about the security benefits of PreparedStatements and I appreciate the answers below which emphasize it. However I really want to keep this discussion focused on the performance benefits of PreparedStatements.
Update: When I say update data, I really mean more in terms of that method randomly being called several times. I understand the advantage in the answer offered below which asks to re-use the statement inside a loop.
// some code blah blah
update();
// some more code blah blah
update();
....
public void update () throws SQLException{
try{
PreparedStatement ps = connection.prepareStatement("some sql");
ps.setString(1, "foobar1");
ps.setString(2, "foobar2");
ps.execute();
}finally {
ps.close();
}
}
There is no way to actually reuse the 'ps' java object and I understand that the actual connection.prepareStatement call is quite expensive.
Which is what brings me back to the original question. Is this "some sql" PreparedStatement still being cached and reused under the covers that I dont know about?
I should also mention that we support several databases.
Thanks in advance.

The notion that prepared statements are primarily about performance is something of a misconception, although it's quite a common one.
Another poster mentioned that he noted a speed improvement of about 20% in Oracle and SQL Server. I've noted a similar figure with MySQL. It turns out that parsing the query just isn't such a significant part of the work involved. On a very busy database system, it's also not clear that query parsing will affect overall throughput: overall, it'll probably just be using up CPU time that would otherwise be idle while data was coming back from the disk.
So as a reason for using prepared statements, the protection against SQL injection attacks far outweighs the performance improvement. And if you're not worried about SQL injection attacks, you probably should be...

Prepared statements can improve performance when re-using the same statement that you prepared:
PreparedStatement ps = connection.prepare("SOME SQL");
for (Data data : dataList) {
ps.setInt(1, data.getId());
ps.setString(2, data.getValue();
ps.executeUpdate();
}
ps.close();
This is much faster than creating the statement in the loop.
Some platforms also cache prepared statements so that even if you close them they can be reconstructed more quickly.
However even if the performance were identical you should still use prepared statements to prevent SQL Injection. At my company this is an interview question; get it wrong and we might not hire you.

Prepared statements are indeed cached after their first use, which is what they provide in performance over standard statements. If your statement doesn't change then it's advised to use this method. They are generally stored within a statement cache for alter use.
More info can be found here:
http://www.theserverside.com/tt/articles/article.tss?l=Prepared-Statments
and you might want to look at Spring JDBCTemplate as an alternative to using JDBC directly.
http://static.springframework.org/spring/docs/2.0.x/reference/jdbc.html

Parsing the SQL isn't the only thing that's going on. There's validating that the tables and columns do indeed exist, creating a query plan, etc. You pay that once with a PreparedStatement.
Binding to guard against SQL injection is a very good thing, indeed. Not sufficient, IMO. You still should validate input prior to getting to the persistence layer.

So how does it help with performance? Does the driver cache the
precompiled statement and give me a copy the next time I do
connection.prepareStatement? Or does the DB server help?
I will answer in terms of performance. Others here have already stipulated that PreparedStatements are resilient to SQL injection (blessed advantage).
The application (JDBC Driver) creates the PreparedStatement and passes it to the RDBMS with placeholders (the ?). The RDBMS precompiles, applying query optimization (if needed) of the received PreparedStatement and (in some) generally caches them. During execution of the PreparedStatement, the precompiled PreparedStatement is used, replacing each placeholders with their relevant values and calculated. This is in contrast to Statement which compiles it and executes it directly, the PreparedStatement compiles and optimizes the query only once. Now, this scenario explained above is not an absolute case by ALL JDBC vendors but in essence that's how PreparedStatement are used and operated on.

Anecdotally: I did some experiments with prepared vs. dynamic statements using ODBC in Java 1.4 some years ago, with both Oracle and SQL Server back-ends. I found that prepared statements could be as much as 20% faster for certain queries, but there were vendor-specific differences regarding which queries were improved to what extent. (This should not be surprising, really.)
The bottom line is that if you will be re-using the same query repeatedly, prepared statements may help improve performance; but if your performance is bad enough that you need to do something about it immediately, don't count on the use of prepared statements to give you a radical boost. (20% is usually nothing to write home about.)
Your mileage may vary, of course.

Which is what brings me back to the original question. Is this "some sql" PreparedStatement still being cached and reused under the covers that I dont know about?
Yes at least with Oracle. Per Oracle® Database JDBC Developer's Guide Implicit Statement Caching (emphasis added),
When you enable implicit Statement caching, JDBC automatically caches the prepared or callable statement when you call the close method of this statement object. The prepared and callable statements are cached and retrieved using standard connection object and statement object methods.
Plain statements are not implicitly cached, because implicit Statement caching uses a SQL string as a key and plain statements are created without a SQL string. Therefore, implicit Statement caching applies only to the OraclePreparedStatement and OracleCallableStatement objects, which are created with a SQL string. You cannot use implicit Statement caching with OracleStatement. When you create an OraclePreparedStatement or OracleCallableStatement, the JDBC driver automatically searches the cache for a matching statement.

1. PreparedStatement allows you to write dynamic and parametric query
By using PreparedStatement in Java you can write parametrized sql queries and send different parameters by using same sql queries which is lot better than creating different queries.
2. PreparedStatement is faster than Statement in Java
One of the major benefits of using PreparedStatement is better performance. PreparedStatement gets pre compiled
In database and there access plan is also cached in database, which allows database to execute parametric query written using prepared statement much faster than normal query because it has less work to do. You should always try to use PreparedStatement in production JDBC code to reduce load on database. In order to get performance benefit its worth noting to use only parametrized version of sql query and not with string concatenation
3. PreparedStatement prevents SQL Injection attacks in Java
Read more: http://javarevisited.blogspot.com/2012/03/why-use-preparedstatement-in-java-jdbc.html#ixzz3LejuMnVL

Short answer:
PreparedStatement helps performance because typically DB clients perform the same query repetitively, and this makes it possible to do some pre-processing for the initial query to speed up the following repetitive queries.
Long answer:
According to Wikipedia, the typical workflow of using a prepared statement is as follows:
Prepare: The statement template is created by the application and sent
to the database management system (DBMS). Certain values are left
unspecified, called parameters, placeholders or bind variables
(labelled "?" below): INSERT INTO PRODUCT (name, price) VALUES (?, ?)
(Pre-compilation): The DBMS parses, compiles, and performs query optimization on the
statement template, and stores the result without executing it.
Execute: At a later time, the application supplies (or binds) values
for the parameters, and the DBMS executes the statement (possibly
returning a result). The application may execute the statement as many
times as it wants with different values. In this example, it might
supply 'Bread' for the first parameter and '1.00' for the second
parameter.
Prepare:
In JDBC, the "Prepare" step is done by calling java.sql.Connection.prepareStatement(String sql) API. According to its Javadoc:
This method is optimized for handling parametric SQL statements that benefit from precompilation. If the driver supports precompilation, the method prepareStatement will send the statement to the database for precompilation. Some drivers may not support precompilation. In this case, the statement may not be sent to the database until the PreparedStatement object is executed. This has no direct effect on users; however, it does affect which methods throw certain SQLException objects.
Since calling this API may send the SQL statement to database, it is an expensive call typically. Depending on JDBC driver's implementation, if you have the same sql statement template, for better performance, you may have to avoiding calling this API multiple times in client side for the same sql statement template.
Precompilation:
The sent statement template will be pre-compiled on database and cached in db server. The database will probably use the connection and sql statement template as the key, and the pre-compiled query and the computed query plan as value in the cache. Parsing query may need to validate table, columns to be queried, so it could be an expensive operation, and computation of query plan is an expensive operation too.
Execute:
For following queries from the same connection and sql statement template, the pre-compiled query and query plan will be looked up directly from cache by database server without re-computation again.
Conclusion:
From performance perspective, using prepare statement is a two-phase process:
Phase 1, prepare-and-precompilation, this phase is expected to be
done once and add some overhead for the performance.
Phase 2,
repeated executions of the same query, since phase 1 has some pre
processing for the query, if the number of repeating query is large
enough, this can save lots of pre-processing effort for the same
query.
And if you want to know more details, there are some articles explaining the benefits of PrepareStatement:
http://javarevisited.blogspot.com/2012/03/why-use-preparedstatement-in-java-jdbc.html
http://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html

Prepared statements have some advantages in terms of performance with respect to normal statements, depending on how you use them. As someone stated before, if you need to execute the same query multiple times with different parameters, you can reuse the prepared statement and pass only the new parameter set. The performance improvement depends on the specific driver and database you are using.
As instance, in terms of database performance, Oracle database caches the execution plan of some queries after each computation (this is not true for all versions and all configuration of Oracle). You can find improvements even if you close a statement and open a new one, because this is done at RDBMS level. This kind of caching is activated only if the two subsequent queries are (char-by-char) the same. This does not holds for normal statements because the parameters are part of the query and produce different SQL strings.
Some other RDBMS can be more "intelligent", but I don't expect they will use complex pattern matching algorithms for caching the execution plans because it would lower performance. You may argue that the computation of the execution plan is only a small part of the query execution. For the general case, I agree, but.. it depends. Keep in mind that, usually, computing an execution plan can be an expensive task, because the rdbms needs to consult off-memory data like statistics (not only Oracle).
However, the argument about caching range from execution-plans to other parts of the extraction process. Giving to the RDBMS multiple times the same query (without going in depth for a particular implementation) helps identifying already computed structures at JDBC (driver) or RDBMS level. If you don't find any particular advantage in performance now, you can't exclude that performance improvement will be implemented in future/alternative versions of the driver/rdbms.
Performance improvements for updates can be obtained by using prepared statements in batch-mode but this is another story.

Ok finally there is a paper that tests this, and the conclusion is that it doesn't improve performance, and in some cases its slower:
https://ieeexplore.ieee.org/document/9854303
PDF: https://www.bib.irb.hr/1205158/download/1205158.Performance_analysis_of_SQL_Prepared_Statements_in_CRUD_operations_final.pdf

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.