I have an application which needs to be aware of the latest number of certain records in a database table. The solution should work without changing the database code or adding triggers or functions to it, so I need a database-vendor-independent solution.
My program is written in Java, but the database could be SQLite, MySQL, PostgreSQL, or MSSQL. For now I'm doing it like this:
In a separate daemon thread, my application repeatedly sends a simple query through JDBC to get the latest number of records matching a condition:
while (true) {
    SELECT COUNT(*) FROM Mytable WHERE exited='1'
}
This sort of code causes the database to lock, slows down the whole system, and generates huge DB logs, which eventually brings the whole thing down!
How can I do this the right way, so that I always have the latest count of certain records, or only re-count when the number changes?
A SELECT statement should not -- by itself -- have the behavior that you are describing. For instance, nothing is logged with a SELECT. Now, it is possible that concurrent insert/update/delete statements are going on, and that these cause problems because the SELECT locks the table.
Two general things you can do:
Be sure that the comparison is of the same type. So, if exited is a number, do not use single quotes (mixing of types can confuse some databases).
Create an index on (exited). In basically all databases, this is a single command: create index idx_mytable_exited on mytable(exited).
If locking and concurrent transactions are an issue, then you will need to do more database specific things, to avoid that problem.
As others have said, make sure that exited is indexed.
Also, you can set the transaction isolation on your query to do a "dirty read"; this indicates to the database server that you do not need to wait for other processes' transactions to commit, and instead you wish to read the current value of exited on rows that are being updated by those other processes.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED is the standard syntax for using "dirty read".
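A minimal JDBC sketch of the polling query with a dirty read; the URL, credentials, and the numeric exited comparison (per the earlier advice to drop the quotes) are assumptions:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class LatestCount {
    public static void main(String[] args) throws Exception {
        // placeholder URL and credentials; substitute your own
        try (Connection conn = DriverManager.getConnection(
                "jdbc:yourdb://host/db", "user", "pass")) {
            // request a dirty read; vendor support varies
            // (PostgreSQL, for one, treats READ UNCOMMITTED as READ COMMITTED)
            conn.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(
                         "SELECT COUNT(*) FROM Mytable WHERE exited = 1")) {
                if (rs.next()) {
                    System.out.println("count = " + rs.getInt(1));
                }
            }
        }
    }
}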
I have a database log appender that inserts a variable number of log lines into the database every once in a while.
I'd like to create an SQL statement in a way that prevents SQL injection, but without using server-side prepared statements (because I have a variable number of rows in every statement, caching the statements won't help and might hurt performance here).
I also like the convenience of prepared statements, and prefer them to string concatenation. Is there something like a 'client-side prepared statement'?
It sounds like you haven't benchmarked the simplest solution - prepared statements. You say that they "might hurt performance" but until you've tested it, you really won't know.
I would definitely test prepared statements first. Even if they do hamper performance slightly, until you've tested them you won't know whether you can still achieve the performance you require.
Why spend time trying to find alternative solutions when you haven't tried the most obvious one?
If you find that prepared statement execution plan caching is costly, you may well find there are DB-specific ways of tuning or disabling it.
Not sure if I understand your question correctly. Is there something in PreparedStatement that isn't fitting your needs?
I think that whether or not the statement is cached on the server side is an implementation detail of the database driver and the specific database you're using; if your query/statement changes over time than this should have no impact - the cached/compiled statements simply won't be used.
First, Jon's answer that you should go with the most obvious solution until performance is measured to be a problem is certainly the right approach in general.
I don't think your performance concerns are misplaced. I have certainly seen precompiled complex statements fail dramatically on the performance scale (on MS-SQL 2000). The reason is that the statement was so complex that it had several potential execution paths depending on the parameters; compilation locked in a plan for one set of parameters, and that plan was far too slow for the next set, whereas a recompile would have produced an execution plan more appropriate for the different parameters.
But that concern is very far-fetched until you see it in practice.
The underlying problem here is that parameter escaping is database specific, so unless the JDBC driver for your database is giving you something non-standard to do this (highly unlikely), you are going to have to have a different library or different escaping mechanism that is very specific to this one database.
From the wording of your question, it doesn't sound like your performance concerns have yet come to the point of meriting finding (or developing) such a solution.
It should also be noted that although JDBC drivers may not all behave this way, technically, according to the spec, the precompilation is supposed to be cached in the PreparedStatement object. If you throw that away and get a new PreparedStatement every time, it should not actually be caching anything, so the whole issue may be moot and would need to be investigated for your specific JDBC driver.
From the spec:
A SQL statement with or without IN parameters can be pre-compiled and stored in a PreparedStatement object. This object can then be used to efficiently execute this statement multiple times.
What's wrong with using a regular prepared statement, e.g. in the following simplified code:
Connection connection = ...;             // obtained elsewhere
PreparedStatement insertStatement = ...; // prepared elsewhere
...
connection.setAutoCommit(false);         // JDBC's way to begin a transaction
for (Item item : items) {
    insertStatement.setObject(1, item);
    insertStatement.executeUpdate();
}
connection.commit();
A smart database implementation will batch up several inserts into one communications exchange w/ the database server.
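If your driver doesn't do that on its own, JDBC's explicit batch API gets the same effect. A minimal sketch; the log_line table, its message column, and a String-valued item are assumptions for illustration:

try (PreparedStatement ps = connection.prepareStatement(
        "INSERT INTO log_line (message) VALUES (?)")) {
    for (Item item : items) {
        ps.setString(1, item.toString());
        ps.addBatch();      // queue the row on the client side
    }
    ps.executeBatch();      // send all queued rows in one round trip
}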
I can't think of a reason why you shouldn't use prepared statements. If you're running this on a J2EE server using connection pooling, the server keeps your connections open, and the server caches your access/execution plans. It's not the data it caches!
If you're closing your connection every time, then you're probably not gaining any performance. But you still get the SQL injection prevention.
Most Java performance tuning books will tell you the same:
Java performance tuning
Prepared statements don't care whether the work happens client side or server side.
Use them and drop any SQL string concatenation. There is not a single reason not to use prepared statements.
Let's presume that you are writing an application for a retail store chain. So, you would design your object model such that 'Store' is the core business object, with lots of supporting objects. Let's say 'Store' looks as follows:
class Store implements Validatable {
    int storeNo;
    String storeName;
    // ... etc. ...
}
So, your client tells you that you have to import the store schedule from an Excel sheet into the application and run a series of validations on them. For instance, 'StoreIsInSameCountry'; 'StoreIsValid'... etc. So, you would design a Rule interface for checking all business conditions. Something like this:
interface Rule<T extends Validatable> {
    public Error check(T value) throws Exception;
}
Now, here comes the question. I am uploading 2000 stores from this Excel sheet, so I would end up running each rule defined for a store that many times. If I were to have 4 rules, that is 8000 queries to the database, i.e., 16000 hits to the connection pool. For a simple check where I just have to verify whether the store exists or not, the query would be:
SELECT STORE_ATTRIB1, STORE_ATTRIB2... from STORE where STORE_ID = ?
That way I would get my 'Store' object. When I don't get anything back from the database, then that store doesn't exist. So, for such a simple check, I would have to hit the database 2000 times for 2000 stores.
Alternatively, I could just do:
SELECT STORE_ATTRIB1, STORE_ATTRIB2... from STORE where STORE_ID in (1,2,3..... )
This query would actually return much faster than doing the one above it 2000 times.
However, it doesn't go well with the design that a Rule can be run for a single store only.
I know using IN is not a suggested methodology. So, what do you think I should be doing? Should I go ahead and use IN here, because it gives better performance in this scenario? Or should I change my design?
What would you do if you were in my shoes, and what is the best practice?
That way I would get my 'Store' object from the database. When I don't get anything back from the database, then that store doesn't exist. So, for such a simple check, I would have to hit the database 2000 times for 2000 stores.
This is what you should not do.
Create a temporary table, fill the table with your values and JOIN this table, like this:
SELECT STORE_ATTRIB1, STORE_ATTRIB2...
FROM temptable tt
JOIN STORE s
ON s.STORE_ID = tt.id
or this:
SELECT STORE_ATTRIB1, STORE_ATTRIB2...
FROM STORE s
WHERE s.STORE_ID IN
(
SELECT id
FROM temptable tt
)
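Filling the temp table from JDBC might look like this. A sketch only; the CREATE TEMPORARY TABLE syntax is MySQL-flavored and varies by vendor, and storeIds stands in for the ids read from the Excel sheet:

try (Statement ddl = connection.createStatement()) {
    // vendor-specific DDL; adjust for your database
    ddl.execute("CREATE TEMPORARY TABLE temptable (id INT PRIMARY KEY)");
}
try (PreparedStatement fill = connection.prepareStatement(
        "INSERT INTO temptable (id) VALUES (?)")) {
    for (int id : storeIds) {   // storeIds: the 2000 ids from the sheet
        fill.setInt(1, id);
        fill.addBatch();
    }
    fill.executeBatch();        // one round trip for all the ids
}
// now run either of the queries above against temptable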
I know using IN is not a suggested methodology. So, what do you think I should be doing? Should I go ahead and use IN here, because it gives better performance in this scenario? Or should I change my design?
IN filters duplicates out.
If you want each eligible row to be selected for each duplicate value in the list, use JOIN.
IN is in no way a "not suggested methodology".
In fact, there was a time when some databases did not support IN queries efficiently, which is why folk wisdom still advises against using it.
But if your store_id is indexed properly (and it most probably is, if it's a PRIMARY KEY which it looks like), then all modern versions of major databases (that is Oracle, SQL Server, MySQL and PostgreSQL) will use an efficient plan to perform this query.
See this article in my blog for performance details in SQL Server:
IN vs. JOIN vs. EXISTS
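If you do stick with IN, you can keep it parametrized by generating the placeholder list at run time. A sketch, with the column list shortened; it needs java.util.Collections:

// one "?" per store id
String placeholders = String.join(",", Collections.nCopies(storeIds.size(), "?"));
String sql = "SELECT STORE_ATTRIB1, STORE_ATTRIB2 FROM STORE"
           + " WHERE STORE_ID IN (" + placeholders + ")";
try (PreparedStatement ps = connection.prepareStatement(sql)) {
    int i = 1;
    for (int id : storeIds) {
        ps.setInt(i++, id);
    }
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            // build Store objects here
        }
    }
}

Note that some databases cap the number of items in an IN list (Oracle historically at 1000), which is another argument for the temp-table approach above.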
Note, that in a properly designed database, validation rules are also set-based.
I.e., you implement your validation rules as queries against the temptable.
However, to support legacy rules, you can select values from temptable row-by-agonizing-row, apply the rules, and delete values which did not pass validation.
SELECT store_id FROM store WHERE store_active = 1
or even
SELECT store_id FROM store
will tell you all the active stores in a single query. You can now conduct the other tests on stores you know to exist, and you've saved yourself 1,999 hits to the database.
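In code, that amounts to loading the ids once and checking existence in memory. A sketch; importedStores and the storeNo field are borrowed from the question's Store class, and it needs java.util.Set and java.util.HashSet:

// one query instead of 2000: load every store id into a set
Set<Integer> existingIds = new HashSet<>();
try (Statement st = connection.createStatement();
     ResultSet rs = st.executeQuery("SELECT store_id FROM store")) {
    while (rs.next()) {
        existingIds.add(rs.getInt(1));
    }
}
// each rule can now check existence without touching the database
for (Store store : importedStores) {
    if (!existingIds.contains(store.storeNo)) {
        // record a "store does not exist" validation error
    }
}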
If you've got relatively uncontested database access, and no time constraint on how long the whole thing is going to take then you've no real need to worry about hitting the connection pool over and over again. That's what it's designed for, after all!
I think it's more of a business question, with the parameters being how often the client runs the import, how long it would take you to implement either solution, and how expensive your time is per hour.
If it's something that runs once in a while, a bit of bad performance is acceptable in my opinion, especially if you can get the job done quick using clean code.
...a Rule can be run for a single store only.
Managing business rules along with performance is a tricky task, so there is a library ("Persistence Layer") that does exactly that. You define rules, then execute a bulk of commands, and the library fetches from the DB whatever the rules require in a single query (by using temp tables rather than 'IN') and then passes it to the rules.
There is an example of a validator in here.
A hobby project of mine is a Java web application. It's a simple web page with a form. The user fills out the form, submits, and is presented with some results.
The data is coming over a JDBC Connection. When the user submits, I validate the input, build a "CREATE ALIAS" statement, a "SELECT" statement, and a "DROP ALIAS" statement. I execute them and do whatever I need to do with the ResultSet from the query.
Due to an issue with the ALIASes on the particular database/JDBC combination I'm using, it's required that each time the query is run, these ALIASes are created with a unique name. I'm using an int to ensure this, which gets incremented each time we go to the database.
So, my data access class looks a bit like:
private static final Connection connection = // initialized however
private static int uniqueInvocationNumber = 0;

public static Whatever getData(ValidatedQuery validatedQuery) {
    String aliasName = "TEMPALIAS" + String.valueOf(uniqueInvocationNumber);
    // build statements, execute statements, deal with results
    uniqueInvocationNumber++;
}
This works. However, I've recently been made aware that I'm firmly stuck in Jon Skeet's phase 0 of threading knowledge ("Complete ignorance - ignore any possibility of problems.") - I've never written either threaded code or thread-aware code. I have absolutely no idea what can happen when many users are using the application at the same time.
So my question is, (assuming I haven't stumbled to thread-safety by blind luck / J2EE magic):
How can I make this safe?
I've included information here which I believe is relevant but let me know if it's not sufficient.
Thanks a million.
EDIT: This is a proper J2EE web application using the Wicket framework. I'm typically deploying it inside Jetty.
EDIT: A long story about the motivation for the ALIASes, for those interested:
The database in question is DB2 on AS400 (i5, System i, iSeries, whatever IBM are calling it these days) and I'm using jt400.
Although DB2 on AS400 is kind of like DB2 on any other platform, tables have a concept of a "member" because of legacy stuff. A member is kind of like a chunk of a table. The query I want to run is
SELECT thisField FROM thisTable(thisMember)
which treats thisMember as a table in its own right so just gives you thisField for all the rows in the member.
Now, queries such as this run fine in an interactive SQL session, but don't work over JDBC (I don't know why). The workaround I use is to do something like
CREATE ALIAS tempAlias FOR thisTable(thisMember)
then a
SELECT thisField FROM tempAlias
then a
DROP ALIAS tempAlias
which works but for one show-stopping issue: when you do this repeatedly with the ALIAS always called "tempAlias", and have a case where thisField has a different length from one query to the next, the result set comes back garbled for the second query (getString for the first row is fine, the next one has a certain number of spaces prepended, the next one the same number of spaces further prepended - this is from memory, but it's something like that).
Hence the workaround of ensuring each ALIAS has a distinct name which clears this up.
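In JDBC terms, the workaround looks roughly like this sketch (the uniqueSuffix variable stands in for the counter discussed below):

String alias = "TEMPALIAS" + uniqueSuffix;  // uniqueSuffix: the per-call number
try (Statement st = connection.createStatement()) {
    st.execute("CREATE ALIAS " + alias + " FOR thisTable(thisMember)");
    try (ResultSet rs = st.executeQuery("SELECT thisField FROM " + alias)) {
        while (rs.next()) {
            // use rs.getString(1)
        }
    } finally {
        st.execute("DROP ALIAS " + alias);  // always clean up, even on failure
    }
}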
I've just realised (having spent the time to tap this explanation out) that I probably didn't spend enough time thinking about the issue in the first place before seizing on the workaround. Unfortunately I haven't yet fulfilled my dream of getting an AS400 for my bedroom ;) so I can't try anything new now.
Well, I'm going to ignore any SQL stuff for the moment and just concentrate on the uniqueInvocationNumber part. There are two problems here:
There's no guarantee that the thread will see the latest value at any particular point
The increment isn't atomic
The simplest way to fix this in Java is to use AtomicInteger:
import java.util.concurrent.atomic.AtomicInteger;

private static final AtomicInteger uniqueInvocationNumber = new AtomicInteger();

public static Whatever getData(ValidatedQuery validatedQuery) {
    String aliasName = "TEMPALIAS" + uniqueInvocationNumber.getAndIncrement();
    // build statements, execute statements, deal with results
}
Note that this still assumes you're only running a single instance on a single server. For a home project that's probably a reasonable assumption :)
Another potential problem is sharing a single connection amongst different threads. Typically a better way of dealing with database connections is to use a connection pool, and "open/use/close" a connection where you need to (closing the connection in a finally block).
If that static variable and the incrementing of the unique invocation number is visible to all requests, I'd say that it's shared state that needs to be synchronized.
I know this doesn't answer your question but I would seriously consider re-implementing the feature so creating all those aliases isn't required. (Could you explain what kind of alias you're creating and why it's necessary?)
I understand this is just a hobby project, but consider putting on your 'to do' list to switch to using a connection pool. It's all part of the learning, which I guess is part of your motivation for doing this project. Connection pools are the proper way to deal with multiple simultaneous users in a database-backed web app.
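For example, with HikariCP (just one pool library among several; the question doesn't prescribe one) the setup and the borrow/use/return pattern look like this:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:yourdb://host/db");   // placeholder URL
config.setUsername("user");
config.setPassword("pass");
config.setMaximumPoolSize(10);
HikariDataSource pool = new HikariDataSource(config);

// per request: borrow a connection, use it, and return it to the pool
try (Connection conn = pool.getConnection()) {
    // build statements, execute statements, deal with results
}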
So I keep hearing that PreparedStatements are good for performance.
We have a Java application in which we use the regular 'Statement' more than we use the 'PreparedStatement'. While trying to move towards using more PreparedStatements, I am trying to get a more thorough understanding of how PreparedStatements work - on the client side and the server side.
So if we have some typical CRUD operations and update an object repeatedly in the application, does it help to use a PS? I understand that we will have to close the PS every time, otherwise it will result in a cursor leak.
So how does it help with performance? Does the driver cache the precompiled statement and give me a copy the next time I do connection.prepareStatement? Or does the DB server help?
I understand the argument about the security benefits of PreparedStatements and I appreciate the answers below which emphasize it. However I really want to keep this discussion focused on the performance benefits of PreparedStatements.
Update: When I say update data, I really mean that the method is randomly called several times. I understand the advantage described in the answer below, which suggests re-using the statement inside a loop.
// some code blah blah
update();
// some more code blah blah
update();
....

public void update() throws SQLException {
    PreparedStatement ps = null;
    try {
        ps = connection.prepareStatement("some sql");
        ps.setString(1, "foobar1");
        ps.setString(2, "foobar2");
        ps.execute();
    } finally {
        if (ps != null) {
            ps.close();
        }
    }
}
There is no way to actually reuse the 'ps' Java object, and I understand that the actual connection.prepareStatement call is quite expensive.
Which is what brings me back to the original question. Is this "some sql" PreparedStatement still being cached and reused under the covers in a way that I don't know about?
I should also mention that we support several databases.
Thanks in advance.
The notion that prepared statements are primarily about performance is something of a misconception, although it's quite a common one.
Another poster mentioned that he noted a speed improvement of about 20% in Oracle and SQL Server. I've noted a similar figure with MySQL. It turns out that parsing the query just isn't such a significant part of the work involved. On a very busy database system, it's also not clear that query parsing will affect overall throughput: overall, it'll probably just be using up CPU time that would otherwise be idle while data was coming back from the disk.
So as a reason for using prepared statements, the protection against SQL injection attacks far outweighs the performance improvement. And if you're not worried about SQL injection attacks, you probably should be...
Prepared statements can improve performance when re-using the same statement that you prepared:
PreparedStatement ps = connection.prepareStatement("SOME SQL");
for (Data data : dataList) {
    ps.setInt(1, data.getId());
    ps.setString(2, data.getValue());
    ps.executeUpdate();
}
ps.close();
This is much faster than creating the statement in the loop.
Some platforms also cache prepared statements so that even if you close them they can be reconstructed more quickly.
However even if the performance were identical you should still use prepared statements to prevent SQL Injection. At my company this is an interview question; get it wrong and we might not hire you.
Prepared statements are indeed cached after their first use, which is what gives them their performance advantage over standard statements. If your statement doesn't change, it's advisable to use this method. They are generally stored within a statement cache for later use.
More info can be found here:
http://www.theserverside.com/tt/articles/article.tss?l=Prepared-Statments
and you might want to look at Spring JDBCTemplate as an alternative to using JDBC directly.
http://static.springframework.org/spring/docs/2.0.x/reference/jdbc.html
Parsing the SQL isn't the only thing that's going on. There's validating that the tables and columns do indeed exist, creating a query plan, etc. You pay that once with a PreparedStatement.
Binding to guard against SQL injection is a very good thing, indeed. Not sufficient, IMO. You still should validate input prior to getting to the persistence layer.
So how does it help with performance? Does the driver cache the precompiled statement and give me a copy the next time I do connection.prepareStatement? Or does the DB server help?
I will answer in terms of performance. Others here have already pointed out that PreparedStatements are resilient to SQL injection (a blessed advantage).
The application (via the JDBC driver) creates the PreparedStatement and passes it to the RDBMS with placeholders (the ?). The RDBMS precompiles the received PreparedStatement, applies query optimization (if needed), and generally caches it. During execution, the precompiled statement is used, with each placeholder replaced by its relevant value. This is in contrast to a Statement, which is compiled and executed directly each time; a PreparedStatement compiles and optimizes the query only once. The scenario described above is not exactly how every JDBC vendor does it, but in essence that is how PreparedStatements are used and operated on.
Anecdotally: I did some experiments with prepared vs. dynamic statements using ODBC in Java 1.4 some years ago, with both Oracle and SQL Server back-ends. I found that prepared statements could be as much as 20% faster for certain queries, but there were vendor-specific differences regarding which queries were improved to what extent. (This should not be surprising, really.)
The bottom line is that if you will be re-using the same query repeatedly, prepared statements may help improve performance; but if your performance is bad enough that you need to do something about it immediately, don't count on the use of prepared statements to give you a radical boost. (20% is usually nothing to write home about.)
Your mileage may vary, of course.
Which is what brings me back to the original question. Is this "some sql" PreparedStatement still being cached and reused under the covers in a way that I don't know about?
Yes, at least with Oracle. Per the Oracle® Database JDBC Developer's Guide, Implicit Statement Caching (emphasis added):
When you enable implicit Statement caching, JDBC automatically caches the prepared or callable statement when you call the close method of this statement object. The prepared and callable statements are cached and retrieved using standard connection object and statement object methods.
Plain statements are not implicitly cached, because implicit Statement caching uses a SQL string as a key and plain statements are created without a SQL string. Therefore, implicit Statement caching applies only to the OraclePreparedStatement and OracleCallableStatement objects, which are created with a SQL string. You cannot use implicit Statement caching with OracleStatement. When you create an OraclePreparedStatement or OracleCallableStatement, the JDBC driver automatically searches the cache for a matching statement.
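How you switch it on is driver-specific; with the Oracle driver it looks roughly like this. A sketch based on the guide cited above; the unwrap step and the cache size are assumptions, so check your driver version's documentation:

import oracle.jdbc.OracleConnection;

// unwrap the vendor connection from the standard JDBC one
OracleConnection oraConn = connection.unwrap(OracleConnection.class);
oraConn.setImplicitCachingEnabled(true);  // cache statements when close() is called
oraConn.setStatementCacheSize(20);        // how many statements to keep cached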
1. PreparedStatement allows you to write dynamic and parametric queries
By using PreparedStatement in Java you can write parametrized SQL queries and send different parameters using the same SQL query, which is a lot better than creating different queries.
2. PreparedStatement is faster than Statement in Java
One of the major benefits of using PreparedStatement is better performance. A PreparedStatement gets precompiled in the database and its access plan is also cached there, which allows the database to execute a parametric query written using a prepared statement much faster than a normal query, because it has less work to do. You should always try to use PreparedStatement in production JDBC code to reduce the load on the database. To get the performance benefit, be sure to use only the parametrized version of the SQL query, not string concatenation.
3. PreparedStatement prevents SQL Injection attacks in Java
Read more: http://javarevisited.blogspot.com/2012/03/why-use-preparedstatement-in-java-jdbc.html#ixzz3LejuMnVL
Short answer:
PreparedStatement helps performance because DB clients typically execute the same query repeatedly, and preparing it once allows some pre-processing to be done up front, which speeds up the subsequent repeated executions.
Long answer:
According to Wikipedia, the typical workflow of using a prepared statement is as follows:
Prepare: The statement template is created by the application and sent to the database management system (DBMS). Certain values are left unspecified, called parameters, placeholders or bind variables (labelled "?" below): INSERT INTO PRODUCT (name, price) VALUES (?, ?)
(Pre-compilation): The DBMS parses, compiles, and performs query optimization on the statement template, and stores the result without executing it.
Execute: At a later time, the application supplies (or binds) values for the parameters, and the DBMS executes the statement (possibly returning a result). The application may execute the statement as many times as it wants with different values. In this example, it might supply 'Bread' for the first parameter and '1.00' for the second parameter.
Prepare:
In JDBC, the "Prepare" step is done by calling java.sql.Connection.prepareStatement(String sql) API. According to its Javadoc:
This method is optimized for handling parametric SQL statements that benefit from precompilation. If the driver supports precompilation, the method prepareStatement will send the statement to the database for precompilation. Some drivers may not support precompilation. In this case, the statement may not be sent to the database until the PreparedStatement object is executed. This has no direct effect on users; however, it does affect which methods throw certain SQLException objects.
Since calling this API may send the SQL statement to the database, it is typically an expensive call. Depending on the JDBC driver's implementation, if you have the same SQL statement template, for better performance you may want to avoid calling this API multiple times on the client side for the same template.
Precompilation:
The sent statement template will be precompiled on the database side and cached in the DB server. The database will probably use the connection and SQL statement template as the key, and the precompiled query and the computed query plan as the value in the cache. Parsing a query may require validating the tables and columns to be queried, so it can be an expensive operation, and computing the query plan is an expensive operation too.
Execute:
For subsequent queries using the same connection and SQL statement template, the precompiled query and query plan are looked up directly from the cache by the database server, without being recomputed.
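Putting the three steps together in JDBC, here is a minimal sketch using the template from the example above (the second set of values is made up to show re-binding):

// Prepare once: the driver may send the template for precompilation
try (PreparedStatement ps = connection.prepareStatement(
        "INSERT INTO PRODUCT (name, price) VALUES (?, ?)")) {
    // Execute many times with different bound values
    ps.setString(1, "Bread");
    ps.setBigDecimal(2, new java.math.BigDecimal("1.00"));
    ps.executeUpdate();

    ps.setString(1, "Milk");   // made-up second row
    ps.setBigDecimal(2, new java.math.BigDecimal("0.50"));
    ps.executeUpdate();
}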
Conclusion:
From a performance perspective, using a prepared statement is a two-phase process:
Phase 1, prepare-and-precompile: this phase is expected to be done once and adds some overhead.
Phase 2, repeated executions of the same query: since phase 1 already did the pre-processing for the query, if the number of repetitions is large enough, this can save a lot of pre-processing effort for the same query.
And if you want to know more details, there are some articles explaining the benefits of PreparedStatement:
http://javarevisited.blogspot.com/2012/03/why-use-preparedstatement-in-java-jdbc.html
http://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html
Prepared statements have some advantages in terms of performance with respect to normal statements, depending on how you use them. As someone stated before, if you need to execute the same query multiple times with different parameters, you can reuse the prepared statement and pass only the new parameter set. The performance improvement depends on the specific driver and database you are using.
For instance, in terms of database performance, the Oracle database caches the execution plan of some queries after each computation (this is not true for all versions and all configurations of Oracle). You can find improvements even if you close a statement and open a new one, because this is done at the RDBMS level. This kind of caching is activated only if two subsequent queries are (char-by-char) the same. It does not hold for normal statements, because the parameters are part of the query and produce different SQL strings.
Some other RDBMSs may be more "intelligent", but I don't expect them to use complex pattern-matching algorithms for caching execution plans, because that would itself lower performance. You may argue that the computation of the execution plan is only a small part of the query execution. In the general case I agree, but... it depends. Keep in mind that computing an execution plan can be an expensive task, because the RDBMS may need to consult off-memory data like statistics (not only Oracle).
However, the argument about caching ranges from execution plans to other parts of the extraction process. Giving the RDBMS the same query multiple times (without going into the details of a particular implementation) helps it identify already-computed structures at the JDBC (driver) or RDBMS level. Even if you don't find any particular performance advantage now, you can't exclude that such an improvement will be implemented in future/alternative versions of the driver/RDBMS.
Performance improvements for updates can be obtained by using prepared statements in batch mode, but this is another story.
OK, finally there is a paper that tests this, and the conclusion is that it doesn't improve performance, and in some cases it's slower:
https://ieeexplore.ieee.org/document/9854303
PDF: https://www.bib.irb.hr/1205158/download/1205158.Performance_analysis_of_SQL_Prepared_Statements_in_CRUD_operations_final.pdf