I have an old MySQL 4.1 database with a table that has a few million rows, and an old Java application that connects to this database and frequently returns several thousand rows from this table via a simple SQL query (i.e. SELECT * FROM people WHERE first_name = 'Bob'; the value for first_name will vary depending on what the user enters). I think the Java application uses client-side prepared statements, but I was looking at switching this to server-side prepared statements.
I would like to speed up performance on the SELECT query and was wondering if I should switch to prepared statements or stored procedures. Is there a general rule of thumb about which is quicker/less resource-intensive, or whether a combination of both is better?
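For reference, something along these lines is what I had in mind for the server-side switch; a minimal sketch assuming the MySQL Connector/J driver, where the host, database and credentials are placeholders and useServerPrepStmts is the property that controls client- versus server-side preparation:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class FindPeople {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; useServerPrepStmts=true asks Connector/J to prepare the
        // statement on the server instead of emulating it on the client.
        String url = "jdbc:mysql://dbhost/mydb?useServerPrepStmts=true";
        Connection con = DriverManager.getConnection(url, "user", "password");
        PreparedStatement ps = con.prepareStatement(
                "SELECT * FROM people WHERE first_name = ?");
        ps.setString(1, "Bob"); // the value actually comes from user input
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            // process each row
        }
        rs.close();
        ps.close();
        con.close();
    }
}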
You do have an index on first_name, right? That will speed up your query a lot more than choosing between prepared statements and stored procedures.
If you have just one query to worry about, you should be able to implement the two alternatives (on your test platform of course!) and see which one gives you the best performance.
(My guess is that there won't be much difference though ...)
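If the index is missing, adding it is a one-line change; a minimal sketch run through plain JDBC (the index name here is made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AddIndex {
    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection(
                "jdbc:mysql://dbhost/mydb", "user", "password");
        Statement st = con.createStatement();
        // Index the column used in the WHERE clause so MySQL no longer scans the table.
        st.execute("CREATE INDEX idx_people_first_name ON people (first_name)");
        st.close();
        con.close();
    }
}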
Looks like the best way is just to make the change and test it out in a test environment.
Thanks for the help.
If a query is taking a long time in the database even though the join conditions use indexed columns, what can we do in the code to minimize the execution time in Oracle and MySQL?
I am seeing some delay in the execution of a query in Oracle from the Java layer, even though the query's condition is on an indexed numeric column.
I am using a Java PreparedStatement, and the query is executed from Java.
You are asking us to diagnose something without symptoms. You should provide output of EXPLAIN PLAN (or set autotrace on) and also the schema in question.
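For example, something along these lines captures the plan from the Java side (the connection details and the query text are placeholders, not your actual statement):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ShowPlan {
    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@dbhost:1521:ORCL", "user", "password");
        Statement st = con.createStatement();
        // Ask Oracle to store the execution plan for the slow query.
        st.execute("EXPLAIN PLAN FOR SELECT o.* FROM orders o JOIN customers c "
                + "ON o.customer_id = c.customer_id WHERE c.customer_id = 42");
        // Read the formatted plan back and print it.
        ResultSet rs = st.executeQuery("SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY)");
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
        rs.close();
        st.close();
        con.close();
    }
}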
There is more to tuning than indexing columns. But assuming you've already done all the optimization you can, it may be time to do pre-calculation with either pre-computed tables or materialized views.
Other options include solid state disk or parallelism (partitioning and/or parallel query) and so forth.
Not sure what you mean by "Java layer"; I find that Java is often a hindrance to performance in Oracle. Stick with PL/SQL for stored procedures and daily jobs, if possible. To a Java programmer, every problem appears to be a Java problem, but Java brings little to the table as far as speeding up queries goes.
I am making a Java GUI application and a web application which will use the same MySQL database.
It's a DTH management system where all the information will be stored and retrieved dynamically depending on the input.
I believe that views are static by nature and would therefore be useless, since all my queries will have a different WHERE condition (userid).
Do I need to use triggers? I mean, I could write the Java code to execute multiple statements instead of using a built-in trigger (e.g. inserting a customer's name and family members' names, where the head of the family gets a duplicate copy). Is there a performance hit? Am I wrong in some way?
And similarly, what is the use of stored procedures? Can't I use methods in Java to do everything?
So, I am asking: is it possible to shift all the calculation-intensive stuff to Java and web scripts instead of SQL? If yes, does this mean I only have to create the back-end structure of the database (i.e. all the different tables and the FKs/PKs) and do the rest without using any SQL features in MySQL Workbench?
Thank you for helping.
There is (as always) one correct answer: It depends.
If you only want to show and query some data, you probably won't need triggers or stored procedures.
Views are a different thing: they are pretty helpful if you want a static view of a join across tables or something like that. If you don't need this, just don't use them.
Keys are really important. They make your data robust against wrong input.
What you should use is PreparedStatement instead of Statement. If you only use PreparedStatements, you are (nearly?) safe from SQL injection.
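A minimal sketch of the difference (table and column names are made up): the user-supplied value is bound to a placeholder instead of being concatenated into the SQL string, so it cannot change the structure of the statement.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class CustomerLookup {
    public void printCustomer(Connection con, int userId) throws SQLException {
        // The value is sent as a bound parameter, never spliced into the SQL text.
        PreparedStatement ps = con.prepareStatement(
                "SELECT name FROM customers WHERE user_id = ?");
        ps.setInt(1, userId);
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            System.out.println(rs.getString("name"));
        }
        rs.close();
        ps.close();
    }
}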
We use views because they are faster than running the SELECT query each time, and for just showing data (not editing/updating) they are faster and preferable.
Triggers are fired on the database side, so they are faster because two or more queries are executed in a single execution.
The same applies to stored procedures, because we can execute more than one query within a single database connection. If we execute the queries separately, every execution pays the database connection cost (finding the database server, authenticating, finding the database, etc.).
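For example, a hypothetical add_family_member procedure could perform both inserts (the customer row and the head-of-family copy) in a single round trip; the procedure name and parameters here are made up:

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

public class FamilyService {
    public void addFamilyMember(Connection con, String memberName, String headName)
            throws SQLException {
        // One call to the database; the procedure body runs both INSERTs server-side.
        CallableStatement cs = con.prepareCall("{call add_family_member(?, ?)}");
        cs.setString(1, memberName);
        cs.setString(2, headName);
        cs.execute();
        cs.close();
    }
}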
For a thick-client project I'm working on, I have to remotely connect to a database (IBM iSeries) and perform a number of SQL-related tasks:
Download/Update a set of local/offline 'control' data - this data may have changed between runs unnoticed.
On command, download data from multiple (15-20) tables and store them separately within a single Java object. The names of the tables are known, but the schema name changes between runs and can even change during a run (as far as I know, PreparedStatements do not allow one to insert the schema dynamically).
I had considered using joins/unions/etc to perform all of these queries as one, but the project requires me to have in-memory separations between table data (instead of one big joined lump).
Perform between 2 and 100+ repetitions of (2)
The last factor is that this needs to be run on high-latency (potentially dial-up) network connections using Java 1.5 on the oldest computers possible.
Currently I run 15-20 dynamically constructed PreparedStatements, but I know this to be rather inefficient (I measured, so as to avoid premature optimization à la Knuth).
What would be the most efficient and error-tolerant method of performing these tasks?
My thoughts:
Regarding (1), I really have no idea other than checking the entire table against the new table, at which point I feel I might as well just download the new (potentially, and likely, unchanged) table and replace the old one - but this takes more time.
For (2): Ideally I'd be able to construct something similar to an array of SELECT statements, send them all at once, and have the database return one ResultSet per internal query. From what I understand, however, neither Statement nor PreparedStatement supports returning multiple ResultSet objects.
Lastly, the best way I can think of doing (3) is to batch a number of (2) operations.
There is nothing special about having moving requirements, but the single most important thing when talking to most databases is to have a connection pool in your Java application and to use it properly.
This also applies here. The IBM i DB2/400 database is quite fast, and the database driver available in the jt400 project (type 4, no native code) is quite good, so you can pull over quite a bit of data in a short while simply by generating SQL on the fly.
Note that if you only have a single schema, you can specify it in the connection and then use non-qualified table names in your SQL statements. Read the JDBC properties in the InfoCenter very carefully - it is a bit tricky to get right. If you need multiple schemas, "naming=system" allows for library lists - i.e. a list of schemas in which to look for the tables - which can be very useful when done correctly. The IBM i folks can help you here.
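As a rough sketch (host, library names and table name are placeholders), a jt400 connection using system naming and a library list might look like this; with naming=system the unqualified table name is resolved through the library list, so the SQL does not have to change when the schema does:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class IbmIConnect {
    public static void main(String[] args) throws Exception {
        // Explicit driver load, since this has to run on Java 1.5 (pre-JDBC 4).
        Class.forName("com.ibm.as400.access.AS400JDBCDriver");
        String url = "jdbc:as400://myhost;naming=system;libraries=RUNLIB1,RUNLIB2";
        Connection con = DriverManager.getConnection(url, "user", "password");
        PreparedStatement ps = con.prepareStatement("SELECT * FROM CONTROLDATA");
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            // map each row into the in-memory Java object
        }
        rs.close();
        ps.close();
        con.close();
    }
}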
That said, if the connection is the limiting factor, you might have a very strong case for running the "create object from tables" Java code directly on the IBM i. You should prepare now to be able to measure the traffic to the database - either with network monitoring tooling, by using p6spy, or simply by going through a proxy (perhaps even a throttling one).
Ideally, you would have the database group provide you with a set of stored procedures to optimize the access to the database.
Since you don't have access, you may want to ask them whether they have row-level timestamp data in the database recording when records were modified; this way you can select only the data that has changed since some point in time.
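If such a column exists, the incremental download could look roughly like this (the LAST_CHANGED column and the table name are assumptions, not something known to be in the schema):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;

public class ControlDataSync {
    public void downloadChanges(Connection con, Timestamp lastSync) throws SQLException {
        // Only rows modified since the last successful sync cross the slow link.
        PreparedStatement ps = con.prepareStatement(
                "SELECT * FROM CONTROLDATA WHERE LAST_CHANGED > ?");
        ps.setTimestamp(1, lastSync);
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            // merge the changed row into the local/offline copy
        }
        rs.close();
        ps.close();
    }
}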
What @ThorbjørnRavnAndersen is suggesting is moving the database code onto the IBM host and connecting to it via RMI or JMS from the client. The server code would then be an RMI or JMS server that accesses the database on your behalf and returns Java objects instead of bringing SQL result sets across the wire.
I would pass along your requirements to the database team and see if they can't do something for you. I'm sure they don't want all these remote clients bringing all the data down each time, so it would benefit them as much as it would benefit you.
Our software is deployed at a client site where they are having performance issues; they've hired a SQL consultant to look at the databases and see where their bottlenecks are.
One of the things the consultant spotted was that a lot of our statements were converting to nvarchar.
After some investigation I discovered that it was the PreparedStatement that was doing it.
To give you an example:
PreparedStatement query = connection.prepareStatement("SELECT * FROM sys.tables WHERE"
+ " name LIKE ?");
query.setString(1, "%%");
Becomes
declare @p1 int
set @p1=1
exec sp_prepexec @p1 output,N'@P0 nvarchar(4000)',N'SELECT * FROM sys.tables WHERE name LIKE @P0 ',N'%%'
select @p1
The ad-hoc executeQuery method, on the other hand, just sends the SQL straight through:
ResultSet rs = connection.createStatement().executeQuery("SELECT * FROM sys.tables WHERE name LIKE '%%'");
becomes
SELECT * FROM sys.tables WHERE name LIKE '%%'
What I'm trying to work out is how bad this actually is. Prepared statements are used all over our application because their use is supposed to be more efficient (when executed multiple times) than ad-hoc queries.
Additionally I can't see that the Microsoft guys would put something into their driver that they knew would cause performance issues.
Is this something that we should really be looking into, or is the consultant looking for issues where there aren't any?
The prepared statement forces the query to become a parameterized query (which is a good thing), whereas the ad-hoc query may or may not be auto-parameterized by the SQL engine. In short, it's not a major performance hit.
However, there's always a lot of debate about whether or not you should use parameterized stored procedures from a design perspective (not necessarily a performance one). I think most DBAs (and probably database consultants) prefer stored procedures because they're easier to debug using traditional database management tools (as well as providing security benefits and including parameterization by default).
Prepared statements come with more optimization options than ad-hoc queries, and they are usually better from a security point of view (no SQL injection by design).
The nvarchar usage usually comes from non-functional requirements. If you want your application to support any Unicode characters, which is very common nowadays, you need it. If that's in the requirement, don't change it.
I think the LIKE query is the culprit; you should look at the full-text indexing capability of your back-end database.
The query itself looks super simple - it's the LIKE '%%' pattern that forces a scan of the character data in each row.
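Assuming a full-text index has been created on the column being searched (the table and column here are placeholders; sys.tables itself cannot be full-text indexed), a CONTAINS query can use that index instead of scanning every row:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class NameSearch {
    public void search(Connection con, String term) throws SQLException {
        // CONTAINS goes through the full-text index rather than a character scan.
        PreparedStatement ps = con.prepareStatement(
                "SELECT * FROM dbo.people WHERE CONTAINS(name, ?)");
        ps.setString(1, term);
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            // process matching rows
        }
        rs.close();
        ps.close();
    }
}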
The problem is, we have a huge number of records (more than a million) to be inserted into a single table from a Java application. The records are created by the Java code; it's not a move from another table, so INSERT/SELECT won't help.
Currently, my bottleneck is the INSERT statements. I'm using PreparedStatement to speed up the process, but I can't get more than 50 records per second on a normal server. The table is not complicated at all, and there are no indexes defined on it.
The process takes too long, and the time it takes will cause problems.
What can I do to get the maximum speed (INSERT per second) possible?
Database: MS SQL 2008. Application: Java-based, using Microsoft JDBC driver.
Batch the inserts. That is, send 1000 rows at a time rather than one row at a time, so you hugely reduce the number of round trips/server calls.
See Performing Batch Operations on MSDN for the JDBC driver. This is the easiest approach short of re-engineering your code to use genuine bulk methods.
Each insert must be parsed, compiled and executed. A batch means far less parsing/compiling, because 1000 inserts (for example) are compiled in one go.
There are better ways, but this works if you are limited to generated INSERTs
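A minimal sketch of what that looks like with the JDBC batch API (table and column names are placeholders):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInserter {
    public void insertAll(Connection con, List<String[]> rows) throws SQLException {
        con.setAutoCommit(false); // commit once per batch, not once per row
        PreparedStatement ps = con.prepareStatement(
                "INSERT INTO records (col1, col2) VALUES (?, ?)");
        int count = 0;
        for (String[] row : rows) {
            ps.setString(1, row[0]);
            ps.setString(2, row[1]);
            ps.addBatch();
            if (++count % 1000 == 0) {
                ps.executeBatch(); // one round trip for every 1000 rows
            }
        }
        ps.executeBatch(); // flush whatever is left over
        con.commit();
        ps.close();
    }
}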
Use BULK INSERT - it is designed for exactly what you are asking and significantly increases the speed of inserts.
Also (just in case you really do have no indexes), you may want to consider adding indexes - some indexes (most notably an index on the primary key) may improve the performance of inserts.
The actual rate at which you should be able to insert records will depend on the exact data, the table structure and also on the hardware / configuration of the SQL server itself, so I can't really give you any numbers.
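If you can get the data into a file that the server itself can see, a rough sketch of driving BULK INSERT from Java looks like this (file path, table name and terminators are placeholders):

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class BulkLoader {
    public void load(Connection con) throws SQLException {
        Statement st = con.createStatement();
        // The path is resolved on the SQL Server machine, not on the Java client.
        st.execute("BULK INSERT records FROM 'C:\\loads\\records.csv' "
                + "WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n')");
        st.close();
    }
}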
Have you looked into bulk operations?
Have you considered to use batch updates?
Is there any integrity constraint or trigger on the table?
If so, dropping it before the inserts will help, but you have to be sure that you can afford the consequences.
Look into Sql Server's bcp utility.
This would mean a big change in your approach in that you'd be generating a delimited file and using an external utility to import the data. But this is the fastest method for inserting a large number of records into a Sql Server db and will speed up your load time by many orders of magnitude.
Also, is this a one-time operation you have to perform or something that will occur on a regular basis? If it's one time I would suggest not even coding this process but performing an export/import with a combination of db utilities.
I would recommend using an ETL engine for this. You can use Pentaho - it's free. ETL engines are optimized for bulk-loading data and for any forms of transformation/validation that are required.