I'm having a problem with mysql queries taking a little too long to execute.
private void sqlExecute(String sql, Map<Integer, Object> params) {
    try (Connection conn = dataSource.getConnection();
         PreparedStatement statement = conn.prepareStatement(sql)) {
        if (!params.isEmpty()) {
            for (Integer key : params.keySet()) {
                statement.setObject(key, params.get(key));
            }
        }
        statement.executeUpdate();
    } catch (SQLException e) {
        e.printStackTrace();
    }
}
I've narrowed the problem down to the executeUpdate() line specifically. Everything else runs smoothly, but this particular line (and executeQuery() as well) takes around 70 ms. That may not sound unreasonable, but this is currently a small test table with under 100 rows, and the columns are indexed, so a typical query only looks at around 15 rows.
Ultimately, however, we'll need to scan much larger tables with thousands of rows. On top of that, we run numerous queries at a time (they can't really be batched, because the results of each query feed into later queries), so all of those queries together take more like 7 seconds.
Here's an example of a method for running a MySQL query:
public void addRating(String db, int user_id, int item_id) {
    Map<Integer, Object> parameters = new HashMap<>();
    parameters.put(1, user_id);
    parameters.put(2, item_id);
    String sql = "INSERT IGNORE INTO " + db + " (user_id, item_id) VALUES (?, ?)";
    sqlExecute(sql, parameters);
}
A couple of things to note:
The column indexing is probably not the problem. When running the same MySQL statements in our phpMyAdmin console, execution time is more like 0.3 ms.
Also of note: the execution time is consistently 70 ms, regardless of the actual MySQL statement.
I'm using connection pooling and wonder if this is possibly a source of the problem. In other tests, dataSource.getConnection() also takes about 70ms. I'm running this code locally.
The above example is for an INSERT using executeUpdate(), but the same problem happens for SELECT statements using executeQuery().
I have tried using /dev/urandom per Oracle's suggestion, but this made no difference.
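For reference, here is the kind of timing check I've been using to separate pool overhead from statement overhead (a sketch; dataSource is the same pool as in the code above):
// Rough timing sketch: if the first span is ~70 ms, the pool is likely
// opening or validating a physical connection on every checkout.
long t0 = System.nanoTime();
try (Connection conn = dataSource.getConnection()) {
    long t1 = System.nanoTime();
    try (PreparedStatement ps = conn.prepareStatement("SELECT 1");
         ResultSet rs = ps.executeQuery()) {
        long t2 = System.nanoTime();
        System.out.printf("getConnection: %d ms, executeQuery: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}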
Related
I have a blacklist stored in a table that contains approximately 1.5 billion records. My goal is to load the records into a HashSet so my program can later check if domain names are blacklisted (this is not the entire functionality of the program, just a piece). I currently have the following code to load the records:
HashSet<String> list = new HashSet<String>();
try (Statement stmt = conn.createStatement()) {
    stmt.setFetchSize(100000000);
    try (ResultSet rs = stmt.executeQuery("SELECT DNname FROM " + table)) {
        while (rs.next()) {
            list.add(rs.getString(1));
        }
    }
} catch (SQLException e) {
    System.out.println("Error loading blacklist from DB");
    e.printStackTrace();
}
However, this takes incredibly long to complete. Is there a more efficient way to accomplish my goal?
"this takes incredibly long to complete" - what takes long to complete? How long does the query take if you run it from a DB client console?
You've not provided any information on what indexes are present in the table, so that's something you'll want to look into.
setFetchSize is a hint to the JDBC driver - you really want a WHERE clause where you can limit the records fetched. Oracle DB can do that using ROWNUM; for other DBs, you'll have to look for yourself.
That said, I concur with @ScaryWombat; keeping 1.5 billion records in memory is not a scalable design. An alternative would be to build up a client-side cache by caching the database lookup for each domain; in practice, the cache will hold far fewer than 1.5 billion entries.
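A minimal sketch of that cache idea, using a size-bounded java.util.LinkedHashMap as an LRU; lookupInDatabase is a hypothetical stand-in for the single-domain query:
// Sketch: per-domain lookup cache. MAX_ENTRIES and lookupInDatabase()
// are assumptions, not part of the original code.
private static final int MAX_ENTRIES = 1_000_000;

private final Map<String, Boolean> cache =
        new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > MAX_ENTRIES; // evict the least-recently-used entry
            }
        };

public boolean isBlacklisted(String domain) throws SQLException {
    Boolean cached = cache.get(domain);
    if (cached == null) {
        cached = lookupInDatabase(domain); // e.g. SELECT 1 ... WHERE DNname = ?
        cache.put(domain, cached);
    }
    return cached;
}
The eviction bound keeps memory flat no matter how many distinct domains are checked.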
JDBC has been supporting bulk updates for a long time using addBatch and executeBatch. Why isn't there any support for adding a bunch of prepared statements and getting an array of result sets as response?
For example, if I wanted to load customer details, basic account details, basic card details, basic loan details etc. for a single view, I would prefer to create a bunch of prepared statements and append the prepared statements to an ArrayList and execute them as a batch. I would then loop through the result sets and process the data. Hopefully, several network round trips would be saved (assuming my queries are performant).
Sample bunch of queries:
SELECT custid, first, last, age FROM Customer where custid = ?
SELECT custid, acno, accountname, accounttype, status FROM Account where custid = ?
SELECT custid, cardno, cardname, cardtype, status FROM CreditCard where custid = ?
SELECT custid, loanno, principal, rate FROM Loan where custid = ?
I can imagine several hypothetical reasons why it could be a bad idea. But, I am not sure which is most likely true in the real world.
Hypothetical reasons against having bulk-fetch:
There is some fundamental networking/DB stack/memory-related issue that prevents a bunch of SELECT queries from being executed on the same connection with their result sets kept open.
Response handling code would be too cumbersome, as there could be exceptions at call level and individual statement level. And, several statements would have to be closed correctly.
There is no significant performance gain in reducing the number of network-calls. Query execution is the main bottleneck and network round-trip cost is insignificant.
There could be misuse of such a feature. A single non-performant query batched up like this with other queries could cause the application to perform poorly.
The reason I ask this is because often I see a lot of Join queries which merge parent-child relationships into a single SQL query, just for the sake of completing the loading in a single call.
However, as the number of tables grows, the query becomes complex. Also, the parent table information is repeated in every row of every child. So there is a huge amount of data redundancy in the single joined result set.
Sample join query:
SELECT custid, first, last, age, acno, accountname, accounttype, a.status, cardno, cardname, cardtype, c.status, loanno, principal, rate
FROM Customer cc, Account a, CreditCard c, Loan l
WHERE a.custid=CC.custid(+) and c.custid=CC.custid(+) and l.custid=CC.custid(+)
The JDBC API does support this.
Statement.getMoreResults() can tell you if the SQL statement you executed through execute() produced more than one ResultSet
Quote from the JavaDocs for getMoreResults():
Moves to this Statement object's next result, returns true if it is a ResultSet object, and implicitly closes any current ResultSet object(s) obtained with the method getResultSet.
There are no more results when the following is true:
// stmt is a Statement object
((stmt.getMoreResults() == false) && (stmt.getUpdateCount() == -1))
However, it depends on the backend DBMS and the JDBC driver whether you can use this. Some JDBC drivers simply refuse to run more than one statement with a single execute() call (mainly as a means to prevent SQL injection); others don't.
So in e.g. Postgres you can do something like this:
boolean hasResult = stmt.execute(
"select * from table_1;\n" +
"select * from table_2;");
while (hasResult)
{
rs = stmt.getResultSet();
while (rs.next())
{
// process the result set
}
hasResult = stmt.getMoreResults();
}
This even allows mixing SELECT and e.g. UPDATE statements, if you also check getUpdateCount() (see the sketch below).
As far as I know you can also do this with SQL Server. It does not work with Oracle.
I haven't tried this with a PreparedStatement though. But as getMoreResults() is defined for Statement, it is available for a PreparedStatement as well.
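For completeness, here is a sketch of a loop that drains both result sets and update counts when SELECT and UPDATE statements are mixed. The statements themselves are made up, and this assumes a driver that accepts multi-statement execute(), as in the Postgres example above:
// Sketch: drain every result, whether it is a ResultSet or an update count.
boolean isResultSet = stmt.execute(
    "update table_1 set flag = 1;\n" +
    "select * from table_1;");
while (true)
{
    if (isResultSet)
    {
        try (ResultSet rs = stmt.getResultSet())
        {
            while (rs.next())
            {
                // process the current result set
            }
        }
    }
    else
    {
        int count = stmt.getUpdateCount();
        if (count == -1)
        {
            break; // no more results of either kind
        }
        // process the update count
    }
    isResultSet = stmt.getMoreResults();
}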
How about putting the queries into a stored procedure and then using a CallableStatement to execute it?
A CallableStatement can return one ResultSet object or multiple ResultSet objects. Multiple ResultSet objects are handled using operations inherited from Statement.
try
{
    CallableStatement stmt = con.prepareCall(/* call procedure */);
    boolean results = stmt.execute();
    while (results)
    {
        ResultSet rs = stmt.getResultSet();
        while (rs.next())
        {
            // process each row of the current result set
        }
        rs.close();
        results = stmt.getMoreResults();
    }
    stmt.close();
}
catch (Exception e)
{
    e.printStackTrace();
}
Relational databases are designed and optimized for retrieving data through SQL queries that JOIN data from multiple tables. Executing a single query that (correctly) JOINs the data is almost always more efficient than fetching the same data with separate queries.
When a single query gets too complex, it should be refactored into a VIEW, which you can then query, joining in data from other TABLEs and VIEWs if required.
Given the above, I don't see a need for bulk queries.
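For example (a sketch; customer_overview is a hypothetical VIEW wrapping the four-table join shown earlier):
// Querying a hypothetical customer_overview VIEW instead of repeating the join.
try (PreparedStatement pstmt = con.prepareStatement(
        "SELECT * FROM customer_overview WHERE custid = ?")) {
    pstmt.setInt(1, custid);
    try (ResultSet rs = pstmt.executeQuery()) {
        while (rs.next()) {
            // one row per account/card/loan combination; the customer
            // columns repeat on every row, as noted in the question
        }
    }
}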
I get the feeling you don't understand what a prepared statement is.
A prepared statement is an object you declare once and then reuse all the time, supplying different parameters to it on each run.
You're not telling me that you recreate a prepared statement from scratch each time you wish to execute it again?
Say you have four queries you run in a loop. Before entering the loop, you do this:
PreparedStatement statement1, statement2, statement3, statement4;
try {
    con.setAutoCommit(false); //only needed when also doing updates/inserts
    statement1 = con.prepareStatement("SELECT custid, first, last, age FROM Customer where custid = ?");
    statement2 = con.prepareStatement("SELECT custid, acno, accountname, accounttype, status FROM Account where custid = ?");
    // etc....
    for (Map.Entry<String, Integer> entry : customers.entrySet()) {
        statement1.setInt(1, entry.getValue().intValue());
        ResultSet rs = statement1.executeQuery();
        // do what you need to do
        statement2.setInt(1, entry.getValue().intValue());
        ResultSet rs2 = statement2.executeQuery();
        // do what you need to do
    }
    con.commit(); //only needed when also doing updates/inserts
} catch (SQLException e) {
    e.printStackTrace();
}
There is no need to recreate the prepared statements. That is why it's called a prepared statement. You just feed it the new values it needs to query.
This way you can add it to lists, iterate over it however you want, etc., and it's all optimised, since the database engine will remember the query plans and the optimisations it made for them. What you do with the prepared statement object is up to you.
The engine does this even if you recreate the objects constantly, because it remembers the query, but reusing the statement saves the overhead of creating new objects over and over and the memory pressure that comes with that.
So, without a clearer question this is the best answer I can give you.
We are using JDBC batch update (Statement - void addBatch( String sql ) and int[] executeBatch()) in our Java code. The job is supposed to insert about 27k records in a table and then update about 18k records in a subsequent batch.
When our job runs at 6am, it is missing a few thousand records (we observed this from the database audit logs). We can see from the job logs that the update statements are being generated for all 18k records, and we understand that they all get added to the batch in sequence. However, only records from the beginning of the batch seem to be missing, and it is not a fixed number every day: one day it skips the first 4534 update statements, another day the first 8853, and another day 5648.
We initially thought this could be a thread issue but have since moved away from that thought process as the block being skipped out does not always contain the same number of update statements. If we assume that the first few thousand updates are happening even before the insert, then the updates should at least show up in the database audit logs. However, this is not the case.
We are thinking this is due to a memory/heap issue as running the job at any other time picks up all the 18k update statements and they are executed successfully. We reviewed the audit logs from the Oracle database and noticed that the missing update statements are never executed on the table during the 6am run. At any other time, all the update statements are showing up in the database audit logs.
This job was running successfully for almost 3 years now and this behavior started only from a few weeks ago. We tried to look at any changes to the server/environment but nothing jumps out at us.
We are trying to pinpoint why this is happening; specifically, whether any processes are using up too much of the JVM heap, causing our update statements to be overwritten or never executed.
Database: Oracle 11g Enterprise Edition Release 11.2.0.3.0 - 64bit
Java: java version "1.6.0_51"
Java(TM) SE Runtime Environment (build 1.6.0_51-b11)
Java HotSpot(TM) Server VM (build 20.51-b01, mixed mode)
// simplified outline of the job
void main()
{
    DataBuffer dataBuffer; //assume that all the selected data to be updated is stored in this object
    List<String> transformedList = transform(dataBuffer);
    int status = bulkDML(transformedList);
}
public List<String> transform(DataBuffer i_SourceData)
{
    //i_SourceData has all the data selected from
    //the source table, that has to be updated
    List<Row> AllRows = i_SourceData.getAllRows();
    List<String> AllColumns = i_SourceData.getColumnNames();
    List<String> transformedList = new ArrayList<String>();
    for (Row row : AllRows)
    {
        int index = AllColumns.indexOf("unq_idntfr_col");
        String unq_idntfr_val = (String) row.getFieldValues().get(index);
        index = AllColumns.indexOf("col1");
        String val1 = (String) row.getFieldValues().get(index);
        String query = "UPDATE TABLE SET col1 = " + val1 + " where unq_idntfr_col=" + unq_idntfr_val; //this query is not the issue either - it is parameterized in our code
        transformedList.add(query);
    }
    return transformedList;
}
public int bulkDML(List<String> i_QueryList)
{
    Connection connection = getConnection();
    Statement statement = getStatement(connection);
    try
    {
        connection.setAutoCommit(false);
        for (String query : i_QueryList)
        {
            statement.addBatch(query);
        }
        statement.executeBatch();
        connection.commit();
    }
    //handle various exceptions and all of them return -1
    //not pertinent to the issue at hand
    catch (Exception e)
    {
        return -1;
    }
    finally
    {
        CloseResources(connection, statement, null);
    }
    return 0;
}
Any suggestions would be greatly appreciated, thank you.
If you want to execute multiple updates on the same table then I suggest modifying your query to use binds and a PreparedStatement because that's really the only way to do real DML batching with the Oracle Database. For example your query would become:
UPDATE TABLE SET col1=? WHERE unq_idntfr_col=?
and then use JDBC batching with the same PreparedStatement. This change would require you to revisit your bulkDML method to make it take bind values as parameters instead of SQL strings.
The JDBC pseudo code would then look like this:
PreparedStatement pstmt = connection.prepareStatement("UPDATE TABLE SET col1=? WHERE unq_idntfr_col=?");
pstmt.setXXX(1, x);
pstmt.setYYY(2, y);
pstmt.addBatch();
pstmt.setXXX(1, x);
pstmt.setYYY(2, y);
pstmt.addBatch();
pstmt.setXXX(1, x);
pstmt.setYYY(2, y);
pstmt.addBatch();
pstmt.executeBatch();
I need to insert a couple hundred million records into the MySQL DB. I'm batch inserting them 1 million at a time. Please see my code below. It seems to be slow. Is there any way to optimize it?
try {
    // Disable auto-commit
    connection.setAutoCommit(false);

    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES (?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);

    Object[] vals = set.toArray();
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
    }

    // Execute the batch
    int[] updateCounts = pstmt.executeBatch();
    connection.commit();
    System.out.println("inserted " + updateCounts.length);
} catch (SQLException e) {
    e.printStackTrace();
}
I had a similar performance issue with MySQL and solved it by setting the useServerPrepStmts and rewriteBatchedStatements properties in the connection URL.
Connection c = DriverManager.getConnection("jdbc:mysql://host:3306/db?useServerPrepStmts=false&rewriteBatchedStatements=true", "username", "password");
I'd like to expand on Bertil's answer, as I've been experimenting with the connection URL parameters.
rewriteBatchedStatements=true is the important parameter. useServerPrepStmts is already false by default, and even changing it to true doesn't make much difference in terms of batch insert performance.
Now is a good time to explain how rewriteBatchedStatements=true improves performance so dramatically: it rewrites batched prepared INSERT statements into multi-value inserts when executeBatch() is called (Source). That means that instead of sending the following n INSERT statements to the MySQL server each time executeBatch() is called:
INSERT INTO X VALUES (A1,B1,C1)
INSERT INTO X VALUES (A2,B2,C2)
...
INSERT INTO X VALUES (An,Bn,Cn)
It would send a single INSERT statement :
INSERT INTO X VALUES (A1,B1,C1),(A2,B2,C2),...,(An,Bn,Cn)
You can observe this by turning on MySQL's general log (SET GLOBAL general_log = 1), which logs every statement sent to the server to a file.
You can insert multiple rows with one INSERT statement; doing a few thousand at a time can greatly speed things up. That is, instead of doing e.g. three inserts of the form INSERT INTO tbl_name (a,b,c) VALUES(1,2,3);, you do INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(1,2,3),(1,2,3);. (It may be that JDBC's .addBatch() does a similar optimization now, though the MySQL addBatch used to be entirely un-optimized and just issued individual queries anyhow; I don't know if that's still the case with recent drivers.)
If you really need speed, load your data from a comma-separated file with LOAD DATA INFILE; we get around a 7-8x speedup doing that versus tens of millions of individual inserts.
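For what it's worth, the load can be issued straight through JDBC; a sketch, assuming a CSV at /tmp/data.csv and that local-infile is enabled on both the server and the driver (allowLoadLocalInfile=true in Connector/J):
// Sketch: bulk load from a CSV file. The path, column list, and
// local-infile settings are assumptions.
try (Statement stmt = connection.createStatement()) {
    stmt.execute(
        "LOAD DATA LOCAL INFILE '/tmp/data.csv' " +
        "INTO TABLE mytable " +
        "FIELDS TERMINATED BY ',' " +
        "LINES TERMINATED BY '\\n' " +
        "(xxx)");
}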
If:
It's a new table, or the amount to be inserted is greater than the data already in it
There are indexes on the table
You do not need other access to the table during the insert
Then ALTER TABLE tbl_name DISABLE KEYS can greatly improve the speed of your inserts. When you're done, run ALTER TABLE tbl_name ENABLE KEYS to start building the indexes, which can take a while, but not nearly as long as doing it for every insert.
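Wired into JDBC, that bracketing might look like this (a sketch; note that DISABLE KEYS only affects non-unique indexes and is mainly effective on MyISAM tables):
// Sketch: suspend index maintenance around a bulk insert.
try (Statement ddl = connection.createStatement()) {
    ddl.execute("ALTER TABLE mytable DISABLE KEYS");
}

// ... run the batched INSERTs from the question here ...

try (Statement ddl = connection.createStatement()) {
    ddl.execute("ALTER TABLE mytable ENABLE KEYS"); // rebuilds the indexes once
}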
You may try using a DDBulkLoad object (a bulk-load extension provided by the DataDirect JDBC drivers):
// Get a DDBulkLoad object
DDBulkLoad bulkLoad = DDBulkLoadFactory.getInstance(connection);
bulkLoad.setTableName("mytable");
bulkLoad.load("data.csv");
try {
    // Disable auto-commit
    connection.setAutoCommit(false);
    int maxInsertBatch = 10000;

    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES (?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);

    Object[] vals = set.toArray();
    int count = 0;
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
        count++;
        // flush periodically instead of building one giant batch
        if (count % maxInsertBatch == 0) {
            pstmt.executeBatch();
        }
    }

    // Execute whatever remains in the batch
    pstmt.executeBatch();
    connection.commit();
    System.out.println("inserted " + count);
} catch (SQLException e) {
    e.printStackTrace();
}
I am working on an Airsoft application.
I'm trying to add records to a MS Access Database via SQL in Java. I have established a link to the database, with the following:
try
{
    //String Driver = "sun.java.odbc.JdbcOdbcDriver";
    Class.forName("net.ucanaccess.jdbc.UcanaccessDriver");
    Connection conn = DriverManager.getConnection("jdbc:ucanaccess://" + URL, "", "");
    Statement stmt = conn.createStatement();
    System.out.println("Connection Established!");

    ResultSet rs = stmt.executeQuery("SELECT * FROM AirsoftGunRentals");
    tblRent.setModel(DbUtils.resultSetToTableModel(rs));
}
catch (Exception ex)
{
    JOptionPane.showMessageDialog(null, "Error");
}
I am using UCanAccess to access my MS Access database. It reads the database and displays it in a JTable. However, I need to create three JButtons to add, delete and update the table. I have coded the add button and tried to add a record, but it crashes and gives me errors.
try
{
    //String Driver = "sun.java.odbc.JdbcOdbcDriver";
    Class.forName("net.ucanaccess.jdbc.UcanaccessDriver");
    Connection conn = DriverManager.getConnection("jdbc:ucanaccess://" + URL, "", "");
    Statement stmt = conn.createStatement();
    System.out.println("Connection Established!");

    String Query = "INSERT INTO AirsoftGunRentals(NameOfGun, Brand, TypeOfGuns, NumberOfMagazines,Extras,NumberAvailable,UnitRent)" +
        "VALUES('" + pName + "','" + pBrand + "','" + pTypeOfGun + "','" + pNumMags + "','" + pExtras + "','" + pNumberAvail + "','" + pRent + "');";
    ResultSet rs = stmt.executeQuery(Query);
    JOptionPane.showMessageDialog(null, "Success!");
}
catch (Exception ex)
{
    JOptionPane.showMessageDialog(null, "Error");
}
I have attempted all three, hoping for a result, but I am still getting big errors. The only difference between the buttons is that one adds, one deletes and one updates the table. Other than that, the code is the same, minus variables.
As Brahim mentioned, you should use stmt.executeUpdate(Query) whenever you update, insert or delete data. Also, with this particular query, given your String concatenation (see the end of the first line), there is no space between the ")" and the "VALUES", which probably produces a malformed query.
However, I can see from your code that you are not very experienced with such use-cases, and I'd like to add some pointers before all hell breaks loose in your project:
Use PreparedStatement instead of Statement and replace variables by placeholders to prevent SQL Injection.
The code that you are using here is extremely prone to SQL injection: if any user has any control over any of the variables, this could lead to a full database dump (theft), destruction of data (vandalism), or even machine takeover if other conditions are met.
A good piece of advice is to never use the Statement class; better safe than sorry :)
Respect the Java conventions (or at least be consistent).
In your example you define String Query, while all the other variables start with lower-case (as the Java conventions prescribe), instead of String query. Over time, such little mistakes (which won't break a build) lead to bugs from mistaking variables for class names, etc. :)
Good luck on your road to mastering this wonderful language! :)
First add a space before the quotation marks like this :
String Query= "INSERT INTO AirsoftGunRentals(NameOfGun, Brand, TypeOfGuns, NumberOfMagazines,Extras,NumberAvailable,UnitRent) "+
" VALUES('"+pName+"','"+pBrand+"','"+pTypeOfGun+"','"+pNumMags+"','"+pExtras+"','"+pNumberAvail+"','"+pRent+"');";
And use stmt.executeUpdate(Query); instead of stmt.executeQuery(Query); in your insert, update and delete queries. For SELECT queries you can keep executeQuery.
I managed to find an answer on how to add, delete and update records in an MS Access DB. After declaring the connection and the prepared statement, I had to set the values individually, like this (pstmt is the PreparedStatement variable):
pstmt.setWhatever(1, variable);
And it works fine now. I use the same method to delete and update records.
This is the basic query format:
String SQLInsert = "INSERT INTO Tbl VALUES(NULL,?,?,?,?)";
The NULL in the statement stands in for the AutoNumber column in the table, and the .setWhatever() calls replace the question marks with the actual values, thus manipulating the database.
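For completeness, here is a sketch of what that looks like for the INSERT from this question; the setInt/setDouble calls are guesses at the column types:
// Sketch: parameterized insert. NULL covers the AutoNumber column;
// the parameter types are assumptions about the table's schema.
String sqlInsert = "INSERT INTO AirsoftGunRentals VALUES(NULL,?,?,?,?,?,?,?)";
try (PreparedStatement pstmt = conn.prepareStatement(sqlInsert)) {
    pstmt.setString(1, pName);
    pstmt.setString(2, pBrand);
    pstmt.setString(3, pTypeOfGun);
    pstmt.setInt(4, pNumMags);
    pstmt.setString(5, pExtras);
    pstmt.setInt(6, pNumberAvail);
    pstmt.setDouble(7, pRent);
    pstmt.executeUpdate();
}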
Thank you everyone for all your contributions. It helped a lot, and made this section a lot more understandable.