I have 26 CSV files that I want to grab from the internet on a nightly basis and load into a PostgreSQL table. I have this working using Java, PreparedStatement, and batching. Even so, performance is painfully slow: grabbing the roughly 6,000 entries and putting them into PostgreSQL takes 30 minutes. This is my first time doing something like this, so I don't really have a reference point for whether that is fast or slow.
To get the file, I am using this code.
URL grabberUrl = new URL(csvUrl);
URLConnection grabberConn = grabberUrl.openConnection();
BufferedReader grabberReader = new BufferedReader(new InputStreamReader(grabberConn.getInputStream()));
I am then using a PreparedStatement, taking values from the input stream and setting them:
con = DriverManager.getConnection(url, user, password);
pst = con.prepareStatement("insert into blah(name, year) values(?, ?)");
pst.setString(1, name);
pst.setString(2, year);
I am then batching up the inserts. I've tried batch sizes from 100 to 1000 with no meaningful change in performance.
pst.addBatch();
if (count == 100) {
    count = 0;
    pst.executeBatch();
}
Has anyone got any suggestions as to what I can do to make things faster?
If you can access the files from the PostgreSQL server, try using the COPY statement. See:
http://www.postgresql.org/docs/9.3/static/sql-copy.html
Also, if you trust the data quality, you can temporarily remove any table constraints and drop any indexes, then add the constraints and indexes back after loading the data.
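For illustration, here is a rough sketch of running a server-side COPY from JDBC, assuming the nightly job can place the CSV at a path the PostgreSQL server process can read and that the database user has the needed permissions; the path is a placeholder and the table/columns are taken from the question:

try (Connection con = DriverManager.getConnection(url, user, password);
     Statement st = con.createStatement()) {
    // COPY runs entirely on the server; the file path is resolved on the database host
    st.execute("COPY blah(name, year) FROM '/var/lib/postgresql/import/file1.csv' WITH (FORMAT csv)");
}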
Try the following:
// requires the PostgreSQL JDBC driver classes org.postgresql.PGConnection and org.postgresql.copy.CopyManager
PGConnection con = (PGConnection) DriverManager.getConnection(...);
CopyManager copyManager = con.getCopyAPI();
copyManager.copyIn("copy mytable from stdin with (format csv)", grabberReader);
If mytable is heavily indexed, then drop the indexes, load, and recreate the indexes.
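As a rough sketch of that flow (the index name and definition here are hypothetical, and unwrap is used instead of the direct cast shown above):

try (Connection con = DriverManager.getConnection(url, user, password);
     Statement st = con.createStatement()) {
    st.execute("DROP INDEX IF EXISTS mytable_name_idx");               // hypothetical index
    CopyManager copyManager = con.unwrap(PGConnection.class).getCopyAPI();
    copyManager.copyIn("copy mytable from stdin with (format csv)", grabberReader);
    st.execute("CREATE INDEX mytable_name_idx ON mytable (name)");     // rebuild once, after the load
}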
Related
Hello there and thanks for reading.
I'm trying to retrieve the ID of the newly inserted data, but I always get an empty ResultSet.
Connection con = main.getCon();
String sqlCommand = "Insert Into Relations(name,explanation) values(?,?)";
PreparedStatement state = con.prepareStatement(sqlCommand, Statement.RETURN_GENERATED_KEYS);
state.setString(1,name.getText());
state.setString(2,explanation.getText());
int affectedRows = state.executeUpdate();
assert (affectedRows>0);
ResultSet rs = state.getGeneratedKeys();
assert rs.next();
int instertedID= rs.getInt("ID");
I'm not sure what's wrong with it. I've checked different samples online, but couldn't figure out my mistake.
I also tried it with Statement, but no luck with that either.
Point 1: the code runs smoothly and my data is inserted into the database.
Point 2: there are examples online for this very case; you can check one here:
https://www.baeldung.com/jdbc-returning-generated-keys
I just realized that my ResultSet wasn't empty; I had a problem with using my debugger, and that's why I thought it was empty.
As Mark Rotteveel mentioned in a comment, the problem was with the "assert" statement.
The problem is your use of assert rs.next(). Assertions in Java are intended for checking invariants (e.g. during testing), but when you run Java normally, assert statements are not executed; they are only executed when you explicitly enable them with the -ea command-line option.
As a result, rs.next() is not called, so your result set is still positioned before the first row when you call rs.getInt(1). Instead use if (rs.next()) { ... }.
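A minimal sketch of the corrected retrieval, reusing the variables from the question:

int affectedRows = state.executeUpdate();
if (affectedRows > 0) {
    try (ResultSet rs = state.getGeneratedKeys()) {
        if (rs.next()) {                        // check explicitly, never via assert
            int insertedID = rs.getInt(1);      // reading by column index is the most portable option
            System.out.println("generated id = " + insertedID);
        }
    }
}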
This is DB engine dependent. Some tips:
JDBC is low-level and not appropriate to program with
It's a complicated API. Use something that makes it easier: JDBI or jOOQ. They may have abstractions over insertion that take care of this stuff for you.
Some DB engines require that you list the column name
Try:
con.prepareStatement(sqlCommand, new String[] {"UNID"});
Some DB engines will only return generated values as direct resultset
Don't call .executeUpdate(); instead, call .executeQuery() which returns a ResultSet; check that one.
Something else
Post the exact table structure and DB engine you're working with if the above doesn't help.
Your code is broken
You can't create resource objects (ones that must be closed) unless you do so safely, and you're not doing so safely. Use try-with-resources:
String sql = "INSERT INTO relations(name, explanation) VALUES (?, ?)";
try (Connection con = main.getCon();
     PreparedStatement ps = con.prepareStatement(sql, new String[] {"unid"})) {
    ps.setString(1, name.getText());
    ps.setString(2, explanation.getText());
    try (ResultSet rs = ps.executeQuery()) {
        if (!rs.next()) throw new SQLException("insert didn't return autogen?");
        System.out.println(rs.getInt(1));
    }
}
ResultSets, Statements, PreparedStatements, and Connections are all resources (they must be closed!). If you want to store one of those things in a field, you can do that, but only if the class that contains this field is itself a resource: it must have a close() method, it must implement AutoCloseable, and you can then only make instances of this class with try-with-resources as above.
Failure to adhere to these rules means your app seems to work but is leaking resources as it runs; if you let it run long enough, it will start crashing. Also, your DB engine will grind to a halt as more and more connections are left open, stuck forever.
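As an illustration, here is a minimal sketch of such a resource-owning class (the class name and the key column are hypothetical). It holds a Connection in a field, so it implements AutoCloseable and is itself only used inside try-with-resources. The plain RETURN_GENERATED_KEYS variant is used for brevity; the column-name/executeQuery variant from above can be swapped in depending on the driver.

import java.sql.*;

class RelationDao implements AutoCloseable {
    private final Connection con;   // owned resource, released in close()

    RelationDao(String url, String user, String password) throws SQLException {
        this.con = DriverManager.getConnection(url, user, password);
    }

    int insert(String name, String explanation) throws SQLException {
        String sql = "INSERT INTO relations(name, explanation) VALUES (?, ?)";
        try (PreparedStatement ps = con.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS)) {
            ps.setString(1, name);
            ps.setString(2, explanation);
            ps.executeUpdate();
            try (ResultSet rs = ps.getGeneratedKeys()) {
                if (!rs.next()) throw new SQLException("insert didn't return a generated key");
                return rs.getInt(1);
            }
        }
    }

    @Override
    public void close() throws SQLException {
        con.close();
    }
}

// usage: the holder itself goes in try-with-resources
try (RelationDao dao = new RelationDao(url, user, password)) {
    System.out.println(dao.insert("some name", "some explanation"));
}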
Change the last line of code to the following, because the DBMS you are using may not support getting the value by column name, so pass the index of that column instead:
int instertedID = rs.getInt(1);
String sqlCommand = "Insert Into Relations (name, explanation) values(?, ?)";
PreparedStatement state = con.prepareStatement(sqlCommand, Statement.RETURN_GENERATED_KEYS);
state.setString(1, name.getText());
state.setString(2, explanation.getText());
state.executeUpdate();
ResultSet resultSet = state.getGeneratedKeys();
if (resultSet.next()) {
    System.out.println(resultSet.getInt(1)); // indicate the corresponding column index value
}
I'm having a weird problem with a Grails application accessing data. Digging deeper, I've isolated the problem to a small plain Java 8 application using PreparedStatement.executeQuery vs Statement.executeQuery.
Consider the following snippet of code:
// executes in milliseconds
directSql = "select top(10) * from vdocuments where codcli = 'CCCC' and serial = 'SSSS' ORDER BY otherField DESC;";
stmt = con.createStatement();
rs = stmt.executeQuery(directSql);
// More than 10 minutes
sqlPrepared = "select top(10) * from vdocuments where codCli = ? and serial = ? ORDER BY otherField DESC;";
PreparedStatement pStatement = con.prepareStatement( sqlPrepared );
pStatement.setString(1, "CCCC");
pStatement.setString(2, "SSSS");
rsPrepared = pStatement.executeQuery();
Same query.
Data comes from a view on SQL Server (2008, I think; I have no access right now) over a table with more than 15 million records. There are indexes for all the needed fields, and the same query (the first one) executed from the console also runs quite fast.
If I execute the slow PreparedStatement query without the ORDER clause it also runs fast.
It looks clear to me that for some reason the database is not using the indexes and does a full scan when using the PreparedStatement, but maybe I'm wrong, so I'm open to any idea.
I thought maybe the driver (the official SQL Server driver and jTDS have both been tested) was holding the data, waiting for some kind of EOF from the connection, but I've checked with tcpdump on my side and no data is received.
I can't find why this is happening so any idea will be welcomed.
Thank you in advance!
I've finally found a solution, at least for my case. I got it here: http://mehmoodbluffs.blogspot.com.es/2015/03/hibernate-queries-are-slow-sql-servers.html . Telling the driver (or SQL Server?) not to send parameters as Unicode resolved the problem.
The connection string is now:
String connectionUrl = "jdbc:sqlserver://server:port;databaseName=myDataBase;sendStringParametersAsUnicode=false";
And now both direct queries and prepared statements run at millisecond speed.
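A sketch of the same setting passed through a Properties object instead of the URL (server name and credentials are placeholders):

Properties props = new Properties();                        // java.util.Properties
props.setProperty("user", "myUser");
props.setProperty("password", "myPassword");
props.setProperty("sendStringParametersAsUnicode", "false");
Connection con = DriverManager.getConnection(
        "jdbc:sqlserver://server:port;databaseName=myDataBase", props);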
Thank you #DanGuzman for your suggestions!
I need to insert a couple hundred million records into the MySQL db. I'm batch inserting 1 million at a time. Please see my code below. It seems to be slow. Is there any way to optimize it?
try {
// Disable auto-commit
connection.setAutoCommit(false);
// Create a prepared statement
String sql = "INSERT INTO mytable (xxx) VALUES(?)";
PreparedStatement pstmt = connection.prepareStatement(sql);
Object[] vals=set.toArray();
for (int i=0; i<vals.length; i++) {
pstmt.setString(1, vals[i].toString());
pstmt.addBatch();
}
// Execute the batch
int [] updateCounts = pstmt.executeBatch();
System.out.append("inserted "+updateCounts.length);
I had a similar performance issue with MySQL and solved it by setting the useServerPrepStmts and rewriteBatchedStatements properties in the connection URL.
Connection c = DriverManager.getConnection("jdbc:mysql://host:3306/db?useServerPrepStmts=false&rewriteBatchedStatements=true", "username", "password");
I'd like to expand on Bertil's answer, as I've been experimenting with the connection URL parameters.
rewriteBatchedStatements=true is the important parameter. useServerPrepStmts is already false by default, and even changing it to true doesn't make much difference in terms of batch insert performance.
Now I think it is time to explain how rewriteBatchedStatements=true improves the performance so dramatically: it rewrites batched prepared INSERT statements into multi-value inserts when executeBatch() is called (source). That means that instead of sending the following n INSERT statements to the MySQL server each time executeBatch() is called:
INSERT INTO X VALUES (A1,B1,C1)
INSERT INTO X VALUES (A2,B2,C2)
...
INSERT INTO X VALUES (An,Bn,Cn)
It would send a single INSERT statement :
INSERT INTO X VALUES (A1,B1,C1),(A2,B2,C2),...,(An,Bn,Cn)
You can observe this by turning on MySQL's general log (with SET GLOBAL general_log = 1), which logs each statement sent to the MySQL server into a file.
You can insert multiple rows with one INSERT statement; doing a few thousand at a time can greatly speed things up. That is, instead of doing e.g. three inserts of the form INSERT INTO tbl_name (a,b,c) VALUES(1,2,3);, you do INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(1,2,3),(1,2,3);. (It may be that JDBC's .addBatch() does a similar optimization now, though the MySQL addBatch used to be entirely unoptimized and just issued individual queries anyway; I don't know if that's still the case with recent drivers.)
If you really need speed, load your data from a comma-separated file with LOAD DATA INFILE; we get around a 7-8x speedup doing that versus doing tens of millions of individual inserts.
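For reference, a rough sketch of issuing LOAD DATA from JDBC. The LOCAL variant reads a client-side file and, with Connector/J, needs allowLoadLocalInfile=true on the connection (the server must also permit it); file, table, and connection details are placeholders:

try (Connection con = DriverManager.getConnection(
         "jdbc:mysql://host:3306/db?allowLoadLocalInfile=true", "username", "password");
     Statement st = con.createStatement()) {
    // the CSV is streamed from the client machine to the server in one statement
    st.execute("LOAD DATA LOCAL INFILE '/tmp/data.csv' INTO TABLE mytable "
             + "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'");
}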
If:
It's a new table, or the amount to be inserted is greater than the already inserted data
There are indexes on the table
You do not need other access to the table during the insert
Then ALTER TABLE tbl_name DISABLE KEYS can greatly improve the speed of your inserts. When you're done, run ALTER TABLE tbl_name ENABLE KEYS to start building the indexes, which can take a while, but not nearly as long as doing it for every insert.
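A minimal sketch of wrapping the load with those statements (note that DISABLE KEYS only affects non-unique indexes and its effect depends on the storage engine); the table name is the placeholder used above and connection is the question's variable:

try (Statement st = connection.createStatement()) {
    st.execute("ALTER TABLE tbl_name DISABLE KEYS");   // skip index maintenance during the load
    // ... run the batched inserts or LOAD DATA here ...
    st.execute("ALTER TABLE tbl_name ENABLE KEYS");    // rebuild the non-unique indexes in one pass
}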
You may try using a DDBulkLoad object.
// Get a DDBulkLoad object
DDBulkLoad bulkLoad = DDBulkLoadFactory.getInstance(connection);
bulkLoad.setTableName("mytable");
bulkLoad.load("data.csv");
try {
    // Disable auto-commit
    connection.setAutoCommit(false);
    int maxInsertBatch = 10000;

    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES(?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);

    Object[] vals = set.toArray();
    int count = 0;
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
        count++;
        if (count % maxInsertBatch == 0) {
            pstmt.executeBatch();
        }
    }

    // Execute the remaining batch
    pstmt.executeBatch();
    System.out.println("inserted " + count);
I have a problem with a really slow connection between my Java code and a MySQL database. I don't know where the bottleneck is.
My program is more or less a chatbot. The user types something in, my program splits the sentence into words and sends it word by word to the database. If it finds something there, the user gets an output.
The database is on an external server, but I also tried connecting to a PC next to me. Both are slow.
I tried the connection once at another place than where I normally work, and there it was fast most of the time.
My SQL Code:
SELECT info.INFORMATION FROM INFORMATION info, INFO_SCHLUESSEL sch
WHERE LCASE(sch.SCHLUESSELWORT) LIKE '" + input + "%' AND info.ID_INFO = sch.ID_INFO
Order BY info.PRIORITAET DESC LIMIT 1;
(just remembered, if it helps to understand the sql code:
schluessel = key
Schluesselwort = key word
prioritaet = priority)
My Java Database Code is more or less standard stuff:
String driver = "com.mysql.jdbc.Driver";
String dbase = "jdbc:mysql://bla";
String dbuser = "bla";
String dbpw = "bla";
Class.forName(driver);
Connection con = DriverManager.getConnection(dbase, dbuser, dbpw);
Statement stmt = con.createStatement();
ResultSet rs = stmt.executeQuery(query);
while (rs.next())
{
    ergebnis = rs.getString("info.INFORMATION");
}
rs.close();
stmt.close();
con.close();
edit:
I have tried this DBCP for a while now, and I can't seem to get it to work. It seems to be as slow as the old connection. This is the example provided by the website that I use:
GenericObjectPool connectionPool = new GenericObjectPool(null);
ConnectionFactory connectionFactory = new DriverManagerConnectionFactory("jdbc:mysql://bla", "bla", "bla");
PoolableConnectionFactory poolableConnectionFactory = new PoolableConnectionFactory(connectionFactory,connectionPool,null,null,false,true);
PoolingDriver driver = new PoolingDriver();
driver.registerPool("example",connectionPool);
Connection conn = DriverManager.getConnection("jdbc:apache:commons:dbcp:example");
I suspect that it's the connection setup that is causing the problem. It would be worth timing how long this takes:
Connection con = DriverManager.getConnection(dbase, dbuser, dbpw);
and if so, check out Apache Commons DBCP, which allows you to pool database connections.
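A minimal pooled setup with DBCP's BasicDataSource may be easier to get working than the PoolingDriver example above. This is only a sketch: the data source is created once at startup, and the query and ergebnis variables are the question's; the URL and credentials are its placeholders.

BasicDataSource ds = new BasicDataSource();   // org.apache.commons.dbcp2.BasicDataSource
ds.setUrl("jdbc:mysql://bla");
ds.setUsername("bla");
ds.setPassword("bla");

// per request: borrow a connection from the pool instead of opening a new one
try (Connection con = ds.getConnection();
     Statement stmt = con.createStatement();
     ResultSet rs = stmt.executeQuery(query)) {
    while (rs.next()) {
        ergebnis = rs.getString("info.INFORMATION");
    }
}   // close() here returns the connection to the pool rather than tearing it down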
Well, I think this warrants a discussion of the design. There are a few things you can do to improve the performance. Since you are not persisting anything here, it's better to preload all the data into memory in some custom Java object, a map, a list or whatever, and then do an in-memory lookup for the word to get the results. Another approach could be to use a batch statement so that you don't create and release a connection for each word. Oh, and if using batch statements, make sure you set the batch size to an appropriate number, preferably a prime number.
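A rough sketch of the preload idea, using the tables and columns from the question's SQL (imports from java.util and java.sql assumed). It only covers exact keyword matches; the original query does prefix matching with LIKE, which would need something like a sorted map instead.

Map<String, String> antworten = new HashMap<>();   // keyword -> information, loaded once at startup
String ladeSql = "SELECT sch.SCHLUESSELWORT, info.INFORMATION "
               + "FROM INFORMATION info JOIN INFO_SCHLUESSEL sch ON info.ID_INFO = sch.ID_INFO "
               + "ORDER BY info.PRIORITAET ASC";
try (Connection con = DriverManager.getConnection(dbase, dbuser, dbpw);
     Statement stmt = con.createStatement();
     ResultSet rs = stmt.executeQuery(ladeSql)) {
    while (rs.next()) {
        // higher-priority rows come last and overwrite lower-priority entries
        antworten.put(rs.getString(1).toLowerCase(), rs.getString(2));
    }
}

// per user input: no database round trip at all
String ergebnis = antworten.get(word.toLowerCase());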
I am working on an Airsoft application.
I'm trying to add records to a MS Access Database via SQL in Java. I have established a link to the database, with the following:
try
{
//String Driver = "sun.java.odbc.JdbcOdbcDriver";
Class.forName("net.ucanaccess.jdbc.UcanaccessDriver");
Connection conn = DriverManager.getConnection("jdbc:ucanaccess://" + URL,"","");
Statement stmt = conn.createStatement();
System.out.println("Connection Established!");
ResultSet rs = stmt.executeQuery("SELECT * FROM AirsoftGunRentals");
tblRent.setModel(DbUtils.resultSetToTableModel(rs));
}
catch(Exception ex)
{
JOptionPane.showMessageDialog(null, "Error");
}
I am using Ucanaccess to access my MS database. It is reading the database and is displaying to a JTable. However, I need to create three JButtons to add, delete and update the table. I have tried to code the add button, and I have tried to add a record, but it crashes and gives me errors.
try
{
//String Driver = "sun.java.odbc.JdbcOdbcDriver";
Class.forName("net.ucanaccess.jdbc.UcanaccessDriver");
Connection conn = DriverManager.getConnection("jdbc:ucanaccess://" + URL,"","");
Statement stmt = conn.createStatement();
System.out.println("Connection Established!");
String Query= "INSERT INTO AirsoftGunRentals(NameOfGun, Brand, TypeOfGuns, NumberOfMagazines,Extras,NumberAvailable,UnitRent)"+
"VALUES('"+pName+"','"+pBrand+"','"+pTypeOfGun+"','"+pNumMags+"','"+pExtras+"','"+pNumberAvail+"','"+pRent+"');";
ResultSet rs = stmt.executeQuery(Query);
JOptionPane.showMessageDialog(null, "Success!");
}
catch(Exception ex)
{
JOptionPane.showMessageDialog(null, "Error");
}
I have attempted all three, hoping for a result, but am still getting big errors. The only difference between the buttons is that one adds, one deletes and one updates the table. Other than that, the code is the same, apart from the variables.
As Brahim mentioned, you should use stmt.executeUpdate(Query) whenever you update, insert or delete data. Also, with this particular query, given your string concatenation (see the end of the line), there is no space between the ")" and the "VALUES", which probably produces a malformed query.
However, I can see from your code that you are not very experienced with such use-cases, and I'd like to add some pointers before all hell breaks loose in your project:
Use PreparedStatement instead of Statement and replace variables by placeholders to prevent SQL Injection.
The code that you are using here is extremely prone to SQL injection: if any user has any control over any of the variables, this could lead to a full database dump (theft), destruction of data (vandalism), or even machine takeover if other conditions are met.
Good advice is to never use the Statement class; better safe than sorry :) (a parameterized sketch follows these points)
Respect Java Conventions (or be coherent).
In your example you define the String Query, while all the other variables start with lower case (as in the Java conventions), instead of String query. Over time, such little mistakes (which won't break a build) lead to bugs due to mistaking variables for class names, etc. :)
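Here is a minimal parameterized sketch of the same insert, with the column list and variable names taken from the question; if some of the columns are actually numeric, the corresponding setInt/setBigDecimal calls should be used instead of setString:

String query = "INSERT INTO AirsoftGunRentals "
             + "(NameOfGun, Brand, TypeOfGuns, NumberOfMagazines, Extras, NumberAvailable, UnitRent) "
             + "VALUES (?, ?, ?, ?, ?, ?, ?)";
try (Connection conn = DriverManager.getConnection("jdbc:ucanaccess://" + URL, "", "");
     PreparedStatement ps = conn.prepareStatement(query)) {
    ps.setString(1, pName);
    ps.setString(2, pBrand);
    ps.setString(3, pTypeOfGun);
    ps.setString(4, pNumMags);
    ps.setString(5, pExtras);
    ps.setString(6, pNumberAvail);
    ps.setString(7, pRent);
    ps.executeUpdate();   // executeUpdate, not executeQuery, for an INSERT
}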
Good luck on your road to mastering this wonderful language ! :)
First, add a space so that after concatenation there is one between the ")" and "VALUES", like this:
String Query= "INSERT INTO AirsoftGunRentals(NameOfGun, Brand, TypeOfGuns, NumberOfMagazines,Extras,NumberAvailable,UnitRent) "+
" VALUES('"+pName+"','"+pBrand+"','"+pTypeOfGun+"','"+pNumMags+"','"+pExtras+"','"+pNumberAvail+"','"+pRent+"');";
And use stmt.executeUpdate(Query); instead of stmt.executeQuery(Query); in your insert, update and delete queries. For SELECT queries you can keep executeQuery.
I managed to find an answer on how to add, delete and update records in an MS Access DB. This is what I found after I declared the connection and the prepared statement; I will try to explain it as best I can. I had to add values individually using this:
(pstmt = the PreparedStatement variable)
pstmt.setWhatever(1,Variable);
And it works fine now. I use the same method to delete and update records.
This is the basic query format:
String SQLInsert = "INSERT INTO Tbl VALUES(NULL,?,?,?,?)";
The NULL in the statement is for the AutoNumber column in the table, and the .setWhatever() calls replace the question marks with values of the corresponding types. That is how the database gets manipulated.
Thank you everyone for all your contributions. It helped a lot, and made this section a lot more understandable.