Increasing INSERT speed - Java

I'm currently using the following query to insert into a table only if the record does not already exist; presumably this leads to a table scan. It inserts 28000 records in 10 minutes:
INSERT INTO tblExample(column)
(SELECT ? FROM tblExample WHERE column=? HAVING COUNT(*)=0)
If I change the query to the following, I can insert 98000 records in 10 minutes:
INSERT INTO tblExample(column) VALUES (?)
But then nothing checks whether the record already exists.
Could anyone suggest another way of querying such that my insert speed is faster?

One simple solution (though not recommended) is to just fire the INSERT, catch the duplicate-key exception, and log it. This assumes the table has a unique key constraint.
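A minimal JDBC sketch of that approach (the table and column are the ones from the question; conn is an open Connection; SQLSTATE 23505 is the standard code for a unique-constraint violation, and some drivers also surface it as SQLIntegrityConstraintViolationException):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Insert every value, treating duplicates as "already present" instead of failures.
void insertIgnoringDuplicates(Connection conn, Iterable<String> values) throws SQLException {
    try (PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO tblExample(column) VALUES (?)")) {
        for (String v : values) {
            ps.setString(1, v);
            try {
                ps.executeUpdate();
            } catch (SQLException e) {
                if ("23505".equals(e.getSQLState())) {
                    System.out.println("Skipped duplicate: " + v); // log and move on
                } else {
                    throw e;
                }
            }
        }
    }
}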

Make sure that you have an index on the column[s] you're checking. In general, have a look at the query execution plan that the database is using - this should tell you where the time is going, and so what to do about it.
For Derby, this is how you get a plan and how to read it.
Derby also has a merge command, which can act as insert-if-not-there. I've not used it myself, so you'd need to test it to see if it's faster for your circumstances.
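An untested sketch of what that could look like (Derby 10.11+; SYSIBM.SYSDUMMY1 is Derby's built-in one-row table, used here as the MERGE source; conn and code are assumed to be in scope; I've written the column as col since COLUMN itself is a reserved word, and Derby tends to want explicit CASTs around parameter markers):

// Insert-if-not-there in a single statement via MERGE.
String sql =
    "MERGE INTO tblExample t " +
    "USING SYSIBM.SYSDUMMY1 s " +
    "ON t.col = CAST(? AS VARCHAR(32)) " +
    "WHEN NOT MATCHED THEN INSERT (col) VALUES (CAST(? AS VARCHAR(32)))";
try (PreparedStatement ps = conn.prepareStatement(sql)) {
    ps.setString(1, code);
    ps.setString(2, code);
    ps.executeUpdate();
}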

Related

Delete or Truncate if no conditions involved

I am peer reviewing some code.
I found lots of DELETE statements without conditions, where developers remove all data from a table and then insert fresh data.
public void deleteAll() throws Exception {
    String sql = "DELETE FROM ERP_FND_USER";
    entityManager.createNativeQuery(sql, FndUserFromErp.class).executeUpdate();
    LOG.log(Level.INFO, "ERP_FND_USER all data deleted");
}
Shall I make it a standard to always use TRUNCATE when deleting all data, since TRUNCATE is more efficient for deleting everything? (Or should I be suspicious that a condition will be added in the future and we would need to change the statement?)
I also think rollback is not implemented in this code, i.e. it is not transactional.
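For reference, the TRUNCATE version I would be standardizing on would look roughly like this (just a sketch; whether TRUNCATE can be issued as a native statement depends on the database and the schema permissions):

public void truncateAll() {
    // TRUNCATE is DDL on most databases: faster than DELETE FROM, but
    // usually non-transactional and irreversible (see the answers below).
    entityManager.createNativeQuery("TRUNCATE TABLE ERP_FND_USER").executeUpdate();
    LOG.log(Level.INFO, "ERP_FND_USER truncated");
}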
Truncating a table means we have no way of recovering the data once it is done.
I believe DELETE is the better option in this case, given that we expect the table size to not be very big.
Even if we expect the table to store a very large volume of data, I would still recommend DELETE, since in such cases we do not want to empty tables without any conditions.
Also, if the table is only used for the session of the Java program, we could use a TEMP table instead of the main table; then there is no need to DELETE explicitly, and it is purged once the session is over.
TRUNCATE should only be used when you are absolutely sure you want to DELETE the entire table's data and have no intention of recovering any of it.
There is no strict answer.
There are several differences between the DELETE and TRUNCATE commands.
In general, TRUNCATE works faster - the reason is evident: it is unconditional and does not perform a search on the table.
Another difference is the identity: TRUNCATE reseeds the table identity, DELETE does not.
For example, you have the users table with column ID defined as identity and column Name:
1 | John Doe
2 | Max Mustermann
3 | Israel Israeli
Suppose you delete the user with ID=3 via the DELETE command (with a WHERE clause or not - it does not matter). Inserting another user will NEVER create a user with ID=3; most probably the new ID will be 4 (though there are situations where it can be different).
Truncating the table will start the identity from 1.
If you do not worry about identity and there are no foreign keys which may prevent you from deleting records - I would use TRUNCATE.
Update: Dinesh (below) is right, TRUNCATE is irreversible. This should also be taken into consideration.
You should use TRUNCATE if you need to reset AUTO_INCREMENT fields; a DELETE of all rows will not reset them.
The other difference is performance: TRUNCATE will be faster than deleting all rows.
Either TRUNCATE or DELETE will remove rows definitively,
contrary to what was mentioned in another answer, unless the DELETE is executed inside a TRANSACTION that is rolled back. But once the TRANSACTION is committed, no recovery is possible.
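To illustrate that last point with JDBC (connection setup is assumed):

import java.sql.Connection;
import java.sql.Statement;

// A DELETE inside an open transaction can still be undone;
// once committed, the rows are gone for good.
void deleteAllRows(Connection conn, boolean undo) throws Exception {
    conn.setAutoCommit(false);
    try (Statement st = conn.createStatement()) {
        st.executeUpdate("DELETE FROM ERP_FND_USER");
        if (undo) {
            conn.rollback(); // the deleted rows are restored
        } else {
            conn.commit();   // no recovery is possible from here on
        }
    }
}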

Efficient way to check whether a large number of strings exist in the database

I have a very large table in the database. The table has a column called
"unique_code_string", and it has almost 100,000,000 records.
Every 2 minutes, I receive 100,000 code strings; they come in an array and are unique among themselves. I need to insert them into the large table if they are all "good".
The meaning of "good" is this:
None of the 100,000 codes in the array ever occurs in the large database table.
If one or more codes occur in the large database table, the whole array is not used at all,
meaning no codes from the array are inserted into the large table.
Currently, I do it this way:
First, I loop over the array and check each code to see if the same code is already in the large database table.
Second, if all the codes are "new", I do the real insert.
But this way is very slow, and I must finish everything within 2 minutes.
I am thinking of other ways:
Join the 100,000 codes into a SQL IN clause; each code is 32 characters long, and I think no database will accept an IN clause that is 32*100,000 characters long.
Use a database transaction: force-insert the codes anyway and roll the transaction back if an error happens. This causes some performance issues.
Use a database temporary table. I am not good at writing SQL queries, so please give me an example if this idea may work.
Now, can any experts give me some advice or solutions?
I am not a native English speaker; I hope you can see the issue I am facing.
Thank you very much.
Load the 100,000 rows into a table!
Create a unique index on the original table:
create unique index unq_bigtable_uniquecodestring on bigtable (unique_code_string);
Now, you have the tools you need. I think I would go for a transaction, something like this:
insert into bigtable ( . . . )
select . . .
from smalltable;
If any row fails (due to the unique index), then the transaction will fail and nothing is inserted. You can also be explicit:
insert into bigtable ( . . . )
select . . .
from smalltable
where not exists (select 1
from smalltable st join
bigtable bt
on st.unique_code_string = bt.unique_code_string
);
For this version, you should also have an index/unique constraint on smalltable(unique_code_string).
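A JDBC sketch of the transactional variant (bigtable and smalltable as in the answer; only the unique column is shown):

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// All-or-nothing: if any row violates the unique index, the whole batch is rolled back.
boolean insertBatchIfAllNew(Connection conn) throws SQLException {
    conn.setAutoCommit(false);
    try (Statement st = conn.createStatement()) {
        st.executeUpdate(
            "INSERT INTO bigtable (unique_code_string) " +
            "SELECT unique_code_string FROM smalltable");
        conn.commit();
        return true;
    } catch (SQLException e) {
        conn.rollback(); // at least one duplicate: nothing was inserted
        return false;
    }
}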
It's hard to find an optimal solution with so little information. Often it depends on the network latency between the application and the database server, and on the available hardware resources.
You can load the 100,000,000 unique_code_string values from the database and use a HashSet or TreeSet to de-duplicate in memory before inserting into the database. If your database server is resource-constrained or there is considerable network latency, this might be faster.
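A sketch of that in-memory check (names are made up; note that 100,000,000 strings of 32 characters need several GB of heap, so this only works if the JVM has the memory):

import java.util.List;
import java.util.Set;

// existingCodes is loaded once from the big table and then kept in sync
// with every batch that gets accepted and inserted.
boolean acceptBatch(Set<String> existingCodes, List<String> batch) {
    for (String code : batch) {
        if (existingCodes.contains(code)) {
            return false; // a single hit rejects the whole batch
        }
    }
    existingCodes.addAll(batch); // keep the cache consistent with the table
    return true;
}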
Depending on how you receive the 100,000-record delta, you could load it into the database; e.g. a CSV file can be read using an external table. If you can get the data efficiently into a temporary table and the database server is not overloaded, you can do this very efficiently with SQL or a stored procedure.
You should spend some time understanding how real-time the update has to be, e.g. how many SQL queries are reading the 100,000,000-row table, and whether you can allow some of these queries to be cancelled or blocked while you update the rows. Often it's a good idea to create a shadow table:
Create a new table as a copy of the existing 100,000,000-row table.
Disable the indexes on the new table.
Load the delta rows into the new table.
Rebuild the indexes on the new table.
Drop the existing table.
Rename the new table to the name of the existing 100,000,000-row table.
The approach here is database specific. It will depend on how your database defines its indexes; e.g. if you have a partitioned table, it might not be necessary.
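The statements are vendor specific, but the sequence could look roughly like this (illustrative only; CREATE TABLE ... AS SELECT, the index DDL, and the RENAME syntax all vary between databases, and the table names are made up):

import java.sql.Connection;
import java.sql.Statement;

// Shadow-table swap: build the new table off to the side, then rename it into place.
void swapInShadowTable(Connection conn) throws Exception {
    try (Statement st = conn.createStatement()) {
        st.executeUpdate("CREATE TABLE bigtable_new AS SELECT * FROM bigtable");
        st.executeUpdate("INSERT INTO bigtable_new SELECT * FROM delta_table");
        st.executeUpdate("CREATE UNIQUE INDEX unq_bigtable_new ON bigtable_new (unique_code_string)");
        st.executeUpdate("DROP TABLE bigtable");
        st.executeUpdate("ALTER TABLE bigtable_new RENAME TO bigtable");
    }
}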

Huge Updates or Inserts into DB2 using Java and JPA 1.0

I have a Java requirement where I have 1500 records that I have to update or insert into the database.
If a record exists with userId, then update it.
If a record does not exist with userId, then Insert it.
And if there is an error in, let's say, the 10th record, I need to get
the error code for that record.
It looks like I have 2 options using JPA 1.0
A) Fire a SELECT to check if the record exists. If yes, fire an UPDATE; if not, fire an INSERT.
B) Always fire an INSERT, and only when I get a unique-record exception, fire an UPDATE query.
Are there any other, more efficient ways? How can this be done with as few queries and as quickly as possible?
Env: Java, JPA 1.0, DB2
You did not specify which version of DB2 you use and on which system. Anyway, check whether the MERGE statement is available in your DB:
LUW from 9.5.0: http://www.ibm.com/support/knowledgecenter/SSEPGG_9.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0010873.html
Z/OS from 10.0.0: http://www.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/sqlref/src/tpc/db2z_sql_merge.html
Another way is to do a DELETE + INSERT for every record (poor performance).
A third option is to dynamically create one DELETE statement with the IDs/keys of the data you are going to update listed in the WHERE clause, fire the delete, and then insert all the data.
The performance of each option will depend on the table specification, indexes, etc.
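If MERGE is available, the upsert could look roughly like this (a sketch; the users table, its columns, and the UserRecord holder are made up for illustration, conn and records are assumed to be in scope, and DB2 usually needs the CASTs around parameter markers in a VALUES source):

// Hypothetical holder for one incoming record.
record UserRecord(int userId, String name) {}

// One MERGE per record: UPDATE when the userId exists, INSERT otherwise.
String sql =
    "MERGE INTO users u " +
    "USING (VALUES (CAST(? AS INTEGER), CAST(? AS VARCHAR(100)))) AS src(user_id, name) " +
    "ON u.user_id = src.user_id " +
    "WHEN MATCHED THEN UPDATE SET u.name = src.name " +
    "WHEN NOT MATCHED THEN INSERT (user_id, name) VALUES (src.user_id, src.name)";
try (PreparedStatement ps = conn.prepareStatement(sql)) {
    for (UserRecord r : records) {
        ps.setInt(1, r.userId());
        ps.setString(2, r.name());
        ps.executeUpdate(); // executing per record keeps per-record error codes
    }
}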
In MySQL you can write the query as below:
-- suppose a is the primary key
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1, b=b+1;
Here the UPDATE will run when a record with primary key a=1 is already present.
See: http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html

SQL - INSERT IF NOT EXIST, CHECK if the same OR UPDATE

I want the DBMS to help me gain speed when doing a lot of inserts.
Today I do an INSERT query in Java and catch the exception if the data is already in the database.
The exception I get is :
SQLite Exception : [19] DB[1] exec() columns recorddate, recordtime are not unique.
If I get an exception I do a SELECT query with the primary keys (recorddate, recordtime) and compare the result with the data I am trying to insert in Java. If it is the same I continue with the next insert, otherwise I evaluate the data, decide what to save, and maybe do an UPDATE.
This process takes time and I would like to speed it up.
I have thought of INSERT IF NOT EXISTS, but that just ignores the insert if there is any row with the same primary keys, am I right? And I want to make sure it is exactly the same data before I ignore the insert.
I would appreciate any suggestions for how to make this faster.
I'm using Java to handle large amounts of data to insert into a SQLite database (SQLite v. 3.7.10). As the connection between Java and SQLite I am using sqlite4java (http://code.google.com/p/sqlite4java/)
I do not think letting the DBMS handle more of that logic would be faster, at least not with plain SQL; as far as I can think of, there is no "create or update" there.
When handling lots of entries, latency is often an important issue, especially with databases accessed via the network, so at least in that case you want to use mass operations wherever possible. Even if provided, a "create or update" instead of a select followed by an update or insert would only halve the latency.
I realize that is not what you asked for, but I would try to optimize in a different way: process chunks of data, select all of them into a map, then partition the input into creates, updates and ignores (see the sketch below). That way the ignores are almost free, and further lookups are guaranteed to happen in memory. It is unlikely that the DBMS can be significantly faster.
If you are unsure whether that is the right approach for you, profiling the overhead times should help.
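A rough sketch of that chunked partitioning (the Row type and its key are hypothetical; 'existing' would come from one bulk SELECT covering the chunk's keys):

import java.util.List;
import java.util.Map;

// Hypothetical value type: key() is the primary key, equals() compares all fields.
record Row(String key, String payload) {}

// Split one chunk of incoming rows into creates, updates and (implicit) ignores.
void partitionChunk(List<Row> chunk, Map<String, Row> existing,
                    List<Row> creates, List<Row> updates) {
    for (Row r : chunk) {
        Row db = existing.get(r.key());
        if (db == null) {
            creates.add(r);       // not in the DB yet: insert later
        } else if (!db.equals(r)) {
            updates.add(r);       // present but different: update later
        }
        // identical rows fall through: ignoring them costs nothing
    }
}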
Wrap all of your inserts and updates into a transaction. In SQL this will be written as follows.
BEGIN;
INSERT OR REPLACE INTO Table(Col1,Col2) VALUES(Val1,Val2);
COMMIT;
There are two things to note here. First, database paging and commits will not be written to disk until COMMIT is called, which speeds up your queries significantly. Second, the INSERT OR REPLACE syntax does precisely what you want for UNIQUE or PRIMARY KEY fields.
Most database wrappers have a special syntax for managing transactions. You can certainly execute a query, BEGIN, followed by your inserts and updates, and finish by executing COMMIT. Read the database wrapper documentation.
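For example, with plain JDBC it would look like this (the asker's sqlite4java has its own API, but the shape is identical; the readings table and the value column are invented around the question's recorddate/recordtime keys):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;

// Hypothetical holder for one measurement.
record Reading(String date, String time, double value) {}

// One transaction around the whole batch; INSERT OR REPLACE resolves
// primary-key clashes on (recorddate, recordtime) in favour of the new row.
void saveBatch(Connection conn, List<Reading> batch) throws Exception {
    conn.setAutoCommit(false); // BEGIN
    String sql = "INSERT OR REPLACE INTO readings(recorddate, recordtime, value) VALUES (?,?,?)";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        for (Reading r : batch) {
            ps.setString(1, r.date());
            ps.setString(2, r.time());
            ps.setDouble(3, r.value());
            ps.addBatch();
        }
        ps.executeBatch();
    }
    conn.commit(); // COMMIT: one disk sync for the whole batch
}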
One more thing you can do is switch to Write-Ahead Logging. Run the following command, only once, on the database.
PRAGMA journal_mode = wal;
Without further information, I would:
BEGIN;
UPDATE table SET othervalues=... WHERE recorddate=... AND recordtime=...;
INSERT OR IGNORE INTO table(recorddate, recordtime, ...) VALUES(...);
COMMIT;
The UPDATE will update all existing rows, skipping nonexistent ones because of the WHERE clause.
The INSERT will then add new rows, skipping existing ones because of IGNORE.

Performance of SELECT query- Oracle/JDBC

I have an existing query in the system, which is a simple select query as follows:
SELECT <COLUMN_X>, <COLUMN_Y>, <COLUMN_Z> FROM TABLE <WHATEVER>
Over time, <WHATEVER> has been growing in terms of records. Is there any possible way to improve the performance here? The developer is using the Statement interface. I believe PreparedStatement won't help here since the query is executed only once.
Is there anything else that can be done? One of the columns is a primary key and the others are VARCHAR (if that information helps)
Does your query have any predicates? Or are you always returning all of the rows from the table?
If you are always returning all the rows, a covering index on column_x, column_y, column_z would allow Oracle to merely scan the index rather than doing a table scan. The query will still slow down over time but the index should grow more slowly than the table.
If you are returning a subset of rows, there are potentially other indexes that would be more advantageous from a performance perspective.
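For the return-all-rows case, the covering index would be a one-off piece of DDL, something like the following (a sketch; whatever stands in for the question's table placeholder):

import java.sql.Connection;
import java.sql.Statement;

// With all three selected columns in the index, Oracle can satisfy the query
// with a (fast) full index scan instead of a full table scan.
void createCoveringIndex(Connection conn) throws Exception {
    try (Statement st = conn.createStatement()) {
        st.executeUpdate(
            "CREATE INDEX whatever_cover_ix ON whatever (column_x, column_y, column_z)");
    }
}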
Are there any optimizations you can do outside of SQL query tuning? If yes, here are some suggestions:
Try putting the table in memory (like the MEMORY storage engine in MySQL) or any other optimization in the DB
Cache the ResultSet in Java, and query again only when the table content changes. If the table only has inserts and no updates or deletes (wishful thinking), then you can use SELECT COUNT(*) FROM table: if the count returned differs from the previous time, fire your original query and update the cache only if needed.
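A sketch of that count-based cache (only valid for insert-only tables, as noted; the table and column names reuse the question's placeholders):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class CachedQuery {
    private long cachedCount = -1;
    private List<String[]> cachedRows = new ArrayList<>();

    // Re-run the expensive SELECT only when the row count has changed.
    List<String[]> fetch(Connection conn) throws Exception {
        long count;
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM whatever")) {
            rs.next();
            count = rs.getLong(1);
        }
        if (count == cachedCount) {
            return cachedRows; // table unchanged: serve from the cache
        }
        List<String[]> rows = new ArrayList<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT column_x, column_y, column_z FROM whatever")) {
            while (rs.next()) {
                rows.add(new String[] { rs.getString(1), rs.getString(2), rs.getString(3) });
            }
        }
        cachedCount = count;
        cachedRows = rows;
        return rows;
    }
}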
