How to check if I'm inserting in the db from Java

Just looking for some guidance here, please!
I want to insert some values into my db from Java.
I have my Oracle prepared statement and so on, and it inserts fine, but my other requirement is to send via email the fields that for some reason weren't inserted.
So I'm thinking of making my method return an int: 0 if there's an error, 1 if it's ok, 2 if some fields didn't insert... etc. BUT
I'm lost figuring out how to tie it together. I mean, if there's a SQL error, in the catch block in Java I can put something like:
catch (SQLException e) {
    e.printStackTrace();
    return 0;   // signal the error to the caller
}
And obviously I can put an error message and so on where I call it, but how will I know at which item it stopped, and then retrieve the rest of the fields that weren't inserted?
Any ideas? Does anyone have an example?
Thanks in advance

First of all, try to do as much validation as possible in Java and identify the bad records before persisting them in the database; you can catch those records before they hit the database and send the email.
In case you are worried about db-level issues, e.g. constraints, then you need to choose between performance and flexibility:
Option 1: Don't use batching; insert one record at a time. Whatever fails, you can send an email about it. This is bad from a performance perspective but will solve your purpose.
Option 2: Divide your records into batches of 50 or 100; whatever batch fails, you report that the whole batch failed. This will also report some good records, but you will have information about what was not persisted.
Option 3: Divide your records into batches; if a batch fails, try saving one record at a time for that particular batch. That way you will be able to identify the bad record and performance will be optimized. This will require more coding and testing (see the sketch below).
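A rough sketch of Option 3 with plain JDBC might look like this; the Record type, the my_table columns and the sendEmail helper are made-up placeholders, not something from the question:
void insertBatchOrReport(Connection connection, List<Record> batch) {
    List<Record> failed = new ArrayList<>();
    String sql = "INSERT INTO my_table (field1, field2) VALUES (?, ?)"; // placeholder table/columns
    try (PreparedStatement ps = connection.prepareStatement(sql)) {
        for (Record r : batch) {
            ps.setString(1, r.getField1());
            ps.setString(2, r.getField2());
            ps.addBatch();
        }
        ps.executeBatch();                      // try the whole batch first
    } catch (SQLException batchFailure) {
        // The batch failed: retry each record on its own to find the bad ones.
        for (Record r : batch) {
            try (PreparedStatement ps = connection.prepareStatement(sql)) {
                ps.setString(1, r.getField1());
                ps.setString(2, r.getField2());
                ps.executeUpdate();
            } catch (SQLException singleFailure) {
                failed.add(r);                  // collect it for the notification email
            }
        }
    }
    if (!failed.isEmpty()) {
        sendEmail(failed);                      // assumed helper that reports the bad records
    }
}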
So, you can choose whatever suits you, but I would recommend the following:
Do maximum validation in Java.
Split into smaller batches and use Option 3.
Cheers !!

When you run an insert, either all the fields included in the statement are inserted, or none are. So if an exception is thrown when running your statement, you would send an email.
If no exception is thrown, then you can assume that it has worked.
Alternatively, after running the insert, you could attempt a select to check whether the data you inserted is actually there.
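A minimal sketch of that verification step, assuming an id primary key and a placeholder table name:
boolean rowExists(Connection conn, int id) throws SQLException {
    // Check whether the row we just tried to insert actually landed.
    String sql = "SELECT 1 FROM my_table WHERE id = ?";    // placeholder table/column
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setInt(1, id);
        try (ResultSet rs = ps.executeQuery()) {
            return rs.next();
        }
    }
}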

If the statement fails then none of your fields will be inserted, because Oracle internally does a rollback to maintain consistency in your data.

Related

Workaround for merge using update in hibernate

I have a scenario where I am trying to insert and update some rows in my Informix table. I am using Hibernate.
I have observed that calling the merge operation is not as efficient as calling the update operation.
But I can't always use the update operation, as we may have a scenario where a particular item is inserted and then updated later. In that case, if I use the update operation I get a NonUniqueObjectException (a different object with the same identifier is already in the session).
I wrote a workaround for that and it seems to be working. I want to know if there are any problems with the code below that I may face in the future, or if I am missing something.
I have a way of telling whether a particular record needs to be updated or inserted.
My update code looks like this:
try {
    session.update(entity);
} catch (Exception e) {
    session.merge(entity);
}
I know there are going to be very few scenarios where the code in the catch block is executed, so I am okay with this. Is this going to cause some unexpected problems?
Please keep in mind that I don't want to use the merge operation unless absolutely necessary, since merge issues a select query before performing the update and that is causing me performance issues.

Optimization for fetching from a bulky table

I have a PostgreSQL table that has millions of records. I need to process every row, and for that I am using a column in that table, 'isProcessed'; by default it is false, and when I process a row I change it to true.
Now the problem is that there are too many records, and due to exceptions the code bypasses some records, leaving them with isProcessed=false, and that makes the execution really slow.
I was thinking of using an index, but with a boolean it does not help.
Please provide some optimization technique or some better practice.
UPDATE:
I don't have the code; it's just a problem my colleagues were asking my opinion about.
Normally an index on a boolean isn't a good idea, but in PostgreSQL you can create an index that contains entries for only one value using a partial index (http://www.postgresql.org/docs/9.3/interactive/indexes-partial.html). It ends up being a queue of things for you to process; items drop off once they are done.
CREATE INDEX "yourtable_isProcessed_idx" ON "public"."yourtable"
USING btree ("isProcessed")
WHERE (isProcessed IS NOT TRUE);
This will make life easier when it is looking for the next item to process. Ideally you should be processing more than one at a time, particularly if you can do it in a single query, though doing millions at once may be prohibitive. In that situation, you might be able to do
update yourtable
set ....
where id in (select id from yourtable where isProcessed = false limit 100 )
If you have to do things one at a time, I'd still limit what you retrieve, so potentially retrieve
select id from yourtable where isProcessed = false limit 1
Without seeing your code, it would be tough to say what is really going on. Doing any processing row by row, which it sounds like is what is going on, is going to take a VERY long time.
Generally speaking, the best way to work with data is in sets. At the end of your process, you're going to ultimately have a set of records where isProcessed needs to be true (where the operation was successful), and a set where isProcessed needs to be false (where the operation failed). As you process the data, keep track of which records could be updated successfully, as well as which could not be updated. You could do this by making a list or array of the primary key or whatever other data you use to identify the rows. Then, after you're done processing your data, do one update to flag the records that were successful, and one to update the records that were not successful. This will be a bit more code, but updating each row individually after you process it is going to be awfully slow.
Again, seeing code would help, but if you're updating each record after you process it, I suspect this is what's slowing you down.
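A minimal JDBC sketch of that set-based flagging, assuming an integer id primary key; the Row type and the process method are placeholders for your own logic:
List<Integer> succeeded = new ArrayList<>();
List<Integer> failed = new ArrayList<>();
for (Row row : rows) {
    try {
        process(row);                       // your per-row logic (placeholder)
        succeeded.add(row.getId());
    } catch (Exception e) {
        failed.add(row.getId());
    }
}
// One round trip to flag everything that worked, instead of one UPDATE per row.
try (PreparedStatement ps = conn.prepareStatement(
        "UPDATE yourtable SET isProcessed = true WHERE id = ANY (?)")) {
    ps.setArray(1, conn.createArrayOf("integer", succeeded.toArray()));
    ps.executeUpdate();
}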
Here is the approach I use. You should be able to store the processing state, including errors. It can be one column with the values PENDING, PROCESSED, ERROR, or two columns, is_processed and is_error.
This lets you skip records which couldn't be successfully processed and which, if not skipped, slow down the processing of good tasks. You may try to reprocess them later, or give DevOps the possibility to move tasks from the ERROR to the PENDING state if the failure was caused by, for example, a temporarily unavailable resource.
Then you create a conditional (partial) index on the table which includes only PENDING tasks.
Processing is done using the following algorithm (using Spring: transaction and nestedTransaction are Spring transaction templates):
while (!(batch = getNextBatch()).isEmpty()) {
    transaction.execute((TransactionStatus status) -> {
        for (Element element : batch) {
            try {
                nestedTransaction.execute((TransactionStatus nestedStatus) -> {
                    processElement(element);
                    markAsProcessed(element);
                });
            } catch (Exception e) {
                markAsFailed(element);
            }
        }
    });
}
Several important notes:
Fetching the records is done in batches; this at least saves round trips to the database and is quicker than one-by-one retrieval.
Processing of individual elements is done in a nested transaction (implemented using PostgreSQL SAVEPOINTs). This is quicker than processing each element in its own transaction, and has the benefit that a failure in processing one element will not lose the results of processing the other elements in the batch.
This is good when processing is complex enough that it cannot be done in SQL with a single query per batch. If processElement is a rather simple update of the element, then the whole batch may be updated via a single UPDATE statement.
Processing of the elements of a batch may be done in parallel. This requires propagating the transaction to the worker threads.

SQL - INSERT IF NOT EXIST, CHECK if the same OR UPDATE

I want the DBMS to help me gain speed when doing a lot of inserts.
Today I do an INSERT query in Java and catch the exception if the data is already in the database.
The exception I get is:
SQLite Exception : [19] DB[1] exec() columns recorddate, recordtime are not unique.
If I get an exception I do a SELECT query with the primary keys (recorddate, recordtime) and compare the result with the data I am trying to insert, in Java. If it is the same I continue with the next insert; otherwise I evaluate the data, decide what to save, and maybe do an UPDATE.
This process takes time and I would like to speed it up.
I have thought of INSERT IF NOT EXISTS, but that just ignores the insert if there is any data with the same primary keys, am I right? And I want to make sure it is exactly the same data before I ignore the insert.
I would appreciate any suggestions for how to make this faster.
I'm using Java to handle a large amount of data to insert into an SQLite database (SQLite v3.7.10). As the connection between Java and SQLite I am using sqlite4java (http://code.google.com/p/sqlite4java/).
I do not think letting the DBMS handle more of that logic would be faster, at least not with plain SQL; as far as I can think of, there is no "create or update" there.
When handling lots of entries, latency is often an important issue, especially with DBs accessed via the network, so at least in that case you want to use mass operations wherever possible. Even if it were provided, a "create or update" instead of a select followed by an update or insert would only halve the latency.
I realize that this is not what you asked for, but I would try to optimize in a different way: process chunks of data, select all of them into a map, then partition the input into creates, updates and ignores. That way the ignores are almost free, and the further lookups are guaranteed to be done in memory. It is unlikely that the DBMS can be significantly faster.
If you are unsure whether that is the right approach for you, profiling the overhead should help.
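A rough sketch of that chunked approach; the Record, Row and Key types and the select/insert/update helpers are placeholders for illustration only:
// For each chunk of incoming data, load the existing rows in one query,
// then decide in memory what to insert, update, or ignore.
Map<Key, Row> existing = selectByKeys(chunk.keys());    // one SELECT ... WHERE key IN (...)
List<Record> toInsert = new ArrayList<>();
List<Record> toUpdate = new ArrayList<>();
for (Record incoming : chunk) {
    Row current = existing.get(incoming.key());
    if (current == null) {
        toInsert.add(incoming);              // not in the database yet
    } else if (!current.sameDataAs(incoming)) {
        toUpdate.add(incoming);              // present but different
    }                                        // else: identical, ignore for free
}
insertBatch(toInsert);                       // batched INSERT
updateBatch(toUpdate);                       // batched UPDATE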
Wrap all of your inserts and updates into a transaction. In SQL this will be written as follows.
BEGIN;
INSERT OR REPLACE INTO Table(Col1,Col2) VALUES(Val1,Val2);
COMMIT;
There are two things to note here: first, the database pages and commits are not written to disk until COMMIT is called, which speeds up your queries significantly; second, the INSERT OR REPLACE syntax does precisely what you want for UNIQUE or PRIMARY KEY fields.
Most database wrappers have a special syntax for managing transactions. You can certainly execute a query, BEGIN, followed by your inserts and updates, and finish by executing COMMIT. Read the database wrapper documentation.
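For example, with plain JDBC the idea looks roughly like this (sqlite4java has its own calls for executing statements, so treat this only as an illustration; the Row type and rows collection are placeholders):
connection.setAutoCommit(false);             // behaves like BEGIN
try (PreparedStatement ps = connection.prepareStatement(
        "INSERT OR REPLACE INTO Table(Col1, Col2) VALUES (?, ?)")) {
    for (Row row : rows) {
        ps.setString(1, row.getCol1());
        ps.setString(2, row.getCol2());
        ps.executeUpdate();
    }
}
connection.commit();                         // everything is written to disk once, here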
One more thing you can do is switch to Write-Ahead Logging. Run the following command, only once, on the database.
PRAGMA journal_mode = wal;
Without further information, I would:
BEGIN;
UPDATE table SET othervalues=... WHERE recorddate=... AND recordtime=...;
INSERT OR IGNORE INTO table(recorddate, recordtime, ...) VALUES(...);
COMMIT;
The UPDATE will update all existing rows, ignoring non-existent ones because of the WHERE clause.
The INSERT will then add new rows, ignoring existing ones because of IGNORE.

Accessing database multiple times

I am working on a solution for the scenario described below but could not find any best practice/tool for it.
For a batch of requests (say 5000 unique ids and records) received in a web service, it has to fetch the rows for those unique ids from the database, keep them in a buffer (or cache), and compare them with the records received in the web service. If a particular piece of data (say a column) has changed, it is updated in the table for that unique id, and in turn the child tables of that table are also affected. For example, if someone changes his laptop model number and country, the model number is updated in one table and the country value in another table. Likewise it goes on accessing multiple tables in a short time. The maximum number of records coming in a web service call might reach 70K in one call in an hour.
I don't have any other option than implementing it in Java. Is there any good practice for implementing this, or can it be achieved using any open-source Java tools? Please suggest. Thanks.
Hibernate is likely to be the first thing you should try. I tend to avoid it because it is overkill for most of my applications, but it is a standard tool for accessing databases that anyone who knows Java should at least have an understanding of. There are dozens of other solutions you could use, but Hibernate is the most often used.
JDBC is the API to use to access a relational database. Useful performance and security tips:
use prepared statements
use where ... in () queries to load many rows at once, but beware of the limit on the number of values in the in clause (1000 max in Oracle)
use batched statements for your updates, rather than executing each update separately (see http://download.oracle.com/javase/1.3/docs/guide/jdbc/spec2/jdbc2.1.frame6.html)
See http://download.oracle.com/javase/tutorial/jdbc/ for a tutorial on JDBC.
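To make those tips concrete, here is a rough JDBC sketch; the laptops table, the Request type and the column names are invented for illustration:
// Load many rows at once with a single WHERE ... IN query
// (mind the per-database limit on the number of IN values).
String placeholders = String.join(",", Collections.nCopies(ids.size(), "?"));
Map<Long, String> current = new HashMap<>();
try (PreparedStatement ps = conn.prepareStatement(
        "SELECT id, model_number FROM laptops WHERE id IN (" + placeholders + ")")) {
    for (int i = 0; i < ids.size(); i++) {
        ps.setLong(i + 1, ids.get(i));
    }
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            current.put(rs.getLong("id"), rs.getString("model_number"));
        }
    }
}
// Apply all changed rows as one batched update instead of one statement per row.
try (PreparedStatement ps = conn.prepareStatement(
        "UPDATE laptops SET model_number = ? WHERE id = ?")) {
    for (Request req : requests) {
        if (!req.getModelNumber().equals(current.get(req.getId()))) {
            ps.setString(1, req.getModelNumber());
            ps.setLong(2, req.getId());
            ps.addBatch();
        }
    }
    ps.executeBatch();
}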
This does not sound that complicated. Of course, you must know (or learn):
SQL
JDBC
Then you can go through the web service data record by record and for each record do the following:
fetch corresponding database record
for each field in record
if updated
execute corresponding update SQL statement
commit // every so many records
70K records per hour should not be the slightest problem for a decent RDBMS.
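A minimal JDBC rendering of that loop; fetchRow, diff and executeUpdate are placeholder helpers, and the commit interval of 500 is just a guess:
conn.setAutoCommit(false);
int processed = 0;
for (Record record : webServiceRecords) {
    Row dbRow = fetchRow(conn, record.getId());            // fetch corresponding database record
    for (FieldChange change : diff(dbRow, record)) {        // compare field by field
        executeUpdate(conn, change);                        // one UPDATE per changed field
    }
    if (++processed % 500 == 0) {
        conn.commit();                                      // commit every so many records
    }
}
conn.commit();                                              // flush the remainder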

MySQL Updates are taking forever

Hey, I'm trying to write about 600,000 tokens into my MySQL database table. The engine I'm using is InnoDB. The update process is taking forever :(. So my best guess is that I'm totally missing something in my code and that what I'm doing is just plain stupid.
Perhaps someone has a spontaneous idea about what seems to be eating my performance:
Here is my code:
public void writeTokens(Collection<Token> tokens) {
    try {
        PreparedStatement updateToken = dbConnection.prepareStatement(
                "UPDATE tokens SET `idTag`=?, `Value`=?, `Count`=?, `Frequency`=? WHERE `idToken`=?;");
        for (Token token : tokens) {
            updateToken.setInt(1, 0);
            updateToken.setString(2, token.getWord());
            updateToken.setInt(3, token.getCount());
            updateToken.setInt(4, token.getFrequency());
            updateToken.setInt(5, token.getNounID());
            updateToken.executeUpdate();
        }
    } catch (SQLException e) {
        e.printStackTrace();
    }
}
Thanks a lot!
I don't have a Java-specific answer for you, but wrap the whole shebang in a transaction. If you don't, then MySQL (when writing against InnoDB) starts and commits a new transaction per update statement.
Just execute START TRANSACTION before you start your calls, and execute COMMIT after all your updates/inserts are done. I also think that MySQL defers index updates until the end of the transaction, which should help improve performance considerably if you're updating indexed fields.
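In JDBC terms that amounts to something like this around the existing loop (a rough sketch; exception handling omitted):
dbConnection.setAutoCommit(false);   // behaves like START TRANSACTION
writeTokens(tokens);                 // the existing update loop now runs in one transaction
dbConnection.commit();               // a single COMMIT for all updates
dbConnection.setAutoCommit(true);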
If you have an index on one or more of the fields in your table, each update forces a rebuild of those indices, which may in fact take a while as you approach several hundred thousand entries.
PreparedStatement comes with an addBatch() method - I haven't used it, but if I understand it correctly, you can transmit several batches of records to your prepared statement and then update in one go. This reduces the number of index rebuilds from 600,000 to 1 - you should feel the difference :)
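Applied to the method above, the loop would change roughly like this (same statement, just batched; the batch size of 1000 is an arbitrary choice):
PreparedStatement updateToken = dbConnection.prepareStatement(
        "UPDATE tokens SET `idTag`=?, `Value`=?, `Count`=?, `Frequency`=? WHERE `idToken`=?;");
int count = 0;
for (Token token : tokens) {
    updateToken.setInt(1, 0);
    updateToken.setString(2, token.getWord());
    updateToken.setInt(3, token.getCount());
    updateToken.setInt(4, token.getFrequency());
    updateToken.setInt(5, token.getNounID());
    updateToken.addBatch();                 // queue the update instead of executing it
    if (++count % 1000 == 0) {
        updateToken.executeBatch();         // send 1000 queued updates in one go
    }
}
updateToken.executeBatch();                 // flush whatever is left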
Each update statement requires a roundtrip to the database. This will give you a huge performance hit.
There are a couple of ways you can insert this data into the database without performing hundreds of thousands of queries:
Use a bulk insert (LOAD DATA INFILE).
Use a single insert statement that inserts multiple rows at once. You could for example insert 100 rows per insert statement.
Then you can use a single update statement to copy the data into the target table. This will reduce the number of server roundtrips, improving the performance.
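A sketch of that last step, assuming the new values have already been bulk-loaded into a staging table (the name tokens_staging is made up):
try (Statement st = dbConnection.createStatement()) {
    // After bulk-loading the new values into tokens_staging (via LOAD DATA INFILE
    // or multi-row INSERTs), copy them into the real table in a single statement.
    st.executeUpdate(
        "UPDATE tokens t JOIN tokens_staging s ON t.idToken = s.idToken " +
        "SET t.`Value` = s.`Value`, t.`Count` = s.`Count`, t.`Frequency` = s.`Frequency`");
}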
