How to change many records in an ORM database quickly? - java

Situation: I need to change many records in a database (10,000 records, for example) using ORMLite DAO. All the changes are in one table, in one column, and only for records that have a specified id.
Question: how do I update many records in the database at once, using ORMLite DAO?
Right now I update records using this code:
imagesDao.update(imageOrmRecord);
But updating records in a loop is very slow (about 100 records/sec).
I suspect the real answer is to update the records with raw SQL, but that is undesirable...

SQL is a set-oriented language. The whole point of an ORM is to abstract this away into objects.
So when you want to update a bunch of objects, you have to go through these objects.
(You have run into the object-relational impedance mismatch; also read The Vietnam of Computer Science.)
ORMLite gives you a backdoor to execute raw SQL:
someDao.executeRaw("UPDATE ...");
But if your only problem is performance, this is likely to be caused by the auto-commit mode, which adds transaction overhead to each single statement. Using callBatchTasks() would fix this.
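A minimal sketch of what that could look like, assuming the question's imagesDao and an ImageRecord entity with a single column to change (all names here are placeholders); note that callBatchTasks() declares throws Exception, so wrap or rethrow as needed:
import java.util.concurrent.Callable;
// Run all per-object updates as one batch task so auto-commit overhead
// is not paid for every single statement.
imagesDao.callBatchTasks(new Callable<Void>() {
    @Override
    public Void call() throws Exception {
        for (ImageRecord record : recordsToUpdate) {
            record.setCategory(newCategory);   // the one-column change from the question
            imagesDao.update(record);
        }
        return null;
    }
});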

Question: how do I update many records in the database at once, using ORMLite DAO?
It depends a bit on what updates you are making. You can certainly use the UpdateBuilder which will make wholesale updates to objects.
UpdateBuilder<Account, String> updateBuilder = accountDao.updateBuilder();
// update the password to be "none"
updateBuilder.updateColumnValue("password", "none");
// only update the rows where password is null
updateBuilder.where().isNull(Account.PASSWORD_FIELD_NAME);
updateBuilder.update();
Or something like:
// update hasDog boolean to true if dogC > 0
updateBuilder.updateColumnExpression("hasDog", "dogC > 0");
You should be able to accomplish a large percentage of the updates that you would do using raw SQL this way.
But if you need to make per-entity updates then you will need to do dao.update(...) for each one. What I'd do then is to do it in a transaction to make the updates go faster. See this answer.
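A hedged sketch of that transaction wrapping, reusing the Account example above (connectionSource and accountsToUpdate are assumed to exist in your code):
// com.j256.ormlite.misc.TransactionManager
// All of the dao.update(...) calls are committed together at the end,
// instead of paying a commit per statement.
TransactionManager.callInTransaction(connectionSource, () -> {
    for (Account account : accountsToUpdate) {
        accountDao.update(account);
    }
    return null;
});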


How to write several items in a table at the same time using ORMLite

I use ORMLite in a solution made up of a server and clients.
On the server side I use PostgreSQL, on the client side I use SQLite.
In code, I use the same ORMLite methods, without caring which DB is being managed (PostgreSQL or SQLite).
Let's say that:
Table A corresponds to class A
I have an ArrayList of objects A
I want to insert all items of the ArrayList into the DB.
Today I use a for() loop and insert them one by one (inside a TransactionManager).
When the items are few there is no problem, but now the items are becoming more numerous and this is probably not the best way, also because I lock the DB for a long time.
I'm looking for a way to insert all the items in one step, so that it is quick and does not lock the DB for a long time. I understood that it should be some sort of stored procedure (I'm not an expert...).
Note that some items could be new (that is, no item with the same primary key id exists yet), so an INSERT must be performed; other items could already exist, so an UPDATE should be performed.
Thank you
I'm looking for a way to insert all the items in one step, so that it is quick and does not lock the DB for a long time.
So there are two ways to do this that I know of: transactions and disabling auto-commit. If you are inserting into the database and it needs to all happen "at once" from a consistency standpoint, transactions are the only way to go. If you just want to insert and update a large number of records with higher performance then you can disable auto-commit, do the operations, and then commit. Depending on the database implementation, this is what the TransactionManager is really doing.
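As a rough sketch (assuming a Dao<A, Integer> called aDao for the class A above), the per-item loop can run inside callBatchTasks(), and createOrUpdate() covers both the new items and the existing ones:
// One commit for the whole list; createOrUpdate() INSERTs rows whose primary key
// id is not in the table yet and UPDATEs the ones that already exist.
aDao.callBatchTasks(() -> {
    for (A item : itemsToStore) {        // itemsToStore is the ArrayList from the question
        aDao.createOrUpdate(item);
    }
    return null;
});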
I understood that it should be some sort of stored procedure...
I don't see how stored procedures help you at all. They aren't magic.
but now the items are becoming more numerous and this is probably not the best way, also because I lock the DB for a long time.
I don't think there is a magic solution to this. If you are pushing a large number of objects to the database and you need the data to be transactional, then locks are going to have to be held during the updates. One thing to realize is that Postgres should handle this a lot better than SQLite. SQLite does not (I don't think) have row-level locking, meaning that the whole DB is paused during transactions. Postgres has a much more mature locking system and should be more performant in this situation. This is also part of why SQLite is so fast in many other operations: it isn't burdened with the lock complexity.
One thing to consider is to rearchitect your schema. Try to figure out the minimal amount of data that needs to be transactionally inserted. For example, maybe just the object relationships need to be changed transactionally, while all of the data can be stored later. For instance, you could have an AccountOwner object which just has 2 ids, while all of the information about the Account can be stored outside of the transaction. This makes your schema more complicated but maybe much faster.
Hope something here helps.
You can use entityManager.merge([list of items]);
the entityManager will insert the list in one shot.
Merge creates the object if it doesn't exist in the database and updates it if it already exists.

SQL - INSERT IF NOT EXIST, CHECK if the same OR UPDATE

I want the DBMS to help me gain speed when doing a lot of inserts.
Today I do an INSERT query in Java and catch the exception if the data is already in the database.
The exception I get is :
SQLite Exception : [19] DB[1] exec() columns recorddate, recordtime are not unique.
If I get an exception I do a SELECT query with the primary keys (recorddate, recordtime) and compare the result with the data I am trying to insert in Java. If it is the same I continue with the next insert, otherwise I evaluate the data and decide what to save, and maybe do an UPDATE.
This process takes time and I would like to speed it up.
I have thought of INSERT IF NOT EXIST, but this just ignores the insert if there is any data with the same primary keys, am I right? And I want to make sure it is exactly the same data before I ignore the insert.
I would appreciate any suggestions for how to make this faster.
I'm using Java to handle a large amount of data to insert into a SQLite database (SQLite v. 3.7.10). As the bridge between Java and SQLite I am using sqlite4java (http://code.google.com/p/sqlite4java/).
I do not think letting the DBMS handle more of that logic would be faster, at least not with plain SQL; as far as I know there is no "create or update" there.
When handling lots of entries, latency is often an important issue, especially with DBs accessed over a network, so at least in that case you want to use mass operations wherever possible. Even if provided, a "create or update" instead of a select followed by an update or insert would only halve the latency.
I realize that is not what you asked for, but I would try to optimize in a different way: process chunks of data, select all of them into a map, then partition the input into creates, updates and ignores. That way ignores are almost free, and further lookups are guaranteed to be done in memory. It is unlikely that the DBMS can be significantly faster.
If you are unsure whether that is the right approach for you, profiling the overhead should help.
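A rough illustration of that chunked partitioning in plain JDBC (the question uses sqlite4java, but the idea is the same; the measurements table, the Record holder class with its key() and sameData() helpers, and the chunk bounds are all made up for the example):
import java.sql.*;
import java.util.*;
// Load the stored rows of one chunk into memory, then split the incoming
// records into creates, updates, and ignores without further round trips.
static void partitionChunk(Connection conn, List<Record> incoming,
                           String chunkStart, String chunkEnd) throws SQLException {
    Map<String, Record> existing = new HashMap<>();
    try (PreparedStatement select = conn.prepareStatement(
            "SELECT recorddate, recordtime, value FROM measurements "
            + "WHERE recorddate BETWEEN ? AND ?")) {
        select.setString(1, chunkStart);
        select.setString(2, chunkEnd);
        try (ResultSet rs = select.executeQuery()) {
            while (rs.next()) {
                Record r = new Record(rs.getString(1), rs.getString(2), rs.getString(3));
                existing.put(r.key(), r);          // key() = recorddate + "|" + recordtime
            }
        }
    }
    List<Record> toInsert = new ArrayList<>();
    List<Record> toUpdate = new ArrayList<>();
    for (Record in : incoming) {
        Record old = existing.get(in.key());
        if (old == null) {
            toInsert.add(in);                      // not in the DB yet
        } else if (!old.sameData(in)) {
            toUpdate.add(in);                      // present but different
        }                                          // identical rows are ignored for free
    }
    // ... then run toInsert / toUpdate as batched statements in one transaction
}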
Wrap all of your inserts and updates into a transaction. In SQL this will be written as follows.
BEGIN;
INSERT OR REPLACE INTO Table(Col1,Col2) VALUES(Val1,Val2);
COMMIT;
There are two things to note here: first, the database paging and commits will not be written to disk until COMMIT is called, which speeds up your queries significantly; second, the INSERT OR REPLACE syntax does precisely what you want for UNIQUE or PRIMARY KEY fields.
Most database wrappers have a special syntax for managing transactions. You can certainly execute a query, BEGIN, followed by your inserts and updates, and finish by executing COMMIT. Read the database wrapper documentation.
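In plain JDBC against SQLite (for example with the sqlite-jdbc driver; the question's sqlite4java wrapper has its own way of issuing BEGIN/COMMIT), the wrapping might look roughly like this, with an invented readings table and a hypothetical Reading holder:
import java.sql.*;
// Turning off auto-commit makes the driver run one transaction around
// all the statements; everything is written to disk once, at commit().
try (Connection conn = DriverManager.getConnection("jdbc:sqlite:data.db")) {
    conn.setAutoCommit(false);
    try (PreparedStatement ps = conn.prepareStatement(
            "INSERT OR REPLACE INTO readings(recorddate, recordtime, value) VALUES(?, ?, ?)")) {
        for (Reading r : readings) {
            ps.setString(1, r.date);
            ps.setString(2, r.time);
            ps.setDouble(3, r.value);
            ps.addBatch();
        }
        ps.executeBatch();
    }
    conn.commit();
}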
One more thing you can do is switch to Write-Ahead Logging. Run the following command, only once, on the database.
PRAGMA journal_mode = wal;
Without further information, I would:
BEGIN;
UPDATE table SET othervalues=... WHERE recorddate=... AND recordtime=...;
INSERT OR IGNORE INTO table(recorddate, recordtime, ...) VALUES(...);
COMMIT;
UPDATE will update all existing rows, ignoring non-existent ones because of the WHERE clause.
INSERT will then add the new rows, ignoring existing ones because of IGNORE.

How to make updates faster with Hibernate when using a huge number of records

I am facing an issue with the Hibernate update (Session.update()) portion when handling a huge number of records: it is becoming very slow. There is no such issue with the insert (Session.save()) portion. Is there any way to do the update portion when we are updating lakhs of records? Is there any way to tune SQL Server so that the update becomes faster? When we add separate indexes to all the primary fields, the delete portion takes a long time. Is there any better way to tune SQL Server so that it performs well with insert, delete and update?
Thank you,
Saif.
Do a batch update instead of an individual update for each record. This way you will only hit the database once for all the records.
When you do a save, only the data is inserted into the database, whereas when you update a record it first has to perform a search operation and then update the record. That is why you are facing issues on update and not on save when handling a huge number of records. You can use Hibernate's batch processing to update your records. Here is a good link on batch processing in Hibernate from TutorialsPoint:
http://www.tutorialspoint.com/hibernate/hibernate_batch_processing.htm
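A sketch of that flush-and-clear batching pattern (the Account entity, the status change, and the batch size of 50 are illustrative only; hibernate.jdbc.batch_size should usually be set to a similar value):
// accountsToUpdate is assumed to already hold the entities that need changing.
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
int count = 0;
for (Account account : accountsToUpdate) {
    account.setStatus("PROCESSED");    // hypothetical per-entity change
    session.update(account);
    if (++count % 50 == 0) {
        session.flush();               // push the pending updates as one JDBC batch
        session.clear();               // detach them so the first-level cache stays small
    }
}
tx.commit();
session.close();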
There may be other solutions to this but one way I know is:
Whenever you save or update an instance through the session (e.g. session.save(), session.update(), session.saveOrUpdate(), etc.), it also updates the instance's FK associations.
So if your POJO has multiple FK associations, it will fire queries on those tables as well.
So instead of updating instances this way, I would suggest using HQL (if it applies to your requirement) to save or update them.
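For example, a bulk HQL update along these lines (the Account entity, its fields, and the cutoff date are placeholders) runs as a single UPDATE statement and skips per-entity dirty checking entirely:
Transaction tx = session.beginTransaction();
// One UPDATE against the mapped table, bypassing the session cache.
int updated = session.createQuery(
        "update Account a set a.status = :status where a.lastLogin < :cutoff")
        .setParameter("status", "INACTIVE")
        .setParameter("cutoff", cutoffDate)
        .executeUpdate();
tx.commit();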

Is Select Before Update a good approach, or vice versa?

I am developing an application using a plain JDBC connection. The application is developed with Java/Java EE Spring MVC 3.0 and SQL Server 2008 as the database. I am required to update a table based on a non-primary-key column.
Now, before updating the table we had to decide on an approach, as the table may contain a huge amount of data. The update query will be executed in a batch, and we are required to design the application in a manner that doesn't hog system resources.
We had to decide between either of these approaches:
1. SELECT DATA BEFORE YOU UPDATE or
2. UPDATE DATA AND THEN SELECT MISSING DATA.
Select data before update is only beneficial if the chance of failure is high, i.e. if a batch of 100 update queries is executed and only 20 rows are updated successfully, then this approach should be taken.
Update data and then check for missing data is beneficial only when failed records are far fewer. With this approach one database select call can be avoided, i.e. after a batch update, the count of updated records can be taken and the select query executed if and only if that count does not match the number of queries.
We are totally unaware of the conditions in the production environment, but we want to account for all possibilities and want a faster system. I need your input on which is the better approach.
Since there is a 50:50 chance of successful updates or faster selects, it's hard to tell from the scenario described. You would probably want a fuzzy-logic approach: keep constant feedback on how many updates were successful over a period of time, and then decide on the basis of that data whether to do an update before select or a select before update.
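If it helps, the "count of records updated" that the second approach relies on can be read straight from JDBC's executeBatch() return value; a small sketch with an invented orders table and a hypothetical OrderChange holder (note that some drivers return Statement.SUCCESS_NO_INFO instead of real row counts):
import java.sql.*;
// After the batch runs, count how many updates actually matched a row, and only
// fall back to a SELECT for the missing ones when the count does not add up.
try (PreparedStatement ps = conn.prepareStatement(
        "UPDATE orders SET status = ? WHERE tracking_code = ?")) {   // non-primary-key column
    for (OrderChange change : changes) {
        ps.setString(1, change.status);
        ps.setString(2, change.trackingCode);
        ps.addBatch();
    }
    int[] counts = ps.executeBatch();
    int updatedRows = 0;
    for (int c : counts) {
        if (c > 0) updatedRows++;      // 0 means no row matched this particular update
    }
    if (updatedRows < changes.size()) {
        // only now run a SELECT for the keys that were missed, then decide what to do
    }
}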

Accessing database multiple times

I am working on the solution described below but could not find any best practice or tool for it.
For a batch of requests (say 5,000 unique ids and records) received by the web service, it has to fetch the rows for those unique ids from the database, keep them in a buffer (or cache), and compare them with the records received in the web service call. If there is a change in a particular piece of data (say a column), it is updated in the table for that unique id, and in turn the child tables of that table are also affected. For example, if someone changes his laptop model number and country, the model number will be updated in one table and the country value in another. It goes on like this, accessing multiple tables in a short time. The records coming in a web service call might reach a maximum of 70K in one call in an hour.
I don't have any option other than implementing it in Java. Is there any good practice for implementing this, or can it be achieved using any open-source Java tools? Please suggest. Thanks.
Hibernate is likely the first thing you should try. I tend to avoid it because it is overkill for most of my applications, but it is a standard tool for accessing databases that anyone who knows Java should at least have an understanding of. There are dozens of other solutions you could use, but Hibernate is the most often used.
JDBC is the API to use to access relational databases. Useful performance and security tips:
use prepared statements
use where ... in () queries to load many rows at once (see the sketch after this list), but beware of the limit on the number of values in the in clause (1000 max in Oracle)
use batched statements for your updates, rather than executing each update separately (see http://download.oracle.com/javase/1.3/docs/guide/jdbc/spec2/jdbc2.1.frame6.html)
See http://download.oracle.com/javase/tutorial/jdbc/ for a tutorial on JDBC.
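A small sketch of the where ... in () loading from the tips above, in plain JDBC (the laptops table, its columns, and the chunking are invented; keep each chunk of ids below the database's limit on in-list size):
import java.sql.*;
import java.util.*;
// Load one chunk of rows by id in a single round trip.
static Map<Long, String> loadChunk(Connection conn, List<Long> ids) throws SQLException {
    String placeholders = String.join(",", Collections.nCopies(ids.size(), "?"));
    String sql = "SELECT id, model_number FROM laptops WHERE id IN (" + placeholders + ")";
    Map<Long, String> rows = new HashMap<>();
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        for (int i = 0; i < ids.size(); i++) {
            ps.setLong(i + 1, ids.get(i));
        }
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                rows.put(rs.getLong(1), rs.getString(2));
            }
        }
    }
    return rows;
}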
This does not sound that complicated. Of course, you must know (or learn):
SQL
JDBC
Then you can go through the web service data record by record and for each record do the following:
fetch the corresponding database record
for each field in the record
    if it was updated
        execute the corresponding update SQL statement
commit // every so many records
70K records per hour should not be the slightest problem for a decent RDBMS.
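A rough Java rendering of that outline, with invented table, column and helper names (IncomingRecord stands for whatever the web service delivers):
// For each incoming record: load the stored row, compare field by field,
// update only what changed, and commit every so many records.
conn.setAutoCommit(false);
try (PreparedStatement select = conn.prepareStatement(
             "SELECT model_number, country FROM laptops WHERE id = ?");
     PreparedStatement update = conn.prepareStatement(
             "UPDATE laptops SET model_number = ?, country = ? WHERE id = ?")) {
    int pending = 0;
    for (IncomingRecord in : incoming) {
        select.setLong(1, in.id);
        try (ResultSet rs = select.executeQuery()) {
            if (rs.next() && (!rs.getString(1).equals(in.modelNumber)
                    || !rs.getString(2).equals(in.country))) {
                update.setString(1, in.modelNumber);
                update.setString(2, in.country);
                update.setLong(3, in.id);
                update.executeUpdate();
                if (++pending % 500 == 0) {
                    conn.commit();             // commit every so many records
                }
            }
        }
    }
    conn.commit();
}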
