Gentlemen/ladies,
I've got a problem with concurrent updates of the same entity.
Process 1 obtains a collection of objects. For performance reasons this process doesn't use Hibernate to retrieve the data (which sounds a bit far-fetched to me). It also updates some fields of some objects in the collection, using Hibernate.
Process 2 obtains an object equivalent to one of those in the collection (basically the same row in the DB) and updates it somehow. This process uses Hibernate.
Since processes 1 and 2 don't know about each other, they can update the same entity, leaving it in an inconsistent state.
For example:
process 1 obtains the collection
process 2 obtains one entity and removes some of its properties, along with an entity it was linked to
process 1 comes back, tries to save that entity, and gets an entity-not-found exception
I need to deal with this situation.
So what can be done?
For now I see two options:
create a layer above the database that keeps track of every entity in the system, effectively prohibiting the creation of multiple instances of the same entity
set up optimistic locking; since some entities are not obtained through Hibernate, I'd need to implement it somewhat differently
Any ideas would be very helpful.
Thanks in advance.
Since processes 1 and 2 don't know about each other, they can update the same entity, leaving it in an inconsistent state.
I'd reformulate that: both processes can update the same data. Only Hibernate knows about the entities; the other process seems to access the data via plain JDBC.
I'd go for option 2, which involves adding a version column to your entities.
IIRC, Hibernate then adds a WHERE version = x condition to its update queries and checks whether all rows were updated; if not, an OptimisticLockException is thrown. You can do the same in your JDBC queries, i.e. UPDATE ... SET ..., version = x + 1 ... WHERE version = x AND additionalConditions, and check the number of updated rows returned by JDBC.
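A minimal sketch of the JDBC side of that idea; the table and column names here are made up for illustration:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class OptimisticJdbcUpdate {

    // Hypothetical table/columns; adapt to your schema.
    static final String UPDATE_SQL =
            "UPDATE account SET name = ?, version = version + 1 "
            + "WHERE id = ? AND version = ?";

    /**
     * Attempts a versioned update. Returns true only if exactly one row
     * matched, i.e. nobody else bumped the version since we read the row.
     */
    static boolean updateWithVersion(Connection conn, long id,
                                     String newName, int expectedVersion)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(UPDATE_SQL)) {
            ps.setString(1, newName);
            ps.setLong(2, id);
            ps.setInt(3, expectedVersion);
            // 0 updated rows means someone else won the race:
            // treat it like Hibernate's OptimisticLockException.
            return ps.executeUpdate() == 1;
        }
    }
}
```

If updateWithVersion returns false, re-read the row and retry or report a conflict, mirroring what Hibernate does when its version check fails.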
Related
Situation: I need to change many records in a database (10,000 records, for example) using ORMLite DAO. All the changes are in one table, in one column, for records with specified ids.
Question: how can I update many records in the database at once using ORMLite DAO?
Currently I update records using this code:
imagesDao.update(imageOrmRecord);
But updating records in a loop is very slow (about 100 records/sec).
I think updating the records via raw SQL would be faster, but that is undesirable...
SQL is a set-oriented language. The whole point of an ORM is to abstract this away into objects.
So when you want to update a bunch of objects, you have to go through these objects.
(You have run into the object-relational impedance mismatch; also read The Vietnam of Computer Science.)
ORMLite gives you a backdoor to execute raw SQL:
someDao.executeRaw("UPDATE ...");
But if your only problem is performance, it is likely caused by auto-commit mode, which adds transaction overhead to every single statement. Using callBatchTasks() would fix this.
Question: how can I update many records in the database at once using ORMLite DAO?
It depends a bit on what updates you are making. You can certainly use the UpdateBuilder which will make wholesale updates to objects.
UpdateBuilder<Account, String> updateBuilder = accountDao.updateBuilder();
// update the password to be "none"
updateBuilder.updateColumnValue("password", "none");
// only update the rows where password is null
updateBuilder.where().isNull(Account.PASSWORD_FIELD_NAME);
updateBuilder.update();
Or something like:
// update hasDog boolean to true if dogC > 0
updateBuilder.updateColumnExpression("hasDog", "dogC > 0");
You should be able to accomplish a large percentage of the updates you would otherwise do with raw SQL this way.
But if you need to make per-entity updates then you will need to call dao.update(...) for each one. In that case I'd wrap the updates in a transaction to make them go faster. See this answer.
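For illustration, here is the plain-JDBC shape of "many per-entity updates in one transaction", which is the effect ORMLite's callBatchTasks() achieves for you; the table and column names are invented:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

public class BatchedUpdate {

    // Hypothetical statement; in ORMLite the DAO generates this for you.
    static final String SQL = "UPDATE images SET status = ? WHERE id = ?";

    /** Runs all updates in a single transaction instead of one commit each. */
    static void updateAll(Connection conn, Map<Long, String> newStatusById)
            throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // one commit for the whole batch
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            for (Map.Entry<Long, String> e : newStatusById.entrySet()) {
                ps.setString(1, e.getValue());
                ps.setLong(2, e.getKey());
                ps.executeUpdate();
            }
            conn.commit();
        } catch (SQLException ex) {
            conn.rollback(); // undo the partial batch on failure
            throw ex;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

Skipping the per-statement commit is where most of the speedup comes from; the same reasoning applies whether the statements are issued by ORMLite or by hand.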
I am running a Java application with multiple threads that query an Oracle database and, if a condition is met, update the row. But there is a high chance that multiple threads read the same status for a row and then try to update the same row.
Let's say that if the status of a row is "ACCEPTED", we update it to "PROCESSING" and start processing, but the processing should be done only by the thread that updated the record.
One approach is to query the database and, if the status is "ACCEPTED", update the record inside a synchronized Java method, but that would block multithreading. So I wanted to handle this situation on the SQL side.
Hibernate's update method returns void, so there is no way to find out whether the row was updated just now or had already been updated. Is there a SELECT ... FOR UPDATE or any locking mechanism in Hibernate that can help in this situation?
You can very well make use of optimistic locking through @Version.
Please look at the post below:
Optimistic Locking by concrete (Java) example
I think that your question is related to How to properly handle two threads updating the same row in a database
On top of the answer provided by @shankarsh: if you want to use a query, and not the EntityManager or the Hibernate session, you need to include the version field in your query, like this:
update YourEntity t set t.columnToUpdate = :newValue, t.version = t.version + 1 where yourCondition and t.version = :version
This way the update will succeed only for a particular version, and concurrent updates will not update anything.
I have designed a web-based project.
I am using a MySQL database. All persistence logic is done in Java using Hibernate, and all client-side actions are done in JavaScript.
Here is my problem:
Two users try to update the same record simultaneously from different places.
User 1 updates the record, providing full information for the object, and calls the save method.
At the other end, User 2 updates the same record with partial information and calls the save method.
If User 1's information is saved first, User 2's information will overwrite it, so some of the information User 1 provided may be lost, and he won't know anything was lost.
Please give some suggestions to overcome this problem.
I recommend you use optimistic locking. Basically, the technique is to have a field in the table that tells Hibernate which version the row is at, so that if an object with a smaller version tries to overwrite data that has a larger version, Hibernate throws an exception. This versioning field is usually a numeric field that Hibernate increments on every update, or a date field. The flow is something like:
1 - The record is inserted into the database. At this point the "version" field is set to zero.
2 - User X queries the record at version 0.
3 - User Y queries the record at version 0.
4 - User Y updates the record. At that moment Hibernate automatically increments the record's version to 1.
5 - User X updates the information based on version 0 and tries to save. Hibernate finds that the record is already at version 1, which is greater than the version user X is working with, so it throws an exception stating the problem and does not allow the most current information to be overwritten.
To implement this strategy, simply create a numeric field in your table and annotate it with @Version:
#Version
#Column(name = "version")
private Integer version;
What you need to consider is a locking strategy for your data. By default, Hibernate gives you no locking (a.k.a. ostrich locking or "last save wins"). Roughly, the other two options are optimistic locking and pessimistic locking.
Optimistic locking means that you do not prevent users from editing data concurrently, but you inform a user if his edit failed because the data was saved from elsewhere after it was loaded from the DB.
Pessimistic locking means that you prevent multiple users from editing the data concurrently. This is a more complicated form of locking and is usually neither practical nor required.
More info on implementing a locking strategy can be found in the Hibernate documentation. Which strategy you should choose depends a lot on your application and on whether many users are expected to frequently edit the same information.
Before User 2 updates the DB, you can check whether the information in the DB (e.g. the row) is the same as it was when User 2 reached the edit page: do a SELECT on the row when the user opens the page and again after the user has made his changes (i.e. before the row is updated), and compare the two before updating the DB.
If the rows are the same, there were no changes; if they differ, someone else has edited the record.
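The compare step can be sketched as a pure function over two row snapshots; representing a row as a column-name-to-value map is just an assumption for illustration:

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

public class StaleRowCheck {

    /**
     * Returns the columns whose values differ between the row as it was
     * when the edit page was loaded and the row as it is now. A non-empty
     * result means someone else edited the record in the meantime.
     */
    static Set<String> changedColumns(Map<String, Object> atLoad,
                                      Map<String, Object> now) {
        Set<String> all = new TreeSet<>(atLoad.keySet());
        all.addAll(now.keySet()); // also catch added/removed columns
        Set<String> changed = new TreeSet<>();
        for (String col : all) {
            if (!Objects.equals(atLoad.get(col), now.get(col))) {
                changed.add(col);
            }
        }
        return changed;
    }
}
```

If changedColumns is non-empty you can abort the save and show the user which fields were modified concurrently, rather than silently overwriting them.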
I have a simple domain model as follows
Driver - key(string), run-count, unique-track-count
Track - key(string), run-count, unique-driver-count, best-time
Run - key(?), driver-key, track-key, time, boolean-driver-update, boolean-track-updated
I need to be able to update a Run and a Driver in the same transaction, as well as a Run and a Track in the same transaction (obviously to make sure I don't update the statistics twice or miss an increment of a counter).
Now I have tried assigning the Run a key made up of driver-key/track-key/run-key(string).
This lets me update the Run entity and the Driver entity in one transaction.
But if I try updating the Run and Track entities together, it complains that it cannot transact over multiple entity groups: it says that it has both the Driver and the Track in the transaction and can't operate on both...
tx.begin();
run = pm.getObjectById(Run.class, runKey);
track = pm.getObjectById(Track.class, trackKey);
// This is where it fails
incrementCounters();
updateUpdatedFlags();
tx.commit();
Strangely enough when I do a similar thing to update Run and Driver it works fine.
Any suggestions on how else I can map my domain model to achieve the same functionality?
With Google App Engine, all of the datastore operations must be on entities in the same entity group. This is because your data is usually stored across multiple tables, and Google App Engine cannot do transactions across multiple tables.
Entities with owned one-to-one and one-to-many relationships are automatically in the same entity group. So if an entity contains a reference to another entity, or a collection of entities, you can read or write to both in the same transactions. For entities that don't have an owner relationship, you can create an entity with an explicit entity group parent.
You could put all of the objects in the same entity group, but you might get some contention if too many users are trying to modify objects in an entity group at the same time. If every object is in its own entity group, you can't do any meaningful transactions. You want to do something in between.
One solution is to have Track and Run in the same entity group. You could do this by having Track contain a List of Runs (if you do this, then Track might not need run-count, unique-driver-count and best-time; they could be computed when needed). If you do not want Track to have a List of Runs, you can use an unowned one-to-many relationship and specify that the entity group parent of the Run is its Track (see "Creating Entities With Entity Groups" on this page). Either way, if a Run is in the same entity group as its Track, you could do transactions that involve a Track and some/all of its Runs.
For many large systems, instead of using transactions for consistency, changes are done by making operations that are idempotent. For instance, if Driver and Run were not in the same entity group, you could update the run-count for a Driver by first doing a query to get the count of all runs before some date in the past, then, in a transaction, update the Driver with the new count and the date when it was last computed.
Keep in mind when using dates that machines can have some kind of a clock drift, which is why I suggested using a date in the past.
I think I found a lateral but still clean solution that makes sense in my domain model.
The domain model changes slightly as follows:
Driver - key(string-id), driver-stats - ex. id="Michael", runs=17
Track - key(string-id), track-stats - ex. id="Monza", bestTime=157
RunData - key(string-id), stat-data - ex. id="Michael-Monza-20101010", time=148
TrackRun - key(Track/string-id), track-stats-updated - ex. id="Monza/Michael-Monza-20101010", track-stats-updated=false
DriverRun - key(Driver/string-id), driver-stats-updated - ex. id="Michael/Michael-Monza-20101010", driver-stats-updated=true
I can now atomically update the statistics of a Track with the statistics from a Run, immediately or in my own time (and likewise for the Driver/Run statistics).
So basically I have to expand a little bit the way I model my problem, in a non-conventional relational way. What do you think?
I realize this is late, but...
Have you seen this method for Bank Account transfers?
http://blog.notdot.net/2009/9/Distributed-Transactions-on-App-Engine
It seems to me that you could do something similar by breaking your counter increments into two steps via an IncrementEntity, processing that, and picking up the pieces later if a transaction fails, etc.
From the blog:
In a transaction, deduct the required amount from the paying account, and create a Transfer child entity to record this, specifying the receiving account in the 'target' field and leaving the 'other' field blank for now.
In a second transaction, add the required amount to the receiving account, and create a Transfer child entity to record this, specifying the paying account in the 'target' field, and the Transfer entity created in step 1 in the 'other' field.
Finally, update the Transfer entity created in step 1, setting the 'other' field to the Transfer we created in step 2.
The blog has code examples in Python, but it should be easy to adapt.
There's an interesting Google I/O session on this topic: http://www.google.com/events/io/2010/sessions/high-throughput-data-pipelines-appengine.html
I guess you could update the Run stats and then fire two tasks to update the Driver and the Track individually.
I have a relatively simple object model:
ParentObject
Collection<ChildObject1>
ChildObject2
Saving this object model performs the following MySQL operations:
Update the ParentObject
Delete all previous items from the ChildObject1 table (about 10 rows)
Insert all new ChildObject1 (again, about 10 rows)
Insert ChildObject2
The objects/tables are unremarkable: no strings, mainly ints and longs.
MySQL is currently saving about 20-30 instances of the object model per second. When this goes into production it will be doing upwards of a million saves, which at current speeds would take 10+ hours; that's no good to me...
I am using Java and Spring. I have profiled my app, and the bottleneck is the calls to MySQL, by a long distance.
How would you suggest I increase the throughput?
You can get some speedup by tracking a dirty flag on your objects (especially on the collection of child objects) and only deleting/updating the dirty ones. Depending on what percentage of them change on each write, you might save a good chunk.
The other thing you can do is perform bulk writes via batch updates on the prepared statement (look at PreparedStatement.addBatch()). This can be an order of magnitude faster, but it won't be record by record; it might look something like this:
delete all dirty-flagged children as a single batch command
update all parents as a single batch command
insert all dirty-flagged children as a single batch command.
Note that since you're dealing with millions of records, you probably won't be able to load them all into a map and dump them at once; you'll have to stream them into a batch handler and flush the changes to the DB a thousand records at a time or so. Once you've done this, the actual speed is sensitive to the batch size, which you'll have to determine by trial and error.
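A rough sketch of that chunked batching with PreparedStatement.addBatch(); the table layout is invented, and the batch size of 1000 is just a starting point to tune:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class ChildBatchWriter {

    // Hypothetical table layout for illustration.
    static final String INSERT_SQL =
            "INSERT INTO child_object1 (parent_id, value) VALUES (?, ?)";

    /** Statements to send per executeBatch() round-trip; tune by trial. */
    static final int BATCH_SIZE = 1000;

    /** Returns how many executeBatch() calls are needed for n rows. */
    static int batchCount(int rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }

    static void insertChildren(Connection conn, long parentId,
                               List<Long> values) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            int pending = 0;
            for (long v : values) {
                ps.setLong(1, parentId);
                ps.setLong(2, v);
                ps.addBatch();
                if (++pending == BATCH_SIZE) {
                    ps.executeBatch(); // flush a full chunk
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // flush the remainder
            }
        }
    }
}
```

The same pattern applies to the batched deletes and updates: one prepared statement per operation type, flushed in fixed-size chunks, ideally inside a single transaction.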
Deleting all existing ChildObject1 records from the table and then inserting the ChildObject1 instances from the current state of your Parent object seems unnecessary to me. Are the values of all of the child objects different from what was previously stored?
A better solution might involve only modifying the database when you need to, i.e. when there has been a change in state of the ChildObject1 instances.
Rolling your own persistence logic for this sort of thing can be hard (your persistence layer needs to know the state of the ChildObject1 objects as they were when retrieved, in order to compare them with the versions at save time). You might want to look into using an ORM like Hibernate for this, which does an excellent job of knowing when it needs to update records in the database and when it doesn't.