I am using Java EE 6 with JBOSS7 and JPA2 + Hibernate. For my client I provide a REST api.
My concern is how to efficiently ensure that no resources are modified concurrently. This shouldn't happen too often, but in case it does I would like to handle it properly.
My approaches so far:
1) A Map<String, ReentrantLock> to store the locks (my IDs are always UUIDs). Locks are created on demand if missing from the map. What I like about this approach is that concurrent access is blocked and I can control how long the other thread tries to lock the resource.
2) Use JPA2 optimistic locking.
Which one would you recommend? Or is there an even better approach?
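For reference, the first approach could look roughly like the sketch below. It is only an illustration under assumptions: the class name `ResourceLocks` and the method `tryWithLock` are made up for this example, and the timeout handling is one possible choice among several.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper: one ReentrantLock per resource id, created on demand.
class ResourceLocks {
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /**
     * Runs the action while holding the lock for the given id.
     * Returns false if the lock could not be acquired within the timeout
     * (or the waiting thread was interrupted); the action is not run then.
     */
    boolean tryWithLock(String id, long timeout, TimeUnit unit, Runnable action) {
        ReentrantLock lock = locks.computeIfAbsent(id, key -> new ReentrantLock());
        boolean acquired;
        try {
            acquired = lock.tryLock(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
        if (!acquired) {
            return false; // caller decides how to report the conflict
        }
        try {
            action.run();
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

Note that this sketch never removes entries from the map, and the locks only exist inside a single JVM, so a second server instance would not see them.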
1) seems error-prone, plus it might not scale. I've never seen such a design and would discourage it.
2) Transactions with optimistic locking are a viable option. In this case, some transactions might fail and you will need to deal with the errors and retry.
3) Transactions with pessimistic locking are another viable option. It's like 1) but uses the database to lock and order operations. AFAIK, JPA supports pessimistic locking as well. Otherwise you can use SELECT FOR UPDATE (supported by most DBMSs) to explicitly acquire row locks. Make sure you figure out a scheme where locks are acquired in a consistent order, to avoid deadlocks.
The choice between 2-3 depends on the use case, e.g. if contention is expected to be high or not, or whether it is easy to retry a failed transaction.
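If you go with optimistic locking, the "deal with errors and retry" part might be sketched as below. This is a minimal sketch, not a definitive implementation: `VersionConflictException` is a hypothetical stand-in for whatever your provider throws on a version conflict (in JPA 2 that would be javax.persistence.OptimisticLockException), and `OptimisticRetry` is a made-up helper name.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the exception raised on a version conflict.
class VersionConflictException extends RuntimeException {
    VersionConflictException(String message) {
        super(message);
    }
}

class OptimisticRetry {
    /** Runs the unit of work, retrying on version conflicts up to maxAttempts times. */
    static <T> T run(int maxAttempts, Supplier<T> unitOfWork) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        VersionConflictException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // Each attempt is expected to start a fresh transaction
                // and re-read the entity, so it sees the latest version.
                return unitOfWork.get();
            } catch (VersionConflictException e) {
                last = e; // remember the conflict and try again
            }
        }
        throw last; // all attempts conflicted; let the caller report it
    }
}
```

Whether a retry is safe depends on the operation: it only makes sense when the whole unit of work can be repeated from a fresh read.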
With Hibernate JPA and LockModeType.OPTIMISTIC_FORCE_INCREMENT, is the lock taken at the application level or at the database level?
I am using the following snippet to take optimistic locks:
setting = this.entityManager.find(Setting.class, setting.getId(),
LockModeType.OPTIMISTIC_FORCE_INCREMENT);
setting.setUpdateTimestamp(new Date());
newSettingList.add(setting);
Suppose there are two JVMs running the same methods and there is a conflict. Will this locking mechanism work in that case?
My observation is that when I paused in the debugger at the line "newSettingList.add(setting);", I did not see any changes in the database at that point. So how is locking ensured at the database level?
Optimistic locking is a strategy where you read a record, take note of its version number, and check that the version hasn't changed before you write the record back. When you write the record back, you filter the update on the version to make sure it's atomic.
Pessimistic locking is when you lock the record for your exclusive use until you have finished with it. It has much better integrity than optimistic locking but requires you to be careful with your application design to avoid deadlocks.
This means that since you use optimistic locking, you don't take any locks at the database level. What you do is simply use the database to keep a version of each entity. For example:
a) You open transaction T1 from the 1st JVM and read an object with version v1.
b) You open transaction T2 from the 2nd JVM and read the same object, still at v1 (no update to this object has been made yet).
c) In T1 you update the object, which sets its version to v2, and you commit the transaction.
d) In T2 you try to update the object, but you get an exception because the version no longer matches.
The two transactions do not need to come from the same JVM for this to work.
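The version check behind these steps can be modeled in memory as follows. This is a toy sketch, not what Hibernate does internally: `Row` and `VersionedTable` are hypothetical names, and the synchronized method stands in for the atomicity that the database gives a single UPDATE statement.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for one database row: a value plus a version column.
final class Row {
    final String value;
    final long version;

    Row(String value, long version) {
        this.value = value;
        this.version = version;
    }
}

// In-memory model of a table shared by both JVMs. The update mirrors a statement like:
//   UPDATE t SET value = ?, version = version + 1 WHERE id = ? AND version = ?
class VersionedTable {
    private final Map<String, Row> rows = new HashMap<>();

    synchronized void insert(String id, String value) {
        rows.put(id, new Row(value, 1));
    }

    synchronized Row read(String id) {
        return rows.get(id);
    }

    /** Returns true only if the row still had the expected version and was updated. */
    synchronized boolean update(String id, long expectedVersion, String newValue) {
        Row current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            return false; // zero rows matched: another transaction committed first
        }
        rows.put(id, new Row(newValue, expectedVersion + 1));
        return true;
    }
}
```

If T1 and T2 both read version 1, the first update succeeds and bumps the version to 2; the second update then matches no row, which is the point where a JPA provider would raise its optimistic lock exception.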
Why do I need Transaction in Hibernate for read-only operations?
Does the following transaction put a lock in the DB?
Example code to fetch from DB:
Transaction tx = HibernateUtil.getCurrentSession().beginTransaction(); // why begin transaction?
//readonly operation here
tx.commit(); // why tx.commit? I don't want to write anything
Can I use session.close() instead of tx.commit()?
Transactions for reading might indeed look strange, and people often don't mark methods as transactional in this case. But JDBC will create a transaction anyway; it will just run with autocommit=true unless a different option was set explicitly. Still, there are practical reasons to mark transactions read-only:
Impact on databases
The read-only flag may let the DBMS optimize such transactions or those running in parallel.
Having a transaction that spans multiple SELECT statements guarantees proper isolation for levels starting from Repeatable Read or Snapshot (e.g. see PostgreSQL's Repeatable Read). Otherwise two SELECT statements could see an inconsistent picture if another transaction commits in parallel. This isn't relevant when using Read Committed.
Impact on ORM
An ORM may cause unpredictable results if you don't begin/finish transactions explicitly. E.g. Hibernate will open a transaction before the 1st statement, but it won't finish it. So the connection will be returned to the connection pool with an unfinished transaction. What happens then? The JDBC specification is silent on this, so it is implementation-specific: the MySQL and PostgreSQL drivers roll back such a transaction, Oracle commits it. Note that this can also be configured at the connection pool level, e.g. C3P0 gives you such an option and rolls back by default.
Spring sets FlushMode=MANUAL for read-only transactions, which enables other optimizations such as skipping dirty checks. This can lead to a large performance gain, depending on how many objects you loaded.
Impact on architecture & clean code
There is no guarantee that your method doesn't write to the database. If you mark a method as @Transactional(readOnly=true), you dictate whether it is actually possible to write to the DB within the scope of this transaction. If your architecture is cumbersome and some team members may put a modification query where it's not expected, this flag will point you to the problematic place.
All database statements are executed within the context of a physical transaction, even when we don’t explicitly declare transaction boundaries (e.g., BEGIN, COMMIT, ROLLBACK).
If you don't declare transaction boundaries explicitly, then each statement will have to be executed in a separate transaction (autocommit mode). This may even lead to opening and closing one connection per statement unless your environment can deal with connection-per-thread binding.
Declaring a service as @Transactional will give you one connection for the whole transaction duration, and all statements will use that single isolation connection. This is way better than not using explicit transactions in the first place.
On large applications, you may have many concurrent requests, and reducing database connection acquisition request rate will definitely improve your overall application performance.
JPA doesn't enforce transactions on read operations. Only writes end up throwing a TransactionRequiredException in case you forget to start a transactional context. Nevertheless, it's always better to declare transaction boundaries even for read-only transactions (in Spring, @Transactional allows you to mark read-only transactions, which has a great performance benefit).
Transactions indeed put locks on the database — good database engines handle concurrent locks in a sensible way — and are useful with read-only use to ensure that no other transaction adds data that makes your view inconsistent. You always want a transaction (though sometimes it is reasonable to tune the isolation level, it's best not to do that to start out with); if you never write to the DB during your transaction, both committing and rolling back the transaction work out to be the same (and very cheap).
Now, if you're lucky and your queries against the DB are such that the ORM always maps them to single SQL queries, you can get away without explicit transactions, relying on the DB's built-in autocommit behavior, but ORMs are relatively complex systems so it isn't at all safe to rely on such behavior unless you go to a lot more work checking what the implementation actually does. Writing the explicit transaction boundaries in is far easier to get right (especially if you can do it with AOP or some similar ORM-driven technique; from Java 7 onwards try-with-resources could be used too I suppose).
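To make "explicit transaction boundaries" concrete at the plain JDBC level (not the Hibernate API from the question), here is a hedged sketch. `ReadOnlyTx` and `ConnectionWork` are hypothetical names invented for this example; the `Connection` methods used are standard java.sql calls. Whether the read-only flag is enforced or merely treated as a hint depends on the driver.

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical callback type for work done on a connection.
interface ConnectionWork<T> {
    T apply(Connection connection) throws SQLException;
}

class ReadOnlyTx {
    /** Runs the work inside one explicit read-only transaction on the given connection. */
    static <T> T run(Connection connection, ConnectionWork<T> work) throws SQLException {
        boolean oldAutoCommit = connection.getAutoCommit();
        boolean oldReadOnly = connection.isReadOnly();
        connection.setReadOnly(true);     // set before the transaction starts
        connection.setAutoCommit(false);  // group all statements into one transaction
        try {
            T result = work.apply(connection);
            connection.commit();          // ends the transaction even though nothing was written
            return result;
        } catch (SQLException | RuntimeException e) {
            connection.rollback();        // never hand back a connection with an open transaction
            throw e;
        } finally {
            // Restore the previous settings so a pooled connection is returned unchanged.
            connection.setAutoCommit(oldAutoCommit);
            connection.setReadOnly(oldReadOnly);
        }
    }
}
```

A call could look like `ReadOnlyTx.run(connection, c -> loadReport(c))`, where `loadReport` is a placeholder for your own query code. In a Spring or Java EE application you would normally get the same effect declaratively rather than writing this by hand.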
It doesn't matter whether you only read or not - the database must still keep track of your resultset, because other database clients may want to write data that would change your resultset.
I have seen faulty programs kill huge database systems because they just read data but never commit. That forces the transaction log to grow, because the DB can't release the transaction data before a COMMIT or ROLLBACK, even if the client did nothing for hours.
By clustered environment I mean the same code running on multiple server machines. The scenario I can think of is as follows.
Multiple requests come in from different threads at the same time to update card details based on expiry time. A snippet of the code follows:
synchronized (card) { // card object
    if (card.isExpired()) {
        updateCard();
    }
}
My understanding is that a synchronized block works at the JVM level, so how is this achieved in a multi-server environment?
Please suggest edit to rephrase question. I asked what I can recollect from a question asked to me.
As you said, synchronized block is only for "local JVM" threads.
When it comes to cluster, it is up to you how you drive your distributed transaction.
It really depends where your objects (e.g. card) are stored.
Database - You will probably need to use some locking strategy. Very likely optimistic locking, which stores a version of the entity and checks it whenever a change is made. Or the "safer" pessimistic locking, where you lock the whole row while making changes.
Memory - You will probably need some in-memory data grid solution (e.g. Hazelcast...) and make use of its transaction support, or implement it yourself.
Any other? You will have to specify...
See, in a clustered environment, you will usually have multiple JVMs running the same code. If traffic is high, then actually the number of JVMs could auto-scale and increase (new instances could be spawned). This is one of the reasons why you should be really careful when using static fields to keep data in a distributed environment.
Next, coming to your actual question: if you have a single JVM serving requests, then all other threads will have to wait to get that lock. If you have multiple JVMs running, then a lock acquired by one thread on one JVM will not prevent acquisition of the (in reality not the same, but conceptually the same) lock by another thread in a different JVM.
I am assuming you want to ensure that only one thread can edit the object or perform the action (based on the method name, i.e. updateCard). I suggest you implement optimistic locking (versioning), which Hibernate can do quite easily, to prevent lost updates.
At one of the presentations about Spring/Hibernate transactions I offered the opinion that the synchronized keyword on a method and @Transactional logically have many similarities. Sure enough, they are totally different beasts, yet both are applied as aspects to the method and both control access to some resource via some kind of shared monitor (a record in the DB, for example).
There were a couple of people in the crowd who immediately objected and claimed that my comparison is fatally wrong. I don't remember the specific arguments, but I can see some point there as well. For example, synchronized covers the entire method from the beginning, while a transaction only takes effect when a statement that accesses the DB is reached. Plus, synchronized does not offer any read/write locking pattern.
So the question is: is my comparison totally wrong, so that I should never use it, or, with proper wording, would it make sense to present it to experienced engineers who know well how synchronized works but are still learning about AOP transactions? What should this wording be?
A bit of update.
Apparently my question sounded like comparing DB transactions with entering a synchronized method in Java. That's not the case. My idea is more about comparing similarities in the semantics of @Transactional and synchronized.
One of the reasons I brought it up was also to illustrate propagation behavior. For example, if @Transactional is PROPAGATION_REQUIRED, it has many similarities to entering a synchronized block. For a transaction: if a transaction is present we just continue using it, and if not, we create one. For synchronized: if we already hold the monitor we proceed with it, and if not, we attempt to acquire it. Of course, for @Transactional we are not going to lock on the method boundary.
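The "join if present, otherwise create" rule I have in mind can be modeled with a small toy. This is only an illustration of the semantics described above, not how Spring implements propagation: `ToyTransactions`, `required` and `currentId` are hypothetical names, and the model ignores rollback, isolation and everything else a real transaction manager does.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Toy model of REQUIRED propagation: join the transaction bound to the current
// thread if there is one, otherwise start one and finish it on the way out.
class ToyTransactions {
    private static final ThreadLocal<Integer> CURRENT = new ThreadLocal<>();
    private static final AtomicInteger NEXT_ID = new AtomicInteger(1);

    static <T> T required(Supplier<T> body) {
        boolean startedHere = CURRENT.get() == null;
        if (startedHere) {
            CURRENT.set(NEXT_ID.getAndIncrement()); // "begin" a new transaction
        }
        try {
            return body.get(); // nested calls see the same transaction id
        } finally {
            if (startedHere) {
                CURRENT.remove(); // only the outermost call "commits" and unbinds
            }
        }
    }

    /** Id of the transaction bound to this thread, or null if there is none. */
    static Integer currentId() {
        return CURRENT.get();
    }
}
```

A nested `required` call observes the same id as its caller, much as a thread re-entering a synchronized block on a monitor it already holds simply proceeds.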
If we look at @Transactional as denoting a method that locks a database resource (because it is used in the transaction) - then the comparison makes some sense.
However this is all they have in common. synchronized is defined on an object monitor (and protects only it), which is known at the time of usage of the keyword, while a transaction may lock multiple resources (that are not known when the transaction starts), or may not lock any resources at all (optimistic locking, read-only transactions).
So ultimately - don't use that comparison, there are a lot more things that they differ in than they have in common.
The concepts embodied in the @Transactional annotation are much more complex than those embodied in the synchronized keyword. I agree with JB Nizet's comment that your comparison is counter-intuitive and would confuse your audience.
With Java synchronization, you always know exactly what is being locked, from which point in the code and to which point. You have built-in the concept of threads and a queue of threads competing for the same resource. Also, you are in effect locking code, not locking data. It may seem like a nuance, but the difference could be substantial.
With @Transactional, first you have the issue of transaction demarcation. You don't know exactly when a transaction begins, since you might reach this method after a transaction has already been opened. For the same reason, you don't know whether the transaction will end when you exit the method.
Secondly, transaction isolation semantics are much more complex than just synchronization (read-only, read-write, etc.). Many times isolation answers a concern about data integrity, and not intrinsically a concern about queuing access to a resource. Sometimes just one record is locked, sometimes a whole table (again, this is data, not code). Furthermore, transactions can be rolled back, a concept that is important for data integrity and doesn't exist with synchronized.