I have read about transaction isolation levels. They are used to prevent errors caused by parallel transaction execution. That part is quite obvious.
There are also locking modes available for entities. I understand how they work.
But I can't find the reason why I need locking. I already use transaction isolation levels, so why do I also have to use locking?
Do isolation levels and locking do the same job?
Both transaction isolation and JPA Entity locking are concurrency control mechanisms.
Transaction isolation is applied at the JDBC Connection level, and its scope is the transaction life-cycle itself (you can't change the isolation level of a currently running transaction). Modern databases allow you to use both 2PL (two-phase locking) isolation levels and MVCC ones (SNAPSHOT_ISOLATION, or the PostgreSQL isolation levels). Under MVCC, readers do not block writers and writers do not block readers (only writers block writers).
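As a minimal sketch of what "applied on the Connection level" means (the PostgreSQL URL, credentials, and account table are hypothetical), the isolation level is set on the JDBC Connection before the transaction starts and covers the whole transaction:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class IsolationLevelSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; substitute your own data source.
        try (Connection connection = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/app", "app", "secret")) {
            // The isolation level belongs to the Connection and covers the
            // whole transaction; it cannot be changed mid-transaction.
            connection.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
            connection.setAutoCommit(false);
            try (Statement statement = connection.createStatement();
                 ResultSet rs = statement.executeQuery("SELECT balance FROM account WHERE id = 1")) {
                // ... all reads in this transaction see one consistent snapshot ...
            }
            connection.commit();
        }
    }
}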
The Java Persistence Locking API offers both database-level and application-level concurrency control, which can be split in two categories:
Explicit Optimistic lock modes:
OPTIMISTIC
OPTIMISTIC_FORCE_INCREMENT
PESSIMISTIC_FORCE_INCREMENT
Optimistic locking uses version checks in UPDATE/DELETE statements and fails on version mismatches.
Explicit pessimistic lock modes:
PESSIMISTIC_READ
PESSIMISTIC_WRITE
The pessimistic lock modes use a database-specific lock syntax to acquire read (shared) or write (exclusive) locks (e.g. SELECT ... FOR UPDATE).
An explicit lock mode is suitable when you run on a lower-consistency isolation level (READ_COMMITTED) and you want to acquire locks whose scope is upgraded from query life-time to transaction life-time.
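As a rough sketch of what this looks like in code (the Invoice entity and the injected EntityManager are hypothetical), an explicit lock mode is passed to EntityManager.find() inside a running transaction:

import java.math.BigDecimal;
import javax.persistence.EntityManager;
import javax.persistence.LockModeType;

// Invoice is an assumed entity, used only for illustration.
public class InvoiceService {

    private final EntityManager entityManager;

    public InvoiceService(EntityManager entityManager) {
        this.entityManager = entityManager;
    }

    // Pessimistic: acquires a database-level exclusive lock
    // (e.g. SELECT ... FOR UPDATE) that is held until the transaction ends.
    public void adjustTotal(Long invoiceId) {
        Invoice invoice = entityManager.find(
                Invoice.class, invoiceId, LockModeType.PESSIMISTIC_WRITE);
        invoice.setTotal(invoice.getTotal().add(BigDecimal.TEN));
    }

    // Optimistic: no database lock is taken; the entity version is re-checked
    // at commit time and the commit fails on a version mismatch.
    public Invoice readWithOptimisticCheck(Long invoiceId) {
        return entityManager.find(
                Invoice.class, invoiceId, LockModeType.OPTIMISTIC);
    }
}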
Introduction
There are different locking types and isolation levels. Some of the locking types (OPTIMISTIC*) are implemented at the JPA level (e.g. in EclipseLink or Hibernate), and others (PESSIMISTIC*) are delegated by the JPA provider to the DB level.
Explanation
Isolation levels and locking are not the same, but they intersect in places. If you use the SERIALIZABLE isolation level (which is performance-greedy), then you do not need any locking in JPA, as the DB already does it for you. On the other hand, if you choose READ_COMMITTED, then you may need to do some locking yourself, as the isolation level alone will not guarantee, for example, that the entry is not changed in the meantime by another transaction.
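To make the trade-off concrete, here is a small sketch under assumed names (Account, AccountRepository, and its findByIdForUpdate method are hypothetical): one method relies purely on the SERIALIZABLE isolation level, the other stays on the default isolation and takes an explicit row lock instead.

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

// Account and AccountRepository are assumed types.
@Service
public class TransferService {

    private final AccountRepository accountRepository;

    public TransferService(AccountRepository accountRepository) {
        this.accountRepository = accountRepository;
    }

    // Let the database handle concurrent writers via a strict isolation level.
    @Transactional(isolation = Isolation.SERIALIZABLE)
    public void debitSerializable(Long accountId, long amount) {
        Account account = accountRepository.findById(accountId).orElseThrow();
        account.setBalance(account.getBalance() - amount);
    }

    // Stay on READ_COMMITTED and take an explicit row lock instead; the
    // repository method is assumed to be annotated with
    // @Lock(LockModeType.PESSIMISTIC_WRITE).
    @Transactional
    public void debitWithExplicitLock(Long accountId, long amount) {
        Account account = accountRepository.findByIdForUpdate(accountId);
        account.setBalance(account.getBalance() - amount);
    }
}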
Related
I found that @Transactional is used to ensure a transaction around a repository method or a service method.
@Lock is used on a repository method to lock the entity and provide isolation.
Some questions come to my mind:
What are the major differences/relations between these two annotations?
When to use @Transactional and when to use @Lock?
Is @Lock useful in a distributed database system to provide data concurrency and consistency?
Transactional: Whenever you put the @Transactional annotation, it enables transactional behavior, which satisfies the ACID properties.
ACID: ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties of database transactions intended to guarantee data validity even in the event of errors.
Atomic
Guarantees that all operations in a transaction are treated as a single “unit”, which either succeeds completely or fails completely.
Consistent
Ensures that a transaction can only bring the database from one valid state to another by preventing data corruption.
Isolation
Determines how and when changes made by one transaction become visible to other transactions. Serializable and Snapshot Isolation are the top 2 isolation levels from a strictness standpoint.
Durable
Ensures that the results of the transaction are permanently stored in the system. The modifications must persist even in case of power loss or system failures.
Lock: It should not be confused with @Transactional; @Lock enables locking behavior during a transaction.
JPA has two main lock types defined.
Pessimistic Locking
Optimistic Locking
If you want to know more about pessimistic and optimistic locking you can explore the internet; below is an explanation from Baeldung:
Pessimistic Locking: When we are using Pessimistic Locking in a transaction and access an entity, it will be locked immediately. The transaction releases the lock either by committing or rolling back the transaction.
Optimistic Locking: In Optimistic Locking, the transaction doesn't lock the entity immediately. Instead, the transaction commonly saves the entity's state with a version number assigned to it.
When we try to update the entity's state in a different transaction, the transaction compares the saved version number with the existing version number during an update.
At this point, if the version number differs, it means that the entity can't be modified. If there is an active transaction then that transaction will be rolled back and the underlying JPA implementation will throw an OptimisticLockException.
Apart from the version number approach, we can use other approaches such as timestamps, hash value computation, or serialized checksum, depending on which approach is the most suitable for our current development context.
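A minimal sketch of the version-number approach described above, using a hypothetical Product entity:

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

// Product is an assumed entity, used only for illustration.
@Entity
public class Product {

    @Id
    private Long id;

    private String name;

    // The provider increments this column on every update and adds
    // "WHERE version = ?" to the UPDATE statement; if another transaction
    // changed the row in the meantime, zero rows match and an
    // OptimisticLockException is thrown.
    @Version
    private Long version;

    // getters and setters omitted for brevity
}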
There are also other lock types available in Spring (these are the values of JPA's LockModeType); a usage sketch follows the list:
NONE: No lock.
OPTIMISTIC: Optimistic lock.
OPTIMISTIC_FORCE_INCREMENT: Optimistic lock, with version update.
PESSIMISTIC_FORCE_INCREMENT: Pessimistic write lock, with version update.
PESSIMISTIC_READ: Pessimistic read lock.
PESSIMISTIC_WRITE: Pessimistic write lock.
READ: Synonymous with OPTIMISTIC.
WRITE: Synonymous with OPTIMISTIC_FORCE_INCREMENT.
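As a sketch of how these modes are applied through Spring Data (the Ticket entity and TicketRepository are assumed names), @Lock is placed on a repository method and only takes effect when the method is called inside a transaction, e.g. from an @Transactional service method:

import javax.persistence.LockModeType;

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Lock;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;

// Ticket is an assumed entity, used only for illustration.
public interface TicketRepository extends JpaRepository<Ticket, Long> {

    // Exclusive row lock (typically SELECT ... FOR UPDATE) held until the
    // surrounding transaction commits or rolls back.
    @Lock(LockModeType.PESSIMISTIC_WRITE)
    @Query("select t from Ticket t where t.id = :id")
    Ticket findByIdForUpdate(@Param("id") Long id);

    // Optimistic check plus a forced version bump, even if the Ticket itself
    // is not modified in this transaction.
    @Lock(LockModeType.OPTIMISTIC_FORCE_INCREMENT)
    @Query("select t from Ticket t where t.id = :id")
    Ticket findByIdForceIncrement(@Param("id") Long id);
}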
Now, to answer your questions:
What are the major differences/relations in these two annotations?
You should understand this after reading the above.
When to use #Transactional and when to use #Lock?
If you want transactional behavior then add @Transactional, and if your use case requires locking then choose the lock mode appropriate to that use case.
Is @Lock useful in a distributed database system to provide data concurrency and consistency?
The two main tools we use to cope with concurrency are database transactions and distributed locks. These two are not interchangeable. You can't use a transaction when you need a lock. You can't use a lock when you need a transaction. source
I have a @Transactional method where I request a record using findById() from a Spring Data repository.
Now I want to lock this object in such a way that other @Transactional methods, if executed in parallel, will wait until the lock is released.
What I tried: I went through the documentation of @Lock and LockModeType but I still can't figure out whether it covers my case.
Another option would be SELECT ... FOR UPDATE at the DB level, but I am not sure it is the best option.
Q: How do I lock an entity object (a DB row) in a Spring Data transaction and make other transactions wait?
What you want is called pessimistic locking. Spring Data, through the JPA specification, supports that. There are two modes of pessimistic locking available, described below.
I think you are mixing up transaction isolation with locking. Through the @Transactional annotation you can specify the isolation level, which is applied to the connection, while through Spring Data's @Lock you define the lock mode. The lock mode is not related to the transaction isolation specified on @Transactional. Effectively, if the isolation is SERIALIZABLE you would not need locking via JPA or Spring Data, because the database will take care of it; the locks will be DB-controlled in this case.
When you define locking, the control is in your hands. Here is a description of the pessimistic lock types:
PESSIMISTIC_WRITE - all transactions, including read-only ones, will be blocked until the transaction holding the lock completes. Dirty reads are not allowed.
PESSIMISTIC_READ - when you use this mode, concurrent transactions can read the data of the locked row. All updates need to obtain a PESSIMISTIC_WRITE lock.
On top of this, you can specify a timeout for how long to wait for the lock.
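A sketch of that timeout hint (the Ticket entity and the five-second value are arbitrary, and support for the hint varies by database and provider):

import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManager;
import javax.persistence.LockModeType;

// Ticket is an assumed entity, used only for illustration.
public class TicketLockExample {

    // Must be called inside an active transaction; waits at most 5 seconds
    // for the row lock instead of blocking indefinitely.
    public Ticket loadLocked(EntityManager entityManager, Long ticketId) {
        Map<String, Object> hints = new HashMap<>();
        hints.put("javax.persistence.lock.timeout", 5000);
        return entityManager.find(
                Ticket.class, ticketId, LockModeType.PESSIMISTIC_WRITE, hints);
    }
}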
In the case of Hibernate JPA LockModeType.OPTIMISTIC_FORCE_INCREMENT, is this lock taken at the application level or the database level?
I am using the following snippet for taking optimistic locks:
setting = this.entityManager.find(Setting.class, setting.getId(),
LockModeType.OPTIMISTIC_FORCE_INCREMENT);
setting.setUpdateTimestamp(new Date());
newSettingList.add(setting);
Suppose there are two JVMs running, both have the same methods, and there is a conflict; will this locking mechanism work in this case?
My observation is that whenever I was debugging at the line "newSettingList.add(setting);", I was not seeing any changes in the database at that point. So how is locking ensured at the database level?
Optimistic Locking is a strategy where you read a record and use the version number to check that the version hasn't changed before you write the record back. When you write the record back, you filter the update on the version to make sure it's atomic.
Pessimistic Locking is when you lock the record for your exclusive use until you have finished with it. It has much better integrity than optimistic locking but requires you to be careful with your application design to avoid deadlocks.
Better explanation here.
This means that since you use optimistic locking, you don't intervene in the locks at the database level. What you do is simply use the database to keep versioning of the objects (entities). For example:
a) You open a transaction T1 from the 1st JVM and read an object with version v1.
b) You open a transaction T2 from the 2nd JVM and read the same object with version v1 (no update to this object has been made yet).
c) In T1 you update the object, setting its version to v2, and commit the transaction.
d) In T2 you try to update the same object in the db, but you get an exception because of the versioning.
The two transactions do not need to come from the same JVM for this to work.
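The scenario above can be sketched in a single JVM with two EntityManagers standing in for the two transactions (the "demo" persistence unit and the versioned Product entity are assumptions):

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import javax.persistence.PersistenceException;

// Product is an assumed @Version-ed entity; "demo" is a hypothetical persistence unit.
public class VersionConflictSketch {
    public static void main(String[] args) {
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("demo");
        EntityManager em1 = emf.createEntityManager();   // stands in for T1 (could be another JVM)
        EntityManager em2 = emf.createEntityManager();   // stands in for T2

        em1.getTransaction().begin();
        em2.getTransaction().begin();
        Product p1 = em1.find(Product.class, 1L);        // both read the row at version v1
        Product p2 = em2.find(Product.class, 1L);

        p1.setName("updated by T1");
        em1.getTransaction().commit();                   // UPDATE bumps the version to v2

        p2.setName("updated by T2");
        try {
            em2.getTransaction().commit();               // UPDATE ... WHERE version = v1 matches no rows
        } catch (PersistenceException e) {
            // Typically an OptimisticLockException (possibly wrapped in a
            // RollbackException): T2 read stale data and must re-read and retry.
        }
        em1.close();
        em2.close();
        emf.close();
    }
}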
Why do I need Transaction in Hibernate for read-only operations?
Does the following transaction put a lock in the DB?
Example code to fetch from DB:
Transaction tx = HibernateUtil.getCurrentSession().beginTransaction(); // why begin transaction?
//readonly operation here
tx.commit() // why tx.commit? I don't want to write anything
Can I use session.close() instead of tx.commit()?
Transactions for reading might indeed look strange, and often people don't mark methods as transactional in this case. But JDBC will create a transaction anyway; it will just be working with autocommit=true if a different option wasn't set explicitly. There are, however, practical reasons to mark transactions read-only:
Impact on databases
The read-only flag may let the DBMS optimize such transactions or those running in parallel.
Having a transaction that spans multiple SELECT statements guarantees proper isolation for levels starting from Repeatable Read or Snapshot (e.g. see PostgreSQL's Repeatable Read). Otherwise two SELECT statements could see an inconsistent picture if another transaction commits in parallel. This isn't relevant when using Read Committed.
Impact on ORM
The ORM may cause unpredictable results if you don't begin/finish transactions explicitly. E.g. Hibernate will open a transaction before the 1st statement, but it won't finish it, so the connection will be returned to the connection pool with an unfinished transaction. What happens then? JDBC stays silent, so this is implementation-specific: MySQL and PostgreSQL drivers roll back such a transaction, Oracle commits it. Note that this can also be configured at the connection pool level, e.g. C3P0 gives you such an option (rollback by default).
Spring sets FlushMode=MANUAL in the case of read-only transactions, which leads to other optimizations such as no need for dirty checks. This could be a huge performance gain depending on how many objects you have loaded.
Impact on architecture & clean code
There is no guarantee that your method doesn't write into the database. If you mark the method as @Transactional(readOnly = true), you'll dictate whether it's actually possible to write to the DB in the scope of this transaction. If your architecture is cumbersome and some team members may choose to put a modification query where it's not expected, this flag will point you to the problematic place.
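For completeness, a minimal sketch of a read-only transactional service method (ReportService, ReportRepository, and its query method are assumed names):

import java.util.List;

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// Report and ReportRepository are assumed types.
@Service
public class ReportService {

    private final ReportRepository reportRepository;

    public ReportService(ReportRepository reportRepository) {
        this.reportRepository = reportRepository;
    }

    // readOnly = true lets Spring/Hibernate skip dirty checking and flushing,
    // hints the driver/DBMS that no writes will happen, and makes the intent
    // explicit in the code.
    @Transactional(readOnly = true)
    public List<Report> monthlyReports(int year, int month) {
        return reportRepository.findByYearAndMonth(year, month);
    }
}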
All database statements are executed within the context of a physical transaction, even when we don’t explicitly declare transaction boundaries (e.g., BEGIN, COMMIT, ROLLBACK).
If you don't declare transaction boundaries explicitly, then each statement will have to be executed in a separate transaction (autocommit mode). This may even lead to opening and closing one connection per statement unless your environment can deal with connection-per-thread binding.
Declaring a service as @Transactional will give you one connection for the whole transaction duration, and all statements will use that single connection. This is way better than not using explicit transactions in the first place.
On large applications, you may have many concurrent requests, and reducing database connection acquisition request rate will definitely improve your overall application performance.
JPA doesn't enforce transactions on read operations. Only writes end up throwing a TransactionRequiredException in case you forget to start a transactional context. Nevertheless, it's always better to declare transaction boundaries even for read-only transactions (in Spring, @Transactional allows you to mark read-only transactions, which has a great performance benefit).
Transactions indeed put locks on the database — good database engines handle concurrent locks in a sensible way — and are useful with read-only use to ensure that no other transaction adds data that makes your view inconsistent. You always want a transaction (though sometimes it is reasonable to tune the isolation level, it's best not to do that to start out with); if you never write to the DB during your transaction, both committing and rolling back the transaction work out to be the same (and very cheap).
Now, if you're lucky and your queries against the DB are such that the ORM always maps them to single SQL queries, you can get away without explicit transactions, relying on the DB's built-in autocommit behavior, but ORMs are relatively complex systems so it isn't at all safe to rely on such behavior unless you go to a lot more work checking what the implementation actually does. Writing the explicit transaction boundaries in is far easier to get right (especially if you can do it with AOP or some similar ORM-driven technique; from Java 7 onwards try-with-resources could be used too I suppose).
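As a sketch of what explicit boundaries look like for a read-only operation (assuming a hypothetical HibernateUtil.getSessionFactory() helper and an Account entity; Session is AutoCloseable in recent Hibernate versions, so try-with-resources works here):

import org.hibernate.Session;
import org.hibernate.Transaction;

// HibernateUtil and Account are assumed names, used only for illustration.
public class ReadOnlyQuerySketch {

    public static void run() {
        try (Session session = HibernateUtil.getSessionFactory().openSession()) {
            Transaction tx = session.beginTransaction();
            try {
                // read-only operation here
                session.createQuery("from Account", Account.class).getResultList();
                tx.commit();   // cheap for a read, but ends the transaction promptly
            } catch (RuntimeException e) {
                tx.rollback(); // never hand a connection back with an open transaction
                throw e;
            }
        }
    }
}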
It doesn't matter whether you only read or not - the database must still keep track of your resultset, because other database clients may want to write data that would change your resultset.
I have seen faulty programs kill huge database systems because they just read data but never commit, forcing the transaction log to grow, because the DB can't release the transaction data before a COMMIT or ROLLBACK, even if the client did nothing for hours.