At one of my presentations about Spring/Hibernate transactions I offered the opinion that the synchronized keyword on a method and @Transactional have many logical similarities. Sure enough, they are totally different beasts, yet both are applied as aspects to a method and both control access to some resource via some kind of shared monitor (a record in the DB, for example).
There were a couple of people in the crowd who immediately objected and claimed that my comparison was fatally wrong. I don't remember the specific arguments, but I can see some point there as well. For example, synchronized applies to the entire method from the beginning, whereas a transaction only has an effect once a statement that accesses the DB is reached. Plus, synchronized does not offer any read/write locking pattern.
So the question is: is my comparison totally wrong, so that I should never use it, or, with proper wording, would it make sense to present it to experienced engineers who know well how synchronized works but are still learning about AOP transactions? What should this wording be?
A bit of an update.
Apparently my question sounded like a comparison of DB transactions with entering a synchronized method in Java. That's not the case. My idea is more about comparing the similarities in the semantics of @Transactional and synchronized.
One of the reasons I brought it up was also to illustrate propagation behavior. For example, if @Transactional uses PROPAGATION_REQUIRED, it has many similarities to entering a synchronized block. For a transaction: if a transaction is present we just continue using it, and if not, we create one. For synchronized: if we already hold the monitor we proceed with it, and if not, we attempt to acquire it. Of course, for @Transactional we are not going to lock on the method boundary.
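The "join the current one or start a new one" rule described above can be sketched in plain Java. This is a toy model only, not Spring's implementation: ToyTx, its thread-local depth counter and its event log are all hypothetical names, and rollback is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of REQUIRED-style propagation; this is not how Spring implements it. */
class ToyTx {
    // Nesting depth of the "transaction" bound to the current thread (0 = none).
    private static final ThreadLocal<Integer> DEPTH = ThreadLocal.withInitial(() -> 0);
    // Per-thread event log so the behaviour can be inspected.
    static final ThreadLocal<List<String>> LOG = ThreadLocal.withInitial(ArrayList::new);

    /** Runs work in the current transaction if one exists, otherwise starts one. */
    static void required(Runnable work) {
        boolean startedHere = DEPTH.get() == 0;
        if (startedHere) {
            LOG.get().add("begin");
        }
        DEPTH.set(DEPTH.get() + 1);
        try {
            work.run();
        } finally {
            DEPTH.set(DEPTH.get() - 1);
            if (startedHere) {
                // Only the outermost caller ends the transaction (rollback is not modelled).
                LOG.get().add("commit");
            }
        }
    }
}
```

Calling ToyTx.required from inside another ToyTx.required on the same thread logs a single begin and a single commit, much as re-entering a monitor the thread already holds does not acquire it a second time.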
If we look at @Transactional as denoting a method that locks a database resource (because it is used in the transaction), then the comparison makes some sense.
However this is all they have in common. synchronized is defined on an object monitor (and protects only it), which is known at the time of usage of the keyword, while a transaction may lock multiple resources (that are not known when the transaction starts), or may not lock any resources at all (optimistic locking, read-only transactions).
So ultimately - don't use that comparison, there are a lot more things that they differ in than they have in common.
The concepts embodied in the @Transactional annotation are much more complex than those embodied in the synchronized keyword. I agree with JB Nizet's comment that your comparison is counter-intuitive and would confuse your audience.
With Java synchronization, you always know exactly what is being locked, from which point in the code and to which point. You have built-in the concept of threads and a queue of threads competing for the same resource. Also, you are in effect locking code, not locking data. It may seem like a nuance, but the difference could be substantial.
With @Transactional, first you have the issue of transaction demarcation. You don't know exactly when a transaction begins, since you might reach this method after a transaction has already been opened. For the same reason, you don't know whether the transaction will end when you exit the method.
Secondly, transaction isolation semantics are much more complex than plain synchronization (read-only, read-write, etc.). Often isolation addresses a concern about data integrity, and not intrinsically a concern about queuing access to a resource. Sometimes just one record is locked, sometimes a whole table (again, this is data, not code). Furthermore, transactions can be rolled back, a concept that is important for data integrity and doesn't exist with synchronized.
Suppose I have a method that checks for an id in the db and, if the id doesn't exist, inserts a value with that id. How do I know if this is thread safe, and how do I ensure that it is thread safe? Are there any general rules that I can use to ensure that it doesn't contain race conditions and is generally thread safe?
public TestEntity save(TestEntity entity) {
    if (entity.getId() == null) {
        entity.setId(UUID.randomUUID().toString());
    }
    Map<String, TestEntity> map = dbConnection.getMap(DB_NAME);
    map.put(entity.getId(), entity);
    return map.get(entity.getId());
}
This is a how long is a piece of string question...
A method is often made thread safe by using the synchronized keyword in its declaration, which lets only one thread at a time run it on a given object.
However, even if your setId and getId methods used the synchronized keyword, your process of setting the id (if it has not been previously initialized) above is still not thread safe. But even then there is an "it depends" aspect to the question. If it is impossible for two threads to ever get the same object with an uninitialised id, then you are thread safe because you would never be attempting to concurrently modify the id.
It is entirely possible, given the code in your question, that there could be two calls to the thread safe getId at the same time for the same object. One by one they get the return value (null) and immediately get pre-empted to let the other thread run. This means both will then run the thread safe setId method, again one by one.
You could declare the whole save method as synchronized, but if you do that the entire method will be single threaded which defeats the purpose of using threads in the first place. You tend to want to minimize the synchronized code to the bare minimum to maximize concurrency.
You could also put a synchronized block around the critical if statement and minimise the single threaded part of the processing, but then you would also need to be careful if there were other parts of the code that might also set the Id if it wasn't previously initialized.
Another possibility which has various pros and cons is to put the initialization of the Id into the get method and make that method synchronized, or simply assign the Id when the object is created in the constructor.
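A minimal sketch of the "initialize inside a synchronized get" option, assuming a hypothetical IdHolder class standing in for the question's TestEntity (the class name, the lock field and getOrCreateId are illustrative, not from the original code):

```java
import java.util.UUID;

/** Hypothetical stand-in for the question's TestEntity. */
class IdHolder {
    private final Object idLock = new Object();
    private String id; // guarded by idLock

    /** Returns the id, assigning a random one first if none has been set. */
    String getOrCreateId() {
        synchronized (idLock) { // the null check and the assignment form one atomic step
            if (id == null) {
                id = UUID.randomUUID().toString();
            }
            return id;
        }
    }
}
```

Because the check and the assignment sit in the same critical section, two threads can no longer both observe null and both assign. This only protects the in-memory object; it says nothing about what other processes do to the database.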
I hope this helps...
Edit...
The above talks about Java language features. A few people mentioned facilities in the Java class libraries (e.g. java.util.concurrent) which also provide support for concurrency. That is a good add-on, but there are also whole packages which address concurrency and other related parallel programming paradigms (e.g. parallelism) in various ways.
To complete the list I would add tools such as Akka and Cats-effect (concurrency) and more.
Not to mention the books and courses devoted to the subject.
I just reread your question and noted that you are asking about databases. Again, the answer is that it depends. RDBMSs usually let you do this type of operation with record locks, usually in a transaction. Some (like Teradata) use special clauses such as locking row for write select * from some table where pi_cols = 'somevalues', which locks the rowhash to you until you update it or until certain other conditions occur. This is known as pessimistic locking.
Others (notably NoSQL) have optimistic locking. Here, when you read the record (as you are implying with getId), there is no opportunity to lock the record. Then you do a conditional update. The conditional update is sort of like this: write the id as x provided that, when you try to do so, the id is still null (or whatever the value was when you checked). These types of operations are usually done through an API.
You can also do optimistic locking in an RDBMS as follows:
SQL
UPDATE tbl
SET x = 'some value',
    last_update_timestamp = current_timestamp()
WHERE x IS NULL AND last_update_timestamp = 'same value as when I last checked'
In this example the second part of the WHERE clause is the critical bit, which basically says "only update the record if no one else did, and I trust that everyone else will update the last update timestamp when they do". The "trust" bit can sometimes be replaced by triggers.
These types of database operations (if available) are guaranteed by the database engine to be "thread safe".
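The same "update only if nothing changed since I read it" idea can be sketched in memory. This is an analogue for illustration, not database code: VersionedStore, Row and their methods are hypothetical names, and a ConcurrentHashMap plays the role of the table.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory analogue of a versioned conditional UPDATE; all names are hypothetical. */
class VersionedStore {
    /** Immutable value plus the version it was written with. */
    static final class Row {
        final String value;
        final long version;
        Row(String value, long version) { this.value = value; this.version = version; }
    }

    private final ConcurrentMap<String, Row> rows = new ConcurrentHashMap<>();

    void insert(String key, String value) { rows.put(key, new Row(value, 0)); }

    Row read(String key) { return rows.get(key); }

    /** Succeeds only if the stored row is still the one the caller read earlier. */
    boolean update(String key, Row seen, String newValue) {
        // replace(key, old, new) is atomic; Row does not override equals(),
        // so "old" matches only the exact instance that was read.
        return rows.replace(key, seen, new Row(newValue, seen.version + 1));
    }
}
```

A caller that loses the race gets false back and has to re-read and decide again, which mirrors an UPDATE that reports zero affected rows.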
Which loops me back to the "how long is a piece of string" observation at the beginning of this answer...
Test-and-set is unsafe
a method that checks for an id in the db and if the id doesn't exist then inserts a value with that id.
Any test-and-set pair of operations on a shared resource is inherently unsafe, vulnerable to a race condition. If the two operations are separate (not atomic), then they must be protected as a pair. While one thread completes the test but has not yet done the set, another thread could sneak in and do both the test and the set. The first thread now completes its set without knowing a duplicate action has occurred.
Providing that necessary protection is too broad a topic for an Answer on Stack Overflow, as others have said here.
UPSERT
However, let me point out an alternative approach: make the test-and-set atomic.
In the context of a database, that can be done using the UPSERT feature, also known as a Merge operation. For example, in Postgres 9.5 and later we have the INSERT INTO … ON CONFLICT command. See this explanation for details.
In the context of a Boolean-style flag, a semaphore makes the test-and-set atomic.
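As a small sketch of the flag case, assuming a hypothetical OneTimeClaim class: a java.util.concurrent.Semaphore with a single permit lets exactly one caller succeed, because tryAcquire() tests and takes the permit in one step.

```java
import java.util.concurrent.Semaphore;

/** One-shot claim: only the first caller wins. Names are hypothetical. */
class OneTimeClaim {
    private final Semaphore permit = new Semaphore(1);

    /** Tests and takes the single permit in one atomic step; never blocks. */
    boolean tryClaim() {
        return permit.tryAcquire();
    }
}
```

However many threads call tryClaim() concurrently, one of them sees true and the rest see false, so there is no window between the test and the set.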
In general, we say "a method is thread-safe" when there is no race condition on the internal and external data structures of the object it belongs to. In other words, concurrent calls behave as if they were executed one at a time in some order.
For example, let's say you have a HashMap object and two threads, thread_a and thread_b.
thread_a calls put("a", "a") and thread_b calls put("a", "b").
The put method is not thread-safe (refer to its documentation) in the sense that while thread_a is executing its put, thread_b can also go in and execute its own put.
A put contains a reading part and a writing part.
thread_a.read("a")
thread_b.read("a")
thread_b.write("a", "b")
thread_a.write("a", "a")
If the above sequence happens, you can say the method is not thread-safe.
The way to make a method thread-safe is to ensure that the state of the whole object cannot be perturbed while the thread-safe method is executing. An easy way is to put the synchronized keyword in method declarations.
If you are worried about performance, use manual locking with synchronized blocks on a lock object. Further performance improvement can be achieved using well-designed semaphores.
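A minimal sketch of the synchronized-block-with-lock-object approach, assuming a hypothetical GuardedMap wrapper around a plain HashMap (the class and method names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

/** Wraps a plain HashMap so each operation runs under one private lock. */
class GuardedMap {
    private final Object lock = new Object();
    private final Map<String, Integer> counts = new HashMap<>(); // guarded by lock

    /** Read-modify-write executed as a single critical section. */
    int increment(String key) {
        synchronized (lock) {
            int next = counts.getOrDefault(key, 0) + 1;
            counts.put(key, next);
            return next;
        }
    }

    int get(String key) {
        synchronized (lock) {
            return counts.getOrDefault(key, 0);
        }
    }
}
```

The read and the write of each increment stay together, so the interleaving listed above cannot happen. Using a private lock object rather than synchronizing on this also keeps outside code from taking the same monitor.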
Imagine a situation where multiple processes try to use a shared resource.
You can protect it by using a java monitor ( for example - synchronized methods).
But what if your classes must obey this protocol?
request method - critical section - end method
Only one process at a time can execute the request or end method, thanks to the synchronized blocks, but what about the core of the critical section?
Using other constructs such as Semaphores or Lock/Condition you can do it easily, but with the native monitor you are bound by the fact that a synchronized region is a block that cannot span multiple methods.
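The Semaphore option can be sketched as follows, assuming a hypothetical SharedResource class whose request() and end() methods bracket the critical section. tryRequest() is an extra, illustrative non-blocking variant.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical resource following the request / critical section / end protocol. */
class SharedResource {
    private final Semaphore free = new Semaphore(1, true); // one holder at a time, fair queueing

    /** Blocks until the resource is free, then marks it as taken. */
    void request() throws InterruptedException {
        free.acquire();
    }

    /** Non-blocking variant for callers that prefer to back off. */
    boolean tryRequest() {
        return free.tryAcquire();
    }

    /** Releases the resource; unlike a monitor exit, this can happen in another method. */
    void end() {
        free.release();
    }
}
```

Note that a Semaphore does not track ownership: calling end() without a matching request() adds a permit, so callers are trusted to follow the protocol.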
If you use a boolean that tells you whether the resource is busy (calling wait() right after) or not, deadlock can occur!
So, what could be a good solution for this?
Imagine a situation where...
There's a name for that: it's a long transaction, and if you think you need to implement it, that's a sign that it may be time to rethink your design.
Why it's bad, and how to avoid it is a book-level topic.
Here's one book that covers it pretty well:
https://www.amazon.com/Patterns-Enterprise-Application-Architecture-Martin/dp/0321127420
By clustered environment I mean the same code running on multiple server machines. The scenario I can think of is as follows.
Multiple requests to update card details based on expiry time come from different threads at the same time. A snippet of the code follows:
synchronized (card) { // card object
    if (card.isExpired()) {
        updateCard();
    }
}
My understanding is that a synchronized block works at the JVM level, so how is this achieved in a multi-server environment?
Please suggest edit to rephrase question. I asked what I can recollect from a question asked to me.
As you said, synchronized block is only for "local JVM" threads.
When it comes to cluster, it is up to you how you drive your distributed transaction.
It really depends where your objects (e.g. card) are stored.
Database - You will probably need to use some locking strategy. Very likely optimistic locking that stores a version of entity and checks it when every change is made. Or more "safe" pessimistic locking where you lock the whole row when making changes.
Memory - You will probably need some memory grid solution (e.g. Hazelcast...) and make use of its transaction support or implement it by yourself
Any other? You will have to specify...
See, in a clustered environment, you will usually have multiple JVMs running the same code. If traffic is high, then actually the number of JVMs could auto-scale and increase (new instances could be spawned). This is one of the reasons why you should be really careful when using static fields to keep data in a distributed environment.
Next, coming to your actual question: if you have a single JVM serving requests, then all other threads will have to wait to get that lock. If you have multiple JVMs running, then a lock acquired by one thread on one JVM will not prevent acquisition of the (in reality not the same, but conceptually the same) lock by another thread in a different JVM.
I am assuming you want to ensure that only one thread can edit the object or perform the action (based on the method name, i.e. updateCard). I suggest you implement optimistic locking (versioning), which Hibernate can do quite easily, to prevent lost updates.
Here's my thinking:
Even though a HTTP request cycle is essentially handled by a 'single thread', each time a HTTP request is processed for that same session it is likely to be processed by a different thread from the thread pool.
Without the volatile keyword being used on a domain model object, whose lifecycle extends across multiple HTTP requests for the same session, then, according to my understanding, isn't it possible that the attribute could be thread local cached (an optimization by the compiler) in the thread that serviced the first HTTP request? If the second HTTP request is serviced by another thread then that second thread may not see the changes in that attribute that were made by the first thread.
Does this spell "Danger Will Robinson"? Or am I missing a vital plot point about the use (or not) of the volatile keyword?
I think you are forgetting that the threads handling the HTTP request first need to retrieve the instance of the domain model object from the HttpSession provided by your application server. The thread handling request 2 in the scenario you describe does not already have an instance of this domain model - it has to retrieve it from the session implementation at the start of handling each and every request.
I think it is completely reasonable to assume that the session-handling implementation in your application server is handling session data in such a way that memory model visibility issues are avoided. Apache Tomcat's default (non-clustered) HttpSession implementation, for example, stores the session attributes in a ConcurrentHashMap.
Adding volatile seems completely unnecessary to me. I have never seen this done for domain model objects handled by HTTP requests in a Servlet environment in any project I have worked in.
This would be a different story if thread-1 and thread-2 had references to the same object instance simultaneously while processing two different requests, and you were concerned about changes in one thread being visible to the other as each is processing its request, but this does not sound like what you are asking about.
Yes, if you are sharing an object between different threads, you may have race conditions. Without a happens before relationship, writes made by one thread may not be seen by a read in another thread.
Doing a volatile write in one thread and doing a volatile read of the same field in another thread establishes a happens before relationship between the two threads, and ensures visibility of the write.
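A minimal sketch of that rule, assuming a hypothetical Publication class: the plain payload field is written before the volatile flag, so any thread that reads the flag as true is also guaranteed to see the payload.

```java
/** Publishes a plain field through a volatile flag. Names are hypothetical. */
class Publication {
    private int payload;            // plain field, written before the flag
    private volatile boolean ready; // the volatile write/read pair creates the happens-before edge

    void publish(int value) {
        payload = value; // ordinary write
        ready = true;    // volatile write: everything written before it becomes visible
    }

    /** Spins until the flag is seen, then reads the payload. */
    int awaitPayload() {
        while (!ready) {
            Thread.yield(); // busy-wait, acceptable only for a demo
        }
        return payload;
    }

    /** Small demo: another thread publishes, the caller reads. */
    static int demo() {
        Publication p = new Publication();
        new Thread(() -> p.publish(42)).start();
        return p.awaitPayload();
    }
}
```

If ready were not volatile, the reading thread would have no guarantee of ever seeing the flag change, or of seeing the payload written before it.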
This is a complicated problem, simply using a volatile keyword is probably not a good solution.
I think your understanding of it is correct. Given your description I would say it should be used. If it's something more than a primitive type, I would rather synchronize.
Good information on volatile:
http://www.javamex.com/tutorials/synchronization_volatile_when.shtml
If you have a mutable object in session, that is trouble. But usually the solution is not to guard individual fields; rather the entire object should be swapped.
Say you have the user object in the session. Most requests simply retrieve it, read it and display it.
There is a request that can modify user information. It would be a really bad idea to retrieve the user object and modify it in place. It's better to create a completely new user object and insert it into the session.
In that case, fields in User don't need any protection; thread safety is guaranteed by session setAttribute() - getAttribute()
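A sketch of the swap-the-whole-object approach. User is an illustrative immutable class, and FakeSession is a hypothetical stand-in for the container's HttpSession so that the example runs without a servlet API on the classpath.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Immutable user: every field is final, so a published instance never changes. */
final class User {
    final String name;
    final String email;
    User(String name, String email) { this.name = name; this.email = email; }

    /** Returns a modified copy instead of mutating this instance. */
    User withEmail(String newEmail) { return new User(name, newEmail); }
}

/** Stand-in for HttpSession attributes; a real container supplies its own storage. */
class FakeSession {
    private final Map<String, Object> attributes = new ConcurrentHashMap<>();
    void setAttribute(String key, Object value) { attributes.put(key, value); }
    Object getAttribute(String key) { return attributes.get(key); }
}
```

A request that edits the user builds a new instance with withEmail(...) and stores it with setAttribute. Readers see either the old object or the new one, never a half-updated one. Two concurrent editing requests can still overwrite each other, so this addresses visibility rather than lost updates.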
If you have concurrency issues, just adding 'volatile' probably won't help you.
As for keeping the object as an attribute of the session, I'd recommend keeping just the object's ID and using it to retrieve a 'live' instance when you need it (if you use Hibernate, successive retrieves will return the same object, so this shouldn't cause performance problems). Encapsulate all modification logic for this specific object in a single façade, and do the concurrency control there, using database locking.
Or, if you really, really, really want to use memory-based locking, and are really sure that you'll never have two instances of the application running in a cluster, make sure that your façade logic is synchronized at the right level. If your synchronization is too fine grained (low-level operations, such as volatile variables), it probably won't be enough to make your code thread-safe. For example, java.util.Hashtable is fully synchronized, but it doesn't mean anything if you have logic like this:
01 if (!hashtable.containsKey(key)) {
02 hashtable.put(key, calculate(key));
03 }
If two threads, say, t1 and t2, hit this block at the same time, t1 may execute line 01, then t2 may also execute 01, and then 02, and t1 then will execute 02, overwriting what t2 had done. The operations containsKey() and put() are atomic individually, but what should be atomic is the whole block.
Sometimes recalculating a value doesn't matter, but sometimes it does, and it will break.
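One way to make the whole check-and-insert block atomic is to hand it to the map as a single operation. The sketch below swaps Hashtable for ConcurrentHashMap and uses computeIfAbsent; ComputeOnceCache and the body of calculate are hypothetical placeholders.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Cache whose check-and-insert is one atomic map operation. Names are hypothetical. */
class ComputeOnceCache {
    private final ConcurrentMap<String, Integer> cache = new ConcurrentHashMap<>();
    final AtomicInteger calculations = new AtomicInteger(); // counts calls to calculate()

    /** Placeholder for the expensive calculate(key) in the snippet above. */
    private Integer calculate(String key) {
        calculations.incrementAndGet();
        return key.length();
    }

    Integer get(String key) {
        // ConcurrentHashMap applies the function at most once per absent key,
        // even when many threads ask for the same key at the same time.
        return cache.computeIfAbsent(key, this::calculate);
    }
}
```

An alternative that keeps Hashtable is to wrap the containsKey/put pair in synchronized (hashtable) { ... }, since Hashtable's own methods lock on the table instance.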
When it comes to concurrency, there's no magic. I mean, seam some crappy frameworks try to sell you the idea that they solve this problem for you. They don't. Even if it works 99% of the time, it will break spectacularly when you go to production and start to get heavy traffic. Or (much, much) worse, it will silently generate wrong results.
Concurrency is one of the most complex problems in programming, and the only way to handle it is to avoid it. The whole functional programming trend is not about dealing with concurrency; it is about avoiding it altogether.
It turns out that volatile was not needed in the end. The problem that "appeared" to be fixed with volatile was actually a very subtle timing sensitive bug that was fixed in a much more elegant and proper way ;)
So sbrigdes was correct when he said "simply using a volatile keyword is probably not a good solution."
Java 6 API question. Does calling LockSupport.unpark(thread) have a happens-before relationship to the return from LockSupport.park in the just-unparked thread? I strongly suspect the answer is yes, but the Javadoc doesn't seem to mention it explicitly.
I have just found this question because I was asking myself the same thing. According to this article by Oracle researcher David Dice, the answer seems to be no. Here's the relevant part of the article:
If a thread is blocked in park() we're guaranteed that a subsequent
unpark() will make it ready. A perfectly legal but low-quality
implementation of park() and unpark() would be empty methods, in which
the program degenerates to simple spinning. And in fact that's the
litmus test for correct park()-unpark() usage.
Empty park() and unpark() methods do not give you any happens-before relationship guarantees, so for your program to be 100% portable, you should not rely on them.
Then again, the Javadoc of LockSupport says:
These methods are designed to be used as tools for creating
higher-level synchronization utilities, and are not in themselves
useful for most concurrency control applications. The park method is
designed for use only in constructions of the form:
while (!canProceed()) { ... LockSupport.park(this); }
Since you have to explicitly check some condition anyway, which will involve either volatile or properly synchronized variables, the weak guarantees of park() should not actually be a problem, right?
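A sketch of that construction, assuming a hypothetical one-shot Gate class. The visibility guarantee comes from the volatile open flag; park() and unpark() are used only to avoid spinning.

```java
import java.util.concurrent.locks.LockSupport;

/** One-shot gate built on park/unpark plus a volatile flag. Names are hypothetical. */
class Gate {
    private volatile boolean open;  // carries the happens-before edge
    private volatile Thread waiter; // thread to wake, if any

    /** Blocks the caller until open() has been called. */
    void await() {
        waiter = Thread.currentThread();
        while (!open) {             // re-check: park may return for no reason
            LockSupport.park(this);
        }
    }

    void open() {
        open = true;                // volatile write first
        LockSupport.unpark(waiter); // then wake; unpark(null) has no effect
    }

    /** Demo: the worker waits at the gate, then reads a value written before open(). */
    static int demo() {
        Gate gate = new Gate();
        int[] box = new int[1];
        int[] seen = new int[1];
        Thread worker = new Thread(() -> { gate.await(); seen[0] = box[0]; });
        worker.start();
        box[0] = 7;
        gate.open();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

Even if park() and unpark() were empty methods, this would degrade to spinning on the volatile flag and still be correct, which is the litmus test quoted above.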
If it isn't documented as such then you CANNOT rely on it creating a happens before relationship.
Specifically LockSupport.java in Hotspot code simply calls Unsafe.park and .unpark!
The happens-before relationship will generally come from a write-read pair on a volatile status flag or something similar.
Remember, if it isn't documented as creating a happens-before relationship then you must treat it as though it does not even if you can prove that it does on your specific system. Future systems and implementations may not. They left themselves that freedom for good reason.
I have looked through the JDK code and it looks like LockSupport methods are normally called outside of synchronization blocks. So, your assumption seems to be correct.