Java 6 API question. Does calling LockSupport.unpark(thread) have a happens-before relationship to the return from LockSupport.park in the just-unparked thread? I strongly suspect the answer is yes, but the Javadoc doesn't seem to mention it explicitly.
I have just found this question because I was asking myself the same thing. According to this article by Oracle researcher David Dice, the answer seems to be no. Here's the relevant part of the article:
If a thread is blocked in park() we're guaranteed that a subsequent
unpark() will make it ready. A perfectly legal but low-quality
implementation of park() and unpark() would be empty methods, in which
the program degenerates to simple spinning. And in fact that's the
litmus test for correct park()-unpark() usage.
Empty park() and unpark() methods do not give you any happens-before relationship guarantees, so for your program to be 100% portable, you should not rely on them.
Then again, the Javadoc of LockSupport says:
These methods are designed to be used as tools for creating
higher-level synchronization utilities, and are not in themselves
useful for most concurrency control applications. The park method is
designed for use only in constructions of the form:
while (!canProceed()) { ... LockSupport.park(this); }
Since you have to explicitly check some condition anyway, which will either involve volatile or properly synchronized variables, the weak guarantees of park() should not actually be a problem, right?
If it isn't documented as such then you CANNOT rely on it creating a happens-before relationship.
Specifically, LockSupport.java in the HotSpot code simply calls Unsafe.park and Unsafe.unpark!
The happens-before relationship will generally come from a write-read pair on a volatile status flag or something similar.
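As a minimal sketch of that pattern (class and field names are made up), where the visibility guarantee comes from the volatile flag rather than from park()/unpark() themselves:

import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch: the happens-before edge comes from the volatile
// write/read of 'ready', not from park()/unpark() themselves.
class Gate {
    private volatile boolean ready;
    private volatile Thread waiter;

    void await() {
        waiter = Thread.currentThread();
        while (!ready) {               // re-check: park() may return spuriously
            LockSupport.park(this);
        }
    }

    void open() {
        ready = true;                  // volatile write: visible to the waiter's read
        Thread t = waiter;
        if (t != null) {
            LockSupport.unpark(t);     // only makes the parked thread runnable again
        }
    }
}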
Remember, if it isn't documented as creating a happens-before relationship then you must treat it as though it does not even if you can prove that it does on your specific system. Future systems and implementations may not. They left themselves that freedom for good reason.
I have looked through the JDK code and it looks like LockSupport methods are normally called outside of synchronization blocks. So, your assumption seems to be correct.
Related
Imagine a situation where multiple processes try to use a shared resource.
You can protect it by using a Java monitor (for example, synchronized methods).
But what if your class must obey that protocol?
request method - critical section - end method
Only one process at a time executes the request and end methods, thanks to the synchronized blocks, but what about the core of the critical section?
Using other constructs like Semaphores or Lock/Condition you can do this easily, but with the native monitor you are bound to the fact that synchronization is delimited by a block that cannot span multiple methods.
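As a rough sketch of that request/end protocol using Lock/Condition (class and field names are made up):

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: the critical section runs between request() and end(),
// so it can span multiple methods, unlike a single synchronized block.
class SharedResource {
    private final Lock lock = new ReentrantLock();
    private final Condition free = lock.newCondition();
    private boolean busy;

    void request() throws InterruptedException {
        lock.lock();
        try {
            while (busy) {
                free.await();        // releases the lock while waiting
            }
            busy = true;             // the resource is now ours
        } finally {
            lock.unlock();
        }
    }

    void end() {
        lock.lock();
        try {
            busy = false;
            free.signal();           // wake one waiter
        } finally {
            lock.unlock();
        }
    }
}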
If you use a boolean that tells you whether the resource is busy or not (calling wait() right after checking it), deadlock can occur!
So, what could be a good solution for this?
Imagine a situation where...
There's a name for that: it's called a long transaction, and if you think you need to implement it, that's a sign that it may be time to re-think your design.
Why it's bad, and how to avoid it is a book-level topic.
Here's one book that covers it pretty well:
https://www.amazon.com/Patterns-Enterprise-Application-Architecture-Martin/dp/0321127420
From this post: http://www.javamex.com/tutorials/synchronization_volatile_typical_use.shtml
public class StoppableTask extends Thread {
    private volatile boolean pleaseStop;

    public void run() {
        while (!pleaseStop) {
            // do some stuff...
        }
    }

    public void tellMeToStop() {
        pleaseStop = true;
    }
}
If the variable were not declared volatile (and without other
synchronization), then it would be legal for the thread running the
loop to cache the value of the variable at the start of the loop and
never read it again.
In Java 5 or later:
is the last paragraph correct?
So, exactly at what moment can a thread cache the value of the pleaseStop variable (and for how long)? Just before calling one of StoppableTask's methods (run, tellMeToStop) on the object? (And the thread must update the variable when exiting the method at the latest?)
can you point me to a documentation/tutorial reference about this (Java 5 or later)?
Update: here is my compilation of the answers posted to this question:
Without using volatile nor synchronized, there are actually two problems with the above program:
1- Threads can cache the variable pleaseStop from the very first moment the thread starts and never update it again, so the loop would keep going forever. This can be solved by using either volatile or synchronized. This thread caching mechanism does not exist in C.
2- The Java compiler can optimise the code and replace while (!pleaseStop) {...} with if (!pleaseStop) { while (true) {...} }, so the loop would keep going forever. Again, this can be solved by using either volatile or synchronized. This compiler optimisation also exists in C.
Some more info:
https://www.ibm.com/developerworks/library/j-5things15/
When can it cache?
As for your question about "when can it cache" the value, the answer to that is "always". To understand what that means, read on. Processors have storage called caches, which make it possible for the running thread to access values in memory by reading from the cache rather than from memory. The running thread can also write to this cache as if it were writing the value to memory. Thus, so long as the thread is running, it could be using the cache to store the data it's using. Something has to explicitly happen to flush the value from the cache to memory. For a single-threaded process, this is all well and dandy, but if you have another thread, it might be trying to read the data from memory while the other thread is plugging away reading and writing it to the processor cache without flushing to memory.
How long can it cache?
As for the "for how long" part- the answer is unfortunately forever unless you do something about it. Synchronizing on the data in question is one way to force a flush from the cache so that all threads see the updates to the value. For more detail about ways to cause a flush, see the next section.
Where's some Documentation?
As for the "where's the documentation" question, a good place to start is here. For specifically how you can force a flush, java refers to this by discussing whether one action (such as a data write) appears to "happen before" another (like a data read). For more about this, see here.
What about volatile?
volatile in essence prevents the type of processor caching described above. This ensures that all writes to a variable are visible from other threads. To learn more, the tutorial you linked to in your post seems like a good start.
The relevant documentation is on the volatile keyword (Java Language Specification, Chapter 8.3.1.4) here and the Java memory model (Java Language Specification, Chapter 17.4) here
Declaring the variable volatile ensures that there is some synchronization of actions by other threads that might change its value. Without declaring it volatile, Java can reorder operations taken on that variable by different threads.
As the Spec says (see 8.3.1.4), for fields declared volatile, "accesses ... occur exactly as many times, and in exactly the same order, as they appear to occur during execution of the program text by each thread..."
So the caching you speak of can happen at any time if the variable is not volatile. But the Java memory model enforces consistent access to that variable if it is declared volatile. No such enforcement takes place if it is not (unless the threads are synchronized).
The official documentation is in section 17 of the Java Language Specification, especially 17.4 Memory Model.
The correct viewpoint is to start by assuming multi-threaded code won't work, and try to force it to work whether it likes it or not. Without the volatile declaration, or similar, there would be nothing forcing the read of pleaseStop to ever see the write if it happens in another thread.
I agree with the Java Concurrency in Practice recommendation. It is a good distillation of the implications of the JLS material for practical Java programming.
At one of my presentations about Spring/Hibernate transactions I brought up the opinion that the synchronized keyword on a method and @Transactional logically have many similarities. Sure enough they are totally different beasts, but they are both applied as aspects to a method and both control access to some resource via some kind of shared monitor (a record in the DB, for example).
There were a couple of people in the crowd who immediately objected and claimed that my comparison was fatally wrong. I don't remember the specific arguments, but I can see some point here as well. For example, synchronized works for the entire method from the beginning, while a transaction will only take effect when a statement that accesses the DB is reached. Plus, synchronized does not offer any read/write locking pattern.
So the question is: is my comparison totally wrong and should I never use it, or, with proper wording, would it make sense to present it to experienced engineers who know well how synchronized works but are still learning about AOP transactions? What should this wording be?
A bit of an update.
Apparently my question sounded like comparing DB transactions vs entering a synchronized method in Java. That's not the case. My idea is more about comparing similarities in the semantics of @Transactional and synchronized.
One of the reasons I brought it up was also to illustrate propagation behaviour. For example, if @Transactional is PROPAGATION_REQUIRED it has many similarities to entering a synchronized block. For the transaction: if a transaction is present we just continue using it, and if not, we create one. For synchronized: if we already hold the monitor we proceed, and if not we attempt to acquire it. Of course, with @Transactional we are not going to lock on the method boundary.
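To make that analogy concrete, here is a hedged sketch (the service and method names are made up, and it illustrates only the "reuse it if present, otherwise acquire it" similarity, not an equivalence):

import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

// Hypothetical illustration of the analogy only; assumes this is a Spring-managed bean.
class AccountService {

    // PROPAGATION_REQUIRED: join the caller's transaction if one exists,
    // otherwise start a new one.
    @Transactional(propagation = Propagation.REQUIRED)
    public void transfer(long from, long to, long amount) {
        // ... DB work ...
    }

    private final Object monitor = new Object();

    // Reentrant monitor: if this thread already holds 'monitor' it simply
    // proceeds, otherwise it blocks until it can acquire it.
    public void update() {
        synchronized (monitor) {
            // ... shared-state work ...
        }
    }
}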
If we look at @Transactional as denoting a method that locks a database resource (because it is used in the transaction), then the comparison makes some sense.
However, this is all they have in common. synchronized is defined on an object monitor (and protects only it), which is known at the time the keyword is used, while a transaction may lock multiple resources (that are not known when the transaction starts), or may not lock any resources at all (optimistic locking, read-only transactions).
So ultimately, don't use that comparison; there are a lot more things they differ in than they have in common.
The concepts embodied in the @Transactional annotation are much more complex than those embodied in the synchronized keyword. I agree with JB Nizet's comment that your comparison is counter-intuitive and would confuse your audience.
With Java synchronization, you always know exactly what is being locked, from which point in the code and to which point. You have built-in the concept of threads and a queue of threads competing for the same resource. Also, you are in effect locking code, not locking data. It may seem like a nuance, but the difference could be substantial.
With @Transactional, you first have the issue of transaction demarcation. You don't know exactly when a transaction begins, since you might reach the method having already opened a transaction. For the same reason, you don't know if the transaction will end when you exit the method.
Secondly, transaction isolation semantics are much more complex than just synchronization (read-only, read-write, etc.). Many times isolation addresses a concern about data integrity, not intrinsically a concern about queuing access to a resource. Sometimes just one record is locked, sometimes a whole table (again, this is data, not code). Furthermore, transactions can be rolled back, a concept that is important for data integrity and doesn't exist with synchronized.
Is it possible to write a class such that other programmers cannot acquire a lock on an instance of the class?
Lock abuse, if there's a term like that, can be a serious killer. Unfortunately, programmers torn between the disastrous forces of delivering thread-safe code and limited knowledge of concurrency can wreak havoc by adopting the approach of locking instances even when they're invoking operations which really don't require the instance's resources to be blocked.
The only way to do this is to ensure that the class's instances are not visible. For example, you could declare it as a private nested class, and make sure that the enclosing class does not leak instance references.
Basically, if something can get hold of a reference to an instance, there is nothing to stop it from locking it.
Normally, it is sufficient to ensure that the reference to the lock object doesn't leak ... and not worry about the class visibility.
FOLLOW UP - in response to the OP's comments.
You cannot stop someone else's code from taking a lock on an instance of one of your classes. But you can design your class so that this won't interfere with your class's internal synchronization. Simply arrange that your class uses a private object (even a plain Object instance) for synchronizing.
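A small sketch of that private-lock-object idiom (the class name is hypothetical):

// Hypothetical sketch: internal state is guarded by a private lock object,
// so client code synchronizing on the instance itself cannot interfere.
public class Counter {
    private final Object lock = new Object();   // never escapes this class
    private int count;

    public void increment() {
        synchronized (lock) {
            count++;
        }
    }

    public int get() {
        synchronized (lock) {
            return count;
        }
    }
}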
In the more general sense, you cannot stop application programmers from using your classes in ways that you don't like. Other examples I've heard of (here) include trying to force people to override methods or provide particular constructors. Even declaring your class fields private won't stop a determined (or desperate) programmer from using reflection to get at them.
But the flip side is that those things you are trying to prevent might actually not be stupid after all. For example, there could actually be a sound reason for an application to use your class as a lock object, notwithstanding your objection that it reduces concurrency. (And in general it does; it is just that this may not be relevant in the particular case.)
My general feeling is that it is a good idea to document the way your class is designed to be used, and to design the API to encourage this. But it is unwise to try to force the issue. Ultimately it is the responsibility of the people who code against your classes to use them sensibly ... not yours.
If a class has members that require protection from concurrent access, locking should be done internally. Otherwise, you're forcing those who use it to understand the details of its implementation when they shouldn't be able to see past its interface.
When creating a new instance, also create a new thread which immediately synchronizes on the instance and goes to sleep (with Thread.sleep()). Any code trying to synchronize on the instance will just deadlock, thus the developer has to rethink his approach.
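For illustration only, a sketch of that (deliberately bad) idea, with the caveats from the disclaimer below:

// Sketch of the deliberately bad suggestion above. Do NOT do this: it leaks
// 'this' from the constructor and blocks anyone who synchronizes on the instance.
public class Unlockable {
    public Unlockable() {
        Thread hog = new Thread(new Runnable() {
            public void run() {
                synchronized (Unlockable.this) {
                    try {
                        Thread.sleep(Long.MAX_VALUE);   // hold the monitor "forever"
                    } catch (InterruptedException ignored) {
                    }
                }
            }
        });
        hog.setDaemon(true);   // don't keep the JVM alive just for this
        hog.start();
    }
}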
Disclaimer:
Don't vote me down because my suggestion is insane. I know it is. I am just answering the question. Do not actually do this!!!
I was reading some of the concurrency patterns in Brian Goetz's Java Concurrency in Practice and got confused over when is the right time to make the code thread safe.
I normally write code that's meant to run in a single thread so I do not worry too much about thread safety and synchronization etc. However, there always exists a possibility that the same code may be re-used sometime later in a multi-threaded environment.
So my question is, when should one start thinking about thread safety? Should I assume the worst at the onset and always write thread-safe code from the beginning or should I revisit the code and modify for thread safety if such a need arises later ?
Are there some concurrency patterns/anti-patterns that I must always be aware of even while writing single-threaded applications so that my code doesn't break if it's later used in a multi-threaded environment ?
You should think about thread safety when your code will be used in a multithreaded environment. There is no point in tackling the complexity if it will only ever be run in a single-threaded environment.
That being said, there are simple things you can do that are good practices anyway and will help with multithreading:
As Josh Bloch says, Favor Immutability. Immutable classes are thread-safe almost by definition (see the sketch after this list);
Only use data members or static variables where required, rather than as a convenience.
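For the first point, a minimal sketch of an immutable value class (a hypothetical example):

// Hypothetical immutable value class: final fields set once in the
// constructor, no setters, so instances are safe to share across threads.
public final class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int getX() { return x; }
    public int getY() { return y; }

    public Point translate(int dx, int dy) {
        return new Point(x + dx, y + dy);   // returns a new instance instead of mutating
    }
}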
Making your code thread safe can be as simple as adding a comment that says the class was not designed for concurrent use by multiple threads. So, in that sense: yes, all of your classes should be thread safe.
However, in practice, many, many types are likely to be used by only a single thread, often only referenced as local variables. This can be true even if the program as a whole is multi-threaded. It would be a mistake to make every object safe for multi-threaded access. While the penalty may be small, it is pervasive, and can add up to be a significant, hard-to-fix performance problem.
I advise you to obtain a copy of "Effective Java", 2nd Ed. by Joshua Bloch. That book devotes a whole chapter to concurrency, including a solid exploration of the issue of when (and when not) to synchronize. Note, for example, the title of item 67 in "Effective Java": 'Avoid excessive synchronization', which is elaborated over five pages.
As was stated previously, you need thread safety when you think your code will be used in a multithreaded environment.
Consider the approach taken by the Collections classes, where you provide a thread-unsafe class that does all its work without using synchronized, and you also provide another class that wraps the unsynchronized class, providing all of the same public methods but making them synchronize on the underlying object.
This gives your clients a choice of using the multi-threaded or the single-threaded version of your code. It may also simplify your coding by isolating all of the threading/locking logic in a separate class.
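A rough sketch of that wrapper approach, in the spirit of Collections.synchronizedList (the class names are made up):

// Hypothetical illustration of the wrapper approach described above.
class PlainCounter {
    private int count;
    public void increment() { count++; }        // no locking here
    public int get() { return count; }
}

class SynchronizedCounter {
    private final PlainCounter delegate;

    SynchronizedCounter(PlainCounter delegate) {
        this.delegate = delegate;
    }

    // Same public methods, but each one synchronizes on this wrapper.
    public synchronized void increment() { delegate.increment(); }
    public synchronized int get() { return delegate.get(); }
}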