Imagine a situation where multiple processes try to use a shared resource.
You can protect it with a Java monitor (for example, synchronized methods).
But what if your classes must obey the following protocol?
request method - critical section - end method
Thanks to the synchronized blocks, only one process at a time executes the request and end methods, but what about the body of the critical section itself?
Using other constructs such as Semaphore or Lock/Condition this is easy, but with the native monitor you are bound by the fact that a synchronized region is a single block that cannot span multiple methods.
If you use a boolean flag that tells you whether the resource is busy (calling wait() right after checking it), deadlock can occur!
So, what could be a good solution for this?
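For illustration, the Lock/Condition approach mentioned in the question might be sketched like this (the ResourceGuard class and its names are invented for the example, not taken from the question):

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical guard: request() blocks until the resource is free,
// end() releases it. The critical section runs between the two calls,
// outside any synchronized block, so it can span multiple methods.
class ResourceGuard {
    private final Lock lock = new ReentrantLock();
    private final Condition free = lock.newCondition();
    private boolean busy = false;

    void request() throws InterruptedException {
        lock.lock();
        try {
            while (busy) {
                free.await();   // releases the lock while waiting
            }
            busy = true;
        } finally {
            lock.unlock();
        }
    }

    void end() {
        lock.lock();
        try {
            busy = false;
            free.signal();      // wake one waiter, if any
        } finally {
            lock.unlock();
        }
    }
}
```

Unlike a synchronized block, the lock/unlock pairs here live in separate methods, which is exactly what the request/end protocol needs.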
Imagine a situation where...
There's a name for that: long transaction. If you think you need to implement one, that's a sign it may be time to rethink your design.
Why it's bad and how to avoid it is a book-level topic.
Here's one book that covers it pretty well:
https://www.amazon.com/Patterns-Enterprise-Application-Architecture-Martin/dp/0321127420
In our Netty application, we are moving all blocking calls in our code to run in a special backgroundThreadGroup.
I'd like to be able to log, in production, the thread name and the line number of the Java code that is about to execute a blocking operation (i.e. synchronous file and network I/O).
That way I can grep the logs for places where we might have missed moving blocking code to the backgroundThreadGroup.
Is there a way to instrument the JVM so that it can tell me that?
Depends on what you mean by a "blocking operation".
In a broad sense, any operation that causes a voluntary context switch is blocking. Trying to do something special about them is absolutely impractical.
For example, in Java, any method that uses synchronized is potentially blocking. This includes
ConcurrentHashMap.put
SecureRandom.nextInt
System.getProperty
and many more. I don't think you really want to avoid calling all of these methods, which look perfectly normal at first glance.
Even simple methods without any synchronization primitives can be blocking. E.g., ByteBuffer.get may result in a page fault and a blocking read on the OS level. Furthermore, as mentioned in comments, there are JVM level blocking operations that are not under your control.
In short, it's impractical if not impossible to find all places in the code where a blocking operation happens.
If, however, you are interested in finding particular method calls that you believe are bad (like Thread.sleep and Socket.read), you can definitely do so. There is a BlockHound project specifically for this purpose. It already has a predefined list of "bad" methods, but can be customized with your own list.
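As a sketch of that customization (this assumes the io.projectreactor.tools:blockhound dependency on the classpath and its documented builder API; the choice of File.exists as the extra "bad" method is purely illustrative):

```java
import reactor.blockhound.BlockHound;

public class BlockHoundSetup {
    public static void main(String[] args) {
        // Install BlockHound with one extra method marked as blocking.
        // The lambda is a BlockHoundIntegration applied to the builder.
        BlockHound.install(builder -> builder
                // treat File.exists() (descriptor "()Z") as a blocking call
                .markAsBlocking(java.io.File.class, "exists", "()Z")
                // log violations with a stack trace instead of throwing,
                // so production traffic keeps flowing while you grep the logs
                .blockingMethodCallback(method ->
                        new Error("Blocking call: " + method).printStackTrace()));
    }
}
```

The stack trace printed by the callback gives you the thread name and the line number of the offending call, which is what the question asks for.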
There is a library called BlockHound that will throw an exception on a blocking call unless you have configured BlockHound to ignore that specific call.
This is how you configure BlockHound for Netty: https://github.com/violetagg/netty/blob/625f9d5781ed85bfaca6fa4e826d0d46d70fdbd8/common/src/main/java/io/netty/util/internal/Hidden.java
(You can improve the above code by replacing the last line with builder.nonBlockingThreadPredicate(p -> p.or(thread -> thread instanceof FastThreadLocalThread));)
see https://github.com/reactor/BlockHound
see https://blog.frankel.ch/blockhound-how-it-works/
I personally used it to find all blocking calls within our Netty-based service.
Good Luck
I've read about the Semaphore class there, and now I'd like to understand how I can use that class in real code. What is the usefulness of semaphores? I gather that we could use semaphores to improve performance by reducing concurrent access to a resource. Is that the main usage of Semaphore?
tl;dr Answer: Semaphores let you limit access to some code path to a certain number of threads, without controlling the mechanism handling those threads. A sample use case is a webservice that offers some resource-intensive task: using a Semaphore, you can limit that task to e.g. 5 threads while letting the app server's larger thread pool handle both this and other types of requests.
Long answer: See Furkan Omay's comment.
Semaphores are part of the java.util.concurrent package, and as the package name suggests, they are used to control concurrent access to code. Unlike synchronized and Lock, a semaphore lets you restrict access to a piece of code to a certain number of threads.
Consider it as a bouncer at a pub who decides whether people can enter. If the pub is full and can't take any more people, he makes them wait until someone leaves. Semaphores can be used to do something like this to your code!!
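That analogy in code might look like this (the Pub class and its capacity are invented for illustration):

```java
import java.util.concurrent.Semaphore;

// A pub with a fixed capacity; the Semaphore plays the bouncer.
class Pub {
    private final Semaphore bouncer;

    Pub(int capacity) {
        this.bouncer = new Semaphore(capacity);
    }

    void enter() throws InterruptedException {
        bouncer.acquire();   // blocks while the pub is full
    }

    void leave() {
        bouncer.release();   // frees a slot, letting the next person in
    }

    int freeSlots() {
        return bouncer.availablePermits();
    }
}
```

Any number of threads can call enter(), but only `capacity` of them are inside at once; the rest wait at the door.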
Hope it helps!!
Due to the deprecated nature of the Thread stop() and suspend() methods, I have become accustomed to implementing a standard cooperative suspension method using the well tested wait/notify methodology. Unfortunately my current project includes an initialisation thread that duplicates a recursive directory structure via a single call to an external method that doesn't return until it has finished and does not implement any kind of wait/notify cooperation.
I'm curious to know what other programmers are tempted to do in this situation, save perhaps reimplementing the external method, as I'm quite tempted to use the Thread.suspend() method and hope the file operations contained within the external method don't hold on to anything critical whilst suspended.
Hmmm...this is a tricky one.
Well, do not even try stop() or suspend(); they were deprecated for good reason. Ideally you shouldn't even need raw wait/notify when so many excellent utilities are available in the java.util.concurrent package.
In your case, you should check the documentation of the external method you are calling to learn the shutdown policy of that library. If none is mentioned, you can try interrupting: interrupt will work if the external method makes interruptible blocking calls. Other than that, I see no other way.
Using suspend will only lead to instability rather than helping anything. Not using it may cost more computation, but it will at least be stable.
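A minimal illustration of the interrupt approach, with Thread.sleep standing in for the external blocking call (the real external method would only respond like this if its blocking operations are interruptible):

```java
public class InterruptDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(60_000);   // stands in for the blocking external call
            } catch (InterruptedException e) {
                // the blocking call aborts with InterruptedException;
                // clean up and let the thread terminate
                System.out.println("interrupted, cleaning up and exiting");
            }
        });
        worker.start();
        worker.interrupt();             // request cancellation
        worker.join();                  // returns almost immediately, not after 60 s
    }
}
```

Note that interrupt() is only a request: code that ignores the interrupt status, or blocks in non-interruptible I/O, will not stop this way.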
Here's my thinking:
Even though an HTTP request cycle is essentially handled by a 'single thread', each time an HTTP request is processed for the same session it is likely to be handled by a different thread from the thread pool.
Without the volatile keyword on a field of a domain model object whose lifecycle spans multiple HTTP requests for the same session, isn't it possible, as I understand it, that the field could be cached thread-locally (a compiler optimization) by the thread that serviced the first HTTP request? If the second HTTP request is serviced by another thread, that second thread may not see the changes made to the field by the first thread.
Does this spell "Danger Will Robinson"? Or am I missing a vital plot point about the use (or not) of the volatile keyword?
I think you are forgetting that the threads handling the HTTP requests first need to retrieve the instance of the domain model object from the HttpSession provided by your application server. The thread handling request 2 in the scenario you describe does not already have an instance of this domain model object; it has to retrieve it from the session implementation at the start of each and every request.
I think it is completely reasonable to assume that the session-handling implementation in your application server is handling session data in such a way that memory model visibility issues are avoided. Apache Tomcat's default (non-clustered) HttpSession implementation, for example, stores the session attributes in a ConcurrentHashMap.
Adding volatile seems completely unnecessary to me. I have never seen this done for domain model objects handled by HTTP requests in a Servlet environment in any project I have worked in.
This would be a different story if thread-1 and thread-2 held references to the same object instance simultaneously while processing two different requests, and you were concerned about changes made by one thread being visible to the other while each is processing its request, but that does not sound like what you are asking about.
Yes, if you are sharing an object between different threads, you may have race conditions. Without a happens before relationship, writes made by one thread may not be seen by a read in another thread.
Doing a volatile write in one thread and doing a volatile read of the same field in another thread establishes a happens before relationship between the two threads, and ensures visibility of the write.
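A small sketch of that pattern (the Flag class and its fields are illustrative, not from the answer):

```java
// The plain write to payload happens-before the volatile write to ready,
// so any thread that reads ready == true is also guaranteed to see payload.
class Flag {
    private int payload;                    // plain field, published safely below
    private volatile boolean ready = false;

    void publish(int value) {
        payload = value;                    // 1. plain write
        ready = true;                       // 2. volatile write
    }

    Integer tryRead() {
        if (ready) {                        // volatile read
            return payload;                 // sees the write from publish()
        }
        return null;                        // not published yet
    }
}
```

Without the volatile flag, a reading thread could in principle see ready as true but payload still at its default value; the volatile write/read pair rules that out.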
This is a complicated problem, simply using a volatile keyword is probably not a good solution.
I think your understanding of it is correct. Given your description, I would say it should be used. If it's something more than a primitive type, I would rather synchronize.
Good information on volatile:
http://www.javamex.com/tutorials/synchronization_volatile_when.shtml
If you have a mutable object in session, that is trouble. But usually the solution is not to guard individual fields; rather the entire object should be swapped.
Say you have the user object in the session. Most requests simply retrieve it, read it and display it.
There is a request that can modify user information. It would be a really bad idea to retrieve the user object and modify it in place. It's better to create a completely new user object and insert it into the session.
In that case, fields in User don't need any protection; thread safety is guaranteed by the session's setAttribute()/getAttribute().
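A sketch of that swap pattern, with a ConcurrentHashMap standing in for the session implementation (which is what Tomcat's default non-clustered HttpSession uses internally); the User record is invented for the example:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SessionSwapDemo {
    // Immutable stand-in for the domain object
    record User(String name, String email) {}

    public static void main(String[] args) {
        // Stand-in for session.setAttribute()/getAttribute()
        Map<String, Object> session = new ConcurrentHashMap<>();
        session.put("user", new User("alice", "old@example.com"));

        // Don't mutate the stored object: build a replacement and swap it in.
        User current = (User) session.get("user");
        session.put("user", new User(current.name(), "new@example.com"));

        System.out.println(((User) session.get("user")).email());
    }
}
```

Because User is immutable and the swap goes through the thread-safe map, no field-level synchronization or volatile is needed.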
If you have concurrency issues, just adding 'volatile' probably won't help you.
As for keeping the object as an attribute of the session, I'd recommend you keep just the object's ID and use it to retrieve a 'live' instance when you need it (if you use Hibernate, successive retrieves will return the same object, so this shouldn't cause performance problems). Encapsulate all modification logic for this specific object in a single façade, and control concurrency there, using database locking.
Or, if you really, really, really want to use memory-based locking, and are really sure that you'll never have two instances of the application running in a cluster, make sure that your façade logic is synchronized at the right level. If your synchronization is too fine grained (low-level operations, such as volatile variables), it probably won't be enough to make your code thread-safe. For example, java.util.Hashtable is fully synchronized, but it doesn't mean anything if you have logic like this:
01 if (!hashtable.containsKey(key)) {
02 hashtable.put(key, calculate(key));
03 }
If two threads, say t1 and t2, hit this block at the same time, t1 may execute line 01, then t2 may also execute line 01 and then line 02, and t1 will then execute line 02, overwriting what t2 had done. The operations containsKey() and put() are individually atomic, but what should be atomic is the whole block.
Sometimes recalculating a value doesn't matter, but sometimes it does, and it will break.
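One standard way to make the whole check-then-act atomic (a modern alternative, not something the answer proposes) is ConcurrentHashMap.computeIfAbsent; the cache and calculate() below are illustrative:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCacheDemo {
    static final ConcurrentHashMap<String, Integer> cache = new ConcurrentHashMap<>();
    static final AtomicInteger calls = new AtomicInteger();

    // Hypothetical expensive computation
    static Integer calculate(String key) {
        calls.incrementAndGet();
        return key.length();
    }

    public static void main(String[] args) {
        // The mapping function runs at most once per key, atomically,
        // even if many threads race on the same key.
        cache.computeIfAbsent("answer", AtomicCacheDemo::calculate);
        cache.computeIfAbsent("answer", AtomicCacheDemo::calculate);
        System.out.println(calls.get());
    }
}
```

The check and the put are a single atomic operation, so the lost-update interleaving described above cannot happen.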
When it comes to concurrency, there's no magic. I've seen some frameworks try to sell you the idea that they solve this problem for you. They don't. Even if it works 99% of the time, it will break spectacularly when you go to production and start getting heavy traffic. Or (much, much) worse, it will silently generate wrong results.
Concurrency is one of the most complex problems in programming, and the only way to handle it is to avoid it. The whole functional programming trend is not about dealing with concurrency; it's about avoiding it altogether.
It turns out that volatile was not needed in the end. The problem that "appeared" to be fixed with volatile was actually a very subtle timing sensitive bug that was fixed in a much more elegant and proper way ;)
So sbrigdes was correct when he said "simply using a volatile keyword is probably not a good solution."
Java 6 API question. Does calling LockSupport.unpark(thread) have a happens-before relationship to the return from LockSupport.park in the just-unparked thread? I strongly suspect the answer is yes, but the Javadoc doesn't seem to mention it explicitly.
I have just found this question because I was asking myself the same thing. According to this article by Oracle researcher David Dice, the answer seems to be no. Here's the relevant part of the article:
If a thread is blocked in park() we're guaranteed that a subsequent
unpark() will make it ready. A perfectly legal but low-quality
implementation of park() and unpark() would be empty methods, in which
the program degenerates to simple spinning. And in fact that's the
litmus test for correct park()-unpark() usage.
Empty park() and unpark() methods do not give you any happens-before relationship guarantees, so for your program to be 100% portable, you should not rely on them.
Then again, the Javadoc of LockSupport says:
These methods are designed to be used as tools for creating
higher-level synchronization utilities, and are not in themselves
useful for most concurrency control applications. The park method is
designed for use only in constructions of the form:
while (!canProceed()) { ... LockSupport.park(this); }
Since you have to explicitly check some condition anyway, which will involve either volatile or properly synchronized variables, the weak guarantees of park() should not actually be a problem, right?
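A sketch of that pattern, in which the volatile flag, not park/unpark itself, carries the happens-before guarantee (the Gate class is invented for illustration):

```java
import java.util.concurrent.locks.LockSupport;

// One-shot gate: the volatile flag provides the memory visibility;
// park/unpark only handle the blocking and waking.
class Gate {
    private volatile boolean open = false;
    private volatile Thread waiter;

    void await() {
        waiter = Thread.currentThread();
        while (!open) {                 // re-check: park may return spuriously
            LockSupport.park(this);
        }
    }

    void release() {
        open = true;                    // volatile write: happens-before the
        Thread t = waiter;              // waiter's read that sees true
        if (t != null) {
            LockSupport.unpark(t);
        }
    }
}
```

Even if park() and unpark() gave no ordering guarantees at all, the loop would merely spin until the volatile read observes the flag, which is exactly the degenerate-but-correct behavior the quoted article describes.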
If it isn't documented as such then you CANNOT rely on it creating a happens before relationship.
Specifically, LockSupport.java in the HotSpot code simply calls Unsafe.park and Unsafe.unpark!
The happens-before relationship will generally come from a write-read pair on a volatile status flag or something similar.
Remember, if it isn't documented as creating a happens-before relationship then you must treat it as though it does not even if you can prove that it does on your specific system. Future systems and implementations may not. They left themselves that freedom for good reason.
I have looked through the JDK code, and it looks like LockSupport methods are normally called outside of synchronization blocks. So your assumption seems to be correct.